[Foundation-l] Wolf Mountain MediaWiki Appliances Released

Gregory Maxwell gmaxwell at gmail.com
Thu Jul 27 01:19:24 UTC 2006


On 7/26/06, Domas Mituzas <midom.lists at gmail.com> wrote:
[snip]
> > With all revision pages its around 3 TB total.
>
> That really requires advanced tech. At Wikipedia revision pages are
> compressed, and a proper compression run shrinks the whole dataset to
> 0.5 TB or so (or less).

A minor nit: even with braindead-simple compression (TOASTed columns in
PostgreSQL, which use a modified LZ algorithm that gets less compression
than gzip -3 but is much faster and compresses a single row at a time)
you can get the whole of English Wikipedia into 0.4 TB, including the
needed indexes and the (not insubstantial) DB overhead.
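
If you want to see why row-at-a-time compression leaves so much on the
table, here's a rough, purely illustrative Python sketch (zlib stands in
for PostgreSQL's pglz, and the "revisions" are synthetic, so only the
ratio matters, not the absolute numbers):

  import zlib, random

  # Fake history: 50 revisions of one article, each a tiny edit on top
  # of the same (deliberately incompressible) base text.
  random.seed(0)
  base = bytes(random.randrange(256) for _ in range(20000))
  revisions = [base + b" edit %d" % i for i in range(50)]

  # Per-row compression (what TOAST does): each revision is compressed
  # alone, so the redundancy *between* revisions is never seen.
  per_row = sum(len(zlib.compress(r, 3)) for r in revisions)

  # One pass over the concatenated history: cross-revision redundancy
  # gets squeezed out.
  one_pass = len(zlib.compress(b"".join(revisions), 3))

  print("per-row total :", per_row)
  print("one-pass total:", one_pass)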

With state-of-the-art compression (LZMA) you can get all the revisions
into 6 GB, but you lose random access.  At the Wikimania tech days I'll
be presenting a system which achieves similar compression performance
but preserves random access... which is at least a mildly interesting
subject, although perhaps without practical implications for Wikimedia
until the disk/CPU performance gap widens a bit further. :)
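
I don't want to pre-empt the talk, but one obvious way to split the
difference is block-level compression: group revisions into fixed-size
blocks, compress each block with LZMA, and fetch a single revision by
decompressing only its block.  Purely illustrative Python below (this
is *not* the system I'll be presenting; fixed-size blocks and a NUL
separator are gross simplifications):

  import lzma

  BLOCK = 16  # revisions per block: bigger = better ratio, slower reads

  def build(revisions):
      # One LZMA stream per group of BLOCK revisions.  Assumes the
      # revision text never contains a NUL byte.
      return [lzma.compress(b"\x00".join(revisions[i:i + BLOCK]))
              for i in range(0, len(revisions), BLOCK)]

  def fetch(blocks, rev_id):
      # Random access: decompress only the block holding rev_id.
      group = lzma.decompress(blocks[rev_id // BLOCK])
      return group.split(b"\x00")[rev_id % BLOCK]

  revs = [b"text of revision %d " % i * 50 for i in range(100)]
  blocks = build(revs)
  assert fetch(blocks, 42) == revs[42]

The block size is the whole trade-off: one revision per block and you
are back to TOAST-style per-row compression; one giant block and you
are back to the lose-random-access extreme above.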


