Kevin Carillo wrote:
20050421_old_table.sql.gz ----> around 31 gigabytes
and
20050421_old_table.sql ----> 34 201 362 bytes (compression factor of around 1.1)
That's entirely normal, as stored text in the old table is usually
compressed.
In the current tables, a row in the old table can be in one of three
states:
(default): uncompressed single item. You probably won't find many of
these in the Wikipedia dumps.
gzip: An individual text revision compressed with PHP's gzdeflate()
function, to be uncompressed with PHP's gzinflate() function. These wrap
zlib functions with some specific settings. If for some reason you don't
want to use MediaWiki or PHP to retrieve data from the dump, see Erik
Zachte's stats script for example Perl code.
object: A serialized PHP object which either contains multiple revisions
of a page blobbed and compressed together, or references a particular
row in which this revision can be found blobbed and compressed with
others. This provides a better overall compression ratio in the database
than individual compression. See includes/HistoryBlob.php.
gzip and object rows are indicated by the presence of those flags in the
old_flags field.
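
For anyone reading the dump without PHP, here is a rough Python sketch of
the flag handling described above. It is not MediaWiki's code. It assumes
that old_flags is a comma-separated list, that the "specific settings"
amount to a raw DEFLATE stream with no zlib or gzip header (which is what
gzdeflate() emits), and it makes no attempt to unserialize object rows.
The names decode_old_text and php_gzdeflate are made up for illustration.

```python
import zlib


def decode_old_text(old_text: bytes, old_flags: str) -> bytes:
    """Return the revision text for one row of the old table.

    Assumption: old_flags is a comma-separated list of flag names.
    """
    flags = {f.strip() for f in old_flags.split(",") if f.strip()}

    data = old_text
    if "gzip" in flags:
        # gzdeflate() writes a raw DEFLATE stream; a negative wbits value
        # tells zlib not to expect a zlib or gzip header.
        data = zlib.decompress(data, -15)

    if "object" in flags:
        # A serialized PHP object (see includes/HistoryBlob.php).
        # Unserializing it is outside the scope of this sketch.
        raise NotImplementedError("object rows need a PHP unserializer")

    # No relevant flag: the row is a plain uncompressed single item.
    return data


def php_gzdeflate(data: bytes) -> bytes:
    """Test helper standing in for PHP's gzdeflate(): raw DEFLATE output."""
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(data) + compressor.flush()


if __name__ == "__main__":
    sample = "Some revision text".encode("utf-8")
    print(decode_old_text(sample, ""))
    print(decode_old_text(php_gzdeflate(sample), "gzip"))
```

Rows flagged object are rejected on purpose rather than guessed at, since
they may only point at another row that holds the actual blob.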
-- brion vibber (brion @ pobox.com)