Nathan Wong wrote:
> Magnus Manske wrote:
>> The DVD contains de.wikipedia in a special format for the DirectMedia
>> Reader (Windows, Linux, Mac?), XML, and several PDA formats
>> (TomeRaider), AFAIK. It is also filled with images.
> I think that's a good format, which covers most imaginable
> circumstances; it's unlikely, then, that the English Wikipedia would
> fit in this format, being about twice the size of the German one.
The current English text is still just under CD size as one big bzipped
SQL dump. Presumably some sort of inelegant hackery could cram it all onto
a CD as individually gzipped/bzipped articles plus a viewer app for
several OSes (since modern web browsers will accept gzipped HTTP but won't
gunzip local files by default).
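A minimal sketch of the "compressed articles plus a viewer" idea, in Python: each article is compressed on its own, and an index of byte offsets lets a viewer decompress only the article requested. The archive layout (an 8-byte header length, a JSON index, then the compressed blobs) and the names `pack_articles` and `read_article` are hypothetical, not any existing Wikipedia dump or reader format.

```python
import bz2
import json
import struct
from typing import Dict

# Hypothetical archive layout (an assumption, not a real dump format):
# [8-byte big-endian index length][JSON index: title -> [offset, length]][bz2 blobs...]


def pack_articles(articles: Dict[str, str]) -> bytes:
    """Compress each article separately so any one can be read later
    without decompressing the rest of the archive."""
    blobs = []
    index = {}
    offset = 0
    for title, text in articles.items():
        blob = bz2.compress(text.encode("utf-8"))
        index[title] = [offset, len(blob)]  # offset is relative to the blob area
        blobs.append(blob)
        offset += len(blob)
    header = json.dumps(index).encode("utf-8")
    return struct.pack(">Q", len(header)) + header + b"".join(blobs)


def read_article(archive: bytes, title: str) -> str:
    """Look up one title in the index and decompress only that article."""
    (header_len,) = struct.unpack(">Q", archive[:8])
    index = json.loads(archive[8:8 + header_len])
    offset, length = index[title]
    start = 8 + header_len + offset
    return bz2.decompress(archive[start:start + length]).decode("utf-8")


if __name__ == "__main__":
    sample = {
        "Berlin": "Berlin is the capital of Germany.",
        "Bonn": "Bonn is a city on the Rhine.",
    }
    archive = pack_articles(sample)
    print(read_article(archive, "Bonn"))  # prints: Bonn is a city on the Rhine.
```

Compressing articles individually gives up some ratio compared with one big bzipped dump, because each article's compressor starts with no shared context; that loss is the price of random access from a viewer.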
Are there compression algorithms for text that compress smaller than bzip2
and are fast to decompress, even if they are slow to compress?
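One way to answer this empirically is to measure compressed size and decompression time on real text. The Python sketch below uses only the standard library's `zlib`, `bz2` and `lzma` modules; LZMA (as used by xz and 7-Zip) is a common candidate for "slow to compress, quicker to unpack", but the ranking depends on the input, so the repeated sample string is only a placeholder for an actual dump. PPM-family compressors often squeeze text further, but they decompress about as slowly as they compress and are not in the standard library, so they are left out. The function name `compare` is hypothetical.

```python
import bz2
import lzma
import time
import zlib


def compare(data: bytes) -> list:
    """Return (codec name, compressed size, decompression seconds) per codec."""
    codecs = {
        "zlib-9": (lambda d: zlib.compress(d, 9), zlib.decompress),
        "bz2-9": (lambda d: bz2.compress(d, 9), bz2.decompress),
        "xz-9e": (lambda d: lzma.compress(d, preset=9 | lzma.PRESET_EXTREME),
                  lzma.decompress),
    }
    results = []
    for name, (compress, decompress) in codecs.items():
        packed = compress(data)
        start = time.perf_counter()
        restored = decompress(packed)
        elapsed = time.perf_counter() - start
        if restored != data:  # sanity check that the round trip is lossless
            raise ValueError(f"{name} round trip failed")
        results.append((name, len(packed), elapsed))
    return results


if __name__ == "__main__":
    # Placeholder text; substitute the bytes of a real dump for meaningful numbers.
    text = ("The quick brown fox jumps over the lazy dog. " * 2000).encode()
    for name, size, secs in compare(text):
        print(f"{name:6} {size:8d} bytes  "
              f"decompress {secs * 1000:.1f} ms  ({size / len(text):.1%} of original)")
```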
- d.