Gregory Maxwell said:
"Is there a useful way to subset Wikipedia?" I believe that there are
many useful ways, but that all of the best ways require us to have some
idea of how notable each article is.
If you can define notability algorithmically, have at it! If not, hauling
perfectly verifiable, NPOV articles before VfD with the complaint "does not
establish notability" is one way of measuring it. However, you may not get
a particularly accurate answer, because the threat of deletion tends to
polarize opinions. A separate notability-rating project might be more
popular, as long as notability ratings are kept divorced from deletion
(otherwise the project would acquire a rather bad reputation).
We're more limited by distribution media: a single DVD gives us about 4.5 GB.
Of course we could span multiple discs, but then the user will likely have
to copy everything onto their computer, and the more discs we span, the
greater the cost of reproducing the material.
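To put numbers on the disc-spanning trade-off, here is a minimal sketch of the arithmetic. It assumes the nominal single-layer DVD capacity of 4.7 GB (decimal gigabytes, roughly 4.38 GiB, close to the "about 4.5 GB" figure above); the dump sizes in the demo loop are hypothetical, not measurements of any real Wikipedia dump.

```python
# Nominal capacity of a single-layer DVD, in bytes (assumption: the
# marketed 4.7 GB figure, counted in decimal gigabytes).
DVD5_BYTES = 4_700_000_000


def discs_needed(total_bytes: int, disc_bytes: int = DVD5_BYTES) -> int:
    """Return how many discs are needed to hold total_bytes of content."""
    if total_bytes <= 0:
        return 0
    # Ceiling division: a partly filled disc still costs a whole disc.
    return -(-total_bytes // disc_bytes)


if __name__ == "__main__":
    # Hypothetical dump sizes in GB, for illustration only.
    for size_gb in (3, 4, 12, 40):
        n = discs_needed(size_gb * 10**9)
        print(f"{size_gb:>3} GB -> {n} disc(s)")
```

Since reproduction cost grows roughly with the disc count, every time the subset shrinks below a multiple of the disc capacity, one fewer disc has to be pressed.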
I believe the DVD versions that have been mooted tend to keep only each
article's opening section and omit the detail. With categories, it's
possible to pick out the interesting subject areas, and there are various
other projects in progress to subset the data. They take a "build up"
approach rather than "start with the entire database and delete stuff
item by item".
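The "build up" idea can be sketched roughly as follows. This assumes that category membership and per-article sizes have already been extracted from a dump into plain dictionaries. The names `build_subset`, `category_members` and `article_sizes` are hypothetical and not part of any existing Wikipedia tool. The sketch starts from an empty selection and adds articles from the chosen categories until a size budget (for example, one DVD) is used up. It makes no attempt to rank articles by notability.

```python
from typing import Dict, Iterable, List, Tuple


def build_subset(
    chosen_categories: Iterable[str],
    category_members: Dict[str, List[str]],
    article_sizes: Dict[str, int],
    budget_bytes: int,
) -> Tuple[List[str], int]:
    """Build a subset up from chosen categories until the budget is spent.

    Categories are processed in the order given. An article listed in
    several categories is considered only once, and an article that would
    overflow the remaining budget is skipped so smaller ones can still fit.
    Returns the selected titles and the total bytes they use.
    """
    selected: List[str] = []
    seen = set()
    used = 0
    for category in chosen_categories:
        for title in category_members.get(category, []):
            if title in seen:
                continue  # already considered via another category
            seen.add(title)
            size = article_sizes[title]
            if used + size > budget_bytes:
                continue  # too big for what is left of the budget
            selected.append(title)
            used += size
    return selected, used


if __name__ == "__main__":
    # Toy data with made-up sizes, for illustration only.
    members = {
        "Physics": ["Isaac Newton", "Gravity", "Quark"],
        "Mathematicians": ["Isaac Newton", "Leonhard Euler"],
    }
    sizes = {"Isaac Newton": 40, "Gravity": 30, "Quark": 50, "Leonhard Euler": 20}
    print(build_subset(["Physics", "Mathematicians"], members, sizes, 100))
```

With the toy data, "Quark" is skipped because it would overflow the budget of 100, and "Isaac Newton" is counted only once. The result is `(['Isaac Newton', 'Gravity', 'Leonhard Euler'], 90)`. Keeping only each article's opening section, as the mooted DVD versions do, would simply mean feeding smaller sizes into the same loop.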