Andy Sisson wrote:
It occurred to me that the 500k+ English-language article topics would
by themselves be a fairly broad index of the vocabulary and zeitgeist
of the English-speaking world - which would, in turn, potentially be
quite handy as a word list for constructing US-style crosswords and the
like.
While I can presumably generate such a list from the backup SQL dumps,
that's obviously the ham-fisted, bandwidth-sucking way to do it,
particularly since I'd actually prefer a simple, big tar'd and gz'd
text file.
Lucky for you, we've got just such a file available:
http://download.wikimedia.org/wikipedia/en/all_titles_in_ns0.gz
A link probably should be added to the dumps download page.
-- brion vibber (brion @ pobox.com)
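
For the crossword use case described above, a minimal sketch of turning a downloaded copy of the titles file into a word list might look like the following. It is not part of the original exchange. It assumes the file holds one title per line, UTF-8 encoded, with underscores in place of spaces; that layout is an assumption, not something stated in the thread. The function names, the A-Z-only filter, and the 3 to 15 letter limits (15 being the width of a standard US daily grid) are illustrative choices.

```python
import gzip
import re

# Assumption: one title per line, UTF-8, underscores standing in for spaces.
LETTERS_ONLY = re.compile(r"[A-Z]+")
SEPARATORS = re.compile(r"[_\s\-]")


def crossword_entries(titles, min_len=3, max_len=15):
    """Return a sorted list of unique A-Z-only entries derived from titles.

    min_len and max_len are illustrative defaults, not values from the thread.
    """
    entries = set()
    for title in titles:
        # Remove underscores, whitespace, and hyphens, then uppercase.
        candidate = SEPARATORS.sub("", title).upper()
        # Keep only entries of usable length made purely of A-Z.
        if min_len <= len(candidate) <= max_len and LETTERS_ONLY.fullmatch(candidate):
            entries.add(candidate)
    return sorted(entries)


def read_titles(path):
    """Yield titles from a local gzip-compressed copy of the titles file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            yield line.rstrip("\n")
```

With the file saved locally, crossword_entries(read_titles("all_titles_in_ns0.gz")) would produce the list. Titles containing digits, punctuation, or accented letters are simply dropped here; a fuller version might transliterate them instead.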