Hi,
Here at CU we work with corpora of text to train models that 'understand'
language (see, e.g., LSA.colorado.edu). We wanted to use Wikipedia to
create a copyright-free corpus of text that anyone in the scientific
community could use. To do that we downloaded the DB dumps a while ago
(about 2 billion words), but we lost them because of a computer problem.
I have noticed that the link to the full English database (2280 MB):
http://download.wikipedia.org/archives/en/20031125_old_table.sql.bz2
no longer works; it returns a Forbidden error saying that
you don't have permission to access
/archives/en/20031125_old_table.sql.bz2 on this server.
Could you please grant us access to the file?
Thanks a lot in advance,
-Jose
--
Jose Quesada, PhD.
quesadaj(a)psych.colorado.edu Research associate
http://lsa.colorado.edu/~quesadaj Institute of Cognitive Science
University of Colorado (Boulder)
Muenzinger psychology building Phone:303 492 1522
office D447A Fax: 303 492 7177
Campus Box 344
University of Colorado at Boulder
Boulder, CO 80309-0344