Nick Jenkins wrote:
> 29,000 pages (113.361/sec), 29,000 revs (113.361/sec)
> ERROR 1062 at line 459: Duplicate entry '0-1_E0_m?' for key 1
Looking in the database now, I see three pages with similar titles in
that range:
  id     ns  title
  35982  0   1_E0_m
  36017  0   1_E0_m²
  36019  0   1_E0_m³
None of them should conflict, since they're quite distinct, which makes me
suspect garbled input or output, or a garbled index configuration in MySQL.
I can confirm that I can import the first 50k pages or so of this dump
without the reported problem occurring. I'll run the rest when it's done
downloading.
* Ubuntu Linux (Breezy Badger, x86)
* en_US.UTF-8 locale
* MySQL 4.0.24
* table definitions from MediaWiki 1.4.11
* mwdumper current CVS (shouldn't be any different in this regard from
the last uploaded snapshot)
* Sun J2SE 1.5.0_05-b05
On some quick testing it looks like there are some encoding problems if
UTF-8 isn't the locale charset; I'll try to get those worked out.
In the meantime, try setting LANG=en_US.UTF-8 and rerunning it.
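The '?' in the quoted error suggests one way the collision could arise. Suppose the output is written in a charset that can't represent ² and ³, for example US-ASCII, which a POSIX/C locale may hand to the JVM as its default. Then both characters may be replaced with '?', so 1_E0_m² and 1_E0_m³ turn into the same (namespace, title) key. Here is a minimal Java sketch of that assumption. The class and method names are hypothetical and not part of mwdumper:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TitleCollision {
    // Hypothetical helper: encode each title with the given charset and
    // decode it back, mimicking text that passes through a writer using it.
    static Set<String> roundTrip(List<String> titles, Charset cs) {
        Set<String> distinct = new LinkedHashSet<>();
        for (String t : titles) {
            // getBytes(Charset) substitutes the charset's replacement byte
            // for unmappable characters; for US-ASCII that is '?'.
            distinct.add(new String(t.getBytes(cs), cs));
        }
        return distinct;
    }

    public static void main(String[] args) {
        // The three titles from the page table above (² = U+00B2, ³ = U+00B3).
        List<String> titles = List.of("1_E0_m", "1_E0_m\u00B2", "1_E0_m\u00B3");
        System.out.println("platform default: " + Charset.defaultCharset());
        // Under US-ASCII, two titles collapse into "1_E0_m?".
        System.out.println("US-ASCII distinct titles: "
                + roundTrip(titles, StandardCharsets.US_ASCII));
        // Under UTF-8, all three titles stay distinct.
        System.out.println("UTF-8 distinct title count: "
                + roundTrip(titles, StandardCharsets.UTF_8).size());
    }
}
```

With US-ASCII only two distinct titles survive, which matches the '0-1_E0_m?' duplicate. With UTF-8 all three remain. Passing an explicit charset to the writer, rather than relying on the platform default, would remove the dependence on LANG. Note that JVMs from JDK 18 onward default to UTF-8 regardless of locale, while older ones such as the 1.5 JVM above follow the locale.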
-- brion vibber (brion @