Jeffrey V. Merkey wrote:
The default in 1.9.x is:
$wgLegalTitleChars =
" %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_`a-z~\\x80-\\xFF+";
This includes all multibyte characters due to the \x80-\xFF range near
the end, including your example "²". The value used on Wikimedia is
identical to the default except for the order of characters in the class:
$wgLegalTitleChars =
"+ %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_`a-z~\\x80-\\xFF";
Did you perhaps accidentally remove the \x80-\xFF range at some stage?
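
One way to rule the character class in or out without a multi-hour import is to test a title's bytes against it directly. The following is a minimal sketch, not MediaWiki code: it is written in Python rather than PHP, the names LEGAL_TITLE_CHARS, ILLEGAL and illegal_bytes are made up for illustration, and it assumes the class is applied to the raw UTF-8 bytes of the title.

```python
import re

# Byte-level transcription of the 1.9.x default quoted above, after PHP
# string unescaping. Assumption: matching is done on raw UTF-8 bytes.
LEGAL_TITLE_CHARS = rb""" %!"$&'()*,\-.\/0-9:;=?@A-Z\\^_`a-z~\x80-\xFF+"""

# Any single byte that falls outside the class.
ILLEGAL = re.compile(b"[^" + LEGAL_TITLE_CHARS + b"]")


def illegal_bytes(title):
    """Return the bytes of `title` (UTF-8 encoded) that are not in the class."""
    return ILLEGAL.findall(title.encode("utf-8"))


if __name__ == "__main__":
    for t in ["P²-irreducible", "Image:Zebra sideview.jpg", "A|B#C"]:
        print(repr(t), illegal_bytes(t))
```

An empty list means every byte of the title is inside the class. Both bytes of the UTF-8 encoding of "²" fall in the \x80-\xFF range, so that title should come back clean as long as the range is present.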
No, I did not remove it. I am re-running importDump with debug logging
enabled and a debugger attached. It appears the problem is more involved
than previously reported, which is why I delayed updating the fix on the
data dumps page on meta. I am re-running the program to debug further:
modifying the title chars fixed one title, only for another to crash
further down in the dump. It takes several hours to reach the point in
the dump where I am seeing the corruption and error. It should crash
again in another 30 minutes or so, so I can post-mortem it again.
Jeff
Confirmed the precise location of the failure. The number on the
left-hand side is the article number.
2698244:Dog adenovirus
2698245:David A. Caputo
2698246:Famous Detective Conan (Case Closed
2698247:William Hughes Mulligan
THIS TITLE PRODUCES THE IMPORT DUMP FAILURE.
2698248:Wikipedia:Articles for deletion/Wikipedia:Articles for deletion/Wikipedia:Articles for deletion/Wikipedia:Articles for deletion/Wikipedia:Articles for deletion/Wikipedia:Articles for deletion/Wikipedia:Articles for deletion/Greatest Hits Volume One (The Byrds)
2698249:Image:Tripitaka storage2.jpg
2698250:Natalie Golda
2698251:Image:Big Passage outside Ampleforth College Library.jpg
2698252:Canine infectious hepatitis
2698253:P²-irreducible
2698254:Image:Zebra sideview.jpg
2698255:Amy Freed
Jeff
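
A numbered listing like the one above can also be produced straight from the XML dump, without running the import, which makes it quicker to jump to a suspect page. This is only a sketch under assumptions: it is Python, not part of importDump; numbered_titles and SAMPLE are hypothetical names; and it assumes the article number is simply the 1-based position of each <page> element in the dump, which may not be how the numbers above were generated.

```python
import io
import xml.etree.ElementTree as ET


def numbered_titles(stream, start=1):
    """Yield (n, title) for each <page> element in a MediaWiki XML export."""
    n = start - 1
    title = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # strip the XML namespace, if any
        if tag == "title":
            title = elem.text or ""
        elif tag == "page":
            n += 1
            yield n, title
            title = None
            elem.clear()  # drop the page's children (revision text) to save memory


# Tiny in-memory stand-in for a real dump file (illustrative data only).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
<page><title>Canine infectious hepatitis</title></page>
<page><title>P²-irreducible</title></page>
</mediawiki>""".encode("utf-8")

if __name__ == "__main__":
    for n, t in numbered_titles(io.BytesIO(SAMPLE), start=2698252):
        print(f"{n}:{t}")
```

For a real dump, the stream would be the opened (and, if needed, decompressed) dump file instead of the in-memory sample.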