Sorry, now correctly cross-posted.
Emmanuel
-------- Original Message --------
Subject: WMF XML dump title case problem
Date: Sun, 26 Jun 2011 17:07:19 +0200
From: Emmanuel Engelhart <emmanuel(a)engelhart.org>
To: Mailing list for Wikimedia CH <wikimediach-l(a)lists.wikimedia.org>,
offline-l(a)lists.wikimedia.org
Hi
Titles should be stored in the "page" table with the first letter uppercased.
http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_%28technical_rest…
Unfortunately, it seems that we have XML dumps (and consequently
mwdumper-generated SQL) containing titles whose first letter is lowercase.
For example:
$ wget http://download.wikimedia.org/mywiktionary/20110617/mywiktionary-20110617-p…
$ bzip2 -d -c mywiktionary-20110617-pages-articles.xml.bz2 | grep "<title>" | grep tationery | more
<title>stationery</title>
<title>stationery shop</title>
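To list every affected title rather than grepping for one known word, something like the following might work. It is only a rough sketch: the function name lowercase_titles is made up for illustration, it only looks at ASCII letters a-z, and it assumes each <title> element sits on its own line, as in the output above.

```shell
# Hypothetical helper (name chosen for illustration): reads dump XML on
# stdin and prints the <title> lines whose first character is a
# lowercase ASCII letter. LC_ALL=C keeps [a-z] strictly ASCII, so titles
# starting with non-ASCII lowercase letters are not detected.
lowercase_titles() {
    LC_ALL=C grep '<title>[a-z]'
}
```

It could be used in place of the two grep calls, for example:
bzip2 -d -c mywiktionary-20110617-pages-articles.xml.bz2 | lowercase_titles | more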
Is that a bug?
Regards
Emmanuel