> > The 10,218,632 number includes redirects, if that helps. There must
> > be more than 2.6 million pages by now, as just articles accounts for
> > 2 million, so my guess is 5,654,236 is the number of non-redirect
> > pages.
>
> Oddly, with my (slightly-modified) mwdumper, I get the exact number of
> rows that I expect, with the latest enwiki input. It takes about 30
> minutes to import the whole thing from a cold "drop if exists" on the
> relevant tables and import it.
> Perhaps Matt is running out of some resource on his machine? MySQL
> limit? RAM limit? Something else?

The server has 8 GB of RAM, so I don't think that's the problem.
I was mistaken about the 10 million rows: that figure is for pages-meta-current.xml,
but I'm importing pages-articles.xml. That dump should produce 5,654,236 rows, which
matches the number mwdumper reports inserting.
But the MySQL text table only contains about 2.615 million rows. (The MD5 checksum of
the downloaded file is correct, so the file isn't corrupt.) I've repeated the import
several times, and the text table always ends up with 2.615 million rows.
Another issue I noticed is that the row counts (and index cardinalities) of the page and
revision tables change every time I look. The count goes up and down by thousands,
sometimes by more than 100,000, and in no consistent direction: it might drop one time
and rise the next. The row count of the text table stays constant. I can't think of any
reason for this. The table sizes don't seem to change, though: the page table is 581,696
KiB and the revision table is 1,046 MiB.
Also, if I browse to the end of each table in phpMyAdmin, both the page and revision tables
always show 2,614,000 rows in total, but SHOW TABLE STATUS often reports more rows than
that for these tables.
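If the page and revision tables use InnoDB (the post doesn't say which storage engine is in
use), the "Rows" value from SHOW TABLE STATUS, like TABLE_ROWS in information_schema.TABLES,
is only an estimate and can shift between calls, whereas SELECT COUNT(*) returns the exact
count. A minimal Python sketch for comparing the two, assuming a DB-API cursor from a MySQL
driver that uses the "%s" parameter style (PyMySQL, for example) and MediaWiki tables with
no name prefix. fetch_counts and compare are hypothetical helpers, not part of mwdumper or
MediaWiki:

```python
# Hypothetical diagnostic helpers (not part of mwdumper or MediaWiki).
# Assumptions: a DB-API 2.0 cursor from a MySQL driver using the "%s"
# paramstyle (e.g. PyMySQL), and MediaWiki tables without a name prefix.

TABLES = ("page", "revision", "text")


def fetch_counts(cursor, tables=TABLES):
    """Return (exact, estimated) row counts as two dicts keyed by table name."""
    exact, estimated = {}, {}
    for name in tables:
        # Identifiers can't be bound as query parameters, so only pass
        # trusted, fixed table names here.
        cursor.execute(f"SELECT COUNT(*) FROM `{name}`")
        exact[name] = cursor.fetchone()[0]
        # TABLE_ROWS corresponds to the "Rows" column of SHOW TABLE STATUS;
        # for InnoDB it is an estimate, not an exact count.
        cursor.execute(
            "SELECT TABLE_ROWS FROM information_schema.TABLES "
            "WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = %s",
            (name,),
        )
        row = cursor.fetchone()
        estimated[name] = row[0] if row else None
    return exact, estimated


def compare(exact, estimated):
    """Map each table to (exact, estimated, estimated - exact)."""
    report = {}
    for name, true_rows in exact.items():
        est = estimated.get(name)
        diff = None if est is None else est - true_rows
        report[name] = (true_rows, est, diff)
    return report
```

With PyMySQL, for example, passing conn.cursor() to fetch_counts and feeding the two
returned dicts to compare would show whether only the estimates drift while the exact
counts stay put. Be aware that COUNT(*) on an InnoDB table with millions of rows does a
full scan and can take a while.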
Does anyone know whether 2.615 million rows is the right number for enwiki, and why MySQL
would keep changing its mind about how many rows the page and revision tables have?
Here's my /etc/my.cnf file, in case it helps:
[mysqld]
set-variable = max_connections=1000
safe-show-database
log_slow_queries
long_query_time=5
max_allowed_packet=64M
ft_min_word_len=3
query_cache_limit=2M
query_cache_size=64M
default-collation=UTF8_general_ci
default-character-set=UTF8
Thanks.