I wrote earlier:
Okay, well, so that we have some quantified data and
can think make a firm
decision one way or the other about the suitability of a MyISAM table for
category intersections, I'll take a random sample of edits (of some
statistically significant size - doesn't take much) and see how many involve
a category change and report back.
So I took a sample of 50 randomly selected edits (looked at every nth edit on
recentchanges) and looked at the diff for that edit to see how many had category
changes. In 50 edits, 3 of them had category changes. This is 6% of edits
changing categories. The number of categories changed per edit doesn't matter,
as the row in the proposed table is per page, so there is only one update
operation per edit (to the proposed table) regardless of the number of
categories changed in that edit.
In a much less scientific sample, I looked at recentchanges to see how many
edits we're having per hour. From
http://stats.wikimedia.org/EN/TablesDatabaseEdits.htm I come up with an average
of 83 per minute, from June, but I'm sure it's gone up - taking a quick look at
recentchanges, I see about 300 in the past minute. Even at 300/minute, that's
5/second, and it's unlikely that more than one of those would write to the
fulltext indexed MyISAM categoryintersections table in that one second.
So, I think this is a pretty good solution. I think, if we're concerned about a
MyISAM table crashing, perhaps it makes send to have the data in it be redundant
(store the categories in the text table as well), so rebuilding it becomes a
more straightforward operation of copying selected rows to the innoDB table.
Thoughts?
Aerik