I stated:
No, it shouldn't be a problem in normal
circumstances.
The cached HTML of a linking page should only need to change if
the EXISTENCE of the page it links to changes. If an article is
linked to by thousands of pages, it almost certainly already exists,
so no cache (other than that of the edited article itself) would be invalidated.
Neil Harris said:
By "hitting" in this case, I meant
"deleting or creating".
Sorry for the ambiguity.
Take a look at something like [[census]], which is linked to by about
36,000 articles.
It sounds like we're in agreement. [[census]] already
exists, and LOTS of articles link to it. If someone completely
DELETED [[census]] (rather than just editing it), then
that would indeed invalidate a lot of caches. But that should
be a warning signal - perhaps [[census]] should be edited, but
it should probably still exist as an article (and NOT be deleted).
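To make the rule concrete, here is a minimal sketch in Python of
invalidation driven only by existence changes. The link table, cache
layout, and function names are invented for illustration - this is not
MediaWiki's actual schema:

# Purge cached HTML only when a page is created or deleted, because
# only then do links pointing at it render differently (red vs. blue
# link). An ordinary edit touches one cache entry.
# Table, paths, and names are hypothetical; `db` is any DB-API
# connection that accepts '?' placeholders (e.g. sqlite, standing in
# for MySQL).
import os

CACHE_DIR = "/var/cache/wiki-html"   # assumed cache location

def purge_cached_html(page_id):
    """Drop the stale cached rendering of one page, if present."""
    path = os.path.join(CACHE_DIR, "%d.html" % page_id)
    if os.path.exists(path):
        os.remove(path)

def on_page_created_or_deleted(db, title):
    """Every article linking to `title` now renders that link
    differently, so all of their cached pages must be purged."""
    rows = db.execute(
        "SELECT from_page_id FROM pagelinks WHERE target_title = ?",
        (title,))
    for (page_id,) in rows:
        purge_cached_html(page_id)

def on_page_edited(page_id):
    """An ordinary edit invalidates only the edited page itself."""
    purge_cached_html(page_id)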
... but there are lots of
data integrity and race condition / transaction issues to be thought
about before any of this can be implemented.
Yes, any time there are multiple front-ends, there is the
potential for race conditions. I heartily agree.
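One classic race here is two front-ends (or a reader and a writer)
touching the same cache file at once. A common mitigation, sketched
below under the same hypothetical cache layout, is to write each cache
file atomically:

# Write cache files via a temp file plus rename, so a concurrent
# reader on another front-end sees either the old rendering or the
# new one, never a half-written file. Paths are illustrative.
import os
import tempfile

CACHE_DIR = "/var/cache/wiki-html"

def write_cached_html(page_id, html):
    fd, tmp_path = tempfile.mkstemp(dir=CACHE_DIR)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(html)
        # rename within one filesystem is atomic on POSIX
        os.replace(tmp_path, os.path.join(CACHE_DIR, "%d.html" % page_id))
    except BaseException:
        os.unlink(tmp_path)
        raise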
Let's finish splitting the
system into two machines, DB and WWW, before any re-architecture is
performed.
Fair enough.
However, if all wikitext is removed from MySQL
and placed in the filesystem, splitting the MySQL database out onto
a separate machine may not buy much. Most of the work would then
be done by the filesystem, with only housekeeping metainformation
being accessed through MySQL, and only when editing or accessing
special pages.
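To spell out the division of labour being described, here is a rough
sketch with an invented schema and paths, and sqlite standing in for
MySQL:

# Article text lives on the filesystem; the database holds only the
# housekeeping pointer and metadata, touched on edits and special
# pages. Schema, paths, and the content-hash layout are hypothetical.
import hashlib
import os
import sqlite3

TEXT_DIR = "/var/lib/wiki-text"

def save_revision(db, title, wikitext):
    # Store the text itself as a flat file, addressed by content hash.
    data = wikitext.encode("utf-8")
    digest = hashlib.sha1(data).hexdigest()
    path = os.path.join(TEXT_DIR, digest[:2], digest)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    # Record only the pointer plus metadata in the database.
    db.execute(
        "INSERT INTO revision (title, text_path, byte_len) VALUES (?, ?, ?)",
        (title, path, len(data)))
    db.commit()

def load_latest_text(db, title):
    # Serving a page needs the database only to find the path;
    # the bulk of the I/O is plain filesystem reads.
    (path,) = db.execute(
        "SELECT text_path FROM revision WHERE title = ? "
        "ORDER BY rowid DESC LIMIT 1", (title,)).fetchone()
    with open(path, "rb") as f:
        return f.read().decode("utf-8")

if __name__ == "__main__":
    db = sqlite3.connect("wiki-meta.db")
    db.execute("CREATE TABLE IF NOT EXISTS revision "
               "(title TEXT, text_path TEXT, byte_len INTEGER)")
    save_revision(db, "Census", "A [[census]] is a count of a population.")
    print(load_latest_text(db, "Census"))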
Unfortunately, we can't really separate processor architecture from
software architecture - the goal is to maximize performance while
minimizing hardware & development-time cost. Hopefully, this
kind of free-flowing discussion of alternatives will yield
that perfect combination - or at least a good one.