https://bugzilla.wikimedia.org/show_bug.cgi?id=46014
The common thread here seems to be that sometimes when a vandal edit is
reverted within seconds, sometimes the page content from the vandal edit is
being used while oldids in the skin data (JS, print footer, etc) indicate
the revert.
This doesn't seem to have anything to do with missed cache invalidation via
the API:
* A UI edit calls EditPage::attemptSave, which calls
EditPage::internalAttemptSave and then processes the status without doing
any additional cache checks. EditPage::internalAttemptSave itself calls
WikiPage::doEditContent, then just checks the return value and doesn't do
any additional cache checks.
* An API edit calls EditPage::internalAttemptSave and then processes the
status without doing any additional cache checks.
* An API rollback calls WikiPage::doRollback, which calls
WikiPage::commitRollback, which calls WikiPage::doEditContent and doesn't
do any other cache checks.
The fact that the page content is from an old revision but the skin data is
new makes me suspect a parser cache issue. I suspect Gerrit change 85917
probably improved the situation but didn't completely eliminate it, and
with the number of reversions ClueBot does it probably manages to hit some
race occasionally.
Looking at the parser cache handling:
* WikiPage::doEditContent updates page_touched, then calls
WikiPage::doEditUpdates which saves the just-parsed revision to the parser
cache.
* PoolWorkArticleView also saves its parsed data into the parser cache.
* ApiPurge with forcelinksupdate saves its parsed data into the parser
cache.
* RefreshLinksJob saves its parsed data into the parser cache, if it took
over 1 second to parse.
I'm suspecting there's a race somehow where the vandal saves their edit,
then gets redirected back to action=view on the article, and makes it into
the branch of Article::view that uses PoolWorkArticleView to reparse the
text. Article pre-fetches the content to be parsed and passes it into
PoolWorkArticleView. Then before the PoolWorkArticleView gets a chance to
run, ClueBot comes along and reverts, updating the parser cache in the
process. Then the PoolWorkArticleView gets to run, re-parses the vandal's
content, and then saves that into the parser cache replacing the newer
version.
I'm not sure how to test this theory, though.
If this really is what's happening, it might be enough to just have
PoolWorkArticleView store a timestamp in __construct (maybe only when
$content is passed) instead of calculating the timestamp in doWork. Or we
could look at storing the revid in the parser cache (fetch it from the
$page passed to ParserCache::save if it's not explicitly provided, for BC)
and consider the parser cache entry stale if the stored revid doesn't match
the current revid, just like we do with the page_touched timestamp now.
--
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation