Aryeh Gregor wrote:
The logs are taken from the Squids, long before MediaWiki touches
them, so they shouldn't be normalized at all.
Search isn't cached, so it may be easier to just log it at the backend.
I expect many people are typing queries like "please tell me how many
people live in China", judging by the pages being created with such
titles.
My conclusion is that some people (10%?) don't know how to search in an
encyclopedia. I mean, we have an article called [[China]] with a proper
Population section...
While reading this thread I have deleted a page called "Why do ghosts
manifest themselves?" with content
"fogcpijkñldjlkcmvlkmc.,vmblcjgmlkjglkjmf,.mfdgfdolfgdjk" [1].
I'm thinking of an extension that could be fed regexes to extract the
actual title they may be looking for.
Sampled search logs are unlikely to reveal those patterns, though, since
what people repeat is the non-keywords, not the full query.
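
To make the regex idea concrete, here is a rough sketch in Python (a
real extension would be written in PHP against MediaWiki's hooks, which
I am not showing here). The pattern list and the extract_title() helper
are made up for illustration; the patterns would have to be collected by
hand from the bogus titles people create.

```python
import re

# Hypothetical patterns: each one matches the question-style "non-keywords"
# and captures whatever is left over as the candidate article title.
PATTERNS = [
    re.compile(
        r"^(?:please\s+)?tell me how many people live in\s+(?P<title>.+?)\s*\??$",
        re.IGNORECASE,
    ),
    re.compile(
        r"^(?:what|who)\s+(?:is|are|was|were)\s+(?:(?:a|an|the)\s+)?(?P<title>.+?)\s*\??$",
        re.IGNORECASE,
    ),
]


def extract_title(query):
    """Return the candidate title hidden in a natural-language query,
    or None if no pattern matches (the query is then left untouched)."""
    cleaned = query.strip()
    for pattern in PATTERNS:
        match = pattern.match(cleaned)
        if match:
            return match.group("title")
    return None


print(extract_title("please tell me how many people live in China"))
```

With these two patterns the query above should come out as "China", and
a plain query such as "China" matches nothing, so it would go to the
normal search unchanged.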
[1] http://es.wikipedia.org/w/index.php?title=Special:Log&page=Por_que_se…