Ian Gilfiallan wrote:
> - it's not the volume of queries; it's often a particular query that
> starts things backing up. So look at the first query in the log, and the
> first queries after a period of running OK. Two that I saw this way are:
>
> SELECT cur_id,cur_namespace,cur_title,cur_text
> FROM cur,searchindex
> WHERE cur_id=si_page AND (MATCH (si_text) AGAINST ('math characters') AND
> (cur_is_redirect=0) ) AND (cur_namespace=0) LIMIT 0, 20;
This query looks extremely suspect to me because it seems to combine the worst
of two worlds: the locking overhead of InnoDB tables (for table 'cur') and
large locking granularity of MyISAM tables (for table 'searchindex', which
needs to be updated every time an article is saved). As I said before, I
really don't think we should be using InnoDB tables, but perhaps Ian has some
good arguments why we should.
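If the engine mix really is the culprit, one way to test it (a rough
sketch only, not a drop-in replacement) would be to split the join so the
MyISAM read lock on searchindex is held only for the short fulltext pass:

SELECT si_page
FROM searchindex
WHERE MATCH (si_text) AGAINST ('math characters')
LIMIT 0, 20;

-- then fetch the matching articles from cur by id; note that filtering
-- on redirect/namespace after the LIMIT changes the semantics slightly,
-- so in practice one would over-fetch from the first query:
SELECT cur_id,cur_namespace,cur_title,cur_text
FROM cur
WHERE cur_id IN ( ...ids from above... )
AND cur_is_redirect=0 AND cur_namespace=0;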
-- Jan Hidders
I just made one more tweak to the random page selection; I had been
getting duplicates more frequently than I would expect. The new
cur_random>RAND() condition below should spread out the results better:
$sqlget = "SELECT cur_id,cur_title
FROM cur USE INDEX (cur_random)
WHERE cur_namespace=0 AND cur_is_redirect=0
AND cur_random>RAND()
ORDER BY cur_random
LIMIT 1";
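One caveat: if RAND() draws a value higher than every cur_random in the
table, the query above comes back empty, so the code needs a fallback;
something like this (variable name illustrative), i.e. the same query
minus the new condition:

$sqlfallback = "SELECT cur_id,cur_title
FROM cur USE INDEX (cur_random)
WHERE cur_namespace=0 AND cur_is_redirect=0
ORDER BY cur_random
LIMIT 1";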
Please check it experimentally; currently on test and English.
-- brion vibber (brion @ pobox.com)
Allowing sysops to generally ban signed in users would reverse our
existing policy that this can only be done with Jimbo's blessing. But we
don't have to go that far. It would be relatively simple to do the
following:
1) Rename the "Block IP" special page to "Block user"
2) Add an additional text field on that page to ban a signed in user
3) If you try to ban a signed in user, the CUR and OLD tables are queried
to check whether that user has made any edits older than 14 days (see the
sketch after this list)
4) If this is the case, the user is not banned and a warning message is
displayed. Otherwise, the user is banned.
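For step 3, the check could be a pair of simple queries along these lines
(a sketch; I'm assuming the user name lives in cur_user_text/old_user_text
and that the char(14) timestamps compare correctly as strings):

SELECT COUNT(*) FROM cur
WHERE cur_user_text='VandalName'
AND cur_timestamp < DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 14 DAY),
                                '%Y%m%d%H%i%S');

SELECT COUNT(*) FROM old
WHERE old_user_text='VandalName'
AND old_timestamp < DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 14 DAY),
                                '%Y%m%d%H%i%S');

If either count is nonzero, the user is immune and we show the warning
instead of banning.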
By doing this, we raise the trust level required to be "immune" from sysop
bans, while making it possible to quickly block persistent vandals.
As an additional refinement, we could add a past_ban flag to the user
table. If a user has been banned in the past, he can be banned again by
any sysop.
I don't know if we can retrieve the IP address from our tables - I don't
see how since we do not store it in CUR. But if we can, we could also
provide sysops with the IP address of the blocked user, after they have
successfully blocked them.
Regards,
Erik
I have a trick for combining the history of badly moved pages,
which I've applied successfully a few times over the past month.
When I tried it just now, it failed miserably.
I will describe my actions:
First, I checked that the histories of [[Pennsylvania Dutch]]
and [[Pennsylvania German]] didn't overlap, except for redirecting.
Then I deleted [[Pennsylvania Dutch]] (the current redirect).
Then I moved [[Pennsylvania German]] to [[Pennsylvania Dutch]].
Then I restored the deleted revisions of [[Pennsylvania Dutch]].
Et voilà! The entire history is combined at [[Pennsylvania Dutch]]
(except for the unavoidable intrusion of redirects at the moving point),
ready to be moved back to [[Pennsylvania German]].
Well, that's how it had always worked before.
But in fact, the deleted history of [[Pennsylvania Dutch]] was *not* restored!
I restored it again, to see if that would help, but it didn't.
So I sadly moved it back to [[Pennsylvania German]],
and now there seems to be no way to get the history of [[Pennsylvania Dutch]].
Where did it go? History shouldn't be permanently deletable
with our nice soft delete. I suspect a database bug, but I don't know how.
Further hints of a database bug:
Thinking that I might have misspelled "Pennsylvania" at some point,
I did a search on the term "Dutch".
Although I got several redirects in the page title matches
and these matches easily fit onto a single page of results,
[[Pennsylvania Dutch]] never appeared on the list.
Yet the page undoubtedly exists, historyless though it may be.
So there is a bug somewhere that lost track of [[Pennsylvania Dutch]];
and there is some history that was truly deleted forever,
but which we might be able to dredge from the database (or yesterday's log),
and which it would be good to restore to its proper place
in the history of [[Pennsylvania German]].
-- Toby Bartels
The English-language Wikipedia will be in _read-only mode_ for a few
hours tonight (Feb 9->Feb 10) while the database structure is altered to
eliminate some of the remaining performance drags before the heavy
weekday traffic gets into full swing.
This will also fix the out-of-order page history bug, which was caused
by a temporary workaround to the slow history problem.
The downtime window will cover roughly 04:00-08:00 UTC (currently the
same as UK time). For us
USians, that's 8pm-midnight PST / 11pm-3am EST. Australians & others,
please check http://www.wikipedia.org/wiki/Time_zone for the appropriate
offset to your local time. (I'm not certain of the end time; it may run
shorter, and hopefully won't run longer. It's a big database, and mass
modifications take a *long* time.)
During that time the wiki should remain accessible for reading, but no
one will be able to log in or make edits.
I'll also be upgrading various other languages to the current software
revision; those should go relatively quickly, and will be staggered.
Generally, meta, wiktionary, and Wikipedia's other languages should
remain accessible for both reading and editing through the night except
for some brief read-only outages.
-- brion vibber (brion @ pobox.com)
ps Tech note: I figure we should hold off on the mysql 4 upgrade until
they can release a version that isn't followed within a day by a dire
warning about a new bug that directly affects a type of query that we
use a lot for Wikipedia. :)
In the meantime, I'm adding the inverse timestamp column, which mysql 3
can sort using the index (thus speeding up, and fixing bugs in, history,
contribs, & recentchangeslinked), and a random index column to replace
the separate random table and its insanely slow refilling operation.
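The trick, roughly (a sketch; the exact column name may differ from what
goes live): mysql 3 can't use an index for a descending sort, so we store
the complement of each timestamp and sort ascending on that instead:

-- fill the new column once (timestamps are strings like YYYYMMDDHHMMSS):
UPDATE old SET inverse_timestamp = 99999999999999 - old_timestamp;

-- a newest-first page history becomes an ascending, index-friendly sort:
SELECT old_timestamp FROM old
WHERE old_namespace=0 AND old_title='Some_page'
ORDER BY inverse_timestamp
LIMIT 50;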
I've overwritten the experimental stuff in test.wikipedia.org with the
current work code (including the new timestamp & random code which will
be going live in a few hours -- test fast! :)
Copies of the recent state should be in the 'backup2' subdirectory.
-- brion vibber (brion @ pobox.com)
Question: would anyone else like to receive automatic e-mail
notification of bug reports and feature requests filed at the tracker on
our SourceForge project?
If so, I can set up an alias address which will reflect to interested
parties and tell SF to send notices there instead of to me directly.
(Yes, people *do* file bug reports at the tracker.)
-- brion vibber (brion @ pobox.com)
I'd like to set up a read-only mirror of wikipedia on ross.bomis.com,
and point a URL at it. Possibly we can think of a good automatic
redirection scheme or something. Well, I say "I'd like to...", but
actually that means "I'm willing to pay Jason to..."
Jason posted the other day about a zero budget, but that's not
*exactly* true. :-)
We have some loose hardware sitting around that we might as well press
into service.
Ideally, we would have a way to _automatically_ redirect requests to
the read-only site when the full site is dead.
If throwing hardware at the problem is likely to help, I'll do it.
--Jimbo
Earlier today I noticed that it was taking quite a long time to refill
the random page queue; this currently works by:
INSERT
INTO random(ra_current,ra_title)
SELECT 0,cur_title
FROM cur
WHERE cur_namespace=0 AND cur_is_redirect=0
ORDER BY RAND()
LIMIT 1000
Is it actually trying to reorder all 100,000+ entries randomly and then
taking the first 1000, or does it just _seem_ that slow? And is there
actually a faster way of doing this?
Off the top of my head: what if the random queue listed *every* page;
each page would be queued when it was created (and removed if deleted or
changed to non-"article" state), and associated with a random index
number. When asked to view a random page, we sort on the random index
column (which would be indexed!) and take the lowest number; then assign
that same article a new random index.
This wouldn't require the occasional delay to refill the queue, and
since the random index would be indexed, retrieval should be quick
enough even with a large number of articles.
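In SQL terms, serving a random page would then look something like this
(a sketch, with 'cur_random' as a stand-in column name):

-- grab the lowest random index; with the column indexed this is cheap:
SELECT cur_id,cur_title FROM cur
WHERE cur_namespace=0 AND cur_is_redirect=0
ORDER BY cur_random
LIMIT 1;

-- then put the served article back in the pool with a fresh number:
UPDATE cur SET cur_random=RAND() WHERE cur_id=<id just served>;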
Thoughts?
-- brion vibber (brion @ pobox.com)