--- Anthere <anthere9(a)yahoo.com> wrote:
> I think the board should do a budget/financial meeting soon. The last one was
> done a quarter ago, so it is high time.
Yep it is.
> It seems to me we probably diverged considerably from our first-quarter
> budget plan (but I may be wrong :-)). If so, I presume it will be due to the
> various hosting offers we got... In any case, I think it would be most
> informative to see where we are going...
I was planning on just asking the board to extend last quarter's budget to also
cover this quarter, since that is what is actually happening. I had hoped that
we would *finally* be able to buy much more than we needed and thus get ahead
of traffic increases but that did not seem to happen. A supplemental budget
could be added along with a supplemental fund drive if needed.
> Mav
> I have been wondering in the past few days if you could provide us an update
> on our financial situation? Both a general report (a quick update on values) and
> later one of your nice quarterly reports on the foundation website would be
> most welcome... I suppose you will answer that you have not received bank
> statements yet to do so... but I guess Terry could help there more than he
> could before.
You suppose right. :) The problem has been on my end (my fax machine was not
working). As of this week it should be working again, so I'll remind Terry about
that.
> Question for developers:
> I'd like us to be informed, on a non-technical level, of the future technical
> requirements to expect, and possibly to be reminded how much we can expect
> from partners on this. I.e., how much we plan to spend in the future on
> servers, hosting and so on. I am sure you will answer that it is tough to say, I
> know :-)
Yes - this is very, very much needed. I asked for a needs/wants/wish list for
the last budget but that never materialized. Thus we brought in a whole bunch
of money with no concrete plan on how to spend it. That is not something I want
to repeat.
> The general idea would be to try to evaluate how long the money we have in
> the bank will last... and if we can envision spending some of it in more
> creative ways than servers :-)
> Opinion on this topic ?
We need some numbers on that first.
> PS: I must mention that we are currently stuck in a rather ridiculous
> situation. The French squids need some RAM... this RAM can only be purchased
> from a few websites... which refuse American credit cards... so we are
> currently trying to have it bought by the German association... which legally
> is not really authorized to do this... so either the French association
> quickly gets cash to pay for such expenses, or the foundation should have a
> credit card in Europe to take care of such matters... opinion?
We could wire some money to the French chapter's bank account.
-- mav
We now have an IRC server set up, for the sole purpose of delivering
recent changes information. All discussion channels will stay at Freenode.
The problem with using Freenode was that it was hard to maintain, both
for us and for them. We had to use 10 different connections to avoid
flooding off, and if they all connected at once, the server would be
K-lined. Despite the best efforts of the Freenode staff, the exemptions
they set up were not robust, especially when the configuration changed
on either end.
With our own IRC server, we can have a single connection for all
channels. The bot is exempted from flood controls. This makes it much
easier to maintain. We can also create as many channels as we like --
one for every wiki.
We don't want to have to administer a general-purpose IRC server. That's
something Freenode does a good job of, and we don't want to duplicate it. So
we've patched ircd such that non-opers can't send messages to channels.
Keeping our discussion channels on another network is also useful for
coordination in the event of failure on our network.
The server hostname is irc.wikimedia.org, that's 207.142.131.229 for
those people still having DNS cache problems.
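For anyone who wants to consume the feed, here is a minimal reader sketch
in PHP; the channel name (#en.wikipedia) and the nick are assumptions for
illustration, not confirmed details of the new server.

<?php
// Minimal sketch of a recent-changes feed reader. The channel name and
// nick below are illustrative assumptions.
$sock = fsockopen( 'irc.wikimedia.org', 6667, $errno, $errstr, 30 );
if ( !$sock ) {
	die( "Connection failed: $errstr ($errno)\n" );
}
fwrite( $sock, "NICK rc-reader-demo\r\n" );
fwrite( $sock, "USER rc 0 * :RC feed reader\r\n" );
while ( !feof( $sock ) ) {
	$line = rtrim( fgets( $sock, 1024 ) );
	if ( substr( $line, 0, 4 ) == 'PING' ) {
		// Answer server pings so we aren't disconnected
		fwrite( $sock, 'PONG' . substr( $line, 4 ) . "\r\n" );
	} elseif ( strpos( $line, ' 001 ' ) !== false ) {
		// Registration complete; join the per-wiki channel
		fwrite( $sock, "JOIN #en.wikipedia\r\n" );
	} elseif ( strpos( $line, 'PRIVMSG' ) !== false ) {
		echo $line . "\n"; // one recent-changes entry per message
	}
}
?>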
-- Tim Starling
This morning a bug was filed about some pages showing timestamps in the
year 2025:
http://bugzilla.wikimedia.org/show_bug.cgi?id=2138
By the time I was awake and looked at them, there was no trace:
everything said 2005 as expected.
A few minutes ago somebody piped up on IRC about another one on
fr.wikinews, which I was able to confirm. I fixed that one item, and
recorded this fact in the server admin log
(http://wp.wikidev.net/Server_admin_log#11_May)
There are basically two ways this could happen: either some machine had
its clock incorrectly set to 2025 instead of 2005, and was subsequently
corrected, or there's a horrible mysterious bug that corrupts one
particular digit of the timestamp sometimes.
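One way to check for the second case is to count any surviving
future-dated rows. A sketch in PHP, with table and column names assumed
from the 1.4 schema and placeholder connection details:

<?php
// Sketch: count surviving future-dated rows in one wiki's database.
// Table/column names assume the 1.4 schema; host, user, password and
// database name are placeholders.
$db = mysql_connect( 'db-host', 'user', 'password' );
mysql_select_db( 'enwiki', $db );
$checks = array(
	'cur'           => 'cur_timestamp',
	'old'           => 'old_timestamp',
	'recentchanges' => 'rc_timestamp',
);
foreach ( $checks as $table => $field ) {
	// MediaWiki timestamps are 14-character YYYYMMDDHHMMSS strings
	$res = mysql_query(
		"SELECT COUNT(*) FROM $table WHERE $field LIKE '2025%'", $db );
	$row = mysql_fetch_row( $res );
	echo "$table: {$row[0]} rows dated 2025\n";
}
?>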
I can't find any trace of a machine with its clock incorrectly set now.
A few of the earlier machines (srv2-srv4) had their hardware clocks set
in the year 2003, but the live system clocks showed the correct current
time. (I've synchronized the hardware clocks in the hope that they will boot
up right next time, too.) If there was a machine with its hardware clock
set to 2025, it's either not listed in the Apache node group -- in which
case we have a rogue machine we can't account for and can't control
through the regular means -- or it was corrected and nobody recorded
this fact.
If we had a machine with the correct hardware clock time that somehow got
the wrong system time temporarily, well, who knows...
Does anybody know anything about this? Did anybody set, fix, or change a
clock? Did somebody correct the reported database entries without
telling anybody about it? If you did it, please give a whistle so I'll
be able to sleep at night again. :)
-- brion vibber (brion @ pobox.com)
The attached patch (6KB, unified diff) against CategoryPage.php in
MediaWiki 1.4.4 changes the way headings are inserted into category
pages.
Previously, each letter would get its own heading. This was a pet
peeve of mine, as on some pages you'd get several headings with only
one item:
A
* Apple
B
* Banana
F
* Food
* Fruit
M
* Melon
O
* Orange
T
* Tangerine
The patch combines consecutive sections with only one item in them,
and rewrites the heading to the form "A-B". So with the patch
applied, the above would be more like:
A-B
* Apple
* Banana
F
* Food
* Fruit
M-T
* Melon
* Orange
* Tangerine
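The merging logic itself is small. Here is a standalone sketch of the
idea (not the actual patch, which works inside CategoryPage.php); it
only handles ASCII first letters, for brevity:

<?php
// Sketch: group sorted titles by first letter, then fold runs of
// consecutive one-item sections into a single "A-B" range heading.
function flushRun( &$merged, &$letters, &$items ) {
	if ( !count( $letters ) ) {
		return;
	}
	// A run of two or more one-item sections gets a range heading
	$heading = count( $letters ) > 1
		? $letters[0] . '-' . $letters[count( $letters ) - 1]
		: $letters[0];
	$merged[$heading] = $items;
	$letters = array();
	$items = array();
}

function mergeSections( $sortedTitles ) {
	$sections = array();
	foreach ( $sortedTitles as $t ) {
		$sections[strtoupper( substr( $t, 0, 1 ) )][] = $t;
	}
	$merged = array();
	$letters = array();
	$items = array();
	foreach ( $sections as $letter => $group ) {
		if ( count( $group ) == 1 ) {
			$letters[] = $letter;
			$items[] = $group[0];
		} else {
			flushRun( $merged, $letters, $items );
			$merged[$letter] = $group;
		}
	}
	flushRun( $merged, $letters, $items );
	return $merged;
}

// Prints the A-B / F / M-T grouping from the example above
print_r( mergeSections( array( 'Apple', 'Banana', 'Food', 'Fruit',
	'Melon', 'Orange', 'Tangerine' ) ) );
?>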
The patch also changes how pages are inserted into each column.
Previously, categories with six pages or fewer were given one column,
while those with more than six had three columns; pages were then
divided evenly between the three columns. With this change, all
categories (notionally) have three columns, but each column is
guaranteed at least six pages before the next column gets any (a sketch
of that rule follows). The columns are also represented as fixed-width
<div>s instead of tables; this could be trivially changed.
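A sketch of that column rule (again illustrative, not the patch itself):

<?php
// Sketch of the column rule: at most three columns, and a column must
// reach six pages before the next column gets any.
function splitIntoColumns( $pages, $numCols = 3, $minPerCol = 6 ) {
	$perCol = max( $minPerCol, (int)ceil( count( $pages ) / $numCols ) );
	return array_chunk( $pages, $perCol );
}

// 7 pages -> first column gets 6, second gets 1, third stays empty
print_r( splitIntoColumns( range( 1, 7 ) ) );
?>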
The patch is 6KB, probably mostly because I'm not very familiar with
PHP; it would likely need to be made more idiomatic before it could
be accepted. It's in use on one of my wikis if you want to see the
result:
<http://mt.brentdax.com/wiki/Special:Categories>
--
Brent 'Dax' Royal-Gordon <brent(a)brentdax.com>
Perl and Parrot hacker
I'd like to open a discussion, very practical, about our medium-term
hosting strategy. If you happen to be privy to any "inside information"
(there isn't that much) then please be sure to anonymize the names of
organizations that might be involved, since this is a public list. :-)
What I need to do is prepare an internal document for important
interested parties within the community (for example, the leaders of the
German Verein, who have money to spend on their charitable goals;
servers qualify for this), and I want to make sure that my overall plans
are sensible from a tech point of view.
Additionally, I am in talks with a great many organizations who are
offering free hosting at various levels, and I need to know what makes
sense for us in terms of how small or large each datacenter should be to
meet our needs sensibly.
While I of course welcome "big picture visionary" statements from people
who don't know a lot about our actual network and challenges (we can
learn from that), I'm mostly interested in extremely practical *here and
now* feedback. The people who are actually involved in the day to day
running of the site have a better feel for what makes sense than anyone
else -- it is their advice which will carry the day.
Here is the current situation --
Florida - 44-ish servers now, 20 more apache/squid machines to be
installed within the next two weeks (10 have already arrived and Chad will
install them very soon). Two more database servers have been ordered, as well as
a JBOD-thingy.
Paris - 3 squids, soon to be 6
Yahoo -- datacenter in South Korea, I was just contacted by the tech
person and he seeks guidance from us on the exact parameters. We're
looking at 20U of apache/squid and 6U of db servers there -- this can
probably be online in one or two months, maybe faster if we move fast
and they move fast. :-)
Belnet/Belgium -- 1 rack of space, unlimited bandwidth; they are ready
to go Monday, and they can do full hands-on work, etc., including replacing
broken hard drives and so on. They are excited to move
forward quickly. In this case, we must supply the hardware. We can
either buy hardware (with the German money?) or I can ask someone to buy
it for us (see Big Company X, below).
Amsterdam - a large NGO wants to do a big press announcement when I'm
there in Holland at the end of this month. They are providing a set of
servers which have already been ordered. I do not know the exact
specifications, perhaps someone else can tell me?
Big Company X - this company is prepared to make a very major commitment
to hosting for us. Originally there were discussions of us being hosted
in their facility, but now we are seeking bids on outsourcing this. The
exact parameters are at this time very open ended, and I have told them
that basically we need time to figure out what we need. They assure me
there is no time pressure and they don't care about PR or anything else
-- it's a pure charitable donation, no strings attached. But I still
prefer to try to move quickly to take advantage of the offer, just in case the
corporate mood changes (you never know with big companies!). The
parameters being discussed would be in the range of 2 full racks of
servers, essentially a full replication of our Florida data center as of
a couple of months ago.
They are flexible on the amounts -- it is essentially whatever we need
and can honestly justify. They are tech savvy and fully agree with my
view that just throwing random money and servers at us is not the best
use of our resources -- rather, they prefer that we ask for what we can
really use in a way that optimizes the use of their money -- best
strategy for everyone, obviously.
SOOOOOOOOO......
Feedback?
--Jimbo
Hi.
I'd like to know whether it is possible to run Wikipedia with MySQL 4.1.
Before installing the whole environment with MySQL 4.0.20a, I tried to set
it up with MySQL 4.1, and I remember having problems installing MediaWiki,
which did not accept this version of MySQL.
However, my research project requires a lot of queries that contain
subqueries, and it is almost impossible to rewrite them all as JOINs (one of
the reasons is that I use not only SELECT queries but also UPDATE and
DELETE). Using MySQL 4.1 would help me considerably.
If it is possible to use MySQL 4.1, is there any way to upgrade the
version of MySQL while keeping my current Wikipedia database?
Thank you.
Kevin Carillo
Today's Topics:
1. Re: Parser (was Re: Longterm hosting strategy) (Tim Starling)
2. Re: Parser (was Re: Longterm hosting strategy)
(Lee Daniel Crocker)
3. Re: Longterm software strategy (Tim Starling)
4. Re: Parser (was Re: Longterm hosting strategy)
(David A. Desrosiers)
5. Re: Parser (was Re: Longterm hosting strategy)
(Ævar Arnfjörð Bjarmason)
6. New machines installed, killed in record 9.5 hours (Brion Vibber)
7. Re: Longterm software strategy (Brion Vibber)
8. Link table updates (was Re: Longterm software strategy)
(Tim Starling)
----------------------------------------------------------------------
Message: 1
Date: Tue, 10 May 2005 13:55:48 +1000
From: Tim Starling <t.starling(a)physics.unimelb.edu.au>
Subject: [Wikitech-l] Re: Parser (was Re: Longterm hosting strategy)
To: wikitech-l(a)wikimedia.org
Lee Daniel Crocker wrote:
> I agree, I don't think the parser's a big issue, although it would be
> nice for a bit snappier response. In hindsight, storing the wikitext in
> a database was a mistake. There's already a wonderful piece of software
> highly optimized and scalable for storing randomly accessed variable-
> sized chunks of text with lots of tools for backup, replication, and so
> on; it's called a file system. Storing the wikitext itself in something
> like Reiserfs would probably speed it up, and also speed up access to
> the rest of the metadata in the database which would become much
> smaller.
That's what ExternalStore is for. Moving the bulk out of the database,
or at least to a different database, is a pressing need. We need to
separate bulk, rarely accessed data from hot data, so that we can save
the highly redundant storage on the DB master for hot data. Domas has
been working on it. We're running out of disk space on Ariel again, and
another compression round is obviously only a stopgap solution.
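For third parties curious what this looks like, a rough sketch of
enabling external text storage from LocalSettings.php; the cluster name,
hosts and credentials are placeholders, and the exact keys may differ:

<?php
// Rough sketch of external text storage configuration; all names and
// credentials below are placeholders.
$wgDefaultExternalStore = 'DB://cluster1';
$wgExternalServers = array(
	'cluster1' => array(
		array(
			'host'     => 'text-db1.example.org',
			'dbname'   => 'textstore',
			'user'     => 'wikiuser',
			'password' => 'secret',
			'type'     => 'mysql',
			'load'     => 1,
		),
	),
);
?>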
-- Tim Starling
------------------------------
Message: 2
Date: Mon, 09 May 2005 21:04:55 -0700
From: Lee Daniel Crocker <lee(a)piclab.com>
Subject: Re: [Wikitech-l] Parser (was Re: Longterm hosting strategy)
To: Ævar Arnfjörð Bjarmason <avarab(a)gmail.com>, Wikimedia developers
<wikitech-l(a)wikimedia.org>
On Tue, 2005-05-10 at 03:44 +0000, Ævar Arnfjörð Bjarmason wrote:
> > like Reiserfs would probably speed it up, and also speed up access to
> > the rest of the metadata in the database which would become much
> > smaller.
>
> How about something like a version control system, Subversion for
> example? I don't know how it would do speed-wise for something like
> this, but with that you'd get
Waaaay too slow (have you ever used Subversion?) But it might not be
a bad idea to put a WebDAV/DeltaV front end on whatever we create to
make it possible for third-party tools to access it.
--
Lee Daniel Crocker <lee(a)piclab.com <mailto:> >
<http://creativecommons.org/licenses/publicdomain/>
------------------------------
Message: 3
Date: Tue, 10 May 2005 14:35:03 +1000
From: Tim Starling <t.starling(a)physics.unimelb.edu.au>
Subject: [Wikitech-l] Re: Longterm software strategy
To: wikitech-l(a)wikimedia.org
Lee Daniel Crocker wrote:
> Yes! There's only one tricky part for which we may have to consider
> creative implementations: I tried as much as possible to take style
> markup (especially skin-specific) out of the rendered wikitext to
> allow it to be cached, but there's one case that's still a problem:
> red links (i.e., links to non-existent pages). Users shouted at me
> that this was a sine-qua-non feature, and so I had to leave it in.
> But it makes caching rendered wikitext hard, and slows down rendering.
> One alternative is to simply tolerate them being out of date for the
> life of the cache. Another is to possibly update the cache in some
> cheaper way. Yet another is to optimize the hell out of discovering
> the simple existence of a page, so that it's not a bottleneck in
> rendering (say, by having a daemon that keeps a one-bit field for
> every page using a spell-checker data structure)
We already optimised it, didn't we? In the last public profiling run:
http://meta.wikimedia.org/wiki/Profiling/20050328
...it came in at 2.6% for the non-stub bundled query and 0.4% for the
stub query. I'd hardly call that a bottleneck. Individual link existence
tests came in at 5.9%, mostly due to special pages, but I've largely
fixed that in 1.5 by bundling the existence tests for commonly requested
special pages. It wasn't so long ago it was taking 15% for individual
queries and 15% for LinkCache::preFill():
http://meta.wikimedia.org/wiki/Profiling/Live_aggregate_20040604
...so we've come a long way.
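For the record, the "spell-checker data structure" Lee describes amounts
to a Bloom filter. A toy sketch of the concept in PHP, not anything that
exists in MediaWiki:

<?php
// Toy Bloom filter illustrating the "one bit per page" idea. False
// positives are possible, false negatives are not, so a hit still
// needs a real check but a miss skips the existence query entirely.
class ExistenceFilter {
	var $bits, $size, $hashes;

	function ExistenceFilter( $size = 1048576, $hashes = 4 ) {
		$this->size = $size;
		$this->hashes = $hashes;
		$this->bits = str_repeat( "\0", $size >> 3 );
	}

	function positions( $title ) {
		$pos = array();
		for ( $i = 0; $i < $this->hashes; $i++ ) {
			// 7 hex digits keeps the value inside 32-bit int range
			$pos[] = hexdec( substr( md5( $i . ':' . $title ), 0, 7 ) )
				% $this->size;
		}
		return $pos;
	}

	function add( $title ) {
		foreach ( $this->positions( $title ) as $p ) {
			$this->bits[$p >> 3] =
				chr( ord( $this->bits[$p >> 3] ) | ( 1 << ( $p & 7 ) ) );
		}
	}

	function mightExist( $title ) {
		foreach ( $this->positions( $title ) as $p ) {
			if ( !( ord( $this->bits[$p >> 3] ) & ( 1 << ( $p & 7 ) ) ) ) {
				return false; // definitely not a known page
			}
		}
		return true;
	}
}
?>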
> I'm all for your method, and I agree it's not an urgent need. But I
> think we can slip the timeline even more. The existing codebase will
> eventually be a liability, but I think we can throw hardware at it for
> a year or two. Also, if we go the route of making independent daemons
> linked into the existing UI code, we don't have to deploy all at once.
> We could, for example, make and deploy the math daemon as a proof-of-
> concept, work out bugs with that, then do the others afterward.
We've already got two proof-of-concept daemons: the Chinese word
segmenter and Lucene.
There is a technical problem with Lucene at the moment: it uses file()
to fetch the result over HTTP, but that has an unconfigurable 3 minute
timeout. If the search daemon goes down, we hit apache connection limits
within a minute and the site stops working. We can either patch PHP to
use default_socket_timeout in this case, or switch to another method
like DIY pfsockopen or curl.
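The curl route would look roughly like this (URL and timeout values are
illustrative):

<?php
// Sketch: fetch a search result over HTTP with a hard timeout, instead
// of file() and its unconfigurable 3-minute wait. URL and timeout
// values are illustrative.
function fetchSearchResult( $url, $timeout = 3 ) {
	$ch = curl_init( $url );
	curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
	curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, $timeout );
	curl_setopt( $ch, CURLOPT_TIMEOUT, $timeout );
	$result = curl_exec( $ch );
	// Fail fast so apache children aren't tied up waiting
	curl_close( $ch );
	return $result; // false on failure
}
?>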
> Another thing to consider: at least some of the wikipedia-driven
> development will be totally unnecessary for mediawiki as a general-
> purpose open source project. We may want to decouple those projects
> at some point.
Brion doesn't want to.
-- Tim Starling
------------------------------
Message: 4
Date: Tue, 10 May 2005 00:50:18 -0400 (EDT)
From: "David A. Desrosiers" <desrod(a)gnu-designs.com <mailto:> >
Subject: Re: [Wikitech-l] Parser (was Re: Longterm hosting strategy)
To: Wikimedia developers <wikitech-l(a)wikimedia.org <mailto:> >
Message-ID: <Pine.LNX.4.62.0505100049060.8023(a)angst.gnu-designs.com
<mailto:> >
Content-Type: TEXT/PLAIN; charset=US-ASCII
> Waaaay too slow (have you ever used Subversion?) But it might not
> be a bad idea to put a WebDAV/DeltaV front end on whatever we create
> to make it possible for third-party tools to access it.
Only if you're using it with the default (and horribly slow)
bdb backend. If you use fsfs, you'll see performance several orders of
magnitude better (and it also doesn't wedge or break like bdb does
all the time).
http://svn.collab.net/repos/svn/trunk/notes/fsfs
David A. Desrosiers
desrod(a)gnu-designs.com
http://gnu-designs.com
------------------------------
Message: 5
Date: Tue, 10 May 2005 07:22:10 +0000
From: Ævar Arnfjörð Bjarmason <avarab(a)gmail.com>
Subject: Re: [Wikitech-l] Parser (was Re: Longterm hosting strategy)
To: Lee Daniel Crocker <lee(a)piclab.com>
Cc: Wikimedia developers <wikitech-l(a)wikimedia.org>
> Waaaay too slow (have you ever used Subversion?) But it might not be
> a bad idea to put a WebDAV/DeltaV front end on whatever we create to
> make it possible for third-party tools to access it.
I've used it since the early betas, though not in a production
environment, just for my personal repository, so I haven't felt much
need for speed. However, as David pointed out, you might get more speed
out of fsfs than bdb, and I mentioned svn only as an example; there are
more version control systems in the world.
Regardless, using a VCS would bring diff-based storage, and when all is
said and done, a custom implementation of VCS-like features might end up
not much faster, or even slower, than a "real" version control system.
------------------------------
Message: 6
Date: Tue, 10 May 2005 00:54:47 -0700
From: Brion Vibber <brion(a)pobox.com>
Subject: [Wikitech-l] New machines installed, killed in record 9.5 hours
To: Wikimedia developers <wikitech-l(a)wikimedia.org>
All the new boxen (srv11-srv30) died mysteriously while Domas and I were
trying to restart Apache after installing PHP's CURL library extension
so a proper timeout could be used on the Lucene search.
By dead, I mean "Destination Host Unreachable". They're off the network,
kaput. That _shouldn't_ happen. :) All the other machines seem just
fine; only the spanking new ones exploded, and the reason for it is not
too clear. (Freak library incompatibility -> killing machines? That
_shouldn't_ happen.)
It may be necessary for somebody to flip the switches and reboot.
-- brion vibber (brion @ pobox.com)
Twice today I've requested a page (in one case the search page) on
en.wikipedia and ended up at some page in what I think is
ja.wikipedia.
I can't reproduce it reliably. I can mail a screenshot, but it's not
really useful for debugging.
Perhaps we have a DNS (or squid) mixup?
--
John Fader
Hello,
I had an idea where the computer makes (invents) the edits and the
human then approves them. The edits would technically be made on the
user's account (making the edit their responsibility) but the summary
would link to, say, [[User:Templatefixer]], which would link to a
website. The website would have a list (in a database) of articles to
be corrected (taken from a DB dump) and whenever somebody committed
one of these edits it would cross it off the list.
I noticed that if I don't send wpEditToken a preview is shown instead
of the edit being committed.
I would like to suggest that in 1.5, instead of the preview, the
diff is shown, as if the "Show changes" button had been pressed.
With this modification, I think I could make a PHP script for this,
with a top frame controlling things (on an external website) and the
main frame always showing something on Wikipedia. I have a rough idea
of the user interface.
This could be used for all those cases where a computer can create
suggestions but produces too many false positives. IMO this
could work very well to semi-automate some of the more tedious work,
such as stub sorting (where the article is shown along with a list of stub
templates, and no physical typing is required) and fixing simple
punctuation errors.
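A rough sketch of how such a tool could ask for the diff, assuming the
standard edit-form field names and the suggested 1.5 behaviour; the URL,
article and summary are illustrative:

<?php
// Sketch: POST the proposed text to the edit page with the "show
// changes" field set, returning diff HTML for the human to approve.
// Field names are the standard edit-form ones; values are examples.
function previewDiff( $article, $newText ) {
	$url = 'http://en.wikipedia.org/w/index.php?title='
		. urlencode( $article ) . '&action=submit';
	$fields = array(
		'wpTextbox1' => $newText,
		'wpSummary'  => 'suggested fix ([[User:Templatefixer]])',
		'wpDiff'     => 'Show changes',
		// wpEditToken deliberately omitted: the proposal is that 1.5
		// should respond with a diff rather than a preview
	);
	$ch = curl_init( $url );
	curl_setopt( $ch, CURLOPT_POST, true );
	curl_setopt( $ch, CURLOPT_POSTFIELDS, http_build_query( $fields ) );
	curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
	$html = curl_exec( $ch );
	curl_close( $ch );
	return $html;
}
?>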
Yours,
Tomer Chachamu (15)
I am trying to use
mysql.exe --user=xxx --password=xxx myDatabaseName < LoadFile.sql
to restore the Wikipedia database from the .sql files that they provide.
However, I am running into issues restoring the data. I've been able to
restore from a file as large as 70 MB, but anything bigger than that seems to
fail. It doesn't give any notification of failure, but if it runs overnight it
appears to just stop, and I have to manually close the command prompt window.
This is on a Windows Server 2003 box.
Does anybody know of any tools that can help handle larger .sql files? I am
planning on trying to load the cur table, which is a 2.5 GB .sql file, and
I'd rather not have to split it into 50 different files.
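One workaround, rather than piping the whole file at once, would be to
stream the dump and execute it statement by statement; a rough PHP
sketch, with placeholder connection details, assuming statements end
with ";" at end of line as mysqldump writes them:

<?php
// Rough sketch: stream a large .sql dump statement-by-statement
// instead of piping the whole file to mysql.exe. Connection details
// are placeholders.
$db = mysql_connect( 'localhost', 'xxx', 'xxx' );
mysql_select_db( 'myDatabaseName', $db );
$fh = fopen( 'LoadFile.sql', 'r' );
$statement = '';
while ( !feof( $fh ) ) {
	$line = fgets( $fh, 1048576 );
	if ( substr( $line, 0, 2 ) == '--' ) {
		continue; // skip comments
	}
	$statement .= $line;
	if ( substr( rtrim( $line ), -1 ) == ';' ) {
		if ( !mysql_query( $statement, $db ) ) {
			// Report failures instead of silently stopping; if single
			// INSERTs fail, the server's max_allowed_packet may also
			// need raising
			die( mysql_error( $db ) . "\n" );
		}
		$statement = '';
	}
}
fclose( $fh );
?>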
Thanks!