--- Anthere <anthere9(a)yahoo.com> wrote:
> I think the board should do a budget/financial meeting soon. The last one was
> done a quarter ago, so it is high time.
Yep it is.
> It seems to me we probably diverged considerably from our first-quarter
> budget plan (but I may be wrong :-)). If so, I presume it will be due to the
> various hosting offers we got... In any case, I think it would be most
> informative to see where we are going...
I was planning on just asking the board to extend last quarter's budget to also
cover this quarter, since that is what is actually happening. I had hoped that
we would *finally* be able to buy much more than we needed and thus get ahead
of traffic increases but that did not seem to happen. A supplemental budget
could be added along with a supplemental fund drive if needed.
> Mav
> I have been wondering in the past few days if you could provide us an update
> on our financial situation? Both a general report (a quick update on values) and
> later one of your nice quarterly reports on the foundation website would be
> most welcome... I suppose you will answer that you have not received bank
> statements yet to do so... but I guess Terry could help there more than he
> could before.
You suppose right. :) The problem has been on my end (my fax machine was not
working). As of this week it should be working again, so I'll remind Terry about
that.
> Question for developers:
> I'd like us to be informed, on a non-technical level, of the future technical
> requirements to expect, and possibly to be reminded how much we can expect
> from partners on this. I.e., how much we plan to spend in the future on
> servers, hosting and so on. I am sure you will answer that it is tough to say, I
> know :-)
Yes - this is very, very much needed. I asked for a needs/wants/wish list for
the last budget but that never materialized. Thus we brought in a whole bunch
of money with no concrete plan on how to spend it. That is not something I want
to repeat.
> The general idea would be to try to evaluate how long the money we have in
> the bank will last... and if we can envision spending some of it in more
> creative ways than servers :-)
> Opinion on this topic ?
We need some numbers on that first.
> PS: I must mention that we are currently stuck in a rather ridiculous
> situation. The French squids need some RAM... this RAM can only be purchased
> from a few websites... which refuse American credit cards... so we are
> currently trying to have it bought by the German association... which legally
> is not really authorized to do this... so either the French association
> quickly gets cash to pay for such expenses, or the foundation should have a
> credit card in Europe to take care of such matters... opinion?
We could wire some money to the French chapter's bank account.
-- mav
We now have an IRC server set up, for the sole purpose of delivering
recent changes information. All discussion channels will stay at Freenode.
The problem with using Freenode was that it was hard to maintain, both
for us and for them. We had to use 10 different connections to avoid
flooding off, and if they all connected at once, the server would be
K-lined. Despite the best efforts of the Freenode staff, the exemptions
they set up were not robust, especially when the configuration changed
on either end.
With our own IRC server, we can have a single connection for all
channels. The bot is exempted from flood controls. This makes it much
easier to maintain. We can also create as many channels as we like --
one for every wiki.
We don't want to have to administer a general-purpose IRC server. That's
something Freenode does a good job of, and we don't want to duplicate it. So
we've patched ircd such that non-opers can't send messages to channels.
Keeping our discussion channels on another network is also useful for
coordination in the event of failure on our network.
The server hostname is irc.wikimedia.org, that's 207.142.131.229 for
those people still having DNS cache problems.
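For anyone who wants to consume the feed, here is a minimal reader sketch
in PHP; the channel name (#en.wikipedia) and the nick are assumptions for
illustration, not confirmed details of the new server.

<?php
// Minimal sketch of a recent-changes feed reader. The channel name and
// nick below are illustrative assumptions.
$sock = fsockopen( 'irc.wikimedia.org', 6667, $errno, $errstr, 30 );
if ( !$sock ) {
	die( "Connection failed: $errstr ($errno)\n" );
}
fwrite( $sock, "NICK rc-reader-demo\r\n" );
fwrite( $sock, "USER rc 0 * :RC feed reader\r\n" );
while ( !feof( $sock ) ) {
	$line = rtrim( fgets( $sock, 1024 ) );
	if ( substr( $line, 0, 4 ) == 'PING' ) {
		// Answer server pings so we aren't disconnected
		fwrite( $sock, 'PONG' . substr( $line, 4 ) . "\r\n" );
	} elseif ( strpos( $line, ' 001 ' ) !== false ) {
		// Registration complete; join the per-wiki channel
		fwrite( $sock, "JOIN #en.wikipedia\r\n" );
	} elseif ( strpos( $line, 'PRIVMSG' ) !== false ) {
		echo $line . "\n"; // one recent-changes entry per message
	}
}
?>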
-- Tim Starling
This morning a bug was filed about some pages showing timestamps in the
year 2025:
http://bugzilla.wikimedia.org/show_bug.cgi?id=2138
By the time I was awake and looked at them, there was no trace:
everything said 2005 as expected.
A few minutes ago somebody piped up on IRC about another one on
fr.wikinews, which I was able to confirm. I fixed that one item, and
recorded this fact in the server admin log
(http://wp.wikidev.net/Server_admin_log#11_May)
There are basically two ways this could happen: either some machine had
its clock incorrectly set to 2025 instead of 2005, and was subsequently
corrected, or there's a horrible mysterious bug that corrupts one
particular digit of the timestamp sometimes.
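One way to check for the second case is to count any surviving
future-dated rows. A sketch in PHP, with table and column names assumed
from the 1.4 schema and placeholder connection details:

<?php
// Sketch: count surviving future-dated rows in one wiki's database.
// Table/column names assume the 1.4 schema; host, user, password and
// database name are placeholders.
$db = mysql_connect( 'db-host', 'user', 'password' );
mysql_select_db( 'enwiki', $db );
$checks = array(
	'cur'           => 'cur_timestamp',
	'old'           => 'old_timestamp',
	'recentchanges' => 'rc_timestamp',
);
foreach ( $checks as $table => $field ) {
	// MediaWiki timestamps are 14-character YYYYMMDDHHMMSS strings
	$res = mysql_query(
		"SELECT COUNT(*) FROM $table WHERE $field LIKE '2025%'", $db );
	$row = mysql_fetch_row( $res );
	echo "$table: {$row[0]} rows dated 2025\n";
}
?>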
I can't find any trace of a machine with its clock incorrectly set now.
A few of the earlier machines (srv2-srv4) had their hardware clocks set
in the year 2003, but the live system clocks showed the correct current
time. (I've synchronized the hardware clocks in the hope that they will boot
up right next time, too.) If there was a machine with its hardware clock
set to 2025, it's either not listed in the Apache node group -- in which
case we have a rogue machine we can't account for and can't control
through the regular means -- or it was corrected and nobody recorded
this fact.
If we had a machine with the correct hardware clock time that somehow got
the wrong system time temporarily, well, who knows...
Does anybody know anything about this? Did anybody set, fix, or change a
clock? Did somebody correct the reported database entries without
telling anybody about it? If you did it, please give a whistle so I'll
be able to sleep at night again. :)
-- brion vibber (brion @ pobox.com)
The attached patch (6KB, unified diff) against CategoryPage.php in
MediaWiki 1.4.4 changes the way headings are inserted into category
pages.
Previously, each letter would get its own heading. This was a pet
peeve of mine, as on some pages you'd get several headings with only
one item:
A
* Apple
B
* Banana
F
* Food
* Fruit
M
* Melon
O
* Orange
T
* Tangerine
The patch combines consecutive sections with only one item in them,
and rewrites the heading to the form "A-B". So with the patch
applied, the above would be more like:
A-B
* Apple
* Banana
F
* Food
* Fruit
M-T
* Melon
* Orange
* Tangerine
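The merging logic itself is small. Here is a standalone sketch of the
idea (not the actual patch, which works inside CategoryPage.php); it
only handles ASCII first letters, for brevity:

<?php
// Sketch: group sorted titles by first letter, then fold runs of
// consecutive one-item sections into a single "A-B" range heading.
function flushRun( &$merged, &$letters, &$items ) {
	if ( !count( $letters ) ) {
		return;
	}
	// A run of two or more one-item sections gets a range heading
	$heading = count( $letters ) > 1
		? $letters[0] . '-' . $letters[count( $letters ) - 1]
		: $letters[0];
	$merged[$heading] = $items;
	$letters = array();
	$items = array();
}

function mergeSections( $sortedTitles ) {
	$sections = array();
	foreach ( $sortedTitles as $t ) {
		$sections[strtoupper( substr( $t, 0, 1 ) )][] = $t;
	}
	$merged = array();
	$letters = array();
	$items = array();
	foreach ( $sections as $letter => $group ) {
		if ( count( $group ) == 1 ) {
			$letters[] = $letter;
			$items[] = $group[0];
		} else {
			flushRun( $merged, $letters, $items );
			$merged[$letter] = $group;
		}
	}
	flushRun( $merged, $letters, $items );
	return $merged;
}

// Prints the A-B / F / M-T grouping from the example above
print_r( mergeSections( array( 'Apple', 'Banana', 'Food', 'Fruit',
	'Melon', 'Orange', 'Tangerine' ) ) );
?>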
The patch also changes how pages are inserted into each column.
Previously, categories with six pages or fewer were given one column,
while those with more than six had three columns; pages were then
divided evenly between the three columns. With this change, all
categories (notionally) have three columns, but each column is
guaranteed at least six pages before the next column gets any (a sketch
of that rule follows). The columns are also represented as fixed-width
<div>s instead of tables; this could be trivially changed.
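A sketch of that column rule (again illustrative, not the patch itself):

<?php
// Sketch of the column rule: at most three columns, and a column must
// reach six pages before the next column gets any.
function splitIntoColumns( $pages, $numCols = 3, $minPerCol = 6 ) {
	$perCol = max( $minPerCol, (int)ceil( count( $pages ) / $numCols ) );
	return array_chunk( $pages, $perCol );
}

// 7 pages -> first column gets 6, second gets 1, third stays empty
print_r( splitIntoColumns( range( 1, 7 ) ) );
?>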
The patch is 6KB, probably mostly because I'm not very familiar with
PHP; it would likely need to be made more idiomatic before it could
be accepted. It's in use on one of my wikis if you want to see the
result:
<http://mt.brentdax.com/wiki/Special:Categories>
--
Brent 'Dax' Royal-Gordon <brent(a)brentdax.com>
Perl and Parrot hacker
I'd like to open a discussion, very practical, about our medium-term
hosting strategy. If you happen to be privy to any "inside information"
(there isn't that much) then please be sure to anonymize the names of
organizations that might be involved, since this is a public list. :-)
What I need to do is prepare an internal document for important
interested parties within the community (for example, the leaders of the
German Verein, who have money to spend on their charitable goals;
servers qualify for this), and I want to make sure that my overall plans
are sensible from a tech point of view.
Additionally, I am in talks with a great many organizations who are
offering free hosting at various levels, and I need to know what makes
sense for us in terms of how small or large each datacenter should be to
meet our needs sensibly.
While I of course welcome "big picture visionary" statements from people
who don't know a lot about our actual network and challenges (we can
learn from that), I'm mostly interested in extremely practical *here and
now* feedback. The people who are actually involved in the day to day
running of the site have a better feel for what makes sense than anyone
else -- it is their advice which will carry the day.
Here is the current situation --
Florida - 44-ish servers now, 20 more apache/squid machines to be
installed within the next two weeks (10 have already arrived and Chad will
install them very soon). Two more database servers have been ordered, as well as
a JBOD-thingy.
Paris - 3 squids, soon to be 6
Yahoo -- datacenter in South Korea, I was just contacted by the tech
person and he seeks guidance from us on the exact parameters. We're
looking at 20U of apache/squid and 6U of db servers there -- this can
probably be online in one or two months, maybe faster if we move fast
and they move fast. :-)
Belnet/Belgium -- 1 rack of space, unlimited bandwidth; they are ready
to go Monday, and they can do full hands-on work, etc., including replacing
broken hard drives and so on. They are excited to move
forward quickly. In this case, we must supply the hardware. We can
either buy hardware (with the German money?) or I can ask someone to buy
it for us (see Big Company X, below).
Amsterdam - a large NGO wants to do a big press announcement when I'm
there in Holland at the end of this month. They are providing a set of
servers which have already been ordered. I do not know the exact
specifications, perhaps someone else can tell me?
Big Company X - this company is prepared to make a very major commitment
to hosting for us. Originally there were discussions of us being hosted
in their facility, but now we are seeking bids on outsourcing this. The
exact parameters are at this time very open ended, and I have told them
that basically we need time to figure out what we need. They assure me
there is no time pressure and they don't care about PR or anything else
-- it's a pure charitable donation, no strings attached. But I still
prefer to try to move quickly to take advantage of the offer, just in case the
corporate mood changes (you never know with big companies!). The
parameters being discussed would be in the range of 2 full racks of
servers, essentially a full replication of our Florida data center as of
a couple of months ago.
They are flexible on the amounts -- it is essentially whatever we need
and can honestly justify. They are tech savvy and fully agree with my
view that just throwing random money and servers at us is not the best
use of our resources -- rather, they prefer that we ask for what we can
really use in a way that optimizes the use of their money -- best
strategy for everyone, obviously.
SOOOOOOOOO......
Feedback?
--Jimbo
Hi.
I'd like to know whether it is possible to run Wikipedia with MySQL 4.1.
Before installing the whole environment with MySQL 4.0.20a, I tried to set
it up with MySQL 4.1, and I remember having problems installing MediaWiki,
which did not accept this version of MySQL.
However, my research project requires a lot of queries that contain
subqueries, and it is almost impossible to rewrite them all as JOINs (one of
the reasons is that I use not only SELECT queries but also UPDATE and
DELETE). Using MySQL 4.1 would help me considerably.
If it is possible to use MySQL 4.1, is there any way to upgrade the
version of MySQL while keeping my current Wikipedia database?
Thank you.
Kevin Carillo
Today's Topics:
1. Re: Parser (was Re: Longterm hosting strategy) (Tim Starling)
2. Re: Parser (was Re: Longterm hosting strategy)
(Lee Daniel Crocker)
3. Re: Longterm software strategy (Tim Starling)
4. Re: Parser (was Re: Longterm hosting strategy)
(David A. Desrosiers)
5. Re: Parser (was Re: Longterm hosting strategy)
(Ævar Arnfjörð Bjarmason)
6. New machines installed, killed in record 9.5 hours (Brion Vibber)
7. Re: Longterm software strategy (Brion Vibber)
8. Link table updates (was Re: Longterm software strategy)
(Tim Starling)
----------------------------------------------------------------------
Message: 1
Date: Tue, 10 May 2005 13:55:48 +1000
From: Tim Starling <t.starling(a)physics.unimelb.edu.au>
Subject: [Wikitech-l] Re: Parser (was Re: Longterm hosting strategy)
To: wikitech-l(a)wikimedia.org
Lee Daniel Crocker wrote:
> I agree, I don't think the parser's a big issue, although it would be
> nice for a bit snappier response. In hindsight, storing the wikitext in
> a database was a mistake. There's already a wonderful piece of software
> highly optimized and scalable for storing randomly accessed variable-
> sized chunks of text with lots of tools for backup, replication, and so
> on; it's called a file system. Storing the wikitext itself in something
> like Reiserfs would probably speed it up, and also speed up access to
> the rest of the metadata in the database which would become much
> smaller.
That's what ExternalStore is for. Moving the bulk out of the database,
or at least to a different database, is a pressing need. We need to
separate bulk, rarely accessed data from hot data, so that we can save
the highly redundant storage on the DB master for hot data. Domas has
been working on it. We're running out of disk space on Ariel again, and
another compression round is obviously only a stopgap solution.
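For third parties curious what this looks like, a rough sketch of
enabling external text storage from LocalSettings.php; the cluster name,
hosts and credentials are placeholders, and the exact keys may differ:

<?php
// Rough sketch of external text storage configuration; all names and
// credentials below are placeholders.
$wgDefaultExternalStore = 'DB://cluster1';
$wgExternalServers = array(
	'cluster1' => array(
		array(
			'host'     => 'text-db1.example.org',
			'dbname'   => 'textstore',
			'user'     => 'wikiuser',
			'password' => 'secret',
			'type'     => 'mysql',
			'load'     => 1,
		),
	),
);
?>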
-- Tim Starling
------------------------------
Message: 2
Date: Mon, 09 May 2005 21:04:55 -0700
From: Lee Daniel Crocker <lee(a)piclab.com>
Subject: Re: [Wikitech-l] Parser (was Re: Longterm hosting strategy)
To: Ævar Arnfjörð Bjarmason <avarab(a)gmail.com>, Wikimedia developers
<wikitech-l(a)wikimedia.org>
On Tue, 2005-05-10 at 03:44 +0000, Ævar Arnfjörð Bjarmason wrote:
> > like Reiserfs would probably speed it up, and also speed up access to
> > the rest of the metadata in the database which would become much
> > smaller.
>
> How about something like a version control system, Subversion for
> example? I don't know how it would do speed-wise for something like
> this, but with that you'd get
Waaaay too slow (have you ever used Subversion?) But it might not be
a bad idea to put a WebDAV/DeltaV front end on whatever we create to
make it possible for third-party tools to access it.
--
Lee Daniel Crocker <lee(a)piclab.com <mailto:> >
<http://creativecommons.org/licenses/publicdomain/>
------------------------------
Message: 3
Date: Tue, 10 May 2005 14:35:03 +1000
From: Tim Starling <t.starling(a)physics.unimelb.edu.au>
Subject: [Wikitech-l] Re: Longterm software strategy
To: wikitech-l(a)wikimedia.org
Lee Daniel Crocker wrote:
> Yes! There's only one tricky part for which we may have to consider
> creative implementations: I tried as much as possible to take style
> markup (especially skin-specific) out of the rendered wikitext to
> allow it to be cached, but there's one case that's still a problem:
> red links (i.e., links to non-existent pages). Users shouted at me
> that this was a sine-qua-non feature, and so I had to leave it in.
> But it makes caching rendered wikitext hard, and slows down rendering.
> One alternative is to simply tolerate them being out of date for the
> life of the cache. Another is to possibly update the cache in some
> cheaper way. Yet another is to optimize the hell out of discovering
> the simple existence of a page, so that it's not a bottleneck in
> rendering (say, by having a daemon that keeps a one-bit field for
> every page using a spell-checker data structure)
We already optimised it, didn't we? In the last public profiling run:
http://meta.wikimedia.org/wiki/Profiling/20050328
...it came in at 2.6% for the non-stub bundled query and 0.4% for the
stub query. I'd hardly call that a bottleneck. Individual link existence
tests came in at 5.9%, mostly due to special pages, but I've largely
fixed that in 1.5 by bundling the existence tests for commonly requested
special pages. It wasn't so long ago it was taking 15% for individual
queries and 15% for LinkCache::preFill():
http://meta.wikimedia.org/wiki/Profiling/Live_aggregate_20040604
...so we've come a long way.
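For the record, the "spell-checker data structure" Lee describes amounts
to a Bloom filter. A toy sketch of the concept in PHP, not anything that
exists in MediaWiki:

<?php
// Toy Bloom filter illustrating the "one bit per page" idea. False
// positives are possible, false negatives are not, so a hit still
// needs a real check but a miss skips the existence query entirely.
class ExistenceFilter {
	var $bits, $size, $hashes;

	function ExistenceFilter( $size = 1048576, $hashes = 4 ) {
		$this->size = $size;
		$this->hashes = $hashes;
		$this->bits = str_repeat( "\0", $size >> 3 );
	}

	function positions( $title ) {
		$pos = array();
		for ( $i = 0; $i < $this->hashes; $i++ ) {
			// 7 hex digits keeps the value inside 32-bit int range
			$pos[] = hexdec( substr( md5( $i . ':' . $title ), 0, 7 ) )
				% $this->size;
		}
		return $pos;
	}

	function add( $title ) {
		foreach ( $this->positions( $title ) as $p ) {
			$this->bits[$p >> 3] =
				chr( ord( $this->bits[$p >> 3] ) | ( 1 << ( $p & 7 ) ) );
		}
	}

	function mightExist( $title ) {
		foreach ( $this->positions( $title ) as $p ) {
			if ( !( ord( $this->bits[$p >> 3] ) & ( 1 << ( $p & 7 ) ) ) ) {
				return false; // definitely not a known page
			}
		}
		return true;
	}
}
?>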
> I'm all for your method, and I agree it's not an urgent need. But I
> think we can slip the timeline even more. The existing codebase will
> eventually be a liability, but I think we can throw hardware at it for
> a year or two. Also, if we go the route of making independent daemons
> linked into the existing UI code, we don't have to deploy all at once.
> We could, for example, make and deploy the math daemon as a proof-of-
> concept, work out bugs with that, then do the others afterward.
We've already got two proof-of-concept daemons: the Chinese word
segmenter and Lucene.
There is a technical problem with Lucene at the moment: it uses file()
to fetch the result over HTTP, but that has an unconfigurable 3 minute
timeout. If the search daemon goes down, we hit apache connection limits
within a minute and the site stops working. We can either patch PHP to
use default_socket_timeout in this case, or switch to another method
like DIY pfsockopen or curl.
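The curl route would look roughly like this (URL and timeout values are
illustrative):

<?php
// Sketch: fetch a search result over HTTP with a hard timeout, instead
// of file() and its unconfigurable 3-minute wait. URL and timeout
// values are illustrative.
function fetchSearchResult( $url, $timeout = 3 ) {
	$ch = curl_init( $url );
	curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
	curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, $timeout );
	curl_setopt( $ch, CURLOPT_TIMEOUT, $timeout );
	$result = curl_exec( $ch );
	// Fail fast so apache children aren't tied up waiting
	curl_close( $ch );
	return $result; // false on failure
}
?>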
> Another thing to consider: at least some of the wikipedia-driven
> development will be totally unnecessary for mediawiki as a general-
> purpose open source project. We may want to decouple those projects
> at some point.
Brion doesn't want to.
-- Tim Starling
------------------------------
Message: 4
Date: Tue, 10 May 2005 00:50:18 -0400 (EDT)
From: "David A. Desrosiers" <desrod(a)gnu-designs.com <mailto:> >
Subject: Re: [Wikitech-l] Parser (was Re: Longterm hosting strategy)
To: Wikimedia developers <wikitech-l(a)wikimedia.org <mailto:> >
Message-ID: <Pine.LNX.4.62.0505100049060.8023(a)angst.gnu-designs.com
<mailto:> >
Content-Type: TEXT/PLAIN; charset=US-ASCII
> Waaaay too slow (have you ever used Subversion?) But it might not
> be a bad idea to put a WebDAV/DeltaV front end on whatever we create
> to make it possible for third-party tools to access it.
Only if you're using it with the default (and horribly slow)
bdb backend. If you use fsfs, you'll see performance several orders of
magnitude better (and it also doesn't wedge or break like bdb does
all the time).
http://svn.collab.net/repos/svn/trunk/notes/fsfs
David A. Desrosiers
desrod(a)gnu-designs.com
http://gnu-designs.com
------------------------------
Message: 5
Date: Tue, 10 May 2005 07:22:10 +0000
From: Ævar Arnfjörð Bjarmason <avarab(a)gmail.com>
Subject: Re: [Wikitech-l] Parser (was Re: Longterm hosting strategy)
To: Lee Daniel Crocker <lee(a)piclab.com>
Cc: Wikimedia developers <wikitech-l(a)wikimedia.org>
> Waaaay too slow (have you ever used Subversion?) But it might not be
> a bad idea to put a WebDAV/DeltaV front end on whatever we create to
> make it possible for third-party tools to access it.
I've used it since the early betas, though not in a production
environment, just for my personal repository, so I haven't felt much
need for speed. However, as David pointed out, you might get more speed
out of fsfs than bdb, and I mentioned svn only as an example; there are
more version control systems in the world.
Regardless, using a VCS would bring diff-based storage, and when all is
said and done, a custom implementation of VCS-like features might end up
not much faster, or even slower, than a "real" version control system.
------------------------------
Message: 6
Date: Tue, 10 May 2005 00:54:47 -0700
From: Brion Vibber <brion(a)pobox.com>
Subject: [Wikitech-l] New machines installed, killed in record 9.5 hours
To: Wikimedia developers <wikitech-l(a)wikimedia.org>
All the new boxen (srv11-srv30) died mysteriously while Domas and I were
trying to restart Apache after installing PHP's CURL library extension
so a proper timeout could be used on the Lucene search.
By dead, I mean "Destination Host Unreachable". They're off the network,
kaput. That _shouldn't_ happen. :) All the other machines seem just
fine; only the spanking new ones exploded, and the reason for it is not
too clear. (Freak library incompatibility -> killing machines? That
_shouldn't_ happen.)
It may be necessary for somebody to flip the switches and reboot.
-- brion vibber (brion @ pobox.com)
Twice today I've requested a page (in one case the search page) on
en.wikipedia and ended up at some page in what I think is
ja.wikipedia.
I can't reproduce it reliably. I can mail a screenshot, but it's not
really useful for debugging.
Perhaps we have a DNS (or squid) mixup?
--
John Fader
Hello,
I had an idea where the computer makes (invents) the edits and the
human then approves them. The edits would technically be made on the
user's account (making the edit their responsibility) but the summary
would link to, say, [[User:Templatefixer]], which would link to a
website. The website would have a list (in a database) of articles to
be corrected (taken from a DB dump) and whenever somebody committed
one of these edits it would cross it off the list.
I noticed that if I don't send wpEditToken a preview is shown instead
of the edit being committed.
I would like to suggest that in 1.5, instead of the preview, the
diff is shown, as if the "Show changes" button had been pressed.
With this modification, I think I could make a PHP script for this,
with a top frame controlling things (on an external website) and the
main frame always showing something on Wikipedia. I have a rough idea
of the user interface.
This could be used for all those cases where a computer can create
suggestions but produces too many false positives. IMO this
could work very well to semi-automate some of the more tedious work,
such as stub sorting (where the article is shown along with a list of stub
templates, and no physical typing is required) and fixing simple
punctuation errors.
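A rough sketch of how such a tool could ask for the diff, assuming the
standard edit-form field names and the suggested 1.5 behaviour; the URL,
article and summary are illustrative:

<?php
// Sketch: POST the proposed text to the edit page with the "show
// changes" field set, returning diff HTML for the human to approve.
// Field names are the standard edit-form ones; values are examples.
function previewDiff( $article, $newText ) {
	$url = 'http://en.wikipedia.org/w/index.php?title='
		. urlencode( $article ) . '&action=submit';
	$fields = array(
		'wpTextbox1' => $newText,
		'wpSummary'  => 'suggested fix ([[User:Templatefixer]])',
		'wpDiff'     => 'Show changes',
		// wpEditToken deliberately omitted: the proposal is that 1.5
		// should respond with a diff rather than a preview
	);
	$ch = curl_init( $url );
	curl_setopt( $ch, CURLOPT_POST, true );
	curl_setopt( $ch, CURLOPT_POSTFIELDS, http_build_query( $fields ) );
	curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
	$html = curl_exec( $ch );
	curl_close( $ch );
	return $html;
}
?>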
Yours,
Tomer Chachamu (15)
I am trying to use
mysql.exe --user=xxx --password=xxx myDatabaseName < LoadFile.sql
to restore the Wikipedia database from the .sql files that they provide.
However, I am running into issues restoring the data. I've been able to
restore from a file as large as 70 MB, but anything bigger than that seems to
fail. It doesn't give any notification of failure, but if it runs overnight it
appears to just stop, and I have to manually close the command prompt window.
This is on a Windows Server 2003 box.
Does anybody know of any tools that can help handle larger .sql files? I am
planning on trying to load the cur table, which is a 2.5 GB .sql file, and
I'd rather not have to split it into 50 different files.
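One workaround, rather than piping the whole file at once, would be to
stream the dump and execute it statement by statement; a rough PHP
sketch, with placeholder connection details, assuming statements end
with ";" at end of line as mysqldump writes them:

<?php
// Rough sketch: stream a large .sql dump statement-by-statement
// instead of piping the whole file to mysql.exe. Connection details
// are placeholders.
$db = mysql_connect( 'localhost', 'xxx', 'xxx' );
mysql_select_db( 'myDatabaseName', $db );
$fh = fopen( 'LoadFile.sql', 'r' );
$statement = '';
while ( !feof( $fh ) ) {
	$line = fgets( $fh, 1048576 );
	if ( substr( $line, 0, 2 ) == '--' ) {
		continue; // skip comments
	}
	$statement .= $line;
	if ( substr( rtrim( $line ), -1 ) == ';' ) {
		if ( !mysql_query( $statement, $db ) ) {
			// Report failures instead of silently stopping; if single
			// INSERTs fail, the server's max_allowed_packet may also
			// need raising
			die( mysql_error( $db ) . "\n" );
		}
		$statement = '';
	}
}
fclose( $fh );
?>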
Thanks!