Hello,
I'm both user of the toolserver and member of the board of Wikimedia
Germany e.V. Wikimedia Germany is the provider of the server, but we have
neither the knowledge nor the time to maintain it. So first of all, a big thanks to
all of you who keep the toolserver and the database running for various
tools that support the Wikimedia projects - Gregory summarized its
benefits very well. I'm afraid Wikimedia Germany cannot easily provide
replication, OAI access and the like, and I doubt that communication via
Toolserver users -> Wikimedia Germany -> Wikimedia Foundation ->
Wikimedia technical admins is any easier than persistence on wikitech-l
and IRC. But the e.V. may declare "official partnerships" (this can make
differences in the real world where people try to get funding) and we
can buy hardware. I hope that self-organization still works although
it's often frustrating. Maybe we can also manage the account policy in a
self-organized way: it was planned to revise all accounts in April. I'd
like to switch off inactive accounts, limiting users to people who are
really working with the toolserver - what do you think?
Greetings,
Jakob
I found out about this list a few days ago, and I've read back through
some of the archives. I have a few comments.
Why are the list archives private? Why isn't it listed on the
mail.wikipedia.org index page? Why isn't it in gmane? Why have I never
heard of it before? Gregory Maxwell has been whinging that nobody is
listening to him on a list that nobody can read.
Kate can be a bit secretive at times, and this was at least at one time
her pet project, but now that she seems to have abandoned it, maybe it's
time to change the structure.
Neither the e.V. nor Kate made any particular attempt to involve the
other Wikimedia system administrators in this project from its
conception. I was certainly sceptical about zedler's value as a tool
server compared to the use we could have made of it as part of the core
cluster. I've now heard about one project that I'm interested in, and I
have an open mind about the rest, but you still have to make the case.
Specifically: how does your project benefit Wikipedia? Why should I
support it?
Daniel Kinzler wrote:
> Yesterday, Kate told me that the problem with replication from the Asian
> cluster is that mysql can only connect to one replication master. I have
> googled a bit, and it appears that that is not true (at least for MySQL
> 5.1): http://dev.mysql.com/doc/refman/5.1/en/replication-intro.html says:
>
> Multiple-master replication is possible, but raises issues not present
> in single-master replication. See Section 6.15, “Auto-Increment in
> Multiple-Master Replication”.
Multiple-master replication in this context could more aptly be called
circular replication. This is where you have, say, 3 servers: A
replicating from B, B replicating from C, and C replicating from A. Then
you can write to any of the three servers, and the writes will be
propagated to the other two. This is quite useless for the toolserver,
where we have 5 masters which will never replicate from each other in a
circle.
It should be possible to set up 5 MySQL instances and have each of them
replicating from a different master. Is anyone volunteering to set up
those instances? Maybe we need to give root access to someone who
actually cares about this stuff.
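To make it concrete, here is a sketch of one possible setup using
mysqld_multi (the ports, paths, server-ids and tunnel addresses are
invented for illustration; untested):

```
# /etc/my.cnf sketch: one [mysqldN] group per master we replicate from
[mysqld_multi]
mysqld    = /usr/bin/mysqld_safe

[mysqld1]
port      = 3307
datadir   = /var/lib/mysql1
socket    = /var/lib/mysql1/mysql.sock
server-id = 101

# ... [mysqld2] through [mysqld5] likewise, with distinct ports and datadirs

# Then, on each instance, point it at its master (through an ssh tunnel
# or over a VLAN) and start replication:
#   CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=4001,
#       MASTER_USER='repl', MASTER_PASSWORD='...';
#   START SLAVE;
```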
It would be easier if we had a VLAN, so that we didn't have to set up 5
ssh tunnels. Does anyone know anything about VLANs? Does anyone care
enough about this project to research it?
Regarding Daniel's WikiProxy: I have reviewed the code, and I have the
following comments:
* use curl, not file_get_contents()
* With curl you can set a short timeout, with file_get_contents() it
will be 3 minutes. Set a timeout of a few seconds, and then use
exponential backoff. Requests get lost sometimes, retries help.
* Tell curl to proxy the request via rr.pmtpa.wikimedia.org:80. This
will skip the knams squid cluster and save a few milliseconds.
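A sketch of what I mean (the function name and exact option values are
just illustrative, not WikiProxy's actual code):

```php
<?php
// Hypothetical helper illustrating the advice above: short timeout,
// retries with exponential backoff, and proxying via the rr cluster.
function fetchWithRetry($url, $maxTries = 4) {
    $delay = 1; // seconds; doubled after each failed attempt
    for ($try = 0; $try < $maxTries; $try++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5); // give up after a few seconds
        curl_setopt($ch, CURLOPT_PROXY, 'rr.pmtpa.wikimedia.org:80');
        $text = curl_exec($ch);
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        if ($text !== false && $code == 200) {
            return $text; // success
        }
        sleep($delay);
        $delay *= 2; // backoff: 1s, 2s, 4s, ...
    }
    return false; // all attempts failed
}
?>
```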
For applications using it: if it's too slow, use a few parallel threads.
Anything up to about 5 requests per second should be OK.
Who here needs more than 5 requests per second? Who needs a latency of
less than a few hundred milliseconds? What exactly do you want full text
replication for?
-- Tim Starling
Hi all,
For a toolserver project I will read all Wikipedia (pwiki_de) articles and
parse them for geoinformation. After some trouble I have now fixed nearly
all bugs, but I still have some problems with opening the articles.
I open the articles with the help of the MediaWiki functions in the following
way:
$title = Title::newFromID($page_id);
$art = new Article($title);
$text = $art->getContent(true);
For some articles this works quite well, but for others it doesn't return any
text. I think there's a problem with the compression of the database (in a
local environment with a Wikipedia dump it works), but I couldn't find a
workaround. Any suggestions?
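To illustrate what I suspect is going on (just my guess): the old_text
rows seem to be stored gzip-deflated, with "gzip" listed in old_flags,
and something like this would be needed to expand them by hand (the
helper name is mine; only the plain and gzip cases are handled - "object"
and external-storage flags would need more work):

```php
<?php
// Expand a raw old_text value according to its old_flags column.
function expandRow($text, $flags) {
    if (in_array('gzip', explode(',', $flags))) {
        $text = gzinflate($text); // stored with gzdeflate()
    }
    return $text;
}
?>
```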
Thanks
Leo
Arg!
Sorry, folks! Do *NOT* use the script at
<http://tools.wikimedia.de/~daniel/foo/WikiProxy.php>! In fact, never use
anything with "foo", "test" or "play" in the path - that stuff is
bound to be broken frequently. The "good" version of my tools is in the
WikiSense directory, so use
<http://tools.wikimedia.de/~daniel/WikiSense/WikiProxy.php>
Sorry for the confusion!
The "live" version of WikiProxy does not yet check for IP and access
token. It will start to do that in a few days (when I next update the
WikiSense directory).
-- Daniel
--
Homepage: http://brightbyte.de
Could we get the ipblocks table visible on the toolserver, minus the
ipb_address column?
This column needs to be omitted because autoblock IPs are stored in
it. Without this column the table contains no information which isn't
available to the general public, as far as I can tell.
Ideally we'd keep that column and use a view which nulls it for rows
where ipb_auto is 1. However, I understand that views in MySQL 5 are
still pretty limited, and we would lose indexes... For my applications I'd
rather lose the ability to see IP blocks entirely than lose indexes.
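Concretely, the view I have in mind might look something like this
(column names from the MediaWiki ipblocks schema as I remember it; a
sketch, not deployed DDL):

```sql
-- Expose ipblocks with autoblock targets blanked out
CREATE VIEW ipblocks_public AS
SELECT ipb_id,
       IF(ipb_auto = 1, NULL, ipb_address) AS ipb_address,
       ipb_user, ipb_by, ipb_reason, ipb_timestamp,
       ipb_auto, ipb_expiry
FROM ipblocks;
```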
Thanks.
Hi all
apparently, we don't have up-to-date copies of the wikis on the Asian
cluster (Japanese, etc.). It seems like some stale copies of those are
being replicated. The last change to the Japanese Wikipedia is from 2005-10-30 :(
This kind of sucks for tools like checkusage - people are relying on
up-to-date databases... Can this be fixed soon, by setting up direct
replication from the Asian cluster?
Regards,
Daniel
--
Homepage: http://brightbyte.de
What is the status of getting text access back on Toolserver?
Is there anything I can do to make it happen?
The lack of text access is killing most of my projects other than toy
statistics gathering.