Hello,
I'm both user of the toolserver and member of the board of Wikimedia
Germany e.V. Wikimedia Germany is the provider of the server, but we have
neither the knowledge nor the time to maintain it. So first of all, a big thanks to
all of you who keep the toolserver and the database running for various
tools that support the Wikimedia projects - Gregory summarized its
benefits very well. I'm afraid Wikimedia Germany cannot easily provide
replication, OAI access and the like, and I doubt that communication via
Toolserver users -> Wikimedia Germany -> Wikimedia Foundation ->
Wikimedia technical admins is any easier than persistence on wikitech-l
and IRC. But the e.V. may declare "official partnerships" (this can make
differences in the real world where people try to get funding) and we
can buy hardware. I hope that self-organization still works although
it's often frustrating. Maybe we can also manage the account policy in a
self-organized way: it was planned to revise all accounts in April. I'd
like to switch off inactive accounts, limiting users to people who are
really working with the toolserver - what do you think?
Greetings,
Jakob
I found out about this list a few days ago, and I've read back through
some of the archives. I have a few comments.
Why are the list archives private? Why isn't it listed on the
mail.wikipedia.org index page? Why isn't it in gmane? Why have I never
heard of it before? Gregory Maxwell has been whinging that nobody is
listening to him on a list that nobody can read.
Kate can be a bit secretive at times, and this was at least at one time
her pet project, but now that she seems to have abandoned it, maybe it's
time to change the structure.
Neither the e.V. nor Kate made any particular attempt to involve the
other Wikimedia system administrators in this project from its
conception. I was certainly sceptical about zedler's value as a tool
server compared to the use we could have made of it as part of the core
cluster. I've now heard about one project that I'm interested in, and I
have an open mind about the rest, but you still have to make the case.
Specifically: how does your project benefit Wikipedia? Why should I
support it?
Daniel Kinzler wrote:
> Yesterday, Kate told me that the problem with replication from the Asian
> cluster is that mysql can only connect to one replication master. I have
> googled a bit, and it appears that that is not true (at least for MySQL
> 5.1): http://dev.mysql.com/doc/refman/5.1/en/replication-intro.html says:
>
> Multiple-master replication is possible, but raises issues not present
> in single-master replication. See Section 6.15, “Auto-Increment in
> Multiple-Master Replication”.
Multiple-master replication in this context could more aptly be called
circular replication. This is where you have, say, 3 servers: A
replicating from B, B replicating from C, and C replicating from A. Then
you can write to any of the three servers, and the writes will be
propagated to the other two. This is quite useless for the toolserver,
where we have 5 masters which will never replicate from each other in a
circle.
It should be possible to set up 5 MySQL instances and have each of them
replicating from a different master. Is anyone volunteering to set up
those instances? Maybe we need to give root access to someone who
actually cares about this stuff.
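To make it concrete, here is a sketch of one possible setup using
mysqld_multi (the ports, paths, server-ids and tunnel addresses are
invented for illustration; untested):

```
# /etc/my.cnf sketch: one [mysqldN] group per master we replicate from
[mysqld_multi]
mysqld    = /usr/bin/mysqld_safe

[mysqld1]
port      = 3307
datadir   = /var/lib/mysql1
socket    = /var/lib/mysql1/mysql.sock
server-id = 101

# ... [mysqld2] through [mysqld5] likewise, with distinct ports and datadirs

# Then, on each instance, point it at its master (through an ssh tunnel
# or over a VLAN) and start replication:
#   CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=4001,
#       MASTER_USER='repl', MASTER_PASSWORD='...';
#   START SLAVE;
```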
It would be easier if we had a VLAN, so that we didn't have to set up 5
ssh tunnels. Does anyone know anything about VLANs? Does anyone care
enough about this project to research it?
Regarding Daniel's WikiProxy: I have reviewed the code, and I have the
following comments:
* use curl, not file_get_contents()
* With curl you can set a short timeout, with file_get_contents() it
will be 3 minutes. Set a timeout of a few seconds, and then use
exponential backoff. Requests get lost sometimes, retries help.
* Tell curl to proxy the request via rr.pmtpa.wikimedia.org:80. This
will skip the knams squid cluster and save a few milliseconds.
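A sketch of what I mean (the function name and exact option values are
just illustrative, not WikiProxy's actual code):

```php
<?php
// Hypothetical helper illustrating the advice above: short timeout,
// retries with exponential backoff, and proxying via the rr cluster.
function fetchWithRetry($url, $maxTries = 4) {
    $delay = 1; // seconds; doubled after each failed attempt
    for ($try = 0; $try < $maxTries; $try++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5); // give up after a few seconds
        curl_setopt($ch, CURLOPT_PROXY, 'rr.pmtpa.wikimedia.org:80');
        $text = curl_exec($ch);
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        if ($text !== false && $code == 200) {
            return $text; // success
        }
        sleep($delay);
        $delay *= 2; // backoff: 1s, 2s, 4s, ...
    }
    return false; // all attempts failed
}
?>
```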
For applications using it: if it's too slow, use a few parallel threads.
Anything up to about 5 requests per second should be OK.
Who here needs more than 5 requests per second? Who needs a latency of
less than a few hundred milliseconds? What exactly do you want full text
replication for?
-- Tim Starling
Hi all,
For a toolserver project I will read all Wikipedia (pwiki_de) articles and
parse them for geoinformation. After some trouble I have now fixed nearly
all bugs, but I still have some problems with opening the articles.
I open the articles with the help of the MediaWiki functions in the following
way:
$title = Title::newFromID($page_id);
$art = new Article($title);
$text = $art->getContent(true);
For some articles this works quite well, but for others it doesn't return any
text. I think there's a problem with the compression of the database (in a
local environment with a Wikipedia dump it works), but I couldn't find a
workaround. Any suggestions?
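To illustrate what I suspect is going on (just my guess): the old_text
rows seem to be stored gzip-deflated, with "gzip" listed in old_flags,
and something like this would be needed to expand them by hand (the
helper name is mine; only the plain and gzip cases are handled - "object"
and external-storage flags would need more work):

```php
<?php
// Expand a raw old_text value according to its old_flags column.
function expandRow($text, $flags) {
    if (in_array('gzip', explode(',', $flags))) {
        $text = gzinflate($text); // stored with gzdeflate()
    }
    return $text;
}
?>
```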
Thanks
Leo
Arg!
Sorry, folks! Do *NOT* use the script at
<http://tools.wikimedia.de/~daniel/foo/WikiProxy.php>! In fact, never use
anything with "foo", "test" or "play" in the path - that stuff is
bound to be broken frequently. The "good" version of my tools is in the
WikiSense directory, so use
<http://tools.wikimedia.de/~daniel/WikiSense/WikiProxy.php>
Sorry for the confusion!
The "live" version of WikiProxy does not yet check for IP and access
token. It will start to do that in a few days (when I next update the
WikiSense directory).
-- Daniel
--
Homepage: http://brightbyte.de
Could we get the ipblocks table visible on the toolserver, minus the
ipb_address column?
This column needs to be omitted because autoblock IPs are stored in
it. Without this column the table contains no information which isn't
available to the general public, as far as I can tell.
Ideally we'd keep that column and use a view which nulls it for rows
where ipb_auto is 1. However, I understand that views in MySQL 5 are
still pretty limited, and we would lose indexes... For my applications I'd
rather lose the ability to see IP blocks entirely than lose indexes.
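Concretely, the view I have in mind might look something like this
(column names from the MediaWiki ipblocks schema as I remember it; a
sketch, not deployed DDL):

```sql
-- Expose ipblocks with autoblock targets blanked out
CREATE VIEW ipblocks_public AS
SELECT ipb_id,
       IF(ipb_auto = 1, NULL, ipb_address) AS ipb_address,
       ipb_user, ipb_by, ipb_reason, ipb_timestamp,
       ipb_auto, ipb_expiry
FROM ipblocks;
```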
Thanks.
Hi all
apparently, we don't have up-to-date copies of the wikis on the Asian
cluster (Japanese, etc.). It seems like some stale copies of those are
being replicated. The last change to the Japanese Wikipedia is from 2005-10-30 :(
This kind of sucks for tools like checkusage - people are relying on
up-to-date databases... Can this be fixed soon, by setting up direct
replication from the Asian cluster?
Regards,
Daniel
--
Homepage: http://brightbyte.de
What is the status of getting text access back on Toolserver?
Is there anything I can do to make it happen?
The lack of text access is killing most of my projects other than toy
statistics gathering.