Good day,
This is my first post to this list, and having read the last couple of
posts, I suspect my question might be a bit too low-level. Still, I hope
you can point me in the right direction.
I've been working on this extension that generates qrcode bitmaps and
displays them on a wiki page [0].
In that extension, I'm using the upload() method made available by the
LocalFile object, as documented on [1]. In my specific case, the
relevant code looks like this:
// Build a title in the File namespace for the destination file name
$ft = Title::makeTitleSafe( NS_FILE, $this->_dstFileName );
$localfile = wfLocalFile( $ft );
$saveName = $localfile->getName();
$pageText = 'QrCode [...]';
// Upload the temporary bitmap and delete the source file afterwards
$status = $localfile->upload( $tmpName, $this->_label, $pageText,
    File::DELETE_SOURCE, false, false, $this->_getBot() );
The extension is implemented as a parser function hooked into
ParserFirstCallInit.
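For reference, the registration follows the usual parser function
pattern; the function names below are placeholders for illustration,
not necessarily the ones the extension actually uses:

// Hook registration in the extension setup file (names are hypothetical)
$wgHooks['ParserFirstCallInit'][] = 'efQrCodeSetup';

function efQrCodeSetup( $parser ) {
    // Map {{#qrcode:...}} to the rendering callback
    $parser->setFunctionHook( 'qrcode', 'efRenderQrCode' );
    return true;
}

function efRenderQrCode( $parser, $param = '' ) {
    // ...generate the bitmap, upload it via LocalFile::upload(),
    // and return wikitext embedding the resulting file...
    return '[[File:QR-' . $param . '.png]]';
}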
Now, I haven't found any other explanation, so I suspect that this use of
the upload() method is what leads to some peculiar behaviour on my wiki
installation, namely:
1. QrCodes are generated for pages that do not have or transclude a
{{#qrcode:}} function call, in this case properties [2,3,4].
2. These uploaded files have properties [5] and belong to a category,
which means they get linked in the categorylinks table. A common result
of this is that qrcode images turn up in, e.g., semantic queries [9,10].
3. Qrcodes are even generated for existing qrcodes [6,7]. One way to
trigger that behaviour is to visit a File page and click on its Delete
link, without actually deleting the file. This leads to situations such
as [8].
4. The files get linked from several pages, as this example shows [11].
None of the pages said to link to the file actually include it, and the
set of linking pages varies (two days ago, 14 pages linked to it; today
only 7 do).
5. Browsing the properties of the above file [12], you can see that it
has somehow been mixed up with a completely different event.
6. Looking at the database, the mixup hypothesis is confirmed:
SELECT page_id, page_title, cl_sortkey
FROM `page`
INNER JOIN `categorylinks` FORCE INDEX (cl_sortkey) ON (cl_from = page_id)
LEFT JOIN `category` ON (cat_title = page_title AND page_namespace = 14)
WHERE (1 = 1) AND cl_to = 'Project'
ORDER BY cl_sortkey
gives (among other data):
page_id  page_title   cl_sortkey
1403     SMS2Space    File:QR-Ask.png
1244     Syn2Sat      File:QR-LetzHack.png
1251     ChillyChill  File:QR-Syn2cat-radio-ara.png.png
This behaviour occurs in both MediaWiki 1.15.5 and 1.16. I would be very
grateful if someone more experienced could have a look at this
situation. Maybe I'm using the upload() method in a way I should not.
sincerely,
David Raison
[0] http://www.mediawiki.org/wiki/Extension:QrCode
[1]
http://svn.wikimedia.org/doc/classLocalFile.html#4b626952ae0390a7fa453a4bfe…
[2] https://www.hackerspace.lu/wiki/File:QR-Is_U19.png
[3] https://www.hackerspace.lu/wiki/File:QR-Has_SingleIssuePrice.png
[4] https://www.hackerspace.lu/wiki/File:QR-Has_Issues.png
[5] https://www.hackerspace.lu/wiki/Property:Has_SingleIssuePrice
[6] https://www.hackerspace.lu/wiki/File:QR-QR-Location.png.png
[7]
https://www.hackerspace.lu/w/index.php?title=Special:RecentChanges&hidebots…
[8]
https://www.hackerspace.lu/wiki/File:QR-QR-QR-QR-Location.png.png.png.png
[9] https://www.hackerspace.lu/wiki/Projects#Concluded_Projects
[10] https://www.hackerspace.lu/wiki/Special:BrowseData#Q
[11] https://www.hackerspace.lu/wiki/File:QR-Syn2cat.png
[12] https://www.hackerspace.lu/wiki/Special:Browse/File:QR-2DSyn2cat.png
--
The Hackerspace in Luxembourg!
syn2cat a.s.b.l. - Promoting social and technical innovations
11, rue du cimetière | Pavillon "Am Hueflach"
L-8018 Strassen | Luxembourg
http://www.hackerspace.lu
----
mailto:david@hackerspace.lu
xmpp:kwisatz@jabber.hackerspaces.org
mobile: +43 650 73 63 834 | +352 691 44 23 24
++++++++++++++++++++++++++++++++++++++++++++
Wear your geek: http://syn2cat.spreadshirt.net
Ariel T. Glenn wrote:
> On Thursday, 23-09-2010, at 21:27 -0500, Q wrote:
>>> Given the fact that static dumps have been broken for *years* now,
>>> static dumps are on the bottom of WMFs priority list; I thought it
>>> would be the best if I just went ahead and built something that can be
>>> used (and, of course, improved).
>>>
>>> Marco
>>
>> That's what I just said. Work with them to fix it, i.e. volunteer;
>> i.e. you fix it.
>>
>
> Actually it's not so much that they are at the bottom of the list as
> that there are two people potentially looking at them, and they are
> Tomasz (who is also doing mobile) and me (and I am doing the XML dumps
> rather than the HTML ones, until those are reliable and I'm happy).
>
> However, if you are interested in working on these, I am *very* happy to
> help with suggestions, testing, feedback, etc., even while I am still
> working on the XML dumps. Do you have time and interest?
>
> Ariel
Most (all?) articles should already be parsed in memcached. I think the
bottleneck would be the compression.
Note however that the ParserOutput would still need postprocessing, as
would ?action=render output. The first thing that comes to mind is removing
the edit links (this use case alone seems enough to justify implementing
editsection stripping). Sadly, we can't (easily) add the edit sections
back after rendering.
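For what it's worth, a crude postprocessing step could simply strip the
rendered edit links. This is only a sketch, and it assumes the 1.16-era
markup where section edit links are wrapped in <span class="editsection">:

// Rough sketch: strip section edit links from already-rendered HTML.
// Assumes edit links are marked up as <span class="editsection">...</span>.
function stripEditSections( $html ) {
    return preg_replace( '!<span class="editsection">.*?</span>!s', '', $html );
}

$html = stripEditSections( $parserOutput->getText() );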
I have been making the assumption that in MediaWiki, the $_SESSION is
hidden from the user. While applications may use the session to obtain
data that's later shown to the user, there should be no way for the user
to obtain the entire $_SESSION contents.
So, for instance, I can hide a temporary secret there.
Is that a good assumption?
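To make it concrete, this is the kind of thing I mean; just a sketch with
made-up variable names, not code from any real extension:

// Keep the secret itself only in the server-side session
session_start();
if ( !isset( $_SESSION['tempSecret'] ) ) {
    $_SESSION['tempSecret'] = sha1( mt_rand() . microtime( true ) );
}

// Only a value derived from the secret is ever sent to the client
$publicToken = sha1( $_SESSION['tempSecret'] . ':edit-form' );

// On a later request, recompute and compare rather than revealing the secret
$submitted = isset( $_POST['token'] ) ? $_POST['token'] : '';
$valid = ( $submitted === sha1( $_SESSION['tempSecret'] . ':edit-form' ) );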
--
Neil Kandalgaonkar ( ) <neilk(a)wikimedia.org>
Hi,
Thanks for the quick answers, and for the useful link.
My previous e-mail was not detailed enough; sorry about that. Let me
clarify:
- I don't need to crawl the entire Wikipedia, only (for example) articles in
a category. ~1,000 articles would be a good start, and I definitely won't be
going above ~40,000 articles.
- For every article in the data set, I need to follow every interlanguage
link and get the article creation date (i.e. the creation date of [[en:Brad
Pitt]], [[fr:Brad Pitt]], [[it:Brad Pitt]], etc.). As far as I can tell,
this means I need one query for every language link (see the example
queries below).
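Concretely, I'm doing something along these lines; this is only a sketch,
and the exact parameters may differ from what my script actually sends:

// 1) List the interlanguage links for one article
$url = 'http://en.wikipedia.org/w/api.php?action=query&format=json'
    . '&prop=langlinks&lllimit=500&titles=' . urlencode( 'Brad Pitt' );
$langlinks = json_decode( file_get_contents( $url ), true );

// 2) For each language, fetch the timestamp of the oldest revision;
//    rvdir=newer with rvlimit=1 gives the article creation date
$url = 'http://fr.wikipedia.org/w/api.php?action=query&format=json'
    . '&prop=revisions&rvprop=timestamp&rvlimit=1&rvdir=newer'
    . '&titles=' . urlencode( 'Brad Pitt' );
$created = json_decode( file_get_contents( $url ), true );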
The data are reasonably easy to get through the API. If my queries risk
overloading the server, I am obviously happy to go through the toolserver
(once my account gets approved!).
Robin Ryder
----
Postdoctoral researcher
CEREMADE - Paris Dauphine and CREST - INSEE
> On 24.09.2010, 14:32 Robin wrote:
>
>> I would like to collect data on interlanguage links for academic research
>> purposes. I really do not want to use the dumps, since I would need to
>> download dumps of all language Wikipedias, which would be huge.
>> I have written a script which goes through the API, but I am wondering how
>> often it is acceptable for me to query the API. Assuming I do not run
>> parallel queries, do I need to wait between each query? If so, how long?
>
> Crawling all the Wikipedias is not an easy task either. Probably,
> toolserver.org would be more suitable. What data do you need, exactly?
>
> --
> Best regards,
> Max Semenik ([[User:MaxSem]])
>
Hi,
I would like to collect data on interlanguage links for academic research
purposes. I really do not want to use the dumps, since I would need to
download dumps of all language Wikipedias, which would be huge.
I have written a script which goes through the API, but I am wondering how
often it is acceptable for me to query the API. Assuming I do not run
parallel queries, do I need to wait between each query? If so, how long?
Thanks in advance for your answers,
Robin Ryder
----
Postdoctoral researcher
CEREMADE - Paris Dauphine and CREST - INSEE
On Fri, Sep 24, 2010 at 3:44 AM, Marcin Cieslak <saper(a)saper.info> wrote:
>>> John Vandenberg <jayvdb(a)gmail.com> wrote:
>>>
>>> http://download.wikimedia.org/dewiki/
>>>
>>> Is there any problem with using them?
>>
>> I think they are from June 2008.
>
> Are they?
>
> http://download.wikimedia.org/dewiki/20100903/
>
These are the database dumps. In order to get any HTML out of them, you
need to set up either MediaWiki or a replacement parser, not to mention
the delicate things the enWP folks did with template magic, which requires
setting up ParserFunctions - and these might even depend on whatever
version is currently running live.
That's why static dumps (or ?action=render output) are the thing you
need when you want to create offline versions or things like
Mobipocket Wikipedia (which is my actual goal with the static dump).
Marco
--
VMSoft GbR
Nabburger Str. 15
81737 München
Geschäftsführer: Marco Schuster, Volker Hemmert
http://vmsoft-gbr.de
On 9/24/10, Marcin Cieslak <saper(a)saper.info> wrote:
> There are static dumps available here:
>
> http://download.wikimedia.org/dewiki/
>
> Is there any problem with using them?
I think they are from June 2008.
A fresh static dump would be good.
--
John Vandenberg
Hi all,
I have made a list of all the 1.9M articles in NS0 (including
redirects / short pages) using the Toolserver; now that I have the list,
I'm going to download every single one of them (after a trial period
tonight to see how this works out, I'd like to begin downloading the
whole thing in 3 or 4 days, if no one objects) and then publish a
static dump of it. Data collection will happen on the Toolserver
(/mnt/user-store/dewiki-static/articles/); the request rate will be 1
article per second, and I'll copy the new files to my home PC once or
twice a day, so there should be no problem with Toolserver or Wikimedia
server load.
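The fetch loop itself is nothing fancy; roughly something like the sketch
below, where the list file name and output layout are made up for
illustration:

// Fetch rendered HTML for each article at roughly one request per second
$articles = file( 'articles.list', FILE_IGNORE_NEW_LINES );
foreach ( $articles as $title ) {
    $url = 'http://de.wikipedia.org/w/index.php?action=render&title='
        . urlencode( $title );
    $html = file_get_contents( $url );
    if ( $html !== false ) {
        file_put_contents(
            '/mnt/user-store/dewiki-static/articles/' . md5( $title ) . '.html',
            $html
        );
    }
    sleep( 1 ); // stay at one article per second
}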
When this is finished in ~21-22 days, I'm going to compress them and
upload them to my private server (well, if Wikimedia has an archive
server, that'd be better) as a tgz file so others can play with it.
Furthermore, though I have no idea if I'll succeed, I plan on hacking
a static Vector skin file which will load the articles using jQuery's
excellent .load() feature, so that everyone with JS can enjoy a truly
offline Wikipedia.
Marco
PS: When trying to invoke /w/index.php?action=render with an invalid
oldid, the server returns HTTP/1.1 200 OK and an error message, but
shouldn't this be a 404 or 500?
--
VMSoft GbR
Nabburger Str. 15
81737 München
Geschäftsführer: Marco Schuster, Volker Hemmert
http://vmsoft-gbr.de
On Mon, Sep 20, 2010 at 4:01 PM, MZMcBride <z(a)mzmcbride.com> wrote:
> Quite a few people are under the impression that MediaWiki 1.17 will be
> released in October or November of this year.
I don't think there have been many public references to this, but that is
more or less the timeframe many of us have discussed. There is at
least one public reference here:
http://www.mediawiki.org/wiki/Meetings/Release/2010-07-14
Several of us have a desire to get back on a more regular release
cadence with MediaWiki releases generally, and MediaWiki 1.17 seemed
like a good place to start.
The goal, as I recall, was branching October 15 or thereabouts, with
first beta in November, and a release sometime after that (perhaps as
late as January, depending on how well we do with the first beta). We
really haven't had an organized discussion of the topic since that one
meeting above, but maybe this email thread can be that conversation.
Nothing about the schedule is carved in stone, so now is as good a
time as any to bring up any objections to that timeline.
Rob
Hi!
I am developing my own API extension. It enumerates revisions, but in a
different way (not like the ApiQueryRevisions class). Also, it can
optionally create XML dumps via WikiExporter, like the API's
action=query&export&exportnowrap (so I sometimes need to switch the output
printer to raw mode). Because it enumerates lists (though not titles), I
chose "list in non-generator mode" (derived from ApiQueryBase). However,
after initial development I figured out that I cannot change the default
printer in that case, because it is never called at all. The reason is
explained in the following message by Roan Kattouw:
https://bugzilla.wikimedia.org/show_bug.cgi?id=25232
> Query submodules can be called in conjunction with other query
> submodules. In this case, if your module would switch to a custom
> printer, the others would quite likely freak out.
But I don't need to query in conjunction. OK, so I derived from the
ApiBase class instead (my own API action). Then it starts to fail on a
large number of generally useful methods, like the SQL query building
methods (addJoinConds, addWhereRange) and the list continuation method
(setContinueEnumParameter), because these are not defined in ApiBase.
I understand that not every ApiBase-derived class needs these, but many
could use them. Why not make the inheritance chain like this:
ApiBase -> ApiListBase -> ApiQueryBase
where ApiListBase would have at least these nice SQL query building
methods, but also the possibility to override the default printer? Why
are these methods limited to action=query lists and generators only?
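Something like the skeleton below is what I have in mind; ApiListBase does
not exist today, and the method bodies are omitted, so this is purely an
illustration of the proposal:

// Hypothetical intermediate class -- ApiListBase does not exist in MediaWiki
abstract class ApiListBase extends ApiBase {

    // SQL query building helpers that currently live only in ApiQueryBase
    protected function addJoinConds( $join_conds ) { /* ... */ }
    protected function addWhereRange( $field, $dir, $start, $end ) { /* ... */ }

    // List continuation support, also currently ApiQueryBase-only
    protected function setContinueEnumParameter( $paramName, $paramValue ) { /* ... */ }
}

// ApiQueryBase would then extend ApiListBase rather than ApiBase, and a
// standalone action module (which is free to pick its own printer) could
// extend ApiListBase directly.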
Dmitriy