Good day,
This is my first post to this list, and having read the last couple of
posts, I suspect my question might be a bit too low-level. Still, I hope
you can point me in the right direction.
I've been working on this extension that generates qrcode bitmaps and
displays them on a wiki page [0].
In that extension, I'm using the upload() method made available by the
LocalFile object, as documented on [1]. In my specific case, the
relevant code looks like this:
// Build a title in the File namespace for the destination file name
$ft = Title::makeTitleSafe( NS_FILE, $this->_dstFileName );
$localfile = wfLocalFile( $ft );
$saveName = $localfile->getName();
$pageText = 'QrCode [...]';
// Upload the temporary bitmap and delete the source file afterwards
$status = $localfile->upload( $tmpName, $this->_label, $pageText,
    File::DELETE_SOURCE, false, false, $this->_getBot() );
The extension is implemented as a parser function hooked into
ParserFirstCallInit.
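For reference, the registration follows the usual parser function
pattern; the function names below are placeholders for illustration,
not necessarily the ones the extension actually uses:

// Hook registration in the extension setup file (names are hypothetical)
$wgHooks['ParserFirstCallInit'][] = 'efQrCodeSetup';

function efQrCodeSetup( $parser ) {
    // Map {{#qrcode:...}} to the rendering callback
    $parser->setFunctionHook( 'qrcode', 'efRenderQrCode' );
    return true;
}

function efRenderQrCode( $parser, $param = '' ) {
    // ...generate the bitmap, upload it via LocalFile::upload(),
    // and return wikitext embedding the resulting file...
    return '[[File:QR-' . $param . '.png]]';
}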
Now, I haven't found any other explanation, so I suspect that this use of
the upload() method is what leads to some peculiar behaviour on my wiki
installation, namely:
1. QrCodes are generated for pages that do not have or transclude a
{{#qrcode:}} function call, in this case properties [2,3,4].
2. These uploaded files have properties [5] and belong to a category,
which means they get linked in the categorylinks table. A common result
of this is that qrcode images turn up in, e.g., semantic queries [9,10].
3. Qrcodes are even generated for existing qrcodes [6,7]. One way to
trigger that behaviour is to visit a File page and click on its Delete
link, without actually deleting the file. This leads to situations such
as [8].
4. The files get linked from several pages, as this example shows [11].
None of the pages said to link to the file actually include it, and the
set of linking pages varies (two days ago, 14 pages linked to it; today
only 7 do).
5. Browsing the properties of the above file [12], you can see that it
has somehow been mixed up with a completely different event.
6. Looking at the database, the mixup hypothesis is confirmed:
SELECT page_id, page_title, cl_sortkey
FROM `page`
INNER JOIN `categorylinks` FORCE INDEX (cl_sortkey) ON (cl_from = page_id)
LEFT JOIN `category` ON (cat_title = page_title AND page_namespace = 14)
WHERE (1 = 1) AND cl_to = 'Project'
ORDER BY cl_sortkey
gives (among other data):
page_id  page_title   cl_sortkey
1403     SMS2Space    File:QR-Ask.png
1244     Syn2Sat      File:QR-LetzHack.png
1251     ChillyChill  File:QR-Syn2cat-radio-ara.png.png
This behaviour occurs in both MediaWiki 1.15.5 and 1.16. I would be very
grateful if someone more experienced could have a look at this
situation. Maybe I'm using the upload() method in a way I should not.
sincerely,
David Raison
[0] http://www.mediawiki.org/wiki/Extension:QrCode
[1]
http://svn.wikimedia.org/doc/classLocalFile.html#4b626952ae0390a7fa453a4bfe…
[2] https://www.hackerspace.lu/wiki/File:QR-Is_U19.png
[3] https://www.hackerspace.lu/wiki/File:QR-Has_SingleIssuePrice.png
[4] https://www.hackerspace.lu/wiki/File:QR-Has_Issues.png
[5] https://www.hackerspace.lu/wiki/Property:Has_SingleIssuePrice
[6] https://www.hackerspace.lu/wiki/File:QR-QR-Location.png.png
[7]
https://www.hackerspace.lu/w/index.php?title=Special:RecentChanges&hidebots…
[8]
https://www.hackerspace.lu/wiki/File:QR-QR-QR-QR-Location.png.png.png.png
[9] https://www.hackerspace.lu/wiki/Projects#Concluded_Projects
[10] https://www.hackerspace.lu/wiki/Special:BrowseData#Q
[11] https://www.hackerspace.lu/wiki/File:QR-Syn2cat.png
[12] https://www.hackerspace.lu/wiki/Special:Browse/File:QR-2DSyn2cat.png
--
The Hackerspace in Luxembourg!
syn2cat a.s.b.l. - Promoting social and technical innovations
11, rue du cimetière | Pavillon "Am Hueflach"
L-8018 Strassen | Luxembourg
http://www.hackerspace.lu
----
mailto:david@hackerspace.lu
xmpp:kwisatz@jabber.hackerspaces.org
mobile: +43 650 73 63 834 | +352 691 44 23 24
++++++++++++++++++++++++++++++++++++++++++++
Wear your geek: http://syn2cat.spreadshirt.net
Ariel T. Glenn wrote:
> On Thursday, 23-09-2010, at 21:27 -0500, Q wrote:
>>> Given the fact that static dumps have been broken for *years* now,
>>> static dumps are on the bottom of WMFs priority list; I thought it
>>> would be the best if I just went ahead and built something that can be
>>> used (and, of course, improved).
>>>
>>> Marco
>>
>> That's what I just said. Work with them to fix it, i.e. volunteer;
>> i.e. you fix it.
>>
>
> Actually it's not so much that they are at the bottom of the list as
> that there are two people potentially looking at them, and they are
> Tomasz (who is also doing mobile) and me (and I am doing the XML dumps
> rather than the HTML ones, until those are reliable and I'm happy).
>
> However, if you are interested in working on these, I am *very* happy to
> help with suggestions, testing, feedback, etc., even while I am still
> working on the XML dumps. Do you have time and interest?
>
> Ariel
Most (all?) articles should already be parsed in memcached. I think the
bottleneck would be the compression.
Note however that the ParserOutput would still need postprocessing, as
would ?action=render output. The first thing that comes to mind is removing
the edit links (this use case alone seems enough to justify implementing
editsection stripping). Sadly, we can't (easily) add the edit sections
back after rendering.
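For what it's worth, a crude postprocessing step could simply strip the
rendered edit links. This is only a sketch, and it assumes the 1.16-era
markup where section edit links are wrapped in <span class="editsection">:

// Rough sketch: strip section edit links from already-rendered HTML.
// Assumes edit links are marked up as <span class="editsection">...</span>.
function stripEditSections( $html ) {
    return preg_replace( '!<span class="editsection">.*?</span>!s', '', $html );
}

$html = stripEditSections( $parserOutput->getText() );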
I have been making the assumption that in MediaWiki, the $_SESSION is
hidden from the user. While applications may use the session to obtain
data that's later shown to the user, there should be no way for the user
to obtain the entire $_SESSION contents.
So, for instance, I can hide a temporary secret there.
Is that a good assumption?
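To make it concrete, this is the kind of thing I mean; just a sketch with
made-up variable names, not code from any real extension:

// Keep the secret itself only in the server-side session
session_start();
if ( !isset( $_SESSION['tempSecret'] ) ) {
    $_SESSION['tempSecret'] = sha1( mt_rand() . microtime( true ) );
}

// Only a value derived from the secret is ever sent to the client
$publicToken = sha1( $_SESSION['tempSecret'] . ':edit-form' );

// On a later request, recompute and compare rather than revealing the secret
$submitted = isset( $_POST['token'] ) ? $_POST['token'] : '';
$valid = ( $submitted === sha1( $_SESSION['tempSecret'] . ':edit-form' ) );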
--
Neil Kandalgaonkar ( ) <neilk(a)wikimedia.org>
Hi,
Thanks for the quick answers, and for the useful link.
My previous e-mail was not detailed enough; sorry about that. Let me
clarify:
- I don't need to crawl the entire Wikipedia, only (for example) articles in
a category. ~1,000 articles would be a good start, and I definitely won't be
going above ~40,000 articles.
- For every article in the data set, I need to follow every interlanguage
link and get the article creation date (i.e. the creation date of [[en:Brad
Pitt]], [[fr:Brad Pitt]], [[it:Brad Pitt]], etc.). As far as I can tell,
this means I need one query for every language link (see the example
queries below).
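Concretely, I'm doing something along these lines; this is only a sketch,
and the exact parameters may differ from what my script actually sends:

// 1) List the interlanguage links for one article
$url = 'http://en.wikipedia.org/w/api.php?action=query&format=json'
    . '&prop=langlinks&lllimit=500&titles=' . urlencode( 'Brad Pitt' );
$langlinks = json_decode( file_get_contents( $url ), true );

// 2) For each language, fetch the timestamp of the oldest revision;
//    rvdir=newer with rvlimit=1 gives the article creation date
$url = 'http://fr.wikipedia.org/w/api.php?action=query&format=json'
    . '&prop=revisions&rvprop=timestamp&rvlimit=1&rvdir=newer'
    . '&titles=' . urlencode( 'Brad Pitt' );
$created = json_decode( file_get_contents( $url ), true );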
The data are reasonably easy to get through the API. If my queries risk
overloading the server, I am obviously happy to go through the toolserver
(once my account gets approved!).
Robin Ryder
----
Postdoctoral researcher
CEREMADE - Paris Dauphine and CREST - INSEE
> On 24.09.2010, 14:32 Robin wrote:
>
>> I would like to collect data on interlanguage links for academic research
>> purposes. I really do not want to use the dumps, since I would need to
>> download dumps of all language Wikipedias, which would be huge.
>> I have written a script which goes through the API, but I am wondering how
>> often it is acceptable for me to query the API. Assuming I do not run
>> parallel queries, do I need to wait between each query? If so, how long?
>
> Crawling all the Wikipedias is not an easy task either. Probably,
> toolserver.org would be more suitable. What data do you need, exactly?
>
> --
> Best regards,
> Max Semenik ([[User:MaxSem]])
>
Hi,
I would like to collect data on interlanguage links for academic research
purposes. I really do not want to use the dumps, since I would need to
download dumps of all language Wikipedias, which would be huge.
I have written a script which goes through the API, but I am wondering how
often it is acceptable for me to query the API. Assuming I do not run
parallel queries, do I need to wait between each query? If so, how long?
Thanks in advance for your answers,
Robin Ryder
----
Postdoctoral researcher
CEREMADE - Paris Dauphine and CREST - INSEE
On Fri, Sep 24, 2010 at 3:44 AM, Marcin Cieslak <saper(a)saper.info> wrote:
>>> John Vandenberg <jayvdb(a)gmail.com> wrote:
>>>
>>> http://download.wikimedia.org/dewiki/
>>>
>>> Is there any problem with using them?
>>
>> I think they are from June 2008.
>
> Are they?
>
> http://download.wikimedia.org/dewiki/20100903/
>
These are the database dumps. In order to get any HTML out of them, you
need to set up either MediaWiki or a replacement parser, not to mention
the delicate things the enWP folks did with template magic, which requires
setting up ParserFunctions - and these might even depend on whatever
version is currently running live.
That's why static dumps (or ?action=render output) are the thing you
need when you want to create offline versions or things like
Mobipocket Wikipedia (which is my actual goal with the static dump).
Marco
--
VMSoft GbR
Nabburger Str. 15
81737 München
Geschäftsführer: Marco Schuster, Volker Hemmert
http://vmsoft-gbr.de
On 9/24/10, Marcin Cieslak <saper(a)saper.info> wrote:
> There are static dumps available here:
>
> http://download.wikimedia.org/dewiki/
>
> Is there any problem with using them?
I think they are from June 2008.
A fresh static dump would be good.
--
John Vandenberg
Hi all,
I have made a list of all the 1.9M articles in NS0 (including
redirects / short pages) using the Toolserver; now that I have the list,
I'm going to download every single one of them (after a trial period
tonight to see how this works out, I'd like to begin downloading the
whole thing in 3 or 4 days, if no one objects) and then publish a
static dump of it. Data collection will happen on the Toolserver
(/mnt/user-store/dewiki-static/articles/); the request rate will be 1
article per second, and I'll copy the new files to my home PC once or
twice a day, so there should be no problem with Toolserver or Wikimedia
server load.
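The fetch loop itself is nothing fancy; roughly something like the sketch
below, where the list file name and output layout are made up for
illustration:

// Fetch rendered HTML for each article at roughly one request per second
$articles = file( 'articles.list', FILE_IGNORE_NEW_LINES );
foreach ( $articles as $title ) {
    $url = 'http://de.wikipedia.org/w/index.php?action=render&title='
        . urlencode( $title );
    $html = file_get_contents( $url );
    if ( $html !== false ) {
        file_put_contents(
            '/mnt/user-store/dewiki-static/articles/' . md5( $title ) . '.html',
            $html
        );
    }
    sleep( 1 ); // stay at one article per second
}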
When this is finished in ~21-22 days, I'm going to compress them and
upload them to my private server (well, if Wikimedia has an archive
server, that'd be better) as a tgz file so others can play with it.
Furthermore, though I have no idea if I'll succeed, I plan on hacking
a static Vector skin file which will load the articles using jQuery's
excellent .load() feature, so that everyone with JS can enjoy a truly
offline Wikipedia.
Marco
PS: When trying to invoke /w/index.php?action=render with an invalid
oldid, the server returns HTTP/1.1 200 OK and an error message, but
shouldn't this be a 404 or 500?
--
VMSoft GbR
Nabburger Str. 15
81737 München
Geschäftsführer: Marco Schuster, Volker Hemmert
http://vmsoft-gbr.de
On Mon, Sep 20, 2010 at 4:01 PM, MZMcBride <z(a)mzmcbride.com> wrote:
> Quite a few people are under the impression that MediaWiki 1.17 will be
> released in October or November of this year.
I don't think there have been many public references to this, but that is
more or less the timeframe many of us have discussed. There is at
least one public reference here:
http://www.mediawiki.org/wiki/Meetings/Release/2010-07-14
Several of us have a desire to get back on a more regular release
cadence with MediaWiki releases generally, and MediaWiki 1.17 seemed
like a good place to start.
The goal, as I recall, was branching October 15 or thereabouts, with
first beta in November, and a release sometime after that (perhaps as
late as January, depending on how well we do with the first beta). We
really haven't had an organized discussion of the topic since that one
meeting above, but maybe this email thread can be that conversation.
Nothing about the schedule is carved in stone, so now is as good a
time as any to bring up any objections to that timeline.
Rob
Hi!
I am developing my own API extension. It enumerates revisions, but in a
different way (not like the ApiQueryRevisions class). Also, it can
optionally create XML dumps via WikiExporter, like the API's
action=query&export&exportnowrap (so I sometimes need to switch the output
printer to raw mode). Because it enumerates lists (though not titles), I
chose "list in non-generator mode" (derived from ApiQueryBase). However,
after initial development I figured out that I cannot change the default
printer in that case, because it is never called at all. The reason is
explained in the following message by Roan Kattouw:
https://bugzilla.wikimedia.org/show_bug.cgi?id=25232
> Query submodules can be called in conjunction with other query
> submodules. In this case, if your module would switch to a custom
> printer, the others would quite likely freak out.
But I don't need to query in conjunction. OK, so I derived from the
ApiBase class instead (my own API action). Then it starts to fail on a
large number of generally useful methods, like the SQL query building
methods (addJoinConds, addWhereRange) and the list continuation method
(setContinueEnumParameter), because these are not defined in ApiBase.
I understand that not every ApiBase-derived class needs these, but many
could use them. Why not make the inheritance chain like this:
ApiBase -> ApiListBase -> ApiQueryBase
where ApiListBase would have at least these nice SQL query building
methods, but also the possibility to override the default printer? Why
are these methods limited to action=query lists and generators only?
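Something like the skeleton below is what I have in mind; ApiListBase does
not exist today, and the method bodies are omitted, so this is purely an
illustration of the proposal:

// Hypothetical intermediate class -- ApiListBase does not exist in MediaWiki
abstract class ApiListBase extends ApiBase {

    // SQL query building helpers that currently live only in ApiQueryBase
    protected function addJoinConds( $join_conds ) { /* ... */ }
    protected function addWhereRange( $field, $dir, $start, $end ) { /* ... */ }

    // List continuation support, also currently ApiQueryBase-only
    protected function setContinueEnumParameter( $paramName, $paramValue ) { /* ... */ }
}

// ApiQueryBase would then extend ApiListBase rather than ApiBase, and a
// standalone action module (which is free to pick its own printer) could
// extend ApiListBase directly.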
Dmitriy