[Labs-l] Backlinks counter for Wikipedia articles?

Navino Evans navino at histropedia.com
Tue Sep 9 15:05:13 UTC 2014


That's great, I'll get that set up from this end and post another message
with the details when it's ready.

Sincere thanks again :-)
On 9 Sep 2014 16:01, "John" <phoenixoverride at gmail.com> wrote:

> Plain text file would be best, one article title per row
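
A minimal sketch of reading such a list, assuming UTF-8 text with one article title per row. The function name `parse_titles` is hypothetical, and blank rows and repeated titles are dropped as an illustrative choice, not something agreed in the thread.

```python
def parse_titles(text):
    """Return the non-empty, de-duplicated titles from text, one per row."""
    seen = set()
    titles = []
    for line in text.splitlines():
        title = line.strip()  # tolerate stray whitespace around a title
        if title and title not in seen:
            seen.add(title)
            titles.append(title)
    return titles
```

The text itself could come from a file read or from a download of the list's static location.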
>
>
> On Tue, Sep 9, 2014 at 10:43 AM, Navino Evans <navino at histropedia.com>
> wrote:
>
>> Most definitely! That would be absolutely fantastic.
>>
>> What format of list would be most useful for you to work with?
>> On 9 Sep 2014 15:38, "John" <phoenixoverride at gmail.com> wrote:
>>
>>> It's not that big of a deal once I set the system up. Is it possible for
>>> you to post the list in a static location on your webserver? I could then
>>> just have the bot grab and use that list.
>>>
>>>
>>> On Tue, Sep 9, 2014 at 10:35 AM, Navino Evans <navino at histropedia.com>
>>> wrote:
>>>
>>>> That's great to know, thank you.
>>>>
>>>> We'll make sure we only use the API within that limit - basically just
>>>> for individual calls when a user adds a new event to our database.
>>>>
>>>> For the bulk processing, we would need to update the backlinks
>>>> information as a monthly maintenance task, so I wouldn't want to trouble
>>>> you with this each time.
>>>>
>>>> Would you rather we stick with data dump processing for the large scale
>>>> stuff?
>>>>
>>>>
>>>>
>>>> On 9 Sep 2014 15:05, "John" <phoenixoverride at gmail.com> wrote:
>>>>
>>>>> If you want a report on that many pages, drop me a list of those titles
>>>>> and I can write a report for you, given that volume of affected pages.
>>>>>
>>>>> I would say 1-2 seconds between queries should be reasonable for a
>>>>> moderate volume of queries. Any large-scale request I will do server side
>>>>> to avoid hammering the web servers for something that is better batched.
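
A minimal sketch of spacing client calls by the suggested 1-2 seconds. The names `throttled`, `call`, and `delay` are hypothetical; the 1.5 second default is just a value inside the range given above.

```python
import time


def throttled(items, call, delay=1.5, sleep=time.sleep):
    """Apply call to each item, pausing delay seconds between calls."""
    results = []
    for index, item in enumerate(items):
        if index:  # no pause is needed before the first call
            sleep(delay)
        results.append(call(item))
    return results
```

Passing `sleep` as a parameter keeps the loop easy to test without waiting in real time.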
>>>>>
>>>>>
>>>>> On Tue, Sep 9, 2014 at 9:58 AM, Navino Evans <navino at histropedia.com>
>>>>> wrote:
>>>>>
>>>>>> Once again, a huge thank you for taking the time to do this John -
>>>>>> That's exactly what I was looking for!  - the helpfulness of this community
>>>>>> never ceases to amaze me :)
>>>>>>
>>>>>> Hopefully I haven't initiated a journey down the rabbit hole into a
>>>>>> fully fledged multi-language counting machine ;)
>>>>>>
>>>>>>
>>>>>> Can I just ask what the limit of reasonable use would be for making
>>>>>> API calls to this new tool? (e.g. number of calls per day)
>>>>>>
>>>>>> It would be incredibly useful if we could use it to update the events
>>>>>> in our database once a month (we are using it to rank historical events by
>>>>>> 'importance'), but we already have approximately 1.5 million events, so I
>>>>>> am aware this may be way beyond what would be acceptable.
>>>>>>
>>>>>> On Tue, Sep 9, 2014 at 2:56 PM, John <phoenixoverride at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> That's doable, however it will require a little more time as I need
>>>>>>> to unearth some old code to handle multi-projects/languages
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 9, 2014 at 9:51 AM, Jan Ainali <jan.ainali at wikimedia.se>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Awesome John!
>>>>>>>>
>>>>>>>> Now I only wish that one could specify language code also ;)
>>>>>>>>
>>>>>>>>
>>>>>>>> *Kind regards, Jan Ainali*
>>>>>>>>
>>>>>>>> Executive Director, Wikimedia Sverige
>>>>>>>> <http://se.wikimedia.org/wiki/Huvudsida>
>>>>>>>> 0729 - 67 29 48
>>>>>>>>
>>>>>>>>
>>>>>>>> *Imagine a world in which every single person has free access to
>>>>>>>> the sum of all human knowledge. That is what we do.*
>>>>>>>> Become a member. <http://blimedlem.wikimedia.se>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-09-09 15:34 GMT+02:00 John <phoenixoverride at gmail.com>:
>>>>>>>>
>>>>>>>>> Per request, it's no frills, but it's what you asked for:
>>>>>>>>> http://tools.wmflabs.org/betacommand-dev/cgi-bin/backlinks
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Sep 9, 2014 at 8:32 AM, Navino Evans <
>>>>>>>>> navino at histropedia.com> wrote:
>>>>>>>>>
>>>>>>>>>> That is fantastic news... I'm incredibly grateful for the help
>>>>>>>>>> and advice.
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 9, 2014 at 1:27 PM, John <phoenixoverride at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Given the overhead of the API and that he only needs a count,
>>>>>>>>>>> getting that info should be fairly easy via a Python CGI wrapper around an
>>>>>>>>>>> SQL query.
>>>>>>>>>>>
>>>>>>>>>>> The only thing that I cannot do is #3, since the software does
>>>>>>>>>>> not differentiate between links in templates and links not in templates.
>>>>>>>>>>> That has been a requested feature for years now.
>>>>>>>>>>>
>>>>>>>>>>> Give me a few hours and I'll get you the tool you want. This
>>>>>>>>>>> should be less than 30 minutes' work.
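
A rough sketch of what a CGI-style wrapper around a count query could look like. This is not the code behind the tool in this thread. It assumes the classic MediaWiki `pagelinks` and `page` tables (`pl_from`, `pl_namespace`, `pl_title`, `page_id`, `page_namespace`), uses `sqlite3` as a stand-in for the real database connection, and borrows the `t` parameter from the example URL in the original request. It counts direct links from main-namespace pages only; links arriving through redirects would need a further join.

```python
import sqlite3
from urllib.parse import parse_qs

# Count links to a main-namespace title that come from main-namespace pages.
COUNT_SQL = """
SELECT COUNT(*)
FROM pagelinks
JOIN page ON page_id = pl_from
WHERE pl_namespace = 0
  AND pl_title = ?
  AND page_namespace = 0
"""


def count_links(conn, title):
    """Return the number of direct main-namespace links to title."""
    # Titles are stored with underscores instead of spaces.
    db_title = title.strip().replace(" ", "_")
    return conn.execute(COUNT_SQL, (db_title,)).fetchone()[0]


def cgi_response(query_string, conn):
    """Build a full CGI response (headers plus body) for a query string."""
    title = parse_qs(query_string).get("t", [""])[0]
    if not title:
        return ("Status: 400 Bad Request\r\n"
                "Content-Type: text/plain\r\n\r\n"
                "missing t parameter\n")
    return "Content-Type: text/plain\r\n\r\n%d\n" % count_links(conn, title)
```

Under a real CGI server, the script would pass `os.environ.get("QUERY_STRING", "")` to `cgi_response` and write the result to standard output.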
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Sep 9, 2014 at 7:55 AM, Jan Ainali <
>>>>>>>>>>> jan.ainali at wikimedia.se> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Related tip: In the API you can get a list of backlinks (but
>>>>>>>>>>>> you have to count them yourself) from the main namespace including all
>>>>>>>>>>>> redirects by a query like this:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> https://en.wikipedia.org/w/api.php?action=query&list=backlinks&format=json&bltitle=Example&blnamespace=0&blfilterredir=all&bllimit=250&blredirect=
>>>>>>>>>>>>
>>>>>>>>>>>> More info at: https://www.mediawiki.org/wiki/API:Backlinks
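
A minimal sketch of doing the counting yourself with the query above. It assumes the JSON response lists pages under `query.backlinks`, marks redirect pages with a `redirect` key, nests the pages that link through a redirect under `redirlinks`, and signals further results with a top-level `continue` object. The function names and the User-Agent string are hypothetical. A page that links both directly and through a redirect would be counted twice here.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_params(title, cont=None):
    """Return the query parameters from the URL above, plus any continuation."""
    params = {
        "action": "query",
        "list": "backlinks",
        "format": "json",
        "bltitle": title,
        "blnamespace": "0",
        "blfilterredir": "all",
        "bllimit": "250",
        "blredirect": "",
    }
    if cont:
        params.update(cont)  # e.g. the blcontinue value from the last batch
    return params


def fetch_json(params):
    """Perform one API request and decode the JSON body."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "backlink-count-sketch/0.1 (example)"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_backlinks(title, fetch=fetch_json):
    """Count linking pages, following continuation until the list ends."""
    total = 0
    cont = None
    while True:
        data = fetch(build_params(title, cont))
        for page in data.get("query", {}).get("backlinks", []):
            if "redirect" not in page:
                total += 1  # an ordinary page linking straight to the title
            total += len(page.get("redirlinks", []))  # links via a redirect
        cont = data.get("continue")
        if not cont:
            return total
```

Calling `count_backlinks("Example")` would issue live requests, so the pacing advice elsewhere in this thread applies.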
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-09-09 13:41 GMT+02:00 Navino Evans <navino at histropedia.com
>>>>>>>>>>>> >:
>>>>>>>>>>>>
>>>>>>>>>>>>> Wow! That would be awesome :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> The API we are looking for can be as simple as sending a GET
>>>>>>>>>>>>> request to a url (
>>>>>>>>>>>>> http://www.somewhere.com/api/count?t=wikipedia_title_goes_here),
>>>>>>>>>>>>>  returning a number in "text/plain" format.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The actual count that we're interested in is for English
>>>>>>>>>>>>> Wikipedia only, and would ideally include the following, all added up into
>>>>>>>>>>>>> a single number:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) All links from articles in Main Namespace only  (for our
>>>>>>>>>>>>> purpose it would be better to not include links from User pages, Talk pages
>>>>>>>>>>>>> etc if possible)
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) Including links from Redirect pages (e.g. counting a link
>>>>>>>>>>>>> from "Michel Jackson" redirect as part of the count from the article
>>>>>>>>>>>>> "Michael Jackson")
>>>>>>>>>>>>>
>>>>>>>>>>>>> 3) Excluding links that are within a template transcluded in
>>>>>>>>>>>>> an article (so we don't need to count the links inside Navboxes within an
>>>>>>>>>>>>> article for example)
>>>>>>>>>>>>>
>>>>>>>>>>>>> 4) For our purpose, it doesn't really matter whether
>>>>>>>>>>>>> transclusions of the actual page that is called are included in the count
>>>>>>>>>>>>> (we generally won't be using it for checking templates, timeline and list
>>>>>>>>>>>>> articles).
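
A small sketch of the client side of the interface described at the top of this message: build the GET URL and turn the plain-text body into a number. The helper names are hypothetical, and the base URL passed in stands for the placeholder address used above, not a real service.

```python
from urllib.parse import urlencode


def count_url(base_url, title):
    """Return the GET URL for a title, with the title safely encoded."""
    return base_url + "?" + urlencode({"t": title})


def parse_count(body):
    """Convert a text/plain response body such as '1234\\n' to an int."""
    return int(body.strip())
```

The body would typically be the decoded response of an HTTP GET on `count_url(...)`.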
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just to give the full picture for this request -  my use of
>>>>>>>>>>>>> this tool will be for a company (www.histropedia.com), so I
>>>>>>>>>>>>> wouldn't want to take up your time with this unless it's something you feel
>>>>>>>>>>>>> should be available for wider use. My plan was to get the developer working
>>>>>>>>>>>>> on our site to make this tool for the community if it didn't exist
>>>>>>>>>>>>> somewhere, but we would be reliant on datadumps so could not get live
>>>>>>>>>>>>> information (which would be incredibly useful for us, and I hope many
>>>>>>>>>>>>> others).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Sep 8, 2014 at 8:10 PM, John <
>>>>>>>>>>>>> phoenixoverride at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> What numbers/data do you want? I can whip up a replacement
>>>>>>>>>>>>>> for it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Monday, September 8, 2014, Navino Evans <
>>>>>>>>>>>>>> navino at histropedia.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does anyone know if there is a tool currently
>>>>>>>>>>>>>>> available for counting backlinks to Wikipedia articles via an API? I have
>>>>>>>>>>>>>>> been using this tool
>>>>>>>>>>>>>>> http://dispenser.homenet.org/~dispenser/cgi-bin/backlinkscount.py
>>>>>>>>>>>>>>> - but it seems to have finally gone offline completely following some
>>>>>>>>>>>>>>> recent controversy with user:Dispenser.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any advice much appreciated!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Navino
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Labs-l mailing list
>>>>>>>>>>>>>> Labs-l at lists.wikimedia.org
>>>>>>>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> ___________________________
>>>>>>>>>>>>>
>>>>>>>>>>>>> Histropedia
>>>>>>>>>>>>> The Timeline for all of History
>>>>>>>>>>>>> www.histropedia.com
>>>>>>>>>>>>>
>>>>>>>>>>>>> Follow us on:
>>>>>>>>>>>>> Twitter <https://twitter.com/Histropedia>
>>>>>>>>>>>>> Facebook <https://www.facebook.com/Histropedia>
>>>>>>>>>>>>> Google+ <https://plus.google.com/u/0/b/104484373317792180682/104484373317792180682/posts>
>>>>>>>>>>>>> LinkedIn <http://www.linkedin.com/company/histropedia-ltd>
>>>>>>>>>>>>>
>>>>>>>>>>>>>