[Labs-l] Processing dumps with Wikimedia Utilities

Emilio J. Rodríguez-Posada emijrp at gmail.com
Sun May 18 15:15:14 UTC 2014


Hi again;

I have created the virtualenv for Python3, installed mediawiki-utilities,
etc. I can launch my script in that virtualenv and it works fine, but when I
do 'jsub', the destination machine obviously doesn't have that module:

from mw.xml_dump import Iterator
ImportError: No module named mw.xml_dump

How can I launch a jsub using my virtualenv?

Thanks
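
A possible first step in diagnosing this is to submit a tiny script that
reports which interpreter the job really runs under and whether the module
can be imported there. The sketch below is only an illustration: the
function name describe_environment and the file name check_env.py are
hypothetical, and it assumes nothing about jsub beyond the fact that it runs
the script on another host.

```python
import importlib
import sys


def describe_environment(module_name="mw.xml_dump"):
    """Report the running interpreter and whether module_name imports."""
    try:
        importlib.import_module(module_name)
        module_found = True
    except ImportError:
        module_found = False

    # Older virtualenv releases set sys.real_prefix; newer ones (and venv)
    # make sys.base_prefix differ from sys.prefix instead.
    in_virtualenv = hasattr(sys, "real_prefix") or (
        sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    )

    return {
        "executable": sys.executable,
        "version": tuple(sys.version_info[:3]),
        "prefix": sys.prefix,
        "in_virtualenv": in_virtualenv,
        "module_found": module_found,
    }


if __name__ == "__main__":
    # Print one "key: value" line per field so the job's output file is
    # easy to compare with a run on the login host.
    for key, value in sorted(describe_environment().items()):
        print("{0}: {1}".format(key, value))
```

If the job's output shows a different executable than the one inside the
virtualenv, the job is probably being started with the system interpreter.
A virtualenv's own bin/python normally uses that environment's site-packages
without activation, so starting the script with that interpreter's absolute
path may be enough.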



2014-05-16 18:19 GMT+02:00 Emilio J. Rodríguez-Posada <emijrp at gmail.com>:

> 2014-05-13 15:06 GMT+02:00 Aaron Halfaker <aaron.halfaker at gmail.com>:
>
>> Emilio,
>>
>> I'm very interested in making your XML dump processing work easier.  If
>> you file any bugs against the old[1] or new[2] libraries, I'll be quick to
>> turn around on them.
>>
>> 1. https://bitbucket.org/halfak/wikimedia-utilities
>> 2. https://github.com/halfak/mediawiki-utilities
>>
>
> Thanks Aaron, I'm going to use the new version. I hope I can help your
> project by reporting bugs, sending patches, contributing scripts for the
> example directory, or in any other way. I like processing XML dumps and
> your library is very useful. Fav'ed on GitHub.
>
>
>>
>> -Aaron
>>
>>
>> On Mon, May 12, 2014 at 10:30 AM, Morten Wang <nettrom at gmail.com> wrote:
>>
>>> Hi Emilio,
>>>
>>> You're probably aware of it, but one way to handle your own installs is
>>> to use virtual environments: https://virtualenv.pypa.io/en/latest/
>>>
>>> BTW, the Python utilities you pointed to are now deprecated in favour of
>>> a newer version, but the newer version is Python 3.x only:
>>> http://pythonhosted.org/mediawiki-utilities/
>>>
>>> I have the older version of his utilities installed in my virtual
>>> environment. When I processed the English dump about a month ago I used
>>> tools-dev for testing and then submitted jobs to the job servers when it
>>> was ready, running over the smaller split files of the dump for
>>> parallelisation and less memory usage.
>>>
>>> From what I've heard the newer library is considerably faster than the
>>> 2.x version, but I haven't yet had a project where I could test that.
>>>
>>>
> Thanks Morten for the virtualenv tip. I'm using it now.
>
>
>>> Regards,
>>> Morten
>>>
>>>
>>>
>>> On 11 May 2014 13:10, Emilio J. Rodríguez-Posada <emijrp at gmail.com> wrote:
>>>
>>>> Hi;
>>>>
>>>> I would like to process some Wikipedia dumps. Is tools-dev the right
>>>> place for this? I don't see Wikimedia Utilities[1] available there.
>>>>
>>>> Do I have to install it myself, or is this a task for an admin?
>>>>
>>>> Regards
>>>>
>>>> [1] https://bitbucket.org/halfak/wikimedia-utilities/wiki/Home
>>>>
>>>> _______________________________________________
>>>> Labs-l mailing list
>>>> Labs-l at lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>
>>>>
>