[Xmldatadumps-admin-l] Errors in importing frWiki XML
Felipe Ortega
glimmer_phoenix at yahoo.es
Thu Oct 1 15:02:44 UTC 2009
--- On Wed, 30/9/09, Bilal Abdul Kader <bilalak at gmail.com> wrote:
> From: Bilal Abdul Kader <bilalak at gmail.com>
> Subject: Re: [Xmldatadumps-admin-l] Errors in importing frWiki XML
> To: xmldatadumps-admin-l at lists.wikimedia.org
> Date: Wednesday, 30 September, 2009 10:16
>
> Hi Felipe,
> Thanks for the response and the suggestion. In fact, I have
> used a modified version of the python import script (the
> modifications fix the empty contributor bug) but all the rev
Hmm, I thought I had already uploaded a revised version of the code
that fixes that XML dump bug. Thanks, I'll double-check that.
> len are empty. It seems the script did not calculate this
> field.
>
Strange, since dump_sax_research.py has always calculated rev_len. I did so in August for the dumps we uploaded to RedIRIS.
BTW, you can just load the MySQL dumps we created, with all the relevant info already parsed:
ftp://ftp.rediris.es/mirror/WKP_research
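
In case it helps, here is a minimal sketch of the calculation, not the actual dump_sax_research.py code. It assumes rev_len is the UTF-8 byte length of the revision text and that each revision carries rev_id and rev_parent_id; the function names and the list-of-dicts layout are made up for illustration. Note that it gives the change in size between a revision and its parent, not a textual diff.

```python
def rev_len(text):
    """Length of a revision's text in bytes (UTF-8 assumed)."""
    return len(text.encode("utf-8"))


def size_deltas(revisions):
    """Map rev_id -> change in bytes relative to the parent revision.

    `revisions` is a hypothetical list of dicts with the keys
    rev_id, rev_parent_id (0 or None for a first revision) and text.
    """
    lengths = {r["rev_id"]: rev_len(r["text"]) for r in revisions}
    deltas = {}
    for r in revisions:
        parent = r["rev_parent_id"]
        # A first revision has no parent, so its delta is its full length.
        parent_len = lengths.get(parent, 0) if parent else 0
        deltas[r["rev_id"]] = lengths[r["rev_id"]] - parent_len
    return deltas


# Made-up sample data, only to show the byte count vs. character count.
revisions = [
    {"rev_id": 1, "rev_parent_id": 0, "text": "Bonjour"},
    {"rev_id": 2, "rev_parent_id": 1, "text": "Bonjour à tous"},
]
print(size_deltas(revisions))
```

For the two sample revisions it prints {1: 7, 2: 8}: the second text is 14 characters but 15 bytes, because 'à' takes two bytes in UTF-8.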
>
> If the number is just the rev length, I can get it by
> running an SQL query.
>
> I still have another major problem with the categories. I
> am not sure why I am not able to get them, although I did a
> full XML import and also imported
> frwiki-latest-categorylinks.sql.gz.
>
>
> Any suggestion on this end please?
>
Sorry, Bilal, I'm not sure what you mean here. If you import the categorylinks table, it should contain all the information about which categories a given page belongs to.
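
As a quick check that the import worked, a lookup along these lines should return the categories of a page. This is only a sketch: an in-memory SQLite database with made-up sample rows stands in for the MySQL database, and categories_of is a hypothetical helper. It relies on cl_from holding the page_id of the member page and cl_to holding the category title.

```python
import sqlite3

# In-memory SQLite stands in for the real MySQL database; only the
# columns needed for the lookup are created, with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_title TEXT);
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
    INSERT INTO page VALUES (1, 0, 'Paris');
    INSERT INTO categorylinks VALUES (1, 'Capitale'), (1, 'Ville_de_France');
""")


def categories_of(conn, title, namespace=0):
    """Return the sorted category titles that a page belongs to."""
    rows = conn.execute(
        "SELECT cl.cl_to FROM categorylinks AS cl "
        "JOIN page AS p ON cl.cl_from = p.page_id "
        "WHERE p.page_namespace = ? AND p.page_title = ? "
        "ORDER BY cl.cl_to",
        (namespace, title),
    )
    return [row[0] for row in rows.fetchall()]


print(categories_of(conn, "Paris"))
```

If the same join comes back empty on your database for a page that is categorized on the live wiki, the categorylinks rows probably did not load or the page_id values do not match those in the dump.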
Best,
F --
> bilal
>
>
> On Wed, Sep 30, 2009 at 4:08 PM, Felipe Ortega <glimmer_phoenix at yahoo.es> wrote:
>
> --- On Tue, 29/9/09, Bilal Abdul Kader <bilalak at gmail.com> wrote:
>
> > From: Bilal Abdul Kader <bilalak at gmail.com>
> > Subject: [Xmldatadumps-admin-l] Errors in importing frWiki XML
> > To: xmldatadumps-admin-l at lists.wikimedia.org
> > Date: Tuesday, 29 September, 2009 11:53
> >
> > Greetings,
> > I imported the French wiki. I was able to get it all with
> > full history but the number of bytes between versions is
> > empty. It seems it was not imported for a weird reason.
> >
>
> Dear Bilal,
>
> I'm not sure which field in the revision table you are
> referring to. If it is 'rev_len', it stores the
> length of the revision, in bytes, not the diff between the
> revision of a page and the previous one:
>
> http://www.mediawiki.org/wiki/Manual:Revision_table
>
> Likewise, AFAIK that number is not provided in the full
> dump. For instance, the WikiXRay Python parser has to compute
> it from the text content of each revision to fill in the
> value in the rev_len field for that row.
>
> If you also have the rev_parent_id field (for instance, the
> WikiXRay Python parser also computes this), it shouldn't
> be difficult to compute the diff between any given pair of
> revisions.
>
> Regards,
> F --
>
> > Is there any script to do that in the maintenance folder?
> > Is the diff algorithm a simple character count, or is there
> > more behind the generation of this number?
> >
> > The categories are not imported on the main page of the
> > wiki. How to solve that?
> >
> > bilal
> >
> > --
> > Verily, with hardship comes ease.
>
> --
> Verily, with hardship comes ease.