[Xmldatadumps-admin-l] Errors in importing frWiki XML

Felipe Ortega glimmer_phoenix at yahoo.es
Thu Oct 1 15:02:44 UTC 2009



--- On Wed, 30/9/09, Bilal Abdul Kader <bilalak at gmail.com> wrote:

> From: Bilal Abdul Kader <bilalak at gmail.com>
> Subject: Re: [Xmldatadumps-admin-l] Errors in importing frWiki XML
> To: xmldatadumps-admin-l at lists.wikimedia.org
> Date: Wednesday, 30 September, 2009 10:16
> 
> Hi Felipe,
> Thanks for the response and the suggestion. In fact, I have
> used a modified version of the Python import script (the
> modifications fix the empty contributor bug), but all the rev

Mmmm, I thought I had already uploaded a revised version of the code
that handles that bug in the XML dumps. Thanks, I'll double-check that.

> len are empty. It seems the script did not calculate this
> field. 
> 

Strange, since dump_sax_research.py has always calculated rev_len. I did exactly that in August for the dumps we uploaded to RedIRIS.
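For reference, here is a minimal sketch of computing rev_len while streaming a dump. It is not the dump_sax_research.py code: the function names and the tiny sample dump are hypothetical, and it assumes the usual pages-meta-history layout, where a revision's own <id> comes before the <id> nested in <contributor>. Following the revision table documentation, rev_len is the size of the revision text in bytes, so the text is measured after encoding it as UTF-8:

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that real dumps put on every tag."""
    return tag.rsplit("}", 1)[-1]


def revision_lengths(stream):
    """Yield (rev_id, rev_len) for every <revision> in the dump.

    rev_len is taken as the size of the revision text in UTF-8 bytes.
    """
    in_revision = False
    rev_id = None
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        name = local_name(elem.tag)
        if event == "start":
            if name == "revision":
                in_revision, rev_id = True, None
            continue
        # From here on we only handle "end" events inside a <revision>.
        if not in_revision:
            continue
        if name == "id" and rev_id is None:
            # The revision's own <id> precedes the one nested in <contributor>.
            rev_id = int(elem.text)
        elif name == "text":
            yield rev_id, len((elem.text or "").encode("utf-8"))
        elif name == "revision":
            in_revision = False
            elem.clear()  # free the finished revision's text and children


# Hypothetical two-revision sample, without the export namespace.
sample = """<mediawiki>
  <page>
    <title>Exemple</title>
    <id>1</id>
    <revision>
      <id>10</id>
      <contributor><username>A</username><id>5</id></contributor>
      <text>Bonjour</text>
    </revision>
    <revision>
      <id>11</id>
      <contributor><ip>127.0.0.1</ip></contributor>
      <text>Été</text>
    </revision>
  </page>
</mediawiki>"""

print(list(revision_lengths(io.BytesIO(sample.encode("utf-8")))))  # [(10, 7), (11, 5)]
```

Counting bytes rather than characters matters for frWiki: "Été" is 3 characters but 5 bytes.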

BTW, you can also just load the mysqldump files we created, with all the relevant information already parsed:

ftp://ftp.rediris.es/mirror/WKP_research 

> 
> If the number is just the rev length, I can get it by
> running an SQL query. 
> 
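Filling rev_len with an SQL query after the import should also work. Below is a sketch with SQLite standing in for MySQL and the revision and text tables cut down (hypothetically) to the columns involved, with rev_text_id pointing to text.old_id as in the MediaWiki schema of this period. It assumes old_text holds plain, uncompressed wikitext; if old_flags marks rows as gzip-compressed or external, the stored byte count is not the revision length. In MySQL, LENGTH() already returns bytes, whereas CHAR_LENGTH() returns characters.

```python
import sqlite3

# Hypothetical, minimal stand-ins for the revision and text tables:
# only the columns needed for the backfill are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE text (old_id INTEGER PRIMARY KEY, old_text TEXT);
    CREATE TABLE revision (
        rev_id INTEGER PRIMARY KEY,
        rev_text_id INTEGER,
        rev_len INTEGER  -- left empty by the import
    );
    INSERT INTO text VALUES (1, 'Bonjour'), (2, 'Été');
    INSERT INTO revision (rev_id, rev_text_id) VALUES (10, 1), (11, 2);
""")

# Backfill rev_len with the size of the stored text in bytes.
# CAST(... AS BLOB) makes SQLite's length() count UTF-8 bytes, not characters.
conn.execute("""
    UPDATE revision
    SET rev_len = (
        SELECT length(CAST(old_text AS BLOB))
        FROM text
        WHERE text.old_id = revision.rev_text_id
    )
""")
conn.commit()

rows = conn.execute("SELECT rev_id, rev_len FROM revision ORDER BY rev_id").fetchall()
print(rows)  # [(10, 7), (11, 5)]
```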
> I still have another major problem with the categories. I
> am not sure why I cannot get them, although I did a full
> XML import and also imported
> frwiki-latest-categorylinks.sql.gz.
>
> Any suggestions on this, please?
> 

Sorry, Bilal, I'm not sure what you mean here. If you import the categorylinks table, it should contain all the information about which categories each page belongs to.
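To check what the imported table holds for a given page, join categorylinks with page: cl_from is the page_id of the categorized page, and cl_to is the category title, stored without the namespace prefix and with underscores instead of spaces. Here is a small sketch on SQLite with hypothetical sample rows (only the columns needed are created); the SELECT itself is plain SQL and should carry over to MySQL. It is also a quick way to confirm that the cl_from values match the page_id values created by your XML import.

```python
import sqlite3

# Hypothetical, trimmed-down page and categorylinks tables with sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
    INSERT INTO page VALUES (1, 0, 'Paris'), (2, 0, 'Lyon');
    INSERT INTO categorylinks VALUES
        (1, 'Capitale_en_Europe'),
        (1, 'Ville_de_France'),
        (2, 'Ville_de_France');
""")


def categories_of(conn, title, namespace=0):
    """Return the category titles (no namespace prefix) that a page belongs to."""
    rows = conn.execute(
        """
        SELECT cl.cl_to
        FROM page AS p
        JOIN categorylinks AS cl ON cl.cl_from = p.page_id
        WHERE p.page_namespace = ? AND p.page_title = ?
        ORDER BY cl.cl_to
        """,
        (namespace, title),
    ).fetchall()
    return [cl_to for (cl_to,) in rows]


print(categories_of(conn, "Paris"))  # ['Capitale_en_Europe', 'Ville_de_France']
```

An empty result for a page you know is categorized suggests the two tables do not line up on page ids.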

Best,
F --

> bilal
> 
> 
> On Wed, Sep 30, 2009 at 4:08 PM, Felipe Ortega <glimmer_phoenix at yahoo.es> wrote:
>
> --- On Tue, 29/9/09, Bilal Abdul Kader <bilalak at gmail.com> wrote:
>
> > From: Bilal Abdul Kader <bilalak at gmail.com>
> > Subject: [Xmldatadumps-admin-l] Errors in importing frWiki XML
> > To: xmldatadumps-admin-l at lists.wikimedia.org
> > Date: Tuesday, 29 September, 2009 11:53
> > Greetings,
> > I imported the French wiki. I was able to get it all with
> > full history, but the number of bytes between versions is
> > empty. It seems it was not imported, for some weird reason.
> >
>
> Dear Bilal,
>
> I'm not sure which field in the revision table you are
> referring to. If it is 'rev_len', it stores the
> length of the revision, in bytes, not the diff between a
> revision of a page and the previous one:
>
> http://www.mediawiki.org/wiki/Manual:Revision_table
>
> Also, AFAIK that number is not provided in the full
> dump. For instance, the WikiXRay Python parser has to compute
> it from the text content of each revision to fill in the
> value of the rev_len field for that row.
>
> If you also have the rev_parent_id field (the WikiXRay
> Python parser also computes it, for instance), it shouldn't
> be difficult to compute the diff between any given pair of
> revisions.
>
> Regards,
> F --
>
> > Is there any script to do that in the maintenance folder?
> > Is the diff algorithm a simple character count, or is there
> > more behind the generation of this number?
> >
> > The categories are not imported on the main page of the
> > wiki. How can I solve that?
> >
> > bilal
> >
> > --
> > Verily, with hardship comes ease.
>
> -- 
> Verily, with hardship comes ease.
>
> -----Inline attachment follows-----
>
> _______________________________________________
> Xmldatadumps-admin-l mailing list
> Xmldatadumps-admin-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-admin-l
> 
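On the rev_parent_id point in my earlier message above: once rev_len and rev_parent_id are filled in, the number of bytes added or removed by each edit is just the difference between a revision's rev_len and its parent's. A minimal sketch with hypothetical in-memory records (the Revision type and byte_delta helper are illustrative, not WikiXRay code). Note this is a size delta rather than a real diff: an edit that replaces text with text of equal length gives 0.

```python
from typing import Dict, NamedTuple, Optional


class Revision(NamedTuple):
    rev_id: int
    rev_parent_id: Optional[int]  # None or 0 for the first revision of a page
    rev_len: int                  # size of the revision text in bytes


def byte_delta(rev: Revision, by_id: Dict[int, Revision]) -> int:
    """Bytes added (+) or removed (-) relative to the parent revision.

    A revision with no parent (or whose parent is missing from by_id)
    is compared against an empty page, so its whole length counts as added.
    """
    parent = by_id.get(rev.rev_parent_id) if rev.rev_parent_id else None
    return rev.rev_len - (parent.rev_len if parent else 0)


# Hypothetical history of one page.
revisions = [
    Revision(10, None, 7),
    Revision(11, 10, 5),
    Revision(12, 11, 40),
]
by_id = {r.rev_id: r for r in revisions}
print([byte_delta(r, by_id) for r in revisions])  # [7, -2, 35]
```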


      


