[Xmldatadumps-admin-l] Errors in importing frWiki XML

Bilal Abdul Kader bilalak at gmail.com
Fri Oct 2 06:29:35 UTC 2009


Hi Felipe,
Thanks for the response and the help. I have downloaded the FR SQL dump from
RedIRIS. I think it does not have the full text for the revisions. Still, it
would be helpful to get all dumps as SQL.

For the categories issue, I have imported the categorylinks table but still
the main categories on the home page are not showing. I am not sure why.

bilal


On Thu, Oct 1, 2009 at 11:02 AM, Felipe Ortega <glimmer_phoenix at yahoo.es> wrote:

>
>
> --- On Wed, 30/9/09, Bilal Abdul Kader <bilalak at gmail.com> wrote:
>
> > From: Bilal Abdul Kader <bilalak at gmail.com>
> > Subject: Re: [Xmldatadumps-admin-l] Errors in importing frWiki XML
> > To: xmldatadumps-admin-l at lists.wikimedia.org
> > Date: Wednesday, 30 September 2009 10:16
> >
> > Hi Felipe,
> > Thanks for the response and the suggestion. In fact, I have
> > used a modified version of the python import script (the
> > modifications fix the empty contributor bug) but all the rev
>
> Hmm, I thought I had already uploaded a revised version of the code,
> correcting that bug with the XML dumps. Thanks, I'll double-check that.
>
> > len are empty. It seems the script did not calculate this
> > field.
> >
>
> Strange, since dump_sax_research.py has always calculated rev_len. I
> did so in August for the dumps we uploaded to RedIRIS.
>
> BTW, you can just upload the mysqldumps we created, with all relevant info
> already parsed:
>
> ftp://ftp.rediris.es/mirror/WKP_research
>
> >
> > If the number is just the rev length, I can get it by
> > running an SQL query.
> >
> > I still have another major problem with the categories. I
> > am not sure why I am not able to get them, although I did a
> > full XML import and also imported
> > frwiki-latest-categorylinks.sql.gz.
> >
> >
> > Any suggestion on this end please?
> >
>
> Sorry, Bilal, I'm not sure what you mean here. If you import the
> categorylinks table, it should contain all the information about which
> categories a given page belongs to.
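
To illustrate what the categorylinks table encodes, here is a minimal
sketch. It assumes the rows have already been fetched as (cl_from, cl_to)
pairs, where cl_from is the page ID and cl_to is the category title. The
function name and the sample rows are made up for this example.

```python
from collections import defaultdict


def categories_by_page(rows):
    """Group (cl_from, cl_to) pairs into {page_id: [category, ...]}."""
    result = defaultdict(list)
    for page_id, category in rows:
        result[page_id].append(category)
    return dict(result)


# Hypothetical rows, not taken from the real frwiki dump.
rows = [(1, "Accueil"), (2, "Physique"), (2, "Science")]
print(categories_by_page(rows))
```

A page ID that never shows up as cl_from has no category memberships
recorded in the table.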
>
> Best,
> F --
>
> > bilal
> >
> >
> > On Wed, Sep 30, 2009 at 4:08 PM, Felipe Ortega <glimmer_phoenix at yahoo.es> wrote:
> >
> > --- On Tue, 29/9/09, Bilal Abdul Kader <bilalak at gmail.com> wrote:
> >
> > > From: Bilal Abdul Kader <bilalak at gmail.com>
> > > Subject: [Xmldatadumps-admin-l] Errors in importing frWiki XML
> > > To: xmldatadumps-admin-l at lists.wikimedia.org
> > > Date: Tuesday, 29 September 2009 11:53
> >
> > > Greetings,
> > > I imported the French wiki. I was able to get it all with
> > > full history, but the number of bytes between versions is
> > > empty. It seems it was not imported, for some strange reason.
> > >
> >
> > Dear Bilal,
> >
> > I'm not sure which field in the revision table you are
> > referring to. If it is 'rev_len', it stores the length of
> > the revision in bytes, not the diff between a revision of a
> > page and the previous one:
> >
> > http://www.mediawiki.org/wiki/Manual:Revision_table
> >
> > Likewise, AFAIK that number is not provided in the full
> > dump. For instance, the WikiXRay Python parser has to compute
> > it from the text content of each revision to fill in the
> > value of the rev_len field for that row.
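
As a rough illustration of the computation described above (this is not
the actual WikiXRay code), rev_len can be derived from a revision's text
by counting its bytes. The sketch assumes the text is encoded as UTF-8;
the function name compute_rev_len is made up for this example.

```python
def compute_rev_len(text: str) -> int:
    """Return the length of a revision's text in bytes.

    Assumption: the text is stored as UTF-8, so non-ASCII
    characters count for more than one byte.
    """
    return len(text.encode("utf-8"))


# An accented character such as "é" takes two bytes in UTF-8, so the
# byte length and the character count differ for French text.
sample = "Bonjour, Wikipédia"
print(len(sample), compute_rev_len(sample))
```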
> >
> > If you also have the rev_parent_id field (the WikiXRay
> > Python parser computes this as well), it shouldn't be
> > difficult to compute the diff between any given pair of
> > revisions.
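
A minimal sketch of that computation, assuming each revision row is
available as (rev_id, rev_parent_id, rev_len) and that a rev_parent_id of
0 or NULL marks the first revision of a page. Here "diff" is taken to
mean the change in byte length between a revision and its parent; the
function name and the sample rows are invented for illustration.

```python
def size_deltas(revisions):
    """Map each rev_id to its byte change relative to its parent.

    `revisions` is an iterable of (rev_id, rev_parent_id, rev_len)
    tuples. A first revision (parent 0 or None, an assumption here)
    is measured against an empty page. Revisions whose parent is
    missing from the input are skipped.
    """
    rows = list(revisions)
    lengths = {rev_id: rev_len for rev_id, _, rev_len in rows}
    deltas = {}
    for rev_id, parent_id, rev_len in rows:
        if not parent_id:
            deltas[rev_id] = rev_len
        elif parent_id in lengths:
            deltas[rev_id] = rev_len - lengths[parent_id]
    return deltas


# Hypothetical rows, not taken from any real dump.
rows = [(10, 0, 120), (11, 10, 150), (12, 11, 90)]
print(size_deltas(rows))
```

A negative value means the revision made the page shorter than its
parent.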
> >
> > Regards,
> > F --
> >
> > > Is there any script to do that in the maintenance folder?
> > > Is the diff algorithm a simple character count, or is there
> > > more behind the generation of this number?
> > >
> > > The categories are not imported on the main page of the
> > > wiki. How can I solve that?
> > >
> > > bilal
> > >
> > > --
> > > Verily, with hardship comes ease.
> >
> > --
> > Verily, with hardship comes ease.


-- 
Verily, with hardship comes ease.