[Xmldatadumps-admin-l] Errors in importing frWiki XML

Bilal Abdul Kader bilalak at gmail.com
Wed Sep 30 20:16:00 UTC 2009


Hi Felipe,
Thanks for the response and the suggestion. In fact, I used a modified
version of the Python import script (the modifications fix the empty
contributor bug), but all the rev_len values are empty. It seems the script
did not calculate this field.
If the number is just the revision length, I can get it by running an SQL query.
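
For reference, here is a minimal sketch of the same calculation in Python
rather than SQL. It assumes the revision text is stored as UTF-8 and that
rev_len is simply its byte count, as Felipe describes. The dict layout and
the function names are hypothetical, not taken from the import script or
from WikiXRay:

```python
def byte_length(text):
    """Length of a revision's text in bytes, assuming UTF-8 storage."""
    return len(text.encode("utf-8"))


def fill_lengths(revisions):
    """Add a 'rev_len' entry to each revision dict (hypothetical layout)."""
    for rev in revisions:
        rev["rev_len"] = byte_length(rev["text"])
    return revisions


def size_deltas(revisions):
    """Map rev_id -> change in bytes relative to the parent revision.

    A revision whose parent is missing (rev_parent_id of None or 0) is
    compared against an empty page, so its delta equals its own length.
    """
    lengths = {rev["rev_id"]: rev["rev_len"] for rev in revisions}
    return {
        rev["rev_id"]: rev["rev_len"] - lengths.get(rev["rev_parent_id"], 0)
        for rev in revisions
    }


# Two made-up revisions of one page; the second adds accented text,
# so its byte length differs from its character count.
sample = [
    {"rev_id": 1, "rev_parent_id": None, "text": "Bonjour"},
    {"rev_id": 2, "rev_parent_id": 1, "text": "Bonjour à tous"},
]
fill_lengths(sample)
deltas = size_deltas(sample)
```

Counting bytes rather than characters matters for frwiki, since accented
letters take more than one byte in UTF-8.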

I still have another major problem with the categories. I am not sure why I
am not able to get them, although I did a full XML import and also imported
frwiki-latest-categorylinks.sql.gz.
Any suggestions on this?

bilal



On Wed, Sep 30, 2009 at 4:08 PM, Felipe Ortega <glimmer_phoenix at yahoo.es> wrote:

> --- On Tue, 29/9/09, Bilal Abdul Kader <bilalak at gmail.com> wrote:
>
> > From: Bilal Abdul Kader <bilalak at gmail.com>
> > Subject: [Xmldatadumps-admin-l] Errors in importing frWiki XML
> > To: xmldatadumps-admin-l at lists.wikimedia.org
> > Date: Tuesday, 29 September 2009 11:53
> > Greetings,
> > I imported the French wiki. I was able to get it all with
> > full history, but the number of bytes between versions is
> > empty. It seems it was not imported for some reason.
> >
>
> Dear Bilal,
>
> I'm not sure which field in the revision table you are referring to. If it
> is 'rev_len', it stores the length of the revision, in bytes, not the diff
> between the revision of a page and the previous one:
>
> http://www.mediawiki.org/wiki/Manual:Revision_table
>
> Also, AFAIK that number is not provided in the full dump. For instance, the
> WikiXRay Python parser has to compute it from the text content of each
> revision to fill in the value of the rev_len field for that row.
>
> If you also have the rev_parent_id field (the WikiXRay Python parser
> computes this as well), it shouldn't be difficult to compute the diff
> between any given pair of revisions.
>
> Regards,
> F --
>
> > Is there any script to do that in the maintenance folder?
> > Is the diff algorithm a simple character count, or is there
> > more behind the generation of this number?
> >
> >
> > The categories are not imported on the main page of the
> > wiki. How can I solve that?
> >
> > bilal
> >
> >
> > --
> > Verily, with hardship comes ease.
> >
>

-- 
Verily, with hardship comes ease.