[Xmldatadumps-admin-l] Errors in importing frWiki XML

Felipe Ortega glimmer_phoenix at yahoo.es
Wed Sep 30 20:08:13 UTC 2009


--- On Tue, 29/9/09, Bilal Abdul Kader <bilalak at gmail.com> wrote:

> From: Bilal Abdul Kader <bilalak at gmail.com>
> Subject: [Xmldatadumps-admin-l] Errors in importing frWiki XML
> To: xmldatadumps-admin-l at lists.wikimedia.org
> Date: Tuesday, 29 September 2009 11:53
> Greetings,
> I imported the french wiki. I was able to get it all with
> full history but the number of bytes between versions is
> empty. It seems it was not imported for a weird reason.
>

Dear Bilal,

I'm not sure which field in the revision table you are referring to. If it is 'rev_len', it stores the length of the revision in bytes, not the difference between a revision of a page and the previous one:

http://www.mediawiki.org/wiki/Manual:Revision_table 

Also, AFAIK that number is not provided in the full dump. For instance, the WikiXRay Python parser has to compute it from the text content of each revision to fill in the rev_len field for that row.
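
As a rough illustration of that computation (a minimal sketch, not the actual WikiXRay code): assuming the revision text is available as a Python string and that the length is counted over its UTF-8 encoding, the value could be derived as below. The function name revision_length is hypothetical.

```python
def revision_length(text):
    """Return the size of a revision's wikitext in bytes.

    Assumption: the length is measured on the UTF-8 encoded text,
    so accented characters (common in frwiki) count as more than one byte.
    """
    if text is None:
        # Treat a missing text element as an empty revision (assumption).
        return 0
    return len(text.encode("utf-8"))
```

For example, revision_length("abc") gives 3, while a string with accented letters yields more bytes than characters.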

If you also have the rev_parent_id field (the WikiXRay Python parser computes this as well), it shouldn't be difficult to compute the difference between any given pair of revisions.
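
A minimal sketch of that idea, assuming each row is available as a dict with rev_id, rev_parent_id and rev_len already filled in (the function name byte_deltas and the dict layout are hypothetical). Note that this gives only the change in size between a revision and its parent, not a textual diff:

```python
def byte_deltas(revisions):
    """Map each rev_id to the change in bytes relative to its parent.

    Assumption: a revision whose parent is missing (e.g. rev_parent_id
    is 0 or None for the first revision of a page) is compared
    against an empty text, so its delta equals its own length.
    """
    rows = list(revisions)  # allow any iterable, since we pass over it twice
    lengths = {r["rev_id"]: r["rev_len"] for r in rows}
    deltas = {}
    for r in rows:
        parent_len = lengths.get(r["rev_parent_id"], 0)
        deltas[r["rev_id"]] = r["rev_len"] - parent_len
    return deltas
```

A negative value means the revision is shorter than its parent.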

Regards,
F --
 
> Is there any script to do that in the maintenance folder?
> Is the diff algorithm a simple characters count or there is
> more behind the generation of this number.
> 
> 
> The categories are not imported on the main page of the
> wiki. How to solve that?
> 
> bilal
> 
> 
> --
> Verily, with hardship comes ease.
> 
> 
> -----Inline attachment follows-----
> 
> _______________________________________________
> Xmldatadumps-admin-l mailing list
> Xmldatadumps-admin-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-admin-l
> 
