Good question, but I don't need to do that.
I just need to extract all possible contexts for every link to a specific target. I don't compare different revisions of an article directly; I only compare which knowledge can be extracted from the links in the different versions.
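
As a minimal sketch of this kind of context extraction in Python: the regular expression, the 40-character window, and the function names `normalize_title` and `link_contexts` are illustrative assumptions, not the actual pipeline. The sketch handles plain `[[Target]]` and piped `[[Target|label]]` links only and ignores templates and redirects.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]*))?\]\]")


def normalize_title(title):
    """Roughly approximate MediaWiki title normalization (assumption):
    underscores become spaces and the first letter is upper-cased."""
    title = title.replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def link_contexts(wikitext, target, window=40):
    """Return (anchor_text, context) pairs for every link to `target`.

    `context` is the raw wikitext from `window` characters before the
    link to `window` characters after it.
    """
    wanted = normalize_title(target)
    results = []
    for match in WIKILINK.finditer(wikitext):
        if normalize_title(match.group(1)) != wanted:
            continue
        # Use the link label if present, otherwise the target itself.
        anchor = match.group(2) or match.group(1)
        start = max(0, match.start() - window)
        end = min(len(wikitext), match.end() + window)
        results.append((anchor.strip(), wikitext[start:end]))
    return results
```

Running the same function over the text of an article from each dump gives two sets of contexts per target that can then be compared.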

Best,
Nicolai

From: Napolitano, Diane [dnapolitano@ets.org]
Sent: Friday, April 27, 2012 5:15 PM
To: Nicolai Erbs
Cc: xmldatadumps-l@lists.wikimedia.org
Subject: RE: [Xmldatadumps-l] Old dump for Wikipedia (August 8th, 2008)

Cool! But if a page exists in August 2008 and not in the current dump, how are you going to compare the two pages?

- Diane

From: xmldatadumps-l-bounces@lists.wikimedia.org [mailto:xmldatadumps-l-bounces@lists.wikimedia.org] On Behalf Of Nicolai Erbs
Sent: Friday, April 27, 2012 10:29 AM
To: Diederik van Liere; emijrp
Cc: xmldatadumps-l@lists.wikimedia.org
Subject: Re: [Xmldatadumps-l] Old dump for Wikipedia (August 8th, 2008)

Thanks for your answers so far!

I would like to compare the contexts of links in two versions of Wikipedia for the purpose of named entity disambiguation (one is a current version and the other should be from August 2008).

It might be possible to reconstruct a version, but this could be time-consuming. Additionally, wouldn't I miss those articles that have been deleted in the meantime?

Best,
Nicolai


From: Diederik van Liere [dvanliere@gmail.com]
Sent: Friday, April 27, 2012 4:19 PM
To: emijrp
Cc: Nicolai Erbs; xmldatadumps-l@lists.wikimedia.org
Subject: Re: [Xmldatadumps-l] Old dump for Wikipedia (August 8th, 2008)

Hi,

Why do you need a dump from 2008? You can use a recent dump and analyze only the data up to 20080103.
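
A minimal sketch of that approach in Python, assuming a full-history dump (the pages-meta-history files; the pages-articles files contain only the current revision of each page). For every page it keeps the newest revision at or before a cutoff timestamp. The helper names `local` and `revisions_as_of` are hypothetical, and the cutoff is a parameter rather than a fixed date.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def revisions_as_of(xml_stream, cutoff):
    """Yield (title, timestamp, text) for the newest revision of each page
    whose timestamp is at or before `cutoff`.

    `cutoff` is an ISO 8601 UTC string such as '2008-08-08T23:59:59Z'.
    Dump timestamps use the same fixed-width format, so plain string
    comparison orders them chronologically.
    """
    for _, elem in ET.iterparse(xml_stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title = None
        best = None  # (timestamp, text) of the best revision so far
        for child in elem:
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "revision":
                ts, text = None, ""
                for field in child:
                    if local(field.tag) == "timestamp":
                        ts = field.text
                    elif local(field.tag) == "text":
                        text = field.text or ""
                if ts is not None and ts <= cutoff and (best is None or ts > best[0]):
                    best = (ts, text)
        if best is not None:
            yield title, best[0], best[1]
        elem.clear()  # release the revisions of the page just processed
```

Pages with no revision at or before the cutoff are skipped. Each page is held in memory until its closing tag is read, so pages with very long histories may need a more incremental parser.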

Best,

Diederik

On Fri, Apr 27, 2012 at 10:16 AM, emijrp <emijrp@gmail.com> wrote:

2012/4/27 Nicolai Erbs <erbs@ukp.informatik.tu-darmstadt.de>

English, please.


Here is an English Wikipedia dump from 20080103: http://dumps.wikimedia.org/archive/ However, I remember that some old dumps were corrupted.

Ariel, is that dump OK?



--

Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com

Pre-doctoral student at the University of Cádiz (Spain)

 

