On 3 May 2018 at 19:54, Aidan Hogan <ahogan(a)dcc.uchile.cl> wrote:
Hi all,
I am wondering what is the fastest/best way to get a local dump of English
Wikipedia in HTML? We are looking just for the current versions (no edit
history) of articles for the purposes of a research project.
We have been exploring using bliki [1] to do the conversion of the source
markup in the Wikipedia dumps to HTML, but the latest version seems to take
on average several seconds per article (including after the most common
templates have been downloaded and stored locally). This means it would take
several months to convert the dump.
We also considered using Nutch to crawl Wikipedia, but with a reasonable
crawl delay (5 seconds) it would take several months to get a copy of every
article in HTML (or at least the "reachable" ones).
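[For context, a back-of-envelope check of that crawl estimate. The article count of ~5.6 million is an approximate figure for English Wikipedia around 2018, not a number from this thread:]

```python
# Rough crawl-time estimate: articles * per-request delay.
# ARTICLES is an approximation (~5.6M English Wikipedia articles, ca. 2018).
ARTICLES = 5_600_000
CRAWL_DELAY_S = 5  # the "reasonable crawl delay" mentioned above

total_seconds = ARTICLES * CRAWL_DELAY_S
days = total_seconds / 86_400   # seconds per day
months = days / 30

print(f"~{days:.0f} days (~{months:.1f} months)")  # roughly 324 days
```

So a single polite crawler is on the order of ten months, consistent with "several months".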
Hence we are a bit stuck right now and not sure how to proceed. Any help,
pointers or advice would be greatly appreciated!!
Best,
Aidan
[1]
https://bitbucket.org/axelclk/info.bliki.wiki/wiki/Home
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Just in case you have not thought of it, how about taking the XML dump
and converting it to the format you are looking for?
Ref
https://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_…
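[A minimal sketch of how the XML dump route could start: streaming pages out of a `pages-articles` dump with the Python standard library, so the multi-GB file never has to fit in memory. The inline SAMPLE stands in for the real dump file, and the export namespace version (0.10) is an assumption that depends on the dump you download. Note this only extracts wikitext; rendering it to HTML would still need a wikitext parser such as bliki or Parsoid.]

```python
# Stream (title, wikitext) pairs out of a MediaWiki XML dump without
# loading the whole file into memory, using ElementTree's iterparse.
import io
import xml.etree.ElementTree as ET

# Export schema namespace; version 0.10 is an assumption, check your dump.
NS = "{http://www.mediawiki.org/xml/export-0.10/}"

# Tiny inline stand-in for enwiki-latest-pages-articles.xml.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><text>'''Example''' is a [[test]] article.</text></revision>
  </page>
</mediawiki>"""

def iter_pages(source):
    """Yield (title, wikitext) for each <page> in a dump file object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title")
            text = elem.findtext(f"{NS}revision/{NS}text")
            yield title, text
            elem.clear()  # release the parsed subtree to keep memory flat

pages = list(iter_pages(io.StringIO(SAMPLE)))
print(pages[0][0])  # prints "Example"
```

On a real dump you would pass an open (possibly bz2-decompressed) file object instead of the StringIO sample.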
Fae
--
faewik(a)gmail.com
https://commons.wikimedia.org/wiki/User:Fae