> ----- Original Message -----
> From: "Brion Vibber" <brion(a)pobox.com>
> To: "Wikimedia developers" <wikitech-l(a)wikimedia.org>
> Subject: Re: [Wikitech-l] Parsing database dumps
> Date: Thu, 28 Sep 2006 13:50:07 -0700
>
>
> Those particular page titles are in the main (article) namespace, which has no
> prefix:
>
> > <namespace key="0" />
> [snip]
> > <title>AaA</title>
> [snip]
> > <title>AlgeriA</title>
OK, thanks, that makes sense. I nearly made that leap myself, but thought I'd better not without really understanding it. So are you saying that if there's no colon in the title, it's safe to assume the page goes in namespace 0? And if there is a colon, I need to check whether the text before it matches a namespace name, and if it doesn't, the colon is just part of the article title?
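That rule can be sketched as a small helper. This is a hypothetical illustration, not part of any MediaWiki library: the `NAMESPACES` dict is built by hand from the `<siteinfo>` block of the English dump, and `split_title` is my own name for the function.

```python
# Hypothetical sketch: resolve a dump <title> string to a namespace key.
# NAMESPACES mirrors the <siteinfo> namespace map from the enwiki dump;
# key 0 (the main/article namespace) has no prefix and is the fallback.
NAMESPACES = {
    "Media": -2, "Special": -1, "Talk": 1, "User": 2, "User talk": 3,
    "Wikipedia": 4, "Wikipedia talk": 5, "Image": 6, "Image talk": 7,
    "MediaWiki": 8, "MediaWiki talk": 9, "Template": 10,
    "Template talk": 11, "Help": 12, "Help talk": 13, "Category": 14,
    "Category talk": 15, "Portal": 100, "Portal talk": 101,
}

def split_title(title):
    """Return (namespace_key, rest_of_title) for a page title."""
    if ":" in title:
        prefix, rest = title.split(":", 1)
        if prefix in NAMESPACES:
            return NAMESPACES[prefix], rest
    # No colon, or the prefix is not a known namespace: the page is in
    # the main namespace (0) and any colon is just part of the title.
    return 0, title
```

Note that a title like "Doctor Who: The Movie" contains a colon but has no namespace prefix, so it stays in namespace 0 with the colon intact.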
Mike O
--
_______________________________________________
Surf the Web in a faster, safer and easier way:
Download Opera 9 at http://www.opera.com
Powered by Outblaze
> The namespace prefix appears at the beginning of the page title, which appears
> as the text contents of the /mediawiki/page/title element,
> separated by a colon
> from the remaining title part.
That's what I gathered from reading the Wiki docs before I started trying to parse the XML. But look at this snippet from enwiki-latest-pages-articles.xml.bz2, taken in late August. I don't see any namespace prefix in the <title> elements. Any ideas why, or am I looking in the wrong spot?
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/ http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3" xml:lang="en">
<siteinfo>
<sitename>Wikipedia</sitename>
<base>http://en.wikipedia.org/wiki/Main_Page</base>
<generator>MediaWiki 1.8alpha</generator>
<case>first-letter</case>
<namespaces>
<namespace key="-2">Media</namespace>
<namespace key="-1">Special</namespace>
<namespace key="0" />
<namespace key="1">Talk</namespace>
<namespace key="2">User</namespace>
<namespace key="3">User talk</namespace>
<namespace key="4">Wikipedia</namespace>
<namespace key="5">Wikipedia talk</namespace>
<namespace key="6">Image</namespace>
<namespace key="7">Image talk</namespace>
<namespace key="8">MediaWiki</namespace>
<namespace key="9">MediaWiki talk</namespace>
<namespace key="10">Template</namespace>
<namespace key="11">Template talk</namespace>
<namespace key="12">Help</namespace>
<namespace key="13">Help talk</namespace>
<namespace key="14">Category</namespace>
<namespace key="15">Category talk</namespace>
<namespace key="100">Portal</namespace>
<namespace key="101">Portal talk</namespace>
</namespaces>
</siteinfo>
<page>
<title>AaA</title>
<id>1</id>
<revision>
<id>46448774</id>
<timestamp>2006-04-01T12:07:25Z</timestamp>
<contributor>
<username>Gurch</username>
<id>241822</id>
</contributor>
<minor />
<comment>{{R from CamelCase}}</comment>
<text xml:space="preserve">#REDIRECT [[AAA]] {{R from CamelCase}} {{R from other capitalisation}}</text>
</revision>
</page>
<page>
<title>AlgeriA</title>
<id>5</id>
<revision>
<id>18063769</id>
<timestamp>2005-07-03T11:13:13Z</timestamp>
<contributor>
<username>Docu</username>
<id>8029</id>
</contributor>
<minor />
<comment>adding cur_id=5: {{R from CamelCase}}</comment>
<text xml:space="preserve">#REDIRECT [[Algeria]]{{R from CamelCase}}</text>
</revision>
</page>
[...]
Mike O
I'm having a bit of trouble figuring out the database dump XML. Looking at the articles dump, I see that page content is wrapped in <page> and </page> elements. What I don't see is how to determine which namespace an article belongs to. I see the namespace elements at the top of the file, but how do I match articles to the right namespace?
Mike O
An automated run of parserTests.php showed the following failures:
Running test TODO: Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html)... FAILED!
Running test TODO: Link containing double-single-quotes '' (bug 4598)... FAILED!
Running test TODO: Template with thumb image (with link in description)... FAILED!
Running test Template infinite loop... FAILED!
Running test TODO: message transform: <noinclude> in transcluded template (bug 4926)... FAILED!
Running test TODO: message transform: <onlyinclude> in transcluded template (bug 4926)... FAILED!
Running test BUG 1887, part 2: A <math> with a thumbnail- math enabled... FAILED!
Running test TODO: HTML bullet list, unclosed tags (bug 5497)... FAILED!
Running test TODO: HTML ordered list, unclosed tags (bug 5497)... FAILED!
Running test TODO: HTML nested bullet list, open tags (bug 5497)... FAILED!
Running test TODO: HTML nested ordered list, open tags (bug 5497)... FAILED!
Running test TODO: Parsing optional HTML elements (Bug 6171)... FAILED!
Running test TODO: Inline HTML vs wiki block nesting... FAILED!
Running test TODO: Mixing markup for italics and bold... FAILED!
Running test TODO: 5 quotes, code coverage +1 line... FAILED!
Running test TODO: HTML Hex character encoding.... FAILED!
Running test TODO: dt/dd/dl test... FAILED!
Passed 412 of 429 tests (96.04%) FAILED!
All,
I am planning to build a computer that holds a copy of en.wikipedia (including portals), en.wiktionary, en.wikibooks, and species.wikimedia, along with the pictures. The computer will be sent to India to be used in educational settings, and will probably be copied and shared.
I have already built a mirror of en.wikipedia at http://freeknowledge.dyndns.org/; however, some of the pages are not rendered properly. For example: http://freeknowledge.dyndns.org/index.php/India
Any suggestions or advice on how to fix this, and on pitfalls I am likely to encounter while adding the other sites (en.wiktionary, en.wikibooks, and probably ta.wik*), would be greatly appreciated.
-Krishna
=====================================
Misinterpreting Copyright by Richard Stallman
"Die Gedanken Sind Frei": Free Software and the Struggle for Free Thought by Eben Moglen mp3 ogg
Free Knowledge blog
Sorry, it took me longer to get around to this than I'd planned. I restored the 6-million-row categorylinks table to a local computer for testing and threw some SQL at it. I got mixed results: in the first pass I used my "count ... group by" approach and ran different queries to get the pages at the intersection of two categories. I used at least semi-meaningful categories, to make the testing somewhat representative of real usage.
I got several sets of results in under 1 second (the lowest time being 0.3 seconds), but one query returned in 8 seconds and another in 36 seconds. I'm going to re-run the same queries with an empty query cache (I still need to learn how to clear it) several times to see how repeatable the timings are, then see what the long query times correlate with. Intuitively I'm guessing they come from intersections of large categories, but I haven't tested that yet, even though it's easy to do. I'll publish detailed results with figures and the actual queries once I've got more data. (I plan to do this tonight or tomorrow night.)
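For readers following along, the "count ... group by" intersection approach described above looks roughly like this. The sketch runs against a toy categorylinks table in SQLite for illustration (the real table is MySQL, and the category names here are made up):

```python
# Toy demonstration of the "count ... group by" category-intersection
# query: a page is in the intersection when it matches as many rows as
# there are requested categories.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
conn.executemany("INSERT INTO categorylinks VALUES (?, ?)", [
    (1, "Physics"), (1, "History"),  # page 1 is in both categories
    (2, "Physics"),                  # page 2 is in only one
    (3, "History"),
])

# Group the matching rows by page and keep only pages whose match count
# equals the number of categories asked for (2 here).
rows = conn.execute("""
    SELECT cl_from FROM categorylinks
    WHERE cl_to IN ('Physics', 'History')
    GROUP BY cl_from
    HAVING COUNT(*) = 2
""").fetchall()
```

On the real table the cost presumably depends on how many categorylinks rows the WHERE clause pulls in before grouping, which would fit the guess that large categories are the slow cases.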
Best Regards,
Aerik
Hello
Could somebody please tell me whether texvc is still under development, and who the author(s) are, so that I could contact them?
Thanks and regards
Uwe Brauer
> ----- Original Message -----
> From: "Brion Vibber" <brion(a)pobox.com>
> To: "Wikimedia developers" <wikitech-l(a)wikimedia.org>
> Subject: Re: [Wikitech-l] Finding moved/redirected pages
> Date: Tue, 26 Sep 2006 12:26:00 -0700
>
>
> That is indeed the right record.
Thanks Brion, I see now what is happening: the other records point to history. So essentially it's a chain, where the top (page_latest in the page table) points to the current revision and the rest point to historic revisions.
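So the lookup for the current text is a three-table join: page.page_latest to revision.rev_id, then revision.rev_text_id to text.old_id. A sketch on a toy SQLite copy of the tables (column names follow the MediaWiki schema; the data is made up):

```python
# Toy model of the chain: page -> current revision -> text blob.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER, page_title TEXT, page_latest INTEGER);
    CREATE TABLE revision (rev_id INTEGER, rev_page INTEGER, rev_text_id INTEGER);
    CREATE TABLE text (old_id INTEGER, old_text TEXT);
    INSERT INTO page VALUES (5, 'Algeria', 18063769);
    INSERT INTO revision VALUES (18063769, 5, 901);  -- current revision
    INSERT INTO revision VALUES (12345678, 5, 900);  -- an older revision
    INSERT INTO text VALUES (901, 'current wikitext');
    INSERT INTO text VALUES (900, 'old wikitext');
""")

# Follow page_latest, not just any revision of the page.
row = conn.execute("""
    SELECT old_text FROM page
    JOIN revision ON rev_id = page_latest
    JOIN text ON old_id = rev_text_id
    WHERE page_title = 'Algeria'
""").fetchone()
```

Joining on rev_page alone would pick up historic revisions too; anchoring on page_latest is what selects the current one.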
Can someone explain how to programmatically find the current text of a redirected article? I can't seem to figure this out. Say, for example, 'article A' is created, then moved to 'article B', then moved again to 'article C'. The page table has an entry for 'article C' (and for 'article B', for that matter) with the original page_id assigned when the article was first created as 'article A', along with a page_latest pointer to a record in the revision table (rev_id). That revision record's rev_text_id points to the actual article content in the text table. But since the article was moved twice, the record I reach in the text table isn't the right one: it's the original content, not the current content.
How do I trace this logically to find the current article content?
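One piece of the answer is recognizing when the text you fetched is itself a redirect, so you can repeat the lookup on the target title. A minimal sketch of that step; the regex is a rough approximation of MediaWiki's real redirect parsing, and the function name is mine:

```python
# Detect "#REDIRECT [[Target]]" at the start of a page's wikitext and
# extract the target title (stopping at ], |, or # in the link).
import re

REDIRECT_RE = re.compile(r"^#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)

def redirect_target(wikitext):
    """Return the redirect target title, or None if not a redirect."""
    m = REDIRECT_RE.match(wikitext)
    return m.group(1).strip() if m else None
```

Following the chain then means: fetch the current text via page_latest, and while `redirect_target` returns a title, look that title up and fetch again (with a loop guard against circular redirects).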