I've been an enthusiastic downloader and user of Domas' wikistats
(page counter) logfiles since they first appeared on December 9,
2007:
http://lists.wikimedia.org/pipermail/wikitech-l/2007-December/035435.html
One problem, however, is that they are plain text files that need
to be loaded into some kind of database system before you can do
any interesting analysis. Whether text files or XML or MySQL
dumps, they all take quite some time to import. It's like
unpacking a huge tar archive, rather than instantaneously mounting
an ISO 9660 image, if you get the analogy. You could probably
load the data into MySQL and then distribute the raw tablespace
files, but I haven't heard of any project that does this. MySQL
wasn't built with this in mind. The data could be loaded into
MySQL at the toolserver (maybe someone did already?), and we could
each run our queries there, but that doesn't scale if many people
want to run heavy queries.
When looking around, I found SQLite (www.sqlite.org), a free-software
(public domain, actually) lightweight SQL database engine. It is
serverless and runs entirely as a single-user application, storing
tables and indexes in a plain file.
As an experiment, I loaded the first two months of page counter
log files for the Swedish Wikipedia into SQLite (version 3). The
resulting database file is 3.1 GB, which bzip2 shrinks to 638 MB.
The idea is that you can download these 638 MB, run bunzip2, and
then start sqlite3 and run SQL queries right away. Getting started
takes only a few minutes.
Now I want to find out whether this is a useful scheme. If it is, I
could set up a process to provide such SQLite dumps for all
languages. But perhaps some parts need adjustment or tuning first.
I need your feedback for this. You'll have to analyze the Swedish
Wikipedia initially, since that's all I provide for now.
Here's what I have done:
First I decode the URL encoding and normalize some page names.
Main_Page, Main+Page and Main%20Page are all converted to
Main_Page, and even translated to Huvudsida for the Swedish
Wikipedia. That means I add up the page counters for these page
names and store them under Huvudsida. All page names are stored in
one table (names) and given an integer primary key (names.id), to
avoid duplicate storage of text strings. The names table now has
1.9 million entries.
Each logfile covers one hour, timestamped in UTC. A table called
"times" uses "unix seconds" as a primary key (times.unix) and
lists the year, month, day-of-month, week, day-of-week, and hour.
Perhaps this was unnecessary given the date and time functions
provided by SQLite, but I still believe it can be helpful. For
these 2 months (62 days), the times table has 1734 entries.
The big "counts" table contains the language ("sv") as a text
field and integer fields for time (references times.unix), name
(references names.id) and count. The counts table has 68.3
million entries.
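For reference, the schema is roughly the following sketch,
reconstructed from the description above (the column names for
language, week, day-of-week and hour are my guesses; only names.id,
names.name, times.unix, times.year/month/mday and
counts.time/name/count are confirmed by the query below):

create table names (
  id    integer primary key,  -- surrogate key, referenced by counts.name
  name  text                  -- normalized page name, e.g. 'Huvudsida'
);

create table times (
  unix  integer primary key,  -- start of the hour, in unix seconds (UTC)
  year  integer, month integer, mday integer,
  week  integer, wday integer, hour integer
);

create table counts (
  lang  text,                 -- language code, 'sv' for now
  time  integer,              -- references times.unix
  name  integer,              -- references names.id
  count integer               -- page views for that page in that hour
);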
A typical query you can run is
select sum(count), year, month, mday
from counts, times, names
where names.name='Huvudsida'
and year=2007
and names.id=counts.name and counts.time=times.unix
group by 2,3,4;
Queries are not necessarily fast, but you can create indexes as
you wish. Are there any indexes you would like me to build and
supply as part of the distributed database file?
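For example, indexes like these (names as in the schema sketch above,
just an illustration) would speed up the per-page lookups in the
query above:

create index names_name_idx on names (name);
create index counts_name_time_idx on counts (name, time);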
The query above returns this result:
32403|2007|12|9
119005|2007|12|10
117551|2007|12|11
107630|2007|12|12
102178|2007|12|13
88766|2007|12|14
65733|2007|12|15
87048|2007|12|16
106643|2007|12|17
96751|2007|12|18
86955|2007|12|19
74297|2007|12|20
63383|2007|12|21
57908|2007|12|22
59360|2007|12|23
45230|2007|12|24
56469|2007|12|25
58494|2007|12|26
66068|2007|12|27
63538|2007|12|28
65137|2007|12|29
68636|2007|12|30
55821|2007|12|31
Currently, you'll find the database file (both compressed and not)
at http://mirabell.runeberg.lysator.liu.se/
Here's what you need to do (UNIX/Linux commands):
sudo apt-get install bzip2
sudo apt-get install sqlite3
wget http://mirabell.runeberg.lysator.liu.se/sv-counts-20080219.db.bz2
bunzip2 sv-counts-20080219.db.bz2
sqlite3 sv-counts-20080219.db
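Once you are at the sqlite3 prompt, a quick sanity check might be
(the row count should come out at the 68.3 million mentioned above):

.tables
select count(*) from counts;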
That URL is not permanent, but only available for the current
test.
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
I am trying to create an offline Wikipedia client for the Wikipedia
XML dump. I know there are a lot of programs out there, but they all
seem to render the pages very badly: the wiki markup has evolved
considerably, while these programs are quite outdated and almost
dead.
After scanning the internet since yesterday, I have come up with a
number of libraries and programs, but none of them renders the pages
perfectly. Hence, I was toying with the idea of rendering the pages
using MediaWiki's PHP files, as the people at woc.fslab.de (Offline
Wikipedia Client) have done. I have downloaded Offline Wikipedia
Client, but I haven't yet been able to figure out how to use it.
Anyway, it looks too complicated and overly large. I want an offline
client of my own, served via HTTP (I have tried importing the dump
into a freshly installed MediaWiki, but rebuilding the links takes
forever, and WikiFilter is not for Linux, though I did try it under
Wine).
So, my question is: can anyone please point me to the PHP files of
MediaWiki that I could use with little modification? I intend to
provide all the necessary details as input to the PHP file: the list
of templates and their code to substitute, the categories, the
article markup, etc. I expect to get back the HTML that can be sent
to the user's browser.
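Just to illustrate the kind of thing I am hoping for, something
vaguely like this (a rough, untested sketch; the bootstrap file and
the exact Parser/ParserOptions calls are my guesses):

<?php
// Untested sketch. Assumes MediaWiki's includes/ can be bootstrapped
// outside a normal wiki request; the require below is a guess.
require_once( 'includes/WebStart.php' );

$wikitext = file_get_contents( 'article.wiki' );  // markup from the XML dump
$title    = Title::newFromText( 'Some_article' );
$options  = new ParserOptions();

$parser = new Parser();
$output = $parser->parse( $wikitext, $title, $options );

echo $output->getText();  // HTML to send to the user's browser
?>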
All this may look very pointless, but after battering my brains over
this thing and repeatedly getting disappointing results, my brain has
gone fuzzy and desperate.
May you have peace of mind.
Regards,
Apple Grew
my blog @ http://applegrew.blogspot.com/
Time for a lesson in basic PHP.
A bug was introduced in r25374 by Aaron last September, and despite
half a dozen people editing the few lines around that point, nobody
picked it up. Simetrical eventually fixed it in r29156, blaming a bug
in PHP's array_diff(). It was not a PHP bug.
$permErrors += array_diff(
	$this->mTitle->getUserPermissionsErrors('create', $wgUser),
	$permErrors );
I used to make the same error myself. Although I learnt from my mistakes,
we obviously haven't learnt as a team.
http://www.php.net/manual/en/language.operators.array.php
"The + operator appends elements of remaining keys from the right handed
array to the left handed, whereas duplicated keys are NOT overwritten."
That explains the behaviour of the array plus operator in its entirety. If
you add two arrays, and both have an element with a key of zero, the one
on the left-hand side wins. The elements are NOT renumbered.
For example:
> print_r( array( 'foo' ) + array( 'bar' ) );
Array
(
    [0] => foo
)
> print_r( array( 'foo' ) + array( 'bar', 'baz' ) );
Array
(
    [0] => foo
    [1] => baz
)
If you want the elements to be renumbered, use array_merge().
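For comparison, array_merge() on the second pair of arrays renumbers
as you would expect:

> print_r( array_merge( array( 'foo' ), array( 'bar', 'baz' ) ) );
Array
(
    [0] => foo
    [1] => bar
    [2] => baz
)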
-- Tim Starling
Hi,
I have two extensions (SelectCategoryTagCloud and FCKeditor) that both
use the following hook to access the content of the edit page input
box:
$wgHooks['EditPage::showEditForm:initial']
How can I control which extension gets to do its work first?
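As far as I can tell, both extensions simply append a handler to that
same hook array, something like this (the handler names here are just
placeholders, not the real function names):

$wgHooks['EditPage::showEditForm:initial'][] = 'scTagCloudShowHook'; // SelectCategoryTagCloud
$wgHooks['EditPage::showEditForm:initial'][] = 'fckShowHook';        // FCKeditor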
I would like SelectCategoryTagCloud to strip out all existing
category links in the wikitext (and put them in the second input box
for category assignment) and then have FCKeditor display the rest of
the text in the WYSIWYG editor. When the user saves the page, I would
like the order reversed: first FCKeditor stores the wikitext, then
SelectCategoryTagCloud appends the categories selected by the user at
the bottom of the wikitext. That way the database always stores plain
wikitext.
Any thoughts?
Thanks,
Andi
Hi,
I'm using the wikipdf extension
(http://sourceforge.net/projects/wikipdf/). The translation script is
written in Python, and that's my problem. I don't know how to program
in Python, and the problem is that if I use <math>"here stands a TeX
formula"</math>, the part of the text that is already written in TeX
is also translated. This means that special symbols like \sup_{}
don't stay the way they are... so instead of the rendered formula, my
PDF shows the TeX source of the formula.
Is anybody using this extension, or has anybody already had the same
problem and a solution?
thanx for the help
julia
For your interest...
"The Freebase Wikipedia Extraction (WEX) is a processed dump of the
English language Wikipedia. The wiki markup for each article is
transformed into machine-readable XML, and common relational features
such as templates, infoboxes, categories, article sections, and
redirects are extracted in tabular form.
"Freebase WEX is provided as a set of database tables in TSV format
for PostgreSQL, along with tables providing mappings between Wikipedia
articles and Freebase topics, and corresponding Freebase Types."
<http://download.freebase.com/wex/>
cheers,
Brianna
---------- Forwarded message ----------
From: Georgi Kobilarov <gkob(a)gmx.de>
Date: 20 Feb 2008 07:45
Subject: [Dbpedia-discussion] Freebase provides data dumps
To: dbpedia-discussion(a)lists.sourceforge.net
Hi all,
Freebase now provides dumps of their data extracted from Wikipedia. See
[1] [2]. Interesting stuff. It is nice to see that Metaweb follows the
ideas of DBpedia ;)
@Metaweb: it's time to open source your extraction framework as well. (I
know you read this :)
Cheers,
Georgi
[1] http://blog.freebase.com/?p=108
[2] http://download.freebase.com/wex/
--
They've just been waiting in a mountain for the right moment:
http://modernthings.org/