Wikitech-l November 2008

wikitech-l@lists.wikimedia.org

92 participants
71 discussions

by Lars Aronsson

On my favorite page, http://meta.wikimedia.org/wiki/List_of_Wikipedias there is a column for "depth", which is "a rough indicator of a Wikipedia's quality, showing how frequently its articles are updated". Tomorrow, that column will have been there for two full years, with slight modifications to its formula. I wrote a separate page about this, http://meta.wikimedia.org/wiki/Depth (note that this is completely unrelated to http://en.wikipedia.org/wiki/Wikipedia:Depth ).

There has been a lengthy discussion on the good and evil of trying to estimate the quality of Wikipedia. But I think "depth" is the only measurement that we can track over such a long time. What other estimates of Wikipedia quality do we have that can be applied across language versions?

Erik Zachte's Wikipedia Statistics (last updated in May 2008) presents a number of values that could be used to calculate a quality estimate: number of articles, number of articles longer than 0.5 kbytes or 2 kbytes (excluding some markup), mean edits per article, mean bytes per article, total number of edits, size of database in bytes or words, number of internal, interwiki, image or external links, and number of redirects.

The editing depth is essentially the number of edits divided by the number of articles (with two more factors in the formula). This means that edit wars and repeated use of the save button (instead of preview) will give a higher depth, while an article that is made perfect before it is saved gives a low depth. Thus, "depth" measures the amount of editing activity within Wikipedia, as opposed to the real quality of the resulting articles.

This can be interesting in itself, but it might also be interesting to estimate the amount of interconnectivity between articles, where orphan articles or articles with just one link to them are discounted as a kind of stub. How can such a measurement be defined? If possible, by just combining the values we already know.
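To make the point about the save button concrete, here is a minimal Python sketch of the ratio at the heart of "depth": edits divided by articles. The two additional factors of the actual formula on the meta Depth page are deliberately left out, and the wiki sizes are made-up illustrative numbers, so this is not the official definition:

```python
def edits_per_article(total_edits: int, articles: int) -> float:
    """Mean number of edits per article: the core ratio behind 'depth'.

    The two extra factors of the real depth formula are intentionally omitted.
    """
    if articles <= 0:
        raise ValueError("articles must be positive")
    return total_edits / articles


# Two hypothetical wikis with identical, equally good content:
# in one, every article is polished in preview and saved once;
# in the other, every article is saved five times while being written.
careful = edits_per_article(total_edits=1000, articles=1000)
save_happy = edits_per_article(total_edits=5000, articles=1000)
print(careful, save_happy)  # 1.0 5.0
```

The second wiki scores five times higher without its articles being any better, which is exactly why the measure tracks editing activity rather than article quality.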
Earlier (2005-2006), the Swedish-language Wikipedia created many (mostly very short) articles, giving it a high ranking in the list of Wikipedias (by article count). But since these stubs were created once and never touched again, this gave it a rather low "depth" of 14 (in November 2007). During 2008, a number of subprojects have gone back and made minor edits to many old articles, so the "depth" has climbed to 23. This is not high, but no longer among the very lowest. This increase of 64 percent is, however, overshadowed by the Turkish Wikipedia's increase of 125 percent (from 39 to 88). Also, the French Wikipedia has increased its depth from 58 to 113, while the German Wikipedia only moved from 68 to 80.

--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
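The percentage changes follow directly from the quoted depth figures. This short Python check recomputes them. Note that Turkish comes out at about +126 percent rather than the quoted +125, presumably because the underlying depth values were not whole numbers:

```python
# Depth figures as quoted in the message: (before, after).
depths = {
    "Swedish": (14, 23),
    "Turkish": (39, 88),
    "French": (58, 113),
    "German": (68, 80),
}


def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


for language, (old, new) in depths.items():
    print(f"{language}: {old} -> {new}, {percent_increase(old, new):+.0f}%")
# Swedish: 14 -> 23, +64%
# Turkish: 39 -> 88, +126%
# French: 58 -> 113, +95%
# German: 68 -> 80, +18%
```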

Re: [Wikitech-l] [Commons-l] Support for Chemical Markup Language

by Marco Schuster

On Sun, Nov 30, 2008 at 1:11 AM, Brian Salter-Duke <b_duke(a)bigpond.net.au> wrote:
> On Sun, 30 Nov 2008 00:50:08 +0100, Platonides <Platonides(a)gmail.com> wrote:
>> See https://bugzilla.wikimedia.org/show_bug.cgi?id=16491
>> The fact that users can embed JavaScript makes it unacceptable to run on Wikipedia.
>> Other parameters, like urlContents or signed, wouldn't be used, but at
>> least they can be disabled.
>
> I am afraid this is all beyond my expertise. Are you saying that there
> is no way Jmol can ever be used on WMF projects?

There is, as soon as the JavaScript embedding possibility gets disabled and the extension gets a proper review (TM).

Marco

Re: [Wikitech-l] [Foundation-l] Language codes to rename

by Brion Vibber

Gerard Meijssen wrote:
> Hi,
> Do not forget the als.wikipedia.org. It stands for Alsatian, but the als
> code is the Tosk language. The "gsw" code is the code that should have been
> used.
> http://www.ethnologue.com/show_language.asp?code=gsw

Adding it to my list, thanks!

> The nrm.wikipedia is also using a wrong code. nrm is Narom, a language from
> Malaysia. Nourmande is not recognised as a language, consequently there is
> no code available for it. I propose to use qaa for this.
> http://www.sil.org/iso639-3/scope.asp#R

I'd recommend roa-x-norman (generic Romance code with an extension tag) rather than a private-use identifier.

Private-use identifiers are meant more for things like internal coding within an application, such as where you'd want to indicate that a document is not in a human language, or maybe a special mixed setting that's specific to your organization's internal usage, such as "not yet inspected for coding" or something.

Per spec: "These identifiers may only be used locally, and may not be used in interchange without a private agreement."

The purpose of using language codes on the web as we do is explicitly for interchange with browser software, end users (as a navigation marker in the URL), and content reusers.

-- brion
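The objection to "qaa" is that ISO 639-2 reserves the whole range qaa to qtz for local use. A minimal Python sketch of a check a configuration script might run before accepting a new language code follows; the function names are hypothetical, and it only inspects the primary subtag, so it does not interpret the trailing "x-norman" portion of roa-x-norman:

```python
def primary_subtag(tag: str) -> str:
    """Return the primary language subtag of a hyphen-separated tag, lowercased."""
    return tag.split("-", 1)[0].lower()


def is_local_use_code(tag: str) -> bool:
    """True if the primary subtag is a three-letter code in the range qaa..qtz,
    which ISO 639-2 reserves for local use."""
    code = primary_subtag(tag)
    return len(code) == 3 and code.isascii() and code.isalpha() and "qaa" <= code <= "qtz"


for tag in ["qaa", "roa-x-norman", "gsw", "als"]:
    print(tag, is_local_use_code(tag))
# qaa True
# roa-x-norman False
# gsw False
# als False
```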

Donors' names are broken

by mizusumashi

Why are some donors' names broken on http://wikimediafoundation.org/wiki/Special:ContributionHistory ?

Examples:
http://wikimediafoundation.org/wiki/Special:ContributionHistory?offset=1227…
http://wikimediafoundation.org/wiki/Special:ContributionHistory?offset=1227…

I think non-Latin characters are not being displayed correctly.

----
mizusumashi
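The message does not say what causes the damage. One common cause of this kind of garbling, offered here only as a hypothesis, is UTF-8 bytes being decoded as Latin-1 somewhere along the way. This Python sketch simulates that failure with hypothetical helper names and shows that, if it is the cause, the text can be recovered. It would also garble accented Latin names, not only non-Latin ones:

```python
def mangle(text: str) -> str:
    """Simulate UTF-8 bytes being misread as Latin-1 (one hypothesized cause)."""
    return text.encode("utf-8").decode("latin-1")


def unmangle(text: str) -> str:
    """Undo mangle(): re-encode as Latin-1, then decode the bytes as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


print(mangle("Smith"))   # Smith  (pure ASCII is unaffected)
print(mangle("Müller"))  # MÃ¼ller

name = "山田太郎"  # hypothetical donor name
assert mangle(name) != name
assert unmangle(mangle(name)) == name
```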

Language codes to rename

by Brion Vibber

For quick background, it's pretty painful to rename a database in our system, and we currently have a lot of bits in our configuration that make automatic relationships between the database name and the domain name, so this has delayed renaming of some language subdomains for a while. It's not impossible to have them be different, just fairly awkward. :)

I'd like to get these done soon, but before we get started, I want to make sure the queue is complete and ready to go. I've currently got four language code renames that I see being requested...

== Aromanian ==

roa-rup.wikipedia.org -> rup.wikipedia.org
roa-rup.wiktionary.org -> rup.wiktionary.org
https://bugzilla.wikimedia.org/show_bug.cgi?id=15988

ISO 639-2 code 'rup' was added in September 2005, and can supersede the generic 'roa' code with 'rup' subtag. This seems pretty uncontroversial. Existing domains and interwikis would be redirected.

== Low German ==

nds.wikipedia.org -> nds-de.wikipedia.org
nds.wikibooks.org -> nds-de.wikibooks.org
nds.wikiquote.org -> nds-de.wikiquote.org
nds.wiktionary.org -> nds-de.wiktionary.org
https://bugzilla.wikimedia.org/show_bug.cgi?id=8540

Reasoning: disambiguation of country variants to create a portal site (nds-nl.wikipedia.org exists as well).

The original request is almost 2 years old and didn't seem to have any clear consensus; is this still desired? Creating a portal site could cause difficulties with URL compatibility, and I don't really recommend making this change without clear consensus from the community there. Note that nds.wikipedia.org includes a link on the front page to nds-nl.wikipedia.org.

== Moldovan ==

mo.wikipedia.org -> mo-cyrl.wikipedia.org
mo.wiktionary.org -> mo-cyrl.wiktionary.org

The official Moldovan language is the same as Romanian, using Latin script and the same orthography as on ro.wikipedia.org. Latin script was officially adopted in 1989, replacing Soviet-era Cyrillic script; use of Cyrillic script is still "official" in an unrecognized, lightly-populated breakaway region, but if people there use it, they don't seem to edit Wikipedia...

The 'mo' language code has been officially deprecated from ISO 639-1/2 as of November 3, 2008; "Moldovan" in general use is just Romanian, and is covered by ro.wikipedia.org.

mo.wikipedia.org has not actually been edited since December 2006. mo.wiktionary.org seems to have... 4 definition pages, which only contain translations (no definitions!)

Being inactive, these projects could be closed in addition to / instead of the rename.

Use of tag 'mo-cyrl' would follow existing IANA-registered language subtags such as 'bs-Cyrl' and 'bs-Latn' for Cyrillic and Latin script variants. Most likely, for compatibility we would redirect the existing 'mo' URLs to the new 'mo-cyrl' ones, but they would now be visibly marked by the subtag as being "yes we know, it's Cyrillic here".

If we're going to lock the site as well, adding a sitenotice pointing to the Romanian wiki is probably wise.

== Belorusian "old orthography" ==

be-x-old.wikipedia.org -> be-tarask.wikipedia.org
https://bugzilla.wikimedia.org/show_bug.cgi?id=9823

Some time ago we swapped around the Belorusian Wikipedia, moving the previous version, which was primarily using a non-official orthography, from 'be' to 'be-x-old', and re-establishing be.wikipedia.org using the official state orthography. There was later a request to rename 'be-x-old' (using a non-standard code) to 'be-tarask', an IANA-registered subtag which is rather more descriptive.

IMHO this change should not be terribly controversial -- if we're not closing it, we may as well give it its official RFC 4646-registered code. Old domain and interwikis would be redirected.

-- brion vibber (brion @ wikimedia.org)
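Each proposal above says that existing domains would be redirected. The following Python sketch shows the old-host to new-host mapping as the message lists it, with a hypothetical helper that computes a redirect target. It is an illustration only, not Wikimedia's actual configuration, and the Low German and Moldovan entries were still open questions at the time:

```python
from typing import Optional

# Proposed renames from the message, keyed by (old language code, project domain).
RENAMES = {
    ("roa-rup", "wikipedia.org"): "rup",
    ("roa-rup", "wiktionary.org"): "rup",
    ("nds", "wikipedia.org"): "nds-de",
    ("nds", "wikibooks.org"): "nds-de",
    ("nds", "wikiquote.org"): "nds-de",
    ("nds", "wiktionary.org"): "nds-de",
    ("mo", "wikipedia.org"): "mo-cyrl",
    ("mo", "wiktionary.org"): "mo-cyrl",
    ("be-x-old", "wikipedia.org"): "be-tarask",
}


def redirect_target(host: str) -> Optional[str]:
    """Return the new host an old host should redirect to, or None if it is unchanged."""
    # Split "roa-rup.wikipedia.org" into "roa-rup" and "wikipedia.org".
    code, _, project = host.lower().partition(".")
    new_code = RENAMES.get((code, project))
    if new_code is None:
        return None
    return f"{new_code}.{project}"


print(redirect_target("roa-rup.wikipedia.org"))   # rup.wikipedia.org
print(redirect_target("be-x-old.wikipedia.org"))  # be-tarask.wikipedia.org
print(redirect_target("nds-nl.wikipedia.org"))    # None
```

Keying on the project as well as the code matters because, for example, only the Wikipedia and Wiktionary editions of 'roa-rup' are listed.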

Support for Chemical Markup Language

by Eugene Zelenko

Hi!

We are discussing on the Commons list (http://lists.wikimedia.org/pipermail/commons-l/2008-November/004338.html) possible support for Chemical Markup Language (http://cml.sourceforge.net) and the Jmol viewer (http://jmol.sourceforge.net). A MediaWiki extension is already implemented (http://wiki.jmol.org/index.php/MediaWiki). However, it was written for 1.12 and some security concerns exist. It would be great if somebody could review the extension code and adapt it to the current state of MediaWiki if necessary.

Eugene.

Correct method of pre-processing article text?

by Mark Clements (HappyDog)

I have an extension which parses the contents of a page to store the content of certain embedded tags to the database, and I want the parsing to take place after the pre-processing (comment removal, template expansion, etc.). I also need the code to be compatible with MW 1.6, as I am currently unable to upgrade to PHP5 (hopefully soon...).

Here is the code I was using until recently (where $Text is the unmodified article text):

    // The global parser instance; this declaration is needed when the
    // code runs inside a function (assumed here, as in a hook handler).
    global $wgParser;

    // Create new Parser object to deal with some transformations that are
    // required before saving.
    $Parser = new Parser();

    // Use the Parser object to strip out html comments, nowiki and pre tags
    // and whatever other bits shouldn't make it through when rendering (so
    // they don't affect saving).
    $ParserOptions = new ParserOptions();
    $StripState =& $Parser->mStripState;
    $Parser->mOptions = $ParserOptions;
    $TidyText = $Parser->strip($Text, $StripState, true);

    // Then replace any variables, parser functions etc. so that 'hidden' tags
    // (e.g. tags that are created by code, such as using the ExpandAfter
    // extension) are expanded properly for saving.
    $Parser->mFunctionHooks = $wgParser->mFunctionHooks;
    $Parser->mTitle =& $wgParser->mTitle;
    $TidyText = $Parser->replaceVariables($TidyText);

However, I was recently testing this on MW 1.12, and it gives the following error:

    Fatal error: Call to a member function matchStartToEnd() on a non-object in Parser.php on line 2771

I fixed this by inserting the following two lines just before the second $TidyText = ...

    $Parser->mVariables =& $wgParser->mVariables;
    $Parser->mOutput =& $wgParser->mOutput;

Now, it is clear to me that this is the wrong way of going about this - I shouldn't have to mess with the internals of the parser object just to pre-process the text, as it will clearly break whenever the parser object is updated!

Can someone tell me the correct, forward-compatible way to pre-process article text in this manner?

- Mark Clements (HappyDog).

Local wikipedia server configuration

by Bilal Abdul Kader

Greetings,

I am trying to set up a local box to run as a local Wikipedia server. It would have Ubuntu 8.04 with PHP5 and MySQL 5. I would like to replicate all of en.wikipedia, including the history and talk pages. This would be part of academic research on the quality of Wikipedia as an information source.

I need your input on the hardware configuration, please:

- RAM: 16 GB or 32 GB?
- CPU: dual quad-core (Xeon 2 GHz), one quad-core, or one dual-core?
- I intended to have 4-5 TB of storage space. How much space do you think I need for temporary tables? Is 300 GB enough, or should I get more? The full database might be around 600 GB uncompressed.

Only one user is going to run queries on this server, and some queries might span several entire tables.

Thanks in advance for all your suggestions.

bilal

PHP 5.2.7RC5 Testing

by Ilia Alshanetsky

Hello!

You are receiving this email because your project has been selected to take part in a new effort by the PHP QA Team to make sure that your project still works with PHP versions to-be-released. With this we hope to make sure that you are either aware of things that might break, or that we don't introduce any strange regressions. With this effort we hope to build a better relationship between the PHP Team and the major projects.

If you do not want to receive these heads-up emails, please reply to me personally and I will remove you from the list; but we hope that you want to actively help us make PHP a better and more stable tool.

The fifth & final (for the second time now ;-)) release candidate of PHP 5.2.7 was just released and can be downloaded from http://downloads.php.net/ilia/ . Please try this release candidate against your code and let us know of any regressions you find. Since the previous release candidate, a few memory-related issues were addressed; hopefully we are finally at the conclusion of the release cycle. The goal is to have 5.2.7 out by the end of next week, so timely testing would be extremely helpful.

In case you think that other projects should also receive these kinds of emails, please let me know privately, and I will add them to the list of projects to contact.

Best Regards,
Ilia Alshanetsky
5.2 Release Master

autolink revision numbers

by Jack Bates

What is the easiest way to automatically link the pattern /r(\d+)/ to a URL like http://code.google.com/p/qubit-toolkit/source/detail?r=\1 wherever it appears in MediaWiki articles?

For example, I want an article containing:

    ... blah blah r1632 blah blah ...

to have r1632 automatically replaced with a link to http://code.google.com/p/qubit-toolkit/source/detail?r=1632
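The text transformation itself is a single regular-expression substitution. This Python sketch rewrites each rNNNN into MediaWiki external-link syntax, [URL label]. The base URL is the one from the question; the function name is hypothetical, and wiring it into a MediaWiki parser hook is not shown. It adds word boundaries (\b) that the original /r(\d+)/ lacks, so words such as "for12" are left alone:

```python
import re

# Base URL from the question; the revision number is appended to it.
REVISION_URL = "http://code.google.com/p/qubit-toolkit/source/detail?r="

# \b on both sides keeps the pattern from firing inside words like "for12" or "r12b".
REVISION_PATTERN = re.compile(r"\br(\d+)\b")


def autolink_revisions(wikitext: str) -> str:
    """Replace each rNNNN with a MediaWiki external link labelled rNNNN."""
    return REVISION_PATTERN.sub(
        lambda m: f"[{REVISION_URL}{m.group(1)} r{m.group(1)}]", wikitext
    )


print(autolink_revisions("blah blah r1632 blah blah"))
# blah blah [http://code.google.com/p/qubit-toolkit/source/detail?r=1632 r1632] blah blah
```

As written, it would also rewrite matches inside existing links or <nowiki> sections, so a real extension would need to run at a point in parsing where those are already protected.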
