Hi everyone,
I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution, but nothing apparent turned up.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
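For what it's worth, the extraction half looks doable with a short script
against api.php (a rough sketch only; it assumes the wiki has api.php
enabled, and the API path and page title below are guesses). It's the
wikitext-to-LaTeX conversion step that I haven't found a tool for.

# Rough sketch: fetch the raw wikitext of one page via api.php.
# The API path and the page title are placeholders, not verified values.
import json
import urllib.parse
import urllib.request

API = "http://server.bluewatersys.com/w90n740/api.php"  # guessed path

def fetch_wikitext(title):
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    })
    with urllib.request.urlopen(API + "?" + params) as resp:
        data = json.load(resp)
    page = next(iter(data["query"]["pages"].values()))
    return page["revisions"][0]["*"]

wikitext = fetch_wikitext("Main Page")   # placeholder title
# ...the wikitext -> LaTeX conversion would go here...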
Kind Regards,
Hugo Vincent,
Bluewater Systems.
I've been putting placeholder images on a lot of articles on en:wp,
e.g. [[Image:Replace this image male.svg]], which goes to
[[Wikipedia:Fromowner]], which asks people to upload an image if they
own one.
I know it's inspired people to add free content images to articles in
several cases. What I'm interested in is numbers. So what I'd need is
a list of edits where one of the SVGs that redirects to
[[Wikipedia:Fromowner]] is replaced with an image. (Checking which of
those are actually free images can come next.)
Is there a tolerably easy way to get this info from a dump? Any
Wikipedia statistics fans who think this'd be easy?
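For the dump side, something like this is roughly what I have in mind (a
sketch only: it streams a pages-meta-history dump and flags revisions where
a "Replace this image" placeholder disappears and some other [[Image:...]]
link appears; the export namespace and the exact placeholder titles would
need checking against the real dump):

# Rough sketch: scan a pages-meta-history XML dump for edits where a
# "Replace this image ..." placeholder disappears and another
# [[Image:...]] appears.  The export namespace and placeholder names
# below are assumptions and would need checking against the real dump.
import bz2
import re
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.3/}"
PLACEHOLDER = re.compile(r"\[\[Image:Replace this image[^\]]*\]\]", re.I)
ANY_IMAGE = re.compile(r"\[\[Image:[^\]]+\]\]", re.I)

def placeholder_replacements(dump_path):
    prev_text, prev_title = "", None
    for event, elem in ET.iterparse(bz2.BZ2File(dump_path)):
        if elem.tag == NS + "title":
            prev_title, prev_text = elem.text, ""
        elif elem.tag == NS + "revision":
            text = elem.findtext(NS + "text") or ""
            if PLACEHOLDER.search(prev_text) and not PLACEHOLDER.search(text) \
                    and ANY_IMAGE.search(text):
                yield prev_title, elem.findtext(NS + "id")
            prev_text = text
            elem.clear()   # keep memory bounded

for title, rev_id in placeholder_replacements("enwiki-pages-meta-history.xml.bz2"):
    print(title, rev_id)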
(If the placeholders do work, then it'd also be useful for convincing some
wikiprojects to encourage the things. Not that there's ownership of
articles on en:wp, of *course* ...)
- d.
>
> Message: 8
> Date: Fri, 12 Oct 2007 17:59:22 +0200
> From: GerardM <gerard.meijssen(a)gmail.com>
> Subject: Re: [Wikitech-l] Primary account for single user login
>
> Hoi,
> This issue has been decided. Seniority is not fair either; there are
> hundreds if not thousands of users who have done no or only a few edits, and
> I would not consider it fair if a person with, say, over 10,000 edits had to
> defer to these typically inactive users.
1. Yes, it's not fair, but this is a truth about the Wikimedia projects that one
has to accept. Imagine if all Wikimedia sites had had single user login
from the moment they were first established: the person who registered first
would own that username on all Wikimedia sites.
2. A person with fewer edits is not necessarily less active than one
with more edits. And according to
http://en.wikipedia.org/wiki/Wikipedia:Edit_count,
``Edit counts do not necessarily reflect the value of a user's contributions
to the Wikipedia project.''
What if some users have a lower edit count
* because they deliberately edit, preview, edit, and preview an article,
over and over, before submitting the considered version to the Wikimedia
sites, or
* because they edit an article in offline storage, over and over, and submit
only the final version to the Wikimedia sites?
Meanwhile, other users may have a higher edit count
* because they submit many changes without previewing them first, and then
have to correct the hasty edits, over and over;
* because they submit many minor changes separately, over and over, rather
than accumulating the changes into fewer edits;
* because they carry out many bot-like routines by hand, rather than letting
an actual bot do those tasks;
* because they often take part in edit wars;
* because they often take part in arguments on many talk pages.
What if the users with lower edit counts try to increase their edit counts
to take back the status of primary account? They might decide to change
their editing habits to increase their edit counts
* by submitting many edits without careful previewing,
* by splitting accumulated changes into many minor edits and submitting
them separately,
* by stopping their bots and doing those routines by hand,
* by joining edit wars.
3. Given point 2 above, I think a better measure of activeness is the time
between the first edit and the last edit of that username. The formula
would look like this:
activeness = last edit time - first edit time
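As a rough sketch of that measure (the timestamps below are made up, just
written in MediaWiki's YYYYMMDDHHMMSS format):

# Rough sketch: activeness as the span between an account's first and
# last edit, using made-up MediaWiki-style timestamps.
from datetime import datetime

def parse_ts(ts):
    return datetime.strptime(ts, "%Y%m%d%H%M%S")

first_edit = parse_ts("20040213091500")   # hypothetical first edit
last_edit = parse_ts("20071012175922")    # hypothetical last edit

activeness = last_edit - first_edit       # a timedelta
print("active for", activeness.days, "days")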
>
> A choice has been made and, as always, there will be people who will find an
> injustice in it. There were many discussions and a choice was made. It is not
> good to revisit things continuously; it is good to finish things so that
> there is no point to it any more.
>
> Thanks,
> GerardM
>
> On 10/12/07, Anon Sricharoenchai <anon.hui(a)gmail.com> wrote:
> >
> > According to the conflict resolution process, the account with the
> > most edits is selected as the primary account for a username. This
> > may sound reasonable when the username is owned by the same person
> > on all Wikimedia sites.
> >
> > But the problem arises when the same username on those Wikimedia
> > sites is owned by different people and the accounts are actively in use.
> > The active account that was registered first (a seniority rule) should
> > rather be considered the primary account, since I think the person who
> > registered first should own that username on the unified Wikimedia sites.
> >
> > Imagine if the Wikimedia sites had been unified ever since they were
> > first established long ago (so that their accounts had never been
> > separated): the person who registered first would own that username on
> > all of the Wikimedia sites. Anyone who came later would be unable to use
> > the registered username and would have to choose an alternate username.
> > This logic should also apply to the current Wikimedia sites after they
> > have been unified.
> >
Why has Special:Wantedpages not been updated on Wikimedia sites since 3 September? If
it is too expensive to generate on the large wikis (especially enwiki), could it
be re-enabled for the smaller wikis?
Thanks.
Hi,
today we passed 10k HTTP requests per second (even with inter-Squid
traffic excluded). Special thanks to Mark and Tim, who have been
improving our caching, as well as doing lots of other work, and have
achieved incredible results (while I was slacking). Really, thanks!
Domas
Hello
The following problem I have also found on Google, but alas no solution.
When trying to compile texvc on Mac OS X,
I get the following error message:
ld: undefined symbols -sprintf@LDBLStub
which looks like a linking problem.
Can anybody help?
Thanks and regards
Uwe Brauer
The latest pages-meta-current.xml.bz2 in the 20071018 dump is listed as 5.4 GB, but
when I downloaded it, it was only 1.5 GB.
Why is that? Is there a problem with this dump? I saw there was a lot of
discussion about it. What should I do?
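One thing I could try (a sketch; it assumes the download server answers
HEAD requests, and the URL below is only my guess at the file in question)
is to compare the size the server advertises with the size of my local
file, to see whether the download was simply truncated:

# Rough sketch: compare the advertised Content-Length with the local
# file size to check for a truncated download.  The URL and filename
# are placeholders for the actual dump file.
import os
import urllib.request

url = ("http://download.wikimedia.org/enwiki/20071018/"
       "enwiki-20071018-pages-meta-current.xml.bz2")
local = "enwiki-20071018-pages-meta-current.xml.bz2"

req = urllib.request.Request(url, method="HEAD")
with urllib.request.urlopen(req) as resp:
    remote_size = int(resp.headers["Content-Length"])

local_size = os.path.getsize(local)
print("remote:", remote_size, "local:", local_size)
print("looks truncated" if local_size < remote_size else "sizes match")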
Thanks
We seem to have an issue with users editing on our site with Firefox
(reproducible on a Mac, not on Windows).
After posting the edit, the server responds with a redirect:
HTTP/1.x 302 Moved Temporarily
Location: http://www.wikihow.com/Make-Dumplings
FF doesn't seem to reissue a request for this article though, and
shows the old copy of it without going back to the server. I've double
checked this using LiveHTTPHeaders.
This seems odd; does anyone have any idea why this would happen? It
also seems time-dependent: if the user submits an edit
within 30 seconds of loading the article, they get the stale version,
but if they wait longer, they are more likely to get the fresh
version. Other browsers do get the fresh copy if the URL is loaded
separately.
The normal 200 response for the article includes the cache information:
Cache-Control: s-maxage=86400, must-revalidate, max-age=0
but the 302 redirect doesn't include any info about must-revalidate.
Could this be an issue?
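One quick check (a sketch; the URL is the article from the headers above,
and the Age/X-Cache headers will only appear if the cache layer adds them)
would be to fetch the page from outside the browser right after an edit
and look at the cache-related headers, to see whether the stale copy is
coming from the Squid or only from Firefox's own cache:

# Rough sketch: inspect cache-related response headers for the article,
# fetched outside the browser, to separate server-side staleness from
# browser-cache staleness.
import urllib.request

URL = "http://www.wikihow.com/Make-Dumplings"

with urllib.request.urlopen(URL) as resp:
    for name in ("Last-Modified", "Cache-Control", "Age", "X-Cache"):
        print(name, "=", resp.headers.get(name))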
This didn't seem to be an issue before today; we did upgrade to
PHP 5.2.4/eAccelerator 0.9.5.2, and I'm not sure whether that is related.
Thanks,
Travis
If I want to load both the pages-meta-current and the pages-articles dumps, do
I have to create two separate databases? From MWDumper, I
understand that the text, revision, and page tables must be empty: "Hint: The
tables 'page', 'revision' and 'text' must be empty for a successful
import."
Aren't the two dumps supposed to complement each other?
I have an idea for an improvement to the system of redirects, by using
pattern-based aliases. We've discussed it a bit on
wikien-l where it has some support, so I'm posting here to find out:
a) if it's feasible (i.e., not computationally too expensive),
b) how much work is required to implement it,
c) if it were implemented, whether it would be enabled on Wikipedia, and
d) if anyone is interested in actually implementing it. If not, I may have a
go myself.
The problem:
Many pages require a largeish number of
redirects to cope with differences in spelling, optional words,
accented characters, etc. It's a surprising amount of work to create
and maintain these
when the value of each individual redirect is so low. For example,
[[Thomas-François Dalibard]] might be spelt four ways, with each of the three
variants requiring a redirect: Thomas-Francois Dalibard, Thomas François
Dalibard, Thomas Francois Dalibard.
General solution:
Instead of having redirects that point to a page, have the page itself
specify aliases which can be used to find it. This is specified as a
pattern, like a very cut-down regexp: #ALIASES Thomas[-]Fran[ç|c]ois
Dalibard
The proposed syntax would be as follows (but is debatable):
Foo - matches Foo
[Foo] - matches Foo or blank.
[Foo|Moo] - matches Foo or Moo.
[Foo|Moo|] or [|Foo|Moo] - matches Foo or Moo or blank.
Foo\[Moo - matches the literal string Foo[Moo
All whitespace is equivalent to a single space. So "Boo [Foo]
[Moo] Woo" matches "Boo Woo", rather than
"Boo<space><space><space>Woo" for instance.
When a user searches for a term (using "Go"), MediaWiki would perform a
normal query first, and if that fails, do an alias-based search. Thus:
- Search term matches no real pages, no aliases: takes you to some search
results.
- Search term matches one real page, no aliases: takes you to real page.
- Search term matches one real page, some aliases: takes you to real page.
(Arguably gives you a "did you mean...?" banner, but not critical)
- Search term matches one alias, no real page: takes you to page.
- Search term matches several aliases, no real
page: either an automatically generated disambiguation page, or shows you
search results with the matching aliases shown first.
An automatically generated disambiguation page could make use of some other
hypothetical keyword like {{disambig|A 19th century novelist best known for
...}}. So embedding the matching aliases in search results might be simpler,
and would work well if it could be forced to show the first sentence or two
from the article.
Unresolved issues:
* Since pattern matching is prone to abuse, the total number of matching
aliases should be restricted in some way, perhaps to 10 or 20. The best way to
handle an excessively broad pattern (e.g., [A|b|c|d|e][A|b|c|d|e] etc.) is left
as an open question. Possibilities include silently failing, noisily failing
(with an error message in the rendered text), a special page for bad aliases...
* Whether there should just be one #ALIASES statement, or whether multiple
would be allowed. Allowing several would be much more beginner friendly -
they could simply state all the intended redirects explicitly.
* The role of redirects once this system is in place. One possible
implementation would simply create and destroy redirects as required. In any
case, they would still be needed for some licensing issues.
Possible implementation:
Without knowing the MediaWiki DB schema at all, I speculated on a possible
implementation that would be a good tradeoff between size and speed. Two new
tables are needed:
AliasesRaw would contain a constantly updated list of the actual alias
patterns used in articles. Each time an article is saved, this table would
be updated if necessary.
AliasesExpanded would contain expansions of these aliases, either fully or
partially. So an expansion of #ALIASES [City of ][Greater ]Melbourne[,
Victoria| (Australia)] to 5 characters would lead to three rows:
"City ","of [Greater ]Melbourne[, Victoria| (Australia)]"
"Great", "er Melbourne[, Victoria| (Australia)]"
"Melbo", "urne[, Victoria| (Australia)]
This means that if a user searches for "Greater Melbourne", then the search
process would go something like:
- Look for an article called Greater Melbourne, GREATER MELBOURNE, greater
melbourne (as at present) - assume this fails.
- Look up "Great" in the AliasesExpanded table. Now iterate over the
matching results, finding one that matches.
Obviously the number of characters stored in the expanded aliases could be
tuned.
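And a toy version of the table handling itself (again only a sketch: plain
dicts stand in for the AliasesRaw/AliasesExpanded tables, the expander
above is repeated so the snippet stands alone, and for simplicity each
pattern is expanded fully and keyed on its first five characters rather
than stored as a prefix plus an unexpanded remainder):

# Toy AliasesExpanded: key every expanded alias on its first PREFIX_LEN
# characters, then resolve a search term by prefix lookup plus an exact
# case-insensitive comparison.
import itertools
import re
from collections import defaultdict

PREFIX_LEN = 5

def expand(pattern):
    # same toy expander as earlier in this mail
    parts = []
    for literal, group in re.findall(r"([^\[\]]+)|\[([^\]]*)\]", pattern):
        if literal:
            parts.append([literal])
        else:
            alts = group.split("|")
            if len(alts) == 1:
                alts.append("")
            parts.append(alts)
    for combo in itertools.product(*parts):
        yield re.sub(r"\s+", " ", "".join(combo)).strip()

def build_expanded(aliases_raw):
    # aliases_raw maps page title -> list of #ALIASES patterns
    table = defaultdict(list)
    for title, patterns in aliases_raw.items():
        for pattern in patterns:
            for alias in expand(pattern):
                table[alias[:PREFIX_LEN].lower()].append((title, alias))
    return table

def lookup(table, term):
    candidates = table.get(term[:PREFIX_LEN].lower(), [])
    return [title for title, alias in candidates if alias.lower() == term.lower()]

aliases_raw = {"Melbourne":
               ["[City of ][Greater ]Melbourne[, Victoria| (Australia)]"]}
table = build_expanded(aliases_raw)
print(lookup(table, "Greater Melbourne, Victoria"))   # -> ['Melbourne']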
I look forward to any comments,
Steve