Hi everyone,
I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution, but nothing apparent turned up.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
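For what it's worth, the extraction half looks doable with a short script
against api.php (a rough sketch only; it assumes the wiki has api.php
enabled, and the API path and page title below are guesses). It's the
wikitext-to-LaTeX conversion step that I haven't found a tool for.

# Rough sketch: fetch the raw wikitext of one page via api.php.
# The API path and the page title are placeholders, not verified values.
import json
import urllib.parse
import urllib.request

API = "http://server.bluewatersys.com/w90n740/api.php"  # guessed path

def fetch_wikitext(title):
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    })
    with urllib.request.urlopen(API + "?" + params) as resp:
        data = json.load(resp)
    page = next(iter(data["query"]["pages"].values()))
    return page["revisions"][0]["*"]

wikitext = fetch_wikitext("Main Page")   # placeholder title
# ...the wikitext -> LaTeX conversion would go here...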
Kind Regards,
Hugo Vincent,
Bluewater Systems.
I've been putting placeholder images on a lot of articles on en:wp,
e.g. [[Image:Replace this image male.svg]], which goes to
[[Wikipedia:Fromowner]], which asks people to upload an image if they
own one.
I know it's inspired people to add free content images to articles in
several cases. What I'm interested in is numbers. So what I'd need is
a list of edits where one of the SVGs that redirects to
[[Wikipedia:Fromowner]] is replaced with an image. (Checking which of
those are actually free images can come next.)
Is there a tolerably easy way to get this info from a dump? Any
Wikipedia statistics fans who think this'd be easy?
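For the dump side, something like this is roughly what I have in mind (a
sketch only: it streams a pages-meta-history dump and flags revisions where
a "Replace this image" placeholder disappears and some other [[Image:...]]
link appears; the export namespace and the exact placeholder titles would
need checking against the real dump):

# Rough sketch: scan a pages-meta-history XML dump for edits where a
# "Replace this image ..." placeholder disappears and another
# [[Image:...]] appears.  The export namespace and placeholder names
# below are assumptions and would need checking against the real dump.
import bz2
import re
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.3/}"
PLACEHOLDER = re.compile(r"\[\[Image:Replace this image[^\]]*\]\]", re.I)
ANY_IMAGE = re.compile(r"\[\[Image:[^\]]+\]\]", re.I)

def placeholder_replacements(dump_path):
    prev_text, prev_title = "", None
    for event, elem in ET.iterparse(bz2.BZ2File(dump_path)):
        if elem.tag == NS + "title":
            prev_title, prev_text = elem.text, ""
        elif elem.tag == NS + "revision":
            text = elem.findtext(NS + "text") or ""
            if PLACEHOLDER.search(prev_text) and not PLACEHOLDER.search(text) \
                    and ANY_IMAGE.search(text):
                yield prev_title, elem.findtext(NS + "id")
            prev_text = text
            elem.clear()   # keep memory bounded

for title, rev_id in placeholder_replacements("enwiki-pages-meta-history.xml.bz2"):
    print(title, rev_id)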
(If the placeholders do work, then it'd also be useful for convincing some
wikiprojects to encourage the things. Not that there's ownership of
articles on en:wp, of *course* ...)
- d.
>
> Message: 8
> Date: Fri, 12 Oct 2007 17:59:22 +0200
> From: GerardM <gerard.meijssen(a)gmail.com>
> Subject: Re: [Wikitech-l] Primary account for single user login
>
> Hoi,
> This issue has been decided. Seniority is not fair either; there are
> hundreds if not thousands of users who have done no or only a few edits, and
> I would not consider it fair if a person with, say, over 10,000 edits had to
> defer to these typically inactive users.
1. Yes, it's not fair, but this is a truth about the Wikimedia projects that one
has to accept. Imagine if all Wikimedia sites had had single user login
from the moment they were first established: the person who registered first
would own that username on all Wikimedia sites.
2. A person with fewer edits is not necessarily less active than one
with more edits. And according to
http://en.wikipedia.org/wiki/Wikipedia:Edit_count,
``Edit counts do not necessarily reflect the value of a user's contributions
to the Wikipedia project.''
What if some users have a lower edit count
* because they deliberately edit, preview, edit, and preview an article,
over and over, before submitting the considered version to the Wikimedia
sites, or
* because they edit an article in offline storage, over and over, and submit
only the final version to the Wikimedia sites?
Meanwhile, other users may have a higher edit count
* because they submit many changes without previewing them first, and then
have to correct the hasty edits, over and over;
* because they submit many minor changes separately, over and over, rather
than accumulating the changes into fewer edits;
* because they carry out many bot-like routines by hand, rather than letting
an actual bot do those tasks;
* because they often take part in edit wars;
* because they often take part in arguments on many talk pages.
What if the users with lower edit counts try to increase their edit counts
to take back the status of primary account? They might decide to change
their editing habits to increase their edit counts
* by submitting many edits without careful previewing,
* by splitting accumulated changes into many minor edits and submitting
them separately,
* by stopping their bots and doing those routines by hand,
* by joining edit wars.
3. Given point 2 above, I think a better measure of activeness is the time
between the first edit and the last edit of that username. The formula
would look like this:
activeness = last edit time - first edit time
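As a rough sketch of that measure (the timestamps below are made up, just
written in MediaWiki's YYYYMMDDHHMMSS format):

# Rough sketch: activeness as the span between an account's first and
# last edit, using made-up MediaWiki-style timestamps.
from datetime import datetime

def parse_ts(ts):
    return datetime.strptime(ts, "%Y%m%d%H%M%S")

first_edit = parse_ts("20040213091500")   # hypothetical first edit
last_edit = parse_ts("20071012175922")    # hypothetical last edit

activeness = last_edit - first_edit       # a timedelta
print("active for", activeness.days, "days")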
>
> A choice has been made and, as always, there will be people who will find an
> injustice in it. There were many discussions and a choice was made. It is not
> good to revisit things continuously; it is good to finish things so that
> there is no point to it any more.
>
> Thanks,
> GerardM
>
> On 10/12/07, Anon Sricharoenchai <anon.hui(a)gmail.com> wrote:
> >
> > According to the conflict resolution process, the account with the
> > most edits is selected as the primary account for a username. This
> > may sound reasonable when the username is owned by the same person
> > on all Wikimedia sites.
> >
> > But the problem arises when the same username on those Wikimedia
> > sites is owned by different people and the accounts are actively in use.
> > The active account that was registered first (a seniority rule) should
> > rather be considered the primary account, since I think the person who
> > registered first should own that username on the unified Wikimedia sites.
> >
> > Imagine if the Wikimedia sites had been unified ever since they were
> > first established long ago (so that their accounts had never been
> > separated): the person who registered first would own that username on
> > all of the Wikimedia sites. Anyone who came later would be unable to use
> > the registered username and would have to choose an alternate username.
> > This logic should also apply to the current Wikimedia sites after they
> > have been unified.
> >
Why has Special:Wantedpages not been updated on Wikimedia sites since 3 September? If
it is too expensive to generate on the large wikis (especially enwiki), could it
be re-enabled for the smaller wikis?
Thanks.
Hi,
today we passed 10k HTTP requests per second (even with inter-Squid
traffic excluded). Special thanks to Mark and Tim, who have been
improving our caching, as well as doing lots of other work, and have
achieved incredible results (while I was slacking). Really, thanks!
Domas
Hello
The following problem I have also found on Google, but alas no solution.
When trying to compile texvc on Mac OS X,
I get the following error message:
ld: undefined symbols -sprintf@LDBLStub
which looks like a linking problem.
Can anybody help?
Thanks and regards
Uwe Brauer
The latest pages-meta-current.xml.bz2 in the 20071018 dump is listed as 5.4 GB, but
when I downloaded it, it was only 1.5 GB.
Why is that? Is there a problem with this dump? I saw there was a lot of
discussion about it. What should I do?
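One thing I could try (a sketch; it assumes the download server answers
HEAD requests, and the URL below is only my guess at the file in question)
is to compare the size the server advertises with the size of my local
file, to see whether the download was simply truncated:

# Rough sketch: compare the advertised Content-Length with the local
# file size to check for a truncated download.  The URL and filename
# are placeholders for the actual dump file.
import os
import urllib.request

url = ("http://download.wikimedia.org/enwiki/20071018/"
       "enwiki-20071018-pages-meta-current.xml.bz2")
local = "enwiki-20071018-pages-meta-current.xml.bz2"

req = urllib.request.Request(url, method="HEAD")
with urllib.request.urlopen(req) as resp:
    remote_size = int(resp.headers["Content-Length"])

local_size = os.path.getsize(local)
print("remote:", remote_size, "local:", local_size)
print("looks truncated" if local_size < remote_size else "sizes match")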
Thanks
We seem to have an issue with users editing on our site with Firefox
(reproducible on a Mac, not on Windows).
After posting the edit, the server responds with a redirect:
HTTP/1.x 302 Moved Temporarily
Location: http://www.wikihow.com/Make-Dumplings
FF doesn't seem to reissue a request for this article though, and
shows the old copy of it without going back to the server. I've double
checked this using LiveHTTPHeaders.
This seems odd; does anyone have any idea why this would happen? It
also seems time-dependent: if the user submits an edit
within 30 seconds of loading the article, they get the stale version,
but if they wait longer, they are more likely to get the fresh
version. Other browsers do get the fresh copy if the URL is loaded
separately.
The normal 200 response for the article includes the cache information:
Cache-Control: s-maxage=86400, must-revalidate, max-age=0
but the 302 redirect doesn't include any info about must-revalidate.
Could this be an issue?
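One quick check (a sketch; the URL is the article from the headers above,
and the Age/X-Cache headers will only appear if the cache layer adds them)
would be to fetch the page from outside the browser right after an edit
and look at the cache-related headers, to see whether the stale copy is
coming from the Squid or only from Firefox's own cache:

# Rough sketch: inspect cache-related response headers for the article,
# fetched outside the browser, to separate server-side staleness from
# browser-cache staleness.
import urllib.request

URL = "http://www.wikihow.com/Make-Dumplings"

with urllib.request.urlopen(URL) as resp:
    for name in ("Last-Modified", "Cache-Control", "Age", "X-Cache"):
        print(name, "=", resp.headers.get(name))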
This didn't seem to be an issue before today; we did upgrade to
PHP 5.2.4/eAccelerator 0.9.5.2, and I'm not sure whether that is related.
Thanks,
Travis
If I want to load both the pages-meta-current and the pages-articles dumps, do
I have to create two separate databases? From MWDumper, I
understand that the text, revision, and page tables must be empty: "Hint: The
tables 'page', 'revision' and 'text' must be empty for a successful
import."
Aren't the two dumps supposed to complement each other?
I have an idea for an improvement to the system of redirects, by using
pattern-based aliases. We've discussed it a bit on
wikien-l where it has some support, so I'm posting here to find out:
a) if it's feasible (i.e., not computationally too expensive),
b) how much work is required to implement it,
c) if it were implemented, whether it would be enabled on Wikipedia, and
d) if anyone is interested in actually implementing it. If not, I may have a
go myself.
The problem:
Many pages require a largeish number of
redirects to cope with differences in spelling, optional words,
accented characters, etc. It's a surprising amount of work to create
and maintain these
when the value of each individual redirect is so low. For example,
[[Thomas-François Dalibard]] might be spelt four ways, with each of the three
variants requiring a redirect: Thomas-Francois Dalibard, Thomas François
Dalibard, Thomas Francois Dalibard.
General solution:
Instead of having redirects that point to a page, have the page itself
specify aliases which can be used to find it. This is specified as a
pattern, like a very cut-down regexp: #ALIASES Thomas[-]Fran[ç|c]ois
Dalibard
The proposed syntax would be as follows (but is debatable):
Foo - matches Foo
[Foo] - matches Foo or blank.
[Foo|Moo] - matches Foo or Moo.
[Foo|Moo|] or [|Foo|Moo] - matches Foo or Moo or blank.
Foo\[Moo - matches the literal string Foo[Moo
All whitespace is equivalent to a single space. So "Boo [Foo]
[Moo] Woo" matches "Boo Woo", rather than
"Boo<space><space><space>Woo" for instance.
When a user searches for a term (using "Go"), MediaWiki would perform a
normal query first, and if that fails, do an alias-based search. Thus:
- Search term matches no real pages, no aliases: takes you to some search
results.
- Search term matches one real page, no aliases: takes you to real page.
- Search term matches one real page, some aliases: takes you to real page.
(Arguably gives you a "did you mean...?" banner, but not critical)
- Search term matches one alias, no real page: takes you to page.
- Search term matches several aliases, no real
page: either an automatically generated disambiguation page, or shows you
search results with the matching aliases shown first.
An automatically generated disambiguation page could make use of some other
hypothetical keyword like {{disambig|A 19th century novelist best known for
...}}. So embedding the matching aliases in search results might be simpler,
and would work well if it could be forced to show the first sentence or two
from the article.
Unresolved issues:
* Since pattern matching is prone to abuse, the total number of matching
aliases should be restricted in some way, perhaps to 10 or 20. The best way to
handle an excessively broad pattern (e.g., [A|b|c|d|e][A|b|c|d|e] etc.) is left
as an open question. Possibilities include silently failing, noisily failing
(with an error message in the rendered text), a special page for bad aliases...
* Whether there should just be one #ALIASES statement, or whether multiple
would be allowed. Allowing several would be much more beginner friendly -
they could simply state all the intended redirects explicitly.
* The role of redirects once this system is in place. One possible
implementation would simply create and destroy redirects as required. In any
case, they would still be needed for some licensing issues.
Possible implementation:
Without knowing the MediaWiki DB schema at all, I speculated on a possible
implementation that would be a good tradeoff between size and speed. Two new
tables are needed:
AliasesRaw would contain a constantly updated list of the actual alias
patterns used in articles. Each time an article is saved, this table would
be updated if necessary.
AliasesExpanded would contain expansions of these aliases, either fully or
partially. So an expansion of #ALIASES [City of ][Greater ]Melbourne[,
Victoria| (Australia)] to 5 characters would lead to three rows:
"City ","of [Greater ]Melbourne[, Victoria| (Australia)]"
"Great", "er Melbourne[, Victoria| (Australia)]"
"Melbo", "urne[, Victoria| (Australia)]
This means that if a user searches for "Greater Melbourne", then the search
process would go something like:
- Look for an article called Greater Melbourne, GREATER MELBOURNE, greater
melbourne (as at present) - assume this fails.
- Look up "Great" in the AliasesExpanded table. Now iterate over the
matching results, finding one that matches.
Obviously the number of characters stored in the expanded aliases could be
tuned.
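And a toy version of the table handling itself (again only a sketch: plain
dicts stand in for the AliasesRaw/AliasesExpanded tables, the expander
above is repeated so the snippet stands alone, and for simplicity each
pattern is expanded fully and keyed on its first five characters rather
than stored as a prefix plus an unexpanded remainder):

# Toy AliasesExpanded: key every expanded alias on its first PREFIX_LEN
# characters, then resolve a search term by prefix lookup plus an exact
# case-insensitive comparison.
import itertools
import re
from collections import defaultdict

PREFIX_LEN = 5

def expand(pattern):
    # same toy expander as earlier in this mail
    parts = []
    for literal, group in re.findall(r"([^\[\]]+)|\[([^\]]*)\]", pattern):
        if literal:
            parts.append([literal])
        else:
            alts = group.split("|")
            if len(alts) == 1:
                alts.append("")
            parts.append(alts)
    for combo in itertools.product(*parts):
        yield re.sub(r"\s+", " ", "".join(combo)).strip()

def build_expanded(aliases_raw):
    # aliases_raw maps page title -> list of #ALIASES patterns
    table = defaultdict(list)
    for title, patterns in aliases_raw.items():
        for pattern in patterns:
            for alias in expand(pattern):
                table[alias[:PREFIX_LEN].lower()].append((title, alias))
    return table

def lookup(table, term):
    candidates = table.get(term[:PREFIX_LEN].lower(), [])
    return [title for title, alias in candidates if alias.lower() == term.lower()]

aliases_raw = {"Melbourne":
               ["[City of ][Greater ]Melbourne[, Victoria| (Australia)]"]}
table = build_expanded(aliases_raw)
print(lookup(table, "Greater Melbourne, Victoria"))   # -> ['Melbourne']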
I look forward to any comments,
Steve