For those not spending 24hrs/day in #mediawiki, here are some updates:
The file cache has been turned off. It was largely redundant, since the
squids handle upstream caching, and it caused congestion on the NFS
server. Moving the file cache onto each server's local disk would be
possible but so far doesn't look like it'd be worth the effort.
Output compression has been enabled globally for browsers that support
it. (Formerly this was special-cased in the file cache as a side effect
of compressing the cached pages for disk savings.) Some browsers may be
briefly confused during the switchover if they get a 304 response for a
page that is _now_ compressed but wasn't before. Mozilla is known to
have such problems. Close the browser and restart; that will probably
clear it up. If not, give a shout.
The additional compression doesn't seem to hurt load on the servers,
and it'll cut bandwidth usage further.
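If anyone wants to verify the compression from outside, here's a quick
sketch (Python; the page URL is just an example, and the output depends
on the server's current configuration):

    # Ask for gzip explicitly and report what Content-Encoding comes back.
    import urllib.request

    req = urllib.request.Request(
        "http://en.wikipedia.org/wiki/Main_Page",
        headers={"Accept-Encoding": "gzip", "User-Agent": "compression-check/0.1"},
    )
    with urllib.request.urlopen(req) as resp:
        # "gzip" means the output compression kicked in for this request.
        print("Content-Encoding:", resp.headers.get("Content-Encoding", "(none)"))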
-- brion vibber (brion @ pobox.com)
So far, just today, he has reverted [[DNA]] (again) to his version of the
article and then protected his version without adding {{msg:protected}} or
listing it at [[Wikipedia:Protected page]]. When I updated the summary at
[[Wikipedia:Requests for comment/168]] to reflect this, 168 protected that
page, and when the protection was lifted by another admin he DELETED the page
(10 times so far) and blanked it a couple of times.
Please see the summary at
http://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment/168 (if it still
exists).
I hereby request that 168...'s sysop status be removed and for the matter to
go directly to the arbitration committee. Since I am on the arbitration
committee and a party to this dispute I will recuse myself.
-- Daniel Mayer (aka mav)
Almost all traffic is hitting browne now (coronelli isn't in the DNS table
for the main domains); things work quite well so far.
Live stats of browne & coronelli, set up by jeronmim via SNMP, are online
at http://wmperf.mine.nu:8043/wmperf/index.org.wikimedia.all-squids.html.
You can see the cache grow...
Cheers
--
Gabriel Wicke
I'm heading to San Diego tomorrow (Friday the 13th) to remove the
servers that are now out of use (most notably Gunther and Geoffrin).
I imagine I'll be in contact with Brion before I pull the machines, but
feel free to pipe up if you have concerns...
--
"Jason C. Richey" <jasonr(a)bomis.com>
> While trying to retrieve the URL: http://de.wikipedia.org/wiki/Hauptseite
>
> The following error was encountered:
> Access Denied.
> Access control configuration prevents your request from being allowed at
> this time. Please contact your service provider if you feel this is incorrect.
> Your cache administrator is webmaster.
<span style="AYBABTU">
What happen?
</span>
Nils.
--
Created by 100 monkeys with 100 typewriters.
Jason, could you update the DNS entries for the various *.wikipedia.org
web servers to point at the new cache servers?
Ideally we should have two records so it'll hit both squids round-robin:
207.142.131.235
207.142.131.236
These are virtual addresses aliased by browne and coronelli. For
testing or failover, one machine can take on both addresses.
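Once the records are changed, a quick sanity check could look like this
sketch (Python; the hostname is a placeholder for whichever domain is
switched first):

    # List the A records returned for a hostname to confirm both squids show up.
    import socket

    infos = socket.getaddrinfo("en.wikipedia.org", 80, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    print(addresses)  # expect 207.142.131.235 and 207.142.131.236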
-- brion vibber (brion @ pobox.com)
I've asked Jason to setup a wiktionary-l. I don't know if he has yet,
but when he has, I intend for there to be a big notice posted on
wiktionary.org, on the wikipedia-l and wikien-l mailing lists, as
well as on Wikipedia itself, inviting a sort of "global summit
meeting" to discuss some of the things that I outline below.
Wiktionary has been cranking along happily in a state of technical
neglect for quite some time.
There are currently 32,246 entries. That's enough that we must
preserve the work that's already done. It also precludes any change
of license; whether that's fortunate or unfortunate, I don't know.
There is an active community there, with a lot of overlap with the
broader Wikipedia community. They need to be consulted on any changes
that we implement.
They have an existing schema whereby they are doing in freeform text
just what we ought to try to help them formalize with actual database
functionality. One can only assume that their scheme is sometimes
followed inconsistently because human editing is inevitably
inconsistent. However, there appears to me to be enough consistency
that a semi-automated conversion process should be possible.
Anything that we do should favor the needs of editors over abstract a
priori desires for the end product. That is to say, if some fancy and
clever thing requires a lot of work from editors, we just skip it.
The editors are primary, or any wiki community will be destroyed.
At the same time, we should design a "structured wiki" with one eye on
compatibility with re-use. If there are existing XML schemas that
have prominence in the wider community, we should look to them as a
part of our design, even if we deliberately choose not to implement
every possible aspect in order to favor ease-of-use for editors.
Consider this for an example:
http://wiktionary.org/wiki/Vision
As a rank amateur database designer, I see several immediate
possibilities which would make an instant and easy improvement. Even
if we had a simple and less-than-ideal design, we could lay the
groundwork now for something better in the future.
I'm a huge fan of incremental change in cases like this. We'd like to
improve the software for the wiktionarians in a way that conforms to
how they like to edit, while laying the groundwork for further
revisions down the line.
--------
Consider a really bad database design, a 'flat file' design, or nearly
so.
word
AHD pronunciation
IPA pronunciation
SAMPA pronunciation
definition
synonym list
related terms list
translation list
This is a horrible design, with multi-valued fields, etc. It can be
improved in just a few minutes of work. But even this horrible design
would be better than freeform text.
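To make that concrete, here's a rough sketch of such a record in Python
(the field names mirror the list above; the example values are invented):

    # The flat design above as a single record type, multi-valued fields as lists.
    from dataclasses import dataclass, field

    @dataclass
    class Entry:
        word: str
        ahd_pronunciation: str = ""
        ipa_pronunciation: str = ""
        sampa_pronunciation: str = ""
        definition: str = ""
        synonyms: list = field(default_factory=list)
        related_terms: list = field(default_factory=list)
        translations: list = field(default_factory=list)

    # Invented example entry, just to show the shape of the data.
    vision = Entry(word="vision",
                   definition="the faculty of sight",
                   synonyms=["sight", "eyesight"],
                   translations=["de: Sehvermoegen", "fr: vision"])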
Developer time and energy is at a premium (at least, until some clever
developer really takes this up as a cause!) and so simplicity is a
huge virtue. A little bit of fixing done soon is better than an
imagined hypothetical perfect system that's too intimidating and never
gets off the ground.
--Jimbo
Here's the current rundown:
In California:
* larousse is running a squid proxy to the web servers in Florida
* gunther is making a backup dump
* ursula is not doing much of anything
* geoffrin, pliny and susan are very sad, wishing someone would take
away their pain
In Florida:
* suda is running a master database
* zwinger is serving mail and is the main fileserver for the other
machines
* bart, bayle, moreri, and vincent are running identically-configured
apaches. They share their work directories by NFS, so uploads etc.
should stay in sync. The squid picks between them round-robin.
Right now only bart is running a memcached, but it should work to split
the cache over them all, and they'll ask each other for the bits they
don't have (see the sketch after this list).
* browne and coronelli have test squid installations that Gabriel is
setting up, testing failover systems etc.
* isidore isn't doing anything just yet but should soon start syncing
over a backup of zwinger's stuff.
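As referenced in the apache item above, the usual way to split one cache
over several memcached instances is for every client to hash each key to
the same server, so each apache knows exactly which machine to ask for a
given bit. A rough sketch of that mapping (the port and hash choice are
assumptions, not the live configuration):

    # Every apache maps a given key to the same memcached host, so the cache
    # is split across the machines rather than duplicated on each one.
    import hashlib

    servers = ["bart:11211", "bayle:11211", "moreri:11211", "vincent:11211"]

    def server_for(key: str) -> str:
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return servers[int(digest, 16) % len(servers)]

    # Example key, invented for illustration.
    print(server_for("enwiki:parsercache:DNA"))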
So far things seem mostly in order. Zwinger is occasionally freezing up
for a couple of seconds; it may be something that does too much IO. Trying
to track this down.
Eventually we'll want to move the squids over to florida, which means
changing DNS. This'll cut out the 80ms round-trip across the continent
for every hit, as well as give us bigger beefier caches.
-- brion vibber (brion @ pobox.com)
"Axel Boldt" <axelboldt(a)yahoo.com> schrieb:
> Do we forbid certain spiders access to the site based on User-Agent? A
> user in a German forum reported recently that he couldn't access
> Wikipedia at all, always receiving a "Forbidden" message. It turned out
> that his webwasher proxy was to blame (an ad banner blocker). The proxy
> sends the User-Agent
>
> "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt) WebWasher 3.0"
>
> Webwasher cannot be used to spider and download sites.
We forbid spiders based on User-Agent, but WebWasher seems not to be
in the list. According to http://www.wikipedia.org/robots.txt, the
following User-Agents are disallowed:
UbiCrawler
DOC
Zao
sitecheck.internetseer.com
Zealbot
MSIECrawler
SiteSnagger
WebStripper
WebCopier
Fetch
Offline Explorer
Teleport
TeleportPro
WebZIP
linko
HTTrack
Microsoft.URL.Control
Xenu
larbin
libwww
ZyBORG
Download Ninja
wget
grub-client
k2spider
NPBot
HTTrack
Furthermore, I know that any request without a User-Agent is refused.
There might be others, but someone who knows more about it than I do
should check that.
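robots.txt is only advisory, so it can't itself produce a "Forbidden"
(that would come from the server or squid configuration), but to check
which User-Agents it disallows, something like this sketch works (the
agent strings are just examples):

    # Check whether robots.txt disallows a few User-Agent tokens.
    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser("http://www.wikipedia.org/robots.txt")
    rp.read()
    for agent in ("WebWasher", "wget", "MSIECrawler"):
        # True means robots.txt does not forbid this agent from fetching the front page.
        print(agent, rp.can_fetch(agent, "http://www.wikipedia.org/"))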
Andre Engels
brion wrote:
>I can access pages on de.wikipedia.org using the above user-agent
>string, but don't have WebWasher to test with.
I'm using WebWasher and haven't had any problems with it yet. Maybe
the user misconfigured his custom WebWasher filter rules, but that
wouldn't give a "Forbidden" message; it would give something like
"WebWasher is configured to block the requested page
'http://de.wikipedia.org/'".
Daniel