Eden Akhavi wrote:
>If we are talking Europe, I think the key here is to consider where the
>traffic comes from and who has good connectivity to that audience.
>
>
[analysis of network traffic by country snipped]
I'm not sure it matters much for our goals which network we're on,
so long as it's a decent one. If the bottleneck in Wikipedia's
performance were network latency, we'd be in pretty good shape.
-Mark
Folks,
Is there a way for a specific user (not a sysop or developer)
to set their own accesskey-xxx settings, or to disable accesskeys in
MediaWiki pages altogether?
--
John Fader
Historically, there are several problems with the cur & old SQL dumps
we've provided:
* It's hard to import a dump from MySQL into another DBMS like
PostgreSQL without some filtering.
* Reading the SQL dumps directly into external tools, like Erik Zachte's
statistics scripts, is a pain in the butt.
* Compressed text storage makes it even harder to use the dumps outside
of MediaWiki, or even in MediaWiki itself if PHP is missing the zlib
module.
The new schema in MediaWiki 1.5 exacerbates these:
* No easy way to get only current revisions from raw table dumps without
downloading a lot of extra junk.
* Some text may not even *be* there -- conversion will add references to
the leftover cur table, and we expect to start storing bulk text outside
the database entirely at some point.
* Deleted text isn't automatically removed from the text table, so a raw
dump is not safe for distribution.
For some time I've planned to replace the cur+old dumps with a dump
using the Special:Export XML stream format[1]. This is a simple XML
wrapper around the page/revision model which reflects the data in our
schema without being tied to the actual table layout.
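To give a rough idea, a page record in the dump looks something like
this (the ids and exact schema version here are illustrative; see the
note below about id numbers):
  <mediawiki xmlns="http://www.mediawiki.org/xml/export-0.1/">
    <page>
      <title>Sample page</title>
      <id>42</id>
      <revision>
        <id>1001</id>
        <timestamp>2005-05-01T12:34:56Z</timestamp>
        <contributor>
          <username>SomeUser</username>
          <id>7</id>
        </contributor>
        <comment>sample edit summary</comment>
        <text>wiki markup goes here</text>
      </revision>
      <!-- a full-history dump carries one <revision> per edit -->
    </page>
  </mediawiki>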
This should be more accessible to external tools: most programming
language environments provide a stream-friendly XML parser and can
easily slurp in and process the data without a lot of hacking.
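As a sketch, here's one way to do that with PHP's bundled expat
bindings -- this just streams through a dump and prints each page
title (the filename and element names are assumed as above):
  <?php
  # Sketch: stream-parse a dump and print every page <title>.
  $inTitle = false;
  $title = '';
  function startElement( $parser, $name, $attrs ) {
      global $inTitle, $title;
      if ( $name == 'title' ) { $inTitle = true; $title = ''; }
  }
  function endElement( $parser, $name ) {
      global $inTitle, $title;
      if ( $name == 'title' ) { $inTitle = false; print "$title\n"; }
  }
  function characterData( $parser, $data ) {
      global $inTitle, $title;
      if ( $inTitle ) { $title .= $data; }
  }
  $parser = xml_parser_create();
  # Keep element names as-is instead of expat's default uppercasing.
  xml_parser_set_option( $parser, XML_OPTION_CASE_FOLDING, false );
  xml_set_element_handler( $parser, 'startElement', 'endElement' );
  xml_set_character_data_handler( $parser, 'characterData' );
  $fp = fopen( 'pages_current.xml', 'rb' );
  while ( !feof( $fp ) ) {
      # Feed 8K chunks; the last call gets the is-final flag.
      xml_parse( $parser, fread( $fp, 8192 ), feof( $fp ) );
  }
  fclose( $fp );
  xml_parser_free( $parser );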
The XML dump is also smaller: compressed blobs are opaque to the SQL
dumps, but here they're expanded and recompressed more efficiently. In
my testing, the gzipped full XML dump of nl.wikipedia.org is about 1/3
smaller than the gzipped SQL cur+old dumps. (A cur-only dump is about
the same size compressed as the SQL version.)
Dumps can be generated with maintenance/dumpBackup.php, which I've just
checked in. It can create both full-history and current-only dumps.
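For example (option names as currently checked in; they may still
change):
  php maintenance/dumpBackup.php --current > pages_current.xml
  php maintenance/dumpBackup.php --full > pages_full.xml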
I still need to finish up an importer script using the Special:Import
framework. Also, a standalone .xml.gz-to-database importer might be a
useful tool / demo / exercise.
[1] http://meta.wikimedia.org/wiki/Help:Export
Note that final dumps will include page, revision, and user id numbers
which are not reflected in the old sample markup at that page.
-- brion vibber (brion @ pobox.com)
On Sat, May 07, 2005 at 01:28:23AM +0200, Eden Akhavi wrote:
> > Seriously, though: I do not understand how setting up
> > colocations in other countries is going to accomplish
> > anything that putting the same hardware into the Florida colo
> > wouldn't already accomplish.
>
> Disaster recovery, power failure (UPS/generator failure), fibre cut, ISP
> going bankrupt leading to lockout, international fibre system failure,
> faster access for non-US customers, etc.
Unless there's data redundancy in these areas (whether by way of
redundant databases or redundant caches), separate locations don't buy
much beyond one additional datacenter that can quickly take over
services if disaster strikes the other. If they're sufficiently
distant from one another, two centers would provide excellent location
redundancy for most purposes.
Data redundancy in more than two locations would be fantastic. That's a
separate issue from the discussions of linguistically segregated
locales, though, as putting the German Wikipedia in Germany and the
Korean Wikipedia in Korea (for instance) would not address data
redundancy one whit.
>
> Florida is not an ideal location either from a US network perspective:
> most of the European landings are in NY, whereas Asian landings are in
> California. From memory I think the Columbus cable system is the only
> transatlantic fibre to land in Florida, which connects to Tarifa in
> Southern Spain; and the Spanish end is not ideal for European
> connectivity.
I'm not sure what difference it makes for a single datacenter, frankly.
It's convenient for it to be in Florida because that's where the offices
are, of course. Additional datacenters might perhaps be best located
near the Eastern Seaboard and the West Coast areas, plus perhaps
something more midwest-ish, just for purposes of covering the bases, if
we want multiple locations -- though cost effectiveness of locations
would need to be considered as well (San Francisco and New York City
would be abysmal choices for purposes of cost). I don't see how being
near international airports and major shipping ports for Asian and
European traffic would really matter too much, though. The beauty of
the Internet is that only your bandwidth really matters when
determining location.
--
Chad Perrin
[ CCD CopyWrite | http://ccd.apotheon.org ]
On Fri, May 06, 2005 at 05:13:17PM +0200, Eden Akhavi wrote:
> > Belnet/Belgium -- 1 rack of space, unlimited bandwidth, they are ready
> > to go Monday, they can do full hands-on, etc., including replacing
> > broken hard drives and so on. They are excited to move
> > forward quickly. In this case, we must supply the hardware. We can
> > either buy hardware (with the German money?) or I can ask someone to buy
> > it for us (see Big Company X, below).
>
> > Amsterdam - a large NGO wants to do a big press announcement when I'm
> > there in Holland at the end of this month. They are providing a set of
> > servers which have already been ordered. I do not know the exact
> > specifications, perhaps someone else can tell me?
>
> If we are talking Europe, I think the key here is to consider where the
> traffic comes from and who has good connectivity to that audience.
> [detailed discussion]
Looking from Germany, the two data centers seem to be well connected.
Traceroute shows direct connections from all major providers, using
peering points in Amsterdam and London. Ping times are in the 20-25ms
range. The ping time to Florida is 130ms. LINX-connected Janet is
about 22ms, quite similar to the two data centers in Benelux.
So from a network POV, I think that the Benelux locations are quite
good.
Regards,
JeLuF
Hi,
I am not sure if this is the right mailing list to post this to, so my
apologies ahead of time if it is not. I have been working hard to get
texvc working correctly on my installation of MediaWiki. It has
compiled correctly and works entirely correctly when used from the
command line. But when it is called from math.php, it does not work
properly: it produces a tex file but nothing else. As best I can tell,
it is not able to run latex, dvips, and convert properly. I am working
on a shared server (FreeBSD and Apache), which might impose some
limitations on shell commands executed from PHP. Any help would be
greatly appreciated.
Thanks
http://meta.wikipedia.org/wiki/Enotif documentation
http://meta.wikimedia.org/wiki/Enotif#Download download
http://test.leuksman.com/index.php/Enotif helpdesk hotpage
The ENotif version for MediaWiki 1.4.4 is basically a maintenance
release based on the current 1.4.4 code. It still uses the old and now
abandoned method "e-mail authentication" for verifying e-mail addresses
(in MediaWiki 1.5, the method has been changed and is called
"confirmation"). The feature can be fully disabled during the
installation procedure, and also afterwards by setting
$wgEmailAuthentication = false in DefaultSettings.php or
LocalSettings.php, as shown below.
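That is, roughly:
  # LocalSettings.php -- switch off e-mail address verification
  $wgEmailAuthentication = false;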
All newer ENotif versions for MediaWiki 1.5.x use the better "EConfirm"
method, wherein a token is mailed to the user.
T.
===In the spirit of redundancy and availability===
A project suggestion:
Develop a completely separate and redundant project to maintain our
own fast, quickly-updated static mirror of Wikipedia content. We
could redirect visitors to this mirror, via DNS if necessary, in case
of any real catastrophe; it would also be useful when the primary site
is slow.
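For instance (a sketch only; names, addresses, and TTLs here are made
up), a short TTL on the main record would let us switch over to the
mirror by hand within minutes:
  ; zone file fragment -- 300-second TTL for quick manual failover
  www    300    IN    A    192.0.2.10      ; primary cluster
  ;www   300    IN    A    198.51.100.20   ; uncomment to point at mirror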
Advantages: it's redundant. Improves catastrophe protection.
Improves availability of data and discussions stored on-wiki. Further
search improvements? '''Could be handled entirely separately from core
cluster work; offloaded onto non-core devs.'''
Disadvantages: it's redundant. More work.
===In the spirit of failsafes and backups===
A system to periodically store entire database snapshots (every
month?), to recover from subtle, undetected database corruption. (My
impression is that this is not done already.)
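A rough sketch of what that could look like with stock MySQL tools
(database name and paths made up):
  # crontab entry: full logical dump at 04:00 on the 1st of each month
  0 4 1 * * mysqldump --single-transaction wikidb | gzip > /backups/wikidb-$(date +\%Y\%m).sql.gz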
--
+sj+