Jimbo, look again:
http://en.wikipedia.org/stats
Dec figure is for 2002
Jan is Jan 2003
Both should not be there.
Obviously Webalizer mixed things up.
Erik Zachte
We should set permanent cookies on every pageview except saves, require
cookies for saving pages, assign random account names (anon2349bx29s) to
anonymous editors, and use cookies rather than IP addresses to block most
users.
We should do away with IP numbers in page histories, recent changes etc.
completely.
We should retain the ability to block by IP in emergencies.
This would address several current problems and have several advantages.
1) Having users' IP numbers published all over the place is quite a
serious privacy violation. It would be trivial to scan recent changes for
hosts with open ports and security vulnerabilities. Furthermore, it
reveals geographic information about anonymous editors which they may want
to keep private (such information can be very specific, depending on the
ISP).
2) Banning anonymous users by IP affects anyone who also uses the same IP.
In case of proxies, this may be thousands of individuals. If the first
message we send a new user - because they share a vandal IP - is "You are
banned from editing for serious vandalism", that user is unlikely to
become a regular contributor. Even regulars are frequently pissed off
because they accidentally get blocked.
3) Banning users by IP is also ineffective, as for most users, it is
trivial to get a new dynamic IP address.
4) For repeat vandals, we can set a very high or unlimited expiry without
fear of blocking someone else.
5) Requiring cookies even for anons allows them to change their user
preferences even without creating an account.
6) We can more easily attribute edits to users, and change anon edits
over to real accounts when people decide to create one. This may also
address some copyright issues.
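As a minimal sketch of how such random anonymous account names could be generated (the "anon" prefix follows the example above; the suffix length and alphabet are my assumptions, not anything specified in the proposal):

```python
import secrets
import string

def make_anon_name(length=9):
    """Generate a random account name for an anonymous editor.

    The "anon" prefix follows the example in the proposal
    (anon2349bx29s); the suffix length and alphabet are assumptions.
    """
    alphabet = string.ascii_lowercase + string.digits
    return "anon" + "".join(secrets.choice(alphabet) for _ in range(length))
```

The generated name would be stored in the permanent cookie and shown in page histories and recent changes in place of the IP address.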
Now, regarding some possible criticisms:
1) "They will just delete the cookie and edit away." Yes, some users will
do that. For these users, we should retain the ability to block by IP
(without revealing that IP address to sysops). However, doing so requires
an understanding of how the blocking mechanism works, which most users
don't have. They will have to know how to *remove* cookies, not just
disable them. The user will have to keep deleting the cookie every time it
is re-blocked. And sysops don't have to be hesitant about blocking them,
because no other users can be affected by it. So we can in fact make this
a single-click operation, making evasion costly for the average vandal
and blocking cheap for us.
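A hypothetical sketch of that save-time check, with the blocked-token set, the function name, and the status strings all invented for illustration:

```python
from typing import Optional, Tuple

# Hypothetical block list: the cookie token, not the IP, is what a
# sysop's one-click block adds here.
BLOCKED_TOKENS = {"anon2349bx29s"}

def may_save(token):
    # type: (Optional[str]) -> Tuple[bool, str]
    """Decide whether a save request carrying this cookie token may proceed."""
    if token is None:
        # Saving requires cookies, so a missing token means "set one first".
        return (False, "cookies-required")
    if token in BLOCKED_TOKENS:
        return (False, "blocked")
    return (True, "ok")
```

Deleting the cookie only gets the vandal a fresh token, which a sysop can add to the set again with one click; no IP ever needs to be shown.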
2) "I have cookies disabled for privacy reasons!" Then you can't be
editing Wikipedia non-anonymously. We already require cookies for signed
in users. Most modern browsers allow enabling cookies on a case-by-case
basis. If a user tries to edit a page without having cookies enabled, we
will let them know that they need to enable them. If you are concerned
about privacy, you should be more concerned about having IP addresses
publicized everywhere, even stored permanently in the page history.
3) "This won't help us to deal with the most egregious vandals." Maybe,
maybe not. A vandal using a script would have to do the same thing as a
malicious user -- get a fresh cookie from a regular pageview, use that
cookie to submit an edit, then discard the cookie. This isn't hard to do,
but I doubt the average kiddie will be able to figure it out. On the other
hand, we can build more extreme anti-vandalism measures on top of this,
like disabling edits by any completely new contributor (= not setting any
new cookies) for a few hours.
All in all, I think this would greatly reduce the time spent on fighting
vandalism, and allow us to focus on more important matters, like creating
an encyclopedia.
Regards,
Erik
Jimbo wrote:
>Did you see my idea of a table like this?
>
>user     page    throttle  expiration
>jwales   DNA     2         (timestamps go here)
>*        Israel  3
>jwales   Turkey  0
>plautus  *       0
>wik      *       3
That's a nice idea, but I'd suggest adding some more precision to the
throttle, similar to the way a cron job works, e.g.,
user    page  throttle.minute  throttle.hour  throttle.day  expiration
jwales  DNA   2                10             25            (timestamp)
wik     *     *                *              3
This would say that user jwales can make up to 2 edits per minute, up
to 10 per hour, and up to 25 per day on the DNA article, while user
wik can only make up to 3 edits per day on all articles combined.
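A minimal in-memory sketch of that check (the table contents come from the example above; the data structures and function names are invented for illustration):

```python
from collections import defaultdict

# Hypothetical throttle table, keyed by (user, page); '*' means "any",
# and a limit of None stands for the '*' cells in the table above.
THROTTLES = {
    ("jwales", "DNA"): {"minute": 2, "hour": 10, "day": 25},
    ("wik", "*"):      {"minute": None, "hour": None, "day": 3},
}
WINDOWS = {"minute": 60.0, "hour": 3600.0, "day": 86400.0}

_edits = defaultdict(list)  # (user, page) -> list of edit timestamps

def _matching_keys(user, page):
    return [k for k in ((user, page), (user, "*"), ("*", page))
            if k in THROTTLES]

def may_edit(user, page, now):
    """Return True if this edit stays within every applicable throttle."""
    for key in _matching_keys(user, page):
        history = _edits[key]
        for name, limit in THROTTLES[key].items():
            if limit is None:
                continue
            recent = sum(1 for t in history if now - t < WINDOWS[name])
            if recent >= limit:
                return False
    return True

def record_edit(user, page, now):
    for key in _matching_keys(user, page):
        _edits[key].append(now)
```

The timestamps themselves play the role of the expiration column: an edit simply stops counting once it falls out of the minute, hour, or day window.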
--Sheldon Rampton
Just in case...
Two days ago, I made an edit on Erik's page, and later I was notified
that in making my edit, I had removed an edit Erik made two hours before
mine. I had no edit conflict. I did not see Erik's edit on his page when
I edited it.
Yesterday, the same thing happened to Patrick with another person's edit,
made just a couple of minutes earlier. He erased it without knowing.
Also, in the past couple of days I have frequently had notifications of
new messages. I then went to my talk page and saw nothing new. I looked
at the history and saw that someone had edited it. I had to reload the
page to see the message appear.
I suppose these three cases are all cache problems, with the cache also
affecting the edit window. But why only recently? Has this been noticed
by others?
PS: this is embarrassing, because some editors may believe the removal
of text was done on purpose :-(
ant
Hello,
I would like to say that we actually have a single point of failure: the DB server.
We only have one dual-Opteron, and geoffrin is not yet back. We 'really' need a backup DB server that could replace the current one. None of the other servers could do that.
Jimbo, if you can, try to move those things along, or think about getting another dual-Opteron.
Thanks
Shaihulud
I've checked some changes to the link tables into the HEAD branch. They
now all use a key on cur_id for the *_from column instead of strings,
and have a unique index to prevent any duplicate entries. There's no
clean upgrade step in the update script yet, so just clear out your
links tables (patch-linktables.sql) and rebuild them with refreshLinks.php.
This saves trouble in a number of places where we can now do joins with
the link tables to get other info (such as cur_is_redirect!) as well as
the name, and fewer bits need to be juggled on page renaming, as
outgoing links no longer have to be changed (cur_id remains the same
when a page is renamed).
rebuildLinks.inc and some of the tools in the 'maintenance page' still
need to be updated to work with the new setup. (Special:Maintenance
needs a *lot* of cleanup in general. It's kind of a catch-all of
vaguely defined features which suck performance like a hydroelectric
dam.)
Also, I've slipped in some extra debug code. And I think 'indexes.sql'
is a big waste of time and should all be moved into tables.sql.
Building indexes separately doesn't help on InnoDB, and won't do
anything on MyISAM either if you're just going to replace the table,
once built, with one imported from a dump that creates it with its
indexes already in place.
Note that one of the driving forces behind schema changes here is size
& number of rows to change. We've had some troubles where someone tries
to rename a page with a _very_ long edit history and the wiki gets a
little lost doing the updates. Changing a username and reassigning the
marked edits can be similarly problematic when a lot of edits have been
made. Ideally, such operations shouldn't be too 'big'... A rename
shouldn't have to touch potentially thousands of old_title fields, when
we can change just one and let the unchanging numeric page id link the
pages to it.
It might actually be a good idea to merge links and brokenlinks into a
single links table that looks like this:
l_from -> key to cur_id
l_to_ns, l_to_title -> key to cur_namespace, cur_title
This would avoid any need to alter the links table on page rename,
creation, or deletion: outgoing links are fixed to the page id, and
incoming links are fixed to the page name. It's late and I can't think
right now so I'm not sure if this would interfere terribly with
operations that need to treat live and broken links differently; it
would require a join to cur and a check for existence or null. For page
rendering duties we can cache that lookup data in linkscc as we do now,
of course.
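The merged-table idea can be sketched concretely. Here is a toy version in SQLite (MediaWiki ran on MySQL; the column names follow the mail, but the sample data and the exact DDL are my invention):

```python
import sqlite3

# Toy schema for the proposed merged links table: outgoing links keyed
# by the stable numeric page id, incoming links keyed by the page name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cur (
    cur_id          INTEGER PRIMARY KEY,
    cur_namespace   INTEGER NOT NULL,
    cur_title       TEXT NOT NULL,
    cur_is_redirect INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE links (
    l_from     INTEGER NOT NULL,  -- cur_id of the linking page
    l_to_ns    INTEGER NOT NULL,  -- target namespace
    l_to_title TEXT NOT NULL,     -- target title
    UNIQUE (l_from, l_to_ns, l_to_title)
);
""")
conn.execute("INSERT INTO cur VALUES (1, 0, 'DNA', 0)")
conn.execute("INSERT INTO cur VALUES (2, 0, 'RNA', 0)")
conn.execute("INSERT INTO links VALUES (1, 0, 'RNA')")   # live link
conn.execute("INSERT INTO links VALUES (1, 0, 'Gene')")  # broken link

# Distinguishing live from broken links is exactly the join-plus-NULL
# check the mail mentions:
broken = conn.execute("""
    SELECT l_to_title FROM links
    LEFT JOIN cur ON cur_namespace = l_to_ns AND cur_title = l_to_title
    WHERE l_from = 1 AND cur_id IS NULL
""").fetchall()
```

Renaming page 1 then touches only its cur_title row; the links rows keyed by l_from = cur_id need no update at all.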
-- brion vibber (brion @ pobox.com)
After being slow for most of the day suda has become completely
unresponsive now, the shells are frozen, new ssh connects get an error
message.
We hope to get gunther up as a replacement, but this will take a while as
it's not in sync.
Brion- if you're around- we need you!
--
Gabriel Wicke
In some e-mail I received this from a user:
"Lately Wikipedia is having serious technical
difficulties where the article text and page
history are not in synch with each other."
I don't know why the user has this problem,
but I thought I should let you know about it.
--Optim
The German squid is ready to go, pending DNS updates.
The TTL for the de.wp.org entry should be short to allow a fast DNS
switch in case of trouble with the German machine. This depends on the
DNS server entry finally getting switched from register.com to zwinger.
The same goes for load balancing: peaks like the ones after TV reports
currently hit only one of the squids, while after the DNS switch the
load will be distributed.
--
Gabriel Wicke
Gabriel-
> This would defeat all caching, we use 'Vary: Accept-Encoding, Cookie'.
Yes, if we want to allow anons to set preferences, we'd have to think
about ways to integrate the cookie system into the cache. That'd be a
long-term project.
But for vandalism prevention alone, we don't need that. We can just set
the cookie on the "edit" page, which is not cached. Remember, the only
action we need to require the cookie for is saving pages.
Regards,
Erik