lcrocker(a)nupedia.com wrote:
>>20020721000038 0194.499 /wiki/Astronomer
>>20020721001207 0018.859 /wiki/Astronomer
>>20020721002017 0122.170 /wiki/Astronomer
>>20020721002354 0171.431 /wiki/Astronomer
>>20020721002825 0110.615 /wiki/Astronomer
>>20020721003134 0068.176 /wiki/Astronomer
>>20020721003758 0159.654 /wiki/Astronomer
>>
>>**Most** accesses to this particular page seem to have taken
>>**over 1 minute**, and yet there appears to be nothing
>>pathological about the contents of the page. This is interesting!
>>
>>
>
>The "Astronomer" page contains an extraordinary number of links,
>mostly to "year" pages. Each link on a page requires a database
>lookup (a quick one, but still a lookup). "Current events" had
>the same problem before I reorganized it. I'm inclined to just
>remove the year links here. They don't really serve much purpose.
>
Argh! I didn't notice that. Thanks for pointing that out.
Wouldn't page caching sort this, though?
Neil
I'm all for it. I'm going to repoint all the .org domains to the new machine, and
we can start using them in a couple of days, after we are sure that they point to the
right place.
All the old .com URLs must be supported forever, of course. But we can make the
default be .org.
Tomasz Wegrzanowski wrote:
> On Mon, Jul 22, 2002 at 10:38:56AM -0700, Jimmy Wales wrote:
> > (Moved this to tech.)
> >
> > Lee, what "many other things" will reassigning wikipedia.com affect?
> >
> > Fortunately, we have no wikipedia webmail, so that complication doesn't exist.
> >
> > I *think* I can change wikipedia.com, while leaving *.wikipedia.com to the old
> > machine. This is important until we move the Intl wikis over.
> >
> > We need a migration plan for those, too, of course.
>
> Shouldn't we switch from .com to .org or .net now ?
> .com url is bad for PR.
> Some special pages are still moderately slow (particularly "wanted"
> and "random page"), but the real time hogs now are very long pages
> with lots of links.
I looked at the random page code, and right now it fetches a complete
list of all article IDs, in order to pick one out randomly. There must
be a better way to do this. This should be an O(log n) operation, not O(n).
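As a rough illustration of the O(log n) idea, here is a sketch in Python with SQLite (the real code is PHP on MySQL, so this is only an analogy). It assumes a hypothetical indexed column cur_random that gets a random value when an article is created; neither that column nor the helper names exist in the current code. Picking a page then becomes a single index lookup instead of a fetch of every ID.

```python
import random
import sqlite3

def make_db(titles, rng):
    """Build a toy article table; cur_random is a hypothetical indexed column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, cur_title TEXT, cur_random REAL)"
    )
    conn.execute("CREATE INDEX cur_random_idx ON cur (cur_random)")
    conn.executemany(
        "INSERT INTO cur (cur_title, cur_random) VALUES (?, ?)",
        [(t, rng.random()) for t in titles],
    )
    return conn

def random_title(conn, r):
    """Return the title with the smallest key >= r, wrapping to the start if none."""
    row = conn.execute(
        "SELECT cur_title FROM cur WHERE cur_random >= ? ORDER BY cur_random LIMIT 1",
        (r,),
    ).fetchone()
    if row is None:
        # r is above every stored key: wrap around to the smallest key.
        row = conn.execute(
            "SELECT cur_title FROM cur ORDER BY cur_random LIMIT 1"
        ).fetchone()
    return row[0] if row else None
```

A caller would use random_title(conn, random.random()). The choice is not perfectly uniform, since an article preceded by a large gap in key values is picked more often, but that may be acceptable for a "random page" link.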
The "wanted" special page will get a lot faster if we implement
Jan's idea of a table recording the number of broken links to every
unwritten article.
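A minimal sketch of how such a table might be kept up to date, again in Python with SQLite as a stand-in. The table name wanted and all function names are made up for illustration, and this is my reading of the idea rather than Jan's actual design. Decrementing the count when a link is removed is left out.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one row per unwritten article that something links to.
    conn.execute(
        "CREATE TABLE wanted (title TEXT PRIMARY KEY, links INTEGER NOT NULL)"
    )
    return conn

def add_broken_link(conn, title):
    """Call when a saved page gains a link to an article that does not exist."""
    with conn:
        cur = conn.execute(
            "UPDATE wanted SET links = links + 1 WHERE title = ?", (title,)
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO wanted (title, links) VALUES (?, 1)", (title,)
            )

def article_written(conn, title):
    """Call when the article is created: links to it are no longer broken."""
    with conn:
        conn.execute("DELETE FROM wanted WHERE title = ?", (title,))

def most_wanted(conn, limit=50):
    """The 'wanted' page becomes one sorted read of a small table."""
    return conn.execute(
        "SELECT title, links FROM wanted ORDER BY links DESC, title LIMIT ?",
        (limit,),
    ).fetchall()
```

The cost moves from the special page to the save path, where it is paid once per changed link rather than once per view.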
If long pages with lots of links cause trouble, maybe we should revive
the caching idea of the current code: the cur table gets another
column cur_cache where we store the rendered HTML. When displaying an
article, we simply pump out the contents of cur_cache, or, if cur_cache
is empty, we render, display and store in cur_cache. If a newly saved
article necessitates the updating of links, we junk the cur_cache of
all affected articles.
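The caching scheme just described can be sketched as follows, in Python with SQLite for illustration only. The render function is a placeholder for the real renderer, and working out which articles are "affected" by a save is assumed to happen elsewhere and is passed in as affected_titles.

```python
import sqlite3

def render(text):
    """Stand-in for the real (expensive) wiki-to-HTML renderer."""
    return "<p>" + text + "</p>"

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cur (cur_title TEXT PRIMARY KEY, cur_text TEXT, cur_cache TEXT)"
    )
    return conn

def save_article(conn, title, text, affected_titles=()):
    """Store new text and junk the cache of this and every affected article."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO cur (cur_title, cur_text, cur_cache) "
            "VALUES (?, ?, NULL)",
            (title, text),
        )
        conn.executemany(
            "UPDATE cur SET cur_cache = NULL WHERE cur_title = ?",
            [(t,) for t in affected_titles],
        )

def show_article(conn, title):
    """Pump out cur_cache if present; otherwise render, store and return."""
    row = conn.execute(
        "SELECT cur_text, cur_cache FROM cur WHERE cur_title = ?", (title,)
    ).fetchone()
    if row is None:
        return None
    text, cached = row
    if cached is not None:
        return cached
    html = render(text)
    with conn:
        conn.execute(
            "UPDATE cur SET cur_cache = ? WHERE cur_title = ?", (html, title)
        )
    return html
```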
Axel
This is an attempt to divide the special functions and actions into
functional groups, whilst still preserving the general frequency order.
Top picks:
  Main Page
  Edit this page
  Recent changes
  Random page
  History
  Current events
  What links here
  Special pages
  Statistics

Login area:
  Preferences
  User login
  Log out

User pages only:
  This user's contributions

Special pages in general - these are all global:
  List uploaded images
  Most wanted pages
  List of all pages
  Upload image files
  Show orphan pages
  Show most popular pages
  Blocked IP addresses
  List all users
  List unused images
  List new pages
  List of long pages
  List of short pages

Page-related, but less common:
  Print this page
  Recent changes in linked pages

Logged-in user page-related features:
  Watch list
  Watch this page
  Stop watching this page
  Move this page
  Delete this page

Here are some statistics for special pages and actions taken from
today's real user data. This should help with ergonomic design for the
user interface.
I will be able to re-run these tests later in the week, when we have
more data.
Observations:
* pages are opened for edit roughly 5 times as often as their edits are
submitted
* some of the CamelCase special pages listed do not actually exist on
the new server: they are presumably links from the old server's main
page that are being passed on to the new server.
Here are some ordinary pages for comparison:
Main Page (2320 views)
Biographical Listing/A (152 views)
Current events (105 views)
In descending order of frequency, the special pages and actions are:
1241 action=edit
760 Special:Recentchanges
293 Special:Randompage
263 action=history
240 action=submit
141 Special:Preferences
110 Special:Userlogin
83 Special:Whatlinkshere
73 Special:Specialpages
69 Special:Statistics
60 Special:Imagelist
54 Special:Watchlist
48 Special:Contributions
30 Special:Wantedpages
28 action=print
27 Special:Recentchangeslinked
27 Special:Allpages
26 Special:Upload
26 Special:Lonelypages
19 Special:Popularpages
15 action=watch
12 Special:Ipblocklist
11 Special:Listusers
10 Special:Unusedimages
8 Special:Newpages
7 action=view
7 action=delete
7 Special:Userlogout
7 Special:Longpages
4 Special:Shortpages
3 Special:RecentChanges
3 Special:AllPages
2 action=unwatch
2 Special:Movepage
1 Special:UserLogin
1 Special:Recentchanges/
1 Special:RandomPage/
1 Special:RandomPage
1 Special:NewPages/
1 Special:AllPages/
Here are some service times for the new Wikipedia server under actual
user load, and a bit of analysis: see further down the page for more details.
The sample set was the subset of logfile entries up to my sampling time,
with timestamps starting 20020721.
Here is a summary of all pages served in less than 30 seconds.
Values on a line are, from left to right:
* service time bin (bin n is all pages serviced in n <= time < n+1 seconds)
* number of pages served in that time bin
* cumulative fraction of pages serviced in that time bin or less
0 12734 0.830171458374
1 1295 0.914596779451
2 493 0.946737075429
3 259 0.963622139644
4 116 0.971184562227
5 80 0.976400026077
6 59 0.980246430667
7 46 0.983245322381
8 24 0.984809961536
9 15 0.985787861008
10 15 0.98676576048
11 27 0.988525979529
12 17 0.989634265597
13 15 0.990612165069
14 15 0.991590064541
15 11 0.992307190821
16 7 0.992763543908
17 7 0.993219896995
18 7 0.993676250081
19 5 0.994002216572
20 7 0.994458569659
21 7 0.994914922746
22 2 0.995045309342
23 2 0.995175695938
24 3 0.995371275833
25 2 0.995501662429
26 3 0.995697242323
27 2 0.99582762892
28 3 0.996023208814
29 3 0.996218788709
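For anyone wanting to reproduce a table like the one above, here is a small Python sketch of the binning. It assumes log lines of the form "timestamp seconds URL" as in the listings in this message; the function names are made up, and this is not the script that produced the figures. The cumulative fraction is taken over all requests, slow outliers included, which is why the last row above stops short of 1.

```python
import math
from collections import Counter

def parse_line(line):
    """Split 'timestamp seconds URL' (the URL may itself contain spaces)."""
    timestamp, seconds, url = line.split(None, 2)
    return timestamp, float(seconds), url

def service_time_bins(times, max_bin=30):
    """Return (bin, count, cumulative fraction) for non-empty bins below max_bin.

    Bin n holds every request with n <= time < n + 1 seconds.
    """
    counts = Counter(math.floor(t) for t in times)
    total = len(times)
    rows, running = [], 0
    for n in range(max_bin):
        if counts[n] == 0:
            continue
        running += counts[n]
        rows.append((n, counts[n], running / total))
    return rows
```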
Here are the outlier pages that took 30 seconds or more to service, with
their timestamps.
20020721000038 194.499 /wiki/Astronomer
20020721000439 67.529 /
20020721000905 210.965 /
20020721000946 83.656 /w/wiki.phtml?title=Talk:Palestine&diff=0&oldid=0
20020721001035 512.128 /wiki/Nicolaus_Copernicus
20020721001354 199.017 /wiki/Astronomy_and_astrophysics
20020721001715 70.412 /wiki/Galileo_Galilei
20020721001745 31.105 /w/wiki.phtml?search= football world cup 2002
20020721002017 122.17 /wiki/Astronomer
20020721002314 134.53 /wiki/Hipparchus
20020721002316 65.866 /wiki/Astronomers_and_Astrophysicists
20020721002354 171.431 /wiki/Astronomer
20020721002729 36.432 /wiki/Johannes_Kepler
20020721002825 110.615 /wiki/Astronomer
20020721003134 68.176 /wiki/Astronomer
20020721003725 35.912 /wiki/Astronomers_and_Astrophysicists
20020721003758 159.654 /wiki/Astronomer
20020721004535 142.506 /wiki/Sir_Francis_Bacon
20020721010559 193.656 /wiki/Balance
20020721010559 233.676 /wiki/Measuring instrument
20020721010919 67.078 /w/wiki.phtml?title=Special:Recentchanges&days=3&limit=500
20020721013508 32.212 /w/wiki.phtml?title=Special:Recentchanges&days=14&limit=1000
20020721013817 49.659 /w/wiki.phtml?title=Special:Recentchanges&days=14&limit=1000
20020721015410 57.194 /w/wiki.phtml?title=Special:Recentchanges&days=3&limit=2000
20020721015558 43.563 /w/wiki.phtml?title=Special:Recentchanges&days=3&limit=2000
20020721015705 43.681 /w/wiki.phtml?title=Special:Recentchanges&days=3&limit=2000
20020721015828 42.666 /w/wiki.phtml?title=Special:Recentchanges&days=3&limit=2000
20020721020143 56.721 /w/wiki.phtml?title=Special:Recentchanges&days=3&limit=2000
20020721020456 93.871 /wiki/2001 U.S. Attack on Afghanistan
20020721021425 52.227 /wiki/Osama_bin_Laden
20020721021601 30.76 /wiki/Particle_physics&action=print
20020721022123 32.322 /wiki/Taliban
20020721023913 43.72 /w/wiki.phtml?title=Special:Recentchanges&days=3&limit=500
20020721024042 31.297 /wiki/September_11%2C_2001_Terrorist_Attack
20020721024151 35.013 /
20020721024352 31.545 /wiki/Special:Recentchanges
20020721024626 94.781 /wiki/Special:Recentchanges
20020721024637 31.751 /wiki/Special:Recentchanges
20020721025053 30.813 /wiki/September_11%2C_2001_Terrorist_Attack
20020721030026 30.034 /wiki/Cantor_Fitzgerald
20020721030106 60.296 /wiki/2001 U.S. Attack on Afghanistan
20020721031414 42.459 /wiki/Special:Wantedpages
20020721031427 55.526 /wiki/Special:Wantedpages
20020721031428 37.361 /wiki/Computing_timeline
20020721032613 32.112 /wiki/Earth_impacts
20020721032628 30.052 /wiki/20th_century
20020721033849 39.932 /w/wiki.phtml?title=Special:Recentchanges&days=3&limit=1500
20020721035505 146.219 /
20020721035824 31.713 /wiki/Special:Recentchanges
20020721035853 52.951 /wiki/Special:Recentchanges
20020721040232 224.979 /wiki/British_Open
20020721040525 54.24 /
20020721043452 104.493 /wiki/Special:Recentchanges
20020721043455 95.882 /wiki/India
20020721050334 81.403 /wiki/Philosophy
20020721052726 32.934 /w/wiki.phtml?title=Special:Wantedpages&limit=500&offset=100
20020721053950 45.24 /wiki/Free_On-line_Dictionary_of_Computing/C_-_D
20020721054101 37.168 /wiki/Free_On-line_Dictionary_of_Computing/C_-_D
Concentrating on just one page that appears more than once in the
outliers, and looking at all accesses in the sample whether outliers or
not, it is interesting to look at the variation in service times for
this page.
20020721000038 0194.499 /wiki/Astronomer
20020721001207 0018.859 /wiki/Astronomer
20020721002017 0122.170 /wiki/Astronomer
20020721002354 0171.431 /wiki/Astronomer
20020721002825 0110.615 /wiki/Astronomer
20020721003134 0068.176 /wiki/Astronomer
20020721003758 0159.654 /wiki/Astronomer
**Most** accesses to this particular page seem to have taken **over 1
minute**, and yet there appears to be nothing pathological about the
contents of the page. This is interesting!
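To put numbers on that variation, a short Python sketch over the seven samples listed above (the summarize helper and the 60-second threshold parameter are just for illustration):

```python
import statistics

# The seven /wiki/Astronomer samples quoted above: (timestamp, seconds).
ASTRONOMER = [
    ("20020721000038", 194.499),
    ("20020721001207", 18.859),
    ("20020721002017", 122.170),
    ("20020721002354", 171.431),
    ("20020721002825", 110.615),
    ("20020721003134", 68.176),
    ("20020721003758", 159.654),
]

def summarize(samples, threshold=60.0):
    """Count, slow-request count and spread of service times for one page."""
    secs = [s for _, s in samples]
    return {
        "count": len(secs),
        "over_threshold": sum(1 for s in secs if s > threshold),
        "min": min(secs),
        "median": statistics.median(secs),
        "max": max(secs),
    }
```

On this data, six of the seven accesses exceed 60 seconds, with a median of 122.17 seconds.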
Neil
If you visit http://www.google.com/search?q=site%3Awikipedia.com+edit,
you'll notice that Google indexes all our edit pages. I find this
moderately annoying: I don't want to find the edit link when I search,
and new people not familiar with Wikipedia won't know what to do when
they're confronted with a textarea. Also, I don't know what percentage
of accesses are from bots, but you may be able to cut down on useless
accesses by keeping bots off these pages.
I have a solution in mind: make an Apache rewrite rule that rewrites
something like:
http://www.wikipedia.com/edit/Wikipedia:Bug_reports
into:
http://www.wikipedia.com/wiki.phtml?title=Wikipedia:Bug_reports&action=edit
Make the edit links go to the first version, then use robots.txt to
request that bots not harvest anything under /edit/*
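To make the mapping concrete, here is a Python sketch of what the rewrite and the robots.txt rule would do together. On the server the rewrite itself would be done by Apache (mod_rewrite), not by Python; the function names and the exact robots.txt contents below are only my assumptions. Note that robots.txt rules are path prefixes, so "Disallow: /edit/" covers everything under /edit/.

```python
import re
from urllib.robotparser import RobotFileParser

# Short, bot-excludable form of the edit URL.
EDIT_PATH = re.compile(r"^/edit/(.+)$")

def rewrite(path):
    """Map /edit/<title> to the script URL; leave every other path untouched."""
    m = EDIT_PATH.match(path)
    if m is None:
        return path
    return "/wiki.phtml?title=" + m.group(1) + "&action=edit"

# Assumed robots.txt contents: keep all bots out of /edit/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /edit/
"""

def allowed_for_bots(path):
    """Would a well-behaved bot be allowed to fetch this path?"""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("*", path)
```

The edit links on each page would point at the /edit/ form, so a bot that honours robots.txt never requests the edit form at all.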
I'm sending this here as it's not a feature request for the PHP script,
but rather site-specific for Wikipedia. If there's a better place,
please tell me.
BTW, great work on the script. It just keeps getting better. I look
forward to Software Phase III. :)
--Dan Keshet
In the current software, two people can edit the same page at the same time,
both see their own changes, and yet only one set of changes goes into the
database. This
happened with Artiodactyla; Josh Grosse deleted a few lines at the same time
happened with Artiodactyla; Josh Grosse deleted a few lines at the same time
that I added the new reference (which I have forgotten, assuming it was
recorded) and Moschidae. Is this possible with the new software?
Also, the current software first displays the article, then writes the new
timestamp, then writes the article. Does the new software use BEGIN ... END
to make sure that the timestamp cannot be updated without the article, nor
vice versa?
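To show what I mean, here is a sketch in Python with SQLite; I do not know how the new software handles this, and the table layout and names are guesses for illustration. The save only succeeds if the article still carries the timestamp the editor started from, and the old revision, new text and new timestamp are all written inside one transaction, so either all of them are stored or none.

```python
import sqlite3

class EditConflict(Exception):
    """Raised when the article changed after the editor loaded it."""

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cur (cur_title TEXT PRIMARY KEY, cur_text TEXT, cur_timestamp TEXT)"
    )
    conn.execute(
        "CREATE TABLE old (old_title TEXT, old_text TEXT, old_timestamp TEXT)"
    )
    return conn

def save_edit(conn, title, new_text, base_timestamp, now):
    """Save new_text only if the article is unchanged since base_timestamp."""
    with conn:  # commits on success, rolls back if an exception escapes
        # Archive the revision being replaced (no rows if it has moved on).
        conn.execute(
            "INSERT INTO old (old_title, old_text, old_timestamp) "
            "SELECT cur_title, cur_text, cur_timestamp FROM cur "
            "WHERE cur_title = ? AND cur_timestamp = ?",
            (title, base_timestamp),
        )
        cur = conn.execute(
            "UPDATE cur SET cur_text = ?, cur_timestamp = ? "
            "WHERE cur_title = ? AND cur_timestamp = ?",
            (new_text, now, title, base_timestamp),
        )
        if cur.rowcount != 1:
            raise EditConflict(title)
```

The second of two simultaneous editors gets EditConflict instead of silently overwriting the first, and could then be shown both versions to merge.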
phma
The MySQL server appears to have too many connections, and the
resulting page full of error messages isn't too pleasant. I don't know
if this is fixed in the new code, but if it isn't, I think we should have
some standard "error page" and just include the details in the
source of the HTML.
(I've included a copy of the page that was generated at
<http://bits.bris.ac.uk/imran/wiki/wikimeta.htm>)
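Something along these lines is what I have in mind, sketched in Python (the wording of the page and the function name are placeholders): the visitor sees a short apology, and the raw error text goes into an HTML comment so it is only visible via "view source".

```python
def error_page(detail):
    """Return a friendly error page with the raw detail hidden in an HTML comment."""
    # "--" would end the comment early, so break up every run of hyphens.
    safe = detail
    while "--" in safe:
        safe = safe.replace("--", "- -")
    return (
        "<html><head><title>Temporary problem</title></head>\n"
        "<body>\n"
        "<h1>Sorry, the site is having a temporary problem</h1>\n"
        "<p>Please try again in a few minutes.</p>\n"
        "<!-- error details:\n" + safe + "\n-->\n"
        "</body></html>\n"
    )
```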
Imran
--
TheOpenCD Project
Promoting Open Source on Windows
http://www.theopencd.org