Jan said that I have caching turned off, which surprised me because I thought
it was on. Now I've looked at the code and I still think it is on.
wikiSettings.php:
$useCachedPages = true ;
wikiLocalSettings.php:
# $useCachedPages = false; # Disable page cache
(This is commented out, right?)
-----------------
Playing with benchmarking, grabbing a normal article 100 times:
(I know that this type of benchmarking is not very scientific, since conditions
may change on the live server due to someone else doing something big at the same
time, etc. But I think it gives an indication.)
As the site is running:
/apache/bin/ab -n100 -c1 http://www.wikipedia.com/wiki/Alabama
Requests per second: 0.95 [#/sec] (mean)
Now I will set $useCachedPages to false by uncommenting the line in wikiLocalSettings.php.
/apache/bin/ab -n100 -c1 http://www.wikipedia.com/wiki/Alabama
Requests per second: 0.97 [#/sec] (mean)
So I see no material difference.
How can I easily tell if caching is actually on or off? Am I doing something wrong
here?
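For repeatability, runs like the ab commands above can be scripted. A minimal
Python harness, just as a sketch (the URL is the same placeholder article;
any page works):

```python
import time
import urllib.request

def benchmark(fetch, n=100):
    """Call fetch() n times and return the mean requests per second,
    roughly what `ab -n100 -c1` reports."""
    start = time.perf_counter()
    for _ in range(n):
        fetch()
    elapsed = time.perf_counter() - start
    return n / elapsed

def fetch_article(url="http://www.wikipedia.com/wiki/Alabama"):
    # Fetch and discard one page; network conditions will add noise,
    # so compare several runs rather than single numbers.
    with urllib.request.urlopen(url) as resp:
        resp.read()

# Example: print(benchmark(fetch_article, n=100))
```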
From: "Jimmy Wales" <jwales(a)bomis.com>
> Jan Hidders wrote:
> > Can I suggest we simply stop with the whole caching thing? It
> > complicates things unnecessarily. Keeping the code simple should be one of our top
> > priorities. Jimbo doesn't have it turned on at the moment anyway,
>
> I have no strong opinion about this, but I wanted to say that I
> thought I did have it turned on. If it's off, that's a mistake.
My mistake. I saw that the language links worked, but hadn't realized I was
looking at the page just after I edited it. I see now that you do have it on:
upon reloading the page the language links are missing, so apparently I am
getting the cached page.
> Tell you what, I'll benchmark with and without, on the live server,
> and report the numbers.
Yes, that would be very very welcome.
-- Jan Hidders
From: "Magnus Manske" <Magnus.Manske(a)epost.de>
>
> The parser has to be brought up to speed. I'll also have a look into
> connecting the PHP script with the C++ parser I wrote (did I mention 0.05
> secs for rendering "Signal transduction", with fetching it from the
> database, searching the database for existing topics, and adding the
> "framework"?;)
As I said, I'd rather keep it in PHP, but it's your project of course. Does
your parser place any requirements on the syntax? Should it be LL(1) or
LALR(1)? Are you going to use yacc, or is it just a simple recursive descent
parser?
What we could improve in PHP, for example, is that the current parser parses
the string paragraph by paragraph. (But please don't use the function
explode() for that, because it is a memory killer.) Most replace functions
could be limited to a single paragraph, and the rest can be dealt with by
making the parser a little context-sensitive. Standard Wiki markup is
supposed to be limited to one paragraph anyway. The HTML markup is a bit
harder, but there you can remember the nesting depth and type of nesting,
and once you see that the tags are not balanced you go back in the string
and replace the < and > with entities. This will be expensive, but it is an
exception, so it won't hurt.
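The balance check described above might look like this. A Python sketch for
illustration only (the real code is PHP); it ignores void elements such as
<br>, which a real implementation would have to whitelist:

```python
import re

TAG = re.compile(r'</?([a-zA-Z]+)[^>]*>')

def escape_if_unbalanced(paragraph):
    """Track open tags on a stack; if the paragraph's tags don't
    balance, escape every < and > so the text is shown literally."""
    stack = []
    for m in TAG.finditer(paragraph):
        tag = m.group(1).lower()
        if m.group(0).startswith('</'):
            if not stack or stack[-1] != tag:
                # Close tag with no matching open: give up and escape.
                return paragraph.replace('<', '&lt;').replace('>', '&gt;')
            stack.pop()
        else:
            stack.append(tag)
    if stack:  # unclosed tags left over at paragraph end
        return paragraph.replace('<', '&lt;').replace('>', '&gt;')
    return paragraph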
-- Jan Hidders
I noticed that the other-language links (links in the form [[fr:Japon]]
[[en:Japan]] [[eo:Japanio]] etc which are hidden in the article body but
listed by language name in the header bar, pointing to the article on
the current subject in the other-language wikis) are vanishing on cached
pages, because they're scanned and listed during the wiki->html link
parsing which of course doesn't occur when loading a cached page.
I've a hackish fix for that which explicitly seeks out the other-
language links for cached pages, but I don't like it very much. It's
inelegant, and two sets of code have to be maintained to do the same
thing in different contexts.
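For reference, the links in question can be pulled out with a simple pattern.
A Python sketch (the actual scanner lives in the PHP link parser, and a real
implementation would check the prefix against the list of known language
codes rather than accepting any two- or three-letter prefix):

```python
import re

# Matches [[fr:Japon]], [[eo:Japanio]], etc.
INTERLANG = re.compile(r'\[\[([a-z]{2,3}):([^\]\n]+)\]\]')

def extract_language_links(wikitext):
    """Return (language, title) pairs for inter-language links."""
    return INTERLANG.findall(wikitext)
```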
What I'd like to do is add a column to the cur table, something like
cur_links_languages, which would be analogous to cur_links_linked and
cur_links_unlinked. The list of inter-language links for a page would be
stored when the page is saved, then easily loaded up again along with
the cache. This would also make it easy to provide statistics on the
degree of linkage between language wikis. (No change in current
user-visible behavior except in fixing the obvious bug of vanishing
links, and potentially providing more information in special:Statistics
etc.)
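The save/load side of that idea could be as simple as serializing the link
list next to the cached HTML. A hypothetical Python sketch, with a dict
standing in for the cur row (column and function names are illustrative, not
from the actual schema):

```python
import json

def save_page(db, title, rendered_html, lang_links):
    """Store the rendered page together with the inter-language links
    found at save time (standing in for a cur_links_languages column),
    so the cached copy can still show its language bar."""
    db[title] = {"cache": rendered_html,
                 "lang_links": json.dumps(lang_links)}

def load_cached(db, title):
    """Load the cached HTML and the stored language links in one go."""
    row = db[title]
    return row["cache"], json.loads(row["lang_links"])
```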
Alternatively, we might have a separate database which contains nothing
but lists of connected articles. This could facilitate keeping the
other-language links consistent; if somebody adds an article "Japón" to
the Spanish wikipedia, it shouldn't be necessary to separately add
[[es:Jap%f3n]] to the English, French, Esperanto, etc. articles. Keeping
a central repository would mean that it only needs to be linked in with
the others once, and all linked articles will immediately benefit by
being able to list it without manual editing. Upside: added simplicity
for article writers, who don't have to maintain as many links. Downside:
added complexity for site maintainers, who have to run a second database
or not get all the other-language links. Also might be more difficult to
remove incorrectly linked articles.
An alternative to the separate link database might be a robot/automatic
process that occasionally looks through all the wikipedias checking for
consistency in the other-language links and automatically adding (or
alerting a human that one ought to add) new other-language links where
needed.
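The core of such a consistency pass is just connected components over the
link graph: any articles joined by a chain of inter-language links should
all list each other. A Python sketch under that assumption:

```python
from collections import defaultdict, deque

def language_link_groups(links):
    """links: iterable of ((lang, title), (lang, title)) pairs.
    Returns the connected groups, so each article can list every
    other-language counterpart without an editor adding each link
    by hand."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for node in graph:
        if node in seen:
            continue
        group, queue = set(), deque([node])
        while queue:  # breadth-first walk of one component
            cur = queue.popleft()
            if cur in group:
                continue
            group.add(cur)
            queue.extend(graph[cur] - group)
        seen |= group
        groups.append(group)
    return groups
```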
So what do people think? Should we try one of these, or should I just
check in my hackish fix for the meantime?
-- brion vibber (brion @ pobox.com)
I just tried to refresh the page and got this:
(The page was last refreshed just -161981 minutes ago; please wait another
161986 minutes and try again.)
That's about 4 months...
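A negative "last refreshed" value means the stored timestamp is ahead of the
server clock (note the two numbers differ by exactly 5, suggesting a 5-minute
interval). Whatever the root cause, the message code should treat a
future timestamp as invalid rather than computing a huge wait. A hypothetical
sketch (names are illustrative, not from the actual code):

```python
def refresh_wait_minutes(last_refresh, now, min_interval=5):
    """Minutes left before another refresh is allowed."""
    elapsed = now - last_refresh
    if elapsed < 0:
        # last_refresh is in the future relative to `now` (clock skew
        # or a timestamp bug) -- exactly what produces messages like
        # "refreshed just -161981 minutes ago; please wait another
        # 161986 minutes". Treat the timestamp as invalid and allow
        # the refresh instead of returning min_interval - elapsed.
        return 0
    return max(0, min_interval - elapsed)
```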
I've rewritten wikiPage::removeHTMLtags again. (Checked into CVS, diff:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/wikipedia/phpwiki/fpw/wikiPa…)
Exciting new features:
* Removes unwanted tag attributes, such as the scripting attributes
(onmouseclick, onmouseout, etc) which can be used to create fake links
or automatically redirect the browser to another web site (see the
previous version of the [[Goatse.cx]] article for an example)
* Makes a more serious attempt to fix mismatched open/close tag pairs.
Relatedly, makes some attempts at normalizing tables, e.g. <tr> is not
allowed outside of <table>, etc.
* Nested tables now work.
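One way the scripting-attribute removal might work is a pass that drops any
on* event-handler attribute. A Python sketch for illustration (Magnus's
actual implementation is the PHP in CVS; a production filter would whitelist
allowed attributes rather than blacklist handlers):

```python
import re

# Matches attributes like onclick="...", onmouseout='...', onload=foo
EVENT_ATTR = re.compile(r'\son[a-z]+\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)',
                        re.IGNORECASE)

def strip_event_attributes(html):
    """Remove on* event-handler attributes so wiki text can't fake
    links or silently redirect the browser."""
    return EVENT_ATTR.sub('', html)
```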
The function feels more weighty than it ought to be, but it works on
everything I've tried throwing at it so far, which is an improvement
over the previous versions.
I also threw in fixes for:
* Character entities in <pre> sections
* ISBN numbers with letters in them
* == Section headers == at the edges of HTML tags
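On the ISBN fix: ISBN-10 check digits can legitimately be the letter X
(standing for the value 10), which is presumably what "ISBN numbers with
letters" refers to. For reference, a sketch of the checksum:

```python
def is_valid_isbn10(isbn):
    """ISBN-10 checksum: weights 10 down to 1, sum divisible by 11;
    the final check digit may be 'X', meaning 10."""
    s = isbn.replace('-', '').replace(' ', '').upper()
    if len(s) != 10 or not s[:9].isdigit():
        return False
    if s[9] != 'X' and not s[9].isdigit():
        return False
    total = sum((10 - i) * (10 if c == 'X' else int(c))
                for i, c in enumerate(s))
    return total % 11 == 0
```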
-- brion vibber (brion @ pobox.com)
----- Forwarded message from Neil Harris <neil.harris(a)tonal.clara.co.uk> -----
From: Neil Harris <neil.harris(a)tonal.clara.co.uk>
Date: Tue, 12 Feb 2002 09:38:03 +0000
To: jwales(a)bomis.com
Subject: Walone2.ico - a re-send of the missing favicon.ico file
Dear Jimbo,
This is the favicon file which got lost during the software upgrade.
It should be the most recent 16x16 version, and should be installed at
http://www.wikipedia.com/favicon.ico
In an ideal standards-compliant world, generated pages would have
<link rel="shortcut icon" href="http://www.wikipedia.com/favicon.ico">
added to their HEAD sections.
* M$ browsers are hard-wired to find the favicon at the location given
_unless_ the 'link rel' stuff is in the page, in which case they will
use that
* pure W3C standards-compliant browsers won't find the icon unless the
link rel stuff is in the page
* Mozilla changes its policy from release to release...
I hope this is useful.
-- Neil
----- End forwarded message -----
I'm not sure if the live site uses caching already (or an intermediate
proxy), but I have repeatedly noticed that the served version of a
page does not always completely agree with the current wiki version.
Right now, it is happening at the main page: all previous versions and
diffs show "Winter Olympics" under Current Events, but the served
version of the page omits it. I'm pretty sure that nobody deleted that
text.
Axel
I was trying today to sort out the [[Swedish monarchs]], when I
stumbled on this page:
http://www.wikipedia.com/wiki/Bgustav/bus+Adolphus+of+Sweden
It looks like this URL is the result of a bug during the conversion.
Perhaps this is interesting evidence for Magnus? (BTW, we have a few
old kings of Sweden named Magnus...)
There is a king of Sweden whose name is sometimes spelled Gustavus
Adolphus, but sometimes Gustav Adolf. It seems this URL is the result
of someone trying to link to [[<b>Gustav</b>us Adolphus of Sweden]],
but this is just my guess.
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik
Teknikringen 1e, SE-583 30 Linköping, Sweden
tel +46-70-7891609
http://aronsson.se/  http://elektrosmog.nu/  http://susning.nu/
From: "Magnus Manske" <Magnus.Manske(a)epost.de>
> I fixed it already. Look at the user preferences.
No, that's not it. It happens when your preference is "ignore minor edits".
There you might not get all the changes during a day if your maxCnt is not
very high. I know already what the problem is (I should not select the first
maxCnt rows there, because I am sorting by title_name, not by timestamp). I hadn't
noticed this because my testing database wasn't that big. I know I can solve
this, but it takes some thinking.
-- Jan Hidders