I am running the current CVS version locally and updated my database
schema. Search, RecentChanges, edit, and preview all work fine, but if I
try to save my changes, they are all lost and the file is back to the
previous version. It doesn't happen for all files, and I don't know
what the difference is (two example problem files are the main page
and "British Queen"). It's not a browser cache issue, since when I try
to edit the page again, the wiki text is also reverted to the old
version. I can write to the database, since the page counters change
correctly. Does anybody else see this problem?
Axel
With my new checkin permission, I went to work happy as a clam.
I added a README file and the GPL license in COPYING.
The file wikiTextEn.php uses the $THESCRIPT variable, and it therefore
has to be included at the end of wikiSettings.php, after
wikiLocalSettings.php has been read. I added a new variable,
$wikiLanguage, which can be set in wikiLocalSettings.php and which
determines which wikiText file is included later.
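In outline, the end of wikiSettings.php now does something like this
(quoting from memory rather than the actual checkin, and assuming
English as the default):

    # wikiLocalSettings.php has already been read at this point, so a
    # local override of $wikiLanguage takes effect here.
    if ( !isset( $wikiLanguage ) ) { $wikiLanguage = "En" ; }
    include_once( "wikiText" . $wikiLanguage . ".php" ) ;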
Axel
On Wed, 2002-02-13 at 17:24, Jimmy Wales wrote:
> Brion Vibber wrote:
> > I'm not sure what advantage would be gotten out of storing a version
> > that has had HTML tags worked over, but still needs the wiki code
> > converted into HTML every time we load it. We get more speed by caching
> > the completely parsed version, or more storage savings by reparsing it
> > every time and not storing anything but the editable text.
>
> It's worth noting that on the live server, I see no material difference when
> I turn caching on or off.
Interesting. I have to wonder whether this means caching is for some
reason not working at all... It seems to be disabled and/or broken at
the moment, unless someone sneaked in and fixed the other-languages bug
while I wasn't looking.
I ran "ab -n 10" on a couple pages running on my test server with
various states: caching on, caching off w/ no removeHTMLtags() call,
caching off with the old removeHTMLtags() code, and caching off with my
new as-yet unoptimized but more secure version of removeHTMLtags(). The
pages per second figures from three trials each:
Beryllium (large HTML table, various other tags):
* cached 2.06 2.06 2.16
* none 0.94 0.95 0.95
* old 0.90 0.90 0.89
* new 0.47 0.48 0.48
Esperanto-wiki main page (a few <b>, <i>, and <font> tags):
* cached 3.26 3.13 3.47
* none 1.84 1.83 1.76
* old 1.82 1.80 1.80
* new 1.58 1.62 1.58
> Also, space is really cheap these days. And we're not in any immediate danger
> of running out of it.
Very true.
-- brion vibber (brion @ pobox.com)
Just as the announcements page says, Wikipedia is a lot faster now.
Thanks to everyone who made it possible. A quick-loading Recent Changes
page is a beautiful thing.
Larry
I just ran "ab n=10" for an atricle (with cache turned off) and deactivated
some functions to see where the slow parts are.
Full rendering:            4.99 sec
removeHTMLtags turned off: 3.319 sec
It seems removeHTMLtags is responsible for a third of the *total* runtime
((4.99 - 3.319) / 4.99 ≈ 33%), which includes Apache, PHP invocation, and a
thousand other things that can't be avoided.
So, if these HTML tags are *never* used anyway, why can't we replace their
< and > with &lt; and &gt; just prior to saving an edited article?
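Something like this could do it at save time (just a sketch: the function
name and the allowed-tag list are invented for illustration, and closures
like this need a newer PHP than the codebase targets):

    # Hypothetical filter for the save path: any tag not on the allowed
    # list gets its angle brackets turned into entities, so it shows up
    # as literal text instead of being interpreted as HTML.
    function escapeDisallowedTags( $text )
    {
        $allowed = array( "b", "i", "u", "em", "strong", "tt", "pre",
                          "p", "br", "hr", "ul", "ol", "li",
                          "table", "tr", "td", "th", "font" ) ;
        return preg_replace_callback(
            "/<(\/?)([A-Za-z]+)([^>]*)>/",
            function ( $m ) use ( $allowed ) {
                if ( in_array( strtolower( $m[2] ), $allowed ) ) {
                    return $m[0] ;  # allowed tag, keep as-is
                }
                return "&lt;" . $m[1] . $m[2] . $m[3] . "&gt;" ;
            },
            $text ) ;
    }

Run over "<b>bold</b> and <blink>gone</blink>", this would keep the <b>
pair and turn the <blink> pair into entities.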
I'll be gone tomorrow until Saturday, and I doubt I can hack it today, so
it's up to you...
Magnus
Jan said that I have caching turned off, which surprised me because I thought
it was on. Now I've looked at the code and I still think it is on.
wikiSettings.php:
$useCachedPages = true ;
wikiLocalSettings.php:
# $useCachedPages = false; # Disable page cache
(This is commented out, right?)
-----------------
Playing with benchmarking, grabbing a normal article 100 times:
(I know that this type of benchmarking is not very scientific, since conditions
may change on the live server due to someone else doing something big at the same
time, etc. But I think it gives an indication.)
As the site is running:
/apache/bin/ab -n100 -c1 http://www.wikipedia.com/wiki/Alabama
Requests per second: 0.95 [#/sec] (mean)
Now I will set $useCachedPages to false by uncommenting the line in wikiLocalSettings.php.
/apache/bin/ab -n100 -c1 http://www.wikipedia.com/wiki/Alabama
Requests per second: 0.97 [#/sec] (mean)
So I see no material difference.
How can I easily tell if caching is actually on or off? Am I doing something wrong
here?
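One crude check I could try, assuming the branch that serves a cached copy
is easy to find (the variable names below are guesses at what's in the
code): print a marker there and look for it with the browser's "view
source".

    # Hypothetical marker on the code path that serves a cached copy:
    if ( $useCachedPages && $cachedHTML != "" ) {
        print "<!-- served from page cache -->\n" ;
        print $cachedHTML ;
    }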
From: "Jimmy Wales" <jwales(a)bomis.com>
> Jan Hidders wrote:
> > Can I suggest we simply stop with the whole caching thing? It
> > complicates things unnecessarily. Keeping the code simple should be one
> > of our top priorities. Jimbo doesn't have it turned on at the moment
> > anyway,
>
> I have no strong opinion about this, but I wanted to say that I
> thought I did have it turned on. If it's off, that's a mistake.
My mistake. I saw that the language links worked, but hadn't realized I was
looking at the page just after I had edited it. I see now that you do have
it on, because upon reloading the page the language links are missing, so I
am apparently getting the cached page.
> Tell you what, I'll benchmark with and without, on the live server,
> and report the numbers.
Yes, that would be very very welcome.
-- Jan Hidders
From: "Magnus Manske" <Magnus.Manske(a)epost.de>
>
> The parser has to be brought up to speed. I'll also have a look into
> connecting the PHP script with the C++ parser I wrote (did I mention 0.05
> secs for rendering "Signal transduction", with fetching it from the
> database, searching the database for existing topics, and adding the
> "framework"?;)
As I said, I'd rather keep it in PHP, but it's your project of course. Does
your parser put any requirements on the syntax? Should it be LL(1) or
LALR(1)? Are you going to use yacc, or is it just a simple recursive descent
parser?
What we could improve in PHP, for example, is to have the parser work on
the string paragraph by paragraph. (But please don't use the function
explode() for that, because it is a memory killer.) Most replace functions
could be limited to only one paragraph, and the rest can be dealt with by
making the parser a little context-sensitive. Standard wiki markup is
supposed to be limited to one paragraph anyway. The HTML markup is a bit
harder, but there you can remember the nesting depth and type of nesting,
and once you see that the tags are not balanced you go back in the string
and replace the < and > with entities. This will be expensive, but it is an
exception, so it won't hurt.
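A sketch of that paragraph walk, avoiding explode() by moving a cursor
with strpos() so only one paragraph is copied at a time (parseParagraph
stands in for whatever the real per-paragraph renderer would be, and
blank-line separation is assumed):

    function parseByParagraph( $text )
    {
        $out = "" ;
        $pos = 0 ;
        $len = strlen( $text ) ;
        while ( $pos < $len ) {
            # Find the next blank-line separator; the last paragraph
            # runs to the end of the string.
            $next = strpos( $text, "\n\n", $pos ) ;
            if ( $next === false ) { $next = $len ; }
            $para = substr( $text, $pos, $next - $pos ) ;
            $out .= parseParagraph( $para ) . "\n\n" ;
            $pos = $next + 2 ;
        }
        return $out ;
    }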
-- Jan Hidders
I noticed that the other-language links (links in the form [[fr:Japon]]
[[en:Japan]] [[eo:Japanio]] etc which are hidden in the article body but
listed by language name in the header bar, pointing to the article on
the current subject in the other-language wikis) are vanishing on cached
pages, because they're scanned and listed during the wiki->html link
parsing, which of course doesn't occur when loading a cached page.
I've a hackish fix for that which explicitly seeks out the other-
language links for cached pages, but I don't like it very much. It's
inelegant, and two sets of code have to be maintained to do the same
thing in different contexts.
What I'd like to do is add a column to the cur table, something like
cur_links_languages, which would be analogous to cur_links_linked and
cur_links_unlinked. The list of inter-language links for a page would be
stored when the page is saved, then easily loaded up again along with
the cache. This would also make it easy to provide statistics on the
degree of linkage between language wikis. (No change in current
user-visible behavior except in fixing the obvious bug of vanishing
links, and potentially providing more information in special:Statistics
etc.)
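The save-side half might look roughly like this (the helper name, the
regex, and the newline-separated storage format are all made up for
illustration; I haven't checked how the other cur_links_* columns store
their lists):

    # On save: collect [[xx:Title]] links for a cur_links_languages
    # column, one per line.
    function extractLanguageLinks( $text )
    {
        preg_match_all( "/\[\[([a-z]{2,3}):([^\]\n]+)\]\]/",
                        $text, $m, PREG_SET_ORDER ) ;
        $links = array() ;
        foreach ( $m as $match ) {
            $links[] = $match[1] . ":" . $match[2] ;
        }
        return implode( "\n", $links ) ;  # e.g. "fr:Japon\neo:Japanio"
    }

Loading the list back alongside the cached page would then replace the
scan that currently happens only during link parsing.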
Alternatively, we might have a separate database which contains nothing
but lists of connected articles. This could facilitate keeping the
other-language links consistent; if somebody adds an article "Japón" to
the Spanish wikipedia, it shouldn't be necessary to separately add
[[es:Jap%f3n]] to the English, French, Esperanto, etc. articles. Keeping
a central repository would mean that it only needs to be linked in with
the others once, and all linked articles will immediately benefit by
being able to list it without manual editing. Upside: added simplicity
for article writers, who don't have to maintain as many links. Downside:
added complexity for site maintainers, who have to run a second database
or not get all the other-language links. Also might be more difficult to
remove incorrectly linked articles.
An alternative to the separate link database might be a robot/automatic
process that occasionally looks through all the wikipedias checking for
consistency in the other-language links and automatically adding (or
alerting a human that one ought to add) new other-language links where
needed.
So what do people think? Should we try one of these, or should I just
check in my hackish fix for the meantime?
-- brion vibber (brion @ pobox.com)