Back in January we had some discussion about how difficult it is to
edit multiple cross-linked pages about subjects within a context now
that subpages are gone. There were several suggestions, but none of
them really clicked and none were ever implemented. The issue has
come up again, and there are now more pages with disambiguating
contexts, so I think this is a good time to revisit it.
I also have a proposal that I like better than all the earlier
ones (including mine). Rather than adding a special tag like Base
or Context, and rather than using a special character, let's
just change our interpretation of links with a missing portion
on either side of the pipe, that is [[link| ]] and [[ |link]].
Here's the proposal: on pages whose titles end with a parenthesized
(context), [[ |link]] is interpreted as [[link (context)|link]].
On all pages, [[link (context)| ]] is interpreted that way as well.
All other uses of [[link| ]] or [[ |link]] are simply interpreted
as [[link]].
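For concreteness, here's a rough sketch of the rewrite rule as it
might look at render time. The function name and calling convention
are invented for illustration, not taken from the current code:

<?php
# Sketch only: expand "empty-side" pipe links per the proposal above.
# $pageTitle is the title of the page being rendered, e.g. "Flush (poker)".
function expandContextLink( $target, $label, $pageTitle ) {
    $target = trim( $target );
    $label  = trim( $label );

    # [[link (context)| ]] -- empty label: strip the "(context)" suffix.
    if ( $label == "" && $target != "" ) {
        $stripped = preg_replace( '/\s*\([^)]*\)$/', "", $target );
        return array( $target, $stripped != "" ? $stripped : $target );
    }

    # [[ |link]] -- empty target: borrow the context of the current page.
    if ( $target == "" && $label != "" ) {
        if ( preg_match( '/(\([^)]*\))$/', $pageTitle, $m ) ) {
            return array( $label . " " . $m[1], $label );
        }
        return array( $label, $label );   # no (context): plain [[link]]
    }

    return array( $target, $label );      # ordinary piped link
}
?>

So on a page titled "Flush (poker)", [[ |Bluff]] would link to
"Bluff (poker)" but display as "Bluff", and on any page
[[Bluff (poker)| ]] would do the same.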
That will make fixing all the links in the Middle Earth, Poker,
and other pages much easier, and I don't think it will add any
temptation to over-categorize or cause other problems.
It is an open question whether these links are interpreted at save
time or at render time; the latter makes things easier, I think,
but the former has advantages too.
I just committed a function (sysops only, I hope ;) to move a page to a new
title, with complete history. A checkbox is used to create a redirect. Now:
1. Please check for errors.
2. Please make it use variables instead of English text constants (Brion, I
know you wouldn't be happy if you couldn't do that yourself ;)
3. The function doesn't move subpages, as there are, officially, no subpages.
Neither does it move talk: pages. Might be worth adding.
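(For reviewers without the CVS handy, the overall shape of such a move
function is roughly the following sketch; the helper names and table
layout are illustrative guesses, not the committed code:)

<?php
# Illustrative outline only -- not the committed implementation.
function movePage( $oldTitle, $newTitle, $makeRedirect ) {
    # Refuse to clobber an existing article at the target title.
    if ( wikiPageExists( $newTitle ) ) return false;   # hypothetical helper

    # Retitle the current revision and the entire edit history together,
    # so the history follows the article to its new name.
    wikiSQL( "UPDATE cur SET title='" . addslashes( $newTitle ) .
             "' WHERE title='" . addslashes( $oldTitle ) . "'" );   # hypothetical helper
    wikiSQL( "UPDATE old SET title='" . addslashes( $newTitle ) .
             "' WHERE title='" . addslashes( $oldTitle ) . "'" );

    # Optionally leave a redirect behind so existing links keep working.
    if ( $makeRedirect ) {
        wikiCreatePage( $oldTitle, "#REDIRECT [[" . $newTitle . "]]" );   # hypothetical helper
    }
    return true;
}
?>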
Have fun with the coding,
Magnus
Dear Wikipedians!
Today I read an article on a mailing list which led me to a
Slashdot discussion about a project called "World Wide Lexicon". There
seem to be some wrong expectations about it, so I mailed the author of
WWL to ask him and to point him to Wikipedia. I think it's a very
interesting project, but you can take a look at it yourself:
the project:
www.worldwidelexicon.org
the /. discussion:
http://slashdot.org/articles/02/04/05/1911255.shtml?tid=95
the answers to my emails:
-----Original Message-----
From: Brian McConnell <brianmsf(a)yahoo.com>
To: Kurt Jansson <kurt(a)jansson.de>
Sent: Sunday, 7 April 2002 19:55
Subject: RE: Why "Lexicon"?
Kurt,
Thank you for your email.
I called it the worldwide lexicon because the system can be used to
retrieve definitions for words as well as translations. For example,
if you are doing a monolingual search, you can submit several
different types of queries to a WWL server, including:
- syn : returns synonymous words and phrases
- ant : returns antonymous words and phrases
- def : returns verbose description for a word or phrase
- pcat : returns parent categories that the word, phrase or resource
locator belongs to
- ccat : returns child categories that are associated with the entry
- vis : returns words that represent visually similar objects
I like Wikipedia, and would like to talk to someone about joining it to
the WWL system. I think it could be very useful in processing
monolingual queries. All they will need to do is write a PHP script
that recognizes several simple SOAP methods.
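(To make "a PHP script that recognizes several simple SOAP methods"
concrete, a minimal hand-rolled responder might look something like
the sketch below. The method and element names are guesses, since the
WWL spec isn't quoted in this thread, and wikiLookupDefinition() is a
stub standing in for a real database lookup:)

<?php
# Sketch of a minimal WWL-style SOAP responder -- names are hypothetical.
function wikiLookupDefinition( $word ) {
    return "definition of " . $word;   # stub standing in for a DB query
}

# Crude extraction of the query type (syn/ant/def/...) and the word
# from the POSTed SOAP envelope.
$request = file_get_contents( "php://input" );
preg_match( '/<type>([a-z]+)<\/type>/', $request, $t );
preg_match( '/<word>([^<]+)<\/word>/', $request, $w );
$type = isset( $t[1] ) ? $t[1] : "def";
$word = isset( $w[1] ) ? $w[1] : "";

$answer = ( $type == "def" ) ? wikiLookupDefinition( $word ) : "";

header( "Content-Type: text/xml" );
echo "<?xml version=\"1.0\"?>\n";
echo "<SOAP-ENV:Envelope xmlns:SOAP-ENV=" .
     "\"http://schemas.xmlsoap.org/soap/envelope/\">\n";
echo " <SOAP-ENV:Body>\n  <queryResponse>\n";
echo "   <result>" . htmlspecialchars( $answer ) . "</result>\n";
echo "  </queryResponse>\n </SOAP-ENV:Body>\n</SOAP-ENV:Envelope>\n";
?>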
I would also like to talk to the wikipedia software developer about
the possibility of modifying the system to be used as a translation
dictionary. I don't like to reinvent the wheel, and it seems that the
system they have built can be modified to host a user supported
database of language pair translations.
The benefit of joining is that Wikipedia will appear as a data source
along with other web dictionaries, lexicons and semantic network
servers.
The most useful feature of our system is that it will enable client
applications, a browser plug-in for example, to locate WWL data
sources on the fly, and then submit standardized queries to them.
Thus, one fairly simple piece of code can talk to lots of dictionaries
throughout the web (you might use it one day to look up translations
for words in a Spanish document, and another to look for verbose
definitions for words in your home language).
The main goal of WWL is to create a GNUtella-like system for locating
and communicating with dictionary and semantic network servers on the
web (there are many). The problem today is that each system has its
own proprietary front end, so all of this information is fragmented.
By creating a simple protocol for locating and talking to systems, it
is possible to create what appears to be a single worldwide
dictionary/semantic network that can be accessed with a few lines of
code.
Thanks for writing. Best regards,
Brian McConnell
-----Original Message-----
From: Brian McConnell <brianmsf(a)yahoo.com>
To: Kurt Jansson <kurt(a)jansson.de>
Sent: Sunday, 7 April 2002 23:53
Subject: RE: Why "Lexicon"?
Kurt,
Thanks for the quick reply.
Another point... WWL does not do full text translation. It is designed
to assist word and phrase translation, as well as monolingual
dictionary or encyclopedia searches. As you know, translating full
text without human intervention is a very difficult problem. While I
could see translation systems using WWL to query dictionaries (to
expand the scope of their vocabularies), the WWL specification does
not say anything about full text translation.
Our primary goal is to create a distributed dictionary/encyclopedia
protocol that is very easy to implement in client and server software,
and that does not require dictionary servers to make changes to their
systems besides writing a few scripts to generate SOAP responses
instead of HTML. WWL's purpose is to make it easy to automatically
locate and communicate with WWL-aware dictionary and semantic net
servers. I like to describe this as GNUtella for dictionaries.
You are welcome to forward this email to the wikipedia list or
developers. As I mentioned, I think you could do some interesting
things by making your systems accessible via the WWL SOAP interface.
Thanks again for your email. Best regards.
Brian McConnell
Pierre, thanks for your comment... I'm forwarding it to wikitech-l, which is where
the developers hang out. Coincidentally, we are just now discussing performance
issues. Would you be interested in joining us?
----- Forwarded message from Pierre Abbat <phma(a)webjockey.net> -----
From: Pierre Abbat <phma(a)webjockey.net>
Date: Thu, 11 Apr 2002 09:14:25 -0400
To: webmaster(a)wikipedia.com
Subject: HTTP response from wikipedia takes too long
I am trying to read Wikipedia, and Konqueror frequently times out,
resulting in an edit conflict if I'm trying to submit something.
Accessing ross.bomis.com does not take nearly as long. Can you fix it?
phma
---
[phma@neofelis abi]$ time webserver ross.bomis.com
Server: Apache/1.3.23 (Unix) PHP/4.0.6 mod_fastcgi/2.2.12
0.11user 2.00system 0:10.61elapsed 19%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1095major+404minor)pagefaults 0swaps
[phma@neofelis abi]$ time webserver ross.bomis.com
Server: Apache/1.3.23 (Unix) PHP/4.0.6 mod_fastcgi/2.2.12
0.09user 0.10system 0:00.91elapsed 20%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1052major+404minor)pagefaults 0swaps
[phma@neofelis abi]$ time webserver ross.bomis.com
Server: Apache/1.3.23 (Unix) PHP/4.0.6 mod_fastcgi/2.2.12
0.10user 0.09system 0:00.44elapsed 42%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1052major+404minor)pagefaults 0swaps
[phma@neofelis abi]$ time webserver www.wikipedia.com
Server: Apache/1.3.23 (Unix) PHP/4.0.6 mod_fastcgi/2.2.12
0.11user 0.22system 0:32.39elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (957major+376minor)pagefaults 0swaps
[phma@neofelis abi]$ time webserver www.wikipedia.com
Server: Apache/1.3.23 (Unix) PHP/4.0.6 mod_fastcgi/2.2.12
0.10user 1.02system 1:17.12elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1125major+404minor)pagefaults 0swaps
[phma@neofelis abi]$ time webserver www.wikipedia.com
Looking up www.wikipedia.com.
Making HTTP connection to www.wikipedia.com.
Sending HTTP request.
HTTP request sent; waiting for response.
Alert!: Unexpected network read error; connection aborted.
Can't Access `http://www.wikipedia.com/'
Alert!: Unable to access document.
lynx: Can't access startfile
Command exited with non-zero status 1
0.10user 2.28system 5:16.30elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1176major+406minor)pagefaults 0swaps
----- End forwarded message -----
Jim accidentally sent this just to me; I'm sending it back to the list:
On mer, 2002-04-10 at 18:27, Jimmy Wales wrote:
> Brion L. VIBBER wrote:
> > > My best guess is that the parsing and lookups on regular pages are
> > > currently the main load, not editing or exotic database queries -- is
> > > this right?
> >
> > Not a clue. Initially, the database certainly was the main load, but I
> > haven't heard any newer figures. Jimbo?
>
> I'll reset the slow-query log and make a new version available after a few
> hours of data collection.
>
> > We used to cache rendered articles, but Jimbo disabled this feature some
> > time ago, claiming he was unable to find a performance advantage. (See
> > mailing list archives circa February 13.)
>
> But, I'm willing to try it again.
>
> > Personally, I've always found that idea suspicious; caching is definitely
> > faster on my test machine, and is going to be a particularly big help
> > with, say, long pages full of HTML tables! But then, my test machine has
> > a much, much lower load to deal with than the real Wikipedia. :)
> > Nonetheless, if caching really isn't helping, that's because something
> > isn't being done right. It should be found, fixed, and reenabled.
>
> I would say that I agree with that.
>
> Here's a question for everyone.
>
> Let's say we have some portion of the page pre-calculated and cached.
> Is it faster to keep that cached text *in the database*, or *on the
> hard drive*?
>
> I'm very strongly biased towards thinking that keeping it on the hard
> drive is faster, and by a significant margin -- but only because I've
> never tested it, and because I know (from long experience at Bomis) that
> opening up a text file on disk and spitting it out can be *really* fast,
> if the machine has enough RAM that the filesystem can cache lots of
> popular files in memory.
>
> But, everything I read about MySQL talks about how screamingly fast it
> allegedly is, so...
>
> --Jimbo
>
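To make Jimbo's question concrete: a file-based render cache amounts to
something like the sketch below. This is an illustration of the idea
only, not code from the tree; the helper names are invented, and
renderPageFromDatabase() is a stub for the real parser.

<?php
# Sketch of a file-based render cache -- illustration only.
$cacheDir = "/var/cache/wiki";    # hypothetical location

function renderPageFromDatabase( $title ) {
    return "<html>rendered " . $title . "</html>";   # stub for the parser
}

function cachedPagePath( $title ) {
    global $cacheDir;
    # Hash the title so we never trust it as a filename.
    return $cacheDir . "/" . md5( $title ) . ".html";
}

function fetchRenderedPage( $title ) {
    $path = cachedPagePath( $title );
    if ( file_exists( $path ) ) {
        # Hot path: no parsing, no DB. The kernel's filesystem cache
        # keeps popular pages in RAM, as in the Bomis experience above.
        readfile( $path );
        return;
    }
    # Miss: render from the database, store for next time, then send.
    $html = renderPageFromDatabase( $title );
    $fp = fopen( $path, "w" );
    fwrite( $fp, $html );
    fclose( $fp );
    echo $html;
}

# On every successful edit or deletion, the cached copy must go:
function invalidateCachedPage( $title ) {
    @unlink( cachedPagePath( $title ) );
}
?>

Whether the file_exists()/readfile() path beats a SELECT of
pre-rendered text from MySQL is exactly the open question; the only
honest answer is to benchmark both under realistic load.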
Here's a patch for wikiSettings.php that fixes the problem of variables
being used before they are defined. See my previous post for some of the
rationale for this patch.
This fix makes the code actually work, so the pages end up being in the
colours assigned here. But we don't really want multicoloured pages, so
I've changed most of the colours to #FFFFFF so that there is no effective
change of page colouring.
This patch also fixes the problem of using $wikiCharset before it is
defined. I've just used "iso-8859-1" and "Latin-1" instead.
Zundark
*** wikiSettings.php.old Tue Feb 26 18:17:10 2002
--- wikiSettings.php Wed Apr 10 17:46:38 2002
***************
*** 21,33 ****
$wikiDBconnection = ""; # global variable to hold the current DB
# connection; should be empty initially.
! # Namespace backgrounds
$wikiNamespaceBackground = array () ;
! $wikiNamespaceBackground[$wikiTalk] = "#eeFFFF" ;
! $wikiNamespaceBackground["user_talk"] = $wikiNamespaceBackground["talk"] ;
! $wikiNamespaceBackground["wikipedia_talk"] = $wikiNamespaceBackground["talk"] ;
! $wikiNamespaceBackground[$wikiUser] = "#FFeeee" ;
! $wikiNamespaceBackground[$wikiWikipedia] = "#eeFFee" ;
$wikiNamespaceBackground["log"] = "#FFFFcc" ;
$wikiNamespaceBackground["special"] = "#eeeeee" ;
--- 21,30 ----
$wikiDBconnection = ""; # global variable to hold the current DB
# connection; should be empty initially.
! # Namespace backgrounds. (Those with variable indices are assigned later.)
$wikiNamespaceBackground = array () ;
! $wikiNamespaceBackground["user_talk"] = "#FFFFFF" ;
! $wikiNamespaceBackground["wikipedia_talk"] = "#FFFFFF" ;
$wikiNamespaceBackground["log"] = "#FFFFcc" ;
$wikiNamespaceBackground["special"] = "#eeeeee" ;
***************
*** 41,48 ****
include_once ( "wikiLocalSettings.php" ) ;
# Initialize list of available character encodings to the default if none was set up.
! if ( ! isset ( $wikiEncodingCharsets ) ) $wikiEncodingCharsets = array($wikiCharset);
! if ( ! isset ( $wikiEncodingNames ) ) $wikiEncodingNames = array($wikiCharset); # Localised names
#
# This file loads up the default English message strings
--- 38,47 ----
include_once ( "wikiLocalSettings.php" ) ;
# Initialize list of available character encodings to the default if none was set up.
! if ( ! isset ( $wikiEncodingCharsets ) )
! $wikiEncodingCharsets = array("iso-8859-1");
! if ( ! isset ( $wikiEncodingNames ) )
! $wikiEncodingNames = array("Latin-1"); # Localised names
#
# This file loads up the default English message strings
***************
*** 54,59 ****
--- 53,68 ----
include_once ( "wikiText" . ucfirst ( $wikiLanguage ) . ".php" ) ;
}
+ # More namespace backgrounds, now that the required variables have
+ # been defined. We must be careful not to overwrite any values that
+ # have been assigned elsewhere.
+ if ( ! isset ( $wikiNamespaceBackground[$wikiTalk] ) )
+ $wikiNamespaceBackground[$wikiTalk] = "#FFFFFF" ;
+ if ( ! isset ( $wikiNamespaceBackground[$wikiUser] ) )
+ $wikiNamespaceBackground[$wikiUser] = "#FFFFFF" ;
+ if ( ! isset ( $wikiNamespaceBackground[$wikiWikipedia] ) )
+ $wikiNamespaceBackground[$wikiWikipedia] = "#FFFFFF" ;
+
# Functions
# Is there any reason to localise this function? Ever?
I have been thinking about the performance of Wikipedia, and how it
might be improved.
Before I go off and investigate in detail, I'd just like to check my
basic concept of how the code works (based on reading this list -- I
haven't pulled down the CVS to look at it yet).
=== Total guesswork follows ===
Am I right in thinking that, for each ordinary page request,
* the raw text is pulled out of the database
* the text is parsed and reformatted
* links are looked up to see whether their target pages exist, and are
treated accordingly
* the final HTML page is generated, with page decorations added as per
the theme
My general impressions of the activity rate are:
* about 100 pages per day are created or deleted
* roughly one edit every 30 seconds
* roughly one page hit every second
Packet loss seems negligible, so you don't seem to be running out of
bandwidth.
Although I guesstimate the hit rate at around one per second, pages
seem to be taking around 5 seconds to serve, suggesting that the
system is probably running at a load average of, say, 5 or so.
My best guess is that the parsing and lookups on regular pages are
currently the main load, not editing or exotic database queries -- is
this right?
Jimbo has mentioned that the machine has a lot of RAM, so disk I/O is
unlikely to be the bottleneck: it's more likely to be CPU and
inter-process locking problems.
If so, I think careful page content caching could greatly improve
performance, by reducing the number of page parsings, renderings and
link lookups, at the cost of a slight increase in the cost of page
deletion and creation. And by freeing up resources, it should improve
performance on all operations across the board.
If I'm right, I think suitably intelligent caching could be applied not
only to ordinary pages, but also to some special pages, without any
major redesign or excessive complexity.
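(One concrete example of the kind of saving described above: if link
existence is currently checked with one database query per link,
batching the checks into a single query cuts the per-page query count
dramatically. A hypothetical sketch, not a description of the current
code -- "cur" is assumed here to be the table of current articles:)

<?php
# Hypothetical sketch: check the existence of all linked titles in one
# query instead of one query per link.
function lookupLinkTargets( $db, $titles ) {
    if ( count( $titles ) == 0 ) return array();
    $quoted = array();
    foreach ( $titles as $t ) {
        $quoted[] = "'" . addslashes( $t ) . "'";
    }
    $res = mysql_query( "SELECT title FROM cur WHERE title IN (" .
                        implode( ",", $quoted ) . ")", $db );
    $exists = array();
    while ( $row = mysql_fetch_row( $res ) ) {
        $exists[$row[0]] = true;    # render these as live links
    }
    return $exists;                 # missing titles become edit links
}
?>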
Before I start to look at things in more detail, could anyone confirm
whether I am even vaguely making sense?
-- Neil
Magnus Manske wrote:
> I think we all work with "standard settings", and there are no warnings
> showing up, just like the Bomis server uses standard and doesn't show
> anything like that, either.
But at least one major Wikipedia bug was caused by ignoring these
warnings, so this certainly isn't a good idea. (In development, that is.
The Bomis server is obviously a different matter.)
> > Here, $wikiTalk, $wikiNamespaceBackground["talk"], $wikiUser and
> > $wikiWikipedia are all undefined. I've no idea how to clean this
> > up, because I don't understand what it's supposed to look like.
> > Why are some of the indices variables and other constants?
> > In particular, what is the intended distinction between
> > $wikiNamespaceBackground["talk"] and $wikiNamespaceBackground[$wikiTalk]?
> > What should be done with this code?
>
> The reason (without looking at the code right now) is probably the missing
> "global" statement at the beginning of the function.
No, it's caused by using them before they are defined. The file
wikiTextEn.php which defines them is included later on. (This doesn't
apply to $wikiNamespaceBackground["talk"], which isn't defined anywhere.
I assume it's a mistake for $wikiNamespaceBackground[$wikiTalk].)
For the same reason, $wikiCharset is also used before being defined.
So the values in $wikiNamespaceBackground need to be assigned
after wikiTextEn.php (and any other language-specific setting file)
has been included. But these files can modify $wikiNamespaceBackground
(at least, the Esperanto one does - perhaps it shouldn't), so the only
solution appears to be to declare $wikiNamespaceBackground first,
then include the language-specific files, then assign values to
$wikiNamespaceBackground, but making sure not to overwrite any values
that have already been assigned. I'll post a patch for this later.
--
Zundark
Right now, the search box and last date of change appear to the
right, not centered. Is that intentional?
Also, I noticed in Mozilla that the article text is very close to the
left border of the browser, much closer than in the other skins. It
looks strange.
Axel