Wikitech-l February 2002

wikitech-l@lists.wikimedia.org

10 participants
70 discussions

by Larry Sanger

We just never gave the programmers a good set of requirements for the uploader. On Mon, 25 Feb 2002, Lars Aronsson wrote: > Larry Sanger wrote: > > I agree 100% that this is a problem. Last night I deleted several dozen > > files that someone had overwritten as being (obviously) inappropriate for > > Wikipedia articles. I was a little concerned from the beginning that > > having virtually no restrictions on the upload function would have this > > effect, so it's not too surprising that this is happening. > > >From a general, Wiki-philosophical-social aspect, it is interesting > that the upload function gets abused, while general Wiki pages do not. Actually, there's a good reason for it: the images aren't obviously linked to anything in any article. This is an ABSOLUTELY essential piece of information to have: what articles *use* the image in question? If no article uses an image after 24 hours, perhaps we should delete the image (or put it in a queue to be deleted by a human). So, the point is, without a context, unless some image is at face value obviously worthless to any Wikipedia article (e.g., porn advertisements), it's difficult for us to tell whether an image really is appropriate for the 'pedia. It would even make it easier for us to determine whether an image is copyrighted. One way around this would be to attach images to unique articles, so that the uploading of an image would be logged in a particular article's history. I don't know if I like this suggestion, though, I'm just throwing it out there for your consideration. Here's another thing we need in that upload form. We should ask people to choose: (1) I have created this image and release it under the GNU FDL (or contribute it to Wikipedia); (2) I personally certify that this image is public domain (if checked, add a text box requiring that a source be given--a URL or else a book title, say); (3) other? If none are checked, then the uploader wouldn't accept the article. Under some schemes we might want (1) to require that the uploader identify which article the image is going to be used in, and (2) to check that the image title is linked to from that article. But (1) might be done automatically, I guess... Doing these things would remove a fair bit of the abuse. It would certainly make it a lot easier for the community to act as a check on the abuse. > Perhaps the uploads should be visible in the RecentChanges list? They already are, sort of--but each one individually should be, which isn't the case now. > Perhaps there should be a "view other versions" for each upload? Maybe--would prevent people from uploading porn in place of legit images, for instance. > Perhaps a Wikipage in the upload: namespace for each uploaded object? Maybe...? Larry

22 years, 2 months

Updating the database schema

by Jimmy Wales

O.k., I'm going to install the latest software. What do I need to do, in order to update the database schema before I install the new version? --Jimbo

22 years, 2 months

Re: [Wikitech-l] New case conversion functions

by Brion Vibber

I'm reposting this to wikitech-l so that discussion doesn't get lost. On mer, 2002-02-20 at 15:20, lcrocker(a)nupedia.com wrote: > You Wrote: > > Please take a > > look at the non-English non-ISO-8859-1 wikipedias sometime. > > > >Hundreds of pages, with correct charset headers: > > ISO-8859-2: > > http://pl.wikipedia.com/ > > UTF-8 with a custom conversion function for certain character > > sequences: > > http://eo.wikipedia.com/ > > > You're right. Last time I looked at these, the test pages I retrieved > gave 404s, and the 404 page is still served as ISO-8859-1, but the > headers of contentful pages are indeed as you say: 8859-2 for "pl" > and UTF-8 for "eo", etc. > > OK, then, I guess we do have to wade into the morass of national > character sets. Unless you want to switch to UTF-8, that is a given. > I have little or no experience using actual foreign- > made computers; but I /do/ have extensive knowledge about character > sets and communication protocols, so I'm just trying to make sure we > don't make the same mistakes hundreds of others have made in the past > by not getting this stuff right up front, but just diving headlong > into coding without stepping back a moment to design something that > will be usable and maintainable in the future. > > The way it is now, for example, we won't be able to cut-and-paste > between wikis if, say, I wanted to include a quote from some Polish > leader or something. Sad but true. > Maybe that's a reasonable sacrifice for ease of > editing on those wikis. Lee, let me put it this way. Imagine, if you will, that history had gone somewhat differently. Let's say that the first computers had been developed in a politically free, economically strong, highly industrialized Russia and the standard computer character set around the world had been based on the Cyrillic alphabet. In our hypothetical world, there's a Russian version of what we would have called Wikipedia. They set up some subsites in other languages, one of which is English, which uses the Latin alphabet. Now, you want to add some articles to the English site, but the site administrators have declared that only the standard cyrillic character set is to be used, with special markup to allow other characters through the use of numerical codes. This means: * Pages display fine for viewing, but when you edit, you see nothing but numeric escape codes. * You can't type *a single letter of English text* without using a special numeric escape code. * All page titles have to be transliterated into Cyrillic, because the escape codes aren't allowed in titles. Now, can you honestly tell me that you expect the average English-speaking wiki contributor to edit a page that looks something like this: [[уикипздиа:Узлкомз нзукомзрс|Welcome]] to [[Уикипздиа|Wikipedia]], a collaborative project to produce a complete encyclopedia from scratch. We started in January 2001 and already have over '''23,000 articles'''. ? I can't imagine that you would expect that to be acceptable to anyone else! You'll notice that the two non-ISO-8859-1-language 'pedias that have actual content (Polish and Esperanto) both use the Latin alphabet with a few diacritics. So theoretically, they would be the *most* amenable to using HTML entities -- you can almost read text in the edit box that way -- yet users of both wikipedias took the effort to tweak the program to make their customary character encodings work so that they could actually find people who would be willing to edit pages. HTML entities are great for tossing in an occasional foreign letter or word, but at the user level they are poor for regularly used diacritics and utterly useless for text in other alphabets. > We could, alternatively, serve UTF-8 on all > of them, but that would risk breaking older browsers. There are side > issues of what is stored in the database, and what is allowable in > titles/URLs, etc. Another alternative is to use the entities internally in the database, but work some mojo to make them appear as normal characters in the edit box. Which means you get zero advantage over simply using the national character set -- you still have to send a character set header, you have to know which Unicode characters can be passed through safely and which need to be escaped, the search engine still breaks words, you still can't capitalize non-ISO8859-1 titles, you still can't cut-n-paste, etc etc etc. All of the pain, none of the gain. > We really need to sit down and spec this out before we get too far > down the road. That's one reason why I posted the proposed policy on > foreign characters for the English Wiki; it is explicitly for the > English one only, but we need something equivalent for the other > ones. > > We had a lot of discussion about these topics in the early months of > the project: I don't want us to ignore everything we learned back > then just because the folks working on the code now weren't around > back then. > 0 Indeed. What were the conclusions of these discussions, and the reasoning behind them? -- brion vibber (brion @ pobox.com)

22 years, 2 months

Re: [Wikipedia-l] colons

by Larry Sanger

While I have to agree that it would be nicer to have the use of colons in titles, I don't think it would be better to have "content:" preceding every Wikipedia title. Colonless page titles could be automatically converted to content: titles (so that [[foo]] would be saved automatically as [[content:foo]]), but it would make the system more complicated and more importantly it would make the titles and the URLs uglier. This is really more of a technical issue than a policy one. I mean, from a policy standpoint, it's great to have the use of colons in titles. If there aren't any technical objections, we should do it. But, of course, there might be technical objections. (Cc'ing wikitech-l.) Larry > > > Thus instead of the ":" being a reserved character anywhere in a > > > title, only "user:", "talk:", "wikipedia:" etc. need to be > > > reserved. Any other uses of colons should be fine. This will let us > > > have entries for books with standard formatting of the subtitle (e.g., > > > "The Muggles: A Tale of Woe") or other natural uses of the colon. > > > > There is a simpeler solution. The actual contents of Wikipedia get the > > namespace "content:". If everything is prefixed with a namespace then the > > first colon is always the end of the namespace. > > > That's perfect.

22 years, 2 months

NuNupedia

by Magnus Manske

Dear fellow programmers, now that the wikipedia software is running (and quite well), thanks to many volunteers, and work on that software turns from bug-fixing to fine-tuning, I would like to open another front ;) For some time now I've been working on a completely rewritten version of the Nupedia software (http://www.nupedia.com). For those of you who don't know, Nupedia was the original, peer-reviewed Bomis encyclopedia project, and wikipedia was a "spin-off". But, the Nupedia approval process proved to be too slow and complicated, and the project came to a standstill, with only a dozen or so articles actually online. Recently, the Nupedia group voted about a streamlined article review process, and a draft policy is currently in writing. Why does this require a new software? Well, it doesn't. But I worked on the current Nupedia software, and it strikes me as a multi-redundant HTML-PHP-mix. It is working, and it can be altered, but IMO it will be more work to change the old software to a revised review process than to write a new one. Also, some other changes will have to be made, like multi-language interfaces and databases, which would further complicate a software change. So, silently, I created a new SourceForge project, using "nunupedia" as a working title (the encyclopedia title will remain "Nupedia"!). Both Jimbo and Larry agreed that this can eventually become the official Nupedia software. You can see a demo at http://nunupedia.sourceforge.net Only some parts of it do currently work. Under "articles in progess", you can see a demo article with online discussion. You can become a member and sumbit articles (but, don't use valuable stuff;) I would have preferred to complete a basically working version before "going public", like I did with the Wikipedia PHP script, but currently, I bluntly don't have the time for that :( So, I ask you, to help me with that. Remember, Nupedia will most likely become a "stable version" of wikipedia ;) If you are interested in developing this software, mail me or the list. Don't forget your SourceForge name, so I can give you CVS write access. Thanks for listening, Magnus

22 years, 2 months

the cur_(un)linked_links columns have officially been dropped

by Jan Hidders

L.S., As the subject says these columns have now been removed from the database schema and all the code that used them has been replaced by code that uses the new (un)linked tables. This completes a major change in the database scheme, so I will now wait until Jimbo has installed the latest CVS version, and see what bugs/problems appear. There is still a lot I can do in terms of speeding certain pages up, such as the short-pages page, the long-pages page, the orphans page and the most-wanted page. But for this I would again have to extend the database scheme, so I first suggest we freeze the scheme and after the bugs have been ironed out, I suggest we determine what would be the special page we want to speed-up first. -- Jan Hidders

22 years, 2 months

update: new special page "watch page links" is ready

by Jan Hidders

L.S., The code donated by Dan Keshet has been integrated. You can find it in the Quick bar under the name ''watch page links". Not a very good name, but I couldn't think of a better one that would fit. I have also optimzed the SQL for the what-links-here page for the new linked/unlinked tables. Finally, I added some code to the watchlist page to deal with pages on the watch-list that no longer exist. A remaining problem is that users cannot delete such pages from their watch list. -- Jan Hidders

22 years, 2 months

recent changes linked

by Dan Keshet

Hi Wikitechers, I've had this itch for a while, and I've finally scratched it: being able to display the recent changes of all pages that are linked to from a particular page. You can see the function I wrote at work here: http://www.projectmosaic.org/pwiki/wiki.phtml?title=Main_Page&action=recent… You could use this feature to get an overview of the activity in a given section (say, "Mathematics") without having to load/unload the entire section into your watch list. Alternately, you could use this to build group or public watch lists, the wiki way. Would you like this feature in the php script? Should I send someone the diff? -- Dan Keshet PS: (Please to: or cc: me; I'm not on the list.) PPS: Great job everybody who's worked on the script. Wikipedia's looking really great. :)

22 years, 2 months

update: lonley pages

by Jan Hidders

Highly esteemed colleagues, :-) I've optimzed the database access of the lonely-pages-page and committed it to CVS. You can now actually wait for the result to appear. :-) -- Jan Hidders

22 years, 2 months

Re: [Wikitech-l] recent changes linked

by Jan Hidders

----- Original Message ----- From: "Magnus Manske" <Magnus.Manske(a)epost.de> To: "Jan Hidders" <hidders(a)uia.ua.ac.be> Sent: Friday, February 22, 2002 3:48 PM Subject: RE: [Wikitech-l] recent changes linked > I checked the online example, and it would be great to have such a thing in > the sidebar! > > Jan, as you are working on the link table, could you get the code and try to > integrate it, if you have time? But of course. Send me the code and I'll adapt it and add it to the sidebar. -- Jan Hidders

22 years, 2 months

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Wikitech-l February 2002