I think the current handling of interlanguage links is problematic and
not very scalable. If we have n copies of an article, we need n*(n-1)
interlanguage links. For 10 languages, that would be 90 links! All of
these links have to be added to separate pages, by people speaking
different languages, who often don't even have an account on the
Wikipedia in question.
As should be obvious, we are already missing interlanguage links for
many, if not most, of the translations we have.
The scalable solution requires us to have a meta-table for interlanguage
links that can be accessed by all Wikipedias. This table could look like
this:
language1 article1 language2 article2
------------------------------------------------------------
en Main Page de Hauptseite
fr Accueil en Main Page
fr Accueil es Portada
...
Let's call it shared.ilinks for the moment.
Instead of adding interlanguage links on top of articles, we would have
a separate text line below article bodies:
Interlanguage links (syntax: [[<code>:<article name>]])
The syntax would remain the same so that the link line can be cut and
pasted from the body. But this line would not be stored in that form in
the database.
Display of interlanguage links
------------------------------
Say I visit [[Main Page]] on en.wikipedia.org. Now, in order to show the
list of links, the shared.ilinks table is queried:
SELECT * FROM shared.ilinks WHERE (language1='en' AND article1='Main
Page') OR (language2='en' AND article2='Main Page')
That is, a single SELECT allows us to find all translations of "Main
Page". But doesn't this save us only a little time, as we still
have to tell *every* Wikipedia that, say, "Accueil" means "Main Page"
in English? No, because we can now leave this to the code.
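As a hedged sketch of the display-time lookup (assuming a SQLite-style
table with the four columns from the example above; this is
illustrative, not actual wiki code):

```python
import sqlite3

# Build a toy shared.ilinks table; schema and sample rows follow the
# proposal's example, nothing here is an existing MediaWiki schema.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ilinks
    (language1 TEXT, article1 TEXT, language2 TEXT, article2 TEXT)""")
conn.executemany("INSERT INTO ilinks VALUES (?, ?, ?, ?)", [
    ("en", "Main Page", "de", "Hauptseite"),
    ("fr", "Accueil", "en", "Main Page"),
    ("fr", "Accueil", "es", "Portada"),
])

def translations(lang, title):
    """Return direct translations of (lang, title), regardless of which
    side of the row the page was stored on."""
    rows = conn.execute(
        """SELECT language1, article1, language2, article2 FROM ilinks
           WHERE (language1 = ? AND article1 = ?)
              OR (language2 = ? AND article2 = ?)""",
        (lang, title, lang, title)).fetchall()
    out = []
    for l1, a1, l2, a2 in rows:
        out.append((l2, a2) if (l1, a1) == (lang, title) else (l1, a1))
    return out

print(sorted(translations("en", "Main Page")))
# finds the de and fr rows directly; es only appears after the
# discovery pass described below has added an en/es row
```

Note the symmetric WHERE clause: a single query covers both storage
directions, which is what keeps display cheap.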
When a user edits a page, the same list of links is generated, but this
time in the wiki syntax ([[fr:Accueil]] [[de:Hauptseite]] and so on).
This can be edited by anyone. When the list has been edited, and the
page is saved, the following is done:
1)
The same SELECT as above is run:
SELECT * FROM shared.ilinks WHERE (language1='en' AND article1='Main
Page') OR (language2='en' AND article2='Main Page')
2)
Now, for each translation we get, another similar SELECT is run, so that
we find further translations into other languages.
3)
Every new translation we discover is stored in a new table row pairing
English (in our example) with the new translation, so that we can do
the quick, one-time SELECT to display the interlanguage links.
The result: If we have a page in 10 translations, the minimum effort we
have to go to is to add exactly one translation on every language
Wikipedia. That is, a minimum of 9 as opposed to 90 links! The other
translations are automatically discovered.
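The save-time discovery pass (steps 1-3) could be sketched as a
breadth-first walk over the table. Table and column names are taken
from the shared.ilinks example; everything else is an assumption:

```python
import sqlite3

# Toy table: fr and de each linked themselves to en, but not to each other.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ilinks
    (language1 TEXT, article1 TEXT, language2 TEXT, article2 TEXT)""")
conn.executemany("INSERT INTO ilinks VALUES (?, ?, ?, ?)", [
    ("fr", "Phil Collins", "en", "Phil Collins"),
    ("de", "Phil Collins", "en", "Phil Collins"),
])

def neighbours(page):
    """Direct translations of a (lang, title) pair, either side."""
    lang, title = page
    rows = conn.execute(
        """SELECT language1, article1, language2, article2 FROM ilinks
           WHERE (language1 = ? AND article1 = ?)
              OR (language2 = ? AND article2 = ?)""",
        (lang, title, lang, title)).fetchall()
    return {(l1, a1) if (l1, a1) != page else (l2, a2)
            for l1, a1, l2, a2 in rows}

def discover(page):
    """Steps 1-3: repeat the SELECT for each translation found, then
    store every new pair so display stays a single SELECT."""
    seen, queue = {page}, [page]
    while queue:
        current = queue.pop(0)
        for nxt in neighbours(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    for other in seen - {page}:
        if other not in neighbours(page):
            conn.execute("INSERT INTO ilinks VALUES (?, ?, ?, ?)",
                         page + other)
    return seen - {page}

found = discover(("de", "Phil Collins"))
# the French page is discovered via the shared English entry
```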
Example:
Someone creates a new page about Phil Collins on fr.wikipedia.org. This
person knows that there's already an English page about him on
en.wikipedia.org, so they type [[en:]] (suggested short syntax for "same
name as here"). "fr:Phil Collins->en:Phil Collins" is inserted into the
shared.ilinks table. This already means that the link is also shown on
en.wikipedia.org. But it gets better: Now someone on de.wikipedia.org
creates a Phil Collins page as well. He links to the English page with
[[en:]]. Zap! After saving the page, the French translation is
automatically discovered. Now the French translation has a link to the
German page and vice versa as well.
Editing links
-------------
What happens if the folks on fr.wikipedia.org move one of their pages?
The "Move this page" command now needs to automatically change every
instance of the page to something else (e.g. Accueil->Homepage) in the
shared.ilinks table.
What happens if someone on en.wikipedia.org decides that they do not
want to link to a page on nl.wikipedia.org because it contains obsolete
information, or because of "link-vandalism"? To unilaterally remove a
link to one translation, there would have to be a special interlanguage
link, like [[nl::]]. When saved, the link would be cleared and not
"rediscovered" until someone removed the [[nl::]] link. Such empty links
would not be copied.
If [[nl:Hoofdpagina]] is deleted, all instances of it in the
shared.ilinks table are removed as well.
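A minimal sketch of the bookkeeping that moves and deletions would
need, again assuming the shared.ilinks layout from this proposal:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ilinks
    (language1 TEXT, article1 TEXT, language2 TEXT, article2 TEXT)""")
conn.executemany("INSERT INTO ilinks VALUES (?, ?, ?, ?)", [
    ("fr", "Accueil", "en", "Main Page"),
    ("fr", "Accueil", "es", "Portada"),
])

def move_page(lang, old, new):
    # "Move this page" renames the title on whichever side it appears
    conn.execute("""UPDATE ilinks SET article1 = ?
                    WHERE language1 = ? AND article1 = ?""",
                 (new, lang, old))
    conn.execute("""UPDATE ilinks SET article2 = ?
                    WHERE language2 = ? AND article2 = ?""",
                 (new, lang, old))

def delete_page(lang, title):
    # deletion drops every row that mentions the page
    conn.execute("""DELETE FROM ilinks
                    WHERE (language1 = ? AND article1 = ?)
                       OR (language2 = ? AND article2 = ?)""",
                 (lang, title, lang, title))

move_page("fr", "Accueil", "Homepage")   # Accueil -> Homepage
delete_page("es", "Portada")             # es page deleted
rows = conn.execute("SELECT * FROM ilinks").fetchall()
```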
What about links where there is no 1:1 relationship? Say I have a page
about "evolution" and one about "theory of evolution" on one wiki
(English), and only a single page covering evolution on another
(French). So I add the following link on both pages on en.wikipedia.org:
[[fr:Théorie de l'évolution]]
In the shared.ilinks table, I therefore get entries:
Evolution Théorie de l'évolution
Theory of Evolution Théorie de l'évolution
When I visit the "Evolution" page, I get a clear match: Théorie de
l'évolution. But when I visit "Théorie de l'évolution", I get two
matches. In this case, we could actually show both links on the French
page:
English: [1],[2]
Or in edit mode:
[[en:Evolution]][[en:Theory of Evolution]]
It may not be desirable to autocopy these duplicate links. So, if we
cannot discover an exact match, we may want to wait until someone
specifies a precise translation.
Discussion
----------
The process described above is complex from a technical perspective,
because it has to be respected during all changes to articles (move,
delete, edit, etc.). It also requires us to run a separate database server
specifically for this shared information. There may be scenarios that I
have not yet covered in the above proposal, although I am sure solutions
can be found for every problem.
There are numerous advantages to this approach. Compared with the
current handling, we should quickly get an accurate representation of
interlanguage links on all wikis. We do not have to pick a single
language as "key" language, which would require a key entry in that
language to exist for all pages. [1]
There may be simpler solutions that I cannot see - if so, I would love
to hear about them. But I really think we should consider redesigning
the interlanguage links before the problem grows out of control.
Regards,
Erik
[1] Although that would expose us to charges of anglocentrism, I am open
to discussing this alternative.
--
FOKUS - Fraunhofer Institute for Open Communication Systems
Project BerliOS - http://www.berlios.de
1. move everything to Postgres
2. move everything to common database, with tables foo_cur, foo_old etc.,
where foo are language names
3. make single user table (needs some tweaking to allow slightly different
preferences), single logging system, single recent changes, and all other
nice things we can do with that
4. move everything to UTF-8, so we don't have to use %escapes in English Wikipedia
5. create table interwiki (source_lang, target_lang, source_title, target_title)
6. convert all cur_text by removing interwiki links from page tops,
and add appropriate entries to interwiki.
7. compute the transitive symmetric closure of interwiki, and display that
as interwiki links. If there are many articles of the same language
in it, display it like: "English (Astronomy), English (Astrophysics), German"
In practice it shouldn't be such a big problem.
8. when editing, add interwiki links on top of the page or in a separate box.
add a JavaScript button to add all links from the transitive symmetric closure
of interwiki
9. add some magic functionality that allows changing many interwiki links at
a time. It is needed, as the transitive symmetric closure often contains
many copies of the same link. "Delete all interwiki links to this page" and
"change all interwiki links from X to Y" would probably be enough.
Computing the symmetric transitive closure will require a bit of magic.
I think we should keep the results in the database and only update them
on change - editing may be a bit slower, as long as viewing is not any
worse than now.
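To illustrate point 7: the symmetric transitive closure can be computed
with a union-find over (language, title) pairs. The data and helper
names below are hypothetical, not existing wiki code:

```python
# Union-find over (language, title) pairs; linked pages end up in the
# same set, which is exactly the symmetric transitive closure.
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# Sample interwiki rows: no direct en<->en link exists, but the closure
# connects the two English pages through de and pl.
interwiki = [
    (("en", "Astronomy"), ("de", "Astronomie")),
    (("de", "Astronomie"), ("pl", "Astronomia")),
    (("en", "Astrophysics"), ("pl", "Astronomia")),
]
for a, b in interwiki:
    union(a, b)

def closure_of(page):
    """All pages linked to `page`, directly or transitively."""
    root = find(page)
    return {p for p in parent if find(p) == root and p != page}

links = closure_of(("en", "Astronomy"))
# the closure can contain two articles of the same language, shown as
# "English (Astronomy), English (Astrophysics), German" per point 7
```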
Some of the special pages, like Most Wanted, have been deactivated
during busy times. Would it be a good idea to store the results of the
last generated version as a "regular" article?
Example: I request "Special:Most Wanted" when it is available. The link
list I get is written into "Wikipedia:Most Wanted".
I think the current image handling is slightly messed up, for the
following reasons:
1) There are too many different ways to link small/large versions.
a) There are usability problems with Brion's suggested approach of
including the larger version of an image on the image page:
- The headline may say something like image_small.jpg whereas
the actual image displayed is large
- Clicking through a second time leads to another (usually
empty) page
- Captions effectively have to be entered up to three times
2) Users have to go to too much effort in order to create small versions
of images. This is not something that researchers and authors should
have to waste time with. It also impedes uploading of high resolution
images, which can really hurt us when we start thinking about a printed
edition of Wikipedia.
3) Content of image pages is neglected because it is "hidden" most of
the time. Many people treat image descriptions like changelog entries
(relatively carelessly).
The fact that it even took me a while to understand the current handling
of images doesn't bode well for the usability of the concept.
I propose the following changes:
--------------------------------
1) As suggested earlier, an image page should always display the image
it refers to.
2) Smaller versions of images should be auto-generated in a separate
directory similar to the math/ directory used for texvc's images. The
small versions would be viewed on the article where the [[Image]] tag is
included, whereas the image would link to the original size version.
We could use the GD library functions for creating thumbnails. See, for
example:
http://www.onlinetools.org/articles/creating_thumbnails_all.php
However, auto-determining thumbnail sizes is problematic because a
useful size often depends on context. A proper way to handle this may be
to support the following variants of the [[Image]] tag:
[[Image:foo.jpg width=100 height=100]]
[[Image:foo.jpg width=100]]
[[Image:foo.jpg height=100]]
-> height or width autocalculated as per aspect ratio
[[Image:foo.jpg size=10%]]
The smaller versions would be generated as necessary and stored in a
temporary directory. The matching original image information (date,
size) would be stored in a table so that the thumbnails can be updated
on demand.
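A sketch of how the tag variants above could be resolved into concrete
thumbnail dimensions. The actual resampling would be done by GD; this
only computes the target geometry, and the function name is made up:

```python
# Hypothetical helper: given the original dimensions and whichever of
# width/height/size the [[Image]] tag specified, compute the thumbnail
# dimensions, preserving the aspect ratio when only one is given.
def thumb_size(orig_w, orig_h, width=None, height=None, size=None):
    if size is not None:                    # e.g. size="10%"
        factor = float(size.rstrip("%")) / 100.0
        return round(orig_w * factor), round(orig_h * factor)
    if width is not None and height is not None:
        return width, height                # caller fixed both
    if width is not None:                   # height per aspect ratio
        return width, round(orig_h * width / orig_w)
    if height is not None:                  # width per aspect ratio
        return round(orig_w * height / orig_h), height
    return orig_w, orig_h                   # no scaling requested

# [[Image:foo.jpg width=100]] on a 400x300 original:
print(thumb_size(400, 300, width=100))      # -> (100, 75)
```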
3) The image page content should be included by default below the image
(preceded by a <BR>). That way when you type
[[Image:foo.jpg]]
you get
<img src="http://../foo.jpg"><BR>
<I>This is an ugly photo!</I>
To suppress this and type a manual caption, you would have to do
something like:
[[Image:foo.jpg notext]]
That way, you can have
- the standard case: image with a simple caption; no need to update
twice
- the extended case: image with a short caption on the page where it is
embedded and a longer discussion on its image page.
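A rough sketch of how the implicit caption in point 3 could be
rendered; the tag grammar and the image_pages store are assumptions for
illustration only:

```python
import re

# Stand-in for the image description pages in the database.
image_pages = {"foo.jpg": "This is an ugly photo!"}

def render_image_tag(tag):
    """Expand [[Image:name]] into an <img> tag, appending the image
    page text as a caption unless "notext" is given."""
    m = re.match(r"\[\[Image:([^ \]]+)( notext)?\]\]", tag)
    if not m:
        return tag
    name, notext = m.group(1), m.group(2)
    html = '<img src="http://../%s">' % name
    if not notext and name in image_pages:
        html += "<BR>\n<I>%s</I>" % image_pages[name]
    return html

print(render_image_tag("[[Image:foo.jpg]]"))
```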
Discussion
----------
The approach discussed above has almost no obvious disadvantages. The
following problems may ensue, though:
- Existing image pages will have to be re-edited to remove now redundant
image content. Existing thumbnail images can be deleted.
- It is somewhat counter-intuitive to have the caption rendered
implicitly on a page that includes an [[Image:foo.jpg]] tag. The
alternative would be to do away with image pages as regular
content-pages altogether. (Realistically, having a separate image
namespace may have been a bad idea in the first place.)
However, having lots of redundant (and often neglected) content is
clearly the least preferable choice.
There would, in my opinion, be massive advantages to having
auto-generated small versions of images. This would greatly increase the
usability on many pages, and make the traditional "click to view larger
version" approach be usable almost anywhere.
Is the GD library installed on Wikipedia's server?
I would appreciate feedback on this proposal. I'd be willing to give the
autogeneration a try, if no one else volunteers.
Regards,
Erik
Magnus wrote:
>Erik Moeller wrote:
>>I propose the following changes:
>>--------------------------------
>>
>>1) As suggested earlier, an image page should always
>>display the image it refers to.
>>
>Makes sense.
See my earlier post on this. IMO the only time an
image should be single-click through is when an image
is intentionally displayed on the image description
page. Otherwise a user should have to click twice to
get to the image description page (alt text should
work in both cases though).
>>2) Smaller versions of images should be auto-generated
>>in a separate directory similar to the math/ directory
>>used for texvc's images. The small versions would be
>>viewed on the article where the [[Image]] tag is
>>included, whereas the image would link to the original
>>size version.
>
>Two items with this one:
>1. A thumbnail should be generated upon upload, so we
>don't have to wade through that on every page display,
>2. *if* and *only if* that is necessary. The images
>DW uploaded lately to replace mine don't really need
>a thumbnail ;-)
IMO this isn't the best way to do things. As described
below smaller images should be created on-the-fly by
using markup to specify desired width (and deleted
after a specified period of not being used). Then if
the [[Image:foo.jpg width=100px]] syntax is used
/then/ the thumbnail can be clicked once to get to the
image description page (which contains the full-sized
displayed image). If there is no image displayed on
the image description page then the user would have to
click twice to get there (with no changing of the
mouse pointer to the little hand). This would give
users the greatest amount of control and flexibility.
Doing things automatically upon upload would be a
nightmare (esp. for images that are inserted in
tables; often a very precise image width is needed).
>>However, auto-determining thumbnail sizes is
>>problematic because a useful size often depends on
>>context. A proper way to handle this may be to
>>support the following variants of the
>>[[Image]] tag:
>>
>> [[Image:foo.jpg width=100 height=100]]
>>
>> [[Image:foo.jpg width=100]]
>> [[Image:foo.jpg height=100]]
>> -> height or width autocalculated as per
>>aspect ratio
>>
>> [[Image:foo.jpg size=10%]]
>
>Why not say: *If* we need a thumbnail, it has a width
>of, say, 150 pixel (just to have a number).
>Width is the "problematic" factor, on smaller
>screens. So, for every image wider than this, a
>thumbnail is used, otherwise the original
>image.
Again, doing this automatically will cause a great
deal of trouble. There are many cases where images
larger than even 250 pixels are used and appropriate,
especially if the images are centered or otherwise do
not have text flowing around them. (The optimal range
of widths for images /with/ text flowing around them
is 150-250 pixels, with image detail and type usually
the deciding factors for the resulting width.)
>>....
>...
>>- It is somewhat counter-intuitive to have the
>>caption rendered implicitly on a page that includes
>>an [[Image:foo.jpg]] tag. The alternative would be
>>to do away with image pages as regular content-pages
>>altogether.
>>(Realistically, having a separate image
>>namespace may have been a bad idea in the first
>>place.)
>>
>How about the alt tag thingy I installed at the test
>site?
IMO alt tags are needed for images with and without
larger versions displayed on the image description
page. But please do not get rid of the image
description pages. Very often wiki markup is used
along with external links. These links are not usable
in the form of mouse-over text. However, if for
whatever reason an image is /not/ displayed on its
image description page, then the image displayed in the
article should have to be clicked twice in order to
get to the image description page (with no display of
the little hand when the mouse pointer is over the
image; just the display of the alt text).
>>However, having lots of redundant (and often
>>neglected) content is clearly the least preferable
>>choice.
>>
>>There would, in my opinion, be massive advantages to
>>having auto-generated small versions of images. This
>>would greatly increase the usability on many pages,
>>and make the traditional "click to view larger
>>version" approach be usable almost anywhere.
>
>I agree. We'll have to think about what image to use
>on "printable version" - the thumbnail to keep
>layout, or the large one for resolution?
Don't forget WYSIWYG. The reader should be given the
same article layout in the print version as they see in
the regular article. Using the larger version in
the print version would destroy the layout of tables
that have images in them (not to mention that the
larger versions of images with text flowing around
them would result in pages with two word lines next to
huge images). IMO on the image description pages we
should have "Printable version: [Small image] [Large
image]"
-- Daniel Mayer (aka mav)
> > Current talk pages are extremely unfriendly.
> > I think they should be abandoned and replaced by
> > more "natural" system of posting.
>
> You mean, not a wiki?
> I think not.
I agree talk pages must remain wiki style for three reasons.
First, adding another system for talk pages makes users learn a second
system, and that makes things inherently more complex.
Second, talk pages are intended to help people make articles, and
therefore the requirement to learn wiki syntax to post on talk pages
just keeps out those who wouldn't take time to learn to contribute to
actual articles.
Ever since Ward's Wiki, there has been some minimum requirement to
participate -- namely the ability to figure out wiki syntax. This is
not hard, but it does require a little effort. And many have argued
that this is one of the reasons that wikis can maintain a higher level
of discourse than newsgroups and web-based discussion groups.
Third, an important part of the "wiki way" is refactoring. I have
refactored a number of large talk pages, removing the dross, keeping
substantive contributions, and putting it all together to flow more
naturally. This is only possible because of the flexibility of the wiki
system.
Hi,
I am beginning to be confused about the sequence of events when a new
feature is introduced. It seems that one or two members of wikitech-l
(developers) support a new feature, write the code, put it on
test.wikipedia.org and wait for feedback.
At times this wait is less than a day. The feature is then usually
implemented.
Then, it is announced or noticed by someone on wikipedia-l, and 60 e-mails
follow discussing and arguing over it.
If someone has a new feature they want to see implemented, why don't they
present it to the whole membership first and then allow a few days for
discussion? After everyone has had a chance to think about it and raise
objections, modifications, etc., they could implement it, if most members
want it.
Recently, when the usual flood of e-mails followed the announcement of a
new feature, there have been remarks on a number of changes that it might
have been better to have a broader-based discussion before implementation.
Just because many members of Wikipedia do not have the skills to make such
changes doesn't mean that they don't have an opinion on them.
Tonight I am watching this process proceed on two changes to the
edit/preview pages. All discussion is on wikitech-l. It has been suggested
that at least one change will probably be implemented tomorrow. Meanwhile,
the main membership has no idea that any such changes are about to happen.
As Ever,
Ruth Ifcher