I get long wait times for images from Commons (>10 sec even for small
thumbnails). My browsers (Firefox among them) also seem to load the
same image repeatedly, even though it should be in the cache.
I'm in the UK. Everything else seems normal (well, I'm in the UK, but
otherwise... ;-).
In r80443 I added a feature allowing categories to be sorted using the
Unicode Collation Algorithm (UCA). I wanted to briefly talk about the
potential user impact, the design choices and the caveats.
Sorting was the easy part. The hard part was providing a "first
letter" concept which would be reasonably sane. The idea I came up
with was to compile a list of first letters, themselves sorted using
the UCA. Then the "first letter" of a given string is the nearest
letter in the list which sorts above the string.
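In code, that lookup amounts to a binary search over UCA sort keys. A
minimal sketch, assuming PHP's intl extension provides Collator (this
is illustrative only, not the actual r80443 code):

    // Candidate first letters, pre-sorted by the UCA. The "root"
    // locale selects the UCA default table.
    $coll = new Collator( 'root' );
    $letters = array( 'A', 'B', 'C' );
    $keys = array_map( array( $coll, 'getSortKey' ), $letters );

    function firstLetter( $string, $letters, $keys, $coll ) {
        $target = $coll->getSortKey( $string );
        // Find the last letter whose sort key is <= the string's key,
        // i.e. the nearest letter which sorts above the string.
        $lo = 0;
        $hi = count( $keys ) - 1;
        $best = 0; // fall back to the first letter in the list
        while ( $lo <= $hi ) {
            $mid = ( $lo + $hi ) >> 1;
            if ( strcmp( $keys[$mid], $target ) <= 0 ) {
                $best = $mid;
                $lo = $mid + 1;
            } else {
                $hi = $mid - 1;
            }
        }
        return $letters[$best];
    }

    echo firstLetter( 'Aardvark', $letters, $keys, $coll ); // "A"

ICU sort keys are binary strings, so a plain strcmp() gives the
collation order.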
For instance, if you sort the letters A, B, C together with the string
Aardvark, you get:
A
Aardvark
B
C
So we know that A is the first letter of Aardvark because Aardvark
sorts immediately below A. This algorithm gives us a number of nice
properties:
* It automatically drops accents, since accented letters sort the same
as unaccented letters (at the primary level). Same with case
differences, hiragana/katakana, etc.
* You can work out the initial Jamo of a Hangul syllable character by
just omitting the composed syllables from the "first letter" list.
Previously this was done with a special-case hack in
Language::firstChar().
* Vowel reordering in Thai and Lao is automatically supported.
So "แก" sorts under heading "ก" and "แข" sorts under heading "ข".
* The collation can be expanded to support all sorts of other crazy
features, and the first letter feature will keep working in a sane
way. For instance, you could have an English collation which removed
"the" from the start of a title.
I compiled a list of 14,742 suitable header characters, identified by
processing various Unicode data files. That list probably still needs
lots of tweaks.
There is a downside to this scheme. The default UCA table gives all
characters with a similar logical function to the digits 0-9 the same
primary sort order as the corresponding ASCII digits. So a page like
[[१९२०]] on the Bihari Wikipedia will sort under a heading of "1"
instead of "१". There may be other instances of accidental cultural
imperialism. However, this can be fixed by compiling
language-dependent lists of header characters.
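You can see this primary-level equivalence directly, assuming the intl
Collator is available (a demonstration, not part of the patch):

    $coll = new Collator( 'root' ); // UCA default table
    $coll->setStrength( Collator::PRIMARY );
    // Expected int(0): the Devanagari digits carry the same primary
    // weights as the ASCII digits 1920.
    var_dump( $coll->compare( '१९२०', '1920' ) );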
The UCA default table is not meant to sort any language correctly,
it's just a compromise collation. Support for language-specific
collations can easily be added. Whether we get language-specific
collations or not, I'd like to think about enabling this feature on
Wikimedia.
The most glaring omission from the UCA default tables is sensible
sorting of the unified Han.
In a Chinese context, there's an obvious way to sort characters, and
that's by their order in the KangXi dictionary. The Unihan database
gives such an ordering, and it's used within code blocks. But it's not
used between code blocks. So if you sort by code point, all the Han
characters that aren't in the U+4E00 to U+9FFF block will sort
incorrectly. That's what the default UCA does, with a few minor
exceptions.
In a Japanese context, the way to sort ideographic characters is to
convert them to phonetic hiragana and then to sort the resulting
string. I don't know if there is any free software for doing this. On
the Japanese Wikipedia, they achieve the same result by manually
setting the sort key of every page to be the hiragana version of the
title.
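For illustration (the exact keys are a matter of local editorial
convention), on a page like 東京 this is done with a sort key or the
DEFAULTSORT magic word:

    {{DEFAULTSORT:とうきょう}}

so the title files under its hiragana reading rather than its
ideographic form.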
There's lots of room here for other people to get involved, especially
if you know a language other than English.
-- Tim Starling
On Wed, Nov 3, 2010 at 3:10 AM, Tim Starling <tstarling(a)wikimedia.org> wrote:
> I don't think JSON support is particularly important since it can
> easily be simulated, and I don't think you should use the filter
> extension in MediaWiki, regardless of whether it is supported.
>
I agree about filter. Having native JSON support is a nicety, though;
it's faster than a userland implementation.
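The usual compatibility pattern looks something like this (a sketch;
Services_JSON is the userland encoder bundled with MediaWiki):

    if ( function_exists( 'json_encode' ) ) {
        // ext/json is available: use the fast native implementation
        $out = json_encode( $value );
    } else {
        // Fall back to the bundled pure-PHP encoder
        $json = new Services_JSON();
        $out = $json->encode( $value );
    }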
> However, I can think of a good argument for moving to PHP 5.2, which
> is to stop the high rate of bit rot in 5.1 support. In particular,
> support for callbacks with double-colons to indicate static method calls:
>
> call_user_func( 'Foo::bar' )
>
> was added in PHP 5.2.3. Developers often use these, and don't realise
> that they are breaking PHP 5.1 support. So I think there's a good
> argument for making 5.2.3 the minimum.
>
+1 here. The "a::b" syntax is fewer keystrokes than having to use an
array. It also lets us remove the stupid hack from r68760 [0] (and
there are probably similar things elsewhere in the code).
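For the record, the two forms side by side (the string form needs PHP
>= 5.2.3, while the array form works on 5.1 as well):

    call_user_func( 'Foo::bar' );            // string form, PHP 5.2.3+
    call_user_func( array( 'Foo', 'bar' ) ); // array form, 5.1-safe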
> Another example of bit rot: the trunk has 3 calls to
> array_fill_keys(), with no simulation in GlobalFunctions.php; it was
> added in 5.2.0. Developers should really check the versions in the
> manual when they use a function, otherwise 5.2.x will soon be broken
> as well, in favour of 5.3.x. But in theory we can weed out calls to
> newly-added functions with grep. The 5.2.3 callback change was more
> subtle.
>
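A GlobalFunctions.php-style shim would be easy enough (a sketch, not
the actual code; array_combine() and array_fill() both exist in 5.1):

    if ( !function_exists( 'array_fill_keys' ) ) {
        function array_fill_keys( $keys, $value ) {
            // Map every key in $keys to the same $value
            return array_combine( $keys,
                array_fill( 0, count( $keys ), $value ) );
        }
    }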
Other reasons 5.2 is cool:
- setcookie() allows httponly cookies (we conditionally support this)
- __toString() works properly
- Memory management improved
- Lots of other stuff here [1]
The consensus last time we brought this up (November) was fairly
strong that we can start phasing out 5.1 support. After talking again
on IRC with people today, I think we can safely break 5.1 in trunk
(although let's not backport it).
Once the 1.17 release is out, we should update [2] to indicate that
1.17 will be the last release with 5.1 support.
-Chad
[0] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/68760
[1] http://php.net/migration52
[2] http://www.mediawiki.org/wiki/Manual:Installation_requirements
Hi all
after some discussion, Wikimedia Germany decided not to hold a developer's
meet-up around the Chapter's conference in March. We just couldn't fit this in
nicely with the venue and the overall organization. Don't despair though:
This is what we will do instead:
* There will be a hackathon hosted by Wikimedia Germany in (late) May, probably
in Berlin, but that's not decided yet. This will mostly be about hacking, with a
strong focus on GLAM-related stuff. There will be little in the way of presentations.
* There will be the hacking days attached to Wikimania in Haifa, August 3/4.
I'm in charge of setting up the program for that, and I'll try to make it a nice
mix of discussing technology and actually hacking. I would also like to have a
get-together with techies and chapter folks at some point during Wikimania.
I hope that this way, we can give the hacking events the attention they deserve.
Let me know what you think.
-- daniel
Hello everyone, happy new year.
Following #26561 [1] and the MediaWiki security release 1.16.1 [2],
some cross-wiki userscripts of mine do not work anymore.
Namely, these scripts are:
- iKiwi [3], which is used to retrieve all interwikis of a local
article from another wiki and is extensively used by the French
interwikification project [4];
- xmsg [5], which is used to check for new messages on other wikis
when logging in (and which I'm probably the only person to use).
Both of them use a trick with an iframe to allow javascript requests
across the wikipedia.org subdomains (something that is not possible
using AJAX).
So, my questions are:
- Does anybody know whether having X-Frame-Options set to SAMEORIGIN
would allow such tricks while still preventing clickjacking attacks
from other domains (the actual question being: would it work?)
- If so, is there any reason to use DENY instead of SAMEORIGIN, i.e.
is there any pragmatic reason to forbid frames from the very same
domain (wikipedia.org)?
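For reference, this is just a response header, e.g. as emitted from
PHP (a sketch; note that SAMEORIGIN compares the full origin, so
en.wikipedia.org and fr.wikipedia.org would still count as different
origins):

    header( 'X-Frame-Options: DENY' );       // refuse all framing
    header( 'X-Frame-Options: SAMEORIGIN' ); // allow framing by the same origin only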
Any other idea on how to make such tools work again would of course be
highly appreciated.
Thanks in advance,
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=26561
[2] http://lists.wikimedia.org/pipermail/mediawiki-announce/2011-January/000093…
[3] http://en.wikipedia.org/wiki/User:Arkanosis/iKiwi.js
[4] http://fr.wikipedia.org/wiki/Projet:Interwikification
[5] http://fr.wikipedia.org/wiki/User:Arkanosis/xmsg.js
--
Jérémie
I'm the developer of the "RefToolbar" gadget on the English Wikipedia,
which adds additional functionality to the editing toolbar to add
references. The script itself is at
http://en.wikipedia.org/wiki/User:Mr.Z-man/refToolbar_2.0.js with some
more in the /base.js subpage imported near the top.
It adds an additional section to the toolbar with a dialog for common
reference templates.
I tried to integrate it as neatly as possible, though the lack of
documentation hasn't helped. Everything works fine, except when using
IE. In IE, it just inserts the ref in some random location on the page,
sometimes *near* where the cursor/highlight is, but sometimes it's not
even close. It seems to work differently with different compatibility
mode settings, but even that isn't consistent. I've looked at some of
the WikiEditor code and can't figure out why it isn't working, when the
other dialogs in the standard toolbar work just fine.
The relevant code, from the dialog object:

buttons: {
    'cite-form-submit': function() {
        // Build the <ref> wikitext from the dialog's form fields
        var ref = CiteTB.getRef( false, true );
        // Ask WikiEditor to insert it at the current selection
        $j.wikiEditor.modules.toolbar.fn.doAction(
            $j(this).data( 'context' ),
            {
                type: 'replace',
                options: {
                    pre: ref
                }
            },
            $j(this)
        );
        $j(this).dialog( 'close' );
    },
Any thoughts would be appreciated.
--
Alex (wikipedia:en:User:Mr.Z-man)
On 17 Jan 2011, at 17:41, wikitech-l-request(a)lists.wikimedia.org wrote:
> Message: 2
> Date: Mon, 17 Jan 2011 16:29:58 +0100
> From: Bryan Tong Minh <bryan.tongminh(a)gmail.com>
> Subject: Re: [Wikitech-l] From page history to sentence history
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
>
> On Mon, Jan 17, 2011 at 3:49 PM, Anthony <wikimail(a)inbox.org> wrote:
>> How would you define a particular sentence, paragraph or section of an
>> article? The difficulty of the solution lies in answering that
>> question.
>>
>
> Difficult, but doable. Jan-Paul's sentence-level editing tool is able
> to make the distinction. It would perhaps be possible to use that as a
> framework for sentence-level diffs.
>
>
> Bryan
>
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 17 Jan 2011 16:40:28 +0100
> From: Alex Brollo <alex.brollo(a)gmail.com>
> Subject: Re: [Wikitech-l] From page history to sentence history
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
>
> 2011/1/17 Bryan Tong Minh <bryan.tongminh(a)gmail.com>
>
>>
>> Difficult, but doable. Jan-Paul's sentence-level editing tool is able
>> to make the distinction. It would perhaps be possible to use that as a
>> framework for sentence-level diffs.
>>
>
> Difficult, but the diff between versions of a page already does it.
> Looking at diffs between pages, I had simply assumed that only the
> diff paragraphs were stored, so that the page was built from updated
> diff segments. I had no idea how this could be done, but it was all
> "magic"!
>
> Alex
>
>
> ------------------------------
>
> Message: 5
> Date: Mon, 17 Jan 2011 11:31:27 -0500
> From: Aryeh Gregor <Simetrical+wikilist(a)gmail.com>
> Subject: Re: [Wikitech-l] WYSIFTW status
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
>
> On Sun, Jan 16, 2011 at 7:16 PM, Magnus Manske
> <magnusmanske(a)googlemail.com> wrote:
>> There is the question of what browsers/versions to test for. Should I
>> invest large amounts of time optimising performance in Firefox 3, when
>> FF4 will probably be released before WYSIFTW, and everyone and their
>> cousin upgrades?
>
> Design for only the fastest browsers. Other browsers could always
> just be dropped back to the old-fashioned editor.
>
>
>
> ------------------------------
>
> Message: 7
> Date: Mon, 17 Jan 2011 11:44:28 -0500
> From: Aryeh Gregor <Simetrical+wikilist(a)gmail.com>
> Subject: Re: [Wikitech-l] June 8th 2011, World IPv6 Day
> To: Happy-melon <happy-melon(a)live.com>, Wikimedia developers
> <wikitech-l(a)lists.wikimedia.org>
>
> On Sun, Jan 16, 2011 at 7:12 PM, Happy-melon <happy-melon(a)live.com> wrote:
>> I don't entirely understand the point of this. The plan seems to be """get
>> a large enough fraction of 'the internet' to make a change which breaks for
>> some people all at the same time, so that those people get angry with the
>> ISPs that haven't got off their arses to fix said breakage, rather than
>> angry with the broken sites""", which is fair enough.
>
> No, the point is to test what happens if IPv6 is supported on a large
> scale. It's known from small-scale testing that this will break
> things for some small percentage of users, but no one's sure what the
> consequences are of switching this on fully for everyone.
>
>> But AFAICT, the
>> breakage won't occur if your connection can't 'do' IPv6, but only if your
>> connection can't 'do' both IPv4 *and* IPv6 on the same site at the same
>> time. Surely that's not actually the problem that we need to solve if we're
>> to be able to migrate smoothly onto IPv6? When the IPv4 addresses run out,
>> we need to be able to start setting up websites which are *only* v6, surely?
>
> There are many more clients in the world than servers, and servers
> have always been able to get dedicated IPv4 addresses much more easily
> than clients. A server Internet connection in America will typically
> come with as many IPv4 addresses as you need, while you usually can't
> get a dedicated residential IP address unless you pay extra. (And
> America has more IP addresses allocated per capita than anywhere else
> in the world, since it originally developed the Internet.)
>
> So as IPv4 addresses become scarcer, the pressure to use IPv6 only
> will fall mostly on residential users. Clients with only an IPv6
> address will only be able to get direct connections to IPv6-enabled
> servers. The way servers are supposed to do this is serve both A and
> AAAA records for the same domain, so IPv4 clients use the A record and
> IPv6 clients use the AAAA record.
>
> Unfortunately, someone at some point decided that if the client
> supports both IPv4 and IPv6, and the server publishes both A and AAAA
> records, the client should connect via IPv6. In practice, almost no
> sites use IPv6, so the infrastructure is much less well-tested.
> Clients that think they have IPv6 connections might actually have the
> connection eaten by a middlebox, or just be slower or less reliable.
> So sites don't turn on the AAAA records in practice because it
> degrades service for clients with IPv6 connections, which means the
> servers aren't accessible to IPv6-only clients without workarounds.
>
> IPv6 day is an attempt to see what happens if major sites publish AAAA
> records for a while. Stuff will break, but hopefully not too
> horribly, and it will give both site operators and ISPs the chance to
> analyze what's wrong with their IPv6 support and what they can do to
> fix it. This is a step toward major sites publishing AAAA records all
> the time, which is necessary to support IPv6-only clients.
>
> Something like that, anyway. I'm hardly an expert on these things.
>
>
>
> ------------------------------
>
> Message: 8
> Date: Mon, 17 Jan 2011 11:45:33 -0500
> From: Chad <innocentkiller(a)gmail.com>
> Subject: Re: [Wikitech-l] WMDE Developer Meetup moved to May
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
> Cc: toolserver-l(a)lists.wikimedia.org, MediaWiki announcements and site
> admin list <mediawiki-l(a)lists.wikimedia.org>
>
> On Mon, Jan 17, 2011 at 11:11 AM, Daniel Kinzler <daniel(a)brightbyte.de> wrote:
>> * There will be a hackathon hosted by Wikimedia Germany in (late) May, probably
>> in Berlin, but that's not decided yet. This will mostly about hacking, with a
>> strong focus on GLAM related stuff. There will be little in terms of presentations.
>>
>
> Late May? That's actually *really* awesome. Now I don't have
> to miss school to come :D
>
> -Chad
>
>
>
> ------------------------------
>
> Message: 9
> Date: Mon, 17 Jan 2011 11:47:35 -0500
> From: Aryeh Gregor <Simetrical+wikilist(a)gmail.com>
> Subject: Re: [Wikitech-l] From page history to sentence history
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
>
> On Mon, Jan 17, 2011 at 5:55 AM, Alex Brollo <alex.brollo(a)gmail.com> wrote:
>> Before I dig a little more into wiki mysteries, I was absolutely sure that
>> wiki articles were stored into small pieces (paragraphs?) so that a small
>> edit into a long long page would take exactly the same disk space as a
>> small edit into a short page. But I soon discovered that things are
>> different. :-)
>
> Wikimedia stores diffs using delta compression, so actually this is
> basically what happens. The size of the edit is what determines the
> size of the stored diff, not the size of the page. (I don't know how
> this works in detail, though.) IIRC, default MediaWiki doesn't work
> this way.
>
>
>
> ------------------------------
>
> Message: 10
> Date: Mon, 17 Jan 2011 12:41:22 -0500
> From: Anthony <wikimail(a)inbox.org>
> Subject: Re: [Wikitech-l] From page history to sentence history
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
>
> On Mon, Jan 17, 2011 at 10:40 AM, Alex Brollo <alex.brollo(a)gmail.com> wrote:
>> 2011/1/17 Bryan Tong Minh <bryan.tongminh(a)gmail.com>
>>
>>>
>>> Difficult, but doable. Jan-Paul's sentence-level editing tool is able
>>> to make the distinction. It would perhaps be possible to use that as a
>>> framework for sentence-level diffs.
>>>
>>
>> Difficult, but the diff between versions of a page already does it.
>> Looking at diffs between pages, I had simply assumed that only the
>> diff paragraphs were stored, so that the page was built from updated
>> diff segments. I had no idea how this could be done, but it was all
>> "magic"!
>
> Paragraphs are much easier to recognize than sentences, as wikitext
> has a paragraph delimiter - a blank line. To truly recognize
> sentences, you basically have to engage in natural language
> processing, though you can probably get it right 90% of the time
> without too much effort.
>
> And to recognize what's going on when a sentence changes *and* is
> moved from one paragraph to another, requires an even greater level of
> natural language understanding. Again though, you can probably get it
> right most of the time without too much effort.
>
> Wikitext actually makes it easier for the most part, as you can use
> tricks such as the fact that the periods in [[I.M. Someone]] don't
> represent sentence delimiters, since they are contained in square
> brackets. But not all periods which occur in the middle of a sentence
> are contained in square brackets, and not all sentences end with a
> period.
>
> I'd say "difficult but doable" is quite accurate, although with the
> caveat that even the state of the art tools available today are
> probably going to make mistakes that would be obvious to a human. I'm
> sure there are tools for this, and there are probably some decent ones
> that are open source. But it's not as simple as just adding an index.
>
>
>
"""On 8 June, 2011, Google, Facebook, Yahoo!, Akamai and Limelight
Networks will be amongst some of the major organisations that will offer
their content over IPv6 for a 24-hour "test drive". The goal of the Test
Drive Day is to motivate organizations across the industry – Internet
service providers, hardware makers, operating system vendors and web
companies – to prepare their services for IPv6 to ensure a successful
transition as IPv4 addresses run out. """
See http://isoc.org/wp/worldipv6day/ .
Shouldn't Wikimedia participate in this event? What needs to be done to
make this possible?
Maarten
On 17.01.2011 17:14, Asaf Bartov wrote:
> Correction: Haifa Hacking Days are to be held August 2nd-3rd.
> Wikimania itself will be Aug 4th-6th.
Gah! Thanks Asaf.
There I went and looked it up, and then wrote the wrong thing into the email.
Curses.
-- daniel
Within one of our older internal extensions, we have these 2 lines:
$targetAsImage = Image::newFromTitle($onePage);
$allPagesLinkedToTarget = $targetAsImage->getLinksTo();
We were trying to get a list of wiki titles that link to an image. This
does not seem to work anymore with MediaWiki 1.15. Now that the Image
class is truly deprecated, what should we replace this with?
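One possible replacement (a sketch, untested against 1.15, assuming
$onePage is a Title object as Image::newFromTitle suggests) is to query
the imagelinks table directly:

    // Pages embedding the image are recorded in imagelinks
    $dbr = wfGetDB( DB_SLAVE );
    $res = $dbr->select(
        'imagelinks',
        'il_from',
        array( 'il_to' => $onePage->getDBkey() ),
        __METHOD__
    );
    $allPagesLinkedToTarget = array();
    foreach ( $res as $row ) {
        $allPagesLinkedToTarget[] = Title::newFromID( $row->il_from );
    }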
We are using the following:
MediaWiki 1.15.3
PHP 5.2.8 (cgi-fcgi)
MySQL 5.1.40-community
Thanks,
Mary Beebe