Hmm, I don't think this thread is a good place for fighting language
wars. Your POV is that PHP is a bad language; my POV is that PHP
offers a reasonable trade-off between performance, standards support,
cost, reliability, complexity and so on. However, I did some research
and it looks like the first prize in the category "Best support of
Unicode in regular expressions" goes to Perl (Perl is cited as an
example many times by Unicode.org). Unfortunately PHP clearly sucks at
this at the moment (even with the mbstring extension). Perhaps version
6 will change that. So it might make sense to rewrite the standalone
component of a new diff engine in Perl.
Also, it looks like some people don't understand the punctuation
issue. In Unicode, *standard* punctuation marks can have codes below
0xc0 as well as *above*. If you look at the code written by Tim
Starling you'll see:
// Punctuation and control characters
if (ch < 0xc0) return false;
So basically the code above assumes that punctuation marks can only
have codes below 0xc0, which is incorrect. On the other hand, if you
type in MS Word a left single quotation mark (U+2018), then a sequence
of letters, then a right single quotation mark (U+2019), only the
sequence of letters will be spell checked. Which is nice and shows
that the MS Word developers respect the Unicode standard at least. In
other words, Word sees the difference between *all* Unicode
punctuation marks and all Unicode letters.
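To make the difference concrete, here is a minimal sketch in Python.
It is not the actual diff engine code: the function names are made up
for illustration, and threshold_is_word_char is only my simplified
stand-in for a check based on the 0xc0 threshold. The other classifier
relies on the Unicode general category reported by Python's standard
unicodedata module.

```python
import unicodedata


def is_punctuation(ch: str) -> bool:
    # Unicode general categories for punctuation all start with "P"
    # (Pc, Pd, Ps, Pe, Pi, Pf, Po), regardless of the code point value.
    return unicodedata.category(ch).startswith("P")


def threshold_is_word_char(ch: str) -> bool:
    # Hypothetical stand-in for the threshold approach: ASCII letters and
    # digits are word characters, and so is everything at or above 0xC0.
    return (ch.isascii() and ch.isalnum()) or ord(ch) >= 0xC0


def category_is_word_char(ch: str) -> bool:
    # Treat letters (L*), combining marks (M*) and numbers (N*) as word
    # characters; punctuation, symbols and separators are not.
    return unicodedata.category(ch)[0] in ("L", "M", "N")


def split_words(text: str, is_word_char) -> list:
    # Replace every non-word character with a space, then split.
    return "".join(ch if is_word_char(ch) else " " for ch in text).split()


quoted = "\u2018hello\u2019 world"  # 'hello' wrapped in U+2018 / U+2019

# The threshold check glues the quotation marks onto the word.
print(split_words(quoted, threshold_is_word_char))
# The category check separates them from the letters.
print(split_words(quoted, category_is_word_char))
```

With the threshold check the first token still carries both quotation
marks; with the category check the tokens are just "hello" and
"world", which is the behaviour described for Word above.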
But you won't be able to repeat the same trick with MediaWiki. The
current diff engine considers all punctuation marks with codes above
0xc0 to be letters and makes them part of a word. Tim Starling in his
defence says that high-numbered punctuation is rare and the fact that
it is processed incorrectly won't do much damage. Well, to a certain
extent it's a good defence, but if you accept it then you should also
accept statements like "Opera is a rarely used browser, so if
Wikipedia renders incorrectly in Opera it wouldn't do much damage" or
"Supporting just IE and FF is sufficient". BTW I noticed a few
glitches in how Wikipedia is displayed in Opera.
Probably I've drunk too much open source Kool-Aid, but here is a good
example of a proprietary product (manufactured by the so-much-hated
Microsoft) obeying standards and of open source software that supports
standards only selectively.
Someone suggested that I fix it. Well, I'm afraid I'm more on the
bug-creating side of things :)
In fact, I was expecting that "Unicode Nazis" would rush to fix it.
Instead all I got were "who cares" type responses. I guess I
should add more water to my Kool-Aid next time …
Also, a small suggestion to all new participants in this thread:
please state whether or not you like the feature in question (you can
find a description in the original e-mail).
On 10/06/06, Timwi <timwi(a)gmx.net> wrote:
Roman Nosov wrote:
I totally agree with Timwi – proper Unicode support is a requirement,
not a feature. However, can someone tell me why PHP comes with no
appropriate out-of-the-box support for such a vital feature in the
21st century?
Well, you see, that (the fact that PHP misses out on the most basic
vital features, not just Unicode specifically) is kind of why no
sensible 21st-century programmer would ever recommend PHP to start
writing something new, and why the only people who choose or even
recommend PHP are amateurs. Now obviously we are "stuck" with MediaWiki
written in PHP, so we have to use it and live with its severe
shortcomings...
For example look no further than Wikipedia's current diff engine.
You have mentioned some properties of the current diff engine, but I'm
afraid I don't see how any of them are in any way a problem or an issue.
Timwi
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/wikitech-l