Um, I asked earlier but don't know what became of it. Sometimes I
want to quickly eyeball an HTML document for its text content (to test CGI
script output when run from the command line, for instance). I like
to have lynx around to do that because it saves me having to wade
through loads of HTML tags. Could we have it installed, please?
Happy New Year to everybody.
I've resisted asking on this list since there seemed to be a system
[1] for gaining accounts, but since it's been some time and I've not
heard whether I would be accepted or not, I'm asking now. :)
Are accounts still available, and if so, what qualifications should I
present? I've been doing some work, but so far, it's not useful. ;-)
[1] http://meta.wikimedia.org/wiki/Toolserver#People_interested_to_join
By any chance, are there any plans to give Subversion hosting to people
with Toolserver accounts? That would assist in project development,
releasing and collaboration.
--
Edward Z. Yang Personal: edwardzyang(a)thewritingpot.com
SN:Ambush Commander Website: http://www.thewritingpot.com/
GPGKey:0x869C48DA http://www.thewritingpot.com/gpgpubkey.asc
3FA8 E9A9 7385 B691 A6FC B3CB A933 BE7D 869C 48DA
hi.
people already familiar with SQL's transaction isolation levels and MVCC
databases can ignore this post. anyone who isn't may want to read it.
the problem:
a lot of the queries used on zedler are report queries which examine a lot
of records (possibly every page, link or revision) to generate the result.
not only do these take a long time to run, but they slow down the SQL
server for other uses, since InnoDB queries are transactional, and the
server must ensure that changes to the database don't affect the query.
in most cases -- like on the production site -- this is what you want, and
the performance penalty is acceptable. however, for other uses, you may be
more interested in the query completing quickly than having entirely
consistent results (especially when your result is going to be several
hours out of date anyway).
for this purpose, MySQL provides a command, SET TRANSACTION ISOLATION LEVEL.
this allows you to control (for the current connection only) how consistent
queries will be. this and other related InnoDB issues are explained in
detail at:
http://dev.mysql.com/doc/refman/5.0/en/innodb-transaction-model.html
(section 14.2.10.3 in particular) but i'll provide a brief summary here.
the default isolation level is "repeatable read". this means that (mostly)
everything you do will be consistent relative to the first query you run in
each transaction. if the table changes, your transaction will not see the
changes.
the least consistent isolation level is "read uncommitted". this does not
perform any locking or multi-versioning for the query. the downside of
this is that you may see dirty data (rows changed by transactions which
have not committed yet, and which may still be rolled back) in the result,
so the result is not guaranteed to be consistent. however, the advantage
is that the query can run much faster.
you can change the current isolation level with the command:
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
if you're running slow queries on the database, you may like to test this
and see how much difference it makes.
you may also want to review the manual sections linked above for more
details.
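as a rough illustration of the above, here is a minimal Python sketch which
sets the isolation level on a connection before running a report query. it
assumes "conn" is a DB-API 2.0 connection to MySQL; the function name
run_report and the example query are hypothetical, and only the SET SESSION
statement is taken from this post.

```python
# Sketch only: "conn" is assumed to be a DB-API 2.0 connection to MySQL
# (the choice of driver is left open and is not part of the original post).

ISOLATION_SQL = "SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED"


def run_report(conn, report_sql, params=()):
    """Run one report query with relaxed isolation on this connection.

    The isolation level only affects the connection it is set on, so it
    has to be issued on the same connection as the report query.
    """
    cur = conn.cursor()
    try:
        # Relax consistency first, then run the (possibly slow) report.
        cur.execute(ISOLATION_SQL)
        if params:
            cur.execute(report_sql, params)
        else:
            cur.execute(report_sql)
        return cur.fetchall()
    finally:
        cur.close()


# Hypothetical usage, assuming a MediaWiki-style "page" table and a driver
# that uses %s placeholders:
#   rows = run_report(conn,
#                     "SELECT COUNT(*) FROM page WHERE page_namespace = %s",
#                     (0,))
```

since the setting lasts for the rest of the session, a connection which is
later reused for something needing consistent reads should set the level
back again.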
k.
users on zedler are being moved to a different zone than mysql. the
user-visible effects of this are as follows:
* you can no longer log in to zedler.knams.wikimedia.org. instead, log in
to login-services.zedler.knams.wikimedia.org (you may find
"tools.wikimedia.de" easier to type).
* the IP for zedler will be changing to:
2001:610:672:1:145:97:39:142 (IPv6)
145.97.39.142 (IPv4)
this will probably not affect you unless you have a separate domain hosted
on zedler, in which case you will need to change the IP(s).
* MySQL can no longer be accessed at "localhost". instead, use "sql" as the
host.
* home directories are now in /home. compatibility links in /u01/u exist.
there may be some residual problems resulting from this change. please
inform me of anything which does not work as expected.
please note that DNS for the new hostname may take up to an hour to update.
you can use the IP if you need to log in before then.
k.
This particular rule piqued my interest:
[[meta:Toolserver/Rules]] wrote:
> The installation of large web applications (phpmyadmin, mediawiki,
> ...) is not allowed.
In this context, what do "installation" and "large" mean? It would
probably be okay to store the MediaWiki codebase in order to
piggyback off its parsing functions or use Revision::getRevisionText (as
noted in [[meta:Toolserver/For users]])... so what's the catch?
--
Edward Z. Yang Personal: edwardzyang(a)thewritingpot.com
SN:Ambush Commander Website: http://www.thewritingpot.com/
GPGKey:0x869C48DA http://www.thewritingpot.com/gpgpubkey.asc
3FA8 E9A9 7385 B691 A6FC B3CB A933 BE7D 869C 48DA
My vandalism analysis tool, which uses a simple but powerful
methodology developed by Brian0918, analyses edit summaries on
articles to spot probable vandalism reverts by recognising the summary
patterns of standard rollbacks, and edits labelled "rvv", "rv v" or
"rvc". It was developed for English Wikipedia but probably has
applications beyond that, and the methods developed here have obvious
utility beyond the recognition and reporting of vandalism.
You can visit it here:
http://tools.wikimedia.de/~tony_sidaway/
Please try to break it, and tell me what happened. There is a link to
a discussion page for that purpose.
The rationale is that, while vandalism is difficult to recognise
electronically, a pretty easy and reasonably reliable way to track
vandalism on a popular wiki article is to examine edit summaries and
count the proportion of them that indicate that the editors apparently
believed themselves to be reverting vandalism.
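To make the idea concrete, here is a minimal Python sketch of the
summary-matching approach. It is not the code of the tool itself: the
function names are hypothetical, the "rvv", "rv v" and "rvc" labels are
the ones mentioned above, and the rollback wording in the second pattern
is an assumption about the standard English rollback summary.

```python
import re

# Patterns that mark an edit summary as a probable vandalism revert.
REVERT_PATTERNS = [
    # "rvv", "rv v" or "rvc" as a separate word, in any letter case.
    re.compile(r"\brv\s?[vc]\b", re.IGNORECASE),
    # Assumed wording of the standard rollback summary.
    re.compile(r"^Reverted edits by .+ to last version by ", re.IGNORECASE),
]


def is_vandalism_revert(summary):
    """Return True if the summary matches any known revert pattern."""
    return any(p.search(summary) for p in REVERT_PATTERNS)


def revert_proportion(summaries):
    """Return the fraction of summaries that look like vandalism reverts."""
    if not summaries:
        return 0.0
    hits = sum(1 for s in summaries if is_vandalism_revert(s))
    return hits / len(summaries)
```

Given the edit summaries of an article's history as a list of strings,
revert_proportion returns a number between 0 and 1. Adapting the sketch to
another language would mean replacing the patterns in REVERT_PATTERNS.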
A highly experimental adaptation of this script to recognise (only)
rollbacks on the German Wikipedia is here:
http://tools.wikimedia.de/~tony_sidaway/cgi-bin/vandalismus.cgi
The text of the latter CGI script is currently in English, although it
is analyzing German text. As I know nothing about
internationalization I have no idea whether it will always perform
correctly if UTF-8 multibyte characters (such as o-umlaut) are
entered.
This simple test seems to suggest that it does work:
http://tools.wikimedia.de/~tony_sidaway/cgi-bin/vandalismus.cgi?article=Köln
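For anyone who wants to test other titles, the following Python sketch
shows how a title containing multibyte characters is normally
percent-encoded before it is placed in a query string like the one above.
The helper name article_url is hypothetical, and the sketch says nothing
about how the CGI script itself decodes its input.

```python
from urllib.parse import quote, unquote

BASE = "http://tools.wikimedia.de/~tony_sidaway/cgi-bin/vandalismus.cgi"


def article_url(title, base=BASE):
    """Build a test URL with the title percent-encoded as UTF-8."""
    # quote() encodes the text as UTF-8 and escapes every byte that is
    # not safe in a URL; safe="" also escapes "/" inside titles.
    return base + "?article=" + quote(title, safe="")


encoded = quote("Köln", safe="")
print(encoded)           # K%C3%B6ln
print(unquote(encoded))  # Köln
```

The o-umlaut becomes two escaped bytes (%C3%B6), so a script that treats
its input as single-byte text may see two characters where one was typed.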
Wikipedia is an international project and I welcome any and all
testing input on this.
Presently I don't know of any edit summary patterns that
non-administrators on the German Wikipedia use to indicate that
they're reverting what we on English Wikipedia would recognise as
simple vandalism. As I'm unfamiliar with their practices, I'm not even
certain that they draw the same distinction that we do on English
Wikipedia between intentional and overt disruptive edits (simple
vandalism) and more subtle vandalism or trolling.
Any help on this that German speakers can offer would be most welcome.
Although I address the German Wikipedia prominently because its
community is highly advanced and well organized, its content
comparable to that of the English Wikipedia, and (not least) Deutsche
Wikipedia hosts the tool server, I would also love to produce useful
tools for as many languages as possible--the skills I learn can be put
to use in tools of more general use than the current one. The scripts
I write can easily be internationalized. I cannot write good German
(whenever I try, native German speakers beg me to stop!) but I can
write good French and reasonable Spanish. I am particularly
interested in Chinese, Indian languages, and Russian.
i am planning to upgrade Zedler to Solaris 10 Update 1 at some point,
probably tonight/tomorrow morning (UTC).
assuming everything goes as planned, there should be no disruption during
the upgrade, but a reboot will be required when it's finished.
apologies for any inconvenience.
k.
The program presented here is simple and should be easy to translate
into Perl, C or any other reasonable computer language.
A basic knowledge of Lisp list processing operators (car, cdr, etc.) is
all that is required to understand this program, which is written in
the Guile dialect of Scheme.
The program analyzes the revision table and uses vandalism reverts as
a proxy for vandalism. This relies on the assumption that vandalism
is quickly detected and corrected by reverting to an earlier version,
and also assumes that administrators do not abuse the rollback
facility to perform non-vandalism reverts. It further assumes that
editors do not incorrectly label edit warring as vandalism. These
assumptions are broadly valid for the article chosen, [[en:George W.
Bush]], but may not be true for all popular articles.
The program is configured to work in English, but it may be possible
to apply the same methods in other languages by changing the match
patterns for vandalism reverts.
Multibyte characters may break this program. Sorry, it isn't my area
of expertise.
http://en.wikipedia.org/wiki/User:Tony_Sidaway/Dubya_vandalism
hi.
i've noticed some users seem to be unaware of either the rules specific to
Zedler or general PHP security issues.
please be aware that:
* you must not install third-party web applications on Zedler. this
includes putting mediawiki source code in your public_html, even if
you don't configure it. this also includes phpmyadmin. this also
includes applications protected by passwords or other access
restrictions. there are no exceptions to this. (if you believe you
have a very good reason to do this, ask me first.)
if you must use such code (e.g. the MediaWiki source), put it somewhere
outside public_html, and keep it up to date. DO NOT provide access to
it via HTTP. the only valid reasons for installing MediaWiki are to run
maintenance scripts from the command line, or to use MW libraries in
your own applications.
this is extremely important. i will start disabling applications
which do not conform to this rule.
* do not place sensitive information (such as passwords) in
world-readable files. since CGI scripts, including PHP, run as your
uid, there is no need to do this.
* when you use data from $_GET, $_POST, etc. in SQL queries, you MUST
escape it. please familiarise yourself with this function:
http://uk.php.net/mysql_real_escape_string
* when you print user-supplied data in HTML, you must also escape it:
http://uk.php.net/manual/en/function.htmlspecialchars.php
neither of the last two is specific to PHP, but for some reason PHP code
seems to be a lot worse, on average.
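since the last two points apply to any language, here is a minimal Python
sketch of both. it is an illustration only: it uses the standard sqlite3
module as a stand-in for MySQL, and the function names and the "page"
table layout are assumptions made for the example.

```python
import html
import sqlite3


def find_pages(conn, title):
    """Look up a user-supplied title without building SQL by hand."""
    # The "?" placeholder makes the driver pass the value separately
    # from the SQL text, so quotes in "title" cannot change the query.
    cur = conn.execute(
        "SELECT page_title FROM page WHERE page_title = ?", (title,)
    )
    return [row[0] for row in cur.fetchall()]


def render_result(title, matches):
    """Build an HTML fragment that echoes the user-supplied title."""
    # html.escape() turns <, >, & and quotes into entities, so the
    # title is shown as text instead of being interpreted as markup.
    safe_title = html.escape(title)
    return "<p>%d result(s) for %s</p>" % (len(matches), safe_title)
```

the same pattern applies in PHP: escape or bind every value which goes
into a query, and escape every value which goes into the page.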
if you have not already done so, please ensure you are familiar with the
rules for Zedler users:
http://meta.wikimedia.org/wiki/Toolserver/Rules
k.