I've co-authored two extensions that need an additional hook and
additional parameters to two existing hooks. I am requesting commit
access so that I may upload the extensions and amend the hooks.
The first is a spellchecker that I posted here a few months ago:
http://code.google.com/p/wikimediaspellchecker/
along with some contributed improvements that I've received.
I understand (based on reading the archives) that this won't be
included on wikipedia but it is useful for people who have their own
mediawiki installations, at least until Firefox 2.0 or IE7 become standard
everywhere. Some users of our wiki prefer this spellcheck extension
over the browser spell checkers and google spell check because of its
UI and ability to have separate custom dictionaries per wiki.
Ideally I'd like to commit this to the extensions directory so that it
can be more easily improved by interested persons. If that isn't
possible I'd like to at least commit a hook addition so that users of
this extension don't have to change the mediawiki source every time
they download a new version of mediawiki.
The second is a "track changes" / "svn blame" view that shows who is
responsible for what portions of an encyclopedia entry.
Here are some screenshots:
http://people.virginia.edu/~kjl3d/newlinks.png
http://people.virginia.edu/~kjl3d/trackchangesview.png
This is not ready for wikipedia yet but is, again, useful for smaller
local instances of mediawiki. Ideally I'd like to get this checked in
to the extensions directory so that parties interested in improving it
can do so. I'll be updating and improving it for a while but it is
usable (and being used locally) as is. I'd also like to commit some
additional parameters to a few hooks that this extension uses for the
reasons outlined above (users don't have to change the mediawiki
source every time they update their mediawiki version).
Therein lies my request for commit access.
I'm about to try to implement a view similar to the one I've already
posted but instead of displaying the ownership of raw text, it will
display ownership of rendered text. The raw-text ownership extension
already provides an association between raw wiki text and its owner. I
am looking for advice on how to proceed. I have a couple of ideas.
The first is to modify the association as the parser does its thing,
performing the same substitutions as the parser. The difficulty is that
the parser would have to relay precisely what text it has replaced and I
am not sure how this would work. I have considered using offsets from the
beginning of the text block, but this seems difficult, problematic,
and finicky.
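For what it's worth, here is a minimal sketch of the offset bookkeeping for a single substitution. It is not how the MediaWiki parser works: `substitute_with_owners` is a hypothetical helper, ownership is assumed to be stored as one author name per character, and the policy that replacement text inherits the owner of the first character it replaces is an arbitrary assumption.

```python
import re

def substitute_with_owners(text, owners, pattern, repl):
    """Apply one regex substitution while carrying per-character owners along.

    owners[i] is the author of text[i]. Replacement text inherits the owner
    of the first character it replaces (an assumed policy, not a given).
    """
    assert len(text) == len(owners)
    out_text, out_owners = [], []
    pos = 0
    for m in re.finditer(pattern, text):
        if m.start() == m.end():
            continue  # skip empty matches; they have no owner to inherit
        # Copy the untouched stretch before the match verbatim.
        out_text.append(text[pos:m.start()])
        out_owners.extend(owners[pos:m.start()])
        # Emit the replacement and attribute it to the match's first owner.
        new = m.expand(repl)
        out_text.append(new)
        out_owners.extend([owners[m.start()]] * len(new))
        pos = m.end()
    # Copy whatever follows the last match.
    out_text.append(text[pos:])
    out_owners.extend(owners[pos:])
    return "".join(out_text), out_owners

# Toy input: alice wrote the bold markup, bob wrote the rest.
raw = "'''Hi''' there"
raw_owners = ["alice"] * 8 + ["bob"] * 6
html, html_owners = substitute_with_owners(
    raw, raw_owners, r"'''(.+?)'''", r"<b>\1</b>"
)
```

Every parser pass would have to go through a wrapper like this, which is where the approach gets finicky.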
The second is to construct, in addition to the association between raw
wiki text and authorship, an association between parsed text and
ownership. In this case the word-level (or char-level, or whatever)
diff engine would compare parsed texts, not raw texts, and a
post-processor could modify the parsed text (just ignoring HTML tags) to
mark up who authored it. I am not at this time concerned with the
performance of the diff engine.
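A rough sketch of this second idea, assuming the rendered HTML of every revision is available, oldest first. It uses Python's `difflib.SequenceMatcher` as a stand-in for the diff engine; `tokenize`, `attribute`, and `words_only` are hypothetical names, and the tokenizer is deliberately naive.

```python
import re
from difflib import SequenceMatcher

# A token is either a whole HTML tag or a run of non-space, non-tag text.
TAG_OR_WORD = re.compile(r"<[^>]+>|[^<\s]+")

def tokenize(html):
    return TAG_OR_WORD.findall(html)

def attribute(revisions):
    """revisions: list of (author, rendered_html), oldest first.

    Returns (token, author) pairs for the newest revision. Tokens that
    survive unchanged from the previous revision keep their old author.
    """
    tokens, owners = [], []
    for author, html in revisions:
        new_tokens = tokenize(html)
        new_owners = [author] * len(new_tokens)
        matcher = SequenceMatcher(a=tokens, b=new_tokens, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                new_owners[j1:j2] = owners[i1:i2]
        tokens, owners = new_tokens, new_owners
    return list(zip(tokens, owners))

def words_only(attributed):
    """Drop HTML tags so that only visible words are marked up."""
    return [(tok, who) for tok, who in attributed if not tok.startswith("<")]

# Toy history: bob inserts two words into alice's sentence.
history = [
    ("alice", "<p>The cat sat</p>"),
    ("bob", "<p>The black cat sat down</p>"),
]
result = words_only(attribute(history))
```

In this toy history "black" and "down" end up attributed to bob and the remaining words to alice.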
Ideas are welcome.
I just wrote a new extension, see here:
http://www.mediawiki.org/wiki/Extension:WhatsDown
I'll be happy to get feedback from the list about this extension.
It seems to me that it's very useful (although I don't think that it'll
replace HP Open View :))
--Yedidia
FYI. Having this code integrated into mediawiki or wikiadmin tools may
be useful for tracking vandals or establishing forensic evidence for
folks who are misusing wikimedia sites.
Jeff
An automated run of parserTests.php showed the following failures:
Reading tests from "/home/brion/src/wiki/phase3/maintenance/parserTests.txt"...
Running test TODO: Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html)... FAILED!
Running test TODO: Link containing double-single-quotes '' (bug 4598)... FAILED!
Running test TODO: message transform: <noinclude> in transcluded template (bug 4926)... FAILED!
Running test TODO: message transform: <onlyinclude> in transcluded template (bug 4926)... FAILED!
Running test BUG 1887, part 2: A <math> with a thumbnail- math enabled... FAILED!
Running test TODO: HTML bullet list, unclosed tags (bug 5497)... FAILED!
Running test TODO: HTML ordered list, unclosed tags (bug 5497)... FAILED!
Running test TODO: HTML nested bullet list, open tags (bug 5497)... FAILED!
Running test TODO: HTML nested ordered list, open tags (bug 5497)... FAILED!
Running test TODO: Parsing optional HTML elements (Bug 6171)... FAILED!
Running test TODO: Inline HTML vs wiki block nesting... FAILED!
Running test TODO: Mixing markup for italics and bold... FAILED!
Running test TODO: 5 quotes, code coverage +1 line... FAILED!
Running test TODO: dt/dd/dl test... FAILED!
Running test TODO: Images with the "|" character in the comment... FAILED!
Running test TODO: Parents of subpages, two levels up, without trailing slash or name.... FAILED!
Running test TODO: Parents of subpages, two levels up, with lots of extra trailing slashes.... FAILED!
Running test TODO: Don't fall for the self-closing div... FAILED!
Running test TODO: Always escape literal '>' in output, not just after '<'... FAILED!
Reading tests from "/home/brion/src/wiki/phase3/extensions/Cite/citeParserTests.txt"...
Reading tests from "/home/brion/src/wiki/phase3/extensions/Poem/poemParserTests.txt"...
6 new PASSING test(s) :)
* <poem>
* <poem> with recursive parsing
* <poem> with leading whitespace
* Horizontal rule
* nested <poem><nowiki>
* nested <poem><nowiki> with formatting
Passed 458 of 477 tests (96.02%)... FAILED!
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I'm finally going ahead and adding a user_editcount field to the user table.
This is primarily meant to be used in heuristic checks such as:
* ability to set an edit count trigger for 'autoconfirmed' user group
* SUL migration primary account determination
As of r18325 the field is simply added, and incremented on edit (with
lazy initialization). It's not yet used to read anything, but I wanted
to slip it in there before we do database updates for other recently
added fields (recentchanges rc_old_len/rc_new_len fields) to avoid
having to do an extra master switch on the live machines.
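The "incremented on edit (with lazy initialization)" behaviour could look roughly like the sketch below. It is an illustration only, not the code from r18325: it uses an in-memory SQLite database, the `revision` and `rev_user` names and the `increment_edit_count` helper are assumptions, and it assumes the helper runs after the new revision row has been saved.

```python
import sqlite3

def increment_edit_count(db, user_id):
    """Bump user_editcount, computing it from revision history on first use."""
    (count,) = db.execute(
        "SELECT user_editcount FROM user WHERE user_id = ?", (user_id,)
    ).fetchone()
    if count is None:
        # Lazy initialization: the field was never filled in, so count the
        # user's revisions (which already include the edit just saved).
        (count,) = db.execute(
            "SELECT COUNT(*) FROM revision WHERE rev_user = ?", (user_id,)
        ).fetchone()
        db.execute(
            "UPDATE user SET user_editcount = ? WHERE user_id = ?",
            (count, user_id),
        )
    else:
        db.execute(
            "UPDATE user SET user_editcount = user_editcount + 1 "
            "WHERE user_id = ?",
            (user_id,),
        )

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_editcount INTEGER);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user INTEGER);
INSERT INTO user (user_id) VALUES (1);
INSERT INTO revision (rev_user) VALUES (1), (1), (1);
""")
increment_edit_count(db, 1)  # first call initializes from existing revisions
db.execute("INSERT INTO revision (rev_user) VALUES (1)")
increment_edit_count(db, 1)  # later calls simply add one
edit_count = db.execute(
    "SELECT user_editcount FROM user WHERE user_id = 1"
).fetchone()[0]
```

The point of the lazy path is that no bulk backfill is needed when the column is added; each account pays the counting cost once, on its next edit.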
Since people looooove their edit counts and there's toolserver crud for
it anyway, we may give in and allow this value to be displayed
somewhere. Sigh. :P
- -- brion vibber (brion @ pobox.com / brion @ wikimedia.org)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFFgVEjwRnhpk1wk44RAuuhAKCyBajEKSUD+tGAVCbs79Fuv+yC1QCgp8D8
wdN8/Y/ts34tLByWLNw4MB4=
=WKLR
-----END PGP SIGNATURE-----
Hello!
You are receiving this email because your project has been selected to
take part in a new effort by the PHP QA Team to make sure that your
project still works with PHP versions to-be-released. With this we
hope to make sure that you are either aware of things that might
break, or to make sure we don't introduce any strange regressions.
With this effort we hope to build a better relation between the PHP
Team and the major projects.
If you do not want to receive these heads-up emails, please reply to
me personally and I will remove you from the list; but, we hope that
you want to actively help us make PHP a better and more stable tool.
The first release candidate of PHP 5.2.1 was released today; it can
be downloaded from http://downloads.php.net/ilia/. The focus of this
release has been primarily stabilization, with many bug fixes and very
few features being added, so a short release cycle is anticipated. If
you discover any problems (we hope not) please notify PHP's QA team at
"php-qa(a)lists.php.net".
In case you think that other projects should also receive these kinds
of emails, please let me know privately, and I will add them to the
list of projects to contact.
Best Regards,
Ilia Alshanetsky
5.2 Release Master
Okay, I've got some better numbers (and a better method of building the
category intersections table from an existing database).
I built a table with 2.6 million records, all the pages from the snapshot I
had. It's 90% populated with categories (this will make more sense in a
minute) and I'm still getting good results from intersections queries. For
instance:
Showing rows 0 - 1 (2 total, Query took 0.3459 sec)
SQL query:
SELECT *
FROM `pageindex`
WHERE MATCH (catlist)
AGAINST ('+Living_People +People_from_Maine +1956_births' IN BOOLEAN MODE)
LIMIT 0 , 30
(which has two results, btw, Cynthia McFadden and David Kelley).
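To spell out what the query asks for: every term prefixed with `+` must be present in a page's category list. The sketch below mimics only that requirement in plain Python; it says nothing about how MySQL tokenizes or indexes the column. The helper names are hypothetical and the `pages` dictionary is toy data (the third entry is invented to show a non-match).

```python
def required_terms(query):
    """Collect the terms marked as mandatory with a leading '+'."""
    return {term[1:] for term in query.split() if term.startswith("+")}

def intersect(pages, query):
    """Return titles whose space-separated catlist contains every required term."""
    need = required_terms(query)
    return sorted(
        title for title, catlist in pages.items()
        if need <= set(catlist.split())
    )

# Toy stand-in for the pageindex table: title -> catlist.
pages = {
    "Cynthia McFadden": "Living_People People_from_Maine 1956_births",
    "David Kelley": "1956_births Living_People People_from_Maine",
    "Hypothetical Person": "Living_People 1956_births",  # invented non-match
}
hits = intersect(pages, "+Living_People +People_from_Maine +1956_births")
```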
This takes consistently a third of a second. Not bad. And that's for
2.6 million rows. I think we should restrict it to just current
articles in the main namespace, but that's my opinion.
I figured out a better way to populate the table, too:
UPDATE pageindex, categorylinks
SET pageindex.catlist = CONCAT_WS(' ', pageindex.catlist, categorylinks.cl_to)
WHERE pageindex.pageid = categorylinks.cl_from
AND INSTR(pageindex.catlist, categorylinks.cl_to) = 0
You have to run this query a number of times equal to the greatest
number of distinct categories a page has. Most pages have fewer than 7, but
a few apparently have outrageous numbers. After doing a lot of searching
around for a solution to concatenating multiple rows into one value, I think
this is pretty good.
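One possible alternative, where the database offers a string-aggregating function (MySQL has GROUP_CONCAT, SQLite has group_concat), is to fold all of a page's rows into one value in a single pass. The sketch below uses Python's built-in sqlite3 with toy rows so that it runs anywhere; the MySQL syntax differs (for instance `GROUP_CONCAT(cl_to SEPARATOR ' ')`), and MySQL caps the result length via group_concat_max_len, so treat this as an idea to test rather than a drop-in replacement.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE pageindex (pageid INTEGER PRIMARY KEY, catlist TEXT);
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
INSERT INTO pageindex (pageid) VALUES (1), (2);
INSERT INTO categorylinks VALUES
    (1, 'Living_People'), (1, 'People_from_Maine'), (1, '1956_births'),
    (2, 'Living_People');
""")

# One statement: each page gets all of its categories joined by spaces.
db.execute("""
UPDATE pageindex SET catlist = (
    SELECT group_concat(cl_to, ' ')
    FROM categorylinks
    WHERE cl_from = pageindex.pageid
)
""")

# The order inside catlist is not guaranteed, so compare as sets.
catlists = {
    pageid: set(catlist.split())
    for pageid, catlist in db.execute("SELECT pageid, catlist FROM pageindex")
}
```

Because whole category names are joined rather than tested with a substring check, a category whose name is contained in another one is not skipped.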
Aerik
b> Is there some _particular place_
< I think he is talking about image pages.
Well, lots of places, e.g.,
$ GET http://commons.wikimedia.org/wiki/Category:Pressed_flowers | grep -c 'alt=""'
13
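Note that grep -c counts matching lines rather than individual images. A slightly more careful count can be sketched with Python's standard html.parser; the `EmptyAltCounter` class and the sample markup are made up for illustration, and fetching the page is left out.

```python
from html.parser import HTMLParser

class EmptyAltCounter(HTMLParser):
    """Count <img> tags whose alt text is empty or absent."""

    def __init__(self):
        super().__init__()
        self.empty = 0    # alt attribute present but blank
        self.missing = 0  # no alt attribute at all

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if "alt" not in attributes:
            self.missing += 1
        elif not attributes["alt"]:
            self.empty += 1

# Made-up sample markup: one blank alt, one useful alt, one missing alt.
sample = (
    '<img src="a.png" alt="">'
    '<img src="b.png" alt="A pressed flower"/>'
    '<img src="c.png">'
)
counter = EmptyAltCounter()
counter.feed(sample)
```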
Anyway, I read these pages on various devices, all with image
downloading turned off, and I would like to know what I am missing and
at what point in the text. I don't want to look at the image anyway; I
just want to know that it was there and what it was.