2011/2/5 River Tarnell <r.tarnell(a)ieee.org>
In article
<AANLkTikWLU5Y8C2UokYRN=v1-zwhb1ktHNXi4xtbmXja(a)mail.gmail.com>,
David Gerard <dgerard(a)gmail.com> wrote:
On 5 February 2011 15:12, Alex Brollo
<alex.brollo(a)gmail.com> wrote:
Just to let you know that Aubrey has just presented the it.source idea for
wikicaptcha on wikisource-l
What would it take to get this into place? What's the captcha load on
WMF sites? Would e.g. the toolserver melt under the load? Perhaps on
one project at a time?
I don't think this should be hosted on the Toolserver; as CAPTCHAs are a
core part of the site, they should not rely on the TS to work.
- river.
IMHO, it could be an opportunity to rethink the role of Commons as a
central library. I imagine something like this:
1. as soon as a djvu file with a text layer is uploaded, the text layer of
every page is extracted, saving word coordinates too;
2. these text layers could be scanned by a script, extracting all words
marked as doubtful (usually with a ^ character), as well as words
that don't match a good dictionary;
3. a dynamic recaptcha database is updated and word images are submitted to
wiki contributors, both as a formal captcha for edits by logged-out users and
as a volunteer job to help wikisource projects; the resulting corrections
fix the text files;
4. a tool should be built to upload "pure text" from these text files into
any wikisource project;
5. finally, the refined text could be re-uploaded into the djvu file,
converting it into a "djvu file with a wiki text layer".
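
To make step 2 a bit more concrete, here is a rough Python sketch, not a real implementation. It assumes the text layer has already been extracted into simple records (word text, page number, bounding box); the names Word and doubtful_words are hypothetical, and the dictionary is just a set of lowercase words. How the records come out of the djvu file is left open.

```python
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One word of an extracted text layer (hypothetical record layout)."""
    text: str
    page: int
    bbox: tuple  # (xmin, ymin, xmax, ymax) in page coordinates


def doubtful_words(words, dictionary, marker="^"):
    """Return (word, reason) pairs for words that need human review.

    A word is doubtful if the OCR marked it (it contains `marker`),
    or if, after stripping surrounding punctuation and lowercasing,
    it is not found in `dictionary`.
    """
    result = []
    for w in words:
        if marker in w.text:
            result.append((w, "ocr-marked"))
            continue
        token = w.text.strip(string.punctuation).lower()
        if token and token not in dictionary:
            result.append((w, "not-in-dictionary"))
    return result


if __name__ == "__main__":
    sample = [
        Word("central", 1, (0, 0, 10, 5)),
        Word("libr^ry", 1, (12, 0, 20, 5)),
        Word("djvvu", 2, (0, 0, 8, 5)),
    ]
    for word, reason in doubtful_words(sample, {"central", "library"}):
        print(word.page, word.bbox, word.text, reason)
```

The saved coordinates are what would let step 3 crop the word image for the captcha; the reason field is only there so that OCR-marked words and dictionary misses could be queued separately.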
Alex