Wikisource-l April 2010

wikisource-l@lists.wikimedia.org

13 participants
16 discussions

by Lars Aronsson

The Swedish Wikisource is copying scanned books from various sources. You typically find a PDF or DJVU file containing both scanned images and raw OCR text, upload it to Commons, and create an Index: page with the <pagelist/> tag.

Some of these books have pretty miserable OCR text, perhaps because the Norwegian National Library scanned a Swedish book with their OCR software set to Norwegian. Somebody with an OCR program needs to run a new OCR on these images. Fortunately, it is quite easy to feed the PDF or DJVU file into an OCR program such as Finereader, and use a bot to update the pages. We now have one user on sv.wikisource doing this.

For these Index: pages, I created a category:OCR-kö (meaning: queue of OCR requests). When trying to add interwiki links, I found a similar category on de.wikisource, but similar categories on fr, en, and pt had been removed. What's the story behind that? Don't you need OCR requests in these languages? The comment on the English page mentions an OCR robot on the toolserver. Really?

These exist:
http://de.wikisource.org/wiki/Kategorie:OCR-Anfragen
http://sv.wikisource.org/wiki/Kategori:OCR-k%C3%B6

These were removed in June 2009:
http://en.wikisource.org/wiki/Category:OCR_Requests
http://fr.wikisource.org/wiki/Cat%C3%A9gorie:Demandes_d%27OCR
http://pt.wikisource.org/wiki/Categoria:!Pedidos_de_OCR

--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se

14 years

Experience, funding, outreach

by Lars Aronsson

At the Wikimedia chapter meeting in Berlin last week, Wikisource was mentioned as an interesting project in several different settings. I know a lot of interesting projects and attempts are being tried in various languages of Wikisource, but perhaps there isn't enough coordination and exchange of ideas and experience between all volunteers. How could we improve this?

I personally think this mailing list is the first place to start. We could all write short notices of any new idea or project that we are undertaking. Then we should probably get together at a session during the Wikimania conference in Gdansk this summer.

The Wikimedia "chapters" are national or regional membership associations that provide means to go beyond the ordinary project volunteer communities, for example when expenses need to be covered for travel or equipment, or when contracts need to be signed. One example is that Wikimedia France recently signed a deal with the Bibliothèque nationale de France to provide access to scanned images of books, that can be proofread in fr.wikisource.org.

Deals of this kind fit in with a larger pattern, where chapters seek collaboration with galleries, libraries, archives and museums (= GLAM), hoping that they will contribute free images to Wikimedia Commons and Wikipedia. Or where museums will allow wikipedians to take photos of their collections.

Other chapters are buying scanners for volunteers to use. But perhaps digital cameras are more useful than scanners these days. How many know how to use them correctly? Maybe we need workshops.

Many of the chapters are now growing fast and are quite successful at fundraising. This creates an interesting challenge to fund projects that really make a difference. The chapters need good projects to fund, that they can show off to donors at the coming fundraiser in the fall 2010/winter 2011. I think Wikisource has a lot of potential for supplying chapters with good projects to fund.
-- Lars Aronsson (lars(a)aronsson.se) Wikimedia Sverige - stöd fri kunskap - http://wikimedia.se/

14 years

Strategic Planning Office Hours

by Philippe Beaudette

Hi everyone,

The next strategic planning office hours are Tuesday, 6 April, from 20:00-21:00 UTC, which is:
- Tuesday (1-2pm PDT)
- Tuesday (4-5pm EDT)

Office hours will be a great opportunity to discuss the work that's happened as well as the work to come.

As always, you can access the chat by going to https://webchat.freenode.net and filling in a username and the channel name (#wikimedia-strategy). You may be prompted to click through a security warning. It's fine. More details at: http://strategy.wikimedia.org/wiki/IRC_office_hours

Thanks! Hope to see many of you there.

____________________
Philippe Beaudette
Facilitator, Strategy Project
Wikimedia Foundation
philippe(a)wikimedia.org

Imagine a world in which every human being can freely share in the sum of all knowledge. Help us make it a reality!
http://wikimediafoundation.org/wiki/Donate

14 years

PDF/Djvu to Index

by Lars Aronsson

It is increasingly common to add books to Wikisource by finding a PDF or Djvu file, uploading it to Commons, and then creating an Index: page on Wikisource for proofreading. But this would be much easier if:

1) The fields (author, title, etc.) of the Index page were filled in from the data already given on Commons. (Yes, those could be wrong or need additional care, but this could always be edited afterwards, if initial values are fetched from Commons.)
2) The <pagelist/> tag was already in the "pages" box.
3) All pages were created automatically with the OCR text from Commons, instead of leaving a long list of red links. (This would require the text for each page to be extracted, something that pdftotext can do in seconds, but Commons takes weeks to do.)

Could this be automated? Is there already some tool or bot that does this?

--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
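As a rough illustration of point 3, here is a minimal Python sketch of how per-page text might be prepared for a bot. It relies on pdftotext separating pages with a form feed character. The function names, the example file name, and the "Page:File/N" title pattern are assumptions made for this sketch (each Wikisource may name its page namespace differently), and the step that actually saves pages to the wiki is left out.

```python
def split_pages(ocr_text: str) -> list[str]:
    """Split pdftotext-style output into one string per page."""
    # pdftotext puts a form feed character between pages.
    pages = ocr_text.split("\f")
    # The output normally ends with a form feed, which leaves an empty tail.
    if pages and pages[-1].strip() == "":
        pages.pop()
    # Blank pages in the middle are kept so that page numbers stay aligned
    # with the scanned images.
    return [page.strip() for page in pages]


def draft_pages(filename: str, ocr_text: str, namespace: str = "Page") -> dict[str, str]:
    """Map hypothetical page titles such as 'Page:Book.pdf/1' to their OCR text."""
    texts = split_pages(ocr_text)
    return {
        f"{namespace}:{filename}/{number}": text
        for number, text in enumerate(texts, start=1)
    }


if __name__ == "__main__":
    # Tiny hand-written stand-in for real pdftotext output.
    sample = "first page\n\fsecond page\n\f"
    for title, text in draft_pages("Example.pdf", sample).items():
        print(title, "->", text)
```

A bot could then walk over the resulting dictionary and create each page that is still a red link, leaving existing pages untouched.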

14 years, 1 month

Re: [Wikisource-l] Are the Page Views statistics fair enough ?

by Michel Morin

Ok for this one, then. But many others on other wikisources are, I guess, unfair. Does any admin or developer have the rights to modify this stuff on each domain?

Regards,
Syagrius

From: Michael Jörgens
Sent: 15.03.10 18:18
To: discussion list for Wikisource, the free library
Subject: Re: [Wikisource-l] Are the Page Views statistics fair enough ?

{{externesBild}} is quite understandable. In the German-language Wikisource it is mandatory that the scan of the page is publicly available and linked; I think it is something like Special:Filepath/XX. A lot of old books are scanned by universities and we link to the pages there, but our main source for the scans is Commons.

greetings

2010/3/15 <syagrius(a)gmx.fr>

Since the wikisource sub-domains are now classified by page views count, I guess it should at least be fair. If we look at the statistics for December 2009, we see that an important part of the English wikisource traffic (http://stats.grok.se/en.s/) comes from "Special:AutoLogin", of the Russian one (http://stats.grok.se/ru.s/) from "Special:Filepath/XXpng", of the German one (http://stats.grok.se/de.s/) from "{{{EXTERNESBILD}}}" (I don't know what it is) and some "png", and of the Spanish one (http://stats.grok.se/es.s/) from "Special:Filepath/XXpng", etc.

There was the same problem on French wikisource some months ago, and it was fortunately corrected. Could anyone correct all this on every sub-domain?

Regards,
Syagrius

_______________________________________________
Wikisource-l mailing list
Wikisource-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
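To make the concern in this thread concrete, here is a small Python sketch of how a ranking could count only content pages. It assumes the view counts are available as a mapping from page title to number of views; the function names are hypothetical, and the two filter rules (titles in the Special: namespace and titles containing an unexpanded template parameter such as "{{{EXTERNESBILD}}}") are only the cases named in the messages above, not a complete list.

```python
def is_artifact(title: str) -> bool:
    """Return True for titles that are not real content pages."""
    # Special pages such as Special:AutoLogin or Special:Filepath/...
    if title.startswith("Special:"):
        return True
    # Unexpanded template parameters such as {{{EXTERNESBILD}}}
    if "{{{" in title:
        return True
    return False


def content_views(views_by_title: dict[str, int]) -> int:
    """Sum the views of all titles that are not artifacts."""
    return sum(
        count
        for title, count in views_by_title.items()
        if not is_artifact(title)
    )


if __name__ == "__main__":
    # Made-up counts, for illustration only; not taken from stats.grok.se.
    example = {
        "Main Page": 100,
        "Special:AutoLogin": 900,
        "{{{EXTERNESBILD}}}": 50,
    }
    print(content_views(example))
```

Comparing sub-domains on the filtered total rather than the raw total would keep such entries from inflating one project's rank.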

14 years, 1 month

Which scanner for books?

by Marie-Lan Tay Pamart

Hi everyone,

I hope I'm not off topic on this mailing list. I'm looking for a scanner to scan books for Wikisource. I'm rather confused by the quantity of products on the market, plus many reviews focus on film scanners. Also, I need the scanner to be Mac compatible. Would you have some advice?

Thanks in advance.

--
Marie-Lan / Jastrow

14 years, 1 month
