Hi Tisza.
Thanks a lot for your answer. A Chromium-based solution is certainly one
of the best you can get. It's cheap in computational resources, and
updates should be available for a long time. Sorry for creating
unnecessary work for you. I had concluded from the following link
that the new renderer was based on mwlib and reportlab. But that page
dates back to April 2018, was last updated in August 2018, and is
obviously outdated now.
https://www.mediawiki.org/wiki/Reading/Web/PDF_Functionality
Also, these two pages seem to contain the same outdated information:
https://www.reportlab.com/opensource/
https://www.reportlab.com/casestudies/wikipedia/
Yours Dirk
On 3/17/19 8:14 PM, Tisza Gergő wrote:
There are two different PDF renderer tools: the single page PDF
renderer ("Download as PDF" link in the sidebar, via the
ElectronPdfService extension [1]) and the article collection renderer
("Create a book" link, via the Collection extension [2]).
The single page renderer is currently served by a tool called Electron
[3]; it is in the process of being replaced by a new tool called Proton
[4]. Both are node.js services which manage headless Chromium
instances, which means the actual rendering engine will stay the
same, so no user-facing changes are expected. The switch is for
operational reasons: Electron crashes periodically, and it was
written before the Chromium project provided an official library for
remote-controlling headless browsers, so it does not take advantage of
that. Proton is currently getting mirrored traffic (i.e. it is deployed
in production for testing purposes, and both it and Electron render
the PDF files requested by users, but only the one from Electron is
returned).
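
To illustrate what "mirrored traffic" means here, a minimal Python
sketch follows. It is not how Electron or Proton are actually wired up
in production; the function names and the stub renderers are
hypothetical and only show the idea that both renderers run for each
request while only the primary's output is returned.

```python
from typing import Callable, List, Tuple

# A renderer takes a page title and returns PDF bytes (hypothetical interface).
Renderer = Callable[[str], bytes]


def render_with_mirror(
    title: str,
    primary: Renderer,
    shadow: Renderer,
    log: List[Tuple[str, str, str]],
) -> bytes:
    """Return the primary renderer's output; run the shadow only for observation."""
    result = primary(title)
    try:
        shadow_result = shadow(title)
        log.append((title, "ok", f"{len(shadow_result)} bytes"))
    except Exception as exc:
        # A failure in the mirrored service must never affect the user.
        log.append((title, "error", repr(exc)))
    return result


# Hypothetical stand-ins for the two services.
def electron_stub(title: str) -> bytes:
    return b"%PDF-electron:" + title.encode("utf-8")


def proton_stub(title: str) -> bytes:
    raise RuntimeError("shadow renderer crashed")


if __name__ == "__main__":
    events: List[Tuple[str, str, str]] = []
    pdf = render_with_mirror("Main_Page", electron_stub, proton_stub, events)
    print(pdf)
    print(events)
```

The point of the design is that the mirrored service sees real
production requests, while its failures are only logged and never
reach the user.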
The collection renderer used to be served by a tool called OCG [5],
which was decommissioned about a year ago. The Collection extension
also functions as a frontend to PediaPress [6], who create
print-on-demand books of Wikipedia content. They use mwlib internally
(and are the main developers of it). I believe they plan to provide
PDF download functionality eventually.
So in short, the WMF is not involved with mwlib development; you
should probably contact PediaPress (see [7]) if you have questions
about that. The PDF renderer project at the WMF is not related to
mwlib and is not affected by the Python 2 life cycle.
[1] https://www.mediawiki.org/wiki/Extension:ElectronPdfService
[2] https://www.mediawiki.org/wiki/Extension:Collection
[3] https://www.mediawiki.org/wiki/Electron
[4] https://www.mediawiki.org/wiki/Proton
[5] https://www.mediawiki.org/wiki/Offline_content_generator
[6] https://meta.wikimedia.org/wiki/Book_tool/Help/Books/Frequently_Asked_Quest…
[7] https://pediapress.com/code/