Hello,
I did some investigation on how to compile MediaWiki to LaTeX. In this
email I will discuss only the problems caused by the fact that MediaWiki
uses Unicode, and how to use Unicode with LaTeX.
1) First, Unicode uses the same codepoint for different glyphs in
Chinese, Japanese and Korean (Han unification). In Wikipedia there are
special templates to work around this problem, but there are many cases
where these templates are not used, so this is essentially an unsolvable
problem. In LaTeX you have all the needed glyphs available, but if you
only have the codepoint you cannot know which one to choose.
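
To make point 1 concrete, here is a small Python sketch. It is only an
illustration: the helper script_hint is hypothetical (not part of
MediaWiki or of any LaTeX tool), and the character U+9AA8 is just one
commonly cited case of a codepoint that is drawn differently in Chinese
and Japanese fonts. The codepoint itself carries no language
information, so the only thing a converter can do without the templates
is look at neighbouring characters.

```python
import unicodedata


def script_hint(text):
    """Guess which CJK font a run of text needs (hypothetical helper).

    Returns "japanese" if the text contains kana, "korean" if it
    contains Hangul, and None if there is nothing to decide on.
    """
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith(("HIRAGANA", "KATAKANA")):
            return "japanese"
        if name.startswith("HANGUL"):
            return "korean"
    # Only Han characters (or no CJK at all): the language is unknown.
    return None


# The Unicode name of a unified ideograph says nothing about the language.
print(unicodedata.name("\u9aa8"))

# A bare Han character gives no hint; with kana next to it we can guess.
print(script_hint("\u9aa8"))
print(script_hint("\u9aa8\u306e"))
```

A heuristic like this only helps when kana or Hangul happen to be
nearby. For a run consisting only of Han characters it returns None,
which is exactly the case described above.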
2) There are currently three good LaTeX compilers. I think it is hard to
choose one, because each of them has a significant disadvantage. One
point to understand here is microtype. It is about applying tiny changes
to glyphs to get better margins and better line breaking, which is
very often done in professionally printed books, but which only the
pdflatex and lualatex compilers can do. The remaining compiler, xelatex,
can't do it. pdflatex basically cannot handle Unicode. I made it do
Unicode by hacking the CJK package, but this requires a specially hacked
font, which is legal under the GPL, but it is still a hack and will
surely never make it into Debian. I had a long discussion with the
developer of the CJK package, and essentially we didn't find any way to
make pdflatex do Unicode in a way acceptable to Debian. The remaining
compiler is lualatex. It does not allow changing fonts in the current
version of Ubuntu, but it does in the current testing version of
Debian. There, however, it consumes a little more than one GByte of RAM
when changing fonts, which is also reported by other users and does not
seem to be a memory leak.
So what choices are there:
1) A weird hack -> pdflatex
2) No microtype -> xelatex
3) 1 GByte memory consumption and Debian testing -> lualatex
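
The same trade-off can be written down as a small table in code. This
is only a sketch that restates the findings from this email; the names
Engine, ENGINES and candidates are made up for illustration, and the
values reflect what I observed, not a general statement about the
compilers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str
    unicode_ok: bool   # handles Unicode input without a hack
    microtype: bool    # can do microtype
    caveat: str        # the disadvantage reported in this email


# Values as reported in this email (assumptions, not general facts).
ENGINES = [
    Engine("pdflatex", False, True,
           "Unicode only via a hacked CJK package and font"),
    Engine("xelatex", True, False, "no microtype"),
    Engine("lualatex", True, True,
           "about 1 GByte of RAM when changing fonts; needs Debian testing"),
]


def candidates(need_unicode, need_microtype):
    """Return the names of the engines that meet the given requirements."""
    return [
        e.name
        for e in ENGINES
        if (e.unicode_ok or not need_unicode)
        and (e.microtype or not need_microtype)
    ]


print(candidates(need_unicode=True, need_microtype=True))
```

If both Unicode and microtype are required, only lualatex remains, at
the price of the memory consumption mentioned above.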
If you can decide on one of these options, I will work towards an
official Debian package doing that. I personally prefer lualatex.
Yours Dirk