The text layer is stored in img_metadata, which means it can be retrieved
through the API (using ?action=query&prop=imageinfo&iiprop=metadata).
However, when I tried to test this, it didn't seem to work. Maybe trying
to return the entire text layer hit the API's maximum result size limit,
or something like that. (It'd be really nice if we had a better place to
store information about files, especially for huge things like the text
layer, which we generally don't want to load in full every time. There's
a bug about that somewhere in Bugzilla land.)
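For reference, a minimal Python sketch of the request described above.
The endpoint (Wikimedia Commons' api.php), the extra `titles`/`format`
parameters, the User-Agent string and the example file name are
assumptions for illustration. The response isn't parsed: whether a big
text layer comes back whole is exactly what needs testing, so the code
just returns the raw JSON for inspection.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: Wikimedia Commons; any MediaWiki api.php endpoint takes the same parameters.
API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def imageinfo_metadata_url(file_title: str, endpoint: str = API_ENDPOINT) -> str:
    """Build an action=query&prop=imageinfo&iiprop=metadata request for one file."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "metadata",
        "titles": file_title,  # e.g. "File:Example.djvu" (hypothetical file)
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


def fetch_imageinfo(file_title: str, endpoint: str = API_ENDPOINT) -> dict:
    """Fetch the raw JSON response so the returned metadata can be inspected by hand."""
    req = Request(
        imageinfo_metadata_url(file_title, endpoint),
        # Wikimedia asks API clients to send a descriptive User-Agent (this one is made up).
        headers={"User-Agent": "djvu-textlayer-test/0.1 (hypothetical example)"},
    )
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Calling `fetch_imageinfo("File:Example.djvu")` needs network access;
`imageinfo_metadata_url` alone just shows the query string being sent.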
Indirect mode (from what I can find out on Google) is when you have an
index DjVu file that links to all the pages making up the DjVu document,
so you can start viewing immediately and pages are only downloaded as
needed. I'm not sure how such a format would work in terms of uploading.
Unless we convert it on the server side, how would we upload all the
constituent files? (I suppose we could tell people to upload tarballs,
but then we'd have to validate the contents and make clear to people
that tarballs are only accepted for uploading DjVu files.) [Of course,
until 5 minutes ago I'd never heard of an indirect DjVu file, so I could
be misunderstanding.]
-bawolff
I use the DjVuLibre library a lot on my PC, both from the console and
from Python scripts, so I can tell you that it would be very simple to
convert a "bundled" DjVu file into an "indirect" one. Obviously this
should be transparent to the uploader, as a fully automatic server-side
job.
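A minimal Python sketch of such a server-side conversion, assuming
DjVuLibre's `djvmcvt` tool, whose indirect mode is invoked as
`djvmcvt -i <bundled.djvu> <outdir> <index.djvu>`. The function names
and the default index name are hypothetical; check `man djvmcvt` on the
installed version before relying on the argument order or on where the
index file ends up.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def indirect_command(bundled: str, outdir: str, index_name: str = "index.djvu") -> List[str]:
    """Build the djvmcvt command line for a bundled -> indirect conversion."""
    return ["djvmcvt", "-i", str(bundled), str(outdir), index_name]


def bundled_to_indirect(bundled: str, outdir: str, index_name: str = "index.djvu") -> Path:
    """Split a bundled DjVu file into its component files plus an index file."""
    if shutil.which("djvmcvt") is None:
        raise RuntimeError("djvmcvt (DjVuLibre) is not installed or not on PATH")
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(indirect_command(bundled, outdir, index_name), check=True)
    # Assumption: djvmcvt writes the index file inside outdir.
    return out / index_name
```

Keeping the command builder separate makes it easy to log or review the
exact command an upload hook would run.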
About the text layer: it's very, very interesting, even if complex.
There are command-line DjVuLibre tools to do anything you want with it,
both reading and editing. What we get now is just the most basic output
(the full text); from any IA (Internet Archive) DjVu file you can get
much more, i.e. the hierarchical text structure (at page, column,
region, paragraph, line and single-word level) with the coordinates of
every element at every level of detail. You can also get/insert
structured metadata, both as "global" metadata and as page-specific
metadata.
Every DjVu extraction/editing function works on both bundled and
indirect DjVu files, and obviously any read/edit is much faster when it
targets a small, single-page file.
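To give an idea of that structure, here is a minimal Python sketch that
parses the hidden-text s-expression printed by DjVuLibre's `djvused`
(e.g. `djvused -e 'select 1; print-txt' book.djvu`) into nested zones,
each with its kind (page, column, region, para, line, word), bounding
box and text. The `Zone` class and helper names are hypothetical, and
the string handling only decodes the `\"` and `\\` escapes; any other
escapes djvused emits for non-ASCII text are left undecoded.

```python
import re
import subprocess
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Tokens: "(", ")", a double-quoted string (with backslash escapes) or a bare atom.
TOKEN_RE = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


@dataclass
class Zone:
    kind: str                        # page, column, region, para, line, word, char
    bbox: Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax)
    text: str = ""
    children: List["Zone"] = field(default_factory=list)


def _unquote(token: str) -> str:
    # Strip the surrounding quotes and undo \" and \\ (other escapes are kept as-is).
    return re.sub(r'\\(["\\])', r"\1", token[1:-1])


def parse_zones(sexpr: str) -> Zone:
    """Parse one top-level zone, e.g. djvused's print-txt output for one page."""
    tokens = TOKEN_RE.findall(sexpr)
    if not tokens:
        raise ValueError("empty text layer")
    pos = 0

    def zone() -> Zone:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        kind = tokens[pos + 1]
        bbox = tuple(int(t) for t in tokens[pos + 2:pos + 6])
        pos += 6
        node = Zone(kind, bbox)
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(zone())  # nested zone
            else:
                node.text = _unquote(tokens[pos])  # leaf text
                pos += 1
        pos += 1  # consume ")"
        return node

    return zone()


def iter_zones(root: Zone, kind: str) -> Iterator[Zone]:
    """Yield every zone of the given kind (e.g. "word" or "line") in document order."""
    if root.kind == kind:
        yield root
    for child in root.children:
        yield from iter_zones(child, kind)


def page_text_layer(djvu_path: str, page: int) -> Zone:
    """Run djvused on one page (1-based) and parse its hidden text."""
    out = subprocess.run(
        ["djvused", "-e", f"select {page}; print-txt", djvu_path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_zones(out)
```

Since each zone keeps its bounding box, word and line coordinates are
available at every level without re-reading the file.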
The coordinates of text elements and the hierarchical structure of the
text are extremely interesting, since that data could be used to "guess
formatting": i.e. you could "guess" centered text, tables, section
alignment, headers/footers, poems, paragraphs, and font sizes too.
Inter-line spacing could be used to "guess" chapter titles. "Empty text
areas" are often simply areas covered by illustrations, so an
intelligent algorithm could guess their size and position.
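A few of these guesses can be sketched as simple geometric heuristics
over line bounding boxes (xmin, ymin, xmax, ymax) taken from the text
layer. Everything here is hypothetical: the function names and
thresholds are arbitrary, line-box height is only a rough proxy for font
size, and the gap finder sorts lines by ymin so it doesn't depend on
which corner of the page is the coordinate origin.

```python
from statistics import median
from typing import List, Tuple

# A line's bounding box from the text layer: (xmin, ymin, xmax, ymax).
BBox = Tuple[int, int, int, int]


def is_centered(line: BBox, page_width: int,
                tolerance: float = 0.05, max_width_ratio: float = 0.8) -> bool:
    """Guess whether a line is centered: left and right margins roughly equal,
    and the line noticeably narrower than the page (so full-width justified
    lines don't count)."""
    xmin, _, xmax, _ = line
    left_margin = xmin
    right_margin = page_width - xmax
    narrow = (xmax - xmin) <= max_width_ratio * page_width
    return narrow and abs(left_margin - right_margin) <= tolerance * page_width


def large_gaps(lines: List[BBox], factor: float = 3.0) -> List[Tuple[int, int]]:
    """Return the vertical extents (start, end) of gaps between consecutive
    lines that exceed `factor` times the median gap: candidates for chapter
    breaks or illustrations."""
    ordered = sorted(lines, key=lambda b: b[1])
    gaps = [(cur[3], nxt[1]) for cur, nxt in zip(ordered, ordered[1:])
            if nxt[1] > cur[3]]
    if not gaps:
        return []
    typical = median(end - start for start, end in gaps)
    return [(start, end) for start, end in gaps if end - start > factor * typical]


def tall_lines(lines: List[BBox], factor: float = 1.5) -> List[BBox]:
    """Return lines whose box height exceeds `factor` times the median line
    height, a rough hint of a larger font (e.g. a title)."""
    if not lines:
        return []
    typical = median(b[3] - b[1] for b in lines)
    return [b for b in lines if b[3] - b[1] > factor * typical]
```

Real pages would need tuning per book, but even crude rules like these
show how much the coordinates add over plain text.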
I imagine that thumbnail generation/purging would also be much faster
and more efficient.
In brief, we have a Ferrari but are using it with a speed limit of 10
miles/hour. :-)
Alex