On Tuesday 20 May 2003 03:53, Alfio Puglisi wrote:
> I just subscribed (I'm the Wikipedia user At18) to ask about the
> automatic HTML dump function. ...
> If anyone is interested, I have a rudimental Perl script that is
> capable of reading the downloadable SQL dump and output all the
> articles as separate files in a number of alphabetical directories.
> It's not very fast, but it works.
That's great!
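
For anyone who wants to try the same idea without the Perl script, here
is a rough sketch in Python of just the "one file per article, in
alphabetical directories" step. It does not parse the SQL dump; it
assumes you already have (title, text) pairs from somewhere. The
function names, the ".txt" extension, and the "_" directory for titles
that don't start with an ASCII letter or digit are all my own
assumptions, not what Alfio's script does.

```python
import os


def bucket_for(title):
    """Return the directory name for a title: its first character
    upper-cased, or '_' if that is not an ASCII letter or digit."""
    first = title[:1].upper()
    return first if first.isascii() and first.isalnum() else "_"


def article_path(root, title):
    # Replace path separators and spaces so the title is a safe file name.
    safe = title.replace("/", "_").replace(" ", "_")
    return os.path.join(root, bucket_for(title), safe + ".txt")


def write_articles(root, articles):
    """Write each (title, text) pair to its own file; return the paths."""
    paths = []
    for title, text in articles:
        path = article_path(root, title)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text)
        paths.append(path)
    return paths
```

Calling write_articles("dump", [("Apple pie", "...")]) would create
dump/A/Apple_pie.txt under this layout.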
> What's missing from the script: wikimarkup -> HTML conversion,
You should be able to call the existing PHP code that generates
HTML to do this.
A tool that generated the entire Wikipedia, in static HTML format,
would make it trivial to generate the "Plucker" format
for Palm PDAs. Plucker is an offline web browser for Palm PDAs;
it's open source software/Free Software (OSS/FS) released under the
GPL.
It can handle HTML, as well as PNG, GIF, JPEG, txt, and a few others;
HTML is usually rendered as you'd expect (hypertext, italics, bold,
font size changes, lists, indenting all work).
It'd be very nice if the Wikipedia were available in Plucker format;
that would mean that an OSS/FS reader could be used to view the text
on a Palm PDA.
Plucker is available at: "http://www.plkr.org".
I have a Palm, and it is the MOST important program I use by far.
One minor problem is that Plucker doesn't have an index
facility. That could be solved by creating HTML pages that link to
sorted articles, e.g., "Master Index" could list "A, B, C...";
clicking on "A" would reach "Index A" which would list "AA, AB,
AC...".
Then, modify the static version of the main page so you
could quickly jump to the master index.
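
The index scheme described above could be sketched roughly like this in
Python. It is only an illustration: the file names ("index.html",
"index_A.html"), the article link layout (spaces turned into
underscores, ".html" appended), and the function names are assumptions
of mine, and it only makes sense as written for titles that start with
plain letters or digits. To keep it short, the two-letter groups
("AA", "AB", ...) are headings on the per-letter page instead of
separate pages.

```python
import html
from collections import defaultdict
from urllib.parse import quote


def article_href(title):
    # Hypothetical layout: one HTML file per article, spaces as underscores.
    return quote(title.replace(" ", "_")) + ".html"


def build_index(titles):
    """Return {file name: HTML text} for a master index page and
    one index page per first letter."""
    groups = defaultdict(lambda: defaultdict(list))
    for title in sorted(titles, key=str.upper):
        if not title:
            continue  # skip empty titles
        groups[title[:1].upper()][title[:2].upper()].append(title)

    pages = {}
    letters = sorted(groups)
    # Master index: one link per first letter.
    links = " ".join(
        '<a href="index_%s.html">%s</a>' % (quote(l), html.escape(l))
        for l in letters
    )
    pages["index.html"] = "<h1>Master Index</h1>\n<p>%s</p>\n" % links

    # Per-letter pages: articles grouped under two-letter headings.
    for letter in letters:
        parts = ["<h1>Index %s</h1>" % html.escape(letter)]
        for pre in sorted(groups[letter]):
            parts.append("<h2>%s</h2>" % html.escape(pre))
            parts.append("<ul>")
            for title in groups[letter][pre]:
                parts.append('<li><a href="%s">%s</a></li>'
                             % (article_href(title), html.escape(title)))
            parts.append("</ul>")
        pages["index_%s.html" % letter] = "\n".join(parts) + "\n"
    return pages
```

Each entry of the returned dictionary would then be written out next to
the static articles, and the main page given a link to "index.html".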
Internally, Plucker will break long pages (>32K) into multiple
pages with forward and back links - but that'll be automatic and
won't affect anything.
I don't know of an automatic way to download the Wikipedia
images (which, in my mind, is a serious problem). Hopefully there
will soon be a way to download the images other than trawling.
However, for a Palm you'd have to drop the images in general anyway,
so for that particular use it wouldn't matter.