Moin,
it occurred to me recently that there is a file leak in any extension that
creates external files, like graphviz, graph, or possibly even math
(which creates PNGs).
It works like that:
Article A contains one <graph> object, called AA
Article B contains two <graph> objects, called BA and BB
When you edit article A, the following happens:
* the new graph code is sent to the extension
* it is hashed
* the hash is the filename that will be used for the generated file;
let's call it "ABCD" for now
* the file AB/CD/ABCD is generated and included in the output
(The hashes are done for two reasons: to save a file if two articles
contain the same text, and to conveniently generate short, unique file
names.)
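To make the naming scheme concrete, here is a minimal sketch of how such a content-hash path might be built (I'm assuming MD5 and a two-level subdirectory layout here; the actual extensions may use a different hash or layout):

```python
import hashlib

def graph_file_path(source_text):
    """Map graph source text to a cache file path.

    The hash serves both purposes mentioned above: identical text in
    different articles maps to the same file, and the hex digest gives
    a short, unique, filesystem-safe name. The first digest characters
    are used as subdirectories to avoid one huge flat directory.
    """
    digest = hashlib.md5(source_text.encode("utf-8")).hexdigest()
    return "%s/%s/%s.png" % (digest[0:2], digest[2:4], digest)

# The same text always yields the same path...
print(graph_file_path("digraph G { a -> b }"))
# ...while a one-character edit yields a completely different path,
# leaving the old file behind on disk.
print(graph_file_path("digraph G { a -> c }"))
```

Note that nothing in this mapping records *which* article produced the file, which is exactly why cleanup is hard later on.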
Likewise for article B, except that BCDE and BCDF are the hashes, so we
get BC/DE/BCDE and BC/DF/BCDF as files.
No problem so far, but what happens if you edit article A again and
change something? A new hash results, say ABXY, and the file
AB/XY/ABXY is generated.
Note that the file ABCD was never cleaned up. In fact, it is impossible
for the current scheme to clean it up, for the following reasons:
* ABCD could just as well be used by article B, since only the content
goes into the hash, not the article name. The file should only be deleted
if it is not used by any other article. (If the file ever vanishes, a
null edit is necessary to regenerate it!)
* the extension never even sees the old text, or the filenames
previously used on the page, so it cannot know which files are
candidates for deletion
The end effect is that the file cache gets bigger and bigger, and there is
no easy way to clean unused files out of it.
Here are a few ideas on how to deal with that:
* periodically delete old files until you are left with X files (there
is at least one extension already doing this). This does not work, since
the deletion cannot guarantee that the files left over are really used,
or that the files deleted are no longer used. It's an ugly hack and
creates more problems than it solves.
* we could somehow keep track of all filenames used on all articles. Just
think of article B:
the first edit creates two entries in the table under "B"
the second edit:
* the first time the extension runs, it clears table "B" and adds
the new hash
* the second run clears the table again and adds another new hash
The problem here is that the extension cannot decide which text to
convert is the first one on the page (and thus when to clear the table)
* various other schemes that generate the hash from the article name plus
a per-article unique ID (potentially given by the user creating the text,
ala <graph id="1">). These also require a really big table listing
which files are in use.
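To make the tracking-table idea concrete, here is a hypothetical in-memory sketch of the bookkeeping it would need (the class and method names are my invention, not an existing MediaWiki API): each (re)parse of an article replaces that article's hash list, and a file may only be deleted once no article at all references its hash.

```python
class UsageTable:
    """Hypothetical usage table: which article references which hashes."""

    def __init__(self):
        self.by_article = {}  # article name -> set of hashes

    def record_edit(self, article, hashes):
        """Replace the article's hash list after a (re)parse.

        Returns the hashes that are now unreferenced by ANY article,
        i.e. the cache files that are safe to delete.
        """
        old = self.by_article.get(article, set())
        self.by_article[article] = set(hashes)
        still_used = set()
        for hs in self.by_article.values():
            still_used |= hs
        return old - still_used

table = UsageTable()
table.record_edit("B", ["BCDE", "BCDF"])  # first save of article B
table.record_edit("A", ["ABCD"])          # first save of article A
# Article A is edited; ABCD becomes orphaned and can be deleted:
print(table.record_edit("A", ["ABXY"]))   # -> {'ABCD'}
```

This sidesteps the shared-hash problem (a hash used by two articles survives either article's edit), but it does not solve the difficulty described above: the extension still has to know when a parse *starts*, so it can replace the article's hash list exactly once per edit rather than on every <graph> invocation.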
The last idea I had is data URLs. These allow embedding the content
inline, instead of linking it via a file:
http://en.wikipedia.org/wiki/Data:_URL
This would work beautifully, except for a few bits:
* we would lose the savings: if articles A and B contain the same text,
it would be embedded twice.
* the data is in mysql, not on the file system
* it is not supported by IE at all (bummer :-(
* Opera apparently only supports these up to 4K, which is way too little
to be practically useful :-(
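For completeness, producing such a data: URL from generated image bytes is trivial; a sketch (the helper name is my own, not from any extension):

```python
import base64

def data_url(image_bytes, mime="image/png"):
    """Embed image bytes inline as an RFC 2397 data: URL, instead of
    writing them to a cache file and linking to it."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return "data:%s;base64,%s" % (mime, encoded)

# Using the (truncated) 8-byte PNG signature as stand-in image data:
print(data_url(b"\x89PNG\r\n\x1a\n"))
# -> data:image/png;base64,iVBORw0KGgo=
```

The base64 encoding also inflates the data by about a third, which makes the 4K limit mentioned above even tighter in practice.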
Anyway, the problem needs to be solved; even my test wiki, which contains
only 3 SVG graphs, has already accumulated a thousand little files in
images/graph due to the many edits done on these three articles.
Best wishes,
Tels
--
Signed on Fri Apr 7 10:53:56 2006 with key 0x93B84C15.
Visit my photo gallery at
http://bloodgate.com/photos/
PGP key on
http://bloodgate.com/tels.asc or per email.
"Call me Justin, Justin Case."