Timwi-

> Oh, *that* kind of profiling. I have done that with C++ applications
> under Windows, but I never thought it'd be necessary for webserver
> applications... Clearly, at least our *current* problem - and also the
> current problem of LiveJournal (sorry I keep mentioning LiveJournal,
> but it's the first and only other major website I've contributed to in
> a major way) - is database performance, not CPU usage.

Well, that's not necessarily true. CPU usage on Larousse (the webserver
for En:) has been very high, and our page parser is very ugly and slow.
Clearly we need optimizations on both fronts.
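
If you want numbers rather than guesses, Devel::DProf ships with Perl
and shows where the time goes. Roughly like this (the script name here
is made up):

    # Run the code under the profiler; timing data goes to tmon.out
    perl -d:DProf parse_page.pl > /dev/null
    # Summarize tmon.out, slowest subroutines first
    dprofpp
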
Please do take a look at the Wookee parser module that Tarquin pointed to.
It uses a syntax very similar to Wikipedia's.
http://wiki.beyondunreal.com/wiki/Wookee

> Yes, point taken, but I don't expect the switch to happen soon :-) In
> fact, I don't really /expect/ the switch to happen at all; I regard it
> all as an experiment. Even if Wikipedia won't ever use my software, it
> was still a lot of fun and very educational to write it.

Sure. And if it's a decent wiki, you can add it to
http://c2.com/cgi/wiki?WikiEngines
and hope that people use it for other purposes.

>>> I think it's better to store them in a database, possibly one
>>> separated from the rest.
>> I never understood this approach. It only seems to be associated with
>> increased risks (reduced performance, increased risk of data
>> corruption) and to have no benefits. What exactly is the disadvantage
>> of just storing a pointer to the local filesystem in the DB?
> I suppose you're thinking there's an increased risk of data corruption
> because the database is all one file, and that if one bit of the file
> gets corrupted, it's all gone.

Not necessarily. But if the database file is corrupted, the database may
no longer process the table correctly, in which case the binary data would
have to be hand-extracted to rescue it. If you see the database as another
layer on top of the file system, and argue that the same risks apply to
the DB as to the filesystem, then you have doubled your risks by adding a
DB layer. You would triple them by adding a database within the database,
and so on. (We actually have done this with the Wikipedia table structure,
where user properties are a kind of CSV table within the database. Really
ugly.)
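
To make concrete what I mean - a rough sketch with invented key names,
not our actual schema: one text column packs a whole list of
properties, and every piece of code that touches it has to re-implement
the parsing:

    # One column packs a whole "table" of properties into a string.
    # (Invented key names; a sketch of the pattern, not our schema.)
    my $user_options = "skin=cologneblue\nunderline=1\nrows=20";

    # Every reader re-implements this parsing step:
    my %opt = map { split /=/, $_, 2 } split /\n/, $user_options;
    print $opt{skin};   # "cologneblue"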
The "reduced performance" argument is
something I don't know anything
about; it is possible that this is a good argument, but I'm not convinced.
Well, then test it. Throw some 10-megabyte files into the database and
compare multi-threaded read performance against Apache serving the same
files directly from disk.
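
Something along these lines would do as a first cut - a single-process
sketch (run several copies in parallel for the multi-threaded case),
with invented table, column and file names:

    #!/usr/bin/perl -w
    # Compare reading a large blob from MySQL vs. from the filesystem.
    # Table, column and file names are invented for this sketch.
    use strict;
    use DBI;
    use Time::HiRes qw(gettimeofday tv_interval);

    my $dbh = DBI->connect('DBI:mysql:database=wikidb', 'user', 'pass',
                           { RaiseError => 1 });

    my $t0 = [gettimeofday];
    for (1 .. 100) {
        my ($blob) = $dbh->selectrow_array(
            'SELECT img_data FROM image_blob WHERE img_name = ?',
            undef, 'test.jpg');
    }
    printf "DB reads:   %.2fs\n", tv_interval($t0);

    $t0 = [gettimeofday];
    for (1 .. 100) {
        open my $fh, '<', '/var/www/images/test.jpg' or die "open: $!";
        local $/;              # slurp the whole file at once
        my $blob = <$fh>;
        close $fh;
    }
    printf "File reads: %.2fs\n", tv_interval($t0);

Reading from disk is only a stand-in for Apache here; fetching the same
file over HTTP would complete the comparison.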

> Now to your question, "What exactly is the disadvantage of just
> storing a pointer to the local filesystem in the DB?" - The
> disadvantage is that it is more difficult to maintain a consistent
> state, i.e. a database without "dead links" and a file system free of
> orphans.

Sure, but given that you only have to deal with create, move, and
delete, the implementation complexity is minimal and the associated
risks should be low. In addition, I *like* being able to just zip up
the entire image directory instead of having to extract the files from
a MySQL table.
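
The bookkeeping really is small; as a sketch (all table, column and
directory names invented):

    # The three operations that must keep the DB row and the file in
    # step. All names here are invented for the sake of the example.
    use strict;
    use DBI;
    use File::Copy ();

    my $dir = '/var/www/images';

    sub create_image {
        my ($dbh, $name, $tmpfile) = @_;
        File::Copy::copy($tmpfile, "$dir/$name") or die "copy: $!";
        $dbh->do('INSERT INTO image (img_name) VALUES (?)',
                 undef, $name);
    }

    sub move_image {
        my ($dbh, $old, $new) = @_;
        File::Copy::move("$dir/$old", "$dir/$new") or die "move: $!";
        $dbh->do('UPDATE image SET img_name = ? WHERE img_name = ?',
                 undef, $new, $old);
    }

    sub delete_image {
        my ($dbh, $name) = @_;
        $dbh->do('DELETE FROM image WHERE img_name = ?', undef, $name);
        unlink "$dir/$name" or warn "unlink: $!";
    }

A periodic script that cross-checks the table against the directory
would catch orphans in either direction.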

> Hm. I hadn't even thought of this. Thanks for pointing it out early
> enough. :) I'll try to make things as object-oriented as possible.
> That will add to the educational nature of my experience, because -
> although I've done extensive object-oriented programming before - I've
> never done that in Perl (though I've read code that uses it). Should
> be fun :)

Perl-OOP is a bit ugly, but reasonably powerful. Check out perldoc
perltoot for details. You'll want to look at "tie" especially, as this
allows you to do some cool stuff with properties.
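
(The tie mechanism itself is documented in perldoc perltie.) A minimal
sketch of how it works, with an invented class name:

    package LoggedHash;
    # A tied hash that logs every write -- handy e.g. for watching
    # property changes. Class and key names are invented.
    sub TIEHASH { my $class = shift; bless {}, $class }
    sub FETCH   { my ($self, $key) = @_; $self->{$key} }
    sub STORE   {
        my ($self, $key, $value) = @_;
        print "set $key = $value\n";
        $self->{$key} = $value;
    }

    package main;
    my %prefs;
    tie %prefs, 'LoggedHash';
    $prefs{skin} = 'cologneblue';   # prints: set skin = cologneblue
    print $prefs{skin}, "\n";       # reads go through FETCH
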
Regards,
Erik