Timwi-

> Oh, *that* kind of profiling. I have done that with C++ applications
> under Windows, but I never thought it'd be necessary for webserver
> applications... Clearly, at least our *current* problem - and also the
> current problem of LiveJournal (sorry I keep mentioning LiveJournal,
> but it's the first and only other major website I've contributed to in
> a major way) - is database performance, not CPU usage.

Well, that's not necessarily true. CPU usage on Larousse (the webserver
for En:) has been very high, and our page parser is very ugly and slow.
Clearly we need optimizations on both fronts.
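
If you want numbers rather than guesses, Devel::DProf ships with Perl
and shows where the time goes. Roughly like this (the script name here
is made up):

    # Run the code under the profiler; timing data goes to tmon.out
    perl -d:DProf parse_page.pl > /dev/null
    # Summarize tmon.out, slowest subroutines first
    dprofpp
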
Please do take a look at the Wookee parser module that Tarquin pointed to.
It uses a syntax very similar to Wikipedia's.
http://wiki.beyondunreal.com/wiki/Wookee

> Yes, point taken, but I don't expect the switch to happen soon :-) In
> fact, I don't really /expect/ the switch to happen at all; I regard it
> all as an experiment. Even if Wikipedia won't ever use my software, it
> was still a lot of fun and very educational to write it.

Sure. And if it's a decent wiki, you can add it to
http://c2.com/cgi/wiki?WikiEngines
and hope that people use it for other purposes.

>>> I think it's better to store them in a database, possibly one
>>> separated from the rest.
>> I never understood this approach. It only seems to be associated with
>> increased risks (reduced performance, increased risk of data
>> corruption) and to have no benefits. What exactly is the disadvantage
>> of just storing a pointer to the local filesystem in the DB?
> I suppose you're thinking there's an increased risk of data corruption
> because the database is all one file, and that if one bit of the file
> gets corrupted, it's all gone.

Not necessarily. But if the database file is corrupted, the database may
no longer process the table correctly, in which case the binary data would
have to be hand-extracted to rescue it. If you see the database as another
layer on top of the file system, and argue that the same risks apply to
the DB as to the filesystem, then you have doubled your risks by adding a
DB layer. You would triple them by adding a database within the database,
and so on. (We actually have done this with the Wikipedia table structure,
where user properties are a kind of CSV table within the database. Really
ugly.)
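
To make concrete what I mean - a rough sketch with invented key names,
not our actual schema: one text column packs a whole list of
properties, and every piece of code that touches it has to re-implement
the parsing:

    # One column packs a whole "table" of properties into a string.
    # (Invented key names; a sketch of the pattern, not our schema.)
    my $user_options = "skin=cologneblue\nunderline=1\nrows=20";

    # Every reader re-implements this parsing step:
    my %opt = map { split /=/, $_, 2 } split /\n/, $user_options;
    print $opt{skin};   # "cologneblue"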
The "reduced performance" argument is
something I don't know anything
about; it is possible that this is a good argument, but I'm not convinced.
Well, then test it. Throw some 10-megabyte files into the database and
compare multi-threaded read performance against Apache serving the same
files directly from disk.
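
Something along these lines would do as a first cut - a single-process
sketch (run several copies in parallel for the multi-threaded case),
with invented table, column and file names:

    #!/usr/bin/perl -w
    # Compare reading a large blob from MySQL vs. from the filesystem.
    # Table, column and file names are invented for this sketch.
    use strict;
    use DBI;
    use Time::HiRes qw(gettimeofday tv_interval);

    my $dbh = DBI->connect('DBI:mysql:database=wikidb', 'user', 'pass',
                           { RaiseError => 1 });

    my $t0 = [gettimeofday];
    for (1 .. 100) {
        my ($blob) = $dbh->selectrow_array(
            'SELECT img_data FROM image_blob WHERE img_name = ?',
            undef, 'test.jpg');
    }
    printf "DB reads:   %.2fs\n", tv_interval($t0);

    $t0 = [gettimeofday];
    for (1 .. 100) {
        open my $fh, '<', '/var/www/images/test.jpg' or die "open: $!";
        local $/;              # slurp the whole file at once
        my $blob = <$fh>;
        close $fh;
    }
    printf "File reads: %.2fs\n", tv_interval($t0);

Reading from disk is only a stand-in for Apache here; fetching the same
file over HTTP would complete the comparison.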

> Now to your question, "What exactly is the disadvantage of just
> storing a pointer to the local filesystem in the DB?" - The
> disadvantage is that it is more difficult to maintain a consistent
> state, i.e. a database without "dead links" and a file system free of
> orphans.

Sure, but given that you only have to deal with create, move, and
delete, the implementation complexity is minimal and the associated
risks should be low. In addition, I *like* being able to just zip up
the entire image directory instead of having to extract the files from
a MySQL table.
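
The bookkeeping really is small; as a sketch (all table, column and
directory names invented):

    # The three operations that must keep the DB row and the file in
    # step. All names here are invented for the sake of the example.
    use strict;
    use DBI;
    use File::Copy ();

    my $dir = '/var/www/images';

    sub create_image {
        my ($dbh, $name, $tmpfile) = @_;
        File::Copy::copy($tmpfile, "$dir/$name") or die "copy: $!";
        $dbh->do('INSERT INTO image (img_name) VALUES (?)',
                 undef, $name);
    }

    sub move_image {
        my ($dbh, $old, $new) = @_;
        File::Copy::move("$dir/$old", "$dir/$new") or die "move: $!";
        $dbh->do('UPDATE image SET img_name = ? WHERE img_name = ?',
                 undef, $new, $old);
    }

    sub delete_image {
        my ($dbh, $name) = @_;
        $dbh->do('DELETE FROM image WHERE img_name = ?', undef, $name);
        unlink "$dir/$name" or warn "unlink: $!";
    }

A periodic script that cross-checks the table against the directory
would catch orphans in either direction.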

> Hm. I hadn't even thought of this. Thanks for pointing it out early
> enough. :) I'll try to make things as object-oriented as possible.
> That will add to the educational nature of my experience, because -
> although I've done extensive object-oriented programming before - I've
> never done that in Perl (though I've read code that uses it). Should
> be fun :)

Perl-OOP is a bit ugly, but reasonably powerful. Check out perldoc
perltoot for details. You'll want to look at "tie" especially, as this
allows you to do some cool stuff with properties.
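
(The tie mechanism itself is documented in perldoc perltie.) A minimal
sketch of how it works, with an invented class name:

    package LoggedHash;
    # A tied hash that logs every write -- handy e.g. for watching
    # property changes. Class and key names are invented.
    sub TIEHASH { my $class = shift; bless {}, $class }
    sub FETCH   { my ($self, $key) = @_; $self->{$key} }
    sub STORE   {
        my ($self, $key, $value) = @_;
        print "set $key = $value\n";
        $self->{$key} = $value;
    }

    package main;
    my %prefs;
    tie %prefs, 'LoggedHash';
    $prefs{skin} = 'cologneblue';   # prints: set skin = cologneblue
    print $prefs{skin}, "\n";       # reads go through FETCH
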
Regards,
Erik