Yeah, that's why I was thinking of a job queue source rather than trying
to come up with some hack for integrating jobs into a shared db. That
way job sources that make use of message queueing systems can be
supported just by writing a job source that uses them instead of the db.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
I'd recommend making use of existing message
queuing systems such as
ActiveMQ which already provide infrastructure for distributing messages to
multiple clients, redelivering after a failed attempt, etc. We've had pretty
good luck with this for StatusNet, where we run a lot of processing on
individual messages through background queues to keep the frontend
responsive.
One difficulty is that we don't have a good system for handling data for
multiple sites in one process, so it may need an intermediate process to
spawn out children for actual processing.
I think Tim did a little experimental work with GearMan; did that pan out?
-- brion
On Nov 15, 2010 4:27 AM, "Daniel Friesen" <lists(a)nadir-seen-fire.com> wrote:
There was a thought about the job queue that
popped into my mind today.
From what I understand, for a Wiki Farm to use runJobs.php instead of
the in-request queue (which on high-traffic sites is less desirable),
the Wiki Farm has to run runJobs.php periodically for each and every
wiki on the farm.
So, for example: if a Wiki Farm is hosting 10,000 wikis, and the host
really wants to ensure that the queue is run at least hourly to keep
the data on each wiki reasonably up to date, the farm essentially needs
to call runJobs.php 10,000 times an hour (i.e. once for each individual
wiki), regardless of whether a wiki has jobs or not. Either that or poll
each database beforehand, which in itself is 10,000 database calls an
hour plus the runJobs executions, which still isn't that desirable.
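For concreteness, the naive approach amounts to something like this (a sketch only; the wiki id scheme and paths are hypothetical, and a real cron script would hand each command to a subprocess rather than collect strings):

```python
# Naive farm-wide runner: one runJobs.php invocation per wiki per pass,
# whether or not that wiki has a single queued job.
wikis = ["wiki%05d" % i for i in range(10_000)]  # hypothetical wiki ids

def run_jobs_command(wiki_id, maxjobs=1000):
    # In a real cron script this command line would be executed via
    # subprocess.run(); here we just build it for illustration.
    return "php maintenance/runJobs.php --wiki=%s --maxjobs=%d" % (wiki_id, maxjobs)

# 10,000 process spawns per pass, even if every single queue is empty:
commands = [run_jobs_command(w) for w in wikis]
```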
What do people think of having another source class for the job queue,
like we have for file storage, text storage, etc.?
The idea being that Wiki Farms would have the ability to implement a new
job queue source which instead derives jobs from a single shared
database table with the same structure as the normal job queue, but with
a farm-specific wiki id inside the table as well.
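A sketch of what that shared table might look like, using sqlite as a stand-in (the columns other than the farm-specific wiki_id loosely mirror MediaWiki's job table and are assumptions, not the actual schema):

```python
import sqlite3

# In-memory stand-in for the proposed farm-wide shared job queue table:
# the ordinary job table structure plus a wiki_id column.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE farm_job (
        job_id     INTEGER PRIMARY KEY,
        wiki_id    TEXT NOT NULL,   -- farm-specific wiki identifier (the new part)
        job_cmd    TEXT NOT NULL,   -- job type, e.g. 'refreshLinks'
        job_title  TEXT,
        job_params TEXT
    )
""")
db.execute("CREATE INDEX farm_job_wiki ON farm_job (wiki_id)")
db.executemany(
    "INSERT INTO farm_job (wiki_id, job_cmd, job_title) VALUES (?, ?, ?)",
    [("enwiki", "refreshLinks", "Main_Page"),
     ("enwiki", "refreshLinks", "Sandbox"),
     ("dewiki", "htmlCacheUpdate", "Hauptseite")],
)

# Only wikis that actually have pending jobs show up at all:
pending = [r[0] for r in db.execute(
    "SELECT DISTINCT wiki_id FROM farm_job ORDER BY wiki_id")]
```

The point of the extra column is exactly that last query: wikis with empty queues have no rows, so they cost nothing to skip.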
Using this method a Wiki Farm would be able to set up a cron job (or
perhaps a daemon, to be even more effective at dispatching the job queue
runs) which, instead of making 10,000 calls to runJobs outright, would
fetch a random job row from the shared job queue table, look at the wiki
id inside the row, and execute a runJobs (perhaps with a limit=1000)
script for that wiki to dispatch the queue and run some jobs for that
wiki. It would of course continue looking at random jobs from the shared
table and dispatching more runJobs executions, serving the role of
keeping the job queues running for all wikis on the farm, but without
making wasteful runJobs calls for a pile of wikis which have no jobs to
run.
> Any comments?
>
> --
> ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
>
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l