My longest bot job on enwiki lasts one week, far less than 12 weeks.
Processing a few thousand pages on small wikis takes only a few hours.
I don't know of any bot job that runs for longer than the time between
two dumps of the wiki in question.
The more time you put between two dumps, the more changes there are,
and the longer the bot jobs usually take. It also means that having a
dump, say, every week for small wikis does not add much for bot jobs:
if the job consists of fixing a single type of mistake, chances are
that during one week only tens of these mistakes would have been
introduced, and the bot job is likely to run very quickly.
For bot jobs, I really don't see any advantage in reducing the time
between dumps for small wikis. There is not a lot of activity, meaning
not a lot to do.
Other applications might require fresher dumps of small wikis, but I
don't know of any bot task needing a faster dump rate.
2008/10/12 Bence Damokos <bdamokos(a)gmail.com>:
On Sat, Oct 11, 2008 at 6:32 PM, Thomas Dalton
<thomas.dalton(a)gmail.com> wrote:
2008/10/11 Nicolas Dumazet <nicdumz(a)gmail.com>:
So this increases the frequency of dumps for
small wikis, great.
But this means that the time between two dumps of the big wikis is
_at_least_ the sum of the times needed to dump each one of the big
wikis... more than 10 or 12 weeks, not counting any failures? I don't
think that you really want to do this.
Exactly. The only way you can speed up the smaller dumps is to slow
down the bigger ones (or throw more money at the problem), and no-one
has given any reason why we should prioritise smaller dumps.
Processing a huge wiki takes longer for the bot owners, etc., so not
having a fresh dump as often would not be felt until the jobs running
on the previous one are complete.
On the other hand, jobs on smaller or medium wikis complete far
faster, so their bot owners would sit idle for much more time than a
bot owner working on a larger wiki.
Downloading the whole wiki article by article becomes too slow to be a
comfortable option after a certain size (it costs and wastes
bandwidth; and, more importantly, the valuable time of the editor
overseeing the bot, which downloads and analyses articles doing
nothing until it finds its target [as opposed to finding all the
targets fast from a dump, and then working on just them]).
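To illustrate the dump-scanning approach described above, here is a
minimal sketch in Python. It is a hypothetical example: the sample XML
only mimics the general <page>/<revision>/<text> layout of a MediaWiki
dump (real dumps are far larger, compressed, and namespaced), and the
"teh" typo is an invented target. The point is that the bot streams
through a local file to collect only the pages that need fixing,
instead of fetching every article over the network.

```python
import io
import xml.etree.ElementTree as ET

# Toy stand-in for a MediaWiki XML dump (assumption: real dumps add
# namespaces and much more metadata, omitted here for brevity).
SAMPLE_DUMP = """<mediawiki>
  <page><title>Alpha</title><revision><text>teh cat</text></revision></page>
  <page><title>Beta</title><revision><text>the dog</text></revision></page>
  <page><title>Gamma</title><revision><text>teh bird</text></revision></page>
</mediawiki>"""

def find_targets(dump_stream, mistake):
    """Stream through a dump, collecting titles of pages that contain
    the target mistake. iterparse keeps memory flat on huge files."""
    targets = []
    for _, elem in ET.iterparse(dump_stream, events=("end",)):
        if elem.tag == "page":
            title = elem.findtext("title")
            text = elem.findtext("revision/text") or ""
            if mistake in text:
                targets.append(title)
            elem.clear()  # discard processed page to free memory
    return targets

targets = find_targets(io.StringIO(SAMPLE_DUMP), "teh")
print(targets)  # → ['Alpha', 'Gamma']; the bot then edits only these
```

Once the target list exists, the bot only touches those pages live, so
the editor's waiting time and the bandwidth cost scale with the number
of actual mistakes, not with the size of the wiki.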
Bence Damokos
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l