On Fri, Sep 2, 2011 at 5:47 PM, Daniel Friesen
<lists(a)nadir-seen-fire.com> wrote:
On 11-09-02 05:20 PM, Asher Feldman wrote:
> When using for analysis, will we wish the new columns had partial indexes
> (first 6 characters?)
Bug 2939 is one relevant bug to this, it could
probably use an index.
[1]
https://bugzilla.wikimedia.org/show_bug.cgi?id=2939
I generally suspect that a standard index is going to be a waste for
the most urgent uses of this. It will rarely be interesting to search
for common hashes between articles. The far more common case will be
to search for duplicate hashes within the history of a single article.
My understanding is that having a normal index on a table the size of
our revision table will be far too expensive for db writes.
This is the first I've heard of partial indexes (/me researches) I
don't know if a partial index is going to be cheap enough that we can
use it, and useful enough that we'd want to. Would this be a faster
query in a world with a partial index on the first six characters?
SELECT rev_id FROM revision WHERE rev_page=12345 AND
rev_sha1='4cdbd80be15fcfff139fb8a95f2ca359520939ee'
...or would we have to run a query like this to get the benefit of the index?
SELECT rev_id FROM revision WHERE rev_page=12345 AND rev_sha1 like '4cdbd8%'
...and would either of these queries be considered too expensive to
run without a partial index? How about with a partial index?
Rob