@Lucas: cidr-trie was exactly what I wanted; thank you!
@Strainu: In our use case (fawiki), the number of distinct IPs that
edit on any given day is not large (usually a few hundred), so the
memory footprint of a CIDR trie is minimal.
The code at [1] has been updated to include caching in the way I
wanted.
Thanks again,
Huji
[1]
https://github.com/PersianWikipedia/fawikibot/blob/master/HujiBot/findproxy…
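
For readers who have not used a trie for this, here is a minimal sketch of the idea of caching per-range results and finding the most specific cached range for an IP. It is not the code at [1] and it does not use the cidr-trie API; the class name `CidrCache` and its `add`/`lookup` methods are made up for illustration, and only the standard `ipaddress` module is assumed.

```python
import ipaddress


class CidrCache:
    """Toy binary trie keyed on the bits of a network prefix (hypothetical helper)."""

    def __init__(self):
        # One root per IP version so 32-bit and 128-bit prefixes never mix.
        self._roots = {4: {}, 6: {}}

    def add(self, cidr, value):
        """Store `value` for every address inside `cidr`."""
        net = ipaddress.ip_network(cidr, strict=False)
        node = self._roots[net.version]
        bits = int(net.network_address)
        width = net.max_prefixlen
        # Walk (and create) one node per prefix bit, most significant first.
        for i in range(net.prefixlen):
            bit = (bits >> (width - 1 - i)) & 1
            node = node.setdefault(bit, {})
        node["value"] = value

    def lookup(self, ip):
        """Return the value of the longest cached prefix covering `ip`, or None."""
        addr = ipaddress.ip_address(ip)
        node = self._roots[addr.version]
        bits = int(addr)
        width = addr.max_prefixlen
        best = node.get("value")
        for i in range(width):
            bit = (bits >> (width - 1 - i)) & 1
            node = node.get(bit)
            if node is None:
                break
            if "value" in node:
                best = node["value"]  # a more specific range wins
        return best


cache = CidrCache()
cache.add("192.0.2.0/24", "range")
cache.add("192.0.2.128/25", "narrow")
print(cache.lookup("192.0.2.200"))   # covered by both ranges; the /25 wins
print(cache.lookup("198.51.100.1"))  # not cached yet
```

With only a few hundred distinct IPs per day, a structure like this stays tiny; a dedicated library is probably the better choice for anything much larger.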
-----------
I'm very curious whether you can run at Wikipedia scale with such a trie
in memory on a normal computer (e.g. with only tens of GiB of memory).
Please let us know if you actually get this into production (or just
submit the script for inclusion in the framework; it sounds really useful).
Strainu
On Friday, 12 July 2019, Lucas Werkmeister <mail at lucaswerkmeister.de>
wrote:
> You probably want to use a trie <https://en.wikipedia.org/wiki/Trie> for
> this – I found several available Python implementations, but I don’t know
> what their advantages or disadvantages are, so I’ll just list them in
> alphabetical order:
>
> - cidr-trie <https://github.com/Figglewatts/cidr-trie>
> - py-radix <https://github.com/mjschultz/py-radix>
> - pysubnettree <https://github.com/zeek/pysubnettree>
> - pytricia <https://github.com/jsommers/pytricia>
>
> Cheers,
> Lucas
> On 12.07.19 04:43, Huji Lee wrote:
>