[Licom-l] SecurePoll Tallying Post-Mortem

Sun May 17 22:54:52 UTC 2009

Tim & Brion,

The Licensing Update Vote is over and tallied.  However, I want to
note for the record that in order to do the tallying we had to abandon
MediaWiki and reimplement the decryption routine as a Python script.
(This is also why the tally procedure, which I expected to take 15
minutes, ended up being delayed by more than a day.)

The core problem is that the SecurePoll implementation does not scale
well.  It worked fine in testing with 15 votes, and massively failed
with 17000.

The vote "dump" with 17000 votes was 30 MB (or ~1.8 kB per vote).
Given that the actual content of each vote was basically an integer,
this is a massive size increase associated with the encryption.  This
led to problems both downloading the dump from SPI and uploading it
to the wiki being used to decrypt it.  For downloading, browsers saw
repeated dropped connections.  I assume this is because asking PHP to
generate and send a 30 MB file was timing out the process.  I
eventually managed to get the data with wget on a fast university
connection, but I had almost given up before finally getting that to
work.  Uploading had the same issues, and in that case I did give up.
I copied the file to the server by hand and patched MediaWiki to look
for it as a local file rather than using the upload file dialog.
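
As a quick check of the per-vote figure above, the arithmetic can be
sketched as follows.  This assumes "30 MB" means 30 * 10^6 bytes; the
function name is only illustrative.

```python
def bytes_per_vote(dump_bytes, n_votes):
    """Average size of one encrypted vote in the dump, in bytes."""
    return dump_bytes / n_votes


# Assumption: 30 MB is taken as 30 * 10**6 bytes.
per_vote = bytes_per_vote(30 * 10**6, 17000)
print("%.2f kB per vote" % (per_vote / 1000.0))
```

The result is a little under 1.8 kB per vote, in line with the figure
quoted above.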

The next major issue was the discovery that gpg is far too slow to be
used this way.  The current tally process requires one call to gpg per
vote.  I timed the gpg decryption routine at 0.144 s per call, which
translates to about 40 minutes to decrypt 17000 votes.  (This is
clearly a function of key complexity: a similar test with a shorter
key gave 0.035 s per call, though 10 minutes for 17000 calls would
still be a problem.)
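
The estimates above are just the per-call time multiplied by the
number of votes.  A minimal sketch of that calculation, using the two
timings quoted in this message (the function name is illustrative):

```python
def total_minutes(seconds_per_call, n_calls):
    """Wall-clock minutes for n_calls sequential gpg invocations."""
    return seconds_per_call * n_calls / 60.0


full_key = total_minutes(0.144, 17000)   # timing measured with the real key
short_key = total_minutes(0.035, 17000)  # timing measured with a shorter key
print("full key:  %.1f min" % full_key)
print("short key: %.1f min" % short_key)
```

Because the calls run one after another, the total grows linearly
with the number of votes, so any fixed per-call overhead is paid
17000 times.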

Once I realized how slow gpg was, I also realized that the web-based
interface to tally votes was never going to work.  It was at this
point that MediaWiki was abandoned and the decryption routine was
rewritten as a script in 20 or 30 lines of Python.  Running a custom
script was good enough to tally the vote this time, but it clearly
isn't a solution for the generic case.
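
For readers who want a picture of what such a script might look like,
here is a rough sketch.  It is not the script that was actually used.
It assumes the dump contains ASCII-armored PGP messages, one per vote,
that the private key is already in the local gpg keyring, and that
each decrypted vote is a short text value.  The names split_votes,
gpg_decrypt and tally are hypothetical.

```python
import re
import subprocess
from collections import Counter

# Assumption: each vote in the dump is one ASCII-armored PGP message.
ARMORED = re.compile(
    r"-----BEGIN PGP MESSAGE-----.*?-----END PGP MESSAGE-----",
    re.DOTALL,
)


def split_votes(dump_text):
    """Return the armored PGP messages found in the dump text."""
    return ARMORED.findall(dump_text)


def gpg_decrypt(armored):
    """Decrypt one message by piping it through the gpg command line."""
    result = subprocess.run(
        ["gpg", "--batch", "--quiet", "--decrypt"],
        input=armored.encode("ascii"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8").strip()


def tally(dump_text, decrypt=gpg_decrypt):
    """Count decrypted vote values; still one decrypt call per vote."""
    return Counter(decrypt(vote) for vote in split_votes(dump_text))
```

Note that this still makes one gpg call per vote, so it does not
remove the per-call cost described above.  It only avoids the web
request timeouts by running from the command line.  The decrypt
argument is there so the counting logic can be exercised without gpg.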

These failures to scale are basically a problem with how SecurePoll
was designed.  The interface for running and managing the poll seemed
to work pretty well, but the backend for actually determining the
result does not function at the scale that WMF needs.

I plan to eventually file a number of bugs / feature requests
associated with our SecurePoll experience, but I wanted to go ahead
and let people know now that the tallying functions built into
SecurePoll aren't effective at large scale.

-Robert Rohde