Hi everyone,
I have a list of about 1.8 million images which I have to download from commons.wikimedia.org. Is there any simple way to do this which doesn't involve an individual HTTP hit for each image?
Many thanks in advance.
Mihai
TL;DR: A few ideas follow on how we could possibly help legit editors
contribute from behind Tor proxies. I am just conversant enough with
the security problems to make unworkable suggestions ;-), so please
correct me, critique & suggest solutions, and perhaps volunteer to help.
The current situation:
https://en.wikipedia.org/wiki/Wikipedia:Advice_to_users_using_Tor_to_bypass…
We generally don't let anyone edit or upload from behind Tor; the
TorBlock extension stops them. One exception: a person can create an
account, accumulate lots of good edits, and then ask for an IP block
exemption, and then use that account to edit from behind Tor. This is
unappealing because then there's still a bunch of in-the-clear editing
that has to happen first, and because then site functionaries know that
the account is going to be making controversial edits (and could
possibly connect it to IPs in the future, right?). And right now
there's no way to truly *anonymously* contribute from behind Tor
proxies; you have to log in. However, since JavaScript delivery is hard
for Tor users, I'm not sure how much editing from Tor -- vandalism or
legit -- is actually happening. (I hope for analytics on this and thus
added it to https://www.mediawiki.org/wiki/Analytics/Dreams .) We know
at least that there are legitimate editors who would prefer to use Tor
and can't.
People have been talking about how to improve the situation for some
time -- see http://cryptome.info/wiki-no-tor.htm and
https://lists.torproject.org/pipermail/tor-dev/2012-October/004116.html
. It'd be nice if it could actually move forward.
I've floated this problem past Tor and privacy people, and here are a
few ideas:
1) Just use the existing mechanisms more leniently. Encourage the
communities (Wikimedia & Tor) to use
https://en.wikipedia.org/wiki/Wikipedia:Request_an_account (to get an
account from behind Tor) and to let more people get IP block exemptions
even before they've made any edits (< 30 people have gotten exemptions
on en.wp in 2012). Add encouraging "get an exempt account" language to
the "you're blocked because you're using Tor" messaging. If there's
then an uptick in vandalism from Tor, they can just tighten up again.
2) Encourage people with closed proxies to revitalize
https://en.wikipedia.org/wiki/Wikipedia:WOCP . Problem: using closed
proxies is okay for people with some threat models but not others.
3) Look at Nymble - http://freehaven.net/anonbib/#oakland11-formalizing
and http://cgi.soic.indiana.edu/~kapadia/nymble/overview.php . It would
allow Wikimedia to distance itself from knowing people's identities, but
still allow admins to revoke permissions if people acted up. The user
shows a real identity, gets a token, and exchanges that token over Tor
for an account. If the user abuses the site, Wikimedia site admins can
blacklist the user without ever being able to learn who they were or
what other edits they did. More: https://cs.uwaterloo.ca/~iang/ Ian
Goldberg's, Nick Hopper's, and Apu Kapadia's groups are all working on
Nymble or its derivatives. It's not ready for production yet, I bet,
but if someone wanted a Big Project....
3a) A token authorization system (perhaps a MediaWiki extension) where
the server blindly signs a token, and then the user can use that token
to bypass the Tor blocks. (Tyler mentioned he saw this somewhere in a
Bugzilla suggestion; I haven't found it.)
4) Allow more users the IP block exemption, possibly even automatically
after a certain number of unreverted edits, but with some kind of
FlaggedRevs integration; Tor users can edit but their changes have to be
reviewed before going live. We could combine this with (3); Nymble
administrators or token-issuers could pledge to review edits coming from
Tor. But that latter idea sounds like a lot of social infrastructure to
set up and maintain.
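Idea 3a could be built on a blind signature. As a minimal sketch only, here is textbook RSA blinding in Python with a deliberately tiny toy key. The function names are hypothetical; this is not how TorBlock or any existing MediaWiki extension works, and a real deployment would need a vetted cryptographic library, proper padding, and full-size keys.

```python
import hashlib
import math
import secrets
from typing import Tuple

# Toy RSA key from two Mersenne primes (2**61 - 1 and 2**89 - 1).
# Far too small to be secure; for illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent; needs Python 3.8+


def token_digest(token: bytes) -> int:
    """Map a token to an integer below N (simplified full-domain hash)."""
    return int.from_bytes(hashlib.sha256(token).digest(), "big") % N


def blind(token: bytes) -> Tuple[int, int]:
    """User side: hide the token digest behind a random factor r."""
    m = token_digest(token)
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            break
    return (m * pow(r, E, N)) % N, r


def sign_blinded(blinded: int) -> int:
    """Server side: sign after checking the requester's real identity.
    The server only ever sees the blinded value, never the token."""
    return pow(blinded, D, N)


def unblind(blind_sig: int, r: int) -> int:
    """User side: strip r, leaving an ordinary RSA signature on the digest."""
    return (blind_sig * pow(r, -1, N)) % N


def verify(token: bytes, signature: int) -> bool:
    """Server side, later, over Tor: check the token is validly signed."""
    return pow(signature, E, N) == token_digest(token)


if __name__ == "__main__":
    token = secrets.token_bytes(32)
    blinded, r = blind(token)
    signature = unblind(sign_blinded(blinded), r)
    print(verify(token, signature))
```

Because the random factor r is removed only on the user's side, the signature presented over Tor cannot be matched to the signing request the server saw. A real system would also have to record spent tokens so each can be redeemed only once; the sketch leaves that out.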
Thoughts? Are any of you interested in working on this problem? #tor on
the OFTC IRC server is full of people who'd be interested in talking
about this.
--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation
I've been accepted to Hacker School <https://www.hackerschool.com>, a
writers' retreat for programmers in New York City. I will therefore be
taking an unpaid personal leave of absence from the Wikimedia Foundation
via our sabbatical program. My last workday before my leave will be
Friday, September 27. I plan to be on leave all of October, November,
and December, returning to WMF in January.
During my absence, Quim Gil will be the temporary head of the
Engineering Community Team. Thank you, Quim! I'll spend much of
September turning over responsibilities to him. Over the next month I'll
be saying no to a lot of requests so I can ensure I take care of all my
commitments by September 27th, when I'll be turning off my wikimedia.org
email.
If there's anything else I can do to minimize inconvenience, please let
me know. And -- I have to say this -- oh my gosh I'm so excited to be
going to Hacker School in just a month! Going from "advanced beginner"
to confident programmer! Learning face-to-face with other coders, 30-45%
of them women, all teaching each other! Thank you, WMF, for the
sabbatical program, and thanks to my team for supporting me on this. I
couldn't do this without you.
--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation
Hello and here's your latest edition of the deployment highlights email!
Full schedule here:
https://wikitech.wikimedia.org/wiki/Deployments#Week_of_September_30th
== Monday ==
* VisualEditor will be enabled by default for logged-in users on the
following wikis:
Bulgarian (bg), Catalan (ca), Cebuano (ceb), Czech (cs), Danish (da),
Estonian (et), Basque (eu), Finnish (fi), Galician (gl), Croatian
(hr), Hungarian (hu), Indonesian (id), Latvian (lv), Malay (ms),
Norwegian - Nynorsk (nn), Norwegian - Bokmål (no), Simple English
(simple), Slovak (sk), Slovenian (sl), Turkish (tr), Ukrainian (uk),
Volapük (vo), Waray-Waray (war), Modern Greek (el), Neapolitan (nap),
Venetian (vec), Sicilian (scn)
See: https://www.mediawiki.org/wiki/VisualEditor#Timeline
* MediaWiki 1.22wmf19 will be deployed to all non-Wikipedia wikis (e.g.
Commons, Wiktionary, Wikisource, etc.)
== Wednesday ==
* MobileFrontend: Of note, the Mobile Team will be transitioning to
simply 'riding the MW Core train' soon. This means there will not be a
separate Mobile Team deploy window each week; instead, all of their
changes will ride along with the MediaWiki 1.22wmfXX version update.
== Thursday ==
* MediaWiki 1.22wmf19 to group2 (all Wikipedias)
* MediaWiki 1.22wmf20 to group0
(test/test2/testwikidata/loginwiki/mediawiki/)
* Growth Team deploy window consisting of:
https://etherpad.wikimedia.org/p/Growth_Deployment_2013-10-03
Let me know if you have any questions,
Greg
--
| Greg Grossmeier GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg A18D 1138 8E47 FAC8 1C7D |
Hello everyone!
Google Summer of Code 2013 came to an official end today with final reviews
and code submissions.
I wrote a blog post summarizing the summer internship, the deliverables,
and the thank-yous:
http://moriel.smarterthanthat.com/tips/google-summer-of-code-2013-summary/
I'd like to thank my great mentors, Amir Aharoni and Inez Korczyński, and
the whole VisualEditor team!
Thank you all who were involved, and everyone who kept me company in
late-late-late-night documentation-hunting. This summer has been a blast!
Moriel
(aka mooeypoo)
--
No trees were harmed in the creation of this post.
But billions of electrons, photons, and electromagnetic waves were terribly
inconvenienced during its transmission!
My project breaks down into an open-ended number of small groups (mostly
of a single individual) of protected *wiki pages*, probably spread across
multiple servers, whose presentation may be associated with various
thematic gateways.
I would appreciate the community's advice on:
- any similar deployments I could study;
- the most appropriate choice among MySQL, Postgres, and SQLite, and
whether NoSQL ports have been investigated;
- the most appropriate extension for author-level page protection for
individuals or small groups (no fancy cross-protection scheme);
- the best way to associate the page skin, logo, and footer links with
the access gateway.
Deep thanks
jfc
Sumana Harihareswara recently provided me assistance with some code I wanted to write and invited me to join the wikitech mailing list, and she also suggested I share my response to her with the rest of the list subscribers, which I have reproduced (with some mild alterations) below:
Thanks for the mailing list link, Sumana, just submitted a subscription request.
As for those links you provided, I have bookmarked them and will be
studying them extensively; your assistance is greatly appreciated :)
As for my wiki (https://mediawikitesters.orain.org/wiki/Main_Page), I like to test gadgets, user scripts, and extensions,
and the wiki is basically a place for me, and anyone else interested, to
display useful code, test it in a live environment, and hopefully swap
notes with each other on how to improve it.
The wiki is still rather new, but anyone interested is welcome.
As for the gadgets section of my wiki, I find gadgets incredibly useful,
and I am trying to compile as many as I can that anyone running
MediaWiki can use and that I find universally useful and adaptable to
any wiki. I prefer code that works out of the box and is as compatible
as possible, especially with the more recent versions of MediaWiki, and
my goal is to make it available to anyone for any wiki.
Most are from WMF projects or have been adapted from them to be useful on any wiki.
So in the interest of keeping our branches from expanding forever, I'm
thinking we should stop creating new branches for each deploy cycle.
Instead, I'm thinking we should keep three wmf branches. Let's call them
wmf-foo, wmf-bar and wmf-baz for purposes of this e-mail; we can
bikeshed later. We'd basically have the two active branches we have now,
plus the previous branch we deployed.
When we start a new cycle, the "old" branch becomes the new branch,
merging everything from master like we do when making a new branch. In
practice this would map out to the following:
wmf-foo -> 1.22wmf19
wmf-bar -> 1.22wmf20
wmf-baz -> 1.22wmf21
wmf-foo -> 1.22wmf22
wmf-bar -> 1.22wmf23
And so on and so forth. When creating the new branch we can tag the old
one in the same wmf/1.22wmf29 format so it's there for posterity. We
could delete all the old branches and turn them into tags.
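The rotation above is plain modular arithmetic. A minimal Python sketch, assuming the placeholder names wmf-foo/wmf-bar/wmf-baz from this e-mail and a cycle that starts at 1.22wmf19 as in the mapping; the helpers `branch_for` and `start_cycle` are hypothetical and not part of any existing deployment tooling:

```python
from typing import Optional, Tuple

# Placeholder names from the proposal; real names are still to be bikeshedded.
ROTATING_BRANCHES = ["wmf-foo", "wmf-bar", "wmf-baz"]
FIRST_VERSION = 19  # 1.22wmf19 is the first version carried by wmf-foo


def branch_for(version: int) -> str:
    """Return the long-lived branch that carries 1.22wmf<version>."""
    if version < FIRST_VERSION:
        raise ValueError("version predates the rotation")
    offset = version - FIRST_VERSION
    return ROTATING_BRANCHES[offset % len(ROTATING_BRANCHES)]


def start_cycle(version: int) -> Tuple[str, Optional[str]]:
    """Return (branch to reset from master, tag for the version it held).

    The tag is None while the rotation is still filling its three slots.
    """
    branch = branch_for(version)
    previous = version - len(ROTATING_BRANCHES)
    tag = f"wmf/1.22wmf{previous}" if previous >= FIRST_VERSION else None
    return branch, tag


if __name__ == "__main__":
    for v in range(19, 24):
        print(v, start_cycle(v))
```

For example, `start_cycle(22)` gives `("wmf-foo", "wmf/1.22wmf19")`: tag the 1.22wmf19 code wmf-foo was carrying, then reset wmf-foo from master for 1.22wmf22.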
On the deployment side this is nice because we can just consistently
have 3 branches live on the cluster rather than lingering old ones.
Switching would be a matter of checking out and scapping rather than
cloning and scapping.
Too confusing? Alternative ideas? I'm open to most anything that will
stop us making new branches all the time :)
-Chad