Hello again!
Ok, we're actually going to do this this time. As far as we know, people who need
access to private webrequest data have migrated their stuff over to stat1002.eqiad.wmnet.
The private webrequest data that currently exists on stat1 will soon be deleted.
Soon is August 7th. That's in 1 week. We announced this back in May, so there
should have been plenty of notice. If you are still using the webrequest logs in
/a/squid/archive on stat1, find me on IRC (ottomata) or email me and we can work together
to make sure you can continue to do your work on stat1002.
On Wednesday August 7th, we will be removing private webrequest logs from stat1.
Thanks all!
-Andrew Otto
On May 20, 2013, at 2:13 PM, Andrew Otto <otto(a)wikimedia.org> wrote:
>
"Before that happens, you should make sure that any personal stuff on stat1 that you
need for number crunching is copied over to stat1002."
From your note it looks like this is only related to webrequest data. Is that correct?
Yup! That is correct. stat1002 will be primarily used as a sensitive private data host.
Only those users that have personal unpuppetized code and cronjobs that use this data
need to worry about moving them from stat1 to stat1002.
What are the criteria for deciding who has access
to stat1002? I see that contractors like Aaron Halfaker or Jonathan Morgan currently
don't have access to it.
The criteria will be the same as before: RT request + manager approval. However, the
request should only be made if the user actually needs access to the webrequest logs to do
analysis. For example, if the main reason someone already has access to stat1 is so that
they can access the research slave databases, then they won't need access to
stat1002.
Can you give us more information on the long-term
plans/scope of stat1 vs stat1002 (and update
https://office.wikimedia.org/wiki/Data_access
as needed)?
I've added a small bit about stat1002 on that page.
I don't know much about a long-term plan for stat1. It is hosted at the Tampa
datacenter, and in the long term (yearish?) all the machines there will have to be
decommissioned or relocated elsewhere. When it finally does move, it will most likely no
longer have a public IP. stat1 is intended to be used as a workspace for analysts to do
their thing on non-private data.
-Ao