Hi,
just a quick heads up that due to database issues, geowiki currently
cannot update daily with new data.
So pages with daily active editor counts like
http://gp.wmflabs.org/graphs/active_editors_total
http://gp.wmflabs.org/graphs/enwiki_editor_counts
http://gp.wmflabs.org/graphs/frwiki_editor_counts
http://gp.wmflabs.org/graphs/eowiki_editor_counts
[...]
and the private per country breakdowns at
https://stats.wikimedia.org/geowiki-private/
will not see updates until the issue is resolved.
Older data is not affected by the issue. So data up to May 1st is good
to use (with the usual geowiki caveats).
Best regards,
Christian
P.S.: The root issue is not severe, and I guess it can be fixed in the
next couple of days.
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasze 65a Email: christian(a)quelltextlich.at
4040 Linz, Austria Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
Hi,
people from gerrit's “Analytics” group [1] currently hold
* Push (including Force Push)
* Push Merge Commit
* Forge Author Identity
* Forge Committer Identity
permissions on “analytics/*” projects in gerrit. But those permissions
have been getting in the way, one way or another.
Do we need those permissions for our repos?
If no one objects, I'll start removing them on 2014-04-28.
Best regards,
Christian
[1] https://gerrit.wikimedia.org/r/#/admin/groups/uuid-d34747bee94be39cff54b5fd…
---------------------------------------------------------------
Hi,
TL;DR: When consuming EventLogging data, only rely on the 'log'
database available from m2 replicas, like analytics-store.eqiad.wmnet.
Other representations might not get updated, might not get fix-ups or
may (on purpose) give you unvalidated data.
----------------------------------
Due to the versatile design of EventLogging, its data exists (or existed)
in many different representations, which left me confused about the
data quality expectations. I also could not find them publicly
documented. After talking about different aspects with a few people, I
want to put my current understanding up for public discussion.
Please let me know (either in private or on list), if something looks
wrong or does not match your use of EventLogging data.
* MySQL / MariaDB database on m2
This database is the best place to consume EventLogging data from.
Available as 'log' database on m2 replicas, such as
analytics-store.eqiad.wmnet.
Only validated events enter the database.
In case of bugs, this database is the only place that gets fixes like
cleanup of historic data, or live fixes.
* 'all-events' JSON log files [1]
Use this data source only to debug issues around ingestion into the m2
database.
Entries are JSON objects.
Only validated events get written.
In case of bugs, historic data does not get fixed.
* Raw client and server side log files [2]
Use this data source only to debug issues around ingestion into the m2
database.
Entries are parameters to the event.gif's request. They are not
decoded at all.
In case of bugs, historic data does not get fixed. Hot-fixes need not
reach those files either.
* Kafka:
EventLogging data is no longer fed into Kafka since 2014-06-12 [3].
The EventLogging data in Kafka had no users.
Turning it on again is tracked in bug 66528 [4].
* MongoDB:
EventLogging data is no longer fed into MongoDB since 2014-02-13 [5].
The EventLogging data in MongoDB did not appear to get used.
I am not aware of plans to revive feeding the data into MongoDB.
* ZMQ:
ZMQ is available from vanadium.
In case of bugs, historic data cannot get fixed :-)
Data coming from the forwarders (ports 8421, 8422) is not validated
and need not see hot-fixes.
Data coming from the processors (ports 8521, 8522) and the multiplexer
(port 8600) is validated.
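
The ZMQ port notes above can be captured in a small lookup. This is only a
sketch: the port numbers and roles come from the list above, while the
names ZMQ_STREAMS and is_validated are made up for illustration and are
not part of EventLogging.

```python
# Port numbers and roles are taken from the list above. The names
# ZMQ_STREAMS and is_validated are hypothetical, for illustration only.
ZMQ_STREAMS = {
    8421: ("forwarder", False),    # raw, not validated
    8422: ("forwarder", False),    # raw, not validated
    8521: ("processor", True),     # validated
    8522: ("processor", True),     # validated
    8600: ("multiplexer", True),   # validated
}


def is_validated(port):
    """Return True if the stream on `port` carries validated events."""
    try:
        _role, validated = ZMQ_STREAMS[port]
    except KeyError:
        raise ValueError("unknown EventLogging ZMQ port: %d" % port)
    return validated
```

A consumer could call is_validated(port) before subscribing and refuse to
read from the forwarder ports unless it is debugging ingestion.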
Have fun,
Christian
[1] Available as
stats1002:/a/eventlogging/archive/all-events.log-$DATE.gz
stats1003:/srv/eventlogging/archive/all-events.log-$DATE.gz
vanadium:/var/log/eventlogging/...
[2] Available as
stats1002:/a/eventlogging/archive/client-side-events.log-$DATE.gz
stats1002:/a/eventlogging/archive/server-side-events.log-$DATE.gz
stats1003:/srv/eventlogging/archive/client-side-events.log-$DATE.gz
stats1003:/srv/eventlogging/archive/server-side-events.log-$DATE.gz
vanadium:/var/log/eventlogging/...
[3] https://git.wikimedia.org/commitdiff/operations%2Fpuppet.git/f85b1dbcd61bbb…
[4] https://bugzilla.wikimedia.org/show_bug.cgi?id=66528
[5] https://git.wikimedia.org/commitdiff/operations%2Fpuppet.git/05b4027973c59b…
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3 Email: christian(a)quelltextlich.at
4293 Gutau, Austria Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
Hi,
just a quick heads up that the replication lag on
analytics-store.eqiad.wmnet (aka “The one machine to rule them all”)
has risen to >12 hours for s1 replicas. Other replicas are fine.
So on analytics-store.eqiad.wmnet:
* enwiki is affected.
* log (EventLogging) is affected.
Other databases (like dewiki, eswiki, ...) on
analytics-store.eqiad.wmnet are /not/ affected.
For queries that only rely on enwiki or log, you can use
s1-analytics-slave.eqiad.wmnet
as a drop-in replacement. enwiki and log are not lagging there.
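
For scripts that pick their database host programmatically, the
workaround above might be sketched as follows. The host names and the
affected databases come from this message; the function pick_host and the
constant names are hypothetical.

```python
# Host names and affected databases are taken from the message above.
# pick_host and the constant names are hypothetical, for illustration only.
LAGGING_ON_ANALYTICS_STORE = {"enwiki", "log"}
DEFAULT_HOST = "analytics-store.eqiad.wmnet"
S1_FALLBACK_HOST = "s1-analytics-slave.eqiad.wmnet"


def pick_host(databases):
    """Pick a host for a query that touches the given databases.

    Only queries that rely exclusively on the lagging databases can use
    the fallback; anything else stays on the default host.
    """
    needed = set(databases)
    if needed and needed <= LAGGING_ON_ANALYTICS_STORE:
        return S1_FALLBACK_HOST
    return DEFAULT_HOST
```

Note that a query joining enwiki with an unaffected database (say dewiki)
still goes to analytics-store.eqiad.wmnet here and would see the lagged
enwiki data.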
I filed RT ticket 8032:
https://rt.wikimedia.org/Ticket/Display.html?id=8032
Best regards,
Christian
---------------------------------------------------------------
Hi all,
Growth team needs some data removed from both the raw logs and analytics
slaves. Sean said he can help with the EventLogging db maintenance, but is
unfamiliar with the logs on Vanadium.
This is to purge data from a recent set of experiments that involved
setting a token for anonymous editors. Now that we've got our results and
aggregated any non-private data we need in the future, we can safely remove
any data stored in the associated schemas. This doesn't need to be
selective based on schema ids or dates, we can probably just wholesale
remove the associated schemas listed at
https://meta.wikimedia.org/wiki/Research:Asking_anonymous_editors_to_regist…
Sean suggested Christian or Nuria might be best equipped to help here. If
Aaron and I provide a list of the schemas, is this possible? Ideally, we'd
like to delete these by 8/04, so apologies in advance for such a tight
turnaround time.
--
Steven Walling,
Product Manager
https://wikimediafoundation.org/
The Wikimedia Research Hackathon on August 6 and 7 takes place parallel to
the general Wikimania Hackathon in London.
Wikimania Hackathon information is available at
https://wikimania2014.wikimedia.org/wiki/Hackathon
Research Hackathon information is available at
https://meta.wikimedia.org/wiki/Research:Labs2/Hackathons/August_6-7th,_2014
From the Research Hackathon info page: this "is an opportunity for anyone
interested in research on wikis, Wikipedia, and other open collaborations
to meet, share ideas, and work together. It's being organized by
researchers in academia and the Wikimedia Foundation, but we want anyone
interested in research to participate. Whether or not you consider yourself
a researcher, or would ever want to be one, come with questions, answers,
data, code, crazy ideas... or just your insatiable curiosity."
Local participation will occur at Wikimania London and in Philadelphia, PA,
US. Remote participation is possible and will include researchers and
community members globally.
Please see the Research Hackathon information page for scheduling and
sign-up details.
Further questions may be directed to Aaron Halfaker (ahalfaker(a)wikimedia.org)
or Leila Zia (leila(a)wikimedia.org).*
Pine
*A $1 fine will be imposed by Oliver Keyes on anyone who misspells Leila's
name or misdirects emails to the WMF Executive Director.
As part of the global Labs2 hackathon coinciding with Wikimania
(https://meta.wikimedia.org/wiki/Research:Labs2/Hackathons/August_6-7th,_201…),
Dan Andreescu and I are hosting a local hackathon in Philadelphia,
Pennsylvania.
Where: Impact Hub Philly, 1227 N. 4th Street
When: August 7th, 2014, 10am to 8pm
More info: http://tinyurl.com/philly-wiki-research
The Wiki Research Hackathon is an opportunity for anyone interested in
research on Wikipedia, wikis, and other open collaborations to meet,
share ideas, and work together. Whether or not you consider yourself a
researcher, come with questions, answers, data, code, crazy ideas... or
just your insatiable curiosity.
People will be working on attributing credit to authors of wiki content,
screening medicine articles for quality, analyzing first edits for male
and female newcomers, and more. Join them or work on your own idea!
We're meeting at the Impact Hub in Philly and connecting online with
other groups around the world.
On August 6th, join us online, meet other researchers, and get set up to
access any data that you might need (optional).
Matt Flaschen
Hi,
it seems Wikimetrics is currently suffering load problems and may
appear down.
The corresponding bug is 68743:
https://bugzilla.wikimedia.org/show_bug.cgi?id=68743
Best regards,
Christian
---------------------------------------------------------------
Hi!
So I've been rooting around in ServerSideAccountCreation and I've noticed
some inconsistencies in the data. The final two clauses in the WHERE in the
following query should be mutually exclusive (registered on Android app,
and registered not on mobile), but the number returned is nonzero.
SELECT count(*)
FROM ServerSideAccountCreation_5487345
WHERE timestamp >= 20140722000000
AND timestamp <= 20140723000000
AND userAgent LIKE 'WikipediaApp%'
AND event_displayMobile = 0;
I'm sure you guys get data inconsistencies like this all the time, but I
thought I should at least report it so you're aware.
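
To make the check easy to replay, here is a minimal sketch against an
in-memory SQLite database. The table and column names follow the query
above; the rows and user agent strings are synthetic and only there for
illustration, and the timestamps are compared as strings because the
sketch stores them as text (an assumption, not a statement about the real
schema).

```python
import sqlite3

# Synthetic rows for illustration only; this is not real EventLogging data.
# Table and column names follow the query above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ServerSideAccountCreation_5487345 ("
    "timestamp TEXT, userAgent TEXT, event_displayMobile INTEGER)"
)
conn.executemany(
    "INSERT INTO ServerSideAccountCreation_5487345 VALUES (?, ?, ?)",
    [
        ("20140722010000", "WikipediaApp/2.0 (Android)", 1),  # consistent
        ("20140722020000", "WikipediaApp/2.0 (Android)", 0),  # contradictory
        ("20140722030000", "Mozilla/5.0", 0),                 # consistent
    ],
)

# Count rows that claim both "registered via the app" and "not on mobile".
contradictory = conn.execute(
    "SELECT count(*) FROM ServerSideAccountCreation_5487345 "
    "WHERE timestamp >= '20140722000000' "
    "AND timestamp <= '20140723000000' "
    "AND userAgent LIKE 'WikipediaApp%' "
    "AND event_displayMobile = 0"
).fetchone()[0]

print(contradictory)
```

If the two conditions really were mutually exclusive, the count would be
zero; any nonzero value flags rows worth inspecting.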
Thanks,
Dan
--
Dan Garry
Associate Product Manager for Platform and Mobile Apps
Wikimedia Foundation
Hi,
the dev team has committed to the following user stories for the sprint
starting today, ending August 5.
Bug ID | Component   | Summary                                       | Points
-------+-------------+-----------------------------------------------+-------
68516  | Wikimetrics | Story: Researcher has prototype for wikimania | 8

Total Points: 8
You can see the sprint here:
http://sb.wmflabs.org/t/analytics-developers/2014-07-24/
Notes:
- 2/3 of the team is going on vacation during this sprint, which will
reduce our regular velocity.
- Story 67128 has been completed since the previous sprint ended.
- Issues 67694, 68516 are carried over from the last sprint and work will
continue on them.
- Issue 68519 is new.
- Issues with 0 points are considered high priority production issues that
need to be resolved relatively quickly. We do not wait for our tasking or
sprint planning to work on them. The dev team takes this background noise
into account when committing to a sprint.
Regards,
Kevin Leduc