Hi,
just a quick heads up that Ops are about to add a “php” key to the
X-Analytics header (which shows up in, e.g., the sampled-1000 logs and Hive):
https://gerrit.wikimedia.org/r/#/c/156793/
This key will hold the PHP implementation in use [1].
Planned deployment is between 2014-09-01 and 2014-09-02.
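For anyone consuming the header from the logs, here is a minimal sketch of how the new key could be read. It assumes the X-Analytics value is a semicolon-separated list of key=value pairs, as described in [1]. The function name and the sample value (including the "foo" key) are made up for illustration.

```python
def parse_x_analytics(value):
    """Parse an X-Analytics value such as 'php=hhvm;foo=bar' into a dict."""
    pairs = {}
    for item in value.split(";"):
        item = item.strip()
        if not item or "=" not in item:
            continue  # skip empty or malformed items such as '-'
        key, _, val = item.partition("=")
        pairs[key.strip()] = val.strip()
    return pairs


# Hypothetical header value; "foo=bar" is not a real key.
sample = "php=hhvm;foo=bar"
print(parse_x_analytics(sample).get("php", "-"))
```

Requests that carry no “php” key fall back to "-" here; that placeholder is a choice of this sketch, not something the header defines.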
Have fun,
Christian
[1] https://wikitech.wikimedia.org/wiki/X-Analytics#Keys
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3 Email: christian(a)quelltextlich.at
4293 Gutau, Austria Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
Hi,
just a quick heads up that due to database issues, geowiki currently
cannot update daily with new data.
So pages with daily active editor counts like
http://gp.wmflabs.org/graphs/active_editors_total
http://gp.wmflabs.org/graphs/enwiki_editor_counts
http://gp.wmflabs.org/graphs/frwiki_editor_counts
http://gp.wmflabs.org/graphs/eowiki_editor_counts
[...]
and the private per country breakdowns at
https://stats.wikimedia.org/geowiki-private/
will not see updates until the issue is resolved.
Older data is not affected by the issue. So data up to May 1st is good
to use (with the usual geowiki caveats).
Best regards,
Christian
P.S.: The root issue is not severe, and I guess it can be fixed in the
next couple of days.
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasze 65a Email: christian(a)quelltextlich.at
4040 Linz, Austria Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
Hi :-)
These are the largest Eventlogging tables on m2-master:
145G MobileWebClickTracking_5929948.ibd
94G PageContentSaveComplete_5588433.ibd
61G MediaViewer_8572637.ibd
57G MediaViewer_8245578.ibd
30G MultimediaViewerNetworkPerformance_7917896.ibd
29G MediaViewer_8935662.ibd
24G MobileWikiAppToCInteraction_8461467.ibd
Are these sizes roughly expected?
Anything we can discard or reduce?
Where did the discussion on purging data end up?
No immediate problems here, just rattling cages :-)
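To make the listing easier to discuss, here is a rough sketch that adds the sizes up per schema. It assumes the numeric suffix of each file name is the schema revision, so that the three MediaViewer files belong together. The sizes are copied from the listing above; the function name is made up.

```python
from collections import defaultdict

# (table name, size in G) as listed above
TABLES = [
    ("MobileWebClickTracking_5929948", 145),
    ("PageContentSaveComplete_5588433", 94),
    ("MediaViewer_8572637", 61),
    ("MediaViewer_8245578", 57),
    ("MultimediaViewerNetworkPerformance_7917896", 30),
    ("MediaViewer_8935662", 29),
    ("MobileWikiAppToCInteraction_8461467", 24),
]


def size_by_schema(tables):
    """Sum sizes per schema, dropping the assumed '_<revision>' suffix."""
    totals = defaultdict(int)
    for name, size in tables:
        schema = name.rsplit("_", 1)[0]
        totals[schema] += size
    return dict(totals)


totals = size_by_schema(TABLES)
for schema, size in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{size:4d}G {schema}")
print(f"{sum(totals.values()):4d}G total")
```

Under that assumption, MediaViewer adds up to 147G across its three revisions, and the seven files together come to 440G.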
BR
/s
--
DBA @ WMF
Hi,
people from gerrit's “Analytics” group [1] currently hold
* Push (including Force Push)
* Push Merge Commit
* Forge Author Identity
* Forge Committer Identity
permissions on “analytics/*” projects in gerrit. But those permissions
have been getting in the way, one way or another.
Do we need those permissions for our repos?
If no one objects, I'll start removing them on 2014-04-28.
Best regards,
Christian
[1] https://gerrit.wikimedia.org/r/#/admin/groups/uuid-d34747bee94be39cff54b5fd…
Hi,
The fruits of our labor on Editor Engagement Vital Signs (EEVS) are on
display. This is still an early release; we have a backlog of feedback
from internal stakeholders, and more iterations are to come.
https://metrics.wmflabs.org/static/public/dash/
This sprint’s commitments are:
Bug ID  Component      Points  Summary
------  -------------  ------  -------
69569   Wikimetrics    13      Story: WikimetricsUser runs 'Rolling Recurring old active editors' report
67806   Visualization  13      Story: EEVSUser loads static site in accordance with Pau's design
71009   Wikimetrics    5       Update 'existing' Pages Created to include deleted pages
71008   Wikimetrics    5       Update 'existing' Edits Metric to include deleted pages
70887   Dashiki        21      Story: Bookmarks / Stateful URL. Define protocol and use it to bootstrap the dashboard and keep state
That’s 55 Points in 5 stories
Our progress is tracked in scrumbugs:
http://sb.wmflabs.org/t/analytics-developers/2014-09-18/
cheers,
Kevin Leduc
Hey folks,
It looks like we are accepting connections on port 443 (https) on
datasets.wikimedia.org, but we are serving a cert for stats.wikimedia.org.
I'd like to either:
1. Fix the cert so that people can connect to https://...
2. Disable connections on port 443
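To illustrate why clients reject the connection, here is a simplified sketch of the name check a client performs. It assumes the served cert lists only stats.wikimedia.org; the function is made up for illustration and handles just exact names and a single leading "*." wildcard, so it is not a complete implementation of certificate name matching.

```python
def hostname_matches(cert_names, hostname):
    """Return True if hostname matches any name listed in the cert.

    Only exact matches and a leading '*.' wildcard that covers
    exactly one label are handled in this sketch.
    """
    hostname = hostname.lower().rstrip(".")
    for name in cert_names:
        name = name.lower().rstrip(".")
        if name == hostname:
            return True
        if name.startswith("*."):
            # '*.example.org' matches 'a.example.org',
            # but not 'a.b.example.org' or 'example.org'.
            head, _, tail = hostname.partition(".")
            if head and tail == name[2:]:
                return True
    return False


# Assumed cert contents: a single name, stats.wikimedia.org
print(hostname_matches(["stats.wikimedia.org"], "datasets.wikimedia.org"))
```

With a cert that also listed datasets.wikimedia.org (or a matching wildcard), the same check would pass, which is what option 1 amounts to.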
-Aaron
---------- Forwarded message ----------
From: Finn Årup Nielsen <fn(a)imm.dtu.dk>
Date: Tue, Sep 30, 2014 at 11:28 AM
Subject: datasets.wikimedia.org certificat problem.
To: aaron.halfaker(a)gmail.com, dario(a)wikimedia.org
Dear Aaron and Dario,
I get ssl_error_bad_cert_domain from datasets.wikimedia.org:
datasets.wikimedia.org uses an invalid security certificate.
best regards
Finn
Hi,
in the week from 2014-09-15–2014-09-21 Andrew, Jeff, and I worked on
the following items around the Analytics Cluster and Analytics related
Ops:
* Using kafkatee to generate TSVs
* Bringing Webstatscollector to Hive
* TSV generation through Hive
* Logstash demo
* Reorganizing Wikimetrics mounts
* Stream to Universities
* Analytics1021 issues not an artifact of kafka consumers
* X-Analytics php tag missing/wrong for some requests (Bug 70463)
(details below)
Have fun,
Christian
* Using kafkatee to generate TSVs
To meet the overall plan of ceasing to rely on udp2log for
Analytics tasks, we wanted to use kafkatee as a drop-in replacement for
udp2log. While initial tests were positive, kafkatee did not run
smoothly when we tried to use it in production: for example, it dropped
some partitions and didn't update offset files. Both issues are
blockers for its use.
We're in contact with the kafkatee developer, and producing the
necessary logs for him to be able to debug it. But the issues have not
yet been resolved.
* Bringing Webstatscollector to Hive
We produced a first running Hive/Oozie implementation of
webstatscollector. The code still needs polishing, but it's working. Once
in production, this code will be the first real-world use of the
cluster.
* TSV generation through Hive
Since kafkatee showed some severe issues for us (see above), we
discussed a plan B to move off of udp2log. After the initial checks,
it seems generating the TSVs through Hive could work out. It would
come with some nice benefits (like being able to re-run files, or
better controlling when which data flows into it), but also some real
downsides: adding filters would require implementation instead of
configuration, and we could no longer use the existing tooling
around udp2log (think udp-filters to geolocate).
So we're still targeting to use kafkatee. But if it does not work out,
there are no immediate blockers for a Hive-based move away from
udp2log.
* Logstash demo
In order to raise visibility around Logstash and its usefulness
around Hive and Hadoop, there was a demo session that showed the basic
workflows.
* Reorganizing Wikimetrics mounts
Wikimetrics ran out of database disk space on the labs instances, so more
space got allocated and the contents of the instances have been reshuffled
a bit to make better use of the available disk space.
* Stream to Universities
For some years, parts of the udp2log multicast have been streamed to
Universities for research purposes. Those streams caused pain on many
levels, and this week the last of those legacy streams could finally be
turned off.
* Analytics1021 issues not an artifact of kafka consumers
Around analytics1021, progress has been slow, as the issue on
analytics1021 only occurs sporadically.
But kafka consumers got ruled out as the culprit for dropping messages,
since the missing lines have been identified as already missing in
kafka.
* X-Analytics php tag missing/wrong for some requests (Bug 70463)
The php={zend,hhvm} tagging happened twice for bits. Ops fixed the
double tagging, but now some requests don't see a tag at all. While
this is expected for some cases, Ops assume that some HHVM requests
come with php=zend tags. They are working on it.
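As a rough sketch of how such cases could be counted in a log sample, the snippet below classifies each X-Analytics value by its php tag. It assumes the value is a semicolon-separated list of key=value pairs and that a doubled tag shows up as two php= items; neither the function name nor the sample values come from real logs.

```python
from collections import Counter


def php_tag_status(x_analytics):
    """Classify one X-Analytics value by its php tag(s)."""
    values = []
    for item in x_analytics.split(";"):
        key, sep, val = item.strip().partition("=")
        if sep and key == "php":
            values.append(val)
    if not values:
        return "missing"
    if len(values) > 1:
        return "duplicate"
    return values[0]


# Made-up sample values, for illustration only
samples = ["php=hhvm", "php=zend;php=zend", "-", "php=zend"]
print(Counter(php_tag_status(s) for s in samples))
```

Such a tally cannot tell whether a php=zend tag is wrong for a given request; it only shows how often tags are missing or doubled.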
https://gerrit.wikimedia.org/r/#/c/147505/
If someone could review this MediaWiki patch, I'd be most grateful. It's
so frustrating when users, journalists and everyone mention
Special:Statistics numbers and get surprised at the existence of WikiStats.
Nemo
Forwarding from Siko Bouterse:
Greetings! The Wikimedia Foundation Individual Engagement Grants program is
accepting proposals for funding new experiments from September 1st to 30th.
<https://meta.wikimedia.org/wiki/Grants:IEG>
Your idea can improve Wikimedia projects by building a new tool or gadget,
organizing a better process on your wiki, conducting research on an
important issue, or providing other support for community-building. Whether
you need $200 or $30,000 USD, Individual Engagement Grants can cover your
own project development time in addition to funding for a team to help you.
The program has a flexible schedule and reporting structure, and
Grantmaking staff are there to support you through all stages of the
process.
Do you have a good idea, but you are worried that it isn’t developed
enough for a grant? Put it into the IdeaLab, where volunteers and staff
can give you advice and guidance on how to bring it to life.
<https://meta.wikimedia.org/wiki/Grants:IdeaLab> Also, IEG will be hosting
three Hangout Sessions for real-time discussions to help you make your
proposal better - the first will happen on September 16th.
<https://meta.wikimedia.org/wiki/Grants:IdeaLab/Events#Upcoming_events>
For inspiration, you can read more about past projects
<https://blog.wikimedia.org/tag/individual-engagement-grants/> that received
funding or review open proposals
<https://meta.wikimedia.org/wiki/Grants:IEG#ieg-reviewing>. We are excited
to see some of the new ways your grant ideas can support our community and
make an impact on the future of Wikimedia projects.
Submit your proposal in September!
<https://meta.wikimedia.org/wiki/Grants:IEG#ieg-apply>