Hi all,
If you use Hive on stat1002/1004, you may have seen a deprecation warning
when you launch the hive client, saying that it is being replaced by
Beeline. The Beeline shell has always been available, but it required
supplying a database connection string every time, which was pretty
annoying. We now have a wrapper script
<https://github.com/wikimedia/operations-puppet/blob/production/modules/role…>
set up to make this easier. The old Hive CLI will continue to exist, but we
encourage moving over to Beeline. You can use it by logging into the
stat1002/1004 boxes as usual and launching `beeline`.
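For context, here is a rough sketch of the kind of command line you would
otherwise have to type by hand. It only builds the argument list and does not
run anything. The host name, port and database are placeholders, not the
cluster's real values, and the actual wrapper script may well work
differently. `-u` (JDBC URL) and `-n` (user name) are standard Beeline
options.

```python
import getpass
import shlex


def beeline_command(host, port=10000, database="default", user=None):
    """Build the argument list for a direct Beeline invocation.

    host, port and database are illustrative placeholders only.
    """
    user = user or getpass.getuser()
    jdbc_url = f"jdbc:hive2://{host}:{port}/{database}"
    return ["beeline", "-u", jdbc_url, "-n", user]


if __name__ == "__main__":
    # hive.example.org is a made-up host, not the real server.
    argv = beeline_command("hive.example.org", user="alice")
    print(shlex.join(argv))
```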
There is some documentation on this here:
https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Beeline.
If you run into any issues using this interface, please ping us on the
Analytics list or #wikimedia-analytics or file a bug on Phabricator
<http://phabricator.wikimedia.org/tag/analytics>.
(If you are wondering stat1004 whaaat - there should be an announcement
coming up about it soon!)
Best,
--Madhu :)
Hello!
The Analytics team would like to announce that we have migrated the
reportcard to a new domain:
https://analytics.wikimedia.org/dashboards/reportcard/#pageviews-july-2015-…
The migrated reportcard includes both legacy and current pageview data,
daily unique devices and new editors data. Pageview and unique devices data
are updated daily, but editor data is still updated ad hoc.
The team is currently revamping the way we compute edit data, and we hope
to be able to provide monthly updates for the main edit metrics this
quarter. Some of those will be visible in the reportcard, but the new
wikistats will have more detailed reports.
You can follow the new wikistats project here:
https://phabricator.wikimedia.org/T130256
Thanks,
Nuria
We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia:
http://dx.doi.org/10.6084/m9.figshare.1305770
This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.
This data can be used for various purposes:
• determining the most frequent links people click on for a given article
• determining the most common links people followed to an article
• determining how much of the total traffic to an article clicked on a link in that article
• generating a Markov chain over English Wikipedia
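As a small illustration of the first and last uses, the sketch below works on a handful of made-up (referer, article, count) rows. The row layout, the names and the numbers are assumptions for the example, not the actual file format or contents of the release.

```python
from collections import defaultdict

# Made-up sample rows: (referer, article, count).
ROWS = [
    ("London", "England", 120),
    ("London", "River_Thames", 80),
    ("external-search", "London", 1000),
    ("England", "London", 300),
]


def top_links_from(rows, source, n=5):
    """Return the n most frequently followed links out of `source`."""
    out = [(article, count) for referer, article, count in rows if referer == source]
    return sorted(out, key=lambda pair: pair[1], reverse=True)[:n]


def transition_probabilities(rows):
    """Normalise outgoing counts per referer into Markov chain transition probabilities."""
    totals = defaultdict(int)
    for referer, _article, count in rows:
        totals[referer] += count
    return {
        (referer, article): count / totals[referer]
        for referer, article, count in rows
    }


if __name__ == "__main__":
    print(top_links_from(ROWS, "London"))
    print(transition_probabilities(ROWS)[("London", "England")])
```

On the sample rows, the links out of "London" come back ordered by count, and each referer's outgoing probabilities sum to 1.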
We created a page on Meta for feedback and discussion about this release: https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream
Ellery and Dario
Hi all!
tl;dr: Stop using stat100[23] by September 1st.
We’re finally replacing stat1002 and stat1003. These boxes are out of
warranty, and are running Ubuntu Trusty, while most of the production fleet
is already on Debian Jessie or even Debian Stretch.
stat1005 is the new stat1002 replacement. If you have access to stat1002,
you also have access to stat1005. I’ve copied over home directories from
stat1002.
stat1006 is the new stat1003 replacement. If you have access to stat1003,
you also have access to stat1006. I’ve copied over home directories from
stat1003.
I have not migrated any personal cron jobs running on stat1002 or
stat1003. I need your help for this!
Both of these boxes are running Debian Stretch. As such, packages that
your work depends on may have been upgraded. Please log into the new boxes and
try stuff out! If you find anything that doesn’t work, please let me know
by commenting on https://phabricator.wikimedia.org/T152712.
Please be fully migrated to the new nodes by September 1st. This will give
us enough time to fully decommission stat1002 and stat1003 by the end of
this quarter.
I’ve only done a single rsync of home directories. If there is new data on
stat1002 or stat1003 that you want rsynced over, let me know on the ticket.
A few notes:
- stat1002 used to have /a. This has been removed in favor of /srv. /a no
longer exists.
- Home directories are now much larger. You no longer need to create
personal directories in /srv.
- /tmp is still small, so please be careful. If you are running long jobs
that generate temporary data, please have those jobs write into your home
directory, rather than /tmp.
- We might implement user home directory quotas in the future.
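For the /tmp note above, one possible way to keep temporary data out of /tmp
in a Python job is sketched below. The `~/tmp` location and the helper name
are assumptions for illustration; use whatever directory under your home
suits you.

```python
import tempfile
from pathlib import Path


def scratch_dir(base=None):
    """Create a private temporary directory under `base` (default: ~/tmp)."""
    base = Path(base) if base is not None else Path.home() / "tmp"
    base.mkdir(parents=True, exist_ok=True)
    # dir= makes tempfile create the directory under `base`
    # instead of the system default (usually /tmp).
    return tempfile.TemporaryDirectory(dir=str(base))


if __name__ == "__main__":
    with scratch_dir() as tmp:
        part = Path(tmp) / "part-0000.tsv"
        part.write_text("intermediate\tdata\n")
        print(part)
    # The directory and its contents are removed when the block exits.
```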
Thanks all! I’ll send another email in about a month's time to remind you
of the impending deadline of Sept 1.
-Andrew Otto
According to...
https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-browser/b…
... IE7 accounts for 2.5% of all pageviews in the last month.
According to...
https://analytics.wikimedia.org/dashboards/browsers/#desktop-site-by-browse…
... IE7 accounts for 5.1% of all desktop pageviews in the last month.
If that's true, IE7 (which came out 10 years ago) is more popular than all
versions of Safari combined. It also means that we need to roll back a
whole slew of features in MediaWiki that aren't supported in IE7.
Surely this can't be accurate, though, as most other sites on the internet
report virtually non-existent usage of IE7 (less than 1% everywhere I've
checked). Can someone double-check this?
Hi Igal,
All suggestions are welcome :)
Supporting this feature shouldn't be too difficult in theory, because it
already works with this kind of aggregation (months are built from days,
years from months...). The main problem is scalability for stats which
require uniqueness, like the number of users or the number of edits *per
page*. That's why yearly stats can actually be disabled on some big wikis.
So it would be feasible, but with a limit on the number of edits in the
range (around 3-5 million), and it would be very slow to load with lots of
edits.
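To make the uniqueness problem concrete, here is a minimal sketch with
made-up monthly summaries. It is not how Wikiscan stores or computes its
stats; the data layout and function name are assumptions for illustration.

```python
# Hypothetical per-month summaries: edit counts add up, user sets do not.
MONTHS = {
    "2008-01": {"edits": 10, "users": {"A", "B"}},
    "2008-02": {"edits": 7, "users": {"B", "C"}},
    "2008-03": {"edits": 5, "users": {"A"}},
}


def aggregate(months, start, end):
    """Aggregate monthly summaries over the inclusive range [start, end].

    Keys are "YYYY-MM" strings, so plain string comparison orders them.
    """
    edits = 0
    users = set()
    for key in sorted(months):
        if start <= key <= end:
            edits += months[key]["edits"]  # additive: just sum
            users |= months[key]["users"]  # unique: needs the full sets
    return {"edits": edits, "users": len(users)}


if __name__ == "__main__":
    print(aggregate(MONTHS, "2008-01", "2008-03"))
```

Over the full range, summing the monthly unique-user counts would give 5,
while the real number of distinct users is 3. Getting the right answer means
keeping or recomputing the underlying sets, which is where the cost comes
from on large ranges.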
Akeron
2017-07-31 14:29 GMT+02:00 יגאל חיטרון <khitron(a)gmail.com>:
> Hello. It's amazing, thank you very much!
> Could I suggest one more feature, please? With it, the tool will be
> perfect. I'm talking about aggregation. Any kind of historical statistics
> for some day, month or year could also be shown for a range of time. For
> example, if we have month statistics, we could set the From field to Jan
> 2008 and the To field to May 2011, and get the aggregated numbers for this
> range. Is it possible?
> Thank you very much again,
> Igal (User:IKhitron)
>
> On Jul 30, 2017 22:18, "Pine W" <wiki.pine(a)gmail.com> wrote:
>
> > Wikiscan is an interesting tool for statistics fans. I suggest briefly
> > reading this IEG page
> > <https://meta.wikimedia.org/wiki/Grants:IEG/Wikiscan_multi-wiki>, then
> > playing with the tool on https://wikiscan.org/
> >
> > Pine
> > _______________________________________________
> > Wikitech-l mailing list
> > Wikitech-l(a)lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Is there a way to do it in the current scan tool?
On Jul 31, 2017 6:57 PM, "Akeron" <akeron.wp(a)gmail.com> wrote:
Good idea, but the * character is allowed and sometimes used in user names.
The only remaining available characters seem to be "# < > [ ] | { } / @".
Another possibility is to use a new dedicated field which would include all
user names starting with the entered string.
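A minimal sketch of that second option, assuming a plain list of user names
(the sample names and the function name are made up; this is not Wikiscan
code). Because the prefix is matched literally, a * in a user name needs no
special handling.

```python
def users_with_prefix(usernames, prefix):
    """Return the user names that start with `prefix` (case-sensitive)."""
    return [name for name in usernames if name.startswith(prefix)]


if __name__ == "__main__":
    # Made-up names, including one that contains a literal '*'.
    names = ["TNSE Alpha", "TNSE*Beta", "Other", "tnse lower"]
    print(users_with_prefix(names, "TNSE"))
```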
2017-07-31 10:27 GMT+02:00 Sibi Kanagaraj <commonssibi(a)gmail.com>:
> Hi Team ,
>
> @User:akeron
>
> Would like to know if there is any possibility of using wildcard
> entries in the Users category.
>
> Example :
> Over here
>
> https://ta.wikiscan.org/users
>
> Will we be able to filter out users whose user name starts with TNSE -
>
> Say something like TNSE * or with a particular pattern . Say TNSE * ABC
> *XYZ
>
> Regards,
> K.Sibi
>
>
>
> On Mon, Jul 31, 2017 at 12:47 AM, Pine W <wiki.pine(a)gmail.com> wrote:
>
>> Wikiscan is an interesting tool for statistics fans. I suggest briefly
>> reading this IEG page
>> <https://meta.wikimedia.org/wiki/Grants:IEG/Wikiscan_multi-wiki>, then
>> playing with the tool on https://wikiscan.org/
>>
>> Pine
>>
>>
>> _______________________________________________
>> Analytics mailing list
>> Analytics(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics