Hi Lars,

You have a couple of options:

1. Download the data in lossless compressed form: https://dumps.wikimedia.org/other/pagecounts-ez/
   The format is clever and doesn't lose granularity, and it should be a lot quicker than pagecounts-raw. (This is basically what stats.grok.se did with the data as well, so downloading this way should be equivalent.)

2. Work on Toolforge, a virtual cloud that's on the same network as the data, so getting the data is a lot faster and you can use our compute resources (free, of course): https://wikitech.wikimedia.org/wiki/Portal:Toolforge

If you decide to go with the second option, the IRC channel where they support folks like you is #wikimedia-cloud, and you can always find me there as milimetric.

On Tue, Feb 20, 2018 at 12:51 PM, Lars Hillebrand <larshillebrand@icloud.com> wrote:

Dear Analytics Team,

I am an M.Sc. student at Copenhagen Business School. For my Master's thesis I would like to use page view data for certain Wikipedia articles. I found out that a new API delivering this data was created in July 2015. However, for my project I have to use data from before 2015.

In my further search I found out that the old page view data exists (https://dumps.wikimedia.org/other/pagecounts-raw/) and that until March 2017 it could be queried using stats.grok.se. Unfortunately, this site no longer exists, which is why I cannot filter and query the raw data in .gz format on the webpage.

Is there any way to get the page view data for certain articles from before July 2017?

Thanks a lot and best regards,
Lars Hillebrand

PS: I am conducting my research in R, and for the post-2015 data the package “pageviews” works great.

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics