Hi.
I've been asked a few times recently about generating reports of the
most-viewed pages per day, per month, per year, etc. A few years after Domas
first started
publishing this information in raw form, the current situation seems rather
bleak. Henrik has a visualization tool with a very simple JSON API behind it
(<http://stats.grok.se>), but other than that, I don't know of any efforts
to put this data into a database.
Currently, if you want data on, for example, every article on the English
Wikipedia, you'd have to make 3.7 million individual HTTP requests to
Henrik's tool. At one per second, you're looking at over a month's worth of
continuous fetching. This is obviously not practical.
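For a sense of what each of those requests looks like, here's a minimal Python
sketch of a single per-page query against Henrik's JSON API (the exact /json/
URL pattern here is an assumption, not something documented in this message):

    import json
    import urllib.parse
    import urllib.request

    def monthly_views(project, yyyymm, title):
        """Fetch the JSON view-count record for one page and one month."""
        # Assumed URL pattern for stats.grok.se's JSON interface.
        url = "http://stats.grok.se/json/{0}/{1}/{2}".format(
            project, yyyymm, urllib.parse.quote(title, safe=""))
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    # ~3.7 million of these at one per second is roughly 43 days of fetching.
    print(monthly_views("en", "201108", "Main_Page"))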
A lot of people were waiting on Wikimedia's Open Web Analytics work to come
to fruition, but it seems that has been indefinitely put on hold. (Is that
right?)
Is it worth a Toolserver user's time to try to create a database of
per-project, per-page page view statistics? Is it worth a grant from the
Wikimedia Foundation to have someone work on this? Is it worth trying to
convince Wikimedia Deutschland to assign resources? And, of course, it
wouldn't be a bad idea if Domas' first-pass implementation was improved on
Wikimedia's side, regardless.
Thoughts and comments welcome on this. There's a lot of desire to have a
usable system.
MZMcBride
We're doing well on the road to 1.18: we went from ~370 un-reviewed
revisions last weekend to ~210 un-reviewed revisions today.
We need to sustain this momentum to have 1.18 ready for release in
time.
One area that hasn't been getting enough attention, though, is FIXME'd
revisions.
On Monday there were 95 FIXMEs, and today that has come down only to 86.
We need that rate to increase significantly, so on Monday I'll be contacting
people with FIXME'd revisions and asking them to take action. If you
don't have time, please let me, Robla, or the list know so that we can
make sure that we have the code in a releasable state in time for
release.
(Information in this email was gleaned from the Revision Report:
http://www.mediawiki.org/wiki/MediaWiki_roadmap/1.18/Revision_report)
Thanks,
Mark.
http://www.mediawiki.org/wiki/NOLA_Hackathon
MediaWiki developers are going to meet in New Orleans, Louisiana, USA,
October 14-16, 2011. Ryan Lane is putting this together and I'm helping
a bit. If you're intending to come, please add your name here, just so
we can start getting an idea of how many people are coming:
http://www.mediawiki.org/wiki/NOLA_Hackathon#Attendees
I'll add more details to the wiki page next week.
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
[Resending as plain text]
I maintain a compacted monthly version of the dammit.lt page view stats,
starting with Jan 2010 (not an official WMF project).
This is to preserve our page view counts for future historians (compare the
Twitter archive kept by the Library of Congress).
It could also be used to resurrect
http://wikistics.falsikon.de/latest/wikipedia/en/ which was very popular.
Alas, the author vanished and does not reply to requests, and we don't have
the source code.
I just applied for storage on dataset1 or dataset2, and will publish the
monthly < 2 GB files as soon as possible.
Each day I download the 24 hourly dammit.lt files and compact them into one
file per day.
Each month I compact the daily files into one monthly file.
Major space saving: a monthly file with all hourly page views is 8 GB
(compressed); with only articles that have 5+ page views per month it is even
less than 2 GB.
This is because each page title occurs once instead of up to 24*31 times,
and the 'bytes sent' field is omitted.
All hourly counts are preserved, prefixed by day number and hour number.
Here are the first lines of one such file, whose header also describes the
format (a small decoding sketch follows the sample):
Erik Zachte (on wikibreak till Sep 12)
# Wikimedia article requests (aka page views) for year 2010, month 11
#
# Each line contains four fields separated by spaces
# - wiki code (subproject.project, see below)
# - article title (encoding from original hourly files is preserved to maintain proper sort sequence)
# - monthly total (possibly extrapolated from available data when hours/days in input were missing)
# - hourly counts (only for hours where indeed article requests occurred)
#
# Subproject is language code, followed by project code
# Project is b:wikibooks, k:wiktionary, n:wikinews, q:wikiquote, s:wikisource, v:wikiversity, z:wikipedia
# Note: suffix z added by compression script: project wikipedia happens to be sorted last in dammit.lt files, so add this suffix to fix sort order
#
# To keep hourly counts compact and tidy both day and hour are coded as one character each, as follows:
# Hour 0..23 shown as A..X, convert to number: ordinal (char) - ordinal ('A')
# Day 1..31 shown as A.._ (27=[ 28=\ 29=] 30=^ 31=_), convert to number: ordinal (char) - ordinal ('A') + 1
#
# Original data source: Wikimedia full (=unsampled) squid logs
# These data have been aggregated from hourly pagecount files at http://dammit.lt/wikistats, originally produced by Domas Mituzas
# Daily and monthly aggregator script built by Erik Zachte
# Each day hourly files for previous day are downloaded and merged into one file per day
# Each month daily files are merged into one file per month
#
# This file contains only lines with monthly page request total greater/equal 5
#
# Data for all hours of each day were available in input
#
aa.b File:Broom_icon.svg 6 AV1,IQ1,OT1,QB1,YT1,^K1
aa.b File:Wikimedia.png 7 BO1,BW1,CE1,EV1,LA1,TA1,^A1
aa.b File:Wikipedia-logo-de.png 5 BO1,CE1,EV1,LA1,TA1
aa.b File:Wikiversity-logo.png 7 AB1,BO1,CE1,EV1,LA1,TA1,[C1
aa.b File:Wiktionary-logo-de.png 5 CE1,CM1,EV1,TA1,^N1
aa.b File_talk:Commons-logo.svg 9 CE3,UO3,YE3
aa.b File_talk:Incubator-notext.svg 60 CH3,CL3,DB3,DG3,ET3,FH3,GM3,GO3,IA3,JQ3,KT3,LK3,LL3,MH3,OO3,PF3,XO3,[F3,[O3,]P3
aa.b MediaWiki:Ipb_cant_unblock 5 BO1,JL1,XX1,[F2
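To make the day/hour encoding described in the header above concrete, here is a
minimal Python decoding sketch (just an illustration of the published format,
not one of the aggregator scripts):

    def decode_hourly_counts(field):
        """Yield (day, hour, count) from a field like 'AV1,IQ1,^K1'."""
        for item in field.split(","):
            day = ord(item[0]) - ord("A") + 1   # day 1..31: A.._ (27=[ 28=\ 29=] 30=^ 31=_)
            hour = ord(item[1]) - ord("A")      # hour 0..23: A..X
            count = int(item[2:])               # remaining digits: requests in that hour
            yield day, hour, count

    # First sample line above: monthly total 6, six single-request hours.
    for day, hour, count in decode_hourly_counts("AV1,IQ1,OT1,QB1,YT1,^K1"):
        print(day, hour, count)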
On Fri, Aug 12, 2011 at 6:55 AM, David Gerard <dgerard@gmail.com> wrote:
>
> [posted to foundation-l and wikitech-l, thread fork of a discussion
elsewhere]
>
>
> THESIS: Our inadvertent monopoly is *bad*. We need to make it easy to
> fork the projects, so as to preserve them.
>
> This is the single point of failure problem. The reasons for it having
> happened are obvious, but it's still a problem. Blog posts (please
> excuse me linking these yet again):
>
> * http://davidgerard.co.uk/notes/2007/04/10/disaster-recovery-planning/
> * http://davidgerard.co.uk/notes/2011/01/19/single-point-of-failure/
>
> I dream of the encyclopedia being meaningfully backed up. This will
> require technical attention specifically to making the projects -
> particularly that huge encyclopedia in English - meaningfully
> forkable.
>
> Yes, we should be making ourselves forkable. That way people don't
> *have* to trust us.
>
> We're digital natives - we know the most effective way to keep
> something safe is to make sure there's lots of copies around.
>
> How easy is it to set up a copy of English Wikipedia - all text, all
> pictures, all software, all extensions and customisations to the
> software? What bits are hard? If a sizable chunk of the community
> wanted to fork, how can we make it *easy* for them to do so?
Software and customizations are pretty easy -- that's all in SVN, and most
of the config files are also made visible on noc.wikimedia.org.
If you're running a large site there'll be more 'tips and tricks' in the
actual setup that you may need to learn; most documentation on the setups
should be on wikitech.wikimedia.org, and do feel free to ask for details on
anything that might seem missing -- it should be reasonably complete. But to
just keep a data set, it's mostly a matter of disk space, bandwidth, and
getting timely updates.
For data there are three parts (a rough dump-fetch sketch follows the list):
* page data -- everything that's not deleted/oversighted is in the public
dumps at download.wikimedia.org, but may be a bit slow to build/process due
to the dump system's history; it doesn't scale as well as we really want
with current data size.
More to the point, getting data isn't enough for a "working" fork - a wiki
without a community is an empty thing, so being able to move data around
between different sites (merging changes, distributing new articles) would
be a big plus.
This is a bit awkward with today's MediaWiki (though I think I've seen some
extensions aiming to help); DVCSs like git show good ways to do this sort of thing
-- forking a project on/from a git hoster like github or gitorious is
usually the first step to contributing upstream! This is healthy and should
be encouraged for wikis, too.
* media files -- these are freely copiable, but I'm not sure of the state of
easily obtaining them in bulk. As the data set moved into the terabyte range it
became impractical to just build .tar dumps. There are batch downloader tools
available, and the metadata is all in the dumps and the API.
* user data -- watchlists, emails, passwords, prefs are not exported in
bulk, but you can always obtain your own info so an account migration tool
would not be hard to devise.
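As a rough illustration of the page-data part, here is a minimal Python sketch
for pulling one public dump file; the exact file name is an assumption, so
check the wiki's "latest" directory on the dump server for what is actually
published:

    import shutil
    import urllib.request

    def fetch_dump(wiki="enwiki", dump="pages-articles.xml.bz2"):
        """Stream one public dump file into the current directory."""
        # File name layout is an assumption; adjust to whatever the
        # per-wiki "latest" directory actually lists.
        name = "{0}-latest-{1}".format(wiki, dump)
        url = "http://download.wikimedia.org/{0}/latest/{1}".format(wiki, name)
        with urllib.request.urlopen(url) as resp, open(name, "wb") as out:
            shutil.copyfileobj(resp, out)  # stream to disk; these files are large
        return name

    fetch_dump()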
> And I ask all this knowing that we don't have the paid tech resources
> to look into it - tech is a huge chunk of the WMF budget and we're
> still flat-out just keeping the lights on. But I do think it needs
> serious consideration for long-term preservation of all this work.
This is part of WMF's purpose, actually, so I'll disagree on that point.
That's why, for instance, we insist on using so much open source -- we *want*
everything we do to be reusable or rebuildable independently of us.
-- brion
>
>
> - d.
>
[All apologies for cross-posting]
We are happy to announce that you can now register for
SMWCon Fall 2011
Berlin, September 21–23, 2011
http://semantic-mediawiki.org/wiki/SMWCon_Fall_2011
Registration is at http://de.amiando.com/SMWCon_Fall_2011
SMWCon brings together developers, users, and organizations from the
Semantic MediaWiki community in particular and everyone interested in
managing data in wikis in general. The Fall 2011 event runs for three
days September 21–23, 2011:
* Sept 21: practical tutorials about using SMW (learn about essential
aspects of using SMW) + developer consultation (meet with all developers
and discuss technical questions)
* Sept 22–23: community conference with talks and discussions
The detailed program is taking shape [1]. Contributions are still
possible. Please note that the event takes place at the same time as the
famous Berlin Marathon and a visit by Pope Benedict XVI. Booking hotels
early is recommended.
You can register for the whole event or for the conference days only.
Registration includes lunch and coffee on all days + a conference dinner
on Sept 22nd. Special subsidised rates are available for students.
Moreover, MediaWiki developers are invited to join the first day (in
particular the developer consultations) at a reduced rate.
We are stretching ourselves to keep rates as low as possible in spite of
additional costs incurred by the rooms this time. We are therefore
welcoming sponsors to help support the finances of the meeting, now and
in the future. If your organisation would be interested in becoming an
official supporter of the event, please contact the Open Semantic Data
Association <osda@semantic-mediawiki.org>.
SMWCon Fall 2011 is organised by the Web-Based Systems Group at Free
University Berlin [2] and by MediaEvent Services [3].
Looking forward to seeing you in Berlin!
Markus
[1] http://semantic-mediawiki.org/wiki/SMWCon_Fall_2011
[2] http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/index.html
[3] http://mediaeventservices.com/
First, I realize that this addition to the 1.17 installer is sort of a
"hold it with our fingertips and pinch our nose" feature, so thanks for
putting it in at all... :-)
That said, there are a few bits of tuning I've been trying to add, and
they lead me to a more basic question for which my google-fu is unavailing.
I've changed the page title and copy on the "Log In" page, having gotten
lucky with the names of those messages in the system message dictionary.
I'm trying to find an easy way to take "Click here to return to $pagetitle"
on the post-logout page *out*, since, on a login-required-to-read wiki,
it's lying to the user: they can't do that.
The copy above it about flushing your cache, I found and changed.
But that message, if it's not hardwired in code by accident, lives in a
spot in the message dictionary that I couldn't find (except perhaps by
scrolling through the entire thing)... which leads me to said question:
I tentatively assume that the built-in text search engine might search
*modified* messages in the dictionary, given how one edits them... but is
there any way to make it search the *default* messages? Do they live in a
(pseudo-)namespace the way the modified, user-specified messages do?
Cheers,
-- jra
--
Jay R. Ashworth Baylink jra@baylink.com
Designer The Things I Think RFC 2100
Ashworth & Associates http://baylink.pitas.com 2000 Land Rover DII
St Petersburg FL USA http://photo.imageinc.us +1 727 647 1274