Chad and I have been playing around with an SVN->Git conversion of
MediaWiki. After running into some odd issues with git-svn, and since
it takes around 3 weeks to do a complete git-svn import (with
branches) of the MediaWiki SVN repository, I'd like to get access to
`svnadmin dump' output as run on mayflower.
Are these dumps perhaps being made already as part of some backup procedure
but not published? If there's interest, I'd like to write any
scripts / hacks required to get SVN dumps up on
http://download.wikimedia.org/
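For reference, a minimal sketch of what such a server-side job could look like (a hedged example: the repository path and revision range below are placeholders, not the actual layout on mayflower):

  # Full dump of the repository, compressed on the fly
  svnadmin dump /svnroot/mediawiki | gzip > mediawiki-full.svndump.gz

  # Later runs could dump only new revisions incrementally
  svnadmin dump /svnroot/mediawiki --incremental -r 60001:HEAD \
      | gzip > mediawiki-incr.svndump.gz

A consumer could then `svnadmin load' the dump into a local repository and point git-svn at a file:// URL, which keeps the weeks-long import off the public server.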
Hi folks,
We've been accepted again for another Google Summer of Code! What this
means:
* Mentors: please go to this page to formally apply to be a mentor:
http://socghop.appspot.com/gsoc/mentor/request/google/gsoc2010/wikimedia
Note: you can't officially be a mentor until you do this, and we can't do it
for you (part of it involves agreeing to the mentor agreement).
Question for the group: how many student slots do you think we should
request? The "advice for mentors" page says:
"A good rule of thumb when finding and assigning mentors is to have two
mentors per student. It is also a good idea to have a spare mentor or two
who can pay attention to many students and keep track of the big picture."
Given our current list of mentors (we have 9 listed, plus 1 "maybe"), that
would give us 4 as the number of slots. Does that seem like a number
that's low enough that we can be reasonably confident we'll do a good
job mentoring, yet high enough that we're not selling ourselves short?
* Students: it's not yet formally time to apply, but now is a really
good time to start brainstorming ideas and getting clarification on what's
already been suggested:
http://www.mediawiki.org/wiki/Summer_of_Code_2010
While you may be tempted (from a competitive perspective) not to reveal what
your ideas are early, it is almost certainly going to be to your benefit to
engage now. By "engage", I mean "demonstrate that you're really thinking
about how to improve MediaWiki and other Wikimedia project technologies, and
have the wherewithal to do it", not merely "impress us with what skills you
have". The more specific and thoughtful your ideas, questions, and
suggestions are, the more comfortable we'll all feel in selecting you.
You might want to take a peek at the GSoC student agreement now, since
you'll be required to agree to it as a precondition for being part of this
year's program:
http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/stude…
Rob
Date: Wed, 17 Mar 2010 15:15:24 +0100
From: Platonides <Platonides(a)gmail.com>
Subject: Re: [Wikitech-l] [Xmldatadumps-admin-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D
To: wikitech-l(a)lists.wikimedia.org
Message-ID: <hnqo49$itc$1(a)dough.gmane.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Jamie Morken wrote:
> Also I wonder if it is possible to convert from 7z to bz2 without having
> to make the 5469GB file first? If this can be done then having only 7z
> files would be fine, as the bz2 file could be created with a "normal"
> PC (ie one without a 6TB+ harddrive). This would be a good solution,
> but not sure if it can be done. If it could though, might as well get
> rid of all the large wikis' bz2 pages-meta-history files!
Sure.
7z e -so DatabaseDump.7z | bzip2 -9 > DatabaseDump.bz2
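The same streaming approach works when you just want to consume the data rather than re-compress it; for instance, a rough way to check that the XML is well-formed without ever writing the 5469GB file (a sketch, assuming xmllint from libxml2 is available):

  # Decompress to stdout and stream-parse; nothing touches the disk
  7z e -so DatabaseDump.7z | xmllint --stream --noout -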
Hi,
Thanks for the info, I think 7z is the way to go :)
cheers,
Jamie
About 40% of our text storage has been recompressed into
DiffHistoryBlob format, which uses a combination of binary diffs and
gzip to reduce storage space.
Approximately 1.9TB of text storage, mostly revisions compressed
individually with gzip, was recompressed to about 140GB, a saving of 93%.
-- Tim Starling
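That's not the actual DiffHistoryBlob code, but the underlying idea is easy to sketch: keep one revision in full, store the rest as diffs against their predecessor, and compress the whole bundle (a toy illustration with made-up filenames, using plain text diffs where MediaWiki uses binary diffs):

  # rev1.txt rev2.txt rev3.txt: consecutive revisions of one page
  mkdir blob
  cp rev1.txt blob/base.txt                # first revision kept in full
  diff rev1.txt rev2.txt > blob/r2.diff    # later revisions stored as diffs
  diff rev2.txt rev3.txt > blob/r3.diff
  tar czf page-history.tar.gz blob/        # compress the bundle

Consecutive revisions are usually near-identical, so the diffs are tiny and compress extremely well, which is where savings on the order of 93% come from.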
On top of that, for some of us outside the USA (and even with a good connection to the EU research network), the download process takes, so to speak, rather more time than expected (and is prone to errors as the file gets larger).
So, another +1 for replacing bzip with 7zip.
F.
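Kevin's 7x decompression figure quoted below is easy to sanity-check locally; a rough sketch, assuming both tools are installed and the same dump exists in both formats:

  # Time a single-threaded decompression of each format, discarding the output
  time bzip2 -dc pages-meta-history.xml.bz2 > /dev/null
  time 7z e -so pages-meta-history.xml.7z > /dev/null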
--- On Tue, 16/3/10, Kevin Webb <kpwebb(a)gmail.com> wrote:
> From: Kevin Webb <kpwebb(a)gmail.com>
> Subject: Re: [Xmldatadumps-admin-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D
> To: "Lev Muchnik" <levmuchnik(a)gmail.com>
> CC: "Wikimedia developers" <wikitech-l(a)lists.wikimedia.org>, xmldatadumps-admin-l(a)lists.wikimedia.org, Xmldatadumps-l(a)lists.wikimedia.org
> Date: Tuesday, 16 March 2010, 22:35
> Yeah, same here. I'm totally fine with replacing bzip with 7zip as the
> primary format for the dumps. Seems like it solves the space and speed
> problems together...
>
> I just did a quick benchmark and got a 7x improvement on decompression
> speed using 7zip over bzip using a single core, based on actual dump
> data.
>
> kpw
>
> On Tue, Mar 16, 2010 at 4:54 PM, Lev Muchnik <levmuchnik(a)gmail.com> wrote:
> >
> > I am entirely for 7z. In fact, once released, I'll be able to test the XML
> > integrity right away - I process the data on the fly, without unpacking it
> > first.
> >
> > On Tue, Mar 16, 2010 at 4:45 PM, Tomasz Finc <tfinc(a)wikimedia.org> wrote:
> >>
> >> Kevin Webb wrote:
> >> > I just managed to finish decompression. That took about 54 hours on an
> >> > EC2 2.5x unit CPU. The final data size is 5469GB.
> >> >
> >> > As the process just finished I haven't been able to check the
> >> > integrity of the XML, however, the bzip stream itself appears to be
> >> > good.
> >> >
> >> > As was mentioned previously, it would be great if you could compress
> >> > future archives using pbzip2 to allow for parallel decompression. As I
> >> > understand it, the pbzip files are reverse compatible with all
> >> > existing bzip2 utilities.
> >>
> >> Looks like the trade off is slightly larger files due to pbzip2's
> >> algorithm for individual chunking. We'd have to change the
> >> buildFilters function in http://tinyurl.com/yjun6n5 and install the new
> >> binary. Ubuntu already has it in 8.04 LTS making it easy.
> >>
> >> Any takers for the change?
> >>
> >> I'd also like to gauge everyone's opinion on moving away from the large
> >> file sizes of bz2 and going exclusively 7z. We'd save a huge amount of
> >> space doing it at a slightly larger cost during compression.
> >> Decompression of 7z these days is wicked fast.
> >>
> >> Let us know.
> >>
> >> --tomasz
> >>
--- On Tue, 16/3/10, Kevin Webb <kpwebb(a)gmail.com> wrote:
> From: Kevin Webb <kpwebb(a)gmail.com>
> Subject: Re: [Xmldatadumps-admin-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D
> To: "Tomasz Finc" <tfinc(a)wikimedia.org>
> CC: "Wikimedia developers" <wikitech-l(a)lists.wikimedia.org>, xmldatadumps-admin-l(a)lists.wikimedia.org, Xmldatadumps-l(a)lists.wikimedia.org
> Date: Tuesday, 16 March 2010, 21:10
> I just managed to finish decompression. That took about 54 hours on an
> EC2 2.5x unit CPU. The final data size is 5469GB.
>
> As the process just finished I haven't been able to check the
> integrity of the XML, however, the bzip stream itself appears to be
> good.
>
> As was mentioned previously, it would be great if you could compress
> future archives using pbzip2 to allow for parallel decompression. As I
> understand it, the pbzip files are reverse compatible with all
> existing bzip2 utilities.
>
Yes, they are :-).
Regards,
F.
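A quick way to convince yourself of that compatibility (a sketch; pbzip2 and bzip2 must both be installed, and the filename is a placeholder):

  # Compress with pbzip2 (multi-stream), decompress with plain bzip2
  pbzip2 -c sample.xml > sample.xml.bz2
  bzip2 -dc sample.xml.bz2 | cmp - sample.xml && echo "round-trip OK"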
> Thanks again for all your work on this!
> Kevin
>
> On Tue, Mar 16, 2010 at 4:05 PM, Tomasz Finc <tfinc(a)wikimedia.org> wrote:
> > Tomasz Finc wrote:
> >> New full history en wiki snapshot is hot off the presses!
> >>
> >> It's currently being checksummed which will take a while for 280GB+ of
> >> compressed data, but for those brave souls willing to test, please grab it
> >> from
> >>
> >> http://download.wikipedia.org/enwiki/20100130/enwiki-20100130-pages-meta-hi…
> >>
> >> and give us feedback about its quality. This run took just over a month
> >> and gained a huge speed-up after Tim's work on re-compressing ES. If we
> >> see no hiccups with this data snapshot, I'll start mirroring it to other
> >> locations (Internet Archive, Amazon public data sets, etc).
> >>
> >> For those not familiar, the last successful run that we've seen of this
> >> data goes all the way back to 2008-10-03. That's over 1.5 years of
> >> people waiting to get access to these data bits.
> >>
> >> I'm excited to say that we seem to have it :)
> >
> > So now that we've had it for a couple of days .. can I get a status
> > report from someone about its quality?
> >
> > Even if you had no issues, please let us know so that we can start mirroring.
> >
> > --tomasz
This past weekend, at the SXSW conference, a new initiative
was launched to "get video on Wikipedia",
http://videoonwikipedia.org/
That sounds like a great idea.
(I wasn't there, but I was told.)
But among the first videos to be uploaded since the
announcement are two that show some construction
equipment and both break my browser every time I try
to watch them. How can this be possible with a fully
updated Mozilla Firefox 3.5.8 on Ubuntu Linux?
I suppose something went wrong in the OGG encoding,
but still, browsers should not be fooled by this,
and/or Wikimedia Commons needs to make sure videos
are correctly encoded so they can be safely watched.
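One way Commons (or uploaders) could catch such files before readers hit them is to attempt a full decode and check whether the decoder reports errors; a rough sketch, assuming ffmpeg is installed and using one of the files in question:

  # Decode the whole file to a null sink; any stream errors are printed
  ffmpeg -v error -i 6hpPowerTrowel.ogv -f null - && echo "decodes cleanly"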
I have asked that these two broken videos be removed:
http://commons.wikimedia.org/wiki/File:6hpPowerTrowel.ogv
http://commons.wikimedia.org/wiki/File:13hpBoren.ogv
We discussed for a long time why OpenOffice documents
can't be uploaded to Wikimedia Commons: the ZIP
encoding wasn't considered safe and could explode in the
face of the user. Well, maybe OGG isn't safe either?
Should we just ban video altogether?
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se