Andy Rabagliati wrote:
> It will take me a week or so to get a good look at these - but -
> a question for the developers - am I right to only accept files
> matching ./en/[0-9a-f]/../* from the archive ?
> Presumably uploads are just hashed into these dirs ?
Yes, that's correct. The directory name is derived from the MD5 hash of
the filename.
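For illustration, the hashing scheme described above can be sketched like this (an assumption-laden sketch, not MediaWiki's exact code - in particular, MediaWiki normalizes the filename, e.g. spaces to underscores, before hashing):

```python
import hashlib

def upload_path(filename, lang="en"):
    # The first one and first two hex digits of the MD5 of the
    # filename become the directory names, giving paths that match
    # the ./en/[0-9a-f]/../* pattern mentioned above.
    h = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return f"{lang}/{h[0]}/{h[:2]}/{filename}"

print(upload_path("Example.jpg"))
```

Note that the two-character directory always begins with the one-character directory, since both are prefixes of the same hash.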
> There are a few pics that come with the MediaWiki software that I
> would, naturally, leave alone.
> In the first (Jun) archive /thumb/* was about 700Meg, and /archive/*
> was similar. There were also a lot of encyclopedia pics in the
> root dir - I threw them all away without noticing anything untoward.
In the real root directory there are symlinks to images in the other
directories, apparently left there to avoid breaking URLs used by an
earlier version of the software. Obviously tar has converted them from
symlinks to duplicates. You can delete them.
> I might run a script over the archive and convert large images
> to ones of the same size but, say, 70% quality. I imagine I
> could easily halve the archive size that way.
Quite likely.
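One way to sketch such a pass (a dry run only - it prints the ImageMagick commands it would run rather than executing them; `convert -quality 70` is the relevant ImageMagick option, and the size threshold is an arbitrary example):

```python
import os

def recompress_commands(root, min_bytes=100 * 1024, quality=70):
    # Walk the archive and collect the ImageMagick invocations that
    # would recompress each large JPEG in place. Nothing is executed
    # here; pipe the output to a shell once you've reviewed it.
    cmds = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith((".jpg", ".jpeg")):
                continue
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) >= min_bytes:
                cmds.append(f"convert -quality {quality} '{path}' '{path}'")
    return cmds
```

Printing the commands first, instead of converting in place immediately, makes it easy to spot-check a sample before committing to a lossy rewrite of the whole archive.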
> If there are other regexes that would catch files resized by the
> server I would be very grateful for the hint.
The thumb directory contains all the images resized automatically,
although the ./en/[0-9a-f] directories will contain some duplicate
images resized by hand.
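As a sketch of such filters: MediaWiki names server-generated thumbnails "<width>px-<original name>", so a pattern on that convention catches most automatic resizes that stray outside ./thumb/, though hand-uploaded duplicates that don't follow the naming will slip through (the patterns below are illustrative, not exhaustive):

```python
import re

# Accept files under the hashed upload tree, mirroring ./en/[0-9a-f]/../*
ACCEPT = re.compile(r"^\./en/[0-9a-f]/[0-9a-f]{2}/[^/]+$")
# Reject server-style thumbnails, which are named "<width>px-<name>".
THUMB = re.compile(r"(^|/)\d+px-[^/]+$")

def keep(path):
    """Keep hashed-upload files that don't look like generated thumbnails."""
    return bool(ACCEPT.match(path)) and not THUMB.search(path)
```

A hand-resized copy saved under an arbitrary name would still pass this filter, which matches the caveat above about duplicates in the ./en/[0-9a-f] directories.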
-- Tim Starling