[Xmldatadumps-admin-l] Degraded Raid Controller on dumps/snapshots storage node

Tomasz Finc tfinc at wikimedia.org
Fri Jul 24 20:43:06 UTC 2009


Tomasz Finc wrote:
> Tomasz Finc wrote:
>> Brion Vibber wrote:
>>> Tomasz Finc wrote:
>>>> Looks like we aren't getting the replacement drives in until Mon/Tues 
>>>> of next week, so the array will remain in a degraded state until 
>>>> then. Thankfully it's still under warranty, so the turnaround won't be 
>>>> too bad. Tentatively scheduling the work for Tuesday now.
>>> We were able to put in new disks today, but the RAID array didn't fully 
>>> recover. We got lots of I/O errors, and have been unable to run JFS 
>>> recovery successfully so far.
>>>
>>> In the meantime we're running http://download.wikimedia.org/ off the 
>>> copy of the last couple of dumps that had been copied to another server. 
>>> The dump _files_ are there but currently the index is not.
>>>
>>> We're not 100% sure whether we'll be able to recover the earlier dumps 
>>> or not, but of course more will be made soon enough. :)
>>>
>>> Some additional files such as the MediaWiki release download and DVD ISO 
>>> downloads are still in process of being restored.
>>>
>>> -- brion
>> Thanks for the update Brion. I'll be checking in with Rob tomorrow to 
>> see how ready the new set of drives are and if we are set to start 
>> generating the snapshots anew.
>>
>> --tomasz
>>
> 
> Sadly Fred and Rob took a look at the JFS storage and were not able to 
> salvage any of the existing file system. We've gone ahead and started 
> clean with the archives I made last week as the seeds.
> 
> There will be one more day of testing tomorrow for drive removal and I 
> expect to have the system back up and running by the end of the week. It 
> should take about a week after that to get a full cycle of all wikis.
> 

Everything has been looking really good so far and I'm finally 
comfortable starting the snapshots back up. The only bit left to do 
is to test by pulling a drive, but that will have to wait until we have 
RobH on site again.

We're currently running five snapshot processes, and if nothing weird 
happens I'll dial it up to eight.

--tomasz


