[QA] WMF Continuous Integration currently offline

Antoine Musso hashar+wmf at free.fr
Wed Jun 8 20:48:40 UTC 2016


On 08/06/16 18:47, Antoine Musso wrote:
> On 08/06/2016 at 15:02, Antoine Musso wrote:
>>
>> The operations team has worked hard this European morning to back up
>> files, investigate the RAID issue and set up a new host.
>>
>> We are in the process of reinstalling everything on the new host and
>> bringing Jenkins and Zuul back on it.
>>
>> No ETA yet, since a five-year-old box must have hidden issues, which
>> makes it hard to estimate how long a full recovery will take.
>
> A status update:
>
> Ops (Jaime, Faidon, Mark, Chris) had a disk replaced and the RAID array
> is rebuilding right now.  It should take roughly an hour from now.  If the
> disk and RAID are confirmed to be fine, we will bring back Jenkins and
> Zuul.
>
> A new server, contint1001, has been installed. Jenkins data is being
> copied there.  We will need to adjust a few network rules and update the IP
> address in configuration files, then attempt to switch to that new setup.
>
> The main task is:
> https://phabricator.wikimedia.org/T137265

The CI service has been back since 19:00 UTC, after a disk was replaced and
the RAID array rebuilt successfully.

The issue might well occur again, so we will move the various services
off the server (gallium).

-- 
Antoine Musso




