[Engineering] Hadoop - Last week data needs to be backfilled

Joseph Allemandou jallemandou at wikimedia.org
Mon Mar 7 10:11:12 UTC 2016


Hi,
Quick follow-up: all data has been backfilled, so you can get back to normal
cluster activity :)
Sorry for the inconvenience.
Joseph


On Tue, Mar 1, 2016 at 2:26 PM, Joseph Allemandou <jallemandou at wikimedia.org> wrote:

> Hi,
>
> *TL;DR: Please don't use Hive / Spark / Hadoop before next week.*
>
> Last week the Analytics Team performed an upgrade of the Hadoop cluster.
> It went reasonably well, except that many of the Hadoop processes were
> launched with a special option to NOT use UTF-8 as the default encoding.
> This issue caused trouble particularly in page title extraction and was
> detected last Sunday. Many kudos to the people who filed bugs on the
> Analytics API about encoding :)
> We found the bug and fixed it yesterday, and the backfill starts today:
> the cluster is recomputing every dataset from 2016-02-23 onward.
> This means you shouldn't query last week's data during this week, first
> because it is incorrect, and second because you'll curse the cluster for
> being too slow :)
>
> We are sorry for the inconvenience.
> Don't hesitate to contact us if you have any questions.
>
>
> --
> *Joseph Allemandou*
> Data Engineer @ Wikimedia Foundation
> IRC: joal
>



-- 
*Joseph Allemandou*
Data Engineer @ Wikimedia Foundation
IRC: joal