[Foundation-l] Article Feedback Tool 5 testing deployment

Fri Dec 23 02:41:47 UTC 2011

Oliver, with regard to Geni's question and your response, this is what I understood the situation to be too: that AFTv5 is being used on a small subset of articles to ensure minimum disruption to the editing community whilst still gaining enough usage data from readers to know whether it's working. Then iterate, improve, roll out to a slightly larger set, repeat.... :-)

However, I'd like to contest the two reasons you've given for not turning off AFTv4 in the meantime.

On 23/12/2011, at 3:49, Oliver Keyes <okeyes at wikimedia.org> wrote:

> Actually, we're trying to avoid turning off AFT4. The reasoning is twofold.
> 
> On a product development front, the AFT5 presence is for testing purposes,
> and for testing purposes only; it will be up for around 2-3 weeks so we can
> build a decent picture of the quantity and quality of feedback we're
> getting. While this process is going on, we want to maintain a pretty
> coherent interface for the readers to avoid confusion - and AFT4 is much
> closer to AFT5 than no form at all is.

Are you saying that AFTv4 (the 'star rating' system) is being used as the "control group" in this experiment? That is, if ONLY 0.3% of en.wp articles had a feedback tool enabled, then they would receive different kinds of feedback because they would look different to the vast majority of the encyclopedia. So you're trying to minimize that difference by keeping it running on all the rest? If that's the case, then surely you only need to run the "control" group at the same frequency as the new tests rather than giving it disproportionate visibility.

On the other hand, what I think you're saying is that you want to preserve a consistent user experience during this period of testing AFTv5, so that we don't go from 100% of v4, to 0.3% of v5 (with the rest having nothing), and then to 100% v5. If this is the case, I find it a bit worrying that the current version of the tool - which has always been proposed as experimental - is now simply there as a placeholder awaiting improvement. Surely if we know that we're not using the current version any more, we should take it offline until the new one is ready. I would be very surprised if any members of the general public were confused by its absence, because I doubt that any of them are actually looking for the feedback tool when they visit an article. Quite the contrary, I think the public WOULD be confused if we told them that the big box at the bottom of every article is only there to "maintain a consistent interface" and we're not actually using the ratings data that the big box is asking them for.

I'm NOT making the argument that the AFT is inherently bad (in fact I'm really looking forward to v5 of the tool to see how much good-quality reader feedback we get, which will hopefully enliven a lot of very quiet talkpages). I'm also NOT making the argument that the WMF needs to seek some kind of mythical consensus for every single software change or new feature test. What I AM saying is that now that v4 has been deprecated, it is both disingenuous to our readers and annoying to our community to have a big box appear in such valuable real estate simply because it will eventually be replaced by a different, more useful, box. As you say, this replacement is "still quite some time away", so it's a long time to leave a placeholder on the world's 5th most visited website.

> 
> On a data front, because the AFT5 presence is only for tests, and is only
> temporary (at least at the moment) there's no question of AFT4 feedback
> being ignored; the actual replacement of AFT4 with AFT5 on a wider scale is
> still quite some time away, and until that happens, I hope any AFT4
> feedback will be taken into account.

What AFTv4 ratings have ever actually been used? I understand that data on HOW the tool has been used is providing input into the design of v5, which is fair enough. But has anyone actually been able to get useful data out of the ratings themselves - either on a per-article or whole-dataset basis? I think the software of the "article feedback dashboard" is very interesting and potentially quite a useful system http://en.wikipedia.org/wiki/Special:ArticleFeedback but, honestly, has any Wikipedian ever been able to make practical use of that information to improve articles? Personally, I make use of that tool to identify articles which are current targets for NPOV editing [e.g. Justin Bieber is currently the 6th highest rated article in the entire encyclopedia, whilst Hanukkah is the 4th lowest]. That is potentially useful information for vandal patrollers, but hardly the intended use of the whole system.

Sincerely,
-Liam