Update:
I have been checking on the indexed link count over the last couple of
months, and it has been roughly constant. Upon another check in the past
week, it looked like it was time to go ahead with the robots.txt update.
Just yesterday, robots.txt was updated to instruct all robots, such as
Googlebot, not to index <lang>.zero.wikipedia.org. It looks like even more
<lang>.zero.wikipedia.org pages may already be starting to fall out of the
index.
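
For anyone who wants to check what a blanket rule like this means for a given crawler, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt body below is an assumption based on the "Disallow: /" suggestion quoted further down the thread, not a copy of the deployed file, and is_crawl_allowed is a hypothetical helper name. Strictly speaking, robots.txt controls crawling rather than indexing, so the sketch only answers whether a robot may fetch a URL.

```python
from urllib.robotparser import RobotFileParser

# Assumed content, following the "Disallow: /" suggestion in this thread;
# the file actually deployed may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://en.zero.wikipedia.org/wiki/Muse_%28band%29"
    print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", url))
```

Because the rule is listed under "User-agent: *", it applies to every robot that does not have a more specific entry of its own.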
Thanks for flagging this! Will keep watching the indexed links count as it
dwindles.
Thanks again.
-Adam
On Wed, Jun 26, 2013 at 10:57 AM, Adam Baso <abaso(a)wikimedia.org> wrote:
Okay, looks like the index of zero.wikipedia.org pages in Google has
shrunk by some 20 million entries. Nonetheless, a number of really old
pages (e.g., going back to 6-May-2013) are still in the Google index with
article text. I'll set a reminder to check on the Google index again in 30
days, and hopefully then we can finally put the no-index rules in place at
that time.
The good news is that many of the pages are now correctly suppressed in
natural search as non-canonical pages. In other words, a user would need to
go through omitted results or do a site:<domain> search to see them.
-Adam
On Tue, Jun 18, 2013 at 3:33 PM, Adam Baso <abaso(a)wikimedia.org> wrote:
Update:
We've added an enhancement to Wikipedia Zero so that if a user who isn't
on a participating carrier network navigates to a Wikipedia Zero page on
<language>.zero.wikipedia.org, such as
http://en.zero.wikipedia.org/wiki/Muse_%28band%29 , the user will be
presented an option to visit the canonical URL of the article. If clicked,
the canonical URL should get the user to the mobile or desktop version of
the page, based on device type.
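
As a rough illustration only: the sketch below guesses a canonical URL by dropping the "zero" label from the host name. That mapping is an assumption made for this example, not necessarily how the enhancement derives the canonical URL, and guess_canonical_url is a hypothetical name. It also does not model the device-based choice between the mobile and desktop versions, which happens after the user follows the link.

```python
from urllib.parse import urlsplit, urlunsplit


def guess_canonical_url(zero_url: str) -> str:
    """Guess the canonical article URL for a Wikipedia Zero URL.

    Assumption: <language>.zero.wikipedia.org maps to
    <language>.wikipedia.org with the same path. Hypothetical helper.
    """
    parts = urlsplit(zero_url)
    labels = parts.netloc.split(".")
    # Only rewrite hosts shaped like <language>.zero.wikipedia.org.
    if len(labels) == 4 and labels[1:] == ["zero", "wikipedia", "org"]:
        host = ".".join([labels[0]] + labels[2:])
        return urlunsplit(
            (parts.scheme, host, parts.path, parts.query, parts.fragment)
        )
    # Anything else is returned unchanged.
    return zero_url


if __name__ == "__main__":
    print(guess_canonical_url("http://en.zero.wikipedia.org/wiki/Muse_%28band%29"))
```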
We're hoping that by next week the Google index will be refreshed so as
to correctly mark the <language>.zero.wikipedia.org pages as duplicate
pages in the omitted section. Upon confirmation of as much, the current
plan is to introduce https://gerrit.wikimedia.org/r/#/c/69420/ to
prevent indexing of <language>.zero.wikipedia.org altogether.
On Tue, May 21, 2013 at 11:26 AM, Adam Baso <abaso(a)wikimedia.org> wrote:
Kasper, after further analysis, it appears the patch introduced last
week should clear this up within about 35 days. I'll set a reminder to
validate as much.
On Mon, May 20, 2013 at 2:29 PM, Adam Baso <abaso(a)wikimedia.org> wrote:
Kasper, we're looking at a patch for this.
Thanks!
On Mon, May 20, 2013 at 1:02 PM, Kasper Souren <kasper(a)guaka.org> wrote:
> "Disallow: /" in http://*.zero.wikipedia.org/robots.txt would suffice.
>
> _______________________________________________
> Mobile-l mailing list
> Mobile-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/mobile-l