Look at the robots.txt of this site:
The content is:
User-Agent: Googlebot
Disallow: /
That ought to tell google not to index the site, no?
If true, why does the site appear in google searches?
Look at the robots.txt of this site:
The content is:
User-Agent: Googlebot
Disallow: /
That ought to tell google not to index the site, no?
If true, why does the site appear in google searches?
Besides having to wait, because Google's index updates take some time, also note that if you have other sites linking to your site, robots.txt alone won't be sufficient to remove your site.
Quoting Google's support page "Remove a page or site from Google's search results":
If the page still exists but you don't want it to appear in search results, use robots.txt to prevent Google from crawling it. Note that in general, even if a URL is disallowed by robots.txt we may still index the page if we find its URL on another site. However, Google won't index the page if it's blocked in robots.txt and there's an active removal request for the page.
One possible alternative solution is also mentioned in above document:
Alternatively, you can use a noindex meta tag. When we see this tag on a page, Google will completely drop the page from our search results, even if other pages link to it. This is a good solution if you don't have direct access to the site server. (You will need to be able to edit the HTML source of the page).
If you just added this, then you'll have to wait - it's not instantaenous - until Googlebot comes back to respider the site and sees the robots.txt, the site'll still be in their database.
I doubt it's relevant, but you might want to change your "Agent" to "agent" - Google's most likely not case sensitive for this, but can't hurt to follow the standard exactly.
I can confirm Google doesn't respect the Robots Exclusion File. Here's my file, which I created before putting this origin online:
https://git.habd.as/robots.txt
And the full contents of the file:
User-agent: *
Disallow:
User-agent: Google
Disallow: /
And Google still indexed it.
I don't use Google after cancelling my account last March and never had this site added to a webmaster console outside Yandex which leaves me with two assumptions:
I haven't grepped my logs yet but I will and my assumption is I'll find Google spiders in there misbehaving.
© 2022 - 2024 — McMap. All rights reserved.