Is there a way to prevent Googlebot from indexing certain parts of a page? [closed]

9

13

Is it possible to fine-tune directives to Google to such an extent that it will ignore part of a page, yet still index the rest?

There are a couple of different issues we've come across which would be helped by this, such as:

  • RSS feed/news ticker-type text on a page displaying content from an external source
  • Users entering contact details (phone numbers, etc.) who want them visible on the site but would rather they not be Google-able

I'm aware that both of the above can be addressed via other techniques (such as writing the content with JavaScript), but am wondering if anyone knows if there's a cleaner option already available from Google?

I've been doing some digging on this and came across mentions of googleon and googleoff tags, but these seem to be exclusive to Google Search Appliances.

Does anyone know if there's a similar set of tags to which Googlebot will adhere?

Edit: Just to clarify, I don't want to go down the dangerous route of cloaking/serving up different content to Google, which is why I'm looking to see if there's a "legit" way of achieving what I'd like to do here.

Scleroprotein answered 30/9, 2009 at 11:5 Comment(1)
I voted to close this question because it is not a programming question and it is off-topic on Stack Overflow. Non-programming questions about your website should be asked on Webmasters. In this case the question has already been asked and answered there: Preventing robots from crawling specific part of a page – Headboard
10

What you're asking for can't really be done: Google either indexes the entire page or none of it.

You could resort to some sneaky tricks, though, such as putting the part of the page you don't want indexed into an iframe and using robots.txt to ask Google not to crawl the URL that the iframe loads.
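A minimal sketch of that workaround (the `/private/feed.html` path is a hypothetical example, not anything Google prescribes):

```html
<!-- Main page: everything here is indexed normally. -->
<p>Regular page content that should be searchable.</p>

<!-- The sensitive/external content lives at its own URL and is framed in. -->
<iframe src="/private/feed.html" title="News feed"></iframe>
```

Then in robots.txt, `Disallow: /private/` under `User-agent: *` asks crawlers to stay out of the framed URL. Be aware that a robots.txt block only prevents crawling; a blocked URL that is linked from elsewhere can still appear in results, so adding a `noindex` robots meta tag inside feed.html itself is the more reliable belt-and-braces approach.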

Cyrstalcyrus answered 30/9, 2009 at 11:17 Comment(0)
1

In short, NO - not unless you use cloaking, which is discouraged by Google.

Herring answered 30/9, 2009 at 11:8 Comment(0)
0

Please check out the official documentation here:

http://code.google.com/apis/searchappliance/documentation/46/admin_crawl/Preparing.html

Go to the section "Excluding Unwanted Text from the Index":

<!--googleoff: index-->
here will be skipped
<!--googleon: index-->
Phenetidine answered 23/12, 2011 at 12:35 Comment(1)
Sadly, this only applies to the Google Search Appliance, not to the public Google website. – Isopropyl
0

If the concern is parts of the page that you don't want appearing in the search result snippet, you can use the data-nosnippet attribute:

https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag#data-nosnippet-attr
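Per that documentation, Google honors `data-nosnippet` on `span`, `div`, and `section` elements. A small example (the phone number is illustrative):

```html
<p>
  Our office is open weekdays 9-5.
  <span data-nosnippet>Call us on 555-0100.</span>
</p>
```

Note that this only keeps the marked text out of snippets; the page, including that text, is still crawled and indexed.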

Virilism answered 18/7, 2024 at 14:42 Comment(0)
-1

At your server, detect the search bot by IP using PHP or ASP. Then serve the IP addresses on that list a version of the page you wish to be indexed. In that search-engine-friendly version of your page, use the canonical link tag to tell the search engine which page version you do not want indexed.

This way, the full page remains reachable by its address, while only the content you wish to be indexed gets indexed. This method will not get you blocked by the search engines and is completely safe.

Claudieclaudina answered 30/9, 2009 at 11:16 Comment(1)
As noted in a separate comment, this may cause your site to be removed from Google. – Isopropyl
-1

I found a useful resource for marking certain duplicate content so that search engines do not index it:

<p>This is normal (X)HTML content that will be indexed by Google.</p>

<!--googleoff: index-->

<p>This (X)HTML content will NOT be indexed by Google.</p>

<!--googleon: index-->
Yukoyukon answered 12/2, 2017 at 18:45 Comment(1)
Google web search doesn't support these HTML comments. Only the Google Search Appliance, which indexed internal company documents, ever supported these. – Headboard
-2

There are meta tags for bots, and there's also robots.txt, with which you can restrict access to certain directories.
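For reference, both mechanisms operate at the page or directory level, never on part of a page. The page-level meta tag form looks like this:

```html
<!-- In <head>: ask crawlers not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow">
```

The directory-level equivalent lives in robots.txt, e.g. `Disallow: /some-directory/` under a `User-agent:` line.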

Epicureanism answered 30/9, 2009 at 11:8 Comment(1)
Meta tags and robots.txt both allow or restrict access at the file level; I'm curious whether you can allow a page to be indexed but block a certain part of it. – Scleroprotein
-2

All search engines either index or ignore the entire page. The only possible way to implement what you want is to:

(a) have two different versions of the same page

(b) detect the visitor's user agent

(c) If it's a search engine, serve the second version of your page.

This link might prove helpful.

Derk answered 30/9, 2009 at 11:9 Comment(1)
Indeed (google.com/support/webmasters/bin/…): "Serving up different results based on user agent may cause your site to be perceived as deceptive and removed from the Google index." – Derk
-2

Yes, you can definitely stop Google from indexing some parts of your website by creating a custom robots.txt and listing the portions you don't want indexed, such as wp-admin or a particular post or page. Before creating it, check your site's existing robots.txt, for example at www.yoursite.com/robots.txt.
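A minimal robots.txt along those lines (the paths are examples only):

```
User-agent: *
Disallow: /wp-admin/
Disallow: /private-post/
```

Keep in mind this blocks crawling of whole URLs, not sections within a page.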

Propagandism answered 31/1, 2014 at 6:45 Comment(1)
robots.txt only works on whole pages. It can't be used for "part of a page" as requested in the question. – Headboard

© 2022 - 2025 — McMap. All rights reserved.