<noindex> tag for Google

I would like to tell Google not to index certain parts of the page. In Yandex (a Russian search engine) there's a very useful tag called <noindex>. How can this be done with Google?

Puncture answered 28/3, 2013 at 15:2 Comment(0)

You can prevent Google from seeing portions of the page by putting those portions in iframes that are blocked by robots.txt.

robots.txt

Disallow: /iframes/

index.html

This text is crawlable, but now you'll see 
text that search engines can't see:
<iframe src="/iframes/hidden.html" width="100%" height="300" scrolling="no"></iframe>

/iframes/hidden.html

Search engines cannot see this text.

Instead of using iframes, you could load the contents of the hidden file via AJAX. Here is an example that uses jQuery's $.get to do so:

This text is crawlable, but now you'll see 
text that search engines can't see:
<div id="hidden"></div>
<script>
    $.get("/iframes/hidden.html", function (data) {
        $('#hidden').html(data);
    });
</script>
Samualsamuel answered 30/3, 2013 at 11:56 Comment(2)
Note that the AJAX part is no longer correct. Most search engines evaluate JavaScript and execute XHR calls.Erenow
If the URL that is loaded by Ajax is disallowed by robots.txt, search engines still won't be able to see it even if they execute JavaScript in general.Samualsamuel

According to Wikipedia [1], there are some rules some spiders follow:

<!--googleoff: all-->
This should not be indexed by Google. Though its main spider, Googlebot,
might ignore that hint.
<!--googleon: all-->

<div class="robots-nocontent">Yahoo bots won't index this.</div>

<noindex>Yandex bots ignore this text.</noindex>
<!--noindex-->They will ignore this, too.<!--/noindex-->

Unfortunately, they could not agree on a single standard it seems – and to my knowledge, there's nothing to keep all spiders off...

The googleoff: comment seems to support different options, though I'm not sure whether there's a complete list. There's at least:

  • all: completely ignore the block
  • index: content doesn't go into Google's index
  • anchor: anchor text for links will not be associated with the target page
  • snippet: text will not be used to create snippets for search results

Note as well that (at least for Google) this will only affect the search index, not the page ranking etc. Furthermore, as Stephen Ostermiller correctly pointed out in his comment below, googleon and googleoff only work with the Google Search Appliance and have no effect on the normal Googlebot, unfortunately.

There's also an article on the Yahoo part [2] (and an article describing that Yandex also honors <noindex> [6]). On the googleoff: part, also see this answer, and the article I took most of the related information from [3].


Additionally, Google Webmaster Tools recommends using the rel=nofollow attribute [4] for specific links (e.g. ads, or links to pages not accessible/useful to the bots, such as login/signup pages). That means the HTML rel attribute on <a> elements should be honored by the Google bots – though that's mainly related to page rank, not to the search index itself. Unfortunately, it seems there's no rel=noindex [5][7]. I'm also not sure if this attribute could be used for other elements as well (e.g. <DIV REL="noindex">); but unless crawlers honor "noindex", that wouldn't make sense either.
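For illustration, a hypothetical sponsored link marked this way (the URL is a placeholder):

```html
<!-- "nofollow" asks crawlers not to pass ranking credit to the target -->
<a href="https://example.com/sponsor" rel="nofollow">Sponsored link</a>
```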


Further references:


[1] Wikipedia: Noindex
[2] Which Sections of Your Web Pages Might Search Engines Ignore?
[3] Tell Google to Not Index Certain Parts of Your Page
[4] Use rel="nofollow" for specific links
[5] Is it a good idea to use <a href="http://name.com" rel="noindex, nofollow">name</a>?
[6] Using HTML tags — Yandex.Help. Webmaster
[7] existing REL values

Intercollegiate answered 10/4, 2014 at 0:15 Comment(2)
googleoff and googleon only work with the Google search appliance and have no effect on normal GooglebotSamualsamuel
@StephenOstermiller true, I figured that meanwhile as well. Thanks for pointing out, I completely forgot to update that here!Intercollegiate

No, Google does not support the <noindex> tag. Virtually no one does.

Premium answered 28/3, 2013 at 15:3 Comment(2)
Not in any way that Google approves of: webmasters.stackexchange.com/questions/16390/…Premium
"Virtually no one" includes at least Yandex, see my answer. But who really cares about that one is another question.Intercollegiate

I had the same issue; the solution is to use data-nosnippet.

<p><span data-nosnippet>This text won't show in google results</span></p>
Doubleripper answered 5/6, 2022 at 16:8 Comment(1)
data-nosnippet doesn't stop Google from indexing the content, it only means that they won't use that content for the snippet in their search results. Google will still index that content and use it when evaluating the page ranking, and they will still display it in their cached version of the page for everyone to see.Bonnee

Create a robots.txt file at your root level and insert something like the following:

Block Google:

User-agent: Googlebot
Disallow: /myDisallowedDir1/
Disallow: /myDisallowedPage.html
Disallow: /myDisallowedDir2/

Block all bots:

User-agent: *
Disallow: /myDisallowedDir1/
Disallow: /myDisallowedPage.html
Disallow: /myDisallowedDir2/
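These rules can be sanity-checked locally with Python's standard-library robotparser; a small sketch (the directory and file names are the hypothetical ones from this answer):

```python
from urllib import robotparser

# The example rules from above, parsed directly instead of fetched over HTTP.
rules = """\
User-agent: *
Disallow: /myDisallowedDir1/
Disallow: /myDisallowedPage.html
Disallow: /myDisallowedDir2/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Matching is by path prefix: anything under a disallowed directory is blocked.
print(rp.can_fetch("Googlebot", "/myDisallowedDir1/secret.html"))  # False
print(rp.can_fetch("Googlebot", "/public.html"))                   # True
```

Note that robots.txt only controls crawling, not indexing: a blocked URL can still show up in search results if other pages link to it.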

A handy robots.txt generator:

http://www.mcanerin.com/EN/search-engine/robots-txt.asp

Corrianne answered 30/3, 2013 at 16:59 Comment(2)
teslasimus doesn't want to block the whole page, only "certain parts".Kingwood
good point, my answer can be used along with the iframe solution proposed aboveCorrianne

© 2022 - 2024 — McMap. All rights reserved.