How to tell Googlebot to skip part of the HTML?
A

7

8

There is a lot of information about the opposite situation, where people want content in the HTML that is visible to Googlebot but not to users. In my case I need the reverse: to hide part of the HTML from Googlebot. How can I do that?

Flash is not an answer.
I would also prefer not to use fancy AJAX (mainly because I need the content right away, not on document ready).
robots.txt is not an answer either, because it works on URLs, not on parts of a page. Would some special CSS or simple JavaScript work, or is there a special HTML tag for this?

Abby answered 11/1, 2012 at 14:49 Comment(7)
Why do you want to hide information from Googlebot? What is the purpose of doing so? There may be other ways to achieve the goal than hiding HTML. - Jasmin
Uhm... I'm pretty sure Googlebot doesn't know how to read images... - Aleishaalejandra
Hmm, what about using an iframe? You can create a separate HTML file, exclude it in robots.txt, and load that HTML in an iframe on your page. - Cohl
To be honest, I don't know why my client wants this. He has an SEO consultant, and I think it was his idea. If it really matters why, I could ask (the problem is the 8-hour time difference between me and the client), but until then it would be nice to know the alternatives. - Abby
Images are not an option, as the content is dynamic (it is actually part of the navigation). Iframes: I had that idea, but I hate them, so let's leave that as an escape option. - Abby
@Abby - beware the SEO consultant who wants to show one set of content to Googlebot and another to users. This is cloaking, plain and simple. I understand that you want to prepare alternatives, but don't implement them until you know the reason; it could be very bad for your client. - Dumbwaiter
I'm using the :after pseudo-element in my CSS to add some text (this doesn't work with HTML alone, of course). Example CSS: h1:after { display: block; content: attr( data-note ); margin: 0 0 20px 0; color: #af0000; font-weight: bold; } Example HTML: <h1 data-note='This example works only on centos Linux'>Tutorial about something</h1>. I think this can't be indexed by Google, unless Google gets even crazier than it already is :P - Triphibious
O
9

Maybe a base64 encoding server side and then decoding on the client side could work?

Code:

<!-- visible to Google -->
<p> Hi, Google Bot! </p>

<!-- not visible from here on -->
<script type="text/javascript">
document.write(atob("<?php echo base64_encode('<b>hey there, user</b>'); ?>"));
</script>

How it looks to the bot:

<!-- visible to Google -->
<p> Hi, Google Bot! </p>

<!-- not visible from here on -->
<script type="text/javascript">
document.write(atob("PGI+aGV5IHRoZXJlLCB1c2VyPC9iPg=="));
</script>
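The client-side half of this idea can be sketched as a small decoding helper. This is only an illustration, not the answerer's exact code: the function name decodeHiddenHtml is made up here, and it assumes the server emitted the Base64 of UTF-8 encoded HTML (which is what PHP's base64_encode produces for a UTF-8 string). The browser built-in atob returns one character per byte, so the bytes are passed through TextDecoder to recover non-ASCII text correctly.

```javascript
// Hypothetical helper: decode a Base64 string that carries UTF-8 encoded HTML.
function decodeHiddenHtml(base64) {
  // atob() returns a "binary string": one character per decoded byte.
  const binary = atob(base64);
  // Turn that string into real bytes, then read the bytes as UTF-8.
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

// The literal below is the Base64 form of '<b>hey there, user</b>'.
const html = decodeHiddenHtml("PGI+aGV5IHRoZXJlLCB1c2VyPC9iPg==");
console.log(html); // <b>hey there, user</b>
```

In a page, the returned string would then be handed to document.write or assigned to an element's innerHTML, as in the snippet above.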
Oppose answered 11/1, 2012 at 14:55 Comment(3)
Good idea. For large amounts of HTML you can use AJAX. If you have jQuery or almost any other library, it's really easy; without a library it's still easy to implement, and it may be a better solution than Base64 because you don't have to re-encode your HTML each time you change it. - Recurrent
This solution does not work, because Googlebot is able to parse and execute JavaScript nowadays. - Repatriate
@modiX it wasn't the best solution even back then; it's kind of hacky. But the OP's client wanted it this way. Still, if Google is executing JS and indexing the results, CSS can be used via display: none. AFAIK Google respects hidden elements and doesn't index their contents. - Oppose
M
5

Create a div and load its content via AJAX from an HTML file that resides in a directory blocked by robots.txt. Example: /index.html

Somewhere in the header (see http://api.jquery.com/jQuery.ajax/):

$.ajax({
  url: '/hiddendirfrombots/test.html',
  success: function(data) {
    $('#hiddenfrombots').html(data);
  }
});

... somewhere in the body

<div id="hiddenfrombots"></div>

Create a directory "hiddendirfrombots" and put the following in the robots.txt file at the site root:

User-agent: *
Disallow: /hiddendirfrombots/
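
To see why the directory name in the Disallow rule has to match the directory in the AJAX URL, here is a deliberately simplified sketch of how a crawler might test a path against Disallow rules. It assumes plain prefix matching only; real robots.txt parsers also handle Allow rules, wildcards, and per-agent groups, and the function name isDisallowed is made up for this illustration.

```javascript
// Simplified robots.txt check: a path is blocked when it starts with any
// non-empty Disallow prefix. No wildcards, no Allow rules, no agent groups.
function isDisallowed(path, disallowRules) {
  return disallowRules.some((rule) => rule !== "" && path.startsWith(rule));
}

const rules = ["/hiddendirfrombots/"];

console.log(isDisallowed("/hiddendirfrombots/test.html", rules)); // true
console.log(isDisallowed("/index.html", rules)); // false
```

A rule naming a different directory than the one the AJAX call requests would leave the fragment crawlable.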
Magnificence answered 11/1, 2012 at 15:45 Comment(2)
Now that Googlebot understands JavaScript and AJAX, does it also check robots.txt for URLs called via AJAX? - Mandrill
Yes, it does! If you put the URLs to be called via AJAX in a specific folder, you can just tell Googlebot to ignore that folder. - Magnificence
F
4

This should do the trick:

<!--googleoff: index-->
<p>hide me!</p>
<!--googleon: index-->

For more information, see Google's page that describes it in more depth:

Excluding Unwanted Text from the Index

Fabri answered 26/11, 2015 at 1:29 Comment(1)
This works only for the Google Search Appliance. - Laurettalaurette
C
2

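If you can use PHP, output the content only when the visitor is not Googlebot:

// Output the block only when the user agent does not identify as Googlebot
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (stripos($userAgent, 'googlebot') === false) {
    echo $div;
}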

That's how I could solve this issue.
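
For a server that runs JavaScript instead of PHP, the same check might look like the sketch below. The function name isGooglebot is made up for this illustration, and the approach carries the same limitation as the PHP version: the User-Agent header is supplied by the client, so it can be missing or spoofed.

```javascript
// Hypothetical helper: case-insensitive substring test on the User-Agent header.
// Returns false when the header is missing or not a string.
function isGooglebot(userAgent) {
  return typeof userAgent === "string" &&
    userAgent.toLowerCase().includes("googlebot");
}

const botUa = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
console.log(isGooglebot(botUa)); // true
console.log(isGooglebot(undefined)); // false
```

The server would then emit the block only when isGooglebot returns false for the request's User-Agent header.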

Cornemuse answered 24/7, 2013 at 6:44 Comment(0)
U
0
  • Load your content via an AJAX call.
  • Put the function that makes the AJAX call in a separate JS file (e.g. noGoogleBot.js):

    $.ajax({
      url: 'anything.html',
      success: function(data) {
        $('#anywhere').html(data);
      }
    });
    

Then in your robots.txt:

User-agent: *
Disallow: /noGoogleBot.js

This way, every div loaded by the function in noGoogleBot.js is blocked: Googlebot (or any other crawler that honors robots.txt) will not fetch noGoogleBot.js.

Unalterable answered 8/6, 2015 at 10:36 Comment(0)
M
0

Per Google docs

<p>This text can be shown in a snippet
<span data-nosnippet>and this part would not be shown</span>.</p>

Adding data-nosnippet to a div, span, or section element prevents that content from being shown in the search result snippet.

Miniver answered 10/2, 2023 at 11:59 Comment(0)
A
-2

Simple: create an image with the text you don't want Google to see.

Antithesis answered 11/1, 2012 at 14:55 Comment(2)
Downvote: Google does have sophisticated OCR capabilities for images. - Corregidor
AFAIK only for PDF documents, or do you have a reference? - Antithesis
