How to prevent search engines from indexing a single page of my website?

I don't want search engines to index my imprint page. How can I do that?

Slack asked 29/10, 2010 at 19:41
Comment: css-tricks.com/snippets/html/… (Snuffer)

You need a simple robots.txt file. Basically, it's a text file that tells search engine crawlers which pages they shouldn't crawl.
You don't need to include it in the header of your page; as long as it's in the root directory of your website, crawlers will pick it up.
Create it in the root folder of your website and put the following text in it:

User-Agent: *
Disallow: /imprint-page.htm

Replace /imprint-page.htm in the example with the actual path of the page (or directory) you want to keep crawlers away from.

That's it! If you want to get more advanced, there are many detailed robots.txt guides online, as well as free tools that will generate a robots.txt file for you.
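To sanity-check rules like these before uploading them, Python's standard-library urllib.robotparser can evaluate them the way a well-behaved crawler would. This is a minimal sketch: is_crawl_allowed and ROBOTS_TXT are hypothetical names, and it only tells you whether a compliant crawler may fetch a path, which is not the same as keeping the page out of the index.

```python
from urllib.robotparser import RobotFileParser

# The rules from the answer above (ROBOTS_TXT is a hypothetical name).
ROBOTS_TXT = """\
User-Agent: *
Disallow: /imprint-page.htm
"""


def is_crawl_allowed(robots_txt: str, path: str, user_agent: str = "*") -> bool:
    """Return True if a crawler honoring robots.txt may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, path)


print(is_crawl_allowed(ROBOTS_TXT, "/imprint-page.htm"))  # False
print(is_crawl_allowed(ROBOTS_TXT, "/index.html"))        # True
```

Because Disallow values are matched as prefixes, this rule would also block /imprint-page.html.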

Reading answered 29/10, 2010 at 19:42
Thanks Sam! Added your link next to the other tutorial. (Reading)
Thanks a lot! Must I include robots.txt somewhere in the header, or is it enough to just drop it into the root of the website? (Slack)
Nope, you don't need to include it in a header; it's enough to just put it in your root directory. (Reading)
According to this blog article, beussery.com/blog/index.php/2014/06/robots-txt-disallow-20, the information in this post is not correct: robots.txt prevents search engines from crawling the page, but they can still index it. The best solution is to use the robots meta tag; see the answers below. (Libeler)
Downvoted: you said "You need a robots.txt", but other answers clearly show that a robots.txt isn't necessary. (Chap)

You can also add the following meta tag to the <head> of that page:

<meta name="robots" content="noindex,nofollow" />
Enthronement answered 29/10, 2010 at 19:55
This is a better solution than using robots.txt. If you block a page with robots.txt, search engines won't even visit it. If there are links pointing to the page, they won't remove it from the index, because you haven't told them to. Google will show the page without a description, because it knows the page exists but doesn't know what's on it. The only way to explicitly remove it from the index is to tell the engines, with the 'noindex' directive, that you don't want it displayed at all. (Irenairene)
This is a bit of a problem (it takes much more coding time) if the head is included dynamically by a server-side language like PHP and is the same for all pages. (Zarzuela)
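When the <head> is shared by every page, the template can add the tag conditionally instead of editing each page. A minimal sketch in Python rather than PHP, assuming a hypothetical render_head helper that receives the request path and a hypothetical NOINDEX_PATHS set; the same idea carries over to a PHP include.

```python
from html import escape

# Hypothetical set of paths that should not be indexed.
NOINDEX_PATHS = {"/imprint-page.htm"}


def render_head(path: str, title: str) -> str:
    """Render the shared <head>, adding a robots meta tag only for listed paths."""
    lines = ["<head>", f"  <title>{escape(title)}</title>"]
    if path in NOINDEX_PATHS:
        lines.append('  <meta name="robots" content="noindex,nofollow" />')
    lines.append("</head>")
    return "\n".join(lines)


print(render_head("/imprint-page.htm", "Imprint"))
```

Every other page keeps the plain shared head, so only one list needs maintaining.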

You can set up a robots.txt file to ask search engines not to crawl certain directories or files.

Basically:

User-agent: *
Disallow: /[directory or file here]
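If there are several directories or files to exclude, the file can be generated from a list. A minimal sketch in Python; build_robots_txt is a hypothetical helper, not part of any library. Each Disallow value is matched as a prefix, so a directory entry such as /private/ also covers everything beneath it.

```python
def build_robots_txt(blocked_paths: list[str], user_agent: str = "*") -> str:
    """Build a robots.txt body that disallows each given directory or file."""
    lines = [f"User-agent: {user_agent}"]
    for path in blocked_paths:
        # Paths are matched from the site root, so make sure each starts with "/".
        lines.append(f"Disallow: {path if path.startswith('/') else '/' + path}")
    return "\n".join(lines) + "\n"


print(build_robots_txt(["private/", "/imprint-page.htm"]))
```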
Brande answered 29/10, 2010 at 19:45

Nowadays, the best method is to use a robots meta tag and set it to noindex,follow:

<meta name="robots" content="noindex, follow">
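To confirm which directives a page actually carries, you can parse its HTML. A minimal sketch using Python's standard html.parser module; RobotsMetaFinder and robots_directives are hypothetical names, and the sketch only inspects the markup it is given.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "")
            # "noindex, follow" -> ["noindex", "follow"]
            self.directives.extend(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html_text: str) -> list[str]:
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    finder.close()
    return finder.directives


print(robots_directives('<head><meta name="robots" content="noindex, follow"></head>'))
```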
Guidry answered 12/8, 2014 at 18:45
<meta name="robots" content="noindex, nofollow">

Just include this line in the <head> of your HTML page. I recommend this because if you use a robots.txt file to hide URLs, such as login pages or other protected URLs that you don't want other people or search engines to see, that list is public.

I could simply open the robots.txt file on your website directly and see which URLs you consider secret. What would be the point of the robots.txt file then?

The better approach is to include the meta tag above, which doesn't advertise the page's URL in a publicly readable file.
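The exposure point is easy to demonstrate: robots.txt is a public plain-text file, so listing its Disallow paths takes a few lines. A minimal sketch; disallowed_paths and the sample paths are hypothetical, and it deliberately ignores which user agent each rule belongs to.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every path named in a Disallow line, as any visitor could."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt that tries to "hide" a login page.
sample = """\
User-agent: *
Disallow: /admin/login.php
Disallow: /secret-reports/
"""
print(disallowed_paths(sample))  # ['/admin/login.php', '/secret-reports/']
```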

Bans answered 10/11, 2016 at 6:36

Create a robots.txt file and put the rules there.

Here is Google's documentation: http://code.google.com/web/controlcrawlindex/docs/robots_txt.html

Kale answered 8/12, 2011 at 16:57

Suppose a robot wants to visit a URL such as http://www.example.com/welcome.html. Before it does so, it first checks http://www.example.com/robots.txt. In that file you can explicitly disallow a page:

User-agent: *
Disallow: /~joe/junk.html
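The check the robot performs can be sketched with Python's urllib.robotparser. This is a sketch under stated assumptions: instead of downloading robots.txt (which RobotFileParser.read() would do), it feeds in the two rules above directly, and robots_url_for is a hypothetical helper.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """robots.txt lives at the root of the host that serves the page."""
    return urljoin(page_url, "/robots.txt")


# Rules the robot would find at http://www.example.com/robots.txt
# (passed in directly here instead of being downloaded).
RULES = ["User-agent: *", "Disallow: /~joe/junk.html"]

parser = RobotFileParser(robots_url_for("http://www.example.com/welcome.html"))
parser.parse(RULES)

print(parser.url)  # http://www.example.com/robots.txt
print(parser.can_fetch("*", "http://www.example.com/welcome.html"))    # True
print(parser.can_fetch("*", "http://www.example.com/~joe/junk.html"))  # False
```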

For more details, see the robots.txt documentation.

Reproachless answered 30/1, 2017 at 10:57
