How to prevent search engines from indexing a single page of my website?

I don't want the search engines to index my imprint page. How could I do that?

Slack asked 29/10, 2010 at 19:41

css-tricks.com/snippets/html/… – Snuffer 14/3, 2015 at 16:40

You need a simple robots.txt file. Basically, it's a text file that tells search engine crawlers which pages they should not crawl.
You don't need to include it in the header of your page; as long as it's in the root directory of your website it will be picked up by crawlers.
Create it in the root folder of your website and put the following text in:

User-Agent: *
Disallow: /imprint-page.htm

Note that you'd replace imprint-page.htm in the example with the actual name of the page (or the directory) that you wish to keep from being indexed.
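A quick way to sanity-check such a rule before deploying it is Python's standard-library `urllib.robotparser`, which applies Disallow matching the way a well-behaved crawler would. This is a minimal sketch assuming the file contains exactly the two lines above and that the site lives at the placeholder host `www.example.com`; real crawlers such as Googlebot use their own parsers and may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from the answer above (assumed to be the whole file).
ROBOTS_TXT = """\
User-Agent: *
Disallow: /imprint-page.htm
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# www.example.com is a placeholder host used only for illustration.
imprint_allowed = parser.can_fetch("*", "https://www.example.com/imprint-page.htm")
home_allowed = parser.can_fetch("*", "https://www.example.com/index.htm")

print("imprint page crawlable:", imprint_allowed)  # False
print("home page crawlable:", home_allowed)        # True
```

Keep in mind this only tells you whether the page may be crawled, not whether it can still appear in the index.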

That's it! If you want to get more advanced, you can check out here, here, or here for a lot more info. Also, you can find free tools online that will generate a robots.txt file for you (for example, here).

Reading answered 29/10, 2010 at 19:42

Thanks Sam! Added your link next to the other tutorial. – Reading 29/10, 2010 at 19:50

Thanks a lot! Must I include robots.txt somewhere in the header? Or is it enough to just drop it into the root of the website? – Slack 29/10, 2010 at 19:53

Nope, you don't need to include it in a header; it's enough to just put it in your root directory. – Reading 29/10, 2010 at 20:1

According to this blog article: beussery.com/blog/index.php/2014/06/robots-txt-disallow-20, the information in this post is not correct. A robots.txt file prevents search engines from crawling the page, but they can still index it. The best solution is to use the robots meta tag. See the answers below. – Libeler 15/1, 2016 at 13:55

DV: you said "You need a robots.txt", but other answers clearly show that a robots.txt isn't necessary. – Chap 16/6, 2018 at 8:12

You can also add the following meta tag to the <head> of that page:

<meta name="robots" content="noindex,nofollow" />
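To confirm the tag actually ends up in the served HTML (for example after templating), a small check built on Python's standard `html.parser` can look for a robots meta tag whose content includes `noindex`. This is a minimal sketch; the `has_noindex` helper and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here, because
        # HTMLParser.handle_startendtag calls handle_starttag by default.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for directive in content.split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Returns True if the document carries a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Hypothetical imprint page using the tag from the answer above.
page = """<html><head>
<meta name="robots" content="noindex,nofollow" />
<title>Imprint</title>
</head><body>...</body></html>"""

print(has_noindex(page))  # True
```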

Enthronement answered 29/10, 2010 at 19:55

This is a better solution than using robots.txt. The reason is that if you block a page with robots.txt, search engines won't even visit it. If there are links pointing to the page, they won't remove it from the index, because you haven't told them to. Google will show the page without a description, because it knows the page exists but doesn't know what's on it. The only way to explicitly remove it from the index is to tell the engines that you don't want it displayed at all, using the 'noindex' directive. – Irenairene 2/11, 2010 at 22:52

This is a bit of a problem (it takes more coding time) if the <head> is included dynamically by a server-side language like PHP and is the same for all pages. – Zarzuela 12/6, 2015 at 21:48
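One way around a shared, server-generated <head> is to emit the tag conditionally, based on the requested path. The comment mentions PHP; this is the same idea sketched in Python, with a hypothetical `NOINDEX_PATHS` set and `render_head` function standing in for whatever your shared include does.

```python
from html import escape

# Hypothetical set of paths whose pages must not be indexed.
NOINDEX_PATHS = {"/imprint-page.htm"}

ROBOTS_META = '<meta name="robots" content="noindex,nofollow" />'


def render_head(path, title):
    """Builds the shared <head> markup, adding the robots tag only for NOINDEX_PATHS."""
    parts = ["<head>", f"<title>{escape(title)}</title>"]
    if path in NOINDEX_PATHS:
        parts.append(ROBOTS_META)
    parts.append("</head>")
    return "\n".join(parts)


# The same include serves every page, but only the imprint gets noindex.
print(render_head("/imprint-page.htm", "Imprint"))
print(render_head("/", "Home"))
```

Keeping the list of excluded paths in one place means the shared head stays identical for every other page.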

You can set up a robots.txt file to try to tell search engines to ignore certain directories.

See here for more info.

Basically:

User-agent: *
Disallow: /[directory or file here]
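Note that a Disallow value is matched as a path prefix, so its exact spelling matters. This minimal sketch uses Python's `urllib.robotparser`, which implements the simple prefix matching of the original robots.txt convention, with a hypothetical `/private` directory to show the difference a trailing slash makes. Google and other major crawlers support extensions (such as wildcards) that this parser does not implement.

```python
from urllib.robotparser import RobotFileParser


def blocked(rules, path):
    """Returns True if `path` is disallowed for all user agents by `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # The host is a placeholder; only the path is compared against the rules.
    return not parser.can_fetch("*", "https://www.example.com" + path)


without_slash = "User-agent: *\nDisallow: /private\n"
with_slash = "User-agent: *\nDisallow: /private/\n"

# "/private" is a prefix of "/private-notes.html", so that page is blocked too.
print(blocked(without_slash, "/private/page.html"))   # True
print(blocked(without_slash, "/private-notes.html"))  # True

# With the trailing slash, only the directory's contents are blocked.
print(blocked(with_slash, "/private/page.html"))      # True
print(blocked(with_slash, "/private-notes.html"))     # False
```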

Brande answered 29/10, 2010 at 19:45

Nowadays, the best method is to use a robots meta tag and set it to noindex,follow:

<meta name="robots" content="noindex, follow">

Guidry answered 12/8, 2014 at 18:45

<meta name="robots" content="noindex, nofollow">

Just include this line in the <head> section of your page. I'm suggesting this because you might use a robots.txt file to hide URLs, such as login pages or other protected pages, that you don't want to show to other people or to search engines.

But anyone can simply open the robots.txt file on your website and see which URLs you consider secret. So what is the point of listing them there?

The better way is to include the meta tag above, which keeps those pages out of search results without publishing a list of them.
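The exposure problem is easy to demonstrate: robots.txt is a public file, and extracting its Disallow entries takes only a few lines. This is a minimal sketch working on an in-memory copy with hypothetical paths; in practice anyone could first download the file from `/robots.txt` on your site.

```python
def disallowed_paths(robots_txt):
    """Lists every non-empty Disallow value found in a robots.txt file."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt that tries to "hide" sensitive URLs.
robots_txt = """User-agent: *
Disallow: /admin-login.php
Disallow: /secret-reports/  # internal only
"""

print(disallowed_paths(robots_txt))  # ['/admin-login.php', '/secret-reports/']
```

Neither robots.txt nor the meta tag is access control: both are only requests to well-behaved crawlers, so pages that must stay private still need authentication.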

Bans answered 10/11, 2016 at 6:36

Create a robots.txt file and set the controls there.

Here is Google's documentation: http://code.google.com/web/controlcrawlindex/docs/robots_txt.html

Kale answered 8/12, 2011 at 16:57

Suppose a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks http://www.example.com/robots.txt. In that file you can explicitly disallow a page:

User-agent: *
Disallow: /~joe/junk.html
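The lookup described above (derive the site's /robots.txt from the page URL, then check the page against it) can be sketched with Python's standard library. This is a minimal, offline sketch: instead of downloading robots.txt it parses the example rules directly, and the helper name `robots_url_for` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url):
    """Returns the robots.txt URL a crawler would check before fetching page_url."""
    parts = urlsplit(page_url)
    # robots.txt lives at the root of the same scheme and host.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


page = "http://www.example.com/welcome.html"
print(robots_url_for(page))  # http://www.example.com/robots.txt

# A real crawler would call parser.set_url(...) and parser.read() here;
# this sketch parses the example rules from the answer instead.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /~joe/junk.html"])

print(parser.can_fetch("*", page))                                     # True
print(parser.can_fetch("*", "http://www.example.com/~joe/junk.html"))  # False
```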

Please visit the link below for details on robots.txt.

Reproachless answered 30/1, 2017 at 10:57
