Can I prevent search engines from indexing an entire directory on my website?
A

6

16

I have a staging site which I use to draft new features, changes and content to my actual website.

I don't want this to get indexed, but I'm hoping for a solution a little easier than having to add the below to every page on my site:

<meta name="robots" content="noindex, nofollow">

Can I do this in a way similar to how I added a password to the domain using a .htaccess file?

Alla answered 29/1, 2012 at 9:7 Comment(0)
C
30

The robots.txt standard is meant for this. Example:

User-agent: *
Disallow: /protected-directory/

Well-behaved search engines will obey this, but the content is still publicly reachable (and arguably easier to discover, since robots.txt itself lists the URL), so password protection via .htaccess is an option, too.
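
To check that rules like the ones above do what you expect before relying on them, they can be fed to Python's standard-library urllib.robotparser. This is only a rough sketch: the host staging.example.com, the user agent name ExampleBot, and the helper is_allowed are made up for illustration, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /protected-directory/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if a crawler honouring robots_txt may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines; no network access needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs on a staging host.
inside = is_allowed(ROBOTS_TXT, "https://staging.example.com/protected-directory/page.html")
outside = is_allowed(ROBOTS_TXT, "https://staging.example.com/index.html")
print(inside, outside)
```

This only tells you what a compliant crawler would do. It does nothing to stop a person or a badly behaved bot from requesting the page, which is why password protection is the stronger option.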

Clancy answered 29/1, 2012 at 9:11 Comment(1)
I went with password protection.Duffy
A
7

Indeed, robots.txt at the site root is the way to go. To add multiple entries (as the OP suggests), do as follows:

User-agent: *
Disallow: /test_directory_aaa/
Disallow: /test_directory_bbb/
Disallow: /test_directory_ccc/

Or, to take the .htpasswd route:

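In the .htaccess file of the directory you want to protect, add the following. AuthUserFile accepts a single path (a filesystem path, not a URL), so each protected directory gets its own .htaccess pointing at one .htpasswd file:

AuthType Basic
AuthName "Marty's test directory"
AuthUserFile /test_directory_aaa/.htpasswd
Require valid-user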

In .htpasswd, add:

username1:s0M3md5H4sh1
username2:s0M3md5H4sh2
username3:s0M3md5H4sh3
Airlike answered 25/6, 2014 at 6:58 Comment(2)
Does this result in a username and password prompt? I've never heard of .htpasswd before.Undine
Here are the Apache docs for .htpasswd: httpd.apache.org/docs/current/programs/htpasswd.html. The Wikipedia article also contains an example similar to the one given above: en.wikipedia.org/wiki/.htpasswd.Airlike
K
6

What you want is a robots.txt file.

The file should be in the root of your website, and the content should be something like:

User-agent: *
Disallow: /mybetasite/

This will politely ask search indexing services not to index the pages under that directory, which all well-behaved search engines will respect.

Karnes answered 29/1, 2012 at 9:10 Comment(0)
V
3

Put the following in robots.txt, which must be in the site's root directory, to keep your entire site from being indexed.

User-agent: *
Disallow: /
Venule answered 30/3, 2012 at 10:23 Comment(0)
I
2

Create a file called robots.txt (all lowercase) in your public_html directory.

Put the following code in it:

    User-agent: * 
    Disallow: /foldername/

foldername is the name of the directory you wish to block

Illyes answered 29/1, 2012 at 9:13 Comment(0)
D
0

To block specific file types, use $ to match the end of a URL. For instance, to block any URL that ends with .xls:

User-agent: *
Disallow: /*.xls$
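
To see what a pattern like this matches, here is a small Python sketch of the wildcard rules as commonly described: * matches any run of characters, and a trailing $ anchors the match to the end of the URL. The function names are hypothetical and this is an approximation for testing patterns by hand, not how any particular crawler implements it. As far as I know, Python's own urllib.robotparser does plain prefix matching and does not handle these wildcards, hence the hand-rolled version.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each '*' match any characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Without '$' the pattern is a prefix match, so nothing is appended.
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    """Return True if path (including any query string) matches pattern."""
    # re.match anchors at the start of the path, as robots.txt rules do.
    return robots_pattern_to_regex(pattern).match(path) is not None


print(is_blocked("/*.xls$", "/reports/q1.xls"))
print(is_blocked("/*.xls$", "/reports/q1.xlsx"))
print(is_blocked("/*.xls$", "/reports/q1.xls?download=1"))
```

Under these rules only the first path is blocked. The second ends in .xlsx and the third has a query string after .xls, so neither ends with .xls.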

Ref: http://antezeta.com/news/avoid-search-engine-indexing

http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449&topic=1724262&ctx=topic

Doolittle answered 22/2, 2013 at 6:51 Comment(0)
