The configuration you've written is correct. One caveat, though (assuming your config is otherwise standard):
It will only output the X-Robots-Tag header when the response code is 200, 201, 204, 206, 301, 302, 303, 304, or 307 (e.g. the request matches a file on disk, a redirect is issued, etc.). So if /archive/index.html exists, a hit to http://yoursite.com/archive/ will send the header; if the index.html does not exist (404), you won't see the tag.
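To illustrate the caveat, here is a minimal sketch of the restricted form (the location regex is the same one used in the fix below):

location ~ .*/(?:archive|filter|topic)/.* {
    # Without "always", nginx adds this header only for the
    # response codes listed above; a 404 under these paths
    # gets no X-Robots-Tag.
    add_header X-Robots-Tag "noindex, follow";
}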
The always parameter will output the header for all response codes, assuming the location block is processed:
location ~ .*/(?:archive|filter|topic)/.* {
    add_header X-Robots-Tag "noindex, follow" always;
}
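Note that always requires nginx 1.7.5 or later. You can verify the behavior by requesting a nonexistent path under /archive/ and confirming that the 404 response still carries the header.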
Another option guarantees the header is output on any URI match. This is useful when there's a chance that a location block may not be processed at all (due to short-circuiting, e.g. a return, or a rewrite with the last flag):
http {
    ...
    map $request_uri $robot_header {
        default                            "";
        ~.*/(?:archive|filter|topic)/.*    "noindex, follow";
    }

    server {
        ...
        add_header X-Robots-Tag $robot_header;
        ...
    }
}
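Because add_header skips a header whose value is an empty string, URIs that don't match the regex get no X-Robots-Tag at all. If you also want the header on error responses, combine this approach with always:

add_header X-Robots-Tag $robot_header always;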
Comments:

robots.txt? – Nonsuch

robots.txt is exactly the tool to prevent indexing: support.google.com/webmasters/answer/6062608?hl=en "A robots.txt file is a file at the root of your site that indicates those parts of your site you don’t want accessed by search engine crawlers" – Nonsuch