Correct nginx configuration to prevent indexing of some folders
Asked Answered
I'm using the following nginx configuration to prevent indexing of content in some of my folders via the X-Robots-Tag header:

location ~ .*/(?:archive|filter|topic)/.* {
    add_header X-Robots-Tag "noindex, follow";      
}

The content remains indexed, but I don't know how to debug the nginx configuration.

My questions: is the configuration correct, and should I simply wait until Googlebot re-crawls and de-indexes the content? Or is the configuration wrong?
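Before waiting on Googlebot, it may help to confirm whether nginx is sending the header at all, e.g. with curl. A minimal sketch (the URL is a placeholder for any path on your site that matches the location block):

```shell
# Against the live site you would run (placeholder URL):
#   curl -sI https://yoursite.com/archive/ | grep -i 'x-robots-tag'
#
# Offline illustration: the grep should match a response like this one.
response='HTTP/1.1 200 OK
Server: nginx
X-Robots-Tag: noindex, follow'

printf '%s\n' "$response" | grep -i 'x-robots-tag'
```

If the grep prints nothing against the live site, the header is not being sent and the problem is in the nginx config, not in Googlebot's crawl schedule.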

Southeast answered 29/3, 2017 at 13:25 Comment(8)
Have you tried plain old robots.txt?Nonsuch
Anyway, I'd guess you have some other rules in your config. I'm pretty sure this location block is not being used. Show your full config.Nonsuch
@AlexeyTen robots.txt doesn't prevent indexing. The question is only whether the syntax of the configuration is correct: yes or no?Southeast
robots.txt is exactly the tool to prevent indexing. support.google.com/webmasters/answer/6062608?hl=en A robots.txt file is a file at the root of your site that indicates those parts of your site you don't want accessed by search engine crawlers.Nonsuch
@AlexeyTen dude, if you don't know the difference between indexing and crawling, be quiet, please, and RTFM.Southeast
The syntax of this snippet is correct, but that does not mean it works.Nonsuch
So tell me the difference…Nonsuch
Wow, I'll archive this one. What funny things the guys from Yandex say.Southeast
The configuration you've written is correct, but I'd add one caveat (assuming your config is otherwise standard):

By default, add_header only outputs the X-Robots-Tag when the response code is 200, 201, 204, 206, 301, 302, 303, 304, or 307 (e.g. the content matches a file on disk, a redirect is issued, etc.). So if you have an /archive/index.html, a request for http://yoursite.com/archive/ will include the header. If index.html does not exist (404), you won't see the tag.

The always parameter will output the header for all response codes, assuming the location block is processed:

location ~ .*/(?:archive|filter|topic)/.* {
    add_header X-Robots-Tag "noindex, follow" always;      
}

Another option guarantees the header is output whenever the URI matches. This is useful when there's a chance a location block may not be processed at all (due to short-circuiting, e.g. a return, or last on a rewrite):

http {
    ...
    map $request_uri $robot_header {
        default "";
        ~.*/(?:archive|filter|topic)/.* "noindex, follow";
    }

    server {
        ...
        add_header X-Robots-Tag $robot_header;
        ...
    }
}
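One more gotcha worth knowing with this approach (not specific to the question, but easy to hit): add_header directives are inherited from the enclosing level only when the current level defines none of its own. So a location that adds any header of its own silently drops the server-level X-Robots-Tag unless it is repeated there. A sketch (the /downloads/ path is hypothetical):

```nginx
server {
    add_header X-Robots-Tag $robot_header;

    location /downloads/ {
        # This level now has its own add_header, so server-level
        # headers are no longer inherited; repeat the tag explicitly.
        add_header Cache-Control "no-store";
        add_header X-Robots-Tag $robot_header;
    }
}
```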
Antinomy answered 29/3, 2017 at 20:38 Comment(1)
Interesting point about index.html. What should the configuration look like if http://yoursite.com/archive/ is a kind of symlink, or if the URL is built by a rewrite rule?Southeast
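Regarding the rewrite case: $request_uri holds the original URI exactly as the client sent it, before any rewrite is applied, so the map in the answer keeps matching even when the request is internally rewritten. Filesystem symlinks shouldn't matter either, since location and map match on the URI, not on disk paths. A sketch, assuming a hypothetical handler at /index.php:

```nginx
server {
    # The client asked for /archive/foo; $request_uri is still
    # "/archive/foo" after this rewrite, so the map still yields
    # "noindex, follow". (The /index.php target is hypothetical.)
    rewrite ^/archive/(.*)$ /index.php?page=$1 last;

    add_header X-Robots-Tag $robot_header always;
}
```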

© 2022 - 2025 — McMap. All rights reserved.