Order of directives in robots.txt: do they overwrite each other or complement each other?

User-agent: Googlebot
Disallow: /privatedir/

User-agent: *
Disallow: /

Now, what is disallowed for Googlebot: only /privatedir/, or the whole website (/)?

Breath asked 25/7, 2017 at 3:30

According to the original robots.txt specification:

  1. A bot must follow the first record that matches its user-agent name.

  2. If such a record doesn’t exist, it must follow the record with User-agent: * (this line may not appear in more than one record).

  3. If such a record doesn’t exist, it doesn’t have to follow any record.

So a bot never follows more than one record.
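
The three rules above can be sketched in a few lines of Python. This is only an illustration, not a full parser: the function name select_record and the (names, disallowed paths) data layout are made up for this example, and the case-insensitive substring match on the bot name is an assumption based on what the original specification recommends.

```python
def select_record(records, bot_name):
    """Return the rules of the single record a bot should follow, or None.

    records: list of (user_agent_names, disallowed_paths) in file order.
    This layout is hypothetical, chosen only for this sketch.
    """
    name = bot_name.lower()
    # Rule 1: the first record that names this bot wins.
    for agents, rules in records:
        if any(a != "*" and a.lower() in name for a in agents):
            return rules
    # Rule 2: otherwise fall back to the "User-agent: *" record.
    for agents, rules in records:
        if "*" in agents:
            return rules
    # Rule 3: no applicable record, so nothing is restricted.
    return None


records = [
    (["Googlebot"], ["/privatedir/"]),
    (["*"], ["/"]),
]
print(select_record(records, "Googlebot"))     # ['/privatedir/']
print(select_record(records, "SomeOtherBot"))  # ['/']
```

Note that the function returns exactly one record's rules; it never merges the Googlebot record with the * record.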


For your example this means:

  • A bot that matches the name "Googlebot" is not allowed to crawl URLs with a path that starts with /privatedir/.
  • A bot that doesn’t match the name "Googlebot" is not allowed to crawl any URL.
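
If you want to check this yourself, Python's standard library ships a robots.txt parser (urllib.robotparser) that behaves the same way for this file. A small sketch, where "SomeOtherBot" and the page paths are made-up names for illustration:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /privatedir/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
# parse() takes the file content as a list of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot follows only its own record.
print(parser.can_fetch("Googlebot", "/privatedir/page.html"))  # False
print(parser.can_fetch("Googlebot", "/public/page.html"))      # True
# Any other bot falls back to the * record, which blocks everything.
print(parser.can_fetch("SomeOtherBot", "/public/page.html"))   # False
```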
Insalubrious answered 26/7, 2017 at 22:59
Comment (Breath): Excellent answer! Much clearer than the original robots.txt specification. Thanks!
