robots.txt in subdirectory
D

5

16

I have a project that lives in a folder below the main domain, and I don't have access to the root of the domain itself.

http://mydomain.com/myproject/

I want to disallow indexing of the subfolder "forbidden":

http://mydomain.com/myproject/forbidden/

Can I simply put a robots.txt in the myproject folder? Will it get read even if there is no robots.txt in the root?

What is the correct syntax for disallowing the forbidden folder?

User-agent: *
Disallow: /forbidden/

or

User-agent: *
Disallow: forbidden/
Dialectics answered 29/1, 2011 at 14:16 Comment(0)
R
22

From robotstxt.org:

Where to put it

The short answer: in the top-level directory of your web server.

The longer answer:

When a robot looks for the "/robots.txt" file for URL, it strips the path component from the URL (everything from the first single slash), and puts "/robots.txt" in its place.

For example, for "http://www.example.com/shop/index.html", it will remove the "/shop/index.html", and replace it with "/robots.txt", and will end up with "http://www.example.com/robots.txt".

So, as a web site owner you need to put it in the right place on your web server for that resulting URL to work. Usually that is the same place where you put your web site's main "index.html" welcome page. Where exactly that is, and how to put the file there, depends on your web server software.

Remember to use all lower case for the filename: "robots.txt", not "Robots.TXT".

So I'm afraid the answer is that you have to put it in the root folder :-(
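
The lookup described in the quote can be sketched in a few lines of Python. This is only an illustration of the path-stripping rule, not code from any real crawler; the helper name robots_url is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for page_url.

    Hypothetical helper: it keeps the scheme and host, and replaces
    the whole path (plus any query or fragment) with "/robots.txt".
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The example from the quote, then the URL from the question.
print(robots_url("http://www.example.com/shop/index.html"))
print(robots_url("http://mydomain.com/myproject/forbidden/"))
```

Both calls end up at the root of the host, which is why a robots.txt placed in /myproject/ is never requested by a crawler that follows this rule.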

With regard to your second question, I believe the correct syntax is the one starting with a forward slash (e.g. /forbidden/).

Rickard answered 29/1, 2011 at 14:24 Comment(2)
But since the robots.txt will be at the root, he'll want it to read Disallow: /myproject/forbidden/. - Mcsweeney
@Jim, I was talking about syntax though, not actual paths, but you are correct. - Rickard
P
4

If you don't have the root, you can use the "robots meta tag".

https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
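
For reference, the tag in question is a line like <meta name="robots" content="noindex"> in the head of each page you want kept out of the index. Below is a rough Python sketch for checking that a page actually carries it; the class and function names, and the sample page, are invented for this example.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def is_noindex(html: str) -> bool:
    """Return True if the page asks robots not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Made-up sample page standing in for something under /myproject/forbidden/.
PAGE = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    "<body>secret</body></html>"
)
print(is_noindex(PAGE))
```

Note that the meta tag only works if the crawler is allowed to fetch the page in the first place, since it has to read the HTML to see the tag.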

Precept answered 9/1, 2014 at 20:36 Comment(0)
P
3

Unfortunately, you can't. robots.txt can only go at the root of the domain.

Maybe if you ask the owner of the domain kindly he will oblige?

The first syntax is the correct syntax, but remember it needs to be the absolute path from the root of the domain.
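
If you want to sanity-check a rule before asking the domain owner to add it, Python's standard urllib.robotparser can evaluate it locally. A minimal sketch, assuming the rule would be served from http://mydomain.com/robots.txt; the helper name and the test URLs are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# The rule as it would have to appear in the root robots.txt:
# the path is absolute from the root of the domain.
RULES = """\
User-agent: *
Disallow: /myproject/forbidden/
"""


def is_allowed(url: str, agent: str = "*") -> bool:
    """Return True if RULES permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, url)


print(is_allowed("http://mydomain.com/myproject/forbidden/secret.html"))
print(is_allowed("http://mydomain.com/myproject/index.html"))
```

The first URL is reported as blocked and the second as allowed. Real crawlers may interpret edge cases differently from this parser, so treat it as a quick check rather than a guarantee.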

Pathos answered 29/1, 2011 at 14:26 Comment(0)
K
1

Actually, I can see requests from a variety of bots on robots.txt in a subfolder, which always result in a 404 error. Just some of these bots:

So, if you want to prevent these from spamming your error log with dumb 404 errors, you can redirect these requests to the right place via .htaccess:

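RewriteEngine On
# Send any robots.txt request below the root to the real one (dot escaped so it only matches a literal ".")
RewriteRule .+/robots\.txt$ /robots.txt [R=301,L]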
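
To see which requests a pattern like this catches, here is a small Python approximation. It assumes the .htaccess file sits in the document root, where the leading slash is stripped before matching, and it uses Python's re module as a stand-in for Apache's regex engine, so it is a rough model only. The function name is made up.

```python
import re
from typing import Optional

# Same pattern as the rule, with the dot escaped.
PATTERN = re.compile(r".+/robots\.txt$")


def redirect_target(request_path: str) -> Optional[str]:
    """Return "/robots.txt" if the rule would redirect, else None."""
    # Assumption: per-directory context at the document root,
    # so "/myproject/robots.txt" is matched as "myproject/robots.txt".
    relative = request_path[1:] if request_path.startswith("/") else request_path
    return "/robots.txt" if PATTERN.search(relative) else None


for path in ("/myproject/robots.txt", "/a/b/robots.txt", "/robots.txt"):
    print(path, "->", redirect_target(path))
```

The root /robots.txt itself is not matched, because the pattern needs at least one character followed by a slash in front of the filename, so the redirect cannot loop.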
Koral answered 7/4, 2016 at 9:52 Comment(0)
C
0

Since this is one of the top results that shows when Googling, I wanted to provide an updated answer and reference Google's own documentation. The robotstxt website linked from the previous answer is out of date and has some incorrect information, although the answer from Klaus remains essentially the same.

In short: no. It must be in the root directory. Here's Google's official statement on the matter.

The longer version is that you should use the robots meta tag to disallow certain pages in subdirectories, as suggested by Stairbob.

Copilot answered 11/10, 2021 at 7:9 Comment(0)
