Robots.txt for multiple domains

We have different domains for each language:

  1. www.abc.com
  2. www.abc.se
  3. www.abc.de

Each site also has its own sitemap.xml. In robots.txt, I want to add a sitemap reference for each domain.

  1. Is it possible to have multiple sitemap references for each domain in a single robots.txt?
  2. If there are multiple, which one does the search engine pick?
Qnp answered 7/7, 2012 at 8:2 Comment(0)

A robots.txt can only inform the search engines about sitemaps for its own domain, so that is the only one a crawler honors when it reads that domain's robots.txt. If all three domains map to the same website and share a robots.txt, the search engines will effectively find each sitemap.
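
For illustration only: if one robots.txt is shared by all three hosts, it can simply list every sitemap, since each Sitemap line takes a full URL. The sitemap URLs below are assumptions based on the domains in the question, not confirmed paths:

User-agent: *
Disallow:

Sitemap: https://www.abc.com/sitemap.xml
Sitemap: https://www.abc.se/sitemap.xml
Sitemap: https://www.abc.de/sitemap.xml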

Custombuilt answered 7/7, 2012 at 15:41 Comment(2)
These are three different websites hosted together. Of course, the content is different, and each site also has its own sitemap.xml file.Qnp
This is the only answer of its kind and it has only one upvote. The rest of the internet seems to rewrite the robots.txt to the domain-specific one using .htaccess, but this makes much more sense.Depreciable

I'm using the following solution in .htaccess, placed after all domain redirects and the www-to-non-www redirection.

# Rewrite URL for robots.txt
RewriteRule ^robots\.txt$ robots/%{HTTP_HOST}.txt [L]
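
For context, here is a minimal sketch of how such an .htaccess might be ordered; the www-to-non-www redirect shown here is an assumption used only to illustrate the placement, not part of the original answer:

RewriteEngine On

# Assumed www-to-non-www redirect (runs before the robots rule)
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

# Rewrite URL for robots.txt (the rule from this answer)
RewriteRule ^robots\.txt$ robots/%{HTTP_HOST}.txt [L]

After such a redirect, %{HTTP_HOST} no longer contains the www prefix, which is why the files below are named abc.com.txt rather than www.abc.com.txt.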

Create a new directory in your root called robots, and inside it create a text file with the specific robots rules for every domain (a sample file follows this list):

  • /robots/abc.com.txt
  • /robots/abc.se.txt
  • /robots/abc.de.txt
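
As an example, one of those per-domain files, say /robots/abc.se.txt, might contain something like the following; the rules and the sitemap URL are assumptions based on the question, not part of the original answer:

User-agent: *
Disallow:

Sitemap: https://www.abc.se/sitemap.xml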
Immensity answered 7/6, 2013 at 10:50 Comment(2)
Isn't it required that the robots.txt is in the root of each domain?Blacken
@Blacken According to robotstxt.org/orig.html: "This file must be accessible via HTTP on the local URL /robots.txt." Using the method above, the file is accessible in the root, although it's not stored in the root.Immensity

Based on Hans2103's answer, I wrote this version, which should be safe to include in just about every web project:

# URL rewrite solution for robots.txt for multiple domains on a single docroot
# Note: Apache does not allow a comment after a directive on the same line.
# The request is not an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
# and not an existing file
RewriteCond %{REQUEST_FILENAME} !-f
# and the host-specific robots file exists (tested as a filesystem path)
RewriteCond %{DOCUMENT_ROOT}/robots/%{HTTP_HOST}.txt -f
RewriteRule ^robots\.txt$ robots/%{HTTP_HOST}.txt [L]

These conditions serve the normal robots.txt if it is physically present, and otherwise rewrite the request to the host-specific file robots/<domain.tld>.txt in the robots/ directory, provided that file exists.
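
To make the naming concrete: the per-host files must match %{HTTP_HOST} exactly as the request arrives at this rule, so with the domains from the question (and assuming no earlier www-stripping) the layout under the document root might look like this:

/robots/www.abc.com.txt
/robots/www.abc.se.txt
/robots/www.abc.de.txt

A request for https://www.abc.se/robots.txt would then be rewritten to /robots/www.abc.se.txt, while a physical /robots.txt in the docroot, if present, would still be served unchanged because of the !-f condition.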

Dye answered 1/11, 2018 at 11:37 Comment(2)
Isn't it required that the robots.txt is in the root of each domain?Blacken
This way, every request to /robots.txt is answered with the contents of the file for that specific domain (if you've set up the directory and files correctly, of course).Dye
