Can a relative sitemap url be used in a robots.txt?

In robots.txt can I write the following relative URL for the sitemap file?

sitemap: /sitemap.ashx

Or do I have to use the complete (absolute) URL for the sitemap file, like:

sitemap: http://subdomain.domain.com/sitemap.ashx

Why I ask:

  • I own a new blog service, www.domain.com, that allows users to blog on accountname.domain.com.
  • I use wildcards, so all subdomains (accounts) point to: "blog.domain.com".

I put the robots.txt in blog.domain.com to let search engines find the sitemap. But, due to the wildcards, all user accounts share the same robots.txt file. That's why I can't use the second alternative. And for now I can't use URL rewriting for txt files. (I guess that later versions of IIS can handle this?)

Paradies answered 7/1, 2013 at 13:16 Comment(0)
D
347

According to the official documentation on sitemaps.org it needs to be a full URL:

You can specify the location of the Sitemap using a robots.txt file. To do this, simply add the following line including the full URL to the sitemap:

Sitemap: http://www.example.com/sitemap.xml
Destined answered 8/1, 2013 at 15:33 Comment(5)
Please note that @unor's example has Sitemap with a capital S. This is important as Robots.txt is case sensitive. - Papaverine
And on the topic of case, robotstxt.org specifies that the file be named robots.txt, without the capital R. - Befit
If the site is served over https but the Sitemap URL is given with http, is this fine? Or does the sitemap URL have to match the site's protocol? - Abmho
@Shams: The URLs listed in your sitemap have to use the same protocol and the same host as the sitemap file. If your site is available under both http and https, you should only provide one sitemap (with the canonical variant). - Destined
I'm using Server Side Includes to overcome this limitation. Enable it for robots.txt, then #echo the HTTP_HOST. - Boser
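
The idea in the last comment, writing the requesting host into robots.txt, can also be sketched without Server Side Includes. The following is a minimal Python sketch under stated assumptions, not the commenter's actual setup: `build_robots_txt` is a hypothetical helper, and the `https` default and the `/sitemap.ashx` path are placeholders taken from the question. The host would come from the request's Host header in whatever server or framework is in use.

```python
def build_robots_txt(host, scheme="https", sitemap_path="/sitemap.ashx"):
    """Return a robots.txt body whose Sitemap line is absolute for `host`."""
    # Hypothetical helper: `host` is expected to be the value of the
    # incoming request's Host header, e.g. "accountname.domain.com".
    host = host.strip().lower()
    # Reject values that could not be a plain host name, since the
    # Host header is client-supplied.
    if not host or any(c in host for c in "/\\ \r\n"):
        raise ValueError(f"unexpected Host value: {host!r}")
    return (
        "User-agent: *\n"
        "Disallow:\n"
        f"Sitemap: {scheme}://{host}{sitemap_path}\n"
    )


print(build_robots_txt("accountname.domain.com"))
```

Because every subdomain gets its own absolute Sitemap line, the shared robots.txt problem from the question goes away, at the cost of serving the file dynamically instead of as a static txt file.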
G
7

Search engine crawlers expect an absolute URL for the sitemap location in robots.txt and are not guaranteed to resolve a relative one. That's why it's always recommended to use absolute URLs for better crawlability and indexability.

Therefore, you cannot use this variation:

> sitemap: /sitemap.xml

The recommended syntax is:

Sitemap: https://www.yourdomain.com/sitemap.xml

Note:

  • Don't forget to capitalise the first letter in "sitemap"
  • Don't forget to put a space after "Sitemap:"
Groundsel answered 19/2, 2019 at 17:49 Comment(0)
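
One way to check an existing robots.txt against the advice above is to look at each Sitemap line and confirm that its value is an absolute URL. The sketch below uses only Python's standard library. `relative_sitemap_lines` is a hypothetical helper name, and matching the directive name case-insensitively is an assumption made here so that lowercase `sitemap:` lines are checked as well.

```python
from urllib.parse import urlsplit


def relative_sitemap_lines(robots_txt):
    """Return the Sitemap values that are not absolute http(s) URLs."""
    problems = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        name, sep, value = line.partition(":")
        if not sep or name.strip().lower() != "sitemap":
            continue  # not a sitemap directive
        value = value.strip()
        parts = urlsplit(value)
        # An absolute URL needs both a scheme and a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(value)
    return problems


sample = (
    "User-agent: *\n"
    "sitemap: /sitemap.xml\n"
    "Sitemap: https://www.yourdomain.com/sitemap.xml\n"
)
print(relative_sitemap_lines(sample))  # prints ['/sitemap.xml']
```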
C
-5

Good technical and logical question. No, in a robots.txt file you can't use a relative URL for the sitemap; you need to use the complete URL of the sitemap.

It's better to go with "sitemap: https://www.example.com/sitemap_index.xml"

Note the space after the colon in the line above. I also agree with Deepak's answer.

Cheesewood answered 23/8, 2019 at 6:30 Comment(0)
