Listing both sitemaps and sitemap index files in robots.txt?

My site consists of 3 main sections: Reviews, Forum, and Blog. I have plugins for the forum and blog that automatically generate sitemaps for them. The forum plugin generates a sitemap INDEX file pointing to multiple sitemaps, and the blog plugin generates a regular sitemap file containing all my blog content. Here are their entries from robots.txt:

Sitemap: http://www.datesphere.com/forum/sitemap-index.xml
Sitemap: http://www.datesphere.com/blog/sitemap.xml

I just created a Reviews sitemap.xml file that contains all the content in the Reviews section. I was planning to just add a line to robots.txt so the whole thing would look like this:

Sitemap: http://www.datesphere.com/forum/sitemap-index.xml
Sitemap: http://www.datesphere.com/blog/sitemap.xml
Sitemap: http://www.datesphere.com/reviews-sitemap.xml

HERE'S MY QUESTION: I know you can list multiple sitemaps in robots.txt, but is it OK to have a sitemap index file as well as multiple sitemaps listed? Will Googlebot ignore the other sitemap files if it finds a sitemap-index.xml file in robots.txt? If so, do I have to put my blog and reviews sitemaps in another sitemap index file and just list that in robots.txt?

I've checked around but can only find answers to the question "can I list multiple sitemaps?"
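
A quick way to see how robots.txt treats these lines: the Sitemap directive has no notion of "index" versus "regular sitemap". It is just a flat list of URLs. The sketch below assumes Python 3.9+ and its standard urllib.robotparser module. It parses the three-line robots.txt proposed above and returns every Sitemap URL in file order, and the forum index comes back as one entry among three. The helper name listed_sitemaps is hypothetical. This only shows what a parser extracts from the file, not how Googlebot schedules the fetches.

```python
from urllib.robotparser import RobotFileParser

# The three-line robots.txt proposed in the question.
ROBOTS_TXT = """\
Sitemap: http://www.datesphere.com/forum/sitemap-index.xml
Sitemap: http://www.datesphere.com/blog/sitemap.xml
Sitemap: http://www.datesphere.com/reviews-sitemap.xml
"""


def listed_sitemaps(robots_txt: str) -> list[str]:
    """Return every Sitemap URL declared in a robots.txt body, in file order."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() (Python 3.8+) returns None when there are no Sitemap lines.
    return parser.site_maps() or []


if __name__ == "__main__":
    for url in listed_sitemaps(ROBOTS_TXT):
        print(url)
```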

Atwood asked 15/6, 2011 at 19:09
Dude, I think you should accept the 'official' answer you got :) - Asymptotic

Googlebot will not ignore any of the Sitemaps you list in robots.txt, even if you also list their parent Sitemap Index. We follow pretty much every link we find, and if we're allowed to, we'll crawl them. Personally, I'd probably list only the Sitemap Indexes, purely for manageability's sake, but it's up to you: Googlebot won't mind if you list both the indexes and the Sitemaps.
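
To illustrate why listing both is harmless, here is a hypothetical sketch of a crawler that expands a mix of sitemap and sitemap index URLs. It is not Googlebot's actual code. It assumes the sitemaps.org namespace and tells the two file types apart by their root element (<sitemapindex> versus <urlset>). It also keeps a set of already-processed sitemap URLs, so a child sitemap listed both directly and through its index is only handled once. The fetch callback, the helper names, and the example.com documents are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import deque

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = f"{{{SITEMAP_NS}}}"  # ElementTree's "{uri}" tag prefix


def urlset(*pages):
    """Build a tiny regular sitemap (<urlset>) for the demo data."""
    urls = "".join(f"<url><loc>{p}</loc></url>" for p in pages)
    return f'<urlset xmlns="{SITEMAP_NS}">{urls}</urlset>'


# Hypothetical in-memory site: a forum index with one child, plus a blog sitemap.
DOCS = {
    "http://example.com/forum/sitemap-index.xml": (
        f'<sitemapindex xmlns="{SITEMAP_NS}">'
        "<sitemap><loc>http://example.com/forum/sitemap-1.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "http://example.com/forum/sitemap-1.xml": urlset("http://example.com/forum/t1"),
    "http://example.com/blog/sitemap.xml": urlset("http://example.com/blog/p1"),
}


def expand_sitemaps(start_urls, fetch):
    """Return page URLs reachable from a mix of sitemap and index URLs.

    `fetch` maps a URL to its XML text. Each sitemap URL is processed at
    most once, so listing a child both directly and via its index is harmless.
    """
    seen, pages = set(), []
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        if url in seen:
            continue  # duplicate listing: already handled, skip it
        seen.add(url)
        root = ET.fromstring(fetch(url))
        locs = [loc.text.strip() for loc in root.iter(NS + "loc")]
        if root.tag == NS + "sitemapindex":
            queue.extend(locs)  # an index points at more sitemaps
        elif root.tag == NS + "urlset":
            pages.extend(locs)  # a regular sitemap points at pages
    return pages


if __name__ == "__main__":
    # The forum child sitemap is listed twice: directly and through its index.
    start = [
        "http://example.com/forum/sitemap-index.xml",
        "http://example.com/forum/sitemap-1.xml",
        "http://example.com/blog/sitemap.xml",
    ]
    print(expand_sitemaps(start, DOCS.__getitem__))
```

The set of seen URLs is the design choice that matters here: whichever route a crawler takes to a child sitemap first, the second route costs nothing but a skipped entry.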

Siddon answered 10/7, 2011 at 9:59

When you have multiple sitemaps, you can either specify your sitemap index file URL in your robots.txt file as shown in the example below:

# robots.txt
Sitemap: http://www.example.com/sitemap_index.xml
User-agent: *
Disallow: /some/disallowed/path

Or, you can specify individual URLs of your multiple sitemap files, as shown in the example below:

# robots.txt
Sitemap: http://www.example.com/sitemap_host1.xml
Sitemap: http://www.example.com/sitemap_host2.xml
User-agent: *
Disallow: /some/disallowed/path
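
If you go with the first option, the index file itself is small. The sketch below uses only Python's standard xml.etree.ElementTree to build a sitemap index in the sitemaps.org format that wraps the two sitemap files from the second example. The function name build_sitemap_index is hypothetical, and optional elements such as <lastmod> are left out. ElementTree escapes characters like & in the URLs automatically, which the sitemap protocol requires.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (str) listing each child sitemap URL."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes & < >
    # Write the declaration by hand so it always says UTF-8.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap_index([
        "http://www.example.com/sitemap_host1.xml",
        "http://www.example.com/sitemap_host2.xml",
    ]))
```

Saved as UTF-8 at the URL named in the Sitemap line (sitemap_index.xml in the first example), this single file then stands in for the individual Sitemap lines.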

Finally, this is what you need to pay attention to when adding the Sitemap directive to the robots.txt file.

Leavings answered 17/1, 2017 at 22:59
And that one thing is?? - Inebriety
@Sebastian, please remove that User-agent:* Disallow from your examples, otherwise whoever copies and pastes your code will tell all robots not to index your site. - Whitehead
