Multiple Sitemap: entries in robots.txt?

5

54

I have been searching around using Google but I can't find an answer to this question.

A robots.txt file can contain the following line:

Sitemap: http://www.mysite.com/sitemapindex.xml

but is it possible to specify multiple sitemap index files in the robots.txt and have the search engines recognize that and crawl ALL of the sitemaps referenced in each sitemap index file? For example, will this work:

Sitemap: http://www.mysite.com/sitemapindex1.xml

Sitemap: http://www.mysite.com/sitemapindex2.xml

Sitemap: http://www.mysite.com/sitemapindex3.xml
Straka answered 7/4, 2010 at 16:31 Comment(0)
-4

It is possible to write them, but it is up to the search engine to know what to do with them. I suspect many search engines will either "keep digesting" more and more tokens or take the last sitemap they find as the real one.

I propose that the question be "if I want ____ search engine to index my site, would I be able to define multiple sitemaps?"

Raleigh answered 7/4, 2010 at 16:40 Comment(7)
Yeah, this seems reasonable. I think I read in the Google FAQ that they do support this.Straka
Google does support that, but if you want to be certain, just manually submit the Sitemap files in Webmaster Tools.Modular
-1 It is in the protocol specs. This answer is a lame excuse for not reading them and for assuming that nobody else, especially implementors, would read them either. The chance of a search engine not supporting sitemaps in robots.txt at all is much higher than the chance of it not supporting them according to the specs.Susurrate
@Etamar Laron: Can you please review your answer? To me it reads as if you are saying that most search engines would not support the sitemap standard. Can you please clarify a bit and perhaps differentiate?Susurrate
@Susurrate - if you read my answer carefully you'd see that it is very precise, the -1 is your call. Why not next time write your second note, and only then decide?...Raleigh
@EtamarLaron: Do you want to say that the answer isn't correct but does not deserve a DV either? Just a comment? I'm not so sure that would be right. Also, you didn't respond to the second comment; I would be glad if you had, so I could review the DV. Nothing is set in stone.Susurrate
@Susurrate Interestingly, Baidu, which is the biggest search engine in China, doesn't support gzipped sitemaps. You cannot really put too much faith in the others.Trotta
114

Yes, it is possible to have more than one Sitemap index file:

You can have more than one Sitemap index file.

Highlight by me.

Yes, it is also possible to list multiple Sitemap files in robots.txt; see the sitemaps.org site:

You can specify more than one Sitemap file per robots.txt file.

Sitemap: http://www.example.com/sitemap-host1.xml

Sitemap: http://www.example.com/sitemap-host2.xml

Highlight by me. I'd say this cannot be misread; simply put, it can be done.
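To check which Sitemap entries a parser actually picks up from a robots.txt file, a minimal sketch using Python's standard `urllib.robotparser` module may help. The robots.txt content and the helper name `list_sitemaps` are assumptions made for illustration, and this only shows how Python's parser behaves, not how any particular search engine does:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the URLs are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: http://www.example.com/sitemap-host1.xml
Sitemap: http://www.example.com/sitemap-host2.xml
"""


def list_sitemaps(robots_txt: str) -> list[str]:
    """Return every Sitemap URL declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() (Python 3.8+) returns None when no Sitemap line exists.
    return parser.site_maps() or []


if __name__ == "__main__":
    for url in list_sitemaps(ROBOTS_TXT):
        print(url)
```

Both entries are collected, in the order they appear, regardless of which user-agent group precedes them.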

This is also necessary for cross-submits, for which, by the way, robots.txt was chosen as the mechanism.

By the way, Google, Yahoo and Bing are all members of sitemaps.org:

Sitemap 0.90 is offered under the terms of the Attribution-ShareAlike Creative Commons License and has wide adoption, including support from Google, Yahoo!, and Microsoft.

So you can rest assured that your sitemap entries will be properly read by the search engine bots.

Submitting them via Webmaster Tools cannot hurt either, as John Mueller commented.

Gaygaya answered 6/7, 2010 at 9:19 Comment(3)
The Google robots.txt documentation confirms this to be true for Google, and references that it should work for other bots as well: "Multiple sitemap entries may exist. As non-group-member records, these are not tied to any specific user-agents and may be followed by all crawlers, provided it is not disallowed." The Google robots.txt documentation can be found here: developers.google.com/webmasters/control-crawl-index/docs/…Blaine
The question asks whether multiple sitemap index entries may exist in robots.txt, not whether multiple sitemap entries may exist.Indomitability
@NigelAlderton: The specs are likewise clear about that: "You can have more than one Sitemap index file.". If you then compare this with the Sitemaps & Cross Submits section, it is not only clear but inherently necessary to allow multiple index files per robots.txt for cross-domain index usage.Susurrate
8

If your sitemap exceeds the size limit (10 MB uncompressed when this was written; sitemaps.org has since raised it to 50 MB) or has more than 50,000 entries, Google requires that you use multiple sitemaps bundled with a Sitemap index file.

In your robots.txt, point to a sitemap index, which should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://www.example.com/sitemap1.xml.gz</loc>
      <lastmod>2012-10-01T18:23:17+00:00</lastmod>
   </sitemap>
   <sitemap>
      <loc>http://www.example.com/sitemap2.xml.gz</loc>
      <lastmod>2012-01-01</lastmod>
   </sitemap>
</sitemapindex>
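As a rough sketch of how the 50,000-entry limit and an index file like the one above fit together, the following Python splits a URL list into chunks and builds an index that points at one sitemap per chunk. The function names, the `sitemap{n}.xml` naming scheme, and the example.com URLs are assumptions for illustration; writing the individual sitemap files and handling the byte-size limit are left out:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit quoted in this answer


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of page URLs into lists of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML (as a string) listing the given sitemap files."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical site with 120,001 pages, which needs three sitemap files.
    pages = [f"http://www.example.com/page{i}" for i in range(120_001)]
    chunks = chunk_urls(pages)
    # Assumed naming scheme: sitemap1.xml, sitemap2.xml, ...
    names = [f"http://www.example.com/sitemap{n}.xml"
             for n in range(1, len(chunks) + 1)]
    print(build_sitemap_index(names))
```

The robots.txt file then only needs a single Sitemap line pointing at the generated index.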
Concinnate answered 27/4, 2012 at 5:14 Comment(5)
Um, not exactly. From sitemaps.org/protocol.php: "Each text file can contain a maximum of 50,000 URLs and must be no larger than 10MB (10,485,760 bytes)."Spicule
Google has since upped the allowed size per sitemap file to 50MB #2887858Queston
Would it be better to have the Sitemap: line in robots.txt point to sitemapindex.xml, or to have multiple Sitemap: lines, one pointing to each sitemap?Hypogenous
@WarrenDodsworth I think this does not matter, but if you have a "sitemapsitemap" file it's easier to submit only that one file to Google / Bing / etc. instead of each sitemap file by itself, if you choose to do so.Margoriemargot
Sitemaps.org has standardised the 50MB limit: "once uncompressed must be no larger than 50MB" sitemaps.org/protocol.htmlFuneral
4

It's recommended to create a sitemap index file rather than putting separate sitemap XML URLs in your robots.txt file.

Then put the sitemap index URL in your robots.txt file, as below.

Sitemap: http://www.yoursite.com/sitemap_index.xml

If you want to learn how to create a sitemap index, follow the guide on sitemaps.org.

Best Practice:

  • Create separate image and video sitemaps if your website has a large amount of such content.
  • Check the spelling of the robots file name: it should be robots.txt, not robot.txt or any other misspelling. Put the robots.txt file in the root directory only.
  • For more info, you can visit the official robots.txt website.
Exophthalmos answered 19/2, 2019 at 18:13 Comment(0)
0

You need to put this code in your sitemap.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>http://www.exemple.com/sitemap1.xml.gz</loc>
    </sitemap>
    <sitemap>
        <loc>http://www.exemple.com/sitemap2.xml.gz</loc>
    </sitemap>
</sitemapindex>

source: https://support.google.com/webmasters/answer/75712?hl=fr#
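To see which child sitemaps an index file like this one refers to, a small Python sketch using the standard `xml.etree.ElementTree` module could look as follows. The helper name `child_sitemaps` and the inline sample XML are assumptions for illustration; a real crawler would also fetch and decompress each listed file:

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemap and sitemap index files.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Sample index with placeholder URLs, mirroring the structure shown above.
INDEX_XML = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>http://www.example.com/sitemap1.xml.gz</loc>
    </sitemap>
    <sitemap>
        <loc>http://www.example.com/sitemap2.xml.gz</loc>
    </sitemap>
</sitemapindex>
"""


def child_sitemaps(index_xml: str) -> list[str]:
    """Return the <loc> URL of every <sitemap> entry in a sitemap index."""
    root = ET.fromstring(index_xml.encode("utf-8"))
    return [loc.text.strip()
            for loc in root.findall("sm:sitemap/sm:loc", NS)
            if loc.text]


if __name__ == "__main__":
    for url in child_sitemaps(INDEX_XML):
        print(url)
```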

Shoshana answered 11/6, 2020 at 12:3 Comment(0)
