Standards and disclaimer
The Sitemap: directive in robots.txt is a nonstandard extension according to Wikipedia. Remember that:

Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.

Wikipedia also lists Allow: as a nonstandard extension.
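For example, a robots.txt using both extensions might look like this (the host and paths are placeholders; Allow: carves an exception out of a broader Disallow: rule):

User-agent: *
Disallow: /private/
Allow: /private/overview.html
Sitemap: http://www.example.com/sitemap.xml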
Multiple sitemaps in robots.txt
You can specify more than one Sitemap file per robots.txt file. The format is one Sitemap: directive per line:
Sitemap: http://www.example.com/sitemap-host1.xml
Sitemap: http://www.example.com/sitemap-host2.xml
An index of sitemaps
There is also a type of sitemap file that acts as an index of other sitemap files. If you have a Sitemap index file, you can include the location of just that file in robots.txt; you don't need to list each individual Sitemap contained in the index.
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap1.xml.gz</loc>
    <lastmod>2004-10-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap2.xml.gz</loc>
    <lastmod>2005-01-01</lastmod>
  </sitemap>
</sitemapindex>
The <lastmod> element is optional.
About excluding content
The Sitemaps protocol lets you tell search engines what content you would like indexed. To tell them what you don't want indexed, use a robots.txt file or a robots meta tag. See robotstxt.org for more information on how to exclude content from search engines.
If there is anything you don't want search engines to index, it should go in the robots.txt file (in the User Pages repository), for example:
User-agent: *
Disallow: /project_to_disallow/
Disallow: /projectname/page_to_disallow.html
Alternatively, you can use the robots meta tag.
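For example, to exclude a single page, you could add this to that page's <head>:

<meta name="robots" content="noindex">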
Suggestions
User-agent: *
Disallow: /project_to_disallow/
Disallow: /projectname/page_to_disallow.html
Sitemap: http://www.example.com/sitemap.xml
Sitemap: http://www.example.com/projectA/sitemap.xml
Sitemap: http://www.example.com/projectB/sitemap.xml
or, if you are using a sitemap index file:
User-agent: *
Disallow: /project_to_disallow/
Disallow: /projectname/page_to_disallow.html
Sitemap: http://www.example.com/siteindex.xml
where http://www.example.com/siteindex.xml looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/projectA/sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/projectB/sitemap.xml</loc>
  </sitemap>
</sitemapindex>
For info on how to set up robots.txt with GitHub Pages, see my answer here.
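In short: robots.txt must be served from the root of the domain, so for a user site it lives at the root of the User Pages repository (username is a placeholder here):

username.github.io/
├── robots.txt
├── sitemap.xml
└── index.html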
Comment: … sitemap.txt file under each project repo. Can I use something like sitemap: https://www.example.com/sitemap.txt; sitemap: https://www.example.com/ProjectA/sitemap.txt; sitemap: https://www.example.com/ProjectB/sitemap.txt (in three lines, of course)? This way, I don't need to update the top-level repo if any robot rule changes under a project repo. Thank you for replying. – Christine