I reverse engineered Google Bots' behavior for my own studies, so here are my own insights on how they currently work. This might not be accurate, since the studies were made in 2015, but I doubt it has changed that much.
Google Bots will hit your / and will then follow the 301, 302, etc. redirections. Semantically 301, 302 and so on are different, but I bet that Google does not really care for the most common types, given the wide range of administrative/programming errors and laziness that can be encountered on the world wide web.
They will follow up to a maximum of n redirections, with n being 5 if I remember correctly, until they hit a 200 or give up.
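Just to make the scenario concrete, here is a minimal PHP sketch of the kind of redirect Google Bots would then encounter on /; the /en target is only the example path used later in this answer, adapt it to your own URL scheme:

```php
<?php
// Sketch only: issue a redirect from / to /en.
$permanent = false; // set to true for a 301 (permanent) instead of a 302 (temporary)

// The third argument of header() sets the HTTP response code.
header('Location: /en', true, $permanent ? 301 : 302);
exit;
```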
After some time Google Bots will come back to your page, at first a little more often than usual (a few times within a couple of hours), then much less frequently (once every few hours). They probably try to analyze how dynamic your content is. Note that they will accurately reference the redirected URLs of your website content, even after multiple redirections (I verified my links in the search engine).
By analyzing Google's download agents (API, Google Docs...), I am pretty sure that Google uses libcurl for most of their active requests and did not implement some black-magic-based solution. Libcurl natively implements redirection handling for the whole 3xx family of status codes.
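To illustrate (this is obviously not Google's actual code), here is roughly how a libcurl-based fetch with a 5-hop redirect cap looks, using PHP's curl extension, which wraps libcurl; the URL is a placeholder:

```php
<?php
// Sketch: fetch a URL the way a libcurl-based client would, following
// 3xx redirects with a cap of 5 hops.
$ch = curl_init('https://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow 301/302/303/307/308 automatically
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // give up after 5 redirections
$body = curl_exec($ch); // false if the redirect limit was exceeded or the request failed

echo curl_getinfo($ch, CURLINFO_HTTP_CODE), "\n";      // final status code (200 if the chain resolved)
echo curl_getinfo($ch, CURLINFO_REDIRECT_COUNT), "\n"; // number of hops actually followed
echo curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), "\n";  // the URL that finally answered
curl_close($ch);
```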
For SEO optimization, consider providing a sitemap.xml, which I know they rely on.
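For reference, a minimal sitemap.xml following the sitemaps.org protocol looks like this (the URL and dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/en</loc>
    <lastmod>2015-01-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```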
If you are really paranoid, then feed Google Bots only the content you want them to see (a minimal sketch follows this list):
Implement an index.php at /
Detect the user agent; if it is not a Google Bot, redirect to /en
If it is a Google Bot, deliver the content you want
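A minimal sketch of such an index.php, relying only on the User Agent; the content file name is a placeholder I made up:

```php
<?php
// index.php at /: serve crawlers the content directly, send everyone else to /en.
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (stripos($userAgent, 'Googlebot') !== false) {
    // Google Bot: deliver the content you want indexed.
    readfile(__DIR__ . '/content-for-bots.html'); // placeholder file name
} else {
    // Regular visitor: temporary redirect to the localized page.
    header('Location: /en', true, 302);
}
exit;
```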
Google Bots' user agents are officially documented here.
If you do not trust the User Agent enough, try performing a reverse DNS resolution of the requester; proper handling of the results is also documented by Google.
You can reverse DNS the requester with gethostbyaddr(), but it might slow down the loading process, or you can trust a crawler IP database. I would recommend neither of these; the User Agent check should be enough.
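If you ever do want that extra check, here is a sketch of the reverse-then-forward DNS verification Google documents; gethostbyaddr() and gethostbyname() are real PHP functions, the surrounding logic and the accepted domains are just my reading of their documentation:

```php
<?php
// Sketch of the reverse-then-forward DNS check documented by Google.
function isRealGoogleBot($ip)
{
    // 1. Reverse DNS: the host name should end in googlebot.com or google.com.
    $host = gethostbyaddr($ip);
    if ($host === false || !preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }

    // 2. Forward DNS: the host name must resolve back to the original IP.
    return gethostbyname($host) === $ip;
}

// Example usage against the current visitor (slow: two DNS lookups per call).
var_dump(isRealGoogleBot($_SERVER['REMOTE_ADDR']));
```

Caching the verdict per IP would offset most of the slowdown mentioned above.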