Header not being set with RewriteRule [R=302,L]
Goal: I'm trying to set two headers via htaccess:

X-Robots-Tag: noindex, nofollow
Location: http://example.com/foo

PoC: In PHP one could do this which works well:

header( "X-Robots-Tag: noindex, nofollow", true );
header( "Location: " . $url, true, 302 );

Problem: In my .htaccess file I have this:

# Do not let robots index anything from /out/
RewriteCond %{REQUEST_URI} ^/?out/?
Header set X-Robots-Tag "noindex, nofollow"

...

# Redirect /out/example/ type links
RewriteRule ^/?out/example/(.*)$ "http://example.com/$1" [R=302,L]

I'm sure there is a simple mistake somewhere that I'm not seeing, but if I inspect the headers of, say, http://localhost/out/example/foo, the Location header is set, but the X-Robots-Tag header is not.

HTTP/1.1 302 Found
Date: Wed, 08 Jun 2016 23:59:18 GMT
Content-Type: text/html; charset=iso-8859-1
Transfer-Encoding: chunked
Connection: close
Location: http://example.com/foo
...

However, triggering a 404 (e.g. http://localhost/out/404) will set the appropriate header:

HTTP/1.1 404 Not Found
Date: Wed, 08 Jun 2016 23:56:19 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: close
Vary: Accept-Encoding,User-Agent
X-Robots-Tag: noindex, nofollow    <--- set
...

Where is the problem?

Underpants answered 9/6, 2016 at 0:36 Comment(3)
I think the trick is to use the always keyword when setting the header - otherwise, Apache should only set it on 2xx responses. So try Header always set ... - Embrace
I don't think you need an example - just add always before set in the Header directive. Header always set X-Robots-Tag "noindex, nofollow" - Embrace
@MikeRockett That worked. Well done. Write it up for the rep. - Underpants

The solution was to do the following:

# Redirect /out/example/ type links
RewriteRule ^/?out/example/(.*)$ "http://example.com/$1" [R=302,L,E=OUTLINK:1]

# Add the robots header if E was set above
Header always set X-Robots-Tag "noindex, nofollow" env=OUTLINK

Note: This was a challenge because the initial solution was adding the "noindex" header to everything which killed my site. I hope this helps someone in the future.
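
As a follow-up sketch for the original goal of covering everything under /out/ and not only the redirected links: a RewriteCond only guards the RewriteRule that follows it and has no effect on Header, so the condition has to reach mod_headers through an environment variable. The version below assumes mod_setenvif is enabled and allowed in .htaccess. OUTLINK is just the variable name used above, and I have not tested this on every Apache version.

```apache
# Flag every request whose path starts with /out/ (assumes mod_setenvif is available)
SetEnvIf Request_URI "^/out/" OUTLINK=1

# Redirect /out/example/ type links (same rule as above; the E= flag is no longer needed)
RewriteRule ^/?out/example/(.*)$ "http://example.com/$1" [R=302,L]

# "always" covers the 302 and 404 as well as normal responses; env= limits it to flagged requests
Header always set X-Robots-Tag "noindex, nofollow" env=OUTLINK
```

Requests outside /out/ never get the OUTLINK variable, so they should not receive the noindex header.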

Underpants answered 20/6, 2016 at 4:31 Comment(1)
Hi @mario ! Yeah, no workaround. On your suggestion I set up a PHP SOCKS5 proxy interface then went scouring the net for proxies that listen on port 80. Only HTTP proxies naturally are available. Good suggestion though. That question probably won't help anyone else as it's too narrow. I'll look into setting up something as you mention on a better server. Thanks for the reach out. - Underpants

By default, the Header directive only applies to successful (2xx) responses. To have the header set for any other status code, such as a 302 redirect, you need to use the always keyword:

Header always set X-Robots-Tag "noindex, nofollow"

More information, from the Apache mod_headers documentation:

When your action is a function of an existing header, you may need to specify a condition of always, depending on which internal table the original header was set in. The table that corresponds to always is used for locally generated error responses as well as successful responses. Note also that repeating this directive with both conditions makes sense in some scenarios because always is not a superset of onsuccess with respect to existing headers:

  • You're adding a header to a locally generated non-success (non-2xx) response, such as a redirect, in which case only the table corresponding to always is used in the ultimate response.
  • You're modifying or removing a header generated by a CGI script, in which case the CGI script's headers are in the table corresponding to always and not in the default table.
  • You're modifying or removing a header generated by some piece of the server but that header is not being found by the default onsuccess condition.
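
To illustrate the point about repeating the directive: since always is not a superset of onsuccess for existing headers, removing a header reliably can mean unsetting it in both tables. A minimal sketch, where X-Powered-By is only an example header name and not something from the question:

```apache
# Remove the header from the default (onsuccess) table
Header onsuccess unset X-Powered-By
# Remove it from the "always" table too, e.g. when a CGI script or an error response set it
Header always unset X-Powered-By
```
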
Hydrolyze answered 9/6, 2016 at 5:7 Comment(5)
It's more than likely because that response would return content, and not a redirect. Though I see it isn't discussed in the docs... - Embrace
Sorry to say this, but an example should have been supplied. This answer will add a "noindex" robot header to every page request regardless of following a RewriteCond. Bummer. - Underpants
@Drakes, right, so I simply missed out on the link between mod_rewrite and mod_headers and only pointed out that these only work with 2xx response codes. My bad. At least, though, it pointed you in the right direction. - Embrace
I was so excited it worked on the redirects that I didn't check the other links. My site got de-indexed. I can laugh now though (crying laugh). Thanks for the step in the right direction. :) - Underpants
Sorry to hear that -- we all have something that goes boom every now and then. Hope all gets back to normal. - Embrace
