External links URL encoding leads to '%3F' and '%3D' on Nginx server

I have a problem with my server. There are four inbound links to different pages of my dynamic website, and they look something like this:

myurl.com/default/Site%3Fid%3D13

They should look like this:

myurl.com/default/Site?id=13

I know that %3F is the escape sequence for the ? sign and %3D is the escape sequence for the equals sign. But I get an error 400 when I use those links. What can I do about that?

The four links point to different pages, and I imagine there will be more links like that over time. So one fix for all of them would be perfect.

Houck answered 9/12, 2013 at 12:3 Comment(1)
@John just in case you forgot and didn't notice, the bounty you set is going to expire within a few hours. - Eer

The exact same question was asked on the nginx-ru mailing list about a year ago:

http://mailman.nginx.org/pipermail/nginx-ru/2013-February/050200.html

The most helpful response came from an Nginx, Inc. employee/developer, Валентин Бартенев:

http://mailman.nginx.org/pipermail/nginx-ru/2013-February/050209.html

Если запрос приходит в таком виде, то это уже не параметры, а имя запрошенного файла. Другое дело, что location ищется по уже раскодированному адресу, о чем в документации написано.

Translation:

If the request arrives in this form, then these are no longer arguments but part of the name of the requested file. That said, as documented, location matching is performed against the already decoded (normalised) URI.
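
To see why the encoded form is treated as a file name rather than as arguments, here is a minimal Python sketch using the standard urllib.parse module. It only illustrates generic URL parsing and makes no claim about nginx internals; the host name is the placeholder from the question.

```python
from urllib.parse import urlsplit, unquote

encoded = urlsplit("http://myurl.com/default/Site%3Fid%3D13")
plain = urlsplit("http://myurl.com/default/Site?id=13")

# With %3F and %3D, everything stays in the path and the query is empty.
print(encoded.path, repr(encoded.query))
# With a literal "?", the parser splits path and query as intended.
print(plain.path, repr(plain.query))
# Decoding the path afterwards yields a name that contains "?", not arguments.
print(unquote(encoded.path))
```

The decoded path of the first URL is /default/Site?id=13, which is the string a prefix location can then match.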

His suggested solution, adapted to the example from the question here on SO, would then be:

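# Prefix matching runs against the decoded URI, so this location catches
# requests that arrive as /default/Site%3Fid%3D13.
location /default/Site? {
    # Move everything after the literal "?" into the real query string;
    # the trailing "?" keeps nginx from appending the original args.
    rewrite \?(.*)$ /default/Site?$1? last;
}

location = /default/Site {
    [...]
}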
Transverse answered 9/1, 2014 at 5:34 Comment(6)
@John and user3082653 -- Is there anything about my answer that is unclear or requires further clarification? - Transverse
Can you please give a more generic example? - Geostatics
@Geostatics what's non-generic about the existing example? I don't really see what further info would be required; feel free to ask a new (generic) question and ping me back here. - Transverse
@Geostatics the other answer below already works for any URLs encoded in this format; you can't get more generic than that! - Transverse
Should I also use /default/Site in my nginx configuration? - Geostatics
@Derzu, no, you should not. - Transverse

The following sample redirects all malformed-looking requests (defined as having a ? in the requested filename, encoded as %3F in the request) to better-formed ones, regardless of the URL.

(Please note that, as rightly advised elsewhere, you should not be getting these malformed links in the first place. Use this as a last resort, only when you cannot correct the links at their source and you know that such requests come from valid agents.)

server {
    listen      [::]:80;
    server_name localhost;

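    # rewrite sees the decoded URI without the query string, so a "?" here came from %3F.
    # The trailing "?" in the replacement keeps nginx from appending the old args.
    rewrite     ^/([^?]*)\?(.*)$    /$1?$2?     permanent;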
    location / {
        return  200 "id is $arg_id\n";
    }
}
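
The rewrite pattern above can be approximated in Python to show what its two capture groups pick up. This is only a sketch of the matching logic, assuming the regex is applied to the percent-decoded path; split_misencoded is a hypothetical helper name and not something nginx provides.

```python
import re
from urllib.parse import unquote

# Same pattern as in the nginx rewrite directive.
PATTERN = re.compile(r"^/([^?]*)\?(.*)$")

def split_misencoded(raw_path):
    """Return (path, query) for a path with an encoded "?", else None."""
    decoded = unquote(raw_path)  # assumption: matching uses the decoded path
    match = PATTERN.match(decoded)
    if match is None:
        return None  # no "?" in the path, so no redirect is needed
    # Text before the first "?" is the path, the rest is the query string.
    return "/" + match.group(1), match.group(2)

print(split_misencoded("/default/Site%3Fid%3D13"))  # ('/default/Site', 'id=13')
print(split_misencoded("/default/Site"))            # None
```

A request whose path contains no "?" after decoding does not match, which is why well-formed requests are left alone.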

Here is how the nginx configuration above behaves: when a malformed-looking request is encountered, a correction is attempted with a 301 Moved Permanently response carrying a supposedly correct Location header, which makes the browser automatically re-issue the request to the new location:

opti# curl -6v "http://localhost/default/Site%3Fid%3D13"
* About to connect() to localhost port 80 (#0)
*   Trying ::1...
* connected
* Connected to localhost (::1) port 80 (#0)
> GET /default/Site%3Fid%3D13 HTTP/1.1
> User-Agent: curl/7.26.0
> Host: localhost
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Server: nginx/1.4.1
< Date: Wed, 15 Jan 2014 17:09:25 GMT
< Content-Type: text/html
< Content-Length: 184
< Location: http://localhost/default/Site?id=13
< Connection: keep-alive
<
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/1.4.1</center>
</body>
</html>
* Connection #0 to host localhost left intact
* Closing connection #0

Note that no correction attempts are made on proper-looking requests:

opti# curl -6v "http://localhost/default/Site?id=13"
* About to connect() to localhost port 80 (#0)
*   Trying ::1...
* connected
* Connected to localhost (::1) port 80 (#0)
> GET /default/Site?id=13 HTTP/1.1
> User-Agent: curl/7.26.0
> Host: localhost
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.4.1
< Date: Wed, 15 Jan 2014 17:09:30 GMT
< Content-Type: application/octet-stream
< Content-Length: 9
< Connection: keep-alive
<
id is 13
* Connection #0 to host localhost left intact
* Closing connection #0
Transverse answered 15/1, 2014 at 17:17 Comment(0)

The URL is perfectly valid. The escaped characters it contains are just that: escaped, which is perfectly fine.

Escaping exists so that you can have a request name (in most cases corresponding to a filename on disk) that is literally Site?id=13, rather than Site followed by a query string.
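
As a small illustration of that point, the following Python sketch percent-encodes a literal name with the standard urllib.parse.quote function. The name is taken from the question; treating it as a file name is an assumption made for the example.

```python
from urllib.parse import quote, unquote

name = "Site?id=13"    # a literal name that happens to contain "?" and "="
encoded = quote(name)  # reserved characters are percent-encoded

print(encoded)                   # Site%3Fid%3D13
print(unquote(encoded) == name)  # True
```

A link generator that encodes the whole string this way, query part included, produces exactly the kind of URL shown in the question.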

I would consider it bad practice to have characters in a filename that make this necessary. In URL arguments, however, it may very well be necessary.

Nevertheless, the request URL is valid, though probably not what you want it to be. This suggests that you should correct the error at the source, wherever the wrong URL was picked up in the first place.

I do not really understand why you get an error 400; you should rather get an error 404. But that depends on your setup.

There are also cases, especially with nginx, that mostly involve passing on whole URLs and URL parts along multiple levels (for example reverse proxies, matching regular expressions from the URL and using them as variables, etc.) where such an error may occur. But to verify this and fix it we would need to know more about your setup.

Photodynamics answered 15/1, 2014 at 16:10 Comment(0)
