What's the difference between URI.escape and CGI.escape?

Asked 13/5, 2010 at 2:32 Answered 9/1, 2020 at 16:16

175

What's the difference between URI.escape and CGI.escape and which one should I use?

Siskind answered 13/5, 2010 at 2:32 Comment(0)

135

There were some small differences, but the important point is that URI.escape has been deprecated in Ruby 1.9.2... so use CGI::escape or ERB::Util.url_encode.

There is a long discussion on ruby-core for those interested which also mentions WEBrick::HTTPUtils.escape and WEBrick::HTTPUtils.escape_form.
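
To make the practical difference between the two suggested replacements concrete, here is a minimal sketch using only the standard library. The sample string is borrowed from another answer on this page, and the variable names are purely illustrative.

```ruby
require 'cgi'
require 'erb'

value = 'My Blog & Your Blog'

# CGI.escape follows the form-encoding convention: spaces become '+'.
form_style = CGI.escape(value)
# => "My+Blog+%26+Your+Blog"

# ERB::Util.url_encode percent-encodes spaces as %20 instead.
percent_style = ERB::Util.url_encode(value)
# => "My%20Blog%20%26%20Your%20Blog"
```

Both escape the '&', so either is safe inside a query value. They differ only in how the space is written.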

Big answered 14/5, 2010 at 5:27 Comment(8)

Just to add to confusion - I just saw a comment on #4968108 where someone mentioned that cgi escape uses '+' instead of %20 for spaces, and that it's against the 'spec'... – Stockstill 19/7, 2012 at 11:24

an alternative is using ERB::Util.url_encode that properly uses %20 for spaces – Visibility 16/10, 2012 at 8:40

AFAIK, URI.escape is not deprecated in 1.9.2 nor 1.9.3. Did I miss something? – Geniculate 23/10, 2012 at 16:56

@Ernest: See: github.com/ruby/ruby/commit/… (answer updated) – Marnimarnia 24/10, 2012 at 4:25

What's the safe replacement of URI.escape in rails 3.2 or above code? We noticed that the CGI::escape also encode '/' as '%2F' which URI.escape did not do. – Gwendolyngweneth 4/11, 2013 at 18:4

ruby-doc.org/stdlib-2.0.0/libdoc/uri/rdoc/URI/Escape.html. There is URI.escape module in ruby 2.0.0. Why was it deprecated? – Gwendolyngweneth 4/11, 2013 at 19:34

@Gwendolyngweneth if you click the show source on there you'll see it's still marked as deprecated. – Aida 11/2, 2014 at 0:22

I was using URI.escape to do things like pass an encoded String from an .erb to JavaScript like so: MyScriptModule.my_function('<%= URI.escape(my_string) %>') and I had to convert to CGI.escape due to the deprecation. But, as you guys have noted, if you look at the code in CGI::Escape it very explicitly turns spaces into + so in my Rails extensions initializer I overrode escape to take out the +'s. I really hope they will add some options to that method like oh I don't know, don't convert spaces. – Allianora 13/10, 2020 at 18:11

285

What's the difference between an axe and a sword, and which one should I use? Well, it depends on what you need to do.

URI.escape was supposed to encode a string (URL) using so-called "percent-encoding".

CGI::escape comes from the CGI spec, which describes how data should be encoded and decoded between the web server and the application.

Now, let's say that you need to escape a URI in your app. It is a more specific use case. For that, the Ruby community used URI.escape for years. The problem with URI.escape was that it could not handle the RFC 3986 spec.

URI.escape 'http://google.com/foo?bar=at#anchor&title=My Blog & Your Blog' 
# => "http://google.com/foo?bar=at%23anchor&title=My%20Blog%20&%20Your%20Blog"

URI.escape was marked as obsolete:

Moreover current URI.encode is simple gsub. But I think it should split a URI to components, then escape each components, and finally join them.

So current URI.encode is considered harmful and deprecated. This will be removed or change behavior drastically.

What is the replacement at this time?

As I said above, current URI.encode is wrong on spec level. So we won't provide the exact replacement. The replacement will vary by its use case.

https://bugs.ruby-lang.org/issues/4167

Unfortunately there is not a single word about it in the docs; the only way to know about it is to check the source, or to run the script with warnings at verbose level (-wW2) (or use some google-fu).

Some proposed using CGI::escape for query parameters, because you can't use it to escape an entire URI:

CGI::escape 'http://google.com/foo?bar=at#anchor&title=My Blog & Your Blog'
# => "http%3A%2F%2Fgoogle.com%2Ffoo%3Fbar%3Dat%23anchor%26title%3DMy+Blog+%26+Your+Blog"

CGI::escape should be used for query parameters only, but the results will be, again, against the spec. Actually the most common use-case is escaping form data, such as while sending an application/x-www-form-urlencoded POST request.
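
For the form-data case, the standard library also offers URI.encode_www_form, which applies the same '+'-for-space convention to a whole set of parameters. A small sketch, with made-up field names:

```ruby
require 'uri'

# Hypothetical form fields, for illustration only.
form = { 'title' => 'My Blog & Your Blog', 'page' => 2 }

# Builds an application/x-www-form-urlencoded body.
body = URI.encode_www_form(form)
# => "title=My+Blog+%26+Your+Blog&page=2"

# Decoding returns key/value pairs; every value comes back as a String.
pairs = URI.decode_www_form(body)
# => [["title", "My Blog & Your Blog"], ["page", "2"]]
```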

The also-mentioned WEBrick::HTTPUtils.escape is not much of an improvement (again it's just a simple gsub, which is, IMO, an even worse option than URI.escape):

WEBrick::HTTPUtils.escape 'http://google.com/foo?bar=at#anchor&title=My Blog & Your Blog'
# => "http://google.com/foo?bar=at%23anchor&title=My%20Blog%20&%20Your%20Blog"

The closest to the spec seems to be the Addressable gem:

require 'addressable/uri'
Addressable::URI.escape 'http://google.com/foo?bar=at#anchor&title=My Blog & Your Blog'
# => "http://google.com/foo?bar=at#anchor&title=My%20Blog%20&%20Your%20Blog"

Notice that, unlike all previous options, Addressable doesn't escape #, and this is the expected behaviour. You want to keep the # hash in the URI path but not in the URI query.

The only problem left is that we didn't escape our query parameters properly, which brings us to the conclusion: we should not use a single method for the entire URI, because there is no perfect solution (so far). As you see & was not escaped from "My Blog & Your Blog". We need to use a different form of escaping for query params, where users can put different characters that have a special meaning in URLs. Enter URL encode. URL encode should be used for every "suspicious" query value, similar to what ERB::Util.url_encode does:

ERB::Util.url_encode "My Blog & Your Blog"
# => "My%20Blog%20%26%20Your%20Blog"

It's cool but we've already required Addressable:

uri = Addressable::URI.parse("http://www.go.com/foo")
# => #<Addressable::URI:0x186feb0 URI:http://www.go.com/foo>
uri.query_values = {title: "My Blog & Your Blog"}
uri.normalize.to_s
# => "http://www.go.com/foo?title=My%20Blog%20%26%20Your%20Blog"

Conclusion:

Do not use URI.escape or similar
Use CGI::escape if you only need form escape
If you need to work with URIs, use Addressable, it offers URL encoding, form encoding and normalizes URLs.
If it is a Rails project, check out "How do I URL-escape a string in Rails?"
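
If adding a gem is not an option, the same "escape each component, then join" idea can be approximated with the standard library. This is only a sketch: build_url is a hypothetical helper, not part of any library, and it assumes the base URL is already valid and carries no query string of its own.

```ruby
require 'erb'

# Hypothetical helper: percent-encodes every key and value separately,
# then joins them, so '&' and '=' inside a value cannot break the query.
def build_url(base, params)
  query = params.map do |key, value|
    "#{ERB::Util.url_encode(key.to_s)}=#{ERB::Util.url_encode(value.to_s)}"
  end.join('&')
  query.empty? ? base : "#{base}?#{query}"
end

url = build_url('http://www.go.com/foo', { title: 'My Blog & Your Blog' })
# => "http://www.go.com/foo?title=My%20Blog%20%26%20Your%20Blog"
```

Unlike Addressable, this does no parsing or normalization of the base URL. It only guarantees that the query values are escaped.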

Geniculate answered 24/10, 2012 at 23:57 Comment(6)

Thanks a lot for the info. It sure got rid of some hoe testing warnings. A rake and a hoe look out below. – Labionasal 24/9, 2014 at 7:18

Great explanation @Ernest, but the problem with this is that it wont work for external URLs that I am not trying to create (and have no control over). e.g. crawlers that reads URLs from a web page, and then tries to access those URLs (which need to be encoded before access). – Adkison 9/11, 2014 at 23:23

@Adkison if you can afford having Addressable as one of your gems, you could parse URL first, f.i. rubydoc.info/gems/addressable/Addressable/URI.heuristic_parse – Geniculate 10/11, 2014 at 15:50

Interesting! But again, I cannot get a hash of parameters from the original url using this, which I then encode as you describe. The flow in my case is: I get external urls from some feed -> which I then need to encode -> Pass to http client to fetch content. Now if I don't encode the external urls properly, the ruby based HTTP clients fail with invalid URI errors. – Adkison 10/11, 2014 at 15:58

@Adkison the parse method will return an instance of Addressable::URI; you can then call any of its instance methods, and maybe one of them will get you the desired result: rubydoc.info/gems/addressable/Addressable/URI – Geniculate 11/11, 2014 at 17:48

what is the use case for not escaping # ? when would you get input with a # that requires escaping but not escaping of the # why is %20 preferred over + – Grafton 10/3, 2021 at 23:11

URI.escape takes a second parameter that lets you mark what's unsafe. See APIDock:

http://apidock.com/ruby/CGI/escape/class

http://apidock.com/ruby/URI/Escape/escape

Primula answered 13/5, 2010 at 2:49 Comment(1)

Great answer @Robert Speicher. – Idempotent 30/3, 2013 at 8:3

CGI::escape is good for escaping text segments so that they can be used in URL query parameters (the strings after '?'). For example, if you want a parameter containing slash characters in the URL, you CGI::escape that string first and then insert it into the URL.

However in Rails you probably won't be using it directly. Usually you use hash.to_param, which will use CGI::escape under the hood.

URI::escape is good for escaping a URL which was not escaped properly. For example, some websites output wrong/unescaped URLs in their anchor tags. If your program uses these URLs to fetch more resources, OpenURI will complain that the URLs are invalid. You need to URI::escape them to make them valid. So it is used to escape the whole URI string to make it proper. In my words, URI::unescape makes a URL readable by humans, and URI::escape makes it valid for browsers.

These are my layman's terms; feel free to correct them.

Pskov answered 6/1, 2012 at 10:6 Comment(1)

Upvoted because of Hash#to_param but URI::escape is not recommended nowadays for anything. Also FYI to any lurkers, alternative is ERB::Util.url_encode for RFC 3986 encoding (will replace space with %20). While CGI::escape would use +. – Perri 14/2 at 17:49

The difference is that URI.escape is not working...

CGI.escape "/en/test?asd=qwe"
# => "%2Fen%2Ftest%3Fasd%3Dqwe"

URI.escape "/en/test?asd=qwe"
# => "/en/test?asd=qwe"

Buke answered 2/11, 2017 at 9:18 Comment(3)

You chose the wrong test case.. The /'s, ?'s and ='s are all part of a valid URI and thus not escaped. Other characters that need to be escaped especially in the query string should be. – Commercial 19/1, 2019 at 3:59

@GerardONeill I chose the test case precisely to show how URI.escape is not working and unreliable. Are you suggesting that URI.escape is escaping only the query string? how could it tell when is a parameter value finished if I want to encode a & in there? maybe that is why it is obsolete? – Buke 21/1, 2019 at 17:53

That is exactly what I'm saying. The URI escape has to parse the URL, separate out what it thinks are the individual parameters, escape them, and put them back together. Even that can be messy. But it doesn't do that -- it just avoids escaping some characters while escaping the rest, which makes it incomplete. It can be used for simple cases especially if you know that your parameters won't be.. confusing. – Commercial 21/1, 2019 at 22:3

CGI.escape is for escaping a URL value in the query string. All characters that don't fall into the ALPHA, DIGIT, '_', '-', '.' and ' ' character set are percent-escaped; the space itself is turned into '+'.

But applying that to a whole URL would make it incorrect, since a URL needs to keep characters such as '/', ':', '?', '[', '&', '=', and ';'. Perhaps more that I can't think of off the top of my head.

URI.escape leaves those URL characters alone and escapes the rest, without actually working out where the query string keys and values begin and end. So it really can't be depended on, since values can contain all kinds of characters that prevent an easy escape. Basically, it's too late. But if the URL can be depended on to be simple (no '&'s and '='s etc. in the values), this function might be used to escape otherwise unreadable or illegal characters.

In general -- always use CGI.escape on the individual keys and values prior to joining them with '&' and adding them after the '?'.
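
A minimal sketch of that advice, using only the standard library. The parameter names and values are invented for illustration, and URI.decode_www_form is used only to check that the values survive the round trip.

```ruby
require 'cgi'
require 'uri'

# Hypothetical parameters; note the '&', '=' and '/' inside the values.
params = { 'q' => 'a&b=c', 'path' => '/en/test' }

# Escape each key and value on its own, then join with '=' and '&'.
query = params.map { |key, value| "#{CGI.escape(key)}=#{CGI.escape(value)}" }.join('&')
# => "q=a%26b%3Dc&path=%2Fen%2Ftest"

url = "http://example.com/search?#{query}"

# Decoding the query gives back exactly the original keys and values.
decoded = URI.decode_www_form(query).to_h
# => {"q"=>"a&b=c", "path"=>"/en/test"}
```

Because the escaping happens before the join, the only unescaped '&' and '=' left in the query are the real separators.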

Commercial answered 19/1, 2019 at 4:8 Comment(0)

CGI.escape didn't work with the OpenProject API. It encoded the '[', ']', ',' and ':' characters, and not the '+'. I hacked this together, which seems to work so far for OpenProject's API, but I'm sure it's missing some .gsub calls. It's likely almost as bad as URI.escape, but it won't give you the obsolete warnings.

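require 'cgi'

class XXX
  # Escapes only the query part of the given URL, then restores the
  # characters that should stay literal and turns '+' into '%20'.
  def self.encode(path)
    path, query = path.split("?", 2)
    return path if query.nil?

    query = CGI.escape(query)
               .gsub("%3A", ":").gsub("%3D", "=")
               .gsub("%5B", "[").gsub("%5D", "]")
               .gsub("%2C", ",").gsub("+", "%20")
    [path, query].join("?")
  end
end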

XXX.encode("http://test.com/some/path?query=[box: \"cart\"]")
URI.encode("http://test.com/some/path?query=[box: \"cart\"]")

Both output:

=> "http://test.com/some/path?query=[box:%20%22cart%22]"
=> "http://test.com/some/path?query=[box:%20%22cart%22]"

Maxim answered 9/1, 2020 at 16:16 Comment(2)

but + does not need to be encoded in the URL it is a valid character for parameters – Hyaloplasm 30/7, 2021 at 16:21

Perhaps you could have used ERB::Util.url_encode instead? – Perri 14/2 at 17:42
