Regular expression "empty range in char class error"

I have a regex in my code that is meant to match URLs, and it threw an error:

/^(http|https):\/\/([\w-]+\.)+[\w-]+([\w- .\/?%&=]*)?$/

The error was "empty range in char class". I traced the cause to the ([\w- .\/?%&=]*)? part: Ruby seems to read the - in \w- . as a range operator instead of a literal -. After escaping the dash (\-), the problem was solved.
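
For reference, here is a minimal sketch of the pattern with only that one change applied (the hyphen in the last character class escaped), checked against a few sample URLs. The constant name `URL_PATTERN` and the sample URLs are illustrative assumptions, not taken from the original code.

```ruby
# URL pattern from the question, with the hyphen inside the last
# character class escaped (\-) so it is read as a literal "-" rather
# than as a range operator. The hyphens in [\w-] are already last in
# their classes, so they are literal as written.
URL_PATTERN = /^(http|https):\/\/([\w-]+\.)+[\w-]+([\w\- .\/?%&=]*)?$/

[
  "http://example.com",
  "https://www.example.com/path?q=1&x=a-b",
  "ftp://example.com"
].each do |url|
  # =~ returns the index of the match, or nil when there is no match
  puts "#{url} => #{(url =~ URL_PATTERN) ? 'match' : 'no match'}"
end
```

The first two URLs match and the `ftp://` one does not, because the scheme group only allows `http` or `https`.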

But the original regular expression runs fine on my co-workers' machines. We use the same versions of OS X, Rails, and Ruby: Ruby 1.9.3p194, Rails 3.1.6, and OS X 10.7.5. After we deployed the code to our Heroku server, everything worked fine there too. Why does only my environment raise an error for this regex? How does Ruby interpret regular expressions here?

Shaynashayne answered 31/10, 2012 at 15:54 Comment(4)
I don't know why it worked on one machine and not on another, but hyphens in character classes should always be either escaped or placed at the beginning or end of the character class. Otherwise the engine might decide to make it a range. Hyphens are also allowed directly after other ranges (like [A-Z-_]), but I'd say this is rather discouraged, too. – Flem
What version of Ruby? Is it an earlier version with the optional regex support compiled in? Without any details about at least the version, and possibly the OS, it's impossible to help. – Lashley
Thank you guys for your help. To Dave: the Ruby version is 1.9.3p194, Rails is 3.1.6, and OS X is 10.7.5. I'm not sure whether my Ruby comes with other optional regex support. Could you share your thoughts, please? – Shaynashayne
It's standard regex practice to place the dash at the end of the character class. – Housewares
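
Since the comments hinge on whether two machines really run the same interpreter build, one quick check is to print the interpreter's own description and build options on each machine and compare them. A minimal sketch, assuming MRI; the values printed will differ per installation.

```ruby
require "rbconfig"

# Full interpreter description: the same string `ruby -v` prints.
puts RUBY_DESCRIPTION
puts "version:    #{RUBY_VERSION}"
puts "patchlevel: #{RUBY_PATCHLEVEL}"
puts "platform:   #{RUBY_PLATFORM}"

# Options Ruby was configured with at build time; differences here can
# point to a differently built interpreter with the same version number.
puts "configure:  #{RbConfig::CONFIG['configure_args']}"
```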

I can replicate this error on Ruby 1.9.3p194 (2012-04-20 revision 35410) [i686-linux], installed on Ubuntu 12.04.1 LTS using rvm 1.13.4. However, this should not be a version-specific error. In fact, I'm surprised it worked on the other machines at all.

A simpler demonstration that fails the same way:

"abcd" =~ /[\w- ]/

This is because [\w- ] is interpreted as "a range beginning with any word character up to space (or blank)", rather than as a character class containing a word character, a hyphen, or a space, which is what you intended.
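
To check which forms a given interpreter accepts, you can compile them with `Regexp.new` and rescue `RegexpError`. A minimal sketch: `compiles?` is a hypothetical helper, and whether `[\w- ]` raises depends on the Ruby version and build, so no output is claimed for it. The reversed range `[z-a]` serves as a control, since a range running from a higher character down to a lower one is what produces the "empty range in char class" message.

```ruby
# Hypothetical helper: returns true if +source+ compiles as a Ruby
# regexp, otherwise prints the RegexpError message and returns false.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError => e
  puts "#{source}: #{e.message}"
  false
end

compiles?('[z-a]')  # reversed range: always an "empty range" error
compiles?('[\w- ]') # result depends on the Ruby version and build
compiles?('[-\w ]') # leading hyphen: always a literal "-"
```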

Per Ruby's regular expression documentation:

Within a character class the hyphen (-) is a metacharacter denoting an inclusive range of characters. [abcd] is equivalent to [a-d]. A range can be followed by another range, so [abcdwxyz] is equivalent to [a-dw-z]. The order in which ranges or individual characters appear inside a character class is irrelevant.
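
The equivalences in the quoted documentation can be checked directly by scanning the lowercase alphabet with each pair of classes. A minimal sketch; the sample string is an arbitrary choice for illustration.

```ruby
# Every lowercase ASCII letter, used as sample input.
sample = ("a".."z").to_a.join

# [abcd] and [a-d] match exactly the same characters.
p sample.scan(/[abcd]/) == sample.scan(/[a-d]/)        # => true

# Ranges can follow each other: [abcdwxyz] is the same as [a-dw-z].
p sample.scan(/[abcdwxyz]/) == sample.scan(/[a-dw-z]/) # => true

# The order of ranges inside the class does not matter.
p sample.scan(/[w-za-d]/).join                          # => "abcdwxyz"
```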

As you saw, prepending a backslash escaped the hyphen, turning it from a range operator into a literal character in the class, which removed the error. However, escaping the hyphen in the middle of a character class is not recommended, since the intended meaning of the hyphen is easy to misread in such cases. As m.buettner pointed out, always place hyphens either at the beginning or at the end of a character class:

"abcd" =~ /[-\w ]/
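
A minimal sketch comparing the three placements on a sample string: hyphen first, hyphen last, and hyphen escaped in the middle. All three treat the hyphen as a literal and match the same characters; the first two are the readable forms recommended above. The sample string is an assumption for illustration.

```ruby
text = "a-b c!"

# Hyphen first, hyphen last, and escaped hyphen: all literal "-".
leading  = text.scan(/[-\w ]/)
trailing = text.scan(/[\w -]/)
escaped  = text.scan(/[\w\- ]/)

p leading                                    # => ["a", "-", "b", " ", "c"]
p leading == trailing && trailing == escaped # => true
```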
Jeconiah answered 8/11, 2012 at 8:57 Comment(0)
