Is there a bug in Ruby lookbehind assertions (1.9/2.0)?

Why doesn't the regex (?<=fo).* match foo (whereas (?<=f).* does)?

"foo" =~ /(?<=f).*/m          => 1
"foo" =~ /(?<=fo).*/m         => nil

This only seems to happen when the /m flag is set (dot matches newline; Ruby calls this multiline mode, most other regex flavors call it singleline); without it, everything is OK:

"foo" =~ /(?<=f).*/           => 1
"foo" =~ /(?<=fo).*/          => 2

Tested on Ruby 1.9.3 and 2.0.0.
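For anyone who wants to check their own interpreter, here is a minimal sketch that prints the Ruby version next to each of the four results above. The helper name `match_index` is made up for illustration. On an interpreter without the bug, the `(?<=fo).*` pattern should give 2 with or without the /m flag.

```ruby
# Hypothetical helper: returns the offset of the first match, or nil if none.
def match_index(string, regex)
  string =~ regex
end

# The four patterns from the question, with and without the /m flag.
PATTERNS = [
  /(?<=f).*/m,
  /(?<=fo).*/m,
  /(?<=f).*/,
  /(?<=fo).*/,
].freeze

RESULTS = PATTERNS.map { |re| [re, match_index("foo", re)] }

puts "Ruby #{RUBY_VERSION}"
RESULTS.each { |re, idx| puts "#{re.inspect.ljust(16)} => #{idx.inspect}" }
```

A `nil` on the second line would indicate that the interpreter still shows the behavior described here.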

See it on Rubular

EDIT: Some more observations:

Adding an end-of-line anchor doesn't change anything:

"foo" =~ /(?<=fo).*$/m        => nil

But together with a lazy quantifier, it "works":

"foo" =~ /(?<=fo).*?$/m       => 2

EDIT: And some more observations:

.+ works, as does its equivalent .{1,}, but only in Ruby 1.9 (this seems to be the only behavioral difference between 1.9 and 2.0 in this scenario):

"foo" =~ /(?<=fo).+/m         => 2
"foo" =~ /(?<=fo).{1,}/m      => 2

In Ruby 2.0:

"foo" =~ /(?<=fo).+/m         => nil
"foo" =~ /(?<=fo).{1,}/m      => nil

.{0,} is busted (in both 1.9 and 2.0):

"foo" =~ /(?<=fo).{0,}/m      => nil

But {n,m} works in both:

"foo" =~ /(?<=fo).{0,1}/m     => 2
"foo" =~ /(?<=fo).{0,2}/m     => 2
"foo" =~ /(?<=fo).{0,999}/m   => 2
"foo" =~ /(?<=fo).{1,999}/m   => 2
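The quantifier variants above can be run in one go with a small table-driven script. This is only a sketch: the method name `lookbehind_offsets` and the "ok"/"unexpected" labels are made up for illustration. Every variant should give offset 2 (the position right after "fo") on an interpreter without the bug.

```ruby
# Quantifier variants from the observations above; "*?$" is the lazy form
# combined with the end-of-line anchor.
QUANTIFIERS = %w[* + {0,} {1,} {0,1} {0,2} {0,999} {1,999} *?$].freeze

# Hypothetical helper: builds /(?<=fo).<quantifier>/m for each variant and
# records the match offset (nil when the pattern does not match).
def lookbehind_offsets(subject = "foo")
  QUANTIFIERS.each_with_object({}) do |q, table|
    regex = Regexp.new("(?<=fo).#{q}", Regexp::MULTILINE)
    table[q] = subject =~ regex
  end
end

lookbehind_offsets.each do |q, offset|
  status = offset == 2 ? "ok" : "unexpected"
  puts format("%-10s => %-4s %s", q, offset.inspect, status)
end
```

Running the same script under several Ruby versions makes the per-version differences easy to compare side by side.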
Commutate answered 5/3, 2013 at 21:4 Comment(14)
Well, lookbehind assertions are a new feature since version 1.9, but it's not like this is a very complicated one... makes you wonder what other bugs there are. - Commutate
If it's a bug, it's in two different regexp engines (1.9 and 2.0.0 don't use the same engine). - Entertainment
Well, the Ruby 2.0 engine is Onigmo, which is a fork of Ruby 1.9's engine Oniguruma. So if it's really a bug, it may well exist in both engines and have gone unnoticed so far. - Scanlon
Well, I've opened a ticket in the Ruby bug tracker: bugs.ruby-lang.org/issues/8023 - Commutate
In Ruby, 'dot matches all' is multiline mode, and there is no singleline mode as such. - Hilleary
@MikeM: What Ruby calls "multiline" is called "singleline" in every other regex flavor there is. This is confusing enough :) - Commutate
Linked: How do I create a multiline regex? - Hilleary
@dbenhur: Thanks for the additional observations! I've played with them and found a difference between Ruby 2.0 and 1.9's regex engines in the .+/.{1,} variants of the regex (see above). - Commutate
I think it would be helpful and easier to read for many others if you removed the irb/pry prompt from the code chunks, and further put the results on the same line as the code, like "foo" =~ /(?<=f).*/m # => 1. - Nomen
@sawa: Right, thanks, this was getting out of hand :) - Commutate
@WayneConrad: It's a slightly different bug in each version. Specifically, .+ works in 1.9 and fails in 2.0... - Commutate
Shouldn't your comment on opening a bug tracker ticket be the answer? And this question be closed? Or even deleted, since it's almost obvious from the start that this is a bug? It's a great find, but I don't think it's a great SO question. And especially it isn't an unanswered one. (So -1 on the question, but +1 on the comment in an attempt to show this "question" as answered.) - Colbycolbye
@ChrisWesseling: I agree, but I'm still waiting for any reaction from the Ruby bugtracker. So far, there has been no activity at all. Until that happens, I'm hesitant to close the question. - Commutate
@ChrisWesseling: Also, as is evident from the edits to the question, there have been valuable contributions to the question that helped define the (fairly obvious) bug better; that's something I'm not seeing on the bugtracker either. - Commutate
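On the terminology point raised in the comments: a short sketch of what Ruby's /m flag (`Regexp::MULTILINE`) actually changes. The variable names are illustrative only.

```ruby
text = "foo\nbar"

# Without /m, the dot does not match a newline, so the match stops at "\n".
without_m = text[/o.*/]       # "oo"

# With /m, the dot also matches "\n", so the match runs to the end of the string.
with_m = text[/o.*/m]         # "oo\nbar"

# In Ruby, ^ and $ match at line boundaries regardless of the /m flag.
line_anchor = text =~ /^bar$/ # 4

puts without_m.inspect, with_m.inspect, line_anchor.inspect
```

So Ruby's "multiline" flag only affects the dot, which is what other regex flavors call singleline or dotall mode.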

This has been officially classified as a bug and subsequently fixed, together with another problem concerning \Z anchors in multiline strings.

Commutate answered 8/3, 2013 at 19:7 Comment(0)
