javascript regex - look behind alternative?
D

8

160

Here is a regex that works fine in most regex implementations:

(?<!filename)\.js$

This matches .js for a string which ends with .js except for filename.js

JavaScript doesn't have regex lookbehind. Is anyone able to put together an alternative regex which achieves the same result and works in JavaScript?

Here are some thoughts, but they need helper functions. I was hoping to achieve it with just a regex: http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript

Disaccustom answered 11/9, 2011 at 4:6 Comment(13)
if you just need to check a specific filename or list of filenames, why not just use two checks? check if it ends in .js and then if it does, check that it doesn't match filename.js or vice versa.Underpart
Update: The latest public Chrome version (v62) includes (presumably experimental) lookbehinds out of the box :D Note however that lookbehinds are still in proposal stage 3: github.com/tc39/proposal-regexp-lookbehind . So, it may take a while until JavaScript everywhere supports it. Better be careful about using in production!Innards
Update: ES2018 includes lookbehind assertions, plus dotAll mode (the s flag), named capture groups, and Unicode property escapes.Conoscenti
Just use (?<=thingy)thingy for positive lookbehind and (?<!thingy)thingy for negative lookbehind. Now it supports them.Hypersonic
@K._ As of Feb 2018 that's not true yet!! And it will need some time because browsers and engines must implement the specification (current in draft).Dede
@AndreFigueiredo Yes, you're right. The proposal is currently on Stage 4. Maybe I was thinking of only Chrome, I guess.Hypersonic
nodejs kangax.github.io/compat-table/es2016plus supports itSepticemia
Firefox still hasn't implemented the 2018 specification which prescribes support for look-behinds. Here's the bug.Rogerson
@LonnieBest meanwhile fixed for FF (5 days ago) :-)Ignorant
@Ignorant : That's fantastic news. When will it land? Version 77?Rogerson
looks like 78 (see ` Milestone: mozilla78`)Ignorant
Still not supported for safari @ 2022Silvery
As of Jan 12, The latest Safari Technology Preview release 161 (bugs.webkit.org/show_bug.cgi?id=174931#c56) supports lookbehind.Decennary
P
68

^(?!filename).+\.js works for me

tested against:

  • test.js match
  • blabla.js match
  • filename.js no match

A proper explanation for this regex can be found at Regular expression to match string not containing a word?

Look ahead is available since version 1.5 of javascript and is supported by all major browsers

Updated to match filename2.js and 2filename.js but not filename.js

(^(?!filename\.js$).).+\.js
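The edge cases raised in the comments on this answer can be checked directly. A minimal sketch, assuming a handful of sample file names chosen for illustration and a hypothetical `check` helper, that runs both patterns from this answer next to the variant suggested in the comments:

```javascript
// Patterns from this answer, plus the variant suggested in the comments.
const original = /^(?!filename).+\.js/;
const updated = /(^(?!filename\.js$).).+\.js/;
const suggested = /^(?!.*filename\.js$).*\.js$/;

// Hypothetical helper: test one pattern against every name in a list.
function check(pattern, names) {
  return names.map((name) => pattern.test(name));
}

// Sample names are illustrative assumptions, not taken from a real project.
const names = ["test.js", "filename.js", "filename2.js", "2filename.js", "a.js", "filename.json"];

console.log(check(original, names));  // [true, false, false, true, true, false]
console.log(check(updated, names));   // [true, false, true, true, false, true]
console.log(check(suggested, names)); // [true, false, true, false, true, false]
```

Only the last pattern rejects 2filename.js and filename.json while accepting filename2.js and a.js, which is what (?<!filename)\.js$ would do.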

Portaltoportal answered 11/9, 2011 at 4:17 Comment(6)
That question you linked to talks about a slightly different problem: matching a string that doesn't contain the target word anywhere. This one is much simpler: matching a string that doesn't start with the target word.Unionism
That's really nice, it only misses out on cases like: filename2.js or filenameddk.js or similar. This is a no match, but should be a match.Disaccustom
@Disaccustom You asked for a look-behind, not a look-ahead, why did you accept this answer?Grimaud
I'm grave-digging here, but the updated one has a broken/useless capture group and matches "filename.js" in the string "filename.json". It should be ^(?!filename\.js$).+\.js$Sinhalese
the given one does not match on a.jsMcconaghy
The original regex with lookbehind doesn't match 2filename.js, but the regex given here does. A more appropriate one would be ^(?!.*filename\.js$).*\.js$. This means, match any *.js except *filename.js.Blau
F
169

EDIT: From ECMAScript 2018 onwards, lookbehind assertions (even unbounded) are supported natively.

In previous versions, you can do this:

^(?:(?!filename\.js$).)*\.js$

This does explicitly what the lookbehind expression is doing implicitly: at each character of the string, check that the lookbehind expression plus the regex after it does not match from that position, and only then allow that character to match.

^                 # Start of string
(?:               # Try to match the following:
 (?!              # First assert that we can't match the following:
  filename\.js    # filename.js 
  $               # and end-of-string
 )                # End of negative lookahead
 .                # Match any character
)*                # Repeat as needed
\.js              # Match .js
$                 # End of string

Another edit:

It pains me to say (especially since this answer has been upvoted so much) that there is a far easier way to accomplish this goal. There is no need to check the lookahead at every character:

^(?!.*filename\.js$).*\.js$

works just as well:

^                 # Start of string
(?!               # Assert that we can't match the following:
 .*               # any string, 
  filename\.js    # followed by filename.js
  $               # and end-of-string
)                 # End of negative lookahead
.*                # Match any string
\.js              # Match .js
$                 # End of string
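To see that the rewrite matches the same strings as the original lookbehind, the two can be run side by side. This is only a sketch: the sample names are assumptions, and the lookbehind pattern is built with `new RegExp` inside a `try` block so that an engine without ES2018 lookbehind support throws at that line instead of failing to parse the whole script.

```javascript
// Lookahead-only rewrite from this answer.
const withLookahead = /^(?!.*filename\.js$).*\.js$/;

// Build the lookbehind version at runtime; older engines throw a SyntaxError here.
let withLookbehind = null;
try {
  withLookbehind = new RegExp("(?<!filename)\\.js$");
} catch (e) {
  // Lookbehind is not supported; only the rewrite is available.
}

// Illustrative sample names (assumed, not from the question).
const samples = ["test.js", "filename.js", "filename2.js", "2filename.js", ".js"];

for (const s of samples) {
  const rewriteResult = withLookahead.test(s);
  const nativeResult = withLookbehind ? withLookbehind.test(s) : "n/a";
  console.log(s, rewriteResult, nativeResult);
}
```

On an engine with lookbehind support, the two result columns should agree for every sample.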
Fructidor answered 11/9, 2011 at 6:7 Comment(9)
Works on lots of cases except where there are preceding characters, for example: filename.js (works-nomatch) filename2.js (works-match) blah.js (works - match) 2filename.js (doesn't work - nomatch) --- having said that, the lookbehind has the same limitation which I didn't realise until now...Disaccustom
@daniel: Well, your regex (with lookbehind) also doesn't match 2filename.js. My regex matches in exactly the same cases as your example regex.Fructidor
Forgive my naivety but is there a use for the non capturing group here? I've always known that to be only useful when trying to glean back reference for replacement in a string. As far as I know, this too will work ^(?!filename\.js$).*\.js$Affricate
Not quite, that regex checks for "filename.js" only at the start of the string. But ^(?!.*filename\.js$).*\.js$ would work. Trying to think of situations where the ncgroup might still be necessary...Fructidor
This approach can be summarized as: instead of looking behind X, look ahead at every character that comes before X?Novelize
@HaiPhan: Yes, but re-reading my answer I just noticed that there is a vastly less complicated solution that I had completely overlooked. Will update my answer X-)Fructidor
Firefox still doesn't support look-behinds as prescribed by the 2018 specification. I understand they're actively working on it though.Rogerson
BTW, I really like how you break the RegEx apart and describe it so concisely.Rogerson
Really appreciate the breakdown of what each component does. Thanks!Denadenae
F
26

Let's suppose you want to find all int not preceded by unsigned:

With support for negative look-behind:

(?<!unsigned )int

Without support for negative look-behind:

((?!unsigned ).{9}|^.{0,8})int

The basic idea is to grab the n preceding characters and exclude the match with a negative look-ahead, while also matching the cases where there are fewer than n preceding characters (where n is the length of the look-behind).

So the regex in question:

(?<!filename)\.js$

would translate to:

((?!filename).{8}|^.{0,7})\.js$

You might need to play with capturing groups to find the exact spot in the string that interests you, or if you want to replace a specific part with something else.

Fiona answered 30/11, 2014 at 13:50 Comment(4)
I just converted this: (?<!barna)(?<!ene)(?<!en)(?<!erne) (?:sin|vår)e?(?:$| (?!egen|egne)) to (?!barna).(?!erne).(?!ene).(?!en).. (?:sin|vår)e?(?:$| (?!egen|egne)) which does the trick for my needs. Just providing this as another "real-world" scenario. See linkInnards
I think you meant: ((?!unsigned ).{9}|^.{0,8})intNunn
@Nunn Yes. Thank you. I just corrected my response.Fiona
Thanks for the more generalized answer which works even where there is a need to match deep within the text (where initial ^ would be impractical)!Peshawar
F
3

If you can look ahead but not back, you could reverse the string first and then do a lookahead. Some more work will need to be done, of course.
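A minimal sketch of that idea for the pattern in the question, assuming a hypothetical helper named `isJsButNotFilename`: the input is reversed, and the pattern is written backwards so the lookbehind becomes a lookahead.

```javascript
// Hypothetical helper: emulate (?<!filename)\.js$ by reversing the input.
function isJsButNotFilename(name) {
  // Array.from splits by code point, so surrogate pairs survive the reversal.
  const reversed = Array.from(name).reverse().join("");
  // Reversed, ".js" at the end becomes "sj." at the start, and the
  // lookbehind for "filename" becomes a lookahead for "emanelif".
  return /^sj\.(?!emanelif)/.test(reversed);
}

console.log(isJsButNotFilename("test.js"));      // true
console.log(isJsButNotFilename("filename.js"));  // false
console.log(isJsButNotFilename("2filename.js")); // false
console.log(isJsButNotFilename("filename2.js")); // true
```

The extra work mentioned above shows up as soon as match positions or replacements are needed, since indices found in the reversed string have to be mapped back to the original.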

Fig answered 26/11, 2015 at 18:8 Comment(1)
This answer could really use some improvement. It seems more like a comment to me.Limonene
B
3

This is an equivalent solution to Tim Pietzcker's answer (see also comments of same answer):

^(?!.*filename\.js$).*\.js$

It means, match *.js except *filename.js.

To get to this solution, you can check which patterns the negative lookbehind excludes, and then exclude exactly these patterns with a negative lookahead.

Blau answered 16/5, 2017 at 5:48 Comment(0)
P
2

Thanks to Tim Pietzcker and the others for their answers; I was inspired by their work. However, I think there is no ideal solution for mimicking lookbehind. For example, Pietzcker's solution depends on the $ end-of-string anchor; without the $ it gives an unexpected result:

let str="filename.js  main.js  2022.07.01"
console.log( /^(?!.*filename\.js).*\.js/g.exec(str) ) //null

Another limitation is that it is hard to translate multiple lookbehinds, such as:

let reg=/(?<!exP0)exp0 \d (?<!exP1)exp1 \d (?<!exP2)exp2/

How can we build a more generic and flexible way to emulate lookbehind assertions? Below is my solution.

The core pattern of the alternative is:

(?:(?!ExpB)....|^.{0,3})ExpA <= (?<!ExpB)ExpA

Detailed explanation:

(?:         # Start a non-capturing group:
 (?!ExpB)   # Assert a position where ExpB can't match
 ....       # Any string of the same length as ExpB
 |^.{0,3}   # Or match any string from the start that is shorter than ExpB
)           # End of non-capturing group
ExpA        # Match ExpA

For instance:

var str="file.js  main.js  2022.07.01"
var reg=/(?:(?!file)....|^.{0,3})\.js/g // <= (?<!file)\.js
console.log( reg.exec(str)[0] )  // main.js
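Note that the match above is main.js rather than .js, because the emulation consumes the characters it inspects. A small sketch, assuming ExpA is wrapped in a capturing group (an addition not in the pattern above), recovers just the ExpA part and its position:

```javascript
var str = "file.js  main.js  2022.07.01";
// Same pattern as above, with ExpA wrapped in a capturing group.
var reg = /(?:(?!file)....|^.{0,3})(\.js)/g;
var m = reg.exec(str);

console.log(m[0]); // "main.js" (includes the four consumed characters)
console.log(m[1]); // ".js" (what a real lookbehind would report)

// Position of ExpA itself within the input.
var expAIndex = m.index + m[0].length - m[1].length;
console.log(expAIndex); // 13
```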

Here is an implementation that wraps the above pattern in syntactic sugar:

var str="file.js  main.js  2022.07.01"
var reg=newReg("﹤4?!file﹥\\.js","g") //pattern sugar
console.log(reg.exec(str)[0]) // main.js

function newReg(sReg,flags){
  flags=flags||""
  sReg=sReg.replace(/(^|[^\\])\\﹤/g,"$1<_sl_>").replace(/(^|[^\\])\\﹥/g,"$1<_sr_>")
  if (/﹤\?<?([=!])(.+?)﹥/.test(sReg)){
    throw "invalid format of string for lookbehind regExp"
  }
  var reg=/﹤(\d+)\?<?([=!])(.+?)﹥/g
  if (sReg.match(reg)){
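    sReg=sReg.replace(reg, function(p0,p1,p2,p3){
      var n=parseInt(p1,10)
      var body="(?"+p2+p3+")"+".".repeat(n)
      // Only a negative lookbehind may also succeed with fewer than n preceding characters
      return p2==="!" ? "(?:"+body+"|^.{0,"+(n-1)+"})" : "(?:"+body+")"
    })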
  }
  sReg=sReg.replace(/<_sl_>/g,"﹤").replace(/<_sr_>/g,"﹥")
  var rr=new RegExp(sReg,flags)
  return rr
}

Two special characters, ﹤ (\uFE64 or &#65124;) and ﹥ (\uFE65 or &#65125;), enclose the lookbehind expression, and a number N giving the length of the lookbehind expression must follow the ﹤. That is, the syntax of lookbehind is:

﹤N?!ExpB﹥ExpA <= (?<!ExpB)ExpA
﹤N?=ExpB﹥ExpA <= (?<=ExpB)ExpA

To make the pattern above more ES5-like, you can replace ﹤ or ﹥ with parentheses and remove N, by writing more code into the newReg() function.

Petunia answered 31/7, 2022 at 19:12 Comment(0)
N
2

I know this answer doesn't really tackle how to rewrite a regex to simulate lookbehinds, but I managed to handle some very simple situations like this one by replacing the unwanted match in the string beforehand, as in:

  let string = originalString.replace("filename.js", "filename_js")
  string.match(/.*\.js/)
Naumann answered 5/1, 2023 at 21:32 Comment(0)
H
-1

Below is a positive lookbehind JavaScript alternative showing how to capture the last name of people with 'Michael' as their first name.

1) Given this text:

const exampleText = "Michael, how are you? - Cool, how is John Williamns and Michael Jordan? I don't know but Michael Johnson is fine. Michael do you still score points with LeBron James, Michael Green Miller and Michael Wood?";

get an array of last names of people named Michael. The result should be: ["Jordan","Johnson","Green","Wood"]

2) Solution:

function getMichaelLastName(text) {
  return text
    .match(/(?:Michael )([A-Z][a-z]+)/g)
    .map(person => person.slice(person.indexOf(' ')+1));
}

// or even, since we know the length of "Michael ":
//     .map(person => person.slice(8));

3) Check solution

console.log(JSON.stringify(    getMichaelLastName(exampleText)    ));
// ["Jordan","Johnson","Green","Wood"]
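Another way to drop the "Michael " prefix is to read the capture group directly instead of slicing the full match. The following is only a sketch: the function name and the shorter sample text are assumptions, and it relies on String.prototype.matchAll, which was added in ES2020.

```javascript
// Hypothetical variant: read the last name from capture group 1.
function getLastNamesAfterMichael(text) {
  // matchAll requires the g flag and yields one match object per hit.
  return Array.from(text.matchAll(/Michael ([A-Z][a-z]+)/g), (m) => m[1]);
}

// Shortened sample text (assumed for illustration).
const sample = "Michael Jordan met John Williams, Michael Wood and Michael.";
console.log(getLastNamesAfterMichael(sample)); // ["Jordan", "Wood"]
```

Unlike the match/map version, this returns an empty array when nothing matches, because match with the g flag returns null in that case and the subsequent map call would throw.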

Demo here: http://codepen.io/PiotrBerebecki/pen/GjwRoo

You can also try it out by running the snippet below.

const inputText = "Michael, how are you? - Cool, how is John Williamns and Michael Jordan? I don't know but Michael Johnson is fine. Michael do you still score points with LeBron James, Michael Green Miller and Michael Wood?";



function getMichaelLastName(text) {
  return text
    .match(/(?:Michael )([A-Z][a-z]+)/g)
    .map(person => person.slice(8));
}

console.log(JSON.stringify(    getMichaelLastName(inputText)    ));
Harbert answered 19/10, 2016 at 9:28 Comment(0)
