Splitting a string into matching and non-matching groups in JavaScript

I am trying to split a string into an array of substrings: those that match a regular expression and those that don't:

var string = "Lazy {{some_animal}} jumps over..";
// do some magic with regex /({{\s?[\w]+\s?}})/g and its negation
var array = ["Lazy ", "{{some_animal}}", " jumps over.."];

What is the most performant way to do that in JavaScript?

Tog answered 4/8, 2017 at 8:36 Comment(2)
This feels like an X/Y problem, e.g., you want to do X, think you need this array to do it, so you're asking how to create this array (Y). What's X? We may be able to provide other useful ways that don't involve this operation. - Enochenol
You can capture them into 3 groups and use those groups to create an array. - Solidus
S
4

You can use String.prototype.match for that.

The regex below simply matches anything that's not a mustache, optionally surrounded by mustaches.

Example snippet:

var str = "Lazy {{some_animal}} jumps over..";

const pattern = /\{*[^{}]+\}*/g;

var array = str.match(pattern);

console.log(str);
console.log(pattern);
console.log(array);
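
To see how loosely this first pattern matches, here is a small sketch that wraps it in a helper (`splitLoose` is a hypothetical name, not something from the answer) and runs it on the original string and on a string with single braces. It only illustrates the pattern's behaviour: single braces are treated as delimiters too.

```javascript
// Hypothetical helper; the pattern is the one shown above.
function splitLoose(str) {
  // match() returns null when nothing matches, so fall back to an empty array.
  return str.match(/\{*[^{}]+\}*/g) || [];
}

console.log(splitLoose("Lazy {{some_animal}} jumps over.."));
// -> ["Lazy ", "{{some_animal}}", " jumps over.."]

console.log(splitLoose("a {b} c"));
// -> ["a ", "{b}", " c"]  (a single pair of braces is split off as well)
```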

But to make it more precise, the regex pattern becomes a bit more complicated.
The regex below matches:

  1. "what you want"
    (a word between 2 mustaches on each side)
  2. OR "what you don't want followed by what you want"
    (using lazy matching and positive lookahead)
  3. OR "what remains"

var str = "Lazy {{some_animal}} jumps over..";

const pattern = /\{\{\w+\}\}|.+?(?=\{\{\w+\}\})|.+/g;

var array = str.match(pattern);

console.log(str);
console.log(pattern);
console.log(array);

And last but not least, the evil SM method.
Split AND Match on the same regex, then concatenate the results into a single array.
The downside of this method is that the order is not preserved: all the matches come first, followed by all the non-matching pieces.

var str = "Lazy {{some_animal}} jumps over..";

const pattern = /\{\{\w+\}\}/g;

var what_you_want = str.match(pattern);
var what_you_dont_want = str.split(pattern);

var array = what_you_want.concat(what_you_dont_want);

console.log(str);
console.log(pattern);
console.log(array);
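
If the order matters, one alternative (a different technique from the split-and-match above) is to wrap the same pattern in a capturing group and use split alone: when the separator regex contains a capturing group, split includes the captured text in the resulting array. A minimal sketch, where `splitKeepingOrder` is a hypothetical name and empty strings are filtered out on the assumption that they are not wanted:

```javascript
// Hypothetical helper: split on the placeholder pattern, keeping the placeholders.
function splitKeepingOrder(str) {
  // The capturing group makes split() return the separators as well.
  const pattern = /(\{\{\w+\}\})/;
  // A placeholder at the start or end, or two adjacent placeholders,
  // would otherwise leave empty strings in the result.
  return str.split(pattern).filter(function (part) {
    return part !== "";
  });
}

console.log(splitKeepingOrder("Lazy {{some_animal}} jumps over.."));
// -> ["Lazy ", "{{some_animal}}", " jumps over.."]
```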
Shult answered 4/8, 2017 at 9:5 Comment(3)
Well, this will fail if there are single or non-matching curlies in the string. - Endbrain
Well, that depends on how you look at it. Sure, the first method doesn't count the mustaches, so it would also split on the lonely mustaches. But that's not an issue if the input string doesn't have those. And if it does, there will just be a few more records in the resulting array. - Shult
@Drenai Wow, a comment on such an old answer. Well, the OP didn't ask to make that distinction. If that's what you need, then I suggest searching for it, or asking a new question about it. - Shult
E
4

I'm fairly sure a simple exec loop is going to be your best option:

function getSegments(rex, str) {
  var segments = [];
  var lastIndex = 0;
  var match;
  rex.lastIndex = 0; // In case there's a dangling previous search
  while (match = rex.exec(str)) {
    if (match.index > lastIndex) {
      segments.push(str.substring(lastIndex, match.index));
    }
    segments.push(match[0]);
    lastIndex = match.index + match[0].length;
  }
  if (lastIndex < str.length) {
    segments.push(str.substring(lastIndex));
  }
  return segments;
}

var rex = /{{\s?[\w]+\s?}}/g;
var string = "Lazy {{some_animal}} jumps over..";

console.log(getSegments(rex, string));

Note I removed the capture group; it's not needed for this sort of solution.
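
The same loop can also be written with String.prototype.matchAll, assuming an environment that supports it (it was standardized in ES2020, after this answer was written). The sketch below is a variant, not the answer's original code; `getSegmentsMatchAll` is a hypothetical name, and the regex must carry the `g` flag or matchAll throws a TypeError.

```javascript
// Hypothetical variant of getSegments using matchAll instead of an exec loop.
function getSegmentsMatchAll(rex, str) {
  const segments = [];
  let lastIndex = 0;
  // matchAll works on a copy of the regex, so rex.lastIndex is left untouched.
  for (const match of str.matchAll(rex)) {
    if (match.index > lastIndex) {
      // Text between the previous match and this one.
      segments.push(str.slice(lastIndex, match.index));
    }
    segments.push(match[0]);
    lastIndex = match.index + match[0].length;
  }
  if (lastIndex < str.length) {
    // Trailing text after the last match.
    segments.push(str.slice(lastIndex));
  }
  return segments;
}

console.log(getSegmentsMatchAll(/{{\s?[\w]+\s?}}/g, "Lazy {{some_animal}} jumps over.."));
// -> ["Lazy ", "{{some_animal}}", " jumps over.."]
```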

Enochenol answered 4/8, 2017 at 8:43 Comment(1)
I do like how you actually found a solution for the title of the question, which clearly took more effort than my solution, which merely focused on the question. :) - Shult
