Split a string using whitespace in JavaScript?
V

7

13

I need a tokenizer that, given a string with arbitrary whitespace between words, creates an array of words without empty substrings.

For example, given a string:

" I dont know what you mean by glory Alice said."

I use:

str2.split(" ")

This also returns empty sub-strings:

["", "I", "dont", "know", "what", "you", "mean", "by", "glory", "", "Alice", "said."]

How can I filter the empty strings out of the array?

Vagal answered 22/2, 2012 at 19:50 Comment(0)
S
18

You probably don't even need to filter, just split using this Regular Expression:

"   I dont know what you mean by glory Alice said.".split(/\b\s+/)
Silures answered 22/2, 2012 at 19:55 Comment(4)
Off-topic: what does \b mean in a regex?Smoulder
Matches a word boundary: the zero-width position between a word character and a non-word character such as a space, newline, or punctuation, or the start or end of the string (developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions). It might not be the perfect regex, but it works for that example.Silures
@Mustafa yeah, I know. But it is just a curiosity.Smoulder
I like the regex, but how do I account for ", " (comma, space)?Marten
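On the comma question: /\b\s+/ only splits where whitespace directly follows a word character, so a space that comes after a comma or a period is not a split point. Below is a minimal sketch of one alternative, assuming commas should count as separators alongside whitespace. The helper name splitOnSpacesAndCommas is made up for illustration.

```javascript
// Hypothetical helper: treat any run of whitespace and/or commas as one separator.
function splitOnSpacesAndCommas(text) {
  // filter(Boolean) drops the empty strings produced by leading or trailing separators
  return text.split(/[\s,]+/).filter(Boolean);
}

console.log("Hello, world".split(/\b\s+/));             // ["Hello, world"]
console.log(splitOnSpacesAndCommas(" Hello, world "));  // ["Hello", "world"]
```

If other punctuation should also separate tokens, it would have to be added to the character class.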
P
11
 str.match(/\S+/g) 

returns a list of non-space sequences ["I", "dont", "know", "what", "you", "mean", "by", "glory", "Alice", "said."] (note that this includes the dot in "said.")

 str.match(/\w+/g) 

returns a list of all words: ["I", "dont", "know", "what", "you", "mean", "by", "glory", "Alice", "said"]

docs on match()
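One thing to watch for: with the g flag, match() returns null rather than an empty array when nothing matches, for example on an empty or all-whitespace string. A small sketch of a guard, where the function name tokenize is only an illustration:

```javascript
// Hypothetical wrapper around match(): always returns an array.
function tokenize(text) {
  // With the g flag, match() returns null (not []) when nothing matches
  return text.match(/\S+/g) || [];
}

console.log(tokenize("  glory   Alice said. ")); // ["glory", "Alice", "said."]
console.log(tokenize("   "));                    // []
```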

Piers answered 22/2, 2012 at 20:36 Comment(1)
Good answer. For others' reference, /\S+/ matches groups of characters that are not whitespace, whereas /\w+/ matches groups of characters that are alphanumeric or underscore. That's why the period (.) character matches in one but not the other.Iinde
B
7

You should trim the string before using split.

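var str = " I dont know what you mean by glory Alice said.";
// Strip leading and trailing whitespace first
var trimmed = str.replace(/^\s+|\s+$/g, '');
// Split the trimmed string, not the original str
var words = trimmed.split(" ");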
Bountiful answered 22/2, 2012 at 19:58 Comment(0)
L
2

I recommend .match:

str.match(/\b\w+\b/g);

This matches runs of word characters between word boundaries, so whitespace is never matched and does not end up in the resulting array.

Lumbricalis answered 22/2, 2012 at 19:58 Comment(2)
This works even better. Given str2 = "Humpty Dumpty smiled contemptuously Of course you dont—till I tell you I meant theres a nice knock-down argument for you! ", using str3 = str2.match(/\b\w+\b/g) results in ["Humpty", "Dumpty", "smiled", "contemptuously", "Of", "course", "you", "dont", "till", "I", "tell", "you", "I", "meant", "theres", "a", "nice", "knock", "down", "argument", "for", "you"]. So \w+ also splits on "—" and "-".Vagal
@dokondr: What do you count as word characters? If it's everything except spaces, you may want to just use [^ ] instead of \w.Lumbricalis
C
0

See the Array filter method:

http://www.hunlock.com/blogs/Mastering_Javascript_Arrays#quickIDX13
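As a rough sketch of how filter applies to the question, assuming the input has a leading space and a double space before "Alice" (which is what the output shown in the question implies):

```javascript
// Assumed input: the question's sentence, with a leading space and a double space
var str2 = " I dont know what you mean by glory  Alice said.";

// Keep only non-empty pieces; split(" ") produces "" wherever spaces are adjacent
var words = str2.split(" ").filter(function (piece) {
  return piece.length > 0;
});

console.log(words);
// ["I", "dont", "know", "what", "you", "mean", "by", "glory", "Alice", "said."]
```

This only removes the empty strings. Tabs and newlines are still not treated as separators by split(" ").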

Contagious answered 22/2, 2012 at 19:55 Comment(0)
A
0

I think the empty substrings appear because there are multiple consecutive spaces. You can call replace() in a for loop to collapse each run of spaces into a single space, then use split() to split the program text on a single space, like this:

// Get the full program text from the div
var program = document.getElementById("ans").textContent;
// Collapse runs of spaces: each pass replaces the first double space found
var res = program;
for (var i = 0; i < program.length; i++) {
  res = res.replace("  ", " ");
}
// Split into words using a single space as the separator
var result = res.split(" ");
Aylward answered 8/3, 2015 at 8:6 Comment(0)
C
0

That is all that we need:

str.trim().split(' ')
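Note that trim() only removes whitespace at the two ends, so a run of spaces inside the string still produces empty strings with split(' '). A minimal sketch of a variant that splits on runs of whitespace instead (the helper name splitWords is hypothetical):

```javascript
// Hypothetical helper name; uses only String.prototype.trim and split.
function splitWords(text) {
  var trimmed = text.trim();
  // "".split(/\s+/) yields [""], so handle blank input explicitly
  return trimmed === "" ? [] : trimmed.split(/\s+/);
}

console.log(" a  b ".trim().split(" ")); // ["a", "", "b"]
console.log(splitWords(" a  b "));       // ["a", "b"]
```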
Corelation answered 11/1, 2022 at 19:22 Comment(0)
