Split a string using string.gmatch() in Lua
There are some discussions here, and utility functions, for splitting strings, but I need an ad-hoc one-liner for a very simple task.

I have the following string:

local s = "one;two;;four"

And I want to split it on ";". Eventually, I want to get { "one", "two", "", "four" } in return.

So I tried to do:

local s = "one;two;;four"

local words = {}
for w in s:gmatch("([^;]*)") do table.insert(words, w) end

But the result (the words table) is { "one", "", "two", "", "", "four", "" }. That's certainly not what I want.

Now, as I remarked, there are some discussions here on splitting strings, but they have "lengthy" functions in them and I need something succinct. I need this code for a program where I show the merit of Lua, and if I add a lengthy function to do something so trivial it would go against me.

Crosslink answered 11/11, 2013 at 13:45 Comment(1)
[^;]* is perfectly happy matching zero characters, so Lua produces an empty match each time it gets to a delimiter. You can use "[^;]+" instead for a slightly better result, but there are reasons the lua-users.org/wiki/SplitJoin page of the lua-users wiki runs as long as it does when talking about splitting strings. – Gist
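To see the empty match described in the comment above without depending on how a particular Lua version iterates in gmatch, string.find can be asked to start exactly at a delimiter. This is only a small sketch; the variable names are made up for illustration.

```lua
local s = "one;two;;four"

-- Starting at position 1, the greedy [^;]* consumes "one".
local word_start, word_stop = s:find("[^;]*", 1)
print(word_start, word_stop)   --> 1   3

-- Starting at position 4 (the first ';'), [^;]* still succeeds,
-- but with zero characters: the end index is one less than the start.
local empty_start, empty_stop = s:find("[^;]*", 4)
print(empty_start, empty_stop) --> 4   3

-- With + an empty match is not allowed, so the search moves on to "two".
local plus_start, plus_stop = s:find("[^;]+", 4)
print(plus_start, plus_stop)   --> 5   7
```

An end index smaller than the start index is how string.find reports a match of length zero.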
T
24
local s = "one;two;;four"
local words = {}
for w in (s .. ";"):gmatch("([^;]*);") do 
    table.insert(words, w) 
end

Appending one extra ; turns the string into "one;two;;four;". Every field, including the empty one, is now followed by a ;, so the pattern "([^;]*);" captures each of them: a greedy run of zero or more non-; characters, followed by a ;.

Test:

for n, w in ipairs(words) do
    print(n .. ": " .. w)
end

Output:

1: one
2: two
3:
4: four
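If the same trick is needed in more than one place, it could be wrapped in a small helper. The following is only a sketch: the name split_on is hypothetical, and it assumes the separator is a single character. The separator is escaped with % so that characters such as "." or "%" are taken literally inside the pattern.

```lua
-- Hypothetical helper (not part of the Lua standard library).
-- Assumes sep is a single character.
local function split_on(s, sep)
  -- Prefix a non-alphanumeric separator with % so it is matched literally.
  local esc = sep:gsub("%W", "%%%0")
  local fields = {}
  -- Append one separator so that every field, even an empty one,
  -- is followed by a separator and can be captured.
  for field in (s .. sep):gmatch("([^" .. esc .. "]*)" .. esc) do
    fields[#fields + 1] = field
  end
  return fields
end

local words = split_on("one;two;;four", ";")
print(#words)                           --> 4
print(words[3] == "")                   --> true

local parts = split_on("a.b..c", ".")
print(table.concat(parts, "|"))         --> a|b||c
```

Because each match must consume the trailing separator, no match is ever empty, so the loop does not rely on how gmatch treats empty matches.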
Ternion answered 11/11, 2013 at 13:57 Comment(6)
Wow, thanks. Your solution works perfectly! (I won't close this question yet: if somebody could explain to me why my original code returns spurious empty strings I'd be grateful.) – Crosslink
@NiccoloM. Remember that * matches zero or more characters. The empty string at each position I marked with $ in one$;two$;$;four$ is also a match. – Ternion
But what about one$;$two$;$;$fo$ur$? Why is the zero match only before ; ? Why isn't it also after the ;, and between every two letters? – Crosslink
@NiccoloM. Because * is greedy: it matches as much as possible. The non-greedy quantifier for zero or more is -. – Ternion
After thinking very long about this, I now understand. I see that regexps in Ruby (and probably in other languages as well) behave in exactly the same way. Thanks. – Crosslink
It's worth noting that Lua patterns are not actually regular expressions; you will notice many differences between 'standard' regexp implementations and how Lua patterns operate. – Bal
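The greedy versus non-greedy point from the comments can be checked in a couple of lines. This is a minimal illustration with made-up variable names, anchored at the start of the string so both patterns are tried at the same position.

```lua
local s = "one;two;;four"

-- Greedy: * takes as many non-';' characters as it can.
local greedy_field = s:match("^[^;]*")
print(greedy_field)        --> one

-- Non-greedy: - takes as few as it can, which here is zero characters.
local lazy_field = s:match("^[^;]-")
print(#lazy_field)         --> 0
```

This is why the greedy pattern does not produce empty matches between letters: at any position where a non-';' character is available, it consumes the whole run.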
S
0

Just changing * to + works.

local s = "one;two;;four"
local words = {}
for w in s:gmatch("([^;]+)") do 
    table.insert(words, w) 
    print(w)
end

The magic character * matches 0 or more occurrences, so when gmatch reaches a ';', Lua reports an empty string there, because no [^;] character is present.

Sorry for my carelessness: words[3] should be an empty string. However, when I run the original code in the Lua 5.4 interpreter, everything works.

code here

running result here (I have to put links because of lack of reputation)

Seaden answered 26/11, 2020 at 13:53 Comment(2)
This does not give the OP's desired output; they want an empty string at index 3: { "one", "two", "", "four" }. – Ecdysiast
@Ecdysiast Sorry, I didn't read the question carefully. But when I use the Lua 5.4 interpreter, the original code suddenly works!? – Seaden
G
-2
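-- Split str on the single character sep. Note: "+" needs at least one
-- character per match, so empty fields (as in ";;") are skipped.
local function split(str, sep)
    local array = {}
    local reg = string.format("([^%s]+)", sep)
    for mem in string.gmatch(str, reg) do
        table.insert(array, mem)
    end
    return array
end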
local s = "one;two;;four"
local array = split(s,";")

for n, w in ipairs(array) do
    print(n .. ": " .. w)
end

Result:

1: one
2: two
3: four
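For comparison, here is a sketch of a variant that keeps the empty field between ";;". It is not the function above: it swaps the character-class pattern for string.find in plain mode, so the separator is searched for literally and may be longer than one character. The name split_plain is hypothetical, and a non-empty separator is assumed.

```lua
-- Hypothetical alternative that preserves empty fields.
local function split_plain(str, sep)
  assert(sep ~= "", "separator must not be empty")
  local fields, start = {}, 1
  while true do
    -- Fourth argument true: sep is matched as plain text, not as a pattern.
    local first, last = string.find(str, sep, start, true)
    if not first then
      -- No separator left: the rest of the string is the final field.
      fields[#fields + 1] = string.sub(str, start)
      break
    end
    fields[#fields + 1] = string.sub(str, start, first - 1)
    start = last + 1
  end
  return fields
end

local array = split_plain("one;two;;four", ";")
for n, w in ipairs(array) do
  print(n .. ": " .. w)
end
print(#array)   --> 4
```

It is longer than the gmatch one-liner, which is the trade-off the question is trying to avoid, but it does not depend on pattern quantifiers at all.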

Godwit answered 21/11, 2014 at 2:6 Comment(3)
Your answer should contain an explanation of your code and a description of how it solves the problem. – Inflow
This doesn't work as the OP expected, because the empty string between ;; isn't captured. – Ternion
I see. Now I understand what you mean. – Godwit
