Lua: split string into words unless quoted

So I have the following code to split a string on whitespace:

local text = "I am 'the text'"
-- loop variable named "word" so it does not shadow the string library
for word in text:gmatch("%S+") do
    print(word)
end

The result:

I
am
'the
text'

But I need to do this:

I
am
the text --[[yep, without the quotes]]

How can I do this?

Edit: to complement the question, the idea is to pass parameters from one program to another. Here is the pull request I am working on, currently in review: https://github.com/mpv-player/mpv/pull/1619

Spartan answered 22/2, 2015 at 22:35 Comment(0)

There may be ways to do this with clever parsing, but an alternative is to keep track of a simple state: split on whitespace, then merge the fragments that fall between an opening quote and its matching closing quote. Something like this may work:

local text = [[I "am" 'the text' and "some more text with '" and "escaped \" text"]]
-- spat/epat capture a quote at the start/end of a fragment
local spat, epat, buf, quoted = [=[^(['"])]=], [=[(['"])$]=]
for str in text:gmatch("%S+") do
  local squoted = str:match(spat)
  local equoted = str:match(epat)
  -- backslashes directly before a trailing quote; an odd count means the quote is escaped
  local escaped = str:match([=[(\*)['"]$]=])
  if squoted and not quoted and not equoted then
    buf, quoted = str, squoted  -- start of a quoted run
  elseif buf and equoted == quoted and #escaped % 2 == 0 then
    str, buf, quoted = buf .. ' ' .. str, nil, nil  -- matching closing quote: emit the run
  elseif buf then
    buf = buf .. ' ' .. str  -- still inside the quoted run
  end
  if not buf then print((str:gsub(spat,""):gsub(epat,""))) end
end
if buf then print("Missing matching quote for "..buf) end

This will print:

I
am
the text
and
some more text with '
and
escaped \" text

Updated: the code now handles mixed and escaped quotes, removes the surrounding quotes from the output, and handles single quoted words.
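If the words are needed as values rather than printed (for example, to pass them on as parameters), the same loop can be wrapped in a function that returns a table. The following is a minimal sketch of that idea; the name split_quoted and the nil-plus-message error convention are assumptions made for illustration, not part of the original answer.

```lua
-- Hypothetical helper: same fragment-merging logic as above, collected into a table.
local function split_quoted(text)
  local spat, epat = [=[^(['"])]=], [=[(['"])$]=]
  local words, buf, quoted = {}, nil, nil
  for str in text:gmatch("%S+") do
    local squoted = str:match(spat)
    local equoted = str:match(epat)
    local escaped = str:match([=[(\*)['"]$]=])
    if squoted and not quoted and not equoted then
      buf, quoted = str, squoted                      -- opening fragment
    elseif buf and equoted == quoted and #escaped % 2 == 0 then
      str, buf, quoted = buf .. ' ' .. str, nil, nil  -- closing fragment
    elseif buf then
      buf = buf .. ' ' .. str                         -- middle fragment
    end
    if not buf then
      -- strip one leading and one trailing quote, keep only gsub's first result
      words[#words + 1] = (str:gsub(spat, ""):gsub(epat, ""))
    end
  end
  if buf then return nil, "missing matching quote for " .. buf end
  return words
end

for _, w in ipairs(split_quoted("I am 'the text'")) do print(w) end
```

One consequence of splitting on whitespace first and re-joining with a single space is that runs of whitespace inside quotes are collapsed: 'a   b' comes back as "a b".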

Gaza answered 22/2, 2015 at 23:29 Comment(5)
I would prefer something using string parsing. Anyway, although I didn't say so in the post, I need something that works with both single and double quotes, since the idea of this code is to parse parameters from the shell. - Spartan
It's easy to update this solution to make it work with single and double quotes; just replace "^'" with [[^["']]] and "'$" with [[['"]$]]. You may also need to check that the opening quote matches the closing one. - Gaza
It's possible to do this with string parsing, but the solution is likely to be more complex (and not with one expression, as Lua patterns are not powerful enough to express what you need). - Gaza
@m45t3r, I updated the code to handle mixed and escaped quotes. - Gaza
Well, we did resolve the problem in another way (using mpv's internal key-value pair representation instead of passing a string), but I quite liked your answer (since it doesn't require another library and the code is cleaner than the other non-library answer), so I am marking this as the answer. - Spartan
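The "string parsing" route mentioned in these comments could look roughly like a character-by-character scanner. This is only a sketch under stated assumptions: the name parse_args is hypothetical, a backslash inside either kind of quote makes the next character literal and is itself dropped (unlike the answer above, which keeps it), and adjacent quoted and unquoted pieces are joined into one argument. It does not implement real shell quoting rules.

```lua
-- Hypothetical helper: scans the text one character at a time.
local function parse_args(text)
  -- cur holds the characters of the argument in progress, quote the open quote char
  local args, cur, quote = {}, nil, nil
  local i, n = 1, #text
  while i <= n do
    local c = text:sub(i, i)
    if quote then
      if c == "\\" and i < n then
        -- assumed escape rule: take the next character literally
        cur[#cur + 1] = text:sub(i + 1, i + 1)
        i = i + 1
      elseif c == quote then
        quote = nil                 -- closing quote
      else
        cur[#cur + 1] = c           -- whitespace inside quotes is kept as-is
      end
    elseif c == "'" or c == '"' then
      quote = c                     -- opening quote
      cur = cur or {}               -- so that '' yields an empty argument
    elseif c:match("%s") then
      if cur then
        args[#args + 1] = table.concat(cur)
        cur = nil
      end
    else
      cur = cur or {}
      cur[#cur + 1] = c
    end
    i = i + 1
  end
  if quote then return nil, "unterminated " .. quote .. " quote" end
  if cur then args[#args + 1] = table.concat(cur) end
  return args
end

for _, a in ipairs(parse_args([[I am 'the  text' "say \"hi\""]])) do print(a) end
```

Compared with merging whitespace-separated fragments, this keeps the exact spacing inside quotes, at the cost of more code.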

Try this:

text = [[I am 'the text' and '' here is "another text in quotes" and this is the end]]

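local e = 0  -- index of the last character consumed
while true do
    local b = e+1
    b = text:find("%S",b)  -- start of the next token
    if b==nil then break end
    if text:sub(b,b)=="'" then
        e = text:find("'",b+1)  -- closing single quote
        b = b+1
    elseif text:sub(b,b)=='"' then
        e = text:find('"',b+1)  -- closing double quote
        b = b+1
    else
        e = text:find("%s",b+1)  -- unquoted word ends at the next whitespace
    end
    if e==nil then e=#text+1 end  -- no terminator: take the rest of the string
    print("["..text:sub(b,e-1).."]")
end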
Upali answered 23/2, 2015 at 1:27 Comment(1)
Fixed to handle both single and double quotes, and empty quoted text. - Upali
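Since the two quote branches differ only in the quote character, they can be folded into one by searching for whichever quote opened the token. A possible sketch, with the hypothetical name split_find, that also returns the words in a table instead of printing them:

```lua
-- Hypothetical helper: the find-based loop above with a single quote branch.
local function split_find(text)
  local words, e = {}, 0
  while true do
    local b = text:find("%S", e + 1)   -- start of the next token
    if not b then break end
    local q = text:sub(b, b)
    if q == "'" or q == '"' then
      b = b + 1                        -- skip the opening quote
      e = text:find(q, b, true)        -- plain (non-pattern) search for the same quote
    else
      e = text:find("%s", b + 1)       -- unquoted word ends at the next whitespace
    end
    e = e or #text + 1                 -- no terminator: take the rest of the string
    words[#words + 1] = text:sub(b, e - 1)
  end
  return words
end

for _, w in ipairs(split_find([[I am 'the text' and '' here]])) do
  print("[" .. w .. "]")
end
```

Like the original loop, this does not treat backslashes specially, and an unterminated quote silently takes the rest of the string.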

Lua patterns aren't powerful enough to handle this task properly. Here is an LPeg solution adapted from the Lua lexer. It handles both single and double quotes.

local lpeg = require 'lpeg'

local P, S, C, Cc, Ct = lpeg.P, lpeg.S, lpeg.C, lpeg.Cc, lpeg.Ct

local function token(id, patt) return Ct(Cc(id) * C(patt)) end

local singleq = P "'" * ((1 - S "'\r\n\f\\") + (P '\\' * 1)) ^ 0 * "'"
local doubleq = P '"' * ((1 - S '"\r\n\f\\') + (P '\\' * 1)) ^ 0 * '"'

local white = token('whitespace', S('\r\n\f\t ')^1)
local word = token('word', (1 - S("' \r\n\f\t\""))^1)

local quoted = token('string', singleq + doubleq) -- named "quoted" to avoid shadowing the string library

local tokens = Ct((quoted + white + word) ^ 0)


local input = [["This is a string" 'another string' these are words]]
for _, tok in ipairs(lpeg.match(tokens, input)) do
  if tok[1] ~= "whitespace" then
    if tok[1] == "string" then
      print(tok[2]:sub(2,-2)) -- cut off quotes
    else
      print(tok[2])
    end
  end
end

Output:

This is a string
another string
these
are
words
Intumesce answered 23/2, 2015 at 13:14 Comment(0)
