A regex I don't understand

I'm staring at these few (slightly modified) lines from luadoc that are obviously building a filename with a full path. But I simply don't get what happens in line 5. The parameter filename could be something like "myfile.lua".

function out_file (filename)
  local h = filename
  h = string.gsub(h, "lua$", "tex")
  h = string.gsub(h, "luadoc$", "tex")
  h = options.output_dir .. string.gsub (h, "^.-([%w_]+%.tex)$", "%1")
  return h
end

What happens in line 5?

Libb answered 15/4, 2011 at 17:9 Comment(2)
Provide an example input string and we can tell you what it is doing. – Sensitize
A key to understanding Lua patterns is that they are not actually a "regex". They are similar, but both the syntax and semantics are just different enough to cause confusion. While it may sound silly, learning to call them "patterns" and not "regex" will likely help you improve your understanding by giving you a mental model that has room for thinking about the differences. After that, it is easy to remember that % is the escape character in a pattern, and `\` in regex; that there is no alternation in a pattern, and so forth. – Snead
h = options.output_dir .. string.gsub (h, "^.-([%w_]+%.tex)$", "%1")

The pattern matches any string that begins with the shortest possible run of arbitrary characters (a directory prefix, whitespace, etc.), followed by one or more alphanumeric characters or underscores (probably a filename), a period, and the string "tex", at which point the string must end. It captures the filename + ".tex" and uses that capture as the replacement. Basically it's taking a filename with possible junk at the beginning (a path or whitespace) and replacing it with the clean version before tacking the output directory onto the front of it.
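
To see this in action, here is a minimal sketch that runs only the pattern on a few sample names. The helper name strip and the sample inputs are made up for illustration and are not part of luadoc; options.output_dir is left out so the snippet runs on its own.

```lua
-- Hypothetical helper: applies only the pattern from line 5, without output_dir.
function strip(name)
  -- The extra parentheses drop gsub's second return value (the match count).
  return (string.gsub(name, "^.-([%w_]+%.tex)$", "%1"))
end

print(strip("myfile.tex"))           -- myfile.tex
print(strip("  myfile.tex"))         -- myfile.tex
print(strip("some/dir/myfile.tex"))  -- myfile.tex
print(strip("myfile.txt"))           -- myfile.txt (no match, string is returned unchanged)
```

In the original line the call sits inside a .. concatenation, which already truncates gsub's results to the first value, so the extra parentheses are only needed here because the result is returned directly.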

Now what's probably causing you confusion there is the ".-" part. On its own, . matches any character. But when modified by a trailing - it means "the shortest string of zero or more characters that still lets the rest of the pattern match", i.e. a non-greedy search. It will consume characters from the beginning of the string only until the remainder matches [%w_]+%.tex (alphanumerics or underscores, then ".tex") right up to the end of the string.
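
The difference between the non-greedy - and the greedy * can be checked side by side. This is a small sketch under the same assumptions as before: keep_capture is a hypothetical helper, and the sample path is invented.

```lua
-- Hypothetical helper: applies a pattern and keeps only the captured part.
function keep_capture(name, pattern)
  return (string.gsub(name, pattern, "%1"))
end

local name = "some/dir/myfile.tex"

-- Non-greedy ".-": the prefix stops as soon as the rest of the pattern can match.
print(keep_capture(name, "^.-([%w_]+%.tex)$"))  -- myfile.tex

-- Greedy ".*": the prefix takes as much as it can, leaving [%w_]+ one character.
print(keep_capture(name, "^.*([%w_]+%.tex)$"))  -- e.tex
```

With the greedy version the capture shrinks to the last character of the name, which is why the original code uses - here.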

Jennings answered 15/4, 2011 at 17:27 Comment(1)
You were absolutely right in your assumption about what was mainly confusing me. – Libb
