Lua string.gsub with Multiple Patterns
C

4

8

I am working on renaming movie titles that contain unwanted words. string.gsub can replace a match with the empty string "", but I have around 200 patterns that need to be replaced with "".

Right now I have to call string.gsub once for every pattern. Is there a way to put all the patterns into a single string.gsub call? I have searched the web for a solution but have not found anything.

A movie title looks like this: B.A.Pass 2013 Hindi 720p DvDRip CROPPED AAC x264 RickyKT. I want to remove the extra words such as 2013, Hindi, 720p, DvDRip, CROPPED, AAC, x264 and RickyKT.

Cusped answered 13/8, 2014 at 7:25 Comment(1)
Try this regex: regex101.com/r/rR0eX0/2, see if it is worth a penny. – Distributor
M
11

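You can pass a table to string.gsub as the third argument. Each match is looked up as a key in the table; if the key exists, the match is replaced with its value, and if the lookup returns nil the match is left unchanged: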

local movie = "B.A.Pass 2013 Hindi 720p DvDRip CROPPED AAC x264 RickyKT"
movie = movie:gsub("%S+", {["2013"] = "", ["Hindi"] = "", ["720p"] = "", 
                       ["DvDRip"] = "", ["CROPPED"] = "", ["AAC"] = "", 
                       ["x264"] = "", ["RickyKT"] = ""})

print(movie)
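
One side effect of the table approach is that the spaces between the removed words stay in the result. A minimal sketch of a cleanup step, assuming a hypothetical helper named strip_tokens (not part of the answer above), could look like this:

```lua
-- Lookup table: every key is a whitespace-delimited token to delete.
local unwanted = {["2013"] = "", ["Hindi"] = "", ["720p"] = "",
                  ["DvDRip"] = "", ["CROPPED"] = "", ["AAC"] = "",
                  ["x264"] = "", ["RickyKT"] = ""}

-- Hypothetical helper: delete unwanted tokens, then tidy the whitespace.
local function strip_tokens(title)
    -- Tokens missing from the table look up as nil and are kept as-is.
    local stripped = title:gsub("%S+", unwanted)
    -- Collapse runs of whitespace, then trim both ends.
    return (stripped:gsub("%s+", " "):match("^%s*(.-)%s*$"))
end

print(strip_tokens("B.A.Pass 2013 Hindi 720p DvDRip CROPPED AAC x264 RickyKT"))
```

The extra parentheses around the return expression discard the additional values that gsub and match may return, so the helper returns just the cleaned string.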
Myrtia answered 13/8, 2014 at 7:31 Comment(0)
I
1

Put all of the patterns in a table and then enumerate the table, calling string.gsub() for each pattern:

str = "B.A.Pass 2013 Hindi 720p DvDRip CROPPED AAC x264 RickyKT"

patterns = {"pattern1", "pattern2", "pattern3"}
for i,v in ipairs(patterns) do
    str = string.gsub(str, v, "")
end

This still calls string.gsub() once per pattern, but the code should be much more maintainable than writing out a separate string.gsub() call for each pattern by hand.
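
Two things can go wrong with the loop when the entries are plain words: characters such as "." or "-" are interpreted as pattern magic, and a word can match inside a longer token. The following is only a sketch of one way to guard against both, using hypothetical helpers escape_pattern and remove_words and the frontier pattern %f:

```lua
-- Hypothetical helper: escape every punctuation character so the word
-- is matched literally instead of being read as a Lua pattern.
local function escape_pattern(s)
    return (s:gsub("%p", "%%%0"))
end

-- Hypothetical helper: remove each word only where it appears as a
-- complete whitespace-delimited token.
local function remove_words(str, words)
    -- Pad with spaces so the frontier checks also hold at both string ends.
    str = " " .. str .. " "
    for _, word in ipairs(words) do
        -- %f[%S]: previous char is whitespace, next char is not.
        -- %f[%s]: previous char is not whitespace, next char is.
        local pattern = "%f[%S]" .. escape_pattern(word) .. "%f[%s]"
        str = str:gsub(pattern, "")
    end
    -- Collapse runs of whitespace, then trim both ends.
    return (str:gsub("%s+", " "):match("^%s*(.-)%s*$"))
end

local words = {"2013", "Hindi", "720p", "DvDRip", "CROPPED", "AAC", "x264", "RickyKT"}
print(remove_words("B.A.Pass 2013 Hindi 720p DvDRip CROPPED AAC x264 RickyKT", words))
```

With this sketch a token such as 722013p is left alone, because "2013" only matches when it is surrounded by whitespace.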

Itu answered 13/8, 2014 at 7:28 Comment(1)
One possible problem: a substring like "722013p" will be matched by "2013", leaving "720p", which is then matched by "720p". That is not what's expected. – Myrtia
G
1

To avoid writing a key and a value in the table for every new entry, I'd write a function that handles a numerically indexed table (with the patterns as the values).

This way I don't need to write {["pattern_n"] = ""} for every new pattern.

Example:

PATTERNS = {"2013", "Hindi", "720p", "DvDRip", "CROPPED", "AAC", "x264", "RickyKT"}
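-- Return "" to delete a token that is listed in PATTERNS.
-- Returning nil tells gsub to keep the token unchanged.
function replace(match)
    for _, v in ipairs(PATTERNS) do
        -- Compare literally; v:find(match) would treat the token as a pattern.
        if v == match then
            return ""
        end
    end
    return nil
end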


local movie = "B.A.Pass 2013 Hindi 720p DvDRip CROPPED AAC x264 RickyKT"
movie = movie:gsub("%S+", replace)

print(movie)
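
With around 200 entries, scanning the list for every token can be avoided by converting the list into a lookup table once and handing that table to gsub. This is only a sketch; to_removal_table is a hypothetical helper name, not something from the answer above:

```lua
-- Hypothetical helper: turn a list of words into a table whose keys are
-- the words and whose values are all "", the shape gsub expects.
local function to_removal_table(words)
    local t = {}
    for _, word in ipairs(words) do
        t[word] = ""
    end
    return t
end

local unwanted = to_removal_table({"2013", "Hindi", "720p", "DvDRip",
                                   "CROPPED", "AAC", "x264", "RickyKT"})

local movie = "B.A.Pass 2013 Hindi 720p DvDRip CROPPED AAC x264 RickyKT"
-- Keep only the first return value (the string); gsub also returns a count.
local cleaned = movie:gsub("%S+", unwanted)

print(cleaned)
```

The list stays easy to edit, and each token costs a single table lookup. The spaces between the removed words remain in the result, so a trim may still be wanted afterwards.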
Gennie answered 4/12, 2017 at 2:41 Comment(0)
U
1

You could wrap it in a simple function so that you do not have to repeat the code for each string, or call string.gsub directly with the pattern and the replacement you need.

Function:

local large_name = "B.A.Pass 2013 Hindi 720p DvDRip CROPPED AAC x264 RickyKT"

function clean_name(str)
  local v = string.gsub(str, "(.-)%s([%(%[']?%d%d%d?%d?[%)%]]?)%s*(.*)", "%1")
  return v
end

print(clean_name(large_name))

Calling string.gsub directly:

local large_name = "B.A.Pass 2013 Hindi 720p DvDRip CROPPED AAC x264 RickyKT"
local clean_name = string.gsub(large_name, "(.-)%s([%(%[']?%d%d%d?%d?[%)%]]?)%s*(.*)", "%1")

print(clean_name)

The pattern captures everything before the year as the first capture (the name of the movie), and the replacement "%1" keeps only that capture. The year, optionally wrapped in parentheses or brackets or preceded by an apostrophe, is the second capture and marks where the title ends. This way it is not necessary to list every word that can appear in a movie name, and many false positives are avoided.
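
Because the year is already a capture, the same pattern can also return it. The following sketch uses string.match with the pattern from this answer; split_title is a hypothetical helper name, and falling back to the original name when no year is found is an assumption about the desired behaviour:

```lua
-- Hypothetical helper: split a release name into title and year using
-- the pattern from the answer above.
local function split_title(name)
    local title, year = name:match("(.-)%s([%(%[']?%d%d%d?%d?[%)%]]?)%s*(.*)")
    if not title then
        -- Assumption: with no year-like token, return the name untouched.
        return name, nil
    end
    -- Drop brackets, parentheses or a leading apostrophe from the year.
    return title, (year:gsub("%D", ""))
end

print(split_title("Anon (2018) [WEBRip] [1080p] [YTS.AM]"))
print(split_title("The Shawshank Redemption '94"))
```

One limitation to keep in mind: the first number of two to four digits that follows a space ends the title, so a name like Blade Runner 2049 2017 would be cut at 2049.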

Here is a test loop that tries the pattern on different movie names:

local testing = {"Whiplash 2014 [1080p]",
"Anon (2018) [WEBRip] [1080p] [YTS.AM]",
"Maze Runner The Death Cure 2018 [WEBRip] [1080p] [YTS.AM]",
"12 Strong [2018] [WEBRip] [1080p] [YTS.AM]",
"Kingsman The Secret Service (2014) [1080p]",
"The Equalizer [2014] [1080p]",
"Annihilation 2018 [WEBRip] [1080p] [YTS.AM]",
"The Shawshank Redemption '94",
"Assassin's Creed 2016 HC 720p HDRip 850 MB - iExTV",
"Captain Marvel (2019) [WEBRip] [1080p] [YTS.AM]",}

-- ipairs walks the list in order, so the output lines up with the input.
for _, v in ipairs(testing) do
  local result = string.gsub(v, "(.-)%s([%(%[']?%d%d%d?%d?[%)%]]?)%s*(.*)", "%1")
  print(result)
end

Output:

Whiplash
Anon
Maze Runner The Death Cure
12 Strong
Kingsman The Secret Service
The Equalizer
Annihilation
The Shawshank Redemption
Assassin's Creed
Captain Marvel
Untrimmed answered 3/6, 2019 at 14:45 Comment(0)
