How to find a duplicate string with Pattern Matching?

I have a string similar to this:

[13:41:25] [100:Devnull]: 01:41:20, 13:41:21> |Hunit:Player-3693-07420299:DevnullYour [Chimaera Shot] hit |Hunit:Creature-0-3693-1116-3-87318-0000881AC4:Dungeoneer's Training DummyDungeoneer's Training Dummy 33265 Nature. 

In case you wonder, it's from World of Warcraft.

I'd like to end up with something like this:

[13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit Dungeoneer's Training Dummy 33265 Nature. 

If you notice, "Dungeoneer's Training Dummy" is printed twice. I've managed to get rid of the first "|Hunit" portion with something like this:

str = "[13:41:25] [100:Devnull]: 01:41:20, 13:41:21> |Hunit:Player-3693-07420299:DevnullYour [Chimaera Shot] hit |Hunit:Creature-0-3693-1116-3-87318-0000881AC4:Dungeoneer's Training DummyDungeoneer's Training Dummy 33265 Nature."
str = string.gsub(str, "|Hunit:.*:.*Your", "Your")

Which returns this:

print(str)    -- => [13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit |Hunit:Creature-0-3693-1116-3-87318-0000881AC4:Dungeoneer's Training DummyDungeoneer's Training Dummy 33265 Nature.

I then add a second gsub:

str = string.gsub(str, "|Hunit:.*:", "")
print(str) -- => [13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit Dungeoneer's Training DummyDungeoneer's Training Dummy 33265 Nature.

But "Dungeoneer's Training Dummy" still appears twice, obviously.

How can I get rid of the duplicated string? In this case it is "Dungeoneer's Training Dummy", but it can be the name of any other target.

Dagnah answered 19/3, 2015 at 20:49 Comment(0)

You can try something like this:

str = "[13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit Dungeoneer's Training DummyDungeoneer's Training Dummy 33265 Nature."
-- find a string that starts with 'hit', continues with a run of non-digits
-- (the target name), and ends with one or more digits plus the rest of the line.
-- these three parts are "captured" into three strings,
-- which are then passed to the "replacement" function.
-- the value returned by the function replaces the matched text in the string.
str = str:gsub("(hit%s+)([^%d]+)(%d+.+)", function(s1, s2, s3)
    local s = s2:gsub("%s+$","") -- drop trailing spaces
    local half = math.floor(#s / 2) -- integer index of the midpoint
    if #s % 2 == 0 -- has an even number of characters
    and s:sub(1, half) -- first half
    == -- is the same
    s:sub(half + 1) -- as the second half
    then -- return the second half
      return s1..s:sub(half + 1)..' '..s3
    else -- not a duplicate: leave the matched text unchanged
      return s1..s2..s3
    end
  end)
print(str)

This prints: [13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit Dungeoneer's Training Dummy 33265 Nature.

This code attempts to extract the name of the target and checks whether it consists of the same text twice in a row. If it does not (or if the pattern does not match at all), the original string is returned unchanged.
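The same check can probably also be written with a back-reference, since Lua patterns let %1 to %9 match a substring equal to an earlier capture. The following is only a sketch under the same assumptions as the code above (the name follows "hit" and is followed by whitespace and a number); collapse_duplicate_target is a made-up helper name, not part of any WoW or Lua API.

```lua
-- Hypothetical helper: collapse a target name that is printed twice in a
-- row after "hit". Assumes the name is followed by whitespace and a number.
local function collapse_duplicate_target(line)
  -- (hit%s+)  capture 1: "hit" and the whitespace after it
  -- (.+)      capture 2: candidate target name
  -- %2        back-reference: capture 2 must repeat immediately
  -- (%s+%d+)  capture 3: whitespace and the number that follows the name
  local result = line:gsub("(hit%s+)(.+)%2(%s+%d+)", "%1%2%3")
  return result -- gsub also returns a match count, which is dropped here
end

local line = "[13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit Dungeoneer's Training DummyDungeoneer's Training Dummy 33265 Nature."
print(collapse_duplicate_target(line))
-- prints: [13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit Dungeoneer's Training Dummy 33265 Nature.
```

Like the half-comparison version, this would also shorten a name that legitimately consists of the same text twice, and a line without a duplicated name is returned unchanged.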

Pinelli answered 19/3, 2015 at 22:3 Comment(5)
That does it, although I still require the trailing "33265 Nature.". Would you mind explaining what happens in the function you used? If it's not too much trouble. - Dagnah
After 33265 Nature is removed, the function checks whether the remaining string can be split into two halves and whether those two halves are the same. I'll add more comments... - Pinelli
Updated the solution to keep 33265 Nature in it. - Pinelli
Ohhh, I get it. An even number of characters is a telltale that it is a duplicate. Clever. Thanks a lot! - Dagnah
Right; I thought [^] would be a useful thing to know if someone is not familiar with it, but %D is definitely shorter. - Pinelli
