Regex match Telegram username and delete whole line in PHP

I want to match a Telegram username in message text and delete the entire line. I've tried this pattern, but the problem is that it matches emails too:

.*(@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*

The pattern should match all of these lines:

Hi @username how are you?

Hi @username.how are you?

😉@username.

And it should not match an email like this:

Hi email to [email protected]

Untangle answered 7/8, 2020 at 19:35 Comment(3)
It may have more than one emoji before @ - Untangle
I was thinking of .*[^a-zA-Z]@ ..., which would be far from perfect. Then I looked up emailregex.com and thought it might be helpful: you could get your match as you have it, then use another regex to check whether the "username" is actually a username or an email. - Wakerly
Is it about emojis only, or about non-word characters in general? Can the @ occur more than once in the string? - Dextrose

Use

.*\B@(?=\w{5,32}\b)[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*.*

See proof

\B before @ means there must be a non-word character or start of string right before the @.
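
As a rough illustration of that \B rule, the PHP sketch below tests a few strings with preg_match. The helper name hasUnattachedAt and the example.com address are made up for this example and are not part of the answer.

```php
<?php
// Hypothetical helper: true when some '@' in $text sits at a \B position,
// i.e. it is at the start of the string or follows a non-word character.
function hasUnattachedAt(string $text): bool
{
    return preg_match('/\B@/', $text) === 1;
}

var_dump(hasUnattachedAt('@username'));            // true: start of string
var_dump(hasUnattachedAt('Hi @username'));         // true: space before @
var_dump(hasUnattachedAt('someone@example.com'));  // false: letter before @
```

Since '_' counts as a word character, a string such as 'user_@name' is rejected as well.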

EXPLANATION

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  \B                       the boundary between two word chars (\w)
                           or two non-word chars (\W)
--------------------------------------------------------------------------------
  @                        '@'
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \w{5,32}                 word characters (a-z, A-Z, 0-9, _)
                             (between 5 and 32 times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  [a-zA-Z0-9]+             any character of: 'a' to 'z', 'A' to 'Z',
                           '0' to '9' (1 or more times (matching the
                           most amount possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    _                        '_'
--------------------------------------------------------------------------------
    [a-zA-Z0-9]+             any character of: 'a' to 'z', 'A' to
                             'Z', '0' to '9' (1 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
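
A minimal sketch of how this pattern might be applied in PHP with preg_replace, run against the sample lines from the question. The function name blankUsernameLines, the /u flag, and the placeholder address someone@example.com are assumptions added for illustration.

```php
<?php
// Hypothetical helper: blanks every line that contains a Telegram-style @username.
// The pattern is the one from this answer; the /u flag is an added assumption
// so that emoji in the message are read as UTF-8 characters.
function blankUsernameLines(string $text): string
{
    $pattern = '/.*\B@(?=\w{5,32}\b)[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*.*/u';
    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '', $text) ?? $text;
}

$samples = [
    'Hi @username how are you?',
    'Hi @username.how are you?',
    '😉@username.',
    'Hi email to someone@example.com', // placeholder address, left untouched
];

foreach ($samples as $line) {
    var_dump(blankUsernameLines($line));
}
```

The first three samples come back as empty strings and the email line is returned unchanged. In a multi-line message the line's newline is left in place, because . does not match \n.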
Anecdotist answered 7/8, 2020 at 20:1 Comment(0)

.*[\W](@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*

I've added [\W] before the @ symbol, so the @ must be preceded by a non-word character. You can check the result here: https://regex101.com/r/yFGegO/1

Druci answered 7/8, 2020 at 19:46 Comment(0)

Nothing new under the sun, but basically the other patterns can be reduced to:

.*?\B@\w{5}.*

demo

or possibly:

.*?\B@\w{5,64}\b.*

if you want to be more precise, but is it really needed?

Note: if you want to remove the newline sequence too, add \R? at the end of the pattern.
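
The \R? variant could be used in PHP roughly as follows. This is only a sketch: the function name deleteUsernameLines, the sample message, and the address someone@example.com are made up for illustration.

```php
<?php
// Hypothetical helper: deletes each line containing an @username, including
// the newline sequence that ends it (\R? also covers \r\n and a final line
// with no newline at all).
function deleteUsernameLines(string $text): string
{
    // Reduced pattern from this answer with \R? appended.
    return preg_replace('/.*?\B@\w{5}.*\R?/', '', $text) ?? $text;
}

$message = "Hello\nHi @username how are you?\nmail someone@example.com\nBye";

echo deleteUsernameLines($message), "\n";
// Prints:
// Hello
// mail someone@example.com
// Bye
```

No m flag is needed here because the pattern has no ^ or $ anchors; it relies on . stopping at line breaks.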

Ampliate answered 7/8, 2020 at 20:36 Comment(0)
