Word Boundary Regular Expression Unless Inside HTML Tag

I have a regular expression using word boundaries that works exceedingly well...

~\b('.$value.')\b~i

...save for the fact that it also matches text inside HTML tags (e.g. title="This is blue!"). That's a problem because I run a text substitution on everything the regex matches, then show tooltips using those title attributes. So, as you can imagine, it substitutes text inside the title and breaks the HTML of the tooltip. For example, what should be:

<span class="blue" title="This is blue!">Aqua</span>

...ends up becoming...

<span class="blue" title="This is <span class=" blue"="">Royal Blue</span>"&gt;Aqua</span>

My use of strip_tags didn't solve the issue. I think what I need is a better regular expression that simply will not match content ending in blue"> ('blue' in this case being a placeholder for any other color in the array I'm comparing it against).

Can anyone append what I need to the regular expression? Or do you have a better solution?

Permissible answered 17/6, 2013 at 6:13 Comment(4)
A better solution would be to use a DOM parser instead of regex to parse and alter HTML text. - Annihilation
Have you looked at DOMDocument, for example? I'd suggest reading a few examples and trying to work it out. - Perrins
As @Annihilation suggested, you can look here: developer.mozilla.org/en-US/docs/Web/API/DOMParser - Triserial
Anyway, what result do you want after the substitution? - Rudbeckia
B
1

Regex replacements often seem like the solution, but they can have a lot of nasty side effects and may not really accomplish what you want. Look into DOMDocument instead (as some commenters have suggested).
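For what it's worth, a minimal sketch of the DOM approach might look like the following. The sample markup, the `$value` variable (borrowed from the question) and the `<b>` wrapper are assumptions for illustration only. The idea is to visit text nodes only, so attribute values such as title are never rewritten.

```php
<?php
// Hypothetical sample input; $value stands in for one color from the asker's array.
$html  = '<p><span class="blue" title="This is blue!">Aqua</span> and blue sky</p>';
$value = 'blue';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$xpath   = new DOMXPath($doc);
$pattern = '~\b(' . preg_quote($value, '~') . ')\b~i';

// Copy the node list to an array first, because the loop modifies the tree.
foreach (iterator_to_array($xpath->query('//text()')) as $textNode) {
    // With DELIM_CAPTURE, odd indexes hold the matched words, even indexes the text between them.
    $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
    if (count($parts) < 2) {
        continue; // no match in this text node
    }

    $fragment = $doc->createDocumentFragment();
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            // Wrap the matched word; <b> is just a stand-in for the real replacement markup.
            $wrapper = $doc->createElement('b');
            $wrapper->appendChild($doc->createTextNode($part));
            $fragment->appendChild($wrapper);
        } elseif ($part !== '') {
            $fragment->appendChild($doc->createTextNode($part));
        }
    }
    $textNode->parentNode->replaceChild($fragment, $textNode);
}

$out = $doc->saveHTML();
echo $out;
```

In this sketch the "blue" inside the title attribute is left alone, while the "blue" in the running text gets wrapped.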

But if you insist on using regex, here's a good post on SO. It uses two passes to accomplish what you want.

Brod answered 28/10, 2013 at 20:57 Comment(0)
I
3

Davey, resurrecting this question because, apart from the DOM solution, there is a better regex solution than the one mentioned so far. It's a simple solution that requires a single step.

The general solution is

<[^>]*>(*SKIP)(*F)|blue

Here's a demo

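Any content within <> tags is simply skipped: the left side of the alternation matches a complete tag, then (*SKIP)(*F) deliberately fails the match and tells the engine not to retry from anywhere inside that tag. Content in between tags, such as blue, is matched by the right side, which sounds like it fits your needs.

In the expression, replace "blue" with whatever you like.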
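As a rough sketch of how this could be combined with the word-boundary pattern from the question in PHP's preg_replace: the sample markup and the `<b>` wrapper used as the replacement are assumptions for illustration, and `$value` is the variable from the question.

```php
<?php
// Hypothetical sample input; $value stands in for one color from the asker's array.
$html  = '<span class="blue" title="This is blue!">Aqua</span> and blue sky';
$value = 'blue';

// Left alternative: consume a whole tag, then (*SKIP)(*F) throws it away
// and prevents the engine from retrying inside it.
// Right alternative: the original word-boundary match from the question.
$pattern = '~<[^>]*>(*SKIP)(*F)|\b(' . preg_quote($value, '~') . ')\b~i';

// <b>$1</b> is only a placeholder for the real replacement markup.
$result = preg_replace($pattern, '<b>$1</b>', $html);

echo $result, PHP_EOL;
// prints: <span class="blue" title="This is blue!">Aqua</span> and <b>blue</b> sky
```

One caveat with this sketch: `<[^>]*>` assumes no literal `>` appears inside an attribute value. If your markup can contain that, the DOM route is the safer one.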

Reference

  1. How to match pattern except in situations s1, s2, s3
  2. How to match a pattern unless...
Internuncial answered 12/5, 2014 at 2:9 Comment(3)
This is an absolutely brilliant resolution to the issue using purely RegEx. I had no idea *SKIP existed. - Permissible
Yes, it's a terrific feature, available only in Perl and PCRE (PHP, R, Delphi, N++...) - Internuncial
I'd love to give the answer to you for this, even after all this time, but it seems like DOM parsing is still best practice. But I love RegEx sooooo much! - Permissible
