I'm attempting to non-greedily parse out TD tags. I'm starting with something like this:
<TD>stuff<TD align="right">More stuff<TD align="right>Other stuff<TD>things<TD>more things
I'm using the below as my regex:
Regex.Split(tempS, @"\<TD[.\s]*?\>");
The records return as below:
""
"stuff<TD align="right">More stuff<TD align="right>Other stuff"
"things"
"more things"
Why is it not splitting that first full result (the one starting with "stuff")? How can I adjust the regex to split on all instances of the TD tag with or without parameters?
Inside a character class ([.]), . just means a literal dot, not 'any character'. You may have more success with [^>]*, but it would break on a > in an attribute (which is one of the reasons why we often look at parsers rather than regexes to manipulate HTML and XML). – Gerhardine

[...] /s) to make the dot match all. However, [^>]*> is functionally equivalent to (.|\s)*?> and probably easier on the regex. – Gerhardine
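
To make the suggestion in the comments concrete, here is a minimal sketch that runs the original pattern and the [^>]* variant side by side on the sample string from the question. The class and method names (TdSplitDemo, SplitOriginal, SplitFixed) are made up for illustration, and the input is copied as posted, including the unbalanced quote in the third tag. As noted above, this still breaks if an attribute value contains a >, so treat it as a quick fix rather than a replacement for an HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TdSplitDemo
{
    // Original pattern: [.\s] only matches a literal dot or whitespace,
    // so a tag with attributes such as <TD align="right"> never matches.
    public static string[] SplitOriginal(string input)
    {
        return Regex.Split(input, @"\<TD[.\s]*?\>");
    }

    // Suggested pattern: [^>]* consumes everything up to the first '>',
    // so it matches <TD> with or without attributes.
    public static string[] SplitFixed(string input)
    {
        return Regex.Split(input, @"<TD[^>]*>");
    }

    public static void Main()
    {
        // Sample input from the question, unchanged.
        string tempS = "<TD>stuff<TD align=\"right\">More stuff<TD align=\"right>Other stuff<TD>things<TD>more things";

        Console.WriteLine("Original pattern:");
        foreach (string part in SplitOriginal(tempS))
        {
            Console.WriteLine("\"" + part + "\"");
        }

        Console.WriteLine("Fixed pattern:");
        foreach (string part in SplitFixed(tempS))
        {
            Console.WriteLine("\"" + part + "\"");
        }
    }
}
```

With the fixed pattern the split should give an empty first element (the string starts with a delimiter) followed by one element per cell. If the leading empty string is unwanted, it can be filtered out afterwards.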