I'm trying to extract the text between specific tags in various HTML files on a Mac. I'm looking for the first <H1>
heading in the body. Example:
<BODY>
<H1>Dublin</H1>
I believe using regular expressions for this is an anti-pattern, so I used xmllint with XPath instead:
xmllint --nowarning --xpath '/HTML/BODY/H1[1]'
The problem is that some of the HTML files contain badly formed tags, so I get errors along the lines of:
parser error : Opening and ending tag mismatch: UL line 261 and LI
</LI>
I can't just append
2>/dev/null
because then I lose the output for those files altogether. Is there any way to keep using an XPath expression here but tell the parser to relax when the markup isn't well-formed XML, and just give me the text of the first H1 heading?
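One possible direction, sketched under a few assumptions: xmllint's `--html` option switches from the strict XML parser to libxml2's HTML parser, which tries to recover from mismatched or unclosed tags rather than aborting. That parser lowercases element names, so the path has to say `body`/`h1` instead of `BODY`/`H1`. XPath positions also start at 1, not 0, and `(//body//h1)[1]` picks the first match in document order. The function name `first_h1` is hypothetical, and the sketch assumes an xmllint build that supports `--xpath` (libxml2 2.7.7 or later).

```shell
#!/bin/sh
# Hypothetical helper: print the text of the first <h1> inside <body>.
first_h1() {
  # --html: use libxml2's lenient HTML parser instead of the XML parser,
  #         so mismatched tags such as </UL></LI> are repaired, not fatal.
  # The HTML parser lowercases tag names, hence body/h1 rather than BODY/H1.
  # string(...) returns the text content instead of the serialized element.
  # 2>/dev/null only hides the parser's recovery messages; the document
  # tree is still built, so the heading is still printed.
  xmllint --html --xpath 'string((//body//h1)[1])' "$1" 2>/dev/null
}

# Print "filename: heading" for every HTML file passed as an argument.
for f in "$@"; do
  printf '%s: %s\n' "$f" "$(first_h1 "$f")"
done
```

Usage would be something like `sh first_h1.sh *.html`. Because the HTML parser recovers instead of failing, silencing stderr here no longer drops the badly formed files; it just hides the warnings about them.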