The value '<!\[CDATA\[.*?\]\]>|[^<>&]*' of the facet 'pattern' is not a valid regular expression

I'm trying to use a regex to validate a field in my XML using XSD. I came up with a regex that does what I want, which is to disallow special characters unless the text is wrapped in CDATA tags. This is the regular expression I came up with:

<!\[CDATA\[.*?\]\]>|[^<>&]*

It works great when I test it on http://regexr.com/ to match my pattern. The problem is that when I plug it into a simpleType pattern restriction, I get an error saying it's not a valid regular expression.

The value '<!\[CDATA\[.*?\]\]>|[^<>&]*' of the facet 'pattern' is not a valid regular expression.

Here is my XSD code (note that I had to replace <, > and & in the regular expression with &lt;, &gt; and &amp; so it would be valid XML):

<xs:element name="description" minOccurs="1" maxOccurs="1">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="&lt;!\[CDATA\[.*?\]\]&gt;|[^&lt;&gt;&amp;]*"/>          
        </xs:restriction>
    </xs:simpleType>
</xs:element>
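
The escaping step mentioned above can be reproduced mechanically. This is a minimal sketch, assuming Python's standard `xml.sax.saxutils` helpers are an acceptable stand-in for escaping by hand; the variable names are only illustrative.

```python
from xml.sax.saxutils import escape, unescape

# The regex as typed into a regex tester (raw string keeps the backslashes literal).
raw_pattern = r'<!\[CDATA\[.*?\]\]>|[^<>&]*'

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped_pattern = escape(raw_pattern)
print(escaped_pattern)
# prints: &lt;!\[CDATA\[.*?\]\]&gt;|[^&lt;&gt;&amp;]*

# An XML parser undoes the escaping, so the schema processor sees the original regex.
assert unescape(escaped_pattern) == raw_pattern
```

The printed value is the same string used in the `value` attribute above, so the escaping itself is not what makes the pattern invalid.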

So I assume there's something about the way regex works in xsd patterns that I'm not getting.

Lachrymose answered 17/7, 2015 at 18:27

By process of elimination I was able to find a fix. First I determined that the problem was somewhere in the <!\[CDATA\[.*?\]\]> part of the expression. So I tore that part out of my expression and slowly added bits back until I got the same error. The character that was causing the problem was the ?. The expression still works without the ?, even if there is nothing inside the CDATA.

I'm not sure 1) why the person I got that pattern from included it, and 2) why XSD doesn't allow it in my pattern.
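
A likely explanation for both questions: the regular expression syntax defined for XSD pattern facets has no lazy quantifiers such as `*?`, and an XSD pattern always has to match the whole value. Under whole-value matching, the lazy and greedy forms accept the same set of strings, so dropping the `?` should not change what validates. The sketch below illustrates this, assuming Python's `re.fullmatch` as a rough stand-in for XSD's whole-value matching (Python's `re` is not an XSD regex engine, and the sample values are made up).

```python
import re

# The original pattern with the lazy quantifier, and the fixed one without it.
LAZY = r'<!\[CDATA\[.*?\]\]>|[^<>&]*'
GREEDY = r'<!\[CDATA\[.*\]\]>|[^<>&]*'

# Hypothetical sample values for the description field.
samples = [
    "plain text",
    "",
    "a < b",
    "<![CDATA[a < b & c]]>",
    "<![CDATA[]]>",
    "<![CDATA[x]]> trailing",
    "<![CDATA[a]]><![CDATA[b]]>",
]

def accepts(pattern, value):
    """Return True if the pattern matches the entire value."""
    return re.fullmatch(pattern, value) is not None

# Compare both patterns on every sample.
results = [(value, accepts(LAZY, value), accepts(GREEDY, value)) for value in samples]
for value, lazy_ok, greedy_ok in results:
    print(f"{value!r:35} lazy={lazy_ok} greedy={greedy_ok}")
```

For every sample the two columns agree. Laziness only changes which substring is preferred when there is a choice, which never comes up when the whole value has to match.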

Lachrymose answered 17/7, 2015 at 21:44 Comment(3)
It's the non-greedy operator, that selects the minimal, rather than the maximal, match. I don't know what regex engine your XSD parser uses. I don't think all engines have the ?. The standard Java engine does, however. -- Overwrought
Thanks for the clarification. This is why regex are so frustrating. Everything depends on your environment. In my case it works fine without the non-greedy so I'll stick with it. -- Lachrymose
.*? is invalid Regex all around actually. .* means it's optional already. (i.e. you're looking for zero or more matches, and you can't get any more optional than that!). -- You should also note that regexes are GREEDY -- so if you have more than one CDATA, this expression will begin matching at the start of the first one, and not end until the second one (because that's what .* does). -- Blum
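
The greedy versus non-greedy behaviour discussed in these comments is easy to see in an engine that supports both forms. The sketch below uses Python's `re` module, where `.*?` is valid and means "match as little as possible"; the input string is a made-up example. It applies to unanchored searching, not to XSD's whole-value matching.

```python
import re

# Hypothetical text containing two CDATA sections.
text = "<![CDATA[one]]> and <![CDATA[two]]>"

# Non-greedy: .*? stops at the first ]]> it can, so each section matches on its own.
lazy_matches = re.findall(r'<!\[CDATA\[.*?\]\]>', text)
print(lazy_matches)
# prints: ['<![CDATA[one]]>', '<![CDATA[two]]>']

# Greedy: .* runs to the last ]]>, so one match spans both sections.
greedy_matches = re.findall(r'<!\[CDATA\[.*\]\]>', text)
print(greedy_matches)
# prints: ['<![CDATA[one]]> and <![CDATA[two]]>']
```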
