How do I include &, <, > etc in XML attribute values
I want to create an XML file that stores the structure of a Java program. I can parse the Java program and create the tags as required. The problem arises when I try to include the source code inside my tags, because Java source code can contain many characters that are reserved in XML, such as &, < and >. As a result, I cannot produce valid XML.

My XML should go like this:

<?xml version="1.0"?>
<prg name="prg_name">
  <class name="class_name">
    <parent>parent class</parent>
      <interface>Interface name</interface>
.
.
.
      <method name= "method_name">
        <statement>the ordinary java statement</statement>
        <if condition="Conditional Expression">
          <statement> true statements </statement>
        </if>
        <else>
          <statement> false statements </statement>
        </else>
        <statement> usual control statements </statement>
 .
 .
 .
      </method>
    </class>
 .
 .
 .
 </prg>

The problem is that the conditional expressions of if statements (and other statements) contain many & characters and other reserved symbols, which prevents the XML from being validated. Since all of this data (source code) is supplied by the user, I have little control over it. Escaping the characters will be very costly in terms of time.

I can use CDATA to escape element text, but it cannot be used for attribute values containing conditional expressions. I am using the ANTLR Java grammar to parse the Java program and get the attributes and content for the tags. Is there any other workaround?

Enamor answered 18/4, 2011 at 21:40 Comment(0)

You will have to escape

" to  &quot;
' to  &apos;
< to  &lt;
> to  &gt;
& to  &amp;

for XML.
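The replacements above can be applied in a single pass over the string, so the cost grows linearly with the length of the source text. The following is only a minimal sketch: the class and method names (XmlEscape, escapeXml) are hypothetical, and it assumes the input contains no characters that are illegal in XML, such as most control characters.

```java
final class XmlEscape {
    private XmlEscape() {}

    /** Replaces the five characters listed above with XML's predefined entities. */
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A Java condition as it might come out of the parser (illustrative input).
        String condition = "a < b && s != \"x\"";
        // prints: <if condition="a &lt; b &amp;&amp; s != &quot;x&quot;"/>
        System.out.println("<if condition=\"" + escapeXml(condition) + "\"/>");
    }
}
```

Because every character is inspected exactly once and the result is built in one StringBuilder, this avoids the repeated scans that chained String.replace calls would perform.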

Yakut answered 18/4, 2011 at 21:42 Comment(2)
How about a + (plus)? – Keven
@LarsVandeDonk "+" is fine as it is; it doesn't need to be escaped in XML. Maybe you were thinking of URL escaping? – Crampon

In XML attributes you must escape

" with &quot;
< with &lt;
& with &amp;

if you wrap attribute values in double quotes ("), e.g.

<MyTag attr="If a&lt;b &amp; b&lt;c then a&lt;c, it's obvious"/>

This means a tag MyTag with an attribute attr whose text is If a<b & b<c then a<c, it's obvious. Note that there is no need to use &apos; to escape the ' character.

If you wrap attribute values in single quotes (') then you should escape these characters:

' with &apos;
< with &lt;
& with &amp;

and you can write " as is. Escaping of > with &gt; in attribute text is not required, e.g. <a b=">"/> is well-formed XML.
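The two quoting rules above can be sketched as one helper that escapes <, & and only whichever quote character delimits the value. This is an illustration under stated assumptions, not a definitive implementation: AttrQuote, quoteAttr and roundTrip are hypothetical names, and the check uses the JDK's built-in DOM parser to confirm that the original text comes back unchanged.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class AttrQuote {
    private AttrQuote() {}

    /** Wraps value in the given quote, escaping only '<', '&' and that quote. */
    static String quoteAttr(String value, char quote) {
        if (quote != '"' && quote != '\'') {
            throw new IllegalArgumentException("quote must be \" or '");
        }
        StringBuilder out = new StringBuilder(value.length() + 16).append(quote);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '<') out.append("&lt;");
            else if (c == '&') out.append("&amp;");
            else if (c == quote) out.append(quote == '"' ? "&quot;" : "&apos;");
            else out.append(c); // includes '>' and the other quote character
        }
        return out.append(quote).toString();
    }

    /** Parses a one-element document and returns "attr" as the parser sees it. */
    static String roundTrip(String value, char quote) throws Exception {
        String xml = "<MyTag attr=" + quoteAttr(value, quote) + "/>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute("attr");
    }

    public static void main(String[] args) throws Exception {
        String text = "If a<b & b<c then a<c, it's \"obvious\" > done";
        System.out.println(quoteAttr(text, '"'));
        System.out.println(quoteAttr(text, '\''));
        // Both quoting styles should parse back to the original text.
        System.out.println(roundTrip(text, '"').equals(text));
        System.out.println(roundTrip(text, '\'').equals(text));
    }
}
```

Escaping all five characters regardless of quote style is also well-formed; the minimal set only keeps the output shorter and easier to read.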

Crampon answered 15/9, 2015 at 14:29 Comment(6)
Why does XML require that special characters inside the quotes be escaped in attribute values? Only " or ' would need to be escaped, and anything else inside that string could simply be considered content! – Trevor
I guess it's a precaution against badly written XML parsers and/or incorrect XML, for example if the quotes around an attribute value are omitted (<tag attr=value></tag>). – Crampon
Not an expert, but I suspect this is a historical precaution inherited from SGML, which was originally used to define HTML and other markup languages. – Gruff
Even with modern parsers, the closing tag is the problem. The starting tag doesn't give any error. – Annihilator
This is more correct than the accepted answer because it provides the minimal set of necessary escapes. – Freon
@Annihilator That's a weird parser you have there, then. According to the specification, only <, & and the delimiting quote need to be escaped: AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'" (XML 1.0, Fifth Edition). That said, this is a case in point for favouring caution. – Bobettebobina
