How do I define HTML entity references inside a valid XML document?
P

2

22

I need to be able to reference named HTML entities like &bull; instead of the numeric alternative &#8226; in an XML document. I have control over some parts of the XML document, such as defining the DOCTYPE, but doing a find-and-replace in the actual XML is not an option. I can get some entities like &nbsp; and &amp; by including the XHTML transitional DOCTYPE, but I need to define more manually. How do I do this?

-- EDIT --

Thanks to Jim's answer, here's what I ended up with. This is great because I can utilize the XHTML transitional entities, and also add my own:

<!DOCTYPE
   html
   PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
   [
      <!ENTITY bull  "&#8226;">
      <!ENTITY ldquo "&#8220;">
      <!ENTITY rdquo "&#8221;">
      ... etc ...
   ]
>
Parhe answered 28/6, 2011 at 15:20 Comment(1)
If you end up using a lot of entity declarations, consider putting them in a separate file and then using a parameter entity to reference them. – Luster
L
19

If you can modify the XML to include an inline DTD, you can define the entities there:

<!DOCTYPE yourRootElement [
    <!ENTITY bull "&#8226;">
    ....
]>
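As a quick way to check that a parser resolves an entity declared like this, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The root element name "doc" and the sample text are placeholders for illustration, not part of the original question.

```python
import xml.etree.ElementTree as ET

# Placeholder document: "doc" stands in for yourRootElement.
XML_TEXT = """<!DOCTYPE doc [
    <!ENTITY bull "&#8226;">
]>
<doc>&bull; first item</doc>"""

# The entity declared in the internal subset is expanded while parsing.
root = ET.fromstring(XML_TEXT)
text = root.text
print(text == "\u2022 first item")

# Without the declaration, the same reference is rejected by the parser.
try:
    ET.fromstring("<doc>&bull;</doc>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```

The second half shows why the declaration is needed at all: a reference to an undeclared named entity makes the parse fail.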
Lilongwe answered 28/6, 2011 at 15:31 Comment(1)
This is magic! It can be used to build Android manifest files as well; it makes it easy to create a template without making any mistakes on the package name or the like... – Maryannmaryanna
P
4

I'm not certain, but I think the XHTML DTDs should give you quite a few entities (253):

http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Entities_representing_special_characters_in_XHTML

Also, the W3C spec mentions additional DTD modules for special characters etc.: http://www.w3.org/TR/xhtml-modularization/dtd_module_defs.html#a_dtd_xhtml_character_entities

However, I haven't been able to find an implementation example of the special character DTDs.


Edit by DevNull

Here is an extremely generic example implementation of one of the entity DTD modules. To implement, you only need to add a parameter entity pointing to the module.

<?xml version="1.0"?>
<!DOCTYPE test [
<!ELEMENT test (#PCDATA)>
<!ENTITY % xhtml-special SYSTEM "xhtml-special.ent">
%xhtml-special;
]>
<test>Here is a left double quote: &ldquo;</test>
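Note that the parser has to be able to find and load xhtml-special.ent, and many parsers do not load external entities by default. The following is a minimal Python sketch of that resolution step using the standard library's xml.parsers.expat. The names ENTITY_FILES, parse_text and load_external are hypothetical, and the entity file is replaced by an in-memory stand-in with just two assumed declarations rather than the full W3C module.

```python
from xml.parsers import expat

# Hypothetical in-memory stand-in for the xhtml-special.ent file.
# The real module contains more declarations than these two.
ENTITY_FILES = {
    "xhtml-special.ent": '<!ENTITY ldquo "&#8220;">\n<!ENTITY rdquo "&#8221;">\n',
}

DOC = """<?xml version="1.0"?>
<!DOCTYPE test [
<!ELEMENT test (#PCDATA)>
<!ENTITY % xhtml-special SYSTEM "xhtml-special.ent">
%xhtml-special;
]>
<test>Here is a left double quote: &ldquo;</test>"""


def parse_text(document):
    """Parse the document and return its character data as one string."""
    chunks = []
    parser = expat.ParserCreate()
    # Ask expat to resolve parameter entity references such as %xhtml-special;
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def load_external(context, base, system_id, public_id):
        # Feed the referenced declarations to a sub-parser that shares the DTD.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(ENTITY_FILES[system_id], True)
        return 1  # a non-zero return value tells expat the entity was handled

    parser.ExternalEntityRefHandler = load_external
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return "".join(chunks)


result = parse_text(DOC)
print(result == "Here is a left double quote: \u201c")
```

In a real setup, load_external would read the file named by system_id from disk or from a catalog instead of a dictionary. If that file is missing, the parse fails, which is likely what the FileNotFoundException in the comments refers to.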
Provender answered 28/6, 2011 at 15:38 Comment(5)
I added an example of an implementation. I hope you don't mind. If this isn't what you meant, please feel free to delete my edit. – Luster
Cool, nice one @DevNull, I didn't know you could do that. So does "test (#PCDATA)" tell the parser that "test" is Parsed Character Data? – Provender
It means that the test element can contain "parsed character data" (plain text). – Luster
In Chrome, the DOM then looks like: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html> <head></head><body>] &gt; So it is not parsed properly, and redundant ]> characters are inserted into the body. – Moselle
FileNotFoundException: xhtml-special.ent – Wite
