How to handle the illegal HTML characters in XSL

I have an XML file which stores data. I am using an XSLT stylesheet to generate HTML files from that XML file. When I run the transformation, I get the error "Illegal HTML character: decimal 150".

I am not allowed to change the XML file. I have to map that character, and many other illegal characters, to a legal character (any will do) in the XSLT. The mapping therefore has to be generic, not limited to a single character.

Hadrian asked 18/4, 2014 at 14:50

You can define a character map (available in XSLT 2.0 and later) that maps each disallowed character to an allowed one, for instance a space:

<xsl:output indent="yes" method="html" use-character-maps="m1"/>

<xsl:character-map name="m1">
  <xsl:output-character character="&#150;" string=" "/>
</xsl:character-map>
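
For context, here is a minimal sketch of where those two declarations sit in a complete stylesheet. It assumes an XSLT 2.0 processor; the `match="/"` template and the HTML it produces are placeholders rather than anything from the question, and the second `xsl:output-character` entry is only there to show that one map can hold several characters.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The serializer applies the map named "m1" while writing the result. -->
  <xsl:output indent="yes" method="html" use-character-maps="m1"/>

  <!-- One xsl:output-character entry per character to be replaced. -->
  <xsl:character-map name="m1">
    <xsl:output-character character="&#150;" string=" "/>
    <xsl:output-character character="&#151;" string=" "/>
  </xsl:character-map>

  <!-- Placeholder template (an assumption): dumps the document text into a paragraph. -->
  <xsl:template match="/">
    <html>
      <body>
        <p><xsl:value-of select="."/></p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Because the map is applied during serialization, it covers both text and attribute values in the result, however the templates produced them.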

As an alternative, use a template that replaces all illegal characters. According to http://www.w3.org/TR/xslt-xquery-serialization/#HTML_CHARDATA these are the control characters #x7F-#x9F, so using

<xsl:template match="text()">
  <xsl:value-of select="replace(., '[&#x007F;-&#x009F;]', ' ')"/>
</xsl:template>

should make sure those characters in text nodes in the input document are replaced by a space.
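
One thing to watch with the template approach: it only fires for text nodes that are reached through `xsl:apply-templates`. Text pulled in with `xsl:value-of select="..."` bypasses it, and so do attribute values. The following is a rough sketch of a stylesheet built around that template, assuming XSLT 2.0; the `item` element name is hypothetical and stands in for whatever the real input uses.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes" method="html"/>

  <!-- Hypothetical input vocabulary: "item" elements are an assumption. -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="//item"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="item">
    <!-- apply-templates (not value-of) so that the text() template below is used -->
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Replace every character in the range #x7F-#x9F with a space. -->
  <xsl:template match="text()">
    <xsl:value-of select="replace(., '[&#x007F;-&#x009F;]', ' ')"/>
  </xsl:template>

</xsl:stylesheet>
```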

As another alternative, you could consider outputting XHTML, with elements in the XHTML namespace and the output method xhtml.
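
A minimal sketch of that XHTML variant might look like the following, assuming an XSLT 2.0 processor that supports the xhtml output method. The template content and the title text are placeholders; the relevant parts are the default namespace declaration, which puts the literal result elements into the XHTML namespace, and `method="xhtml"`.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- Serialize as XHTML instead of HTML. -->
  <xsl:output method="xhtml" indent="yes"/>

  <!-- Placeholder template (an assumption): literal result elements
       inherit the XHTML default namespace declared above. -->
  <xsl:template match="/">
    <html>
      <head>
        <title>Result</title>
      </head>
      <body>
        <p><xsl:value-of select="."/></p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```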

Based on the list of characters, a full character map mapping all illegal control characters to a space is

<xsl:character-map name="no-control-characters">
   <xsl:output-character character="&#127;" string=" "/>
   <xsl:output-character character="&#128;" string=" "/>
   <xsl:output-character character="&#129;" string=" "/>
   <xsl:output-character character="&#130;" string=" "/>
   <xsl:output-character character="&#131;" string=" "/>
   <xsl:output-character character="&#132;" string=" "/>
   <xsl:output-character character="&#133;" string=" "/>
   <xsl:output-character character="&#134;" string=" "/>
   <xsl:output-character character="&#135;" string=" "/>
   <xsl:output-character character="&#136;" string=" "/>
   <xsl:output-character character="&#137;" string=" "/>
   <xsl:output-character character="&#138;" string=" "/>
   <xsl:output-character character="&#139;" string=" "/>
   <xsl:output-character character="&#140;" string=" "/>
   <xsl:output-character character="&#141;" string=" "/>
   <xsl:output-character character="&#142;" string=" "/>
   <xsl:output-character character="&#143;" string=" "/>
   <xsl:output-character character="&#144;" string=" "/>
   <xsl:output-character character="&#145;" string=" "/>
   <xsl:output-character character="&#146;" string=" "/>
   <xsl:output-character character="&#147;" string=" "/>
   <xsl:output-character character="&#148;" string=" "/>
   <xsl:output-character character="&#149;" string=" "/>
   <xsl:output-character character="&#150;" string=" "/>
   <xsl:output-character character="&#151;" string=" "/>
   <xsl:output-character character="&#152;" string=" "/>
   <xsl:output-character character="&#153;" string=" "/>
   <xsl:output-character character="&#154;" string=" "/>
   <xsl:output-character character="&#155;" string=" "/>
   <xsl:output-character character="&#156;" string=" "/>
   <xsl:output-character character="&#157;" string=" "/>
   <xsl:output-character character="&#158;" string=" "/>
   <xsl:output-character character="&#159;" string=" "/>
</xsl:character-map>

I generated that list with XSLT 2.0 and Saxon, using

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias"
  exclude-result-prefixes="xs axsl">

<xsl:param name="start" as="xs:integer" select="127"/>
<xsl:param name="end" as="xs:integer" select="159"/>

<xsl:param name="replacement" as="xs:string" select="' '"/>

<xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>

<xsl:output method="xml" indent="yes" use-character-maps="character-reference"/>

<xsl:character-map name="character-reference">
  <xsl:output-character character="«" string="&amp;"/>
</xsl:character-map>

<xsl:template name="main">
  <axsl:character-map name="no-control-characters">
    <xsl:for-each select="$start to $end">
      <axsl:output-character character="«#{.};" string="{$replacement}"/>
    </xsl:for-each>
  </axsl:character-map>
</xsl:template>

</xsl:stylesheet>
Substitutive answered 18/4, 2014 at 15:13 Comment(2)
Thanks, it solved the problem for decimal 150. But how can I map more than 1 type? Is there a generic way? - Hadrian
You can put as many xsl:output-character elements into the xsl:character-map as you need. - Substitutive

Use:

<xsl:output indent="yes" method="html" use-character-maps="illegal-characters"/>

with the following map:

<xsl:character-map name="illegal-characters">
    <xsl:output-character character="&#127;" string="&amp;#x7f;"/>
    <xsl:output-character character="&#128;" string="&amp;#x80;"/>
    <xsl:output-character character="&#129;" string="&amp;#x81;"/>
    <xsl:output-character character="&#130;" string="&amp;#x82;"/>
    <xsl:output-character character="&#131;" string="&amp;#x83;"/>
    <xsl:output-character character="&#132;" string="&amp;#x84;"/>
    <xsl:output-character character="&#133;" string="&amp;#x85;"/>
    <xsl:output-character character="&#134;" string="&amp;#x86;"/>
    <xsl:output-character character="&#135;" string="&amp;#x87;"/>
    <xsl:output-character character="&#136;" string="&amp;#x88;"/>
    <xsl:output-character character="&#137;" string="&amp;#x89;"/>
    <xsl:output-character character="&#138;" string="&amp;#x8a;"/>
    <xsl:output-character character="&#139;" string="&amp;#x8b;"/>
    <xsl:output-character character="&#140;" string="&amp;#x8c;"/>
    <xsl:output-character character="&#141;" string="&amp;#x8d;"/>
    <xsl:output-character character="&#142;" string="&amp;#x8e;"/>
    <xsl:output-character character="&#143;" string="&amp;#x8f;"/>
    <xsl:output-character character="&#144;" string="&amp;#x90;"/>
    <xsl:output-character character="&#145;" string="&amp;#x91;"/>
    <xsl:output-character character="&#146;" string="&amp;#x92;"/>
    <xsl:output-character character="&#147;" string="&amp;#x93;"/>
    <xsl:output-character character="&#148;" string="&amp;#x94;"/>
    <xsl:output-character character="&#149;" string="&amp;#x95;"/>
    <xsl:output-character character="&#150;" string="&amp;#x96;"/>
    <xsl:output-character character="&#151;" string="&amp;#x97;"/>
    <xsl:output-character character="&#152;" string="&amp;#x98;"/>
    <xsl:output-character character="&#153;" string="&amp;#x99;"/>
    <xsl:output-character character="&#154;" string="&amp;#x9a;"/>
    <xsl:output-character character="&#155;" string="&amp;#x9b;"/>
    <xsl:output-character character="&#156;" string="&amp;#x9c;"/>
    <xsl:output-character character="&#157;" string="&amp;#x9d;"/>
    <xsl:output-character character="&#158;" string="&amp;#x9e;"/>
    <xsl:output-character character="&#159;" string="&amp;#x9f;"/>
</xsl:character-map>
Lianne answered 8/6, 2022 at 15:28 Comment(0)
