Whitespace stripping with XslCompiledTransform

I'm trying to migrate a large app from XslTransform to compiled xsl files and XslCompiledTransform.

The app uses the XSL to create HTML files, and the transformation data (XML) was passed to the XSL as an XmlDataDocument returned from the database.

I've changed all that, so now I do (at least temporarily):

C#

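 public string ProcessCompiledXsl(XmlDataDocument xml)
 {
     StringBuilder stringControl = new StringBuilder();
     XslCompiledTransform xslTran = new XslCompiledTransform();

     // dllName holds the type name of the compiled stylesheet (defined elsewhere)
     xslTran.Load(
         System.Reflection.Assembly.Load("CompiledXsl").GetType(dllName)
     );

     // Dispose the writer so any buffered output is flushed into the StringBuilder
     using (XmlWriter writer = XmlWriter.Create(stringControl, othersettings))
     {
         xslTran.Transform(xml, this.Arguments, writer, null);
     }

     return stringControl.ToString();
 }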

XSL (just an example)

...
  <xsl:output method="html" indent="yes"/>
  <xsl:template match="/">
       <xsl:for-each select="//Object/Table">
              <a href="#">
                     some text
              </a>
       </xsl:for-each>
  </xsl:template>

Problem

That works, but the transformation strips the whitespace between the tags, outputting:

<a href="#">
   some text
</a><a href="#">
   some text
</a><a href="#">
   some text
</a><a...etc

I've tried:

  • Using xml:space="preserve" but I couldn't get it to work
  • Overriding the OutputSettings, but I didn't get any good results (maybe I missed something)
  • Using xsl:output method="xml", which works but creates self-closing tags and a lot of other problems

So I don't know what to do. Maybe I'm not doing something right. Any help is really appreciated.

Thanks!

EDIT

For future reference: if you want to tackle this problem while leaving every XSL file intact, you could try this C# class I wrote, named CustomHtmlWriter.

Basically, I extended XmlTextWriter and overrode the methods that write the start and the end of every tag.

In this particular case, you would use it like this:

    StringBuilder sb = new StringBuilder();
    CustomHtmlWriter writer = new CustomHtmlWriter(sb);

    xslTran.Transform(nodeReader, this.Arguments, writer);

    return sb.ToString();

Hope it helps someone.

Brigitta answered 31/8, 2012 at 15:40 Comment(0)

I. Solution 1:

Let me first analyze the problem here:

Given this source XML document (invented, as you haven't provided any):

<Object>
 <Table>

 </Table>

 <Table>

 </Table>

 <Table>

 </Table>

 <Table>

 </Table>
</Object>

This transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="html" indent="yes"/>

  <xsl:template match="/">
       <xsl:for-each select="//Object/Table">
              <a href="#">
                     some text
              </a>
       </xsl:for-each>
  </xsl:template>
<!--
 <xsl:template match="Table">
   <a href="#">
    Table here
   </a>
 </xsl:template>
 -->
</xsl:stylesheet>

exactly reproduces the problem -- the result is:

<a href="#">
                     some text
              </a><a href="#">
                     some text
              </a><a href="#">
                     some text
              </a><a href="#">
                     some text
              </a>

Now, just uncomment the commented template and comment out the first template:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="html" indent="yes"/>
<!--
  <xsl:template match="/">
       <xsl:for-each select="//Object/Table">
              <a href="#">
                     some text
              </a>
       </xsl:for-each>
  </xsl:template>
 -->
 <xsl:template match="Table">
   <a href="#">
    Table here
   </a>
 </xsl:template>
</xsl:stylesheet>

The result has the wanted indentation:

 <a href="#">
    Table here
   </a>

 <a href="#">
    Table here
   </a>

 <a href="#">
    Table here
   </a>

 <a href="#">
    Table here
   </a>

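That was solution 1. The line breaks come from the source document rather than from the stylesheet: with no template matching / or Object, the built-in templates walk the tree and copy its text nodes, so the whitespace between the Table elements is written to the output.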


II. Solution 2:

This solution keeps the required modifications to your existing XSLT code to a minimum.

It is a two-pass transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="ext">
 <xsl:output method="html"/>

  <xsl:template match="/">
    <xsl:variable name="vrtfPass1">
       <xsl:for-each select="//Object/Table">
              <a href="#">
                     some text
              </a>
       </xsl:for-each>
    </xsl:variable>

    <xsl:apply-templates select=
        "ext:node-set($vrtfPass1)" mode="pass2"/>
  </xsl:template>

 <xsl:template match="node()|@*" mode="pass2">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*" mode="pass2"/>
  </xsl:copy>
 </xsl:template>

  <xsl:template mode="pass2" match="*[preceding-sibling::node()[1][self::*]]">
   <xsl:text>&#xA;</xsl:text>
   <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>

The idea is that we don't touch the existing code at all. Instead, we capture its output and, with only a few lines of additional code, reformat that output to get the wanted final appearance.

When this transformation is applied to the same XML document, the wanted result is produced:

<a href="#">
                     some text
              </a>
<a href="#">
                     some text
              </a>
<a href="#">
                     some text
              </a>
<a href="#">
                     some text
              </a>

Finally, here is a demonstration of how this minor change can be introduced without touching any existing XSLT code at all:

Let's have this existing code in c:\temp\delete\existing.xsl:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="html"/>

  <xsl:template match="/">
    <xsl:for-each select="//Object/Table">
      <a href="#">
        some text
      </a>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

If we run this we get the problematic output.

Now, instead of running existing.xsl, we run this transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="ext">
 <xsl:import href="file:///c:/temp/delete/existing.xsl"/>
 <xsl:output method="html"/>


  <xsl:template match="/">
    <xsl:variable name="vrtfPass1">
       <xsl:apply-imports/>
    </xsl:variable>

    <xsl:apply-templates select=
        "ext:node-set($vrtfPass1)" mode="pass2"/>
  </xsl:template>

 <xsl:template match="node()|@*" mode="pass2">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*" mode="pass2"/>
  </xsl:copy>
 </xsl:template>

  <xsl:template mode="pass2" match="*[preceding-sibling::node()[1][self::*]]">
   <xsl:text>&#xA;</xsl:text>
   <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>

The result is the wanted one, and the existing code is left completely untouched:

<a href="#">
        some text
      </a>
<a href="#">
        some text
      </a>
<a href="#">
        some text
      </a>
<a href="#">
        some text
      </a>

Explanation:

  1. We import any existing code that is at the top level of the import-precedence hierarchy (not imported by other stylesheets), using xsl:import.

  2. We capture the output of the existing transformation in a variable. The variable holds the infamous RTF (Result Tree Fragment) type, which needs to be converted to a regular tree before it can be processed further.

  3. The key moment is performing xsl:apply-imports when capturing the output of the transformation. This ensures that any template from the existing code (even one that we override, such as the template matching /) is selected for execution exactly as it would be when the existing transformation is performed by itself.

  4. We convert the RTF into a regular tree using the msxsl:node-set() extension function, bound to the ext prefix in the code above (XslCompiledTransform also supports the EXSLT node-set() extension function).

  5. We perform our cosmetic adjustments on the resulting regular tree.

Do Note:

This represents a general algorithm for post-processing existing transformations without touching the existing code.
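
As point 4 mentions, the same wrapper can be written against the EXSLT node-set() function instead of the Microsoft one. The following is only a sketch of that variant: the exsl prefix name is an arbitrary choice, the http://exslt.org/common namespace is the standard EXSLT one, and the import path is the one used in the demonstration above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">
  <!-- existing.xsl is the untouched stylesheet from the demonstration above -->
  <xsl:import href="file:///c:/temp/delete/existing.xsl"/>
  <xsl:output method="html"/>

  <xsl:template match="/">
    <!-- Pass 1: run the imported templates and capture their output -->
    <xsl:variable name="vrtfPass1">
      <xsl:apply-imports/>
    </xsl:variable>

    <!-- Pass 2: convert the RTF to a regular tree and post-process it -->
    <xsl:apply-templates select="exsl:node-set($vrtfPass1)" mode="pass2"/>
  </xsl:template>

  <!-- Copy every node and attribute unchanged by default -->
  <xsl:template match="node()|@*" mode="pass2">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" mode="pass2"/>
    </xsl:copy>
  </xsl:template>

  <!-- Emit a line feed before any element that directly follows another element -->
  <xsl:template mode="pass2" match="*[preceding-sibling::node()[1][self::*]]">
    <xsl:text>&#xA;</xsl:text>
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

Only the namespace declaration and the function prefix differ from the version above, which may be convenient if the stylesheets also have to run on processors other than the Microsoft ones.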

Ladner answered 2/9, 2012 at 22:12 Comment(3)
Ok, perfect. So the "problem" was expected behavior... May I ask if you know why they implemented it this way? XslTransform leaves the XSL untouched. And, if you can, do you know a good source for learning XSL in general? I can google, but it's always better if it's been vetted :). Thanks! - Brigitta
@Nicosunshine, I don't know the reason why the XSLT processor doesn't produce good indentation. My guess is that this reflects the requirements of the HTML serialization (or the developers wanted to save some memory, especially when sending the transformation results over the wire/network). As for good XSLT learning sources, Michael Kay's books are the best. They seem big, but just read them thoroughly and you'll get a very good grasp of the subject. See this answer for more resources: #340430 - Ladner
Thank you very much, I'll try to implement your solution (and promote better coding here). If I can't, I've extended XmlTextWriter to meet some of my needs; I'll edit the answer with a git for future reference. - Brigitta

I don't remember the details of XML/XSLT whitespace preservation off the top of my head, but whitespace is most likely to be discarded between elements where there is no non-whitespace text (i.e. whitespace-only text nodes, like the one between </a> and </xsl:for-each>). You can prevent this by using the <xsl:text> element.

For example, after

          <a href="#">
                 some text
          </a>

put

          <xsl:text>&#10;</xsl:text>

I.e. a literal line end character.
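
Put in context, a minimal sketch of the question's template with the explicit line feed might look like this. The //Object/Table path and the link markup are taken from the question; the rest of the real stylesheets is assumed to stay as it is.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/">
    <xsl:for-each select="//Object/Table">
      <a href="#">
        some text
      </a>
      <!-- Whitespace-only text in the stylesheet is stripped, but the
           content of xsl:text is kept, so this writes one line feed
           after every </a>. -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The whitespace inside the a element survives either way, because it belongs to a text node that also contains "some text".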

Does that meet your requirements?

Uncouple answered 31/8, 2012 at 16:3 Comment(5)
That "worked", but the problem is that I have a lot of XSL files, and if this is the only solution I'll have to go over each one of them and add the explicit newline. If possible, I want to avoid that ^^ - Brigitta
Hm, I don't know if that's exactly what I need, but in any case, adding that to the XSL results in this exception: "White space cannot be stripped from input documents that have already been loaded. Provide the input document as an XmlReader instead." I'll try to google it to see what happens (thanks, by the way). - Brigitta
xsl:preserve-space affects the handling of whitespace in the source document, not whitespace in the stylesheet. This advice is completely wrong. - Iover
@MichaelKay: OK, I misread the spec as saying that for stylesheets, the set of whitespace-preserving element names consists of just xsl:text, modified by what's in <xsl:preserve-space>. As you point out, it doesn't say the latter part. - Uncouple
@Nicosunshine: I removed the update; it looks like you're back to my original answer, using <xsl:text>. - Uncouple

I think the problem is:

  <xsl:output method="html" indent="yes"/> 

If I remember correctly, the html output method only cares about whitespace that affects how the HTML will be displayed.

If you try:

  <xsl:output method="xml" indent="yes"/> 

Then it should create the indented whitespace you expect.

Ebullition answered 31/8, 2012 at 16:37 Comment(2)
This works, but it creates other problems in my HTML, like self-closing tags (it's in the question). - Brigitta
The trouble with using indent="yes" is that it doesn't let you control where whitespace is output and where it isn't. If you want a visible space between two hyperlinks, you need that control. - Iover

Whitespace text nodes in the stylesheet are always ignored, unless they are contained in xsl:text. If you want to output whitespace to the result tree, use xsl:text.

(It's also possible to use xml:space="preserve" in the stylesheet, but it's generally not advisable as it has unwanted side-effects.)
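
To illustrate the xml:space="preserve" option and its side effects, here is a sketch based on the template from the question. Placing the attribute on xsl:for-each is an assumption made for this example, and it has not been verified against XslCompiledTransform.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/">
    <!-- xml:space="preserve" keeps the whitespace-only text nodes inside
         this xsl:for-each, so the line break and indentation before <a>
         and after </a> are copied to the output. -->
    <xsl:for-each select="//Object/Table" xml:space="preserve">
      <a href="#">
        some text
      </a>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The side effects follow from the same rule: every whitespace-only text node in scope is written out, indentation included, and in XSLT 1.0 a preserved whitespace node in a place where only specific instructions are allowed (for example directly inside xsl:choose) can be rejected as an error. Keeping the attribute on the smallest possible element limits both problems.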

Iover answered 31/8, 2012 at 18:36 Comment(4)
Ok, there's just one thing I don't understand. Why does it render the newlines correctly when there's no XSL present, but strip them when I use xsl:for-each? Example: gist.github.com/3557306. Thanks. - Brigitta
@Nicosunshine: It's not clear what you mean. By "when there's no XSL present", do you mean when you're not running an XSLT transformation? In that case, what kind of processing is happening, if any? If no processing is happening, then of course the text won't be changed. - Uncouple
Sorry for not being clear. I meant when, inside the XSL, I use non-XSL structures, like a div. An example of what I mean is in the gist in my first comment, in the structure that has the div with the class "parent". - Brigitta
Sorry, I don't know what example you are referring to. Remember that it's "whitespace text nodes" in the stylesheet that are ignored, not "whitespace". Newlines in the stylesheet are significant if they are part of a text node that is not all-whitespace. - Iover
