XML to CSV Using XSLT
Asked Answered
C

4

82

I have the following XML document:

<projects>
  <project>
   <name>Shockwave</name> 
   <language>Ruby</language> 
   <owner>Brian May</owner> 
   <state>New</state> 
   <startDate>31/10/2008 0:00:00</startDate> 
  </project>
  <project>
   <name>Other</name> 
   <language>Erlang</language> 
   <owner>Takashi Miike</owner> 
   <state> Canceled </state> 
   <startDate>07/11/2008 0:00:00</startDate> 
  </project>
...

And I'd like to get this from the transformation (XSLT) result:

Shockwave,Ruby,Brian May,New,31/10/2008 0:00:00
Other,Erlang,Takashi Miike,Cancelled,07/11/2008 0:00:00

Does anyone know the XSLT to achieve this? I'm using .net in case that matters.

Crenelation answered 13/12, 2008 at 15:12 Comment(4)
.NET only matters if you are using the XslTransform Class that only supports xslt 1.0. Is this a constraint? If so, it should be re-tagged to xslt-1.0.Unknowable
A good answer is also given here to a similar question if you use Linux askubuntu.com/questions/174143/…Coverdale
There is the tool xml2csv. Maybe, this is useful in your case, too?Rickey
xml2csv worked for me just fine. I found that the documentation is broken, but took a moment to document the work-around which produced usable results for me, here: github.com/fordfrog/xml2csv/issues/5#issuecomment-726542532Plucky
D
47

Found an XML transform stylesheet here (wayback machine link, site itself is in german)

The stylesheet added here could be helpful:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="iso-8859-1"/>

<xsl:strip-space elements="*" />

<xsl:template match="/*/child::*">
<xsl:for-each select="child::*">
<xsl:if test="position() != last()">"<xsl:value-of select="normalize-space(.)"/>",    </xsl:if>
<xsl:if test="position()  = last()">"<xsl:value-of select="normalize-space(.)"/>"<xsl:text>&#xD;</xsl:text>
</xsl:if>
</xsl:for-each>
</xsl:template>

</xsl:stylesheet>

Perhaps you want to remove the quotes inside the xsl:if tags so it doesn't put your values into quotes, depending on where you want to use the CSV file.

Donnetta answered 13/12, 2008 at 15:41 Comment(8)
Be careful, if there is a comma in the original data, it is not escaped. You may want to add a test with contains() and an escape with translate().Alannaalano
I don't think this handles a double quote in the data. To escape a double-quote you must substitute it with two double-quotes.Bein
Normally, one only needs to enclose a value in quotes if it contains any of the following: the delimiter (','), the quote ('"'), a newline (&#xD;). If quoting is required, any inner quotes must first be doubled ('""').Giarla
See the edit in this question if you need to select attributes as well.Guanaco
Correct new line on unix is &#10; (\n). &#xD; is hexadecimal \rGnotobiotics
It works but not exactly as required. It adds new line after every column. Also is there a way to add header (in the first row)? I translate as below to me. Shockwave, Ruby, Brian May, New, 31/10/2008 0:00:00 Other, Erlang, Takashi Miike, Canceled, 07/11/2008 0:00:00Papaverine
The Link is deadGerita
@BotMaster3000: thanks, replaced with a wayback machine linkDonnetta
D
58

Here is a version with configurable parameters that you can set programmatically:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="utf-8" />

  <xsl:param name="delim" select="','" />
  <xsl:param name="quote" select="'&quot;'" />
  <xsl:param name="break" select="'&#xA;'" />

  <xsl:template match="/">
    <xsl:apply-templates select="projects/project" />
  </xsl:template>

  <xsl:template match="project">
    <xsl:apply-templates />
    <xsl:if test="following-sibling::*">
      <xsl:value-of select="$break" />
    </xsl:if>
  </xsl:template>

  <xsl:template match="*">
    <!-- remove normalize-space() if you want keep white-space at it is --> 
    <xsl:value-of select="concat($quote, normalize-space(), $quote)" />
    <xsl:if test="following-sibling::*">
      <xsl:value-of select="$delim" />
    </xsl:if>
  </xsl:template>

  <xsl:template match="text()" />
</xsl:stylesheet>
Deppy answered 13/12, 2008 at 16:13 Comment(3)
I like the mandatory quoting. At least when importing into, Excel, it takes care of the case where there is a $delim in the original data.Alannaalano
What we need to do if we want to also include the column names??Jollanta
@omer There are a couple of ways to do this, depending on your XML. It's best if you ask a new question, because the comment section is not a good place to discuss things like this, and because in this thread it was not part of the question, so I won't edit the answer.Deppy
D
47

Found an XML transform stylesheet here (wayback machine link, site itself is in german)

The stylesheet added here could be helpful:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="iso-8859-1"/>

<xsl:strip-space elements="*" />

<xsl:template match="/*/child::*">
<xsl:for-each select="child::*">
<xsl:if test="position() != last()">"<xsl:value-of select="normalize-space(.)"/>",    </xsl:if>
<xsl:if test="position()  = last()">"<xsl:value-of select="normalize-space(.)"/>"<xsl:text>&#xD;</xsl:text>
</xsl:if>
</xsl:for-each>
</xsl:template>

</xsl:stylesheet>

Perhaps you want to remove the quotes inside the xsl:if tags so it doesn't put your values into quotes, depending on where you want to use the CSV file.

Donnetta answered 13/12, 2008 at 15:41 Comment(8)
Be careful, if there is a comma in the original data, it is not escaped. You may want to add a test with contains() and an escape with translate().Alannaalano
I don't think this handles a double quote in the data. To escape a double-quote you must substitute it with two double-quotes.Bein
Normally, one only needs to enclose a value in quotes if it contains any of the following: the delimiter (','), the quote ('"'), a newline (&#xD;). If quoting is required, any inner quotes must first be doubled ('""').Giarla
See the edit in this question if you need to select attributes as well.Guanaco
Correct new line on unix is &#10; (\n). &#xD; is hexadecimal \rGnotobiotics
It works but not exactly as required. It adds new line after every column. Also is there a way to add header (in the first row)? I translate as below to me. Shockwave, Ruby, Brian May, New, 31/10/2008 0:00:00 Other, Erlang, Takashi Miike, Canceled, 07/11/2008 0:00:00Papaverine
The Link is deadGerita
@BotMaster3000: thanks, replaced with a wayback machine linkDonnetta
F
22

This xsl:stylesheet can use a specified list of column headers and will ensure that the rows will be ordered correctly. It requires XSLT version 2.0.

<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:csv="csv:csv">
    <xsl:output method="text" encoding="utf-8"/>
    <xsl:strip-space elements="*"/>

    <xsl:variable name="delimiter" select="','"/>

    <csv:columns>
        <column>name</column>
        <column>sublease</column>
        <column>addressBookID</column>
        <column>boundAmount</column>
        <column>rentalAmount</column>
        <column>rentalPeriod</column>
        <column>rentalBillingCycle</column>
        <column>tenureIncome</column>
        <column>tenureBalance</column>
        <column>totalIncome</column>
        <column>balance</column>
        <column>available</column>
    </csv:columns>

    <xsl:template match="/property-manager/properties">
        <!-- Output the CSV header -->
        <xsl:for-each select="document('')/*/csv:columns/*">
                <xsl:value-of select="."/>
                <xsl:if test="position() != last()">
                    <xsl:value-of select="$delimiter"/>
                </xsl:if>
        </xsl:for-each>
        <xsl:text>&#xa;</xsl:text>
    
        <!-- Output rows for each matched property -->
        <xsl:apply-templates select="property"/>
    </xsl:template>

    <xsl:template match="property">
        <xsl:variable name="property" select="."/>
    
        <!-- Loop through the columns in order  -->
        <xsl:for-each select="document('')/*/csv:columns/*">
            <!-- Extract the column name and value  -->
            <xsl:variable name="column" select="."/>
            <xsl:variable name="value" select="$property/*[name() = $column]"/>
        
            <!-- Quote the value if required -->
            <xsl:choose>
                <xsl:when test="contains($value, '&quot;')">
                    <xsl:variable name="x" select="replace($value, '&quot;',  '&quot;&quot;')"/>
                    <xsl:value-of select="concat('&quot;', $x, '&quot;')"/>
                </xsl:when>
                <xsl:when test="contains($value, $delimiter)">
                    <xsl:value-of select="concat('&quot;', $value, '&quot;')"/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:value-of select="$value"/>
                </xsl:otherwise>
            </xsl:choose>
        
            <!-- Add the delimiter unless we are the last expression -->
            <xsl:if test="position() != last()">
                <xsl:value-of select="$delimiter"/>
            </xsl:if>
        </xsl:for-each>
    
        <!-- Add a newline at the end of the record -->
        <xsl:text>&#xa;</xsl:text>
    </xsl:template>

</xsl:stylesheet>
Floccus answered 22/2, 2012 at 11:40 Comment(7)
This is nice, but it would not work. replace() is an XPath 2.0 function. In XSLT 1.0 you'd have to use a recursive string replace template.Deppy
Worked for me with xsltproc/libxslt - it was good enough. Thanks for pointing out the requirements though.Floccus
@hd1, I'm still using this script in production so probably you are doing something wrong?Floccus
@hd1, perhaps if you told me exactly what is going wrong I'd be able to help?Floccus
@hd1, perhaps you can explain what your problem was and how you worked around it so other developers may benefit from your knowledge :)Floccus
Didn't use xsl at all, and decided to use SAXLaurence
If using Microsoft xml engine you can try msxsl script <msxsl:script language="C#" implements-prefix="scr"> <![CDATA[ public string Replace(string stringToModify, string pattern, string replacement) { return stringToModify.Replace(pattern, replacement); } ]]> </msxsl:script> Then use <xsl:variable name="x" select="scr:Replace($value, '&quot;', '&quot;&quot;')"/> and add these attributes to stylesheet node <xsl:output method="text" encoding="utf-8" /> <xsl:strip-space elements="*"Dare
M
11

This CsvEscape function is XSLT 1.0 and escapes column values ,, ", and newlines like RFC 4180 or Excel. It makes use of the fact that you can recursively call XSLT templates:

  • The template EscapeQuotes replaces all double quotes with 2 double quotes, recursively from the start of the string.
  • The template CsvEscape checks if the text contains a comma or double quote, and if so surrounds the whole string with a pair of double quotes and calls EscapeQuotes for the string.

Example usage: xsltproc xmltocsv.xslt file.xml > file.csv

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template name="EscapeQuotes">
    <xsl:param name="value"/>
    <xsl:choose>
      <xsl:when test="contains($value,'&quot;')">
    <xsl:value-of select="substring-before($value,'&quot;')"/>
    <xsl:text>&quot;&quot;</xsl:text>
    <xsl:call-template name="EscapeQuotes">
      <xsl:with-param name="value" select="substring-after($value,'&quot;')"/>
    </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
    <xsl:value-of select="$value"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template name="CsvEscape">
    <xsl:param name="value"/>
    <xsl:choose>
    <xsl:when test="contains($value,',')">
      <xsl:text>&quot;</xsl:text>
      <xsl:call-template name="EscapeQuotes">
    <xsl:with-param name="value" select="$value"/>
      </xsl:call-template>
      <xsl:text>&quot;</xsl:text>
    </xsl:when>
    <xsl:when test="contains($value,'&#xA;')">
      <xsl:text>&quot;</xsl:text>
      <xsl:call-template name="EscapeQuotes">
    <xsl:with-param name="value" select="$value"/>
      </xsl:call-template>
      <xsl:text>&quot;</xsl:text>
    </xsl:when>
    <xsl:when test="contains($value,'&quot;')">
      <xsl:text>&quot;</xsl:text>
      <xsl:call-template name="EscapeQuotes">
    <xsl:with-param name="value" select="$value"/>
      </xsl:call-template>
      <xsl:text>&quot;</xsl:text>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$value"/>
    </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  
  <xsl:template match="/">
    <xsl:text>project,name,language,owner,state,startDate</xsl:text>
    <xsl:text>&#xA;</xsl:text>
    <xsl:for-each select="projects/project">
      <xsl:call-template name="CsvEscape"><xsl:with-param name="value" select="normalize-space(name)"/></xsl:call-template>
      <xsl:text>,</xsl:text>
      <xsl:call-template name="CsvEscape"><xsl:with-param name="value" select="normalize-space(language)"/></xsl:call-template>
      <xsl:text>,</xsl:text>
      <xsl:call-template name="CsvEscape"><xsl:with-param name="value" select="normalize-space(owner)"/></xsl:call-template>
      <xsl:text>,</xsl:text>
      <xsl:call-template name="CsvEscape"><xsl:with-param name="value" select="normalize-space(state)"/></xsl:call-template>
      <xsl:text>,</xsl:text>
      <xsl:call-template name="CsvEscape"><xsl:with-param name="value" select="normalize-space(startDate)"/></xsl:call-template>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
Multifold answered 15/5, 2020 at 13:30 Comment(1)
This is the only solution that I have found to duplicate quotes with XSLT 1.0. Thanks!Lapidate

© 2022 - 2024 — McMap. All rights reserved.