Case insensitive XML parser in c#

Everything you do with XML is case sensitive, I know that.

However, I now find myself in a situation where the software I'm writing would produce far fewer errors if I could somehow make XML element and attribute name recognition case insensitive. Case-insensitive XPath would be a godsend.

Is there an easy way or library to do that in C#?

Nugent answered 17/2, 2012 at 20:14 Comment(2)
Not likely. But you could do XElement.Parse(xmlText.ToLower()) - Hindsight
An XML document can have two different elements named MyName and myName respectively, which are intended to be different. Converting them to, or treating them as, the same name is an error that can have gross consequences. - Girder

An XML document can have two different elements named MyName and myName respectively, which are intended to be different. Converting them to, or treating them as, the same name is an error that can have gross consequences.

If that does not apply to your documents, here is a more precise solution: use XSLT to transform the document into one that has only lowercase element names and lowercase attribute names:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:variable name="vUpper" select=
 "'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

 <xsl:variable name="vLower" select=
 "'abcdefghijklmnopqrstuvwxyz'"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="*[name()=local-name()]" priority="2">
  <xsl:element name="{translate(name(), $vUpper, $vLower)}"
   namespace="{namespace-uri()}">
       <xsl:apply-templates select="node()|@*"/>
  </xsl:element>
 </xsl:template>

 <xsl:template match="*" priority="1">
  <xsl:element name=
   "{substring-before(name(), ':')}:{translate(local-name(), $vUpper, $vLower)}"
   namespace="{namespace-uri()}">
       <xsl:apply-templates select="node()|@*"/>
  </xsl:element>
 </xsl:template>

 <xsl:template match="@*[name()=local-name()]" priority="2">
  <xsl:attribute name="{translate(name(), $vUpper, $vLower)}"
   namespace="{namespace-uri()}">
       <xsl:value-of select="."/>
  </xsl:attribute>
 </xsl:template>

 <xsl:template match="@*" priority="1">
  <xsl:attribute name=
   "{substring-before(name(), ':')}:{translate(local-name(), $vUpper, $vLower)}"
   namespace="{namespace-uri()}">
     <xsl:value-of select="."/>
  </xsl:attribute>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to an XML document, for example this one:

<authors xmlns:user="myNamespace">
  <?ttt This is a PI ?>
  <Author xmlns:user2="myNamespace2">
    <Name idd="VH">Victor Hugo</Name>
    <user2:Name idd="VH">Victor Hugo</user2:Name>
    <Nationality xmlns:user3="myNamespace3">French</Nationality>
  </Author>
  <!-- This is a very long comment the purpose is
       to test the default stylesheet for long comments-->
  <Author Period="classical">
    <Name>Sophocles</Name>
    <Nationality>Greek</Nationality>
  </Author>
  <author>
    <Name>Leo Tolstoy</Name>
    <Nationality>Russian</Nationality>
  </author>
  <Author>
    <Name>Alexander Pushkin</Name>
    <Nationality>Russian</Nationality>
  </Author>
  <Author Period="classical">
    <Name>Plato</Name>
    <Nationality>Greek</Nationality>
  </Author>
</authors>

the wanted, correct result (element and attribute names converted to lowercase) is produced:

<authors><?ttt This is a PI ?>
   <author>
      <name idd="VH">Victor Hugo</name>
      <user2:name xmlns:user2="myNamespace2" idd="VH">Victor Hugo</user2:name>
      <nationality>French</nationality>
   </author><!-- This is a very long comment the purpose is
       to test the default stylesheet for long comments-->
   <author period="classical">
      <name>Sophocles</name>
      <nationality>Greek</nationality>
   </author>
   <author>
      <name>Leo Tolstoy</name>
      <nationality>Russian</nationality>
   </author>
   <author>
      <name>Alexander Pushkin</name>
      <nationality>Russian</nationality>
   </author>
   <author period="classical">
      <name>Plato</name>
      <nationality>Greek</nationality>
   </author>
</authors>

Once the document has been converted to this form, you can run whatever processing you need against the lowercase names.
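
Since the question asks about C#, here is a minimal sketch of how such a stylesheet could be applied with the .NET XslCompiledTransform class. The class name LowercaseNamesTransform and the method Apply are hypothetical names introduced for illustration, and the sketch assumes both the stylesheet and the input document are available as strings.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper, not part of the original answer.
public static class LowercaseNamesTransform
{
    // Applies an XSLT 1.0 stylesheet (given as text) to an XML string
    // and returns the transformed document as a string.
    public static string Apply(string stylesheetText, string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (var stylesheetReader = XmlReader.Create(new StringReader(stylesheetText)))
        {
            xslt.Load(stylesheetReader);
        }

        var output = new StringWriter();
        using (var inputReader = XmlReader.Create(new StringReader(inputXml)))
        // OutputSettings carries the stylesheet's <xsl:output> options.
        using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(inputReader, writer);
        }
        return output.ToString();
    }
}
```

The returned string can then be loaded with XDocument.Parse or XmlDocument.LoadXml and queried with all-lowercase XPath expressions.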

Girder answered 17/2, 2012 at 21:28 Comment(6)
Is there any C++ code to convert XML attributes and nodes to uppercase or lowercase letters? - Michamichael
@DavidAlex: You will need to call the functions of an XSLT processor that can be called from C++. You need to determine which one is best for you: MSXML/MSXSL, Saxon/C, or another product. Then read the documentation of the chosen product and understand the code examples. - Girder
@DimitreNovatchev, I want to use MSXML/MSXSL. Do you have any sample code that reads XML and runs the XSL conversion to convert attributes and nodes to uppercase or lowercase letters? I am quite new to XSL conversion and need help. Thanks - Michamichael
@DavidAlex: you can use Microsoft's MSXML6 SDK, which can be downloaded from here: microsoft.com/en-us/download/details.aspx?id=3988 . This should contain extensive documentation of the major types/classes and examples of how to call their methods from different programming languages. There is an older one for MSXML4, which can be downloaded here: microsoft.com/en-us/download/details.aspx?id=19662 . Either of these should fit your needs and requirements. - Girder
@DimitreNovatchev, the XSLT you provided causes all attributes on the root element to be lost. This is shown even in your example above. What would I change in the XSLT so that they are not lost? - Dazzle
@tig, No attribute is "lost". What you see missing are some namespace declarations that are not used in the descendants of the element, and there is nothing bad about this :) But if you must have these namespaces retained in the result of the transformation, you can add this: <xsl:copy-of select="namespace::*"/> before any <xsl:apply-templates select="node()|@*"/> - Girder

You can create case-insensitive methods (extensions for usability), e.g.:

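using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XDocumentExtensions
{
    // Returns the child elements whose namespace matches exactly and whose
    // local name matches the requested name, ignoring case.
    public static IEnumerable<XElement> ElementsCaseInsensitive(this XContainer source,
        XName name)
    {
        return source.Elements()
            .Where(e => e.Name.Namespace == name.Namespace
                && e.Name.LocalName.Equals(name.LocalName, StringComparison.OrdinalIgnoreCase));
    }
}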
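
The question also mentions attribute names, so the same pattern could be extended to attributes. The following is only a sketch: the class XElementAttributeExtensions and the method AttributeCaseInsensitive are hypothetical names, not part of LINQ to XML.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical companion to ElementsCaseInsensitive, for attributes.
public static class XElementAttributeExtensions
{
    // Returns the first attribute whose namespace matches exactly and whose
    // local name matches ignoring case, or null if there is none.
    public static XAttribute AttributeCaseInsensitive(this XElement source, XName name)
    {
        return source.Attributes()
            .FirstOrDefault(a => a.Name.Namespace == name.Namespace
                && a.Name.LocalName.Equals(name.LocalName, StringComparison.OrdinalIgnoreCase));
    }
}
```

For example, XElement.Parse("<Author Period=\"classical\"/>").AttributeCaseInsensitive("period")?.Value evaluates to "classical". If a document contains two attributes that differ only in case, this sketch silently returns the first one.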
Winonawinonah answered 17/2, 2012 at 20:44 Comment(0)

XML is text. Just ToLower it before loading to whatever parser you are using.

So long as you don't have to validate against a schema and don't mind the values being all lower case, this should work just fine.


The fact is that any XML parser will be case sensitive. If it were not, it wouldn't be an XML parser.

Slippy answered 17/2, 2012 at 20:15 Comment(6)
But you probably don't want to ToLower your values - Sibby
@Chad - Probably. I did put that caveat in my answer. - Slippy
I have thought of that. In most cases this would work, except that sometimes fields might contain information whose case I want to preserve, for example passwords, hashes and other data meant for the outside world. On the other hand, I do not really need to differentiate between Name and name attributes in XHTML - Nugent
Is there a Name attribute in XHTML, or is it name? - Ticklish
Sometimes the one, sometimes the other. That's the problem I'm having. - Nugent
Such total lowercasing "kills" certain elements. For example, aName, AName, and anamE all become aname. The second big problem with this idea is that it alters not only names but also the content of text nodes and attributes. It also changes namespace URIs, which makes the XML document totally unusable. A quick example: convert xmlns:xsl="http://www.w3.org/1999/XSL/Transform" to xmlns:xsl="http://www.w3.org/1999/xsl/transform" and the XML document (a syntactically valid XSLT stylesheet) is now rejected by the XSLT processor. - Girder

I use another solution. People usually want this because they don't want to repeat each property name from the class file inside an attribute as well. So what I do is add a custom attribute to all properties:

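using System;
using System.Runtime.CompilerServices;
using System.Xml.Serialization;

// [CallerMemberName] supplies the name of the property the attribute is applied to.
[AttributeUsage(AttributeTargets.Property)]
public class UsePropertyNameToLowerAsXmlElementAttribute: XmlElementAttribute
{
    public UsePropertyNameToLowerAsXmlElementAttribute([CallerMemberName] string propertyName = null)
    : base(propertyName?.ToLower())
    {
    }
}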

This way the XML serializer can map lowercase XML element names to the PascalCased properties of the class.

The properties on the classes still carry an attribute that says that something is different, but you don't have the overhead of marking every property with a name:

public class Settings
{
    [UsePropertyNameToLowerAsXmlElement]
    public string VersionId { get; set; }

    [UsePropertyNameToLowerAsXmlElement]
    public int? ApplicationId { get; set; }
}
Parnassus answered 1/4, 2019 at 13:2 Comment(0)

I would start by converting all tag and attribute names to lowercase, leaving values untouched, by using streaming (SAX-style) parsing, i.e. with XmlTextReader.
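
A minimal sketch of that streaming approach, assuming the input is available as a string and using XmlReader.Create (the factory for XmlTextReader-style readers) together with XmlWriter. The class XmlNameLowercaser and the method LowercaseNames are hypothetical names. The XML declaration, DTDs and entity references are not copied in this sketch.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper: rewrites an XML string with lowercase element and
// attribute local names. Values, prefixes and namespace URIs are untouched.
public static class XmlNameLowercaser
{
    private const string XmlnsUri = "http://www.w3.org/2000/xmlns/";

    public static string LowercaseNames(string inputXml)
    {
        var output = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var reader = XmlReader.Create(new StringReader(inputXml)))
        using (var writer = XmlWriter.Create(output, settings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                    {
                        // Must be read before moving onto the attributes.
                        bool isEmpty = reader.IsEmptyElement;
                        writer.WriteStartElement(reader.Prefix,
                            reader.LocalName.ToLowerInvariant(), reader.NamespaceURI);
                        while (reader.MoveToNextAttribute())
                        {
                            // Namespace declarations are copied as-is.
                            string localName = reader.NamespaceURI == XmlnsUri
                                ? reader.LocalName
                                : reader.LocalName.ToLowerInvariant();
                            writer.WriteAttributeString(reader.Prefix, localName,
                                reader.NamespaceURI, reader.Value);
                        }
                        if (isEmpty) writer.WriteEndElement();
                        break;
                    }
                    case XmlNodeType.EndElement:
                        writer.WriteFullEndElement();
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(reader.Value);
                        break;
                    case XmlNodeType.CDATA:
                        writer.WriteCData(reader.Value);
                        break;
                    case XmlNodeType.Comment:
                        writer.WriteComment(reader.Value);
                        break;
                    case XmlNodeType.ProcessingInstruction:
                        writer.WriteProcessingInstruction(reader.Name, reader.Value);
                        break;
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        writer.WriteWhitespace(reader.Value);
                        break;
                }
            }
        }
        return output.ToString();
    }
}
```

Note that an element carrying two attributes that differ only in case would collide after the rename, and the writer would then throw an exception, so this only suits documents where such names are not meant to be distinct.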

Polston answered 10/3, 2012 at 0:35 Comment(0)
