Convert DTD to XSD with defined root (starting) element

I have several large DTD files. I've used trang to convert them into XSD files so that I can easily use them from JAXB and other utilities. However, the generated XSD files declare all elements at the top level, which means that any element can be the root element of an input XML document. I want to allow only one particular element as the root.

Having multiple root elements causes a few problems, e.g. xjc generates @XmlRootElement for all classes, so I need to add extra checks.

As I understand it, I need to rewrite the generated XSD, moving <xs:element>s to <xs:complexType>s, changing element refs into element types and so on, but this would be too much monkey work, with no way to verify that it has all been done correctly.
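
For illustration, a flat schema of the kind described above might look like the sketch below. The element names (`book`, `title`, `chapter`) are hypothetical and not taken from the actual DTDs. The point is only that every element is declared globally and wired together with `ref`, so a validator accepts any of them as the document root.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <!-- Intended root, but nothing in the schema says so -->
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="title"/>
        <xs:element maxOccurs="unbounded" ref="chapter"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- Also global, so a document whose root is <title> validates too -->
  <xs:element name="title" type="xs:string"/>
  <!-- Same for <chapter> -->
  <xs:element name="chapter">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="title"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```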

Is there a more efficient way to do this?

Girish answered 24/5, 2012 at 10:16 Comment(4)
+1 for a question that makes sense, but let's also make it clear that DTD to XSD conversion is always only approximate. - Olomouc
@JirkaHanika As I understand it, the generated XSD approximates the DTD very well, except maybe for DOCTYPE definitions (no surprise there) and some namespace stuff. Also, some weird DTD constructs cannot be transformed into XSD neatly. The only problem I'm facing at the moment is that a DTD doesn't define the notion of a root element. (RELAX NG does define it with <start>, but it is poorly supported; xjc failed with it.) - Girish
Yes, but the namespace stuff is a biggie. Additionally, lots of constructs that are named similarly mean quite different things. +1 to the first answer because it doesn't pretend any XSD semantics. - Olomouc
Getting document definitions right is so important that, regardless of which variety of primate does it, it should be done by hand, I would think. The only thing worse than having to do it is doing it wrong. - Hawkinson

I've used a simple XSLT transformation to process the generated XSD. It works fine for my case:

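<?xml version="1.0"?>

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <!-- Identity template: copy whatever no other template handles.
         node() already matches comments, so comment() is not needed. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Drop ref attributes; the next template replaces them with name and type. -->
    <xsl:template match="xs:element/@ref"/>

    <!-- Turn <xs:element ref="x"/> into a local <xs:element name="x" type="x"/>,
         keeping the other attributes such as minOccurs and maxOccurs. -->
    <xsl:template match="xs:element[@ref]">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:attribute name="type"><xsl:value-of select="@ref"/></xsl:attribute>
            <xsl:attribute name="name"><xsl:value-of select="@ref"/></xsl:attribute>
        </xsl:copy>
    </xsl:template>

    <!-- A referenced element with an anonymous complex type becomes a named
         complexType. The attributes of the anonymous type (e.g. mixed="true")
         are carried over as well, so mixed content is not lost. -->
    <xsl:template match="xs:element[@name = //xs:element/@ref and xs:complexType]">
        <xs:complexType name="{@name}">
            <xsl:apply-templates select="xs:complexType/@*|xs:complexType/node()"/>
        </xs:complexType>
    </xsl:template>

    <!-- A referenced element that points at a type becomes a named type itself:
         an extension if the target is a complexType declared in this schema,
         otherwise a simpleType restriction (e.g. of xs:string). -->
    <xsl:template match="xs:element[@name = //xs:element/@ref and @type]">
        <xsl:choose>
            <xsl:when test="//xs:complexType[@name = current()/@type]">
                <xs:complexType name="{@name}">
                    <xs:complexContent>
                        <xs:extension base="{@type}"/>
                    </xs:complexContent>
                </xs:complexType>
            </xsl:when>
            <xsl:otherwise>
                <xs:simpleType name="{@name}">
                    <xs:restriction base="{@type}"/>
                </xs:simpleType>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>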

It detects referenced element definitions and makes them complexTypes (or simpleTypes), changing the refs into local elements of those types. All unreferenced elements remain global and so become the possible start elements.
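
As a rough sketch of the effect, assume a hypothetical flat schema with three global elements: `book`, which references `title` and `chapter` via `ref`; `title`, declared with `type="xs:string"`; and `chapter`, which has an anonymous complex type containing a `title` reference. These names are made up for illustration. Running the stylesheet over such a schema should give something like the following (whitespace and attribute order may differ depending on the XSLT processor):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <!-- book is never referenced, so it stays the only global element -->
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <!-- former ref="title" -->
        <xs:element name="title" type="title"/>
        <!-- former ref="chapter"; maxOccurs is kept -->
        <xs:element maxOccurs="unbounded" name="chapter" type="chapter"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- title pointed at a built-in type, so it becomes a simpleType -->
  <xs:simpleType name="title">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
  <!-- chapter had an anonymous complex type, so it becomes a named one -->
  <xs:complexType name="chapter">
    <xs:sequence>
      <xs:element name="title" type="title"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Note that an element which is both the intended root and referenced somewhere (for example in a recursive structure) would also be turned into a type by these templates, so that case would need separate handling.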

Girish answered 28/5, 2012 at 9:20 Comment(0)

Going by what you're describing, and setting aside the "fidelity" of the conversion pointed out in some of the comments, I'll only address the fact that you're looking for an automatic way of doing what I call XML Schema Refactoring. I am associated with QTAssistant, a product that's meant for this kind of work, so this is how I would do it...

One thing you have to do by hand, no matter what, is to figure out and capture the list of elements you wish to see as root (or not)... and you're done: hit a button, or invoke a command line, and you'll know for sure if a valid XSD is generated.

A refactoring engine employs a visitor pattern that in your case essentially does what you need: it creates global types where needed, removes unwanted global element definitions, and replaces any referenced elements with inline declarations.

(For anyone reading this who knows substitution groups: this refactoring does not replace a reference to the head of a substitution group. Since we're talking about an XSD generated from a DTD, this is not a problem here.)
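
To show why the head of a substitution group has to be left alone, here is a minimal hand-written sketch; the element names are hypothetical and this is not output from any tool. Substitution only works through a reference to a global element, so turning the `ref` below into a local declaration would stop `circle` and `square` from being accepted in its place.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Head of the substitution group: must remain a global element -->
  <xs:element name="shape" type="xs:string"/>
  <!-- Members may appear wherever the head is referenced -->
  <xs:element name="circle" type="xs:string" substitutionGroup="shape"/>
  <xs:element name="square" type="xs:string" substitutionGroup="shape"/>

  <xs:element name="drawing">
    <xs:complexType>
      <xs:sequence>
        <!-- Keep this as a ref; a local element named "shape" would not
             allow circle or square here -->
        <xs:element ref="shape" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```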

This simplicity, and the fact that it is repeatable and reliable, would be the main advantages of using a specialized refactoring tool. Another advantage: it can also re-assign XML namespaces any way you want...

If you're interested in more details, let me know and I'll update this post with a small sample and some illustrations.

Dynamiter answered 25/5, 2012 at 18:23 Comment(0)