How to get the most deeply nested element nodes using XPath? (implementation with XML::Twig)

I need to extract the most deeply nested element nodes that have a method attribute (<DEST id="RUSSIA" method="delete"/>) together with their direct ancestor (<SOURCE id="AFRICA" method="modify">). XSLT, XPath or XQuery would all work, but I prefer XPath.

I don't want to get the top nodes that have a method attribute (<main method="modify"> or <MACHINE method="modify">).

The most deeply nested elements with a method attribute correspond to real actions. The top elements with a method attribute are actually dummy actions that must not be taken into account.

Here is my XML sample file:

<?xml version="1.0" encoding="UTF-8"?>
<main method="modify">
<MACHINE method="modify">  
  <SOURCE id="AFRICA" method="modify">
    <DEST id="RUSSIA" method="delete"/>
    <DEST id="USA" method="modify"/>
  </SOURCE>

  <SOURCE id="USA" method="modify">
    <DEST id="AUSTRALIA" method="modify"/>
    <DEST id="CANADA" method="create"/>
  </SOURCE>
</MACHINE>
</main>

This is the XPath output I expect:

<SOURCE id="AFRICA" method="modify"><DEST id="RUSSIA" method="delete"/>

<SOURCE id="AFRICA" method="modify"><DEST id="USA" method="modify"/>

<SOURCE id="USA" method="modify"><DEST id="AUSTRALIA" method="modify"/>

<SOURCE id="USA" method="modify"><DEST id="CANADA" method="create"/>

My current XPath command does not give the right result.

The command xpath("//[@method]/ancestor::*") returns:

<main><MACHINE method="modify">                                        # NOT WANTED

<MACHINE method="modify"><SOURCE id="AFRICA" method="modify">          # NOT WANTED

<MACHINE method="modify"><SOURCE id="USA" method="modify">             # NOT WANTED

<SOURCE id="AFRICA" method="modify"><DEST id="RUSSIA" method="delete"/>

<SOURCE id="AFRICA" method="modify"><DEST id="USA" method="modify"/>

<SOURCE id="USA" method="modify"><DEST id="AUSTRALIA" method="modify"/>

<SOURCE id="USA" method="modify"><DEST id="CANADA" method="create"/>

My XML::Twig code, for additional context:

#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parsefile('input.xml');

# Query as originally posted. Note: "//[@method]/ancestor::*" is not
# legal standard XPath (see the comments below).
my @abc = $t->get_xpath('//[@method]/ancestor::*');
foreach my $anc (@abc)                   # outer 1
{
    foreach my $child ($anc->children)   # internal 1
    {
        my $w = $child->parent;          # same element as $anc
        print $w->start_tag;
        print $child->start_tag, "\n";
    }
}
Lylalyle answered 21/6, 2012 at 9:49 Comment(13)
You need XSLT if you want to manipulate nodes. XPath alone can't remove ancestors you don't want. Also, consider posting well-formed samples of input and output. The input sample lacks at least one closing tag and the wanted result is not well-formed at all, so it is not clear whether you want the SOURCE element to contain the DEST elements or whether you want to flatten the existing hierarchy and output all elements on the same level.Chrysler
I corrected and updated my question. The output shown is the result of my XPath command //[@method]/ancestor::*. Please let me know whether XPath can filter the most deeply nested nodes with a method attribute (and include their direct ancestor). If it is not possible (and we need XSLT), I will modify the question to show an XML file as the output.Lylalyle
I think finding the most deeply nested elements is not possible with XPath because XPath does not have a current() function. Otherwise, the solution would be to select all elements for which there are no other elements with a greater number of ancestors. Using XSLT, this can be expressed.Redoubt
I've added an answer to illustrate what I said in my previous comment.Redoubt
The XPath expression you show "//[@method]/ancestor::*" is not legal XPath and should give you a syntax error.Dark
@O.R.Mapper [Re: "I think finding the most deeply nested elements is not possible with XPath because XPath does not have a current() function"] Two of the answers do provide such expression.Kleiman
@DimitreNovatchev: Right, that seems to be some XPath 2.0 novelty. That's not available in the software I usually test XPath/XSLT code with, so unfortunately, I couldn't check this.Redoubt
@O.R.Mapper: Yes, I fully understand this. Your statement would be correct if the word "Xpath" were replaced by "XPath 1.0". XPath 1.0 is dated 1999 and we, the developer community, need to start regarding it as obsolete.Kleiman
@DimitreNovatchev: As long as XPath 1.0 is the only thing supported in most of the Xml-related .NET classes, and therefore in the developer tools built on them, it will remain the default version of XPath to test with for many of us, as sad as it is :-(Redoubt
@O.R.Mapper: There are XPath 2.0 implementations developed specifically for .NET -- such as Saxon.NET and XQSharp/XmlPrime.Kleiman
@DimitreNovatchev: I know there are 3rd party libs. However, the official standard implementation in the BCL in the most frequently-used Xml classes supports only XPath 1.0, and that's what is used by IDEs such as SharpDevelop to my knowledge.Redoubt
@O.R.Mapper: Yes, and IDEs such as oXygen have full support (intellisense, syntax coloring and debuggers) for XPath 2.0, XSLT 2.0, XQuery, ..., etc.Kleiman
@DimitreNovatchev: Fine. I'll be happy to switch as soon as it's either integrated into tools that are running on my machine all the time anyway, or as soon as IDEs such as oXygen enter the set of tools that are running on my machine all the time :-) Looking forward to that :-)Redoubt

The nodes with maximum depth can be found with the XPath 2.0 expression

//*[count(ancestor::*) = max(//*/count(ancestor::*))]

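but it might perform horribly, depending on how smart your optimizer is. Evaluated naively, max(...) is recomputed for every candidate node, which makes the query quadratic in the number of elements.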

Having found those nodes, it is of course trivial to find their ancestors. But you are looking for output with more structure than XPath alone can provide.
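XML::Twig's get_xpath handles only a subset of XPath. As far as I know, that subset does not include XPath 2.0's max(). In a host language, you can apply the same idea (count ancestors and keep the maximum) directly. The following is a minimal Perl sketch that assumes XML::Twig is installed, as in the question. The helper names depth and deepest_with_parents are hypothetical. The sketch assumes the deepest element is not the root, because the root has no parent. It computes the maximum only once, which avoids the performance concern above:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Sample document from the question, inlined to keep the sketch self-contained.
my $xml = '<main method="modify"><MACHINE method="modify">'
        . '<SOURCE id="AFRICA" method="modify">'
        . '<DEST id="RUSSIA" method="delete"/><DEST id="USA" method="modify"/>'
        . '</SOURCE><SOURCE id="USA" method="modify">'
        . '<DEST id="AUSTRALIA" method="modify"/><DEST id="CANADA" method="create"/>'
        . '</SOURCE></MACHINE></main>';

# Hypothetical helper: number of element ancestors, like count(ancestor::*).
sub depth {
    my ($elt) = @_;
    my @ancestors = $elt->ancestors;
    return scalar @ancestors;
}

# Hypothetical helper: [parent, element] pairs for every element at maximum
# depth. The maximum is computed once instead of once per candidate node.
sub deepest_with_parents {
    my ($twig) = @_;
    my @elts = grep { $_->is_elt } $twig->root->descendants_or_self;
    my $max = 0;
    for my $e (@elts) {
        my $d = depth($e);
        $max = $d if $d > $max;
    }
    return map { [ $_->parent, $_ ] } grep { depth($_) == $max } @elts;
}

my $t = XML::Twig->new;
$t->parse($xml);

# Print parent start tag + deepest element start tag, one pair per line.
for my $pair (deepest_with_parents($t)) {
    my ($parent, $elt) = @$pair;
    print $parent->start_tag, $elt->start_tag, "\n";
}
```

The elements come back in document order, so the pairs line up with the four lines of expected output in the question.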

Dark answered 21/6, 2012 at 11:59 Comment(0)

As I mentioned in my comment on the question, I don't think this is possible with pure XPath. XPath doesn't have anything like a current() function that would allow referring to the context outside of a [] predicate.

The closest solution is probably this XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="*">
        <xsl:choose>
            <!-- No element in the document has more ancestors than this one,
                 so this is one of the most deeply nested elements -->
            <xsl:when test="not(//*[count(ancestor::node()) > count(current()/ancestor::node())])">
                <xsl:value-of select="local-name(.)"/>
                <xsl:text>&#10;</xsl:text>
            </xsl:when>
            <xsl:otherwise>
                <!-- Not one of the deepest elements: keep descending -->
                <xsl:copy>
                    <xsl:apply-templates select="@*|node()"/>
                </xsl:copy>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <!-- Suppress text and attribute output -->
    <xsl:template match="text()|@*"/>
</xsl:stylesheet>

The <xsl:when> element finds the most deeply nested elements. As an example, I'm outputting the local names of the found elements, followed by a newline, but of course you can output anything you need there.

Update: Note that this is based on XPath 1.0 knowledge/tools. It seems that this is indeed possible to express in XPath 2.0.
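In a host language such as Perl, the missing current() is not a problem, because the current element can simply be held in a variable. Below is a minimal sketch with XML::Twig (assumed installed, as in the question). It transcribes the stylesheet's xsl:when test literally, and n_ancestors and is_deepest are hypothetical helper names. The stylesheet counts ancestor::node(), which also includes the document node. That adds 1 to both sides of the comparison, so counting only element ancestors gives the same result:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Sample document from the question, inlined to keep the sketch self-contained.
my $xml = '<main method="modify"><MACHINE method="modify">'
        . '<SOURCE id="AFRICA" method="modify">'
        . '<DEST id="RUSSIA" method="delete"/><DEST id="USA" method="modify"/>'
        . '</SOURCE><SOURCE id="USA" method="modify">'
        . '<DEST id="AUSTRALIA" method="modify"/><DEST id="CANADA" method="create"/>'
        . '</SOURCE></MACHINE></main>';

# Hypothetical helper: number of element ancestors.
sub n_ancestors {
    my @ancestors = $_[0]->ancestors;
    return scalar @ancestors;
}

# Hypothetical helper mirroring
#   not(//*[count(ancestor::node()) > count(current()/ancestor::node())])
# where $current plays the role of XSLT's current().
sub is_deepest {
    my ($current, @all) = @_;
    my $mine = n_ancestors($current);
    return !grep { n_ancestors($_) > $mine } @all;
}

my $t = XML::Twig->new;
$t->parse($xml);

my @all     = grep { $_->is_elt } $t->root->descendants_or_self;
my @deepest = grep { is_deepest($_, @all) } @all;

# Like the stylesheet, print the local name of each deepest element.
print $_->tag, "\n" for @deepest;
```

Like the stylesheet, this literal version compares every element with every other element, so it is quadratic. Computing the maximum depth once is cheaper.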

Redoubt answered 21/6, 2012 at 11:37 Comment(0)

One such XPath 2.0 expression is:

//*[not(*)
  and
   count(ancestor::*)
  =
   max(//*[not(*)]/count(ancestor::*))
   ]
     /(self::node|..)

To illustrate this with a complete XSLT 2.0 example:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <xsl:variable name="vResult" select=
     "//*[not(*)
        and
          count(ancestor::*)
       =
        max(//*[not(*)]/count(ancestor::*))
        ]
          /(self::node|..)
     "/>

 <xsl:template match="/">
     <xsl:sequence select="$vResult"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the provided XML document:

<main method="modify">
    <MACHINE method="modify">
        <SOURCE id="AFRICA" method="modify">
            <DEST id="RUSSIA" method="delete"/>
            <DEST id="USA" method="modify"/>
        </SOURCE>
        <SOURCE id="USA" method="modify">
            <DEST id="AUSTRALIA" method="modify"/>
            <DEST id="CANADA" method="create"/>
        </SOURCE>
    </MACHINE>
</main>

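the XPath expression is evaluated and the selected elements are copied to the output. Note that self::node, written without parentheses, is a name test that matches only elements named node. Here it selects nothing, so only the .. step contributes. The parents of the deepest elements are selected, and xsl:sequence copies each parent together with its children. To also select the deepest elements themselves, write self::node() instead. With the stylesheet as written, the output is: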

<SOURCE id="AFRICA" method="modify">
            <DEST id="RUSSIA" method="delete"/>
            <DEST id="USA" method="modify"/>
        </SOURCE>
<SOURCE id="USA" method="modify">
            <DEST id="AUSTRALIA" method="modify"/>
            <DEST id="CANADA" method="create"/>
        </SOURCE>
Kleiman answered 22/6, 2012 at 13:19 Comment(0)

The stylesheet

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="/">
  <xsl:apply-templates 
     select="//DEST[@method and not(node())]"/>
</xsl:template>

<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* , node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="DEST[@method and not(node())]">
  <xsl:apply-templates select="..">
    <xsl:with-param name="leaf" select="current()"/>
  </xsl:apply-templates>
</xsl:template>

<xsl:template match="*[DEST[@method and not(node())]]">
  <xsl:param name="leaf"/>
  <xsl:copy>
    <xsl:copy-of select="@* , $leaf"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

transforms

<?xml version="1.0" encoding="UTF-8"?>
<main method="modify">
<MACHINE method="modify">  
  <SOURCE id="AFRICA" method="modify">
    <DEST id="RUSSIA" method="delete"/>
    <DEST id="USA" method="modify"/>
  </SOURCE>

  <SOURCE id="USA" method="modify">
    <DEST id="AUSTRALIA" method="modify"/>
    <DEST id="CANADA" method="create"/>
  </SOURCE>
</MACHINE>
</main>

into

<SOURCE id="AFRICA" method="modify">
   <DEST id="RUSSIA" method="delete"/>
</SOURCE>
<SOURCE id="AFRICA" method="modify">
   <DEST id="USA" method="modify"/>
</SOURCE>
<SOURCE id="USA" method="modify">
   <DEST id="AUSTRALIA" method="modify"/>
</SOURCE>
<SOURCE id="USA" method="modify">
   <DEST id="CANADA" method="create"/>
</SOURCE>
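The same leaf-plus-parent pairing can be sketched with XML::Twig, which the question uses. This is an approximation under stated assumptions, not a port of the stylesheet. wrap_leaves is a hypothetical helper. It treats any element that has a method attribute and no element children as a leaf, rather than only DEST elements. For each leaf, it builds a new element with the parent's tag and attributes and places a copy of the leaf inside it:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Sample document from the question, inlined to keep the sketch self-contained.
my $xml = '<main method="modify"><MACHINE method="modify">'
        . '<SOURCE id="AFRICA" method="modify">'
        . '<DEST id="RUSSIA" method="delete"/><DEST id="USA" method="modify"/>'
        . '</SOURCE><SOURCE id="USA" method="modify">'
        . '<DEST id="AUSTRALIA" method="modify"/><DEST id="CANADA" method="create"/>'
        . '</SOURCE></MACHINE></main>';

# Hypothetical helper: for every element that has a method attribute and no
# element children, return a new element with the parent's tag and
# attributes that contains a copy of that leaf.
sub wrap_leaves {
    my ($twig) = @_;
    my @wrapped;
    for my $leaf (grep { $_->is_elt } $twig->root->descendants) {
        next unless defined $leaf->att('method');
        next if grep { $_->is_elt } $leaf->children;    # not a leaf
        my $parent = $leaf->parent;
        push @wrapped,
            XML::Twig::Elt->new($parent->tag, { %{ $parent->atts } }, $leaf->copy);
    }
    return @wrapped;
}

my $t = XML::Twig->new;
$t->parse($xml);

# One SOURCE wrapper per action leaf, each printed on its own line.
print $_->sprint, "\n" for wrap_leaves($t);
```

main, MACHINE and the SOURCE elements are skipped because they have element children. Each of the four DEST elements is therefore paired with its own copy of its SOURCE parent.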
Chrysler answered 21/6, 2012 at 10:43 Comment(6)
The question sounds more like laurentgnu wants to find the most deeply nested elements in an Xml document.Redoubt
Yes, I need XPath to show the most deeply nested elements and their direct ancestor. Anyway, thank you @Martin for that XSLT solution. But I really need to use an XPath command if possible. Here is the XML::Twig code I am using: @abc=$t->get_xpath("\/\/[\@method]\/ancestor\:\:\*"); foreach my $v (@abc) { # blabla }Lylalyle
Well, you seem to want to reorganize the nodes by eliminating ancestors and by mapping each leaf to its parent. At least that is what I see in your posted result. XPath does not let you manipulate nodes; it only selects nodes in an existing document, so I think you need more than XPath. The sample in your comment suggests you want to use some imperative host language together with XPath. I don't recognize that language, so I cannot help with that. Tag your question with that language (e.g. Python, PHP) and explain which XPath API you use. Then people with experience in that area can help.Chrysler
@MartinHonnen: The question does not ask for reorganizing the nodes or eliminating anything. The posted result just shows the nodes that are expected to be found. I agree that XPath is not enough for finding the respective nodes, though, as it lacks XSLT's current() function.Redoubt
Oh well, the "posted result" is something like <SOURCE id="AFRICA" method="modify"><DEST id="RUSSIA" method="delete"/> <SOURCE id="AFRICA" method="modify"><DEST id="USA" method="modify"/>. That is not well-formed, so I had to make some assumptions about what kind of output is wanted. Also, the question is tagged xslt-2.0, so in my view an XSLT solution is a valid answer. If the poster wants to use an imperative language together with XPath, fine, others can help with that. I prefer XSLT as the host language for XPath.Chrysler
If XPath/XQuery are not possible, then I will parse your XML output, @martin. I think XML::Twig only supports XPath.Lylalyle

© 2022 - 2024 — McMap. All rights reserved.