XQuery/XPath: Using count() and max() function for return of element with highest count

I have an XML file that contains authors and editors.

<?xml version="1.0" encoding="UTF-8"?>
<?oxygen RNGSchema="file:textbook.rnc" type="compact"?>
<books xmlns="books">

    <book ISBN="i0321165810" publishername="OReilly">
        <title>XPath</title>
        <author>
            <name>
                <fname>Priscilla</fname>
                <lname>Walmsley</lname>
            </name>
        </author>
        <year>2007</year>
        <field>Databases</field>
    </book>

    <book ISBN="i0321165812" publishername="OReilly">
        <title>XQuery</title>
        <author>
           <name>
               <fname>Priscilla</fname>
               <lname>Walmsley</lname>
            </name>
        </author>
        <editor>
            <name>
                <fname>Lisa</fname>
                <lname>Williams</lname>
            </name>
        </editor>
        <year>2003</year>
        <field>Databases</field>
    </book>

    <publisher publishername="OReilly">
        <web-site>www.oreilly.com</web-site>
        <address>
            <street_address>hill park</street_address>
            <zip>90210</zip>
            <state>california</state>
        </address>
        <phone>400400400</phone>
        <e-mail>[email protected]</e-mail>
        <contact>
            <field>Databases</field>
            <name>
                <fname>Anna</fname>
                <lname>Smith</lname>
            </name>
        </contact>
    </publisher>
</books>

I'm looking for a way to return the person who has been listed the most times as an author and/or editor. The solution should be XQuery 1.0 (XPath 2.0) compatible.

I was thinking about using a FLWOR query to iterate through all authors and editors, then doing a count of unique authors/editors, then returning the author(s)/editor(s) that match the highest count. But I haven't been able to find the proper solution.

Does anyone have any suggestion as to how such a FLWOR query would be written? Could this be done in a simpler way, using XPath?

Lucillelucina answered 30/11, 2011 at 23:58 Comment(0)

This may help:

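declare default element namespace 'books';
(: $doc is assumed to be bound to the document node of the XML above. :)
(: Names are compared by string value, so whitespace inside name must be consistent or stripped. :)
(for $name in distinct-values($doc/books/*/*/name)
 let $entries := $doc/books/*[data(*/name) = $name]
 order by count($entries) descending
 (: keep only the matching name, not other names listed in the same entries :)
 return $entries/*/name[data(.) = $name])[1]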
Wilks answered 1/12, 2011 at 0:38 Comment(2)
Thanks for the solution, Christian :) Is there a way to return more than one author/editor (if applicable)? For instance, if two authors/editors share the same (maximum) count as author/editor? Lucillelucina
@Jea: Both in Christian's and in my solution just remove the ending [1] and you'll get all the nodes that have the maximum value. Lightsome
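The tie case raised in the comments can also be handled explicitly by computing the maximum count first and then keeping every person who reaches it. The following is a minimal XQuery 1.0 sketch, not taken from either answer. It assumes that a cut-down inline copy of the sample data (bound to $doc) stands in for the real file, and that a person is identified by first name plus last name joined with "|".

```xquery
xquery version "1.0";
declare default element namespace "books";

(: Assumption: a cut-down inline copy of the sample data stands in for the real file. :)
declare variable $doc := document {
  <books>
    <book>
      <author><name><fname>Priscilla</fname><lname>Walmsley</lname></name></author>
    </book>
    <book>
      <author><name><fname>Priscilla</fname><lname>Walmsley</lname></name></author>
      <editor><name><fname>Lisa</fname><lname>Williams</lname></name></editor>
    </book>
  </books>
};

(: Every author/editor name element, and one key string per occurrence. :)
let $names := $doc/books/book/(author | editor)/name
let $keys  := for $n in $names return concat($n/fname, "|", $n/lname)
(: Highest number of occurrences reached by any single key. :)
let $max   := max(for $k in distinct-values($keys) return count($keys[. = $k]))
for $k in distinct-values($keys)
where count($keys[. = $k]) = $max
(: One representative name element per top-ranked person. :)
return ($names[concat(fname, "|", lname) = $k])[1]
```

With this data the query should return the single name element for Priscilla Walmsley. If a second person were also listed twice, both name elements would be returned. Building the key from fname and lname, rather than from the string value of name, avoids depending on how the document is indented.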

Here is a pure XPath 2.0 expression, admittedly not for the faint-hearted:

(for $m in max(for $n in distinct-values(/*/b:book/(b:author | b:editor)
                                        /b:name/concat(b:fname, '|', b:lname)),
               $cnt in count(/*/b:book/(b:author | b:editor)
                             /b:name[$n eq concat(b:fname, '|', b:lname) ])
               return $cnt
               ),
     $name in /*/b:book/(b:author | b:editor)/b:name,
     $fullName in $name/concat(b:fname, '|',  b:lname),
     $count in count( /*/b:book/(b:author | b:editor)
                   /b:name[$fullName eq concat(b:fname, '|',  b:lname)])
  return
     if($count eq $m)
       then $name
       else ()
   )[1]

where the prefix "b:" is associated with the namespace "books".

XSLT 2.0-based verification:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:b="books">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
   <xsl:sequence select=
   "(for $m in max(for $n in distinct-values(/*/b:book/(b:author | b:editor)
                                            /b:name/concat(b:fname, '|', b:lname)),
                   $cnt in count(/*/b:book/(b:author | b:editor)
                                 /b:name[$n eq concat(b:fname, '|', b:lname) ])
                   return $cnt
                   ),
         $name in /*/b:book/(b:author | b:editor)/b:name,
         $fullName in $name/concat(b:fname, '|',  b:lname),
         $count in count( /*/b:book/(b:author | b:editor)
                       /b:name[$fullName eq concat(b:fname, '|',  b:lname)])
      return
         if($count eq $m)
           then $name
           else ()
       )[1]
   "/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<books xmlns="books">
    <book ISBN="i0321165810" publishername="OReilly">
        <title>XPath</title>
        <author>
            <name>
                <fname>Priscilla</fname>
                <lname>Walmsley</lname>
            </name>
        </author>
        <year>2007</year>
        <field>Databases</field>
    </book>
    <book ISBN="i0321165812" publishername="OReilly">
        <title>XQuery</title>
        <author>
            <name>
                <fname>Priscilla</fname>
                <lname>Walmsley</lname>
            </name>
        </author>
        <editor>
            <name>
                <fname>Lisa</fname>
                <lname>Williams</lname>
            </name>
        </editor>
        <year>2003</year>
        <field>Databases</field>
    </book>
    <publisher publishername="OReilly">
        <web-site>www.oreilly.com</web-site>
        <address>
            <street_address>hill park</street_address>
            <zip>90210</zip>
            <state>california</state>
        </address>
        <phone>400400400</phone>
        <e-mail>[email protected]</e-mail>
        <contact>
            <field>Databases</field>
            <name>
                <fname>Anna</fname>
                <lname>Smith</lname>
            </name>
        </contact>
    </publisher>
</books>

the wanted, correct name element is selected and output:

<name xmlns="books">
   <fname>Priscilla</fname>
   <lname>Walmsley</lname>
</name>
Lightsome answered 1/12, 2011 at 4:35 Comment(0)

I've always felt this was an omission in XPath: the max() and min() functions return the highest/lowest value, whereas what you usually want is the object(s) in a collection that have the highest/lowest value for some expression. One solution is to sort the objects on that value and take the first/last from the list, which seems inelegant. Computing the min/max and then selecting the items whose value matches this seems equally unappealing. In Saxon there has long been a pair of higher-order extension functions saxon:highest() and saxon:lowest() which take a sequence and a function, and return the item(s) from the sequence having the lowest or highest values of the function result. The good news is that in XPath 3.0 you can write these functions yourself (in fact, they are given as example user-written functions in the spec).

Ceratoid answered 1/12, 2011 at 9:42 Comment(1)
A link to those examples would be nice! Archer
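To illustrate the idea of writing such a function yourself, here is a minimal XQuery 3.0 sketch. It is not the text of the spec's example and not Saxon's implementation. The name local:highest, its signature, and the inline sample data bound to $doc are assumptions made for illustration.

```xquery
xquery version "3.0";
declare default element namespace "books";

(: Hypothetical helper: returns the item(s) of $seq whose $key value is highest. :)
declare function local:highest(
  $seq as item()*,
  $key as function(item()) as xs:anyAtomicType?
) as item()* {
  let $max := max(for $item in $seq return $key($item))
  return $seq[$key(.) = $max]
};

(: Assumption: a cut-down inline copy of the sample data stands in for the real file. :)
declare variable $doc := document {
  <books>
    <book>
      <author><name><fname>Priscilla</fname><lname>Walmsley</lname></name></author>
    </book>
    <book>
      <author><name><fname>Priscilla</fname><lname>Walmsley</lname></name></author>
      <editor><name><fname>Lisa</fname><lname>Williams</lname></name></editor>
    </book>
  </books>
};

(: One "first last" string per author/editor occurrence. :)
let $keys := $doc/books/book/(author | editor)/name/concat(fname, " ", lname)
return local:highest(
  distinct-values($keys),
  (: the key function counts how often a given name string occurs :)
  function($k as item()) as xs:anyAtomicType? { count($keys[. = $k]) }
)
```

With this data the query should return the string "Priscilla Walmsley". Because the helper keeps every item whose key equals the maximum, ties are returned together rather than being cut down to one item.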

You are on the right track. The simplest way is to convert each name into a string (first and last name separated by a space, for example) and count those strings. (Note that the following code is untested.)

declare default element namespace 'books';
(: fname and lname are children of name, not of author/editor :)
let $names := (//editor | //author)/name/concat(fname, ' ', lname)
let $distinct-names := distinct-values($names)
let $name-count := for $name in $distinct-names return count($names[. = $name])
for $name at $pos in $distinct-names
where $name-count[$pos] = max($name-count)
return $name

Or, another approach:

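declare default element namespace 'books';
(
  let $people := (//editor | //author)
  for $person in $people
  (: ascending order, so the most frequently listed person sorts last :)
  order by count($people[name/fname = $person/name/fname and
                         name/lname = $person/name/lname])
  return $person
)[last()]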
Arequipa answered 1/12, 2011 at 0:27 Comment(3)
@_Oliver: Sorry, but even in XQuery 3.0 / XPath 3.0 this is in error. Hint: look at $names/count(index-of($names, .)). $names happens to be a sequence of atomic values, but the / operator requires a node(-set) as its left operand. Lightsome
@_Oliver: your first approach also doesn't produce any results. Checked with Saxon 9.3.05 under oXygen. Lightsome
@Dimitre: Good point re '/'. I have removed the XPath example. It was a horrible solution anyway. Arequipa
