XQuery: How to count how many times a value occurs in sequence
Asked Answered
F

3

5

I know that the function count can be used to count the number of elements in a given sequence, like this:

count(result/actors/actor)

in this XML:

<result>
    <actors>
        <actor id="00000015">Anderson, Jeff</actor>
        <actor id="00000030">Bishop, Kevin</actor>
        <actor id="0000000f">Bonet, Lisa</actor>
        <actor id="916503207">Parillaud, Anne</actor>
        <actor id="916503208">Pitt, Brad</actor>
        <actor id="916503209">Freeman, Morgan</actor>
        <actor id="916503211">Domingo, Placido</actor>
        <actor id="916503210">Sharif, Omar</actor>
        <actor id="1337">Doqumenteriet2011</actor>
    </actors>
</result>

But what if I want to know how many times a value occurs in a given sequence?

For example if I would like to know how many movies each actor (actorRef) appeared in in the following XML:

<videos>
    <video id="id1235AA0">
        <title>The Fugitive</title>
        <actorRef>00000003</actorRef>
        <actorRef>00000006</actorRef>
    </video>
    <video id="id1244100">
        <title>Enemy of the State</title>
        <actorRef>00000009</actorRef>
        <actorRef>0000000c</actorRef>
        <actorRef>0000000f</actorRef>
        <actorRef>00000012</actorRef>
    </video>
    <video id="id124E230">
        <title>Clerks</title>
        <actorRef>00000015</actorRef>
        <actorRef>00000018</actorRef>
        <actorRef>0000001b</actorRef>
    </video>

I can easily produce a list of all the appearing actors, and even have them appear as many times in my produced sequence as in the XML:

result/videos//actorRef

but I am not able to do anything similar to what for example COUNT() and GROUP BY do together in SQL to get a list of the actors by count of their multiplicity in the sequence produced by the above line of XQuery.

How can I produce this list?

PS: The end goal is to find the actors that appeared the most movies.

Ferd answered 3/10, 2015 at 17:37 Comment(2)
The index-of xpath function could be used in combination with the count function. From the w3c spec: Summary: Returns a sequence of positive integers giving the positions within the sequence $seqParam of items that are equal to $srchParam.Reluctance
@pgfearo, Yes, See my answer.Trevor
P
3

This is the kind of question that isn't good for a document store when you are just storing the list of actors in videos. I'd suggest also storing the lists of videos that an actor is part of. Then you'd just have to query for the actor that has the most videos elements.

All that said, you can do it with the data you have it just isn't going to be all that fast. You first need to get a distance list of actors. Then query for each actor filter the videos that have that actor and do a count. and then order by count.

let $actors := fn:distinct-values($results/videos/video/actorRef)

for $actor in $actors
let $count := fn:count($results/videos/video[actorRef = $actor])
Order by $count
return ($actor, $count)
Pollitt answered 3/10, 2015 at 22:45 Comment(0)
T
4

Here is a pure XPath 2.0 expression (XPath 2.0 is a proper subset of XQuery), that produces the sequence of actorRef values identifying the actors that participated in maximum number of movies:

 for $maxMovies in 
       max(for $actorId in distinct-values(/*/*/actorRef) 
            return
               count(index-of(/*/*/actorRef, $actorId))
           )
    return 
      (/*/*/actorRef)[index-of(/*/*/actorRef, .)[$maxMovies]]/string()

When this expression is evaluated on the following source XML document:

<videos>
    <video id="id1235AA0">
        <title>The Fugitive</title>
        <actorRef>00000003</actorRef>
        <actorRef>00000009</actorRef>
        <actorRef>0000000x</actorRef>
    </video>
    <video id="id1244100">
        <title>Enemy of the State</title>
        <actorRef>00000009</actorRef>
        <actorRef>0000000c</actorRef>
        <actorRef>0000000f</actorRef>
        <actorRef>00000012</actorRef>
    </video>
    <video id="id124E230">
        <title>Clerks</title>
        <actorRef>00000015</actorRef>
        <actorRef>00000018</actorRef>
        <actorRef>0000001b</actorRef>
    </video>
</videos>

The correct, wanted result is produced:

00000009

Using XPath 3.0 (proper subset of XQuery 3.0) one can even write this quite shorter:

let $vSeq := /*/*/actorRef/string()
  return
    for $maxMovies in 
       max(for $actorId in distinct-values($vSeq) 
            return
              index-of($vSeq, $actorId) ! last()
           )
      return 
        $vSeq[index-of($vSeq, .)[$maxMovies]]

And this can be shortened even further using the simple mapping operator (!) to avoid any for-expression:

let $vSeq := /*/*/actorRef/string(),
    $maxOccurs := 
      max(distinct-values($vSeq) ! count(index-of($vSeq, .)) ) 
  return 
    $vSeq[index-of($vSeq, .)[$maxOccurs]]
Trevor answered 24/10, 2015 at 4:19 Comment(0)
P
3

This is the kind of question that isn't good for a document store when you are just storing the list of actors in videos. I'd suggest also storing the lists of videos that an actor is part of. Then you'd just have to query for the actor that has the most videos elements.

All that said, you can do it with the data you have it just isn't going to be all that fast. You first need to get a distance list of actors. Then query for each actor filter the videos that have that actor and do a count. and then order by count.

let $actors := fn:distinct-values($results/videos/video/actorRef)

for $actor in $actors
let $count := fn:count($results/videos/video[actorRef = $actor])
Order by $count
return ($actor, $count)
Pollitt answered 3/10, 2015 at 22:45 Comment(0)
M
0

Tyler's answer is the best solution for what you're ultimately trying to achieve, so I'd go with that, but to answer the specific question of how to count the number of times a value occurs in a sequence: you can use a predicate on the sequence to create a new sequence containing only the values that match the one you care about and then count that:

let $actors := result/videos//actorRef
for $actor in distinct-values($actors)
return
  ($actor, count($actors[. = $actor]))
Meow answered 4/10, 2015 at 11:15 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.