Remove duplicates from return XQuery
B

3

5

My XQuery is:

declare namespace xsd="http://www.w3.org/2001/XMLSchema"; 
for $schema in xsd:schema
for $nodes in $schema//*,
    $attr in $nodes/xsd:element/@name
where fn:contains($attr,'city')
return $attr

return: name="city" name="city" name="city" name="city" name="city"

When I add distinct-values like:

declare namespace xsd="http://www.w3.org/2001/XMLSchema"; 
for $schema in xsd:schema
for $nodes in $schema//*,
    $attr in $nodes/xsd:element/@name
where fn:contains($attr,'city')
return distinct-values($attr)

return: city city city city city

I need only one "city". How can I do that?

Biedermeier answered 1/6, 2013 at 14:32 Comment(1)
Could you post an example of a document you're querying? - Dichloride
W
8

You need to apply the distinct-values function to the whole result (i.e., not to each individual result item):

declare namespace xsd="http://www.w3.org/2001/XMLSchema"; 
distinct-values(
  for $schema in xsd:schema
  for $nodes in $schema//*,
      $attr in $nodes/xsd:element/@name
  where fn:contains($attr,'city')
  return $attr
)
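
To see the difference without a schema document, here is a minimal sketch using a made-up sequence of strings (the values are hypothetical, not taken from the asker's data). It counts the items produced by each approach:

```xquery
(: Hypothetical sample values standing in for the @name attributes :)
let $names := ('city', 'city', 'town')
return (
  (: per item: each call sees a single string, so nothing is removed :)
  count(for $n in $names return distinct-values($n)),
  (: whole sequence: duplicates are compared against each other :)
  count(distinct-values($names))
)
```

The first count is 3 and the second is 2, which is why the call has to wrap the whole FLWOR expression.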

The query can also be written as a single XPath expression (in XQuery the xs prefix is predeclared for the XML Schema namespace, so no declaration is needed):

distinct-values(//xs:element/@name[contains(., 'city')])
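
Since no sample document was posted, the following is only a sketch against an invented inline schema (the element names address, office, street and city are assumptions for illustration). It uses the same path expression, bound to a constructed document instead of the context item:

```xquery
declare namespace xsd = "http://www.w3.org/2001/XMLSchema";

(: Hypothetical schema; the asker's real document is unknown :)
let $doc := document {
  <xsd:schema>
    <xsd:element name="address">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="city" type="xsd:string"/>
          <xsd:element name="street" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    <xsd:element name="office">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="city" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>
}
(: both "city" attributes match, but only one value is returned :)
return distinct-values($doc//xsd:element/@name[contains(., 'city')])
```

Note that //xsd:element also matches element declarations that are direct children of xsd:schema, which the original FLWOR query skips.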
Wichern answered 1/6, 2013 at 15:12 Comment(3)
Is there a way to speed this up a bit? I am facing the problem that BaseX is pretty slow when using distinct-values() or the group by clause. Without it, the XQuery finishes correctly in 4 s. With elimination of duplicate values, however, it takes about 4 minutes. The values are about 60,000 text nodes; each contains a lemma. 1G of memory is available for each query. The only way of speeding this up right now is the workaround of letting another language remove the duplicate values... not what I want, actually. - Mccaleb
I invite you to send this to the BaseX mailing list and add further information on your use case, the specific query, etc. - Nora
Thanks, I already joined the list. As for the problem mentioned above, I found out that there is a huge difference between case-sensitive and case-insensitive grouping: the latter is slow. So my solution was to use the default collation instead of my previously used "html-ascii-case-insensitive". - Mccaleb
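
If case-insensitive results are still wanted while staying on the default collation, one possible sketch is to normalize the values before deduplicating. The sample lemmas below are made up, and whether this is actually faster in BaseX is untested. Also note that lower-case() folds more than ASCII letters, so it is not an exact replacement for the html-ascii-case-insensitive collation:

```xquery
(: Hypothetical stand-in for the lemma text nodes :)
let $lemmas := ('City', 'city', 'CITY', 'Town')
return distinct-values(
  (: fold case first, then compare with the default collation :)
  for $l in $lemmas
  return lower-case($l)
)
```

This yields the two values city and town; the order in which distinct-values returns them is implementation-dependent.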
L
4

Use group by (available since XQuery 3.0). Your query returns city multiple times because each iteration of the for loop binds $attr to a single attribute. So distinct-values is applied to one item at a time, once per iteration, and never sees the duplicates together.

declare namespace xsd="http://www.w3.org/2001/XMLSchema"; 
for $schema in xsd:schema
for $nodes in $schema//*,
    $attr in $nodes/xsd:element/@name
where fn:contains($attr,'city')
group by $attr
return $attr
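
As a side note, group by can also report how often each value occurs, which distinct-values cannot. A minimal sketch with made-up strings (not the asker's data), assuming an XQuery 3.0 processor:

```xquery
(: Hypothetical sample values standing in for the @name attributes :)
for $name in ('city', 'city', 'town')
(: $key holds the grouping value; $name becomes the sequence of group members :)
group by $key := $name
return concat($key, ': ', count($name))
```

This produces "city: 2" and "town: 1"; the order of the groups is implementation-dependent unless an order by clause is added.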
Lauraine answered 1/6, 2013 at 15:8 Comment(0)
B
0

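This works:

declare namespace xsd="http://www.w3.org/2001/XMLSchema";
(: the outer distinct-values removes the duplicates; the inner call sees one item per iteration and is redundant :)
distinct-values(for $schema in xsd:schema
for $nodes in $schema//*,
    $attr in $nodes/xsd:element/@name
where fn:contains($attr,'city')
return distinct-values($attr))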
Biedermeier answered 1/6, 2013 at 15:14 Comment(0)
