XPath: Match whole word (using matches function with case insensitive flag)
R

5

7

Using XPath, I would like to offer a "Match whole word" option to the user, just like the one in VS search.

The contains and matches functions seem to work similarly, though matches also accepts flags such as i for case insensitivity.

In other words, given the XML below, I get the same results from these two XPath queries:

<pets>
    <dog name="Rupert" color="grey"/>
    <dog name="Ralph" color="brown"/>
    <cat name="Marvin the Cat" color="white"/>
    <cat name="Garfield the Cat" color="orange"/>
    <cat name="Cat" color="grey"/>
    <cat name="Fluffy" color="black"/>
</pets>

Matches XPath: //cat[descendant-or-self::*[@*[matches(.,'Cat')]]]
    returns:
    <cat name="Marvin the Cat" color="white"/>
    <cat name="Garfield the Cat" color="orange"/>
    <cat name="Cat" color="grey"/>


Contains XPath: //cat[descendant-or-self::*[@*[contains(.,'Cat')]]]
    returns:
    <cat name="Marvin the Cat" color="white"/>
    <cat name="Garfield the Cat" color="orange"/>
    <cat name="Cat" color="grey"/>

But I would like to use matches to return results that match "Cat" whole word only:

<cat name="Cat" color="grey"/>

How can I adjust the matches query so it matches whole word?

EDIT: I forgot to mention that I need to still use the matches function because I need the case insensitivity flag.

Rhoea answered 1/5, 2012 at 20:9 Comment(0)
S
6

What about using ^ and $ characters as anchors?

//cat[descendant-or-self::*[@*[matches(.,'^Cat$')]]]

From RegEx Syntax in XQuery 1.0 and XPath 2.0:

Two meta-characters, ^ and $ are added. By default, the meta-character ^ matches the start of the entire string, while $ matches the end of the entire string.
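To illustrate why the anchors change the result, here is a minimal sketch using Python's re module as a stand-in for the XPath 2.0 matches function. This is only an analogy: the two regex flavors differ in places, and the helper names unanchored and anchored are hypothetical. For simple patterns like this one, ^ and $ behave the same way in both.

```python
import re

# Attribute values taken from the sample XML in the question.
names = ["Marvin the Cat", "Garfield the Cat", "Cat", "Fluffy"]


def unanchored(values, pattern):
    """Keep values where the pattern occurs anywhere, like matches(., 'Cat')."""
    return [v for v in values if re.search(pattern, v)]


def anchored(values, pattern, ignore_case=False):
    """Keep values that equal the pattern as a whole, like matches(., '^Cat$').

    ignore_case=True plays the role of the 'i' flag in XPath.
    """
    flags = re.IGNORECASE if ignore_case else 0
    return [v for v in values if re.search("^" + pattern + "$", v, flags)]


print(unanchored(names, "Cat"))                  # three values contain "Cat"
print(anchored(names, "Cat"))                    # only the value that is exactly "Cat"
print(anchored(names, "cat", ignore_case=True))  # same value, matched case-insensitively
```

Without anchors the pattern may match anywhere in the value, which is why matches and contains returned the same three elements in the question.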

Shelving answered 1/5, 2012 at 21:3 Comment(4)
Hmm.. this gives me the result I want. But could you explain the ^$ anchors? I've never used them before. - Rhoea
Added a link to the answer; see the "Two meta-characters..." section. - Wergild
Thanks, I will still need to do some testing, but this seems to do the trick! - Rhoea
^ and $ match start/end of the line, not start/end of substrings with word boundary. - Slifka
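The distinction raised in the last comment can be sketched in Python (again only an analogy for XPath, with hypothetical helper names). Anchoring tests the whole value, while a word-level test splits the value into words first. The value "Catalog" is not in the original XML; it is added here as an assumption to show a non-word match.

```python
import re

# Values from the question, plus "Catalog" (hypothetical, added for illustration).
names = ["Marvin the Cat", "Garfield the Cat", "Cat", "Catalog", "Fluffy"]


def is_whole_value(value, word):
    """True when the entire value equals the word, ignoring case."""
    return re.fullmatch(re.escape(word), value, re.IGNORECASE) is not None


def contains_word(value, word):
    """True when the word appears as a complete word somewhere in the value."""
    tokens = [t for t in re.split(r"\W+", value) if t]
    return word.lower() in (t.lower() for t in tokens)


whole = [n for n in names if is_whole_value(n, "cat")]
words = [n for n in names if contains_word(n, "cat")]
print(whole)
print(words)
```

The word-level test also accepts "Marvin the Cat" and "Garfield the Cat", so for the result requested in the question (only name="Cat") the anchored whole-value test is the one that fits.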
B
4

There are three functions/operators of relevance here.

matches() does a regular expression match; you can use it to match a substring or to match the entire string by use of anchors (^cat$), and you can set the 'i' flag to make it case-blind.

contains() does an exact match of a substring; you can use the third argument (collation) to request a case-blind match, but the way in which collations are specified depends on the processor you are using.

The eq operator does an exact match of the entire string; the "default collation" (which in the case of XPath will typically be set using the processor's API) can be used to request case-blind matching. This seems to be the one that is closest to your requirement; the only drawback is that specifying the collation is more system-dependent than using the "i" flag with matches().
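As a rough illustration of how the three options differ, here is a Python analogy (not XPath itself). The function names are hypothetical, and str.casefold() is assumed here as a stand-in for a case-blind collation, which a real XPath processor would configure in its own way.

```python
import re

# Attribute values taken from the sample XML in the question.
names = ["Marvin the Cat", "Garfield the Cat", "Cat", "Fluffy"]


def contains_ci(value, needle):
    """Substring test, like contains() with a case-blind collation."""
    return needle.casefold() in value.casefold()


def equals_ci(value, other):
    """Whole-string test, like eq with a case-blind default collation."""
    return value.casefold() == other.casefold()


def matches_ci(value, pattern):
    """Regex test, like matches() with the 'i' flag."""
    return re.search(pattern, value, re.IGNORECASE) is not None


by_contains = [n for n in names if contains_ci(n, "cat")]
by_equals = [n for n in names if equals_ci(n, "cat")]
by_anchored_regex = [n for n in names if matches_ci(n, "^cat$")]
print(by_contains, by_equals, by_anchored_regex)
```

The whole-string comparison and the anchored regex select the same single value here, while the substring test selects every value that includes "cat" in any case.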

Birdseed answered 1/5, 2012 at 22:48 Comment(0)
R
2

Would this work for you?

//cat[@*='Cat']
Rogovy answered 1/5, 2012 at 20:23 Comment(1)
Not quite what I'm looking for. I still want to use the matches function because I need case insensitivity... (see edit above). - Rhoea
S
2

But I would like to use matches to return results that match "Cat" whole word only:

<cat name="Cat" color="grey"/>

There are different XPath expressions that select the wanted element:

Use:

/*/cat[matches(@name, '^cat$', 'i')]

Or use:

/*/cat[lower-case(@name) eq 'cat']

XSLT-based verification:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
   "/*/cat[matches(@name, '^cat$', 'i')]"/>
======
  <xsl:copy-of select=
   "/*/cat[lower-case(@name) eq 'cat']"/>

 </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document:

<pets>
    <dog name="Rupert" color="grey"/>
    <dog name="Ralph" color="brown"/>
    <cat name="Marvin the Cat" color="white"/>
    <cat name="Garfield the Cat" color="orange"/>
    <cat name="Cat" color="grey"/>
    <cat name="Fluffy" color="black"/>
</pets>

this transformation evaluates the two XPath expressions and copies the selected elements to the output:

  <cat name="Cat" color="grey"/>
======
  <cat name="Cat" color="grey"/>
Slapjack answered 2/5, 2012 at 1:47 Comment(0)
M
1

This:

//cat[@*='Cat']

results in:

<cat name="Cat" color="grey"/>

I verified using Xacobeo.

Mide answered 1/5, 2012 at 20:24 Comment(2)
Not quite what I'm looking for. I still want to use the matches function because I need case insensitivity... (see edit above). - Rhoea
@Rhoea Try this: //cat[@*[translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = 'cat']] This assumes the string you want to match is always passed in lowercase. - Mide
