How to use regex in selenium locators
L

5

14

I'm using selenium RC and I would like, for example, to get all the links elements with attribute href that match:

http://[^/]*\d+\.com

I would like to use:

sel.get_attribute('//a[regx:match(@href, "http://[^/]*\d+\.com")]/@name')

which would return a list of the name attributes of all the links whose href matches the regex (or something like it).

thanks

Logogram answered 9/9, 2009 at 10:5 Comment(2)
So what's not working, and in what way is it not working? Can you post the HTML (or a fragment of it) that you're matching against?Aggregation
@Paul, the given example and the get_attribute() method only return a single item, not a list. The poster is asking for an equivalent that returns a list of attributes.Voyageur
I
14

The answer above is probably the right way to find ALL of the links that match a regex, but I thought it'd also be helpful to answer the other part of the question: how to use regex in XPath locators. You need to use the regex matches() function, like this:

xpath=//div[matches(@id,'che.*boxes')]

(used with a click command, this would, of course, click the div with id="checkboxes" or id="cheANYTHINGHEREboxes")

Be aware, though, that the matches() function is not supported by all native browser implementations of XPath (most conspicuously, using this in FF3 will throw an error: invalid xpath[2]).

If you have trouble with your particular browser (as I did with FF3), try using Selenium's allowNativeXpath("false") to switch over to the JavaScript XPath interpreter. It'll be slower, but it does seem to work with more XPath functions, including 'matches' and 'ends-with'. :)
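
For example, from the Python RC client this could look roughly like the sketch below (the method name allow_native_xpath is assumed from the usual camelCase-to-snake_case mapping of the RC API, and the locator is just illustrative):

# Fall back to the JavaScript XPath engine so matches() is available,
# then use the regex-based locator as usual.
sel.allow_native_xpath("false")   # equivalent of allowNativeXpath("false")
sel.click('xpath=//div[matches(@id, "che.*boxes")]')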

Isocracy answered 17/9, 2009 at 14:38 Comment(3)
How do you check your xpath? I usually use Firefox's XPath Checker add-on, but it doesn't recognize the regex in the xpath.Logogram
Using that xpath-checker add-on is a great idea! I never thought to look for one. I don't have to write too many xpath locators, though. At my job, I built a tool-independent test framework that builds locators for multiple tools, including Selenium, using our own simple syntax. I only had to learn these xpath locators well enough to write some code that could generate them. :)Isocracy
+1 for the allowNativeXpath("false") tip. Saved me a lot of head-scratching just now :)Waterscape
L
3

You can use the Selenium command getAllLinks to get an array of the IDs of the links on the page. You can then loop through them and check each href using getAttribute, which takes a locator followed by an @ and the attribute name. For example, in Java this might be:

String[] allLinks = selenium.getAllLinks();
List<String> matchingLinks = new ArrayList<String>();

for (String linkId : allLinks) {
    // attribute locators take the form "elementLocator@attributeName"
    String linkHref = selenium.getAttribute("id=" + linkId + "@href");
    if (linkHref.matches("http://[^/]*\\d+\\.com")) {
        matchingLinks.add(linkId);
    }
}
Lotuseater answered 9/9, 2009 at 11:2 Comment(5)
I don't think that's what he wanted - he wants to find an element using a regex as the locator (as part of the XPATH)Phenomenal
The question mentions getting all links that match the regex. As Selenium doesn't support this (to my knowledge), getting all links from the page and then using your client language to check the locations against a regular expression is a sensible solution.Lotuseater
I've edited my example code to do a regular expression match. I didn't do this originally because it depends on the client language in use, and wanted to keep the answer simple.Lotuseater
There's no way to use regular expressions in the locators. There's some stuff that can be done, like using the contains() function in xpath. Anyway, for regexp, I think this is the best alternative. +1Hyposensitize
Note that getAllLinks() is, I believe, only useful if the links have IDs; otherwise you end up with a list of empty string / null items "" to iterate through.Voyageur
L
2

A possible solution is to use sel.get_eval() and write a JavaScript snippet that returns a list of the links, something like the following answer: selenium: Is it possible to use the regexp in selenium locators
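
For instance, from the Python RC client a rough, untested sketch of that idea might look like the following (assuming sel is a started selenium instance; the regex is the one from the question and the comma delimiter is arbitrary):

# Run JavaScript in the browser that collects the name attribute of every
# link whose href matches the regex, then return them as one joined string.
js = r"""
var names = [];
var links = this.browserbot.getCurrentWindow().document.getElementsByTagName('a');
for (var i = 0; i < links.length; i++) {
    if (/http:\/\/[^\/]*\d+\.com/.test(links[i].href)) {
        names.push(links[i].name);
    }
}
names.join(',');
"""
result = sel.get_eval(js)
matching_names = [n for n in result.split(',') if n]   # drop empties if nothing matched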

Logogram answered 24/2, 2010 at 15:40 Comment(0)
V
0

Here are some alternate methods as well for Selenium RC. These aren't pure Selenium solutions; they combine your programming language's data structures with Selenium.

You can also get the HTML page source, then run regular expressions over the source to return a matching set of links. Use regex grouping to separate out URLs, link text/IDs, etc., and you can then pass them back to Selenium to click on or navigate to.

Another method is to get the HTML page source, or the innerHTML (via DOM locators) of a parent/root element, then convert the HTML into an XML DOM object in your programming language. You can then traverse the DOM with the desired XPath (with regular expressions or not) and obtain a node set of only the links of interest. From there, parse out the link text/ID or URL and pass it back to Selenium to click on or navigate to.

Upon request, I'm providing examples below. They're in mixed languages since the post didn't appear to be language-specific anyway; I'm just using what I had available to hack together examples. They aren't fully tested (or tested at all), but I've worked with bits of the code before in other projects, so these are proof-of-concept examples of how you'd implement the solutions I just mentioned.

//Example of element attribute processing by page source and regex (in PHP)
$pgSrc = $sel->getPageSource();
//simple hyperlink extraction via regex below, replace with better regex pattern as desired
preg_match_all("/<a.+href=\"(.+)\"/",$pgSrc,$matches,PREG_PATTERN_ORDER);
//$matches is a 2D array, $matches[0] is array of whole string matched, $matches[1] is array of what's in parenthesis
//you either get an array of all matched link URL values in parenthesis capture group or an empty array
$links = count($matches) >= 2 ? $matches[1] : array();
//now do as you wish, iterating over all link URLs
//NOTE: these are URLs only, not actual hyperlink elements

//Example of XML DOM parsing with Selenium RC (in Java)
String locator = "id=someElement";
String htmlSrcSubset = sel.getEval("this.browserbot.findElement(\""+locator+"\").innerHTML");
//using JSoup XML parser library for Java, see jsoup.org
Document doc = Jsoup.parse(htmlSrcSubset);
/* once you have this document object, can then manipulate & traverse
it as an XML/HTML node tree. I'm not going to go into details on this
as you'd need to know XML DOM traversal and XPath (not just for finding locators).
But this tutorial URL will give you some ideas:

http://jsoup.org/cookbook/extracting-data/dom-navigation

the example there seems to indicate first getting the element/node defined
by content tag within the "document" or source, then from there get all
hyperlink elements/nodes and then traverse that as a list/array, doing
whatever you want with an object oriented approach for each element in
the array. Each element is an XML node with properties. If you study it,
you'd find this approach gives you the power/access that WebDriver/Selenium 2
now gives you with WebElements but the example here is what you can do in
Selenium RC to get similar WebElement kind of capability
*/
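
If you happen to be on the Python RC client, the same DOM-parsing idea might look roughly like the sketch below (untested; it assumes the lxml library is installed and uses lxml's EXSLT regular-expression support, with the regex from the question as the pattern):

from lxml import html

# EXSLT regular-expressions namespace, supported by lxml's XPath engine
NS = {'re': 'http://exslt.org/regular-expressions'}

page_src = sel.get_html_source()   # full page HTML from Selenium RC
doc = html.fromstring(page_src)

# All <a> elements whose href matches the regex
links = doc.xpath(r'//a[re:test(@href, "http://[^/]*\d+\.com")]', namespaces=NS)

# Pull out whatever you need and feed the values back into Selenium commands
names = [a.get('name') for a in links]
hrefs = [a.get('href') for a in links]
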
Voyageur answered 9/11, 2011 at 20:57 Comment(4)
Where are they ('Here's some alternate methods...')? Please give more explicit code examples.Jarrettjarrid
Ok, I'll update the answer with actual example or link to one when I get a chance (maybe in a few days or weeks, kinda busy right now).Voyageur
Ok, I've now updated the post with code examples. Not fully tested, but I believe to be good enough as proof of concept examples.Voyageur
By the way, I don't see anyone presenting my type of solutions online, so I can't reference them. And it takes time to craft good executable demo code examples, so I'll clean it up with better examples when I have more time, but what's there now is good enough to start with as long as one has worked with XPath, regex, arrays, and objects.Voyageur
W
0

Selenium's By.Id and By.CssSelector methods do not support regex, and By.XPath only does so where XPath 2.0 is available. If you want to use regex, you can do something like this:

void MyCallingMethod(IWebDriver driver)
{
    //Search by ID:
    string attrName = "id";
    //Regex = 'a number that is 1-10 digits long'
    string attrRegex= "[0-9]{1,10}";
    SearchByAttribute(driver, attrName, attrRegex);
}
IEnumerable<IWebElement> SearchByAttribute(IWebDriver driver, string attrName, string attrRegex)
{
    List<IWebElement> elements = new List<IWebElement>();

    //Allows spaces around the equals sign. Ex: id = "55"
    string searchString = attrName + "\\s*=\\s*\"" + attrRegex + "\"";
    //Search the page source
    MatchCollection matches = Regex.Matches(driver.PageSource, searchString, RegexOptions.IgnoreCase);
    //Iterate over the matches
    foreach (Match match in matches)
    {
        //Extract the exact attribute value from this match
        Match innerMatch = Regex.Match(match.Value, attrRegex);
        string cssSelector = "[" + attrName + "='" + innerMatch.Value + "']";
        //Find the element by its exact attribute value
        elements.Add(driver.FindElement(By.CssSelector(cssSelector)));
    }

    return elements;
}

Note: this code is untested. Also, you can optimize this method by figuring out a way to eliminate the second search.

Wattle answered 29/1, 2020 at 9:40 Comment(0)
