Get the on-screen location of web page with Selenium WebDriver

11

11

Is there a way to get the on-screen coordinates of the HTML window (page body) with Selenium WebDriver?

Nolasco answered 24/12, 2013 at 13:4 Comment(4)
al0 - are you saying that you want to get the window size and the size of the Google logo/image? - Jerricajerrie
No. I need to know the position of the HTML content render pane on the physical screen. - Nolasco
What are you trying to do, just out of curiosity? Maybe it will help me come up with a different solution to your problem. - Jerricajerrie
I'm trying to take a screenshot of a Flash object on a web page. WebDriver's GetScreenshot() method is unable to do that because of this bug: code.google.com/p/selenium/issues/detail?id=5705. So I'm trying to grab a screenshot of the whole screen with the Graphics.CopyFromScreen method and cut out the image I need. This means that I need the coordinates relative to the upper-left corner of the screen. - Nolasco
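
For the capture-the-whole-screen-and-crop approach described in the last comment, the crop step might look roughly like the sketch below. It is Python rather than C#, and `crop_box` is a hypothetical helper, not part of Selenium. It assumes the element's rectangle is already known in physical screen pixels (which is exactly what the question asks for) and only clamps that rectangle to the screen so the crop never reads outside the captured image.

```python
def crop_box(element_rect, screen_size):
    """Clamp an element rectangle to the screen and return a crop box.

    element_rect: (x, y, width, height) in screen pixels (assumed known).
    screen_size:  (width, height) of the captured screen image.
    Returns (left, top, right, bottom).
    """
    x, y, width, height = element_rect
    screen_w, screen_h = screen_size

    left = max(0, x)
    top = max(0, y)
    right = min(screen_w, x + width)
    bottom = min(screen_h, y + height)

    # Nothing left to crop if the element lies completely outside the screen.
    if right <= left or bottom <= top:
        raise ValueError("element is entirely off-screen")
    return left, top, right, bottom
```

The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` expects, in case the screenshot is handled with Pillow.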
4

I've seen this a few times and haven't found an elegant solution from WebDriver yet (they have a parameter in their ILocatable settings that looks like it should support it, but the method is not implemented yet).

What I do is use UIAutomation to get the window's AutomationElement and use a tree walker to find the actual object of the window. The downside is that browsers occasionally change what their window is, so the conditions have to change every once in a while to accommodate.

Here is some example code (I removed some company code here, so it's more elegant on my end, but this should work for C#):

    public static Rectangle GetAbsCoordinates(this IWebElement element)
    {
        var driver = GetDriver(element);
        var handle = GetIntPtrHandle(driver);
        var ae = AutomationElement.FromHandle(handle);
        AutomationElement doc = null;
        var caps = ((RemoteWebDriver) driver).Capabilities;
        var browserName = caps.BrowserName;
        switch (browserName)
        {
            case "safari":
                var conditions = (new AndCondition(new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.Pane),
                    new PropertyCondition(AutomationElement.ClassNameProperty, "SearchableWebView")));
                doc = ae.FindFirst(TreeScope.Descendants, conditions);
                break;
            case "firefox":
                doc = ae.FindFirst(TreeScope.Descendants, new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.Document));
                break;
            case "chrome":
                doc = ae.FindFirst(TreeScope.Descendants, new PropertyCondition(AutomationElement.NameProperty, "Chrome Legacy Window"));
                if (doc == null)
                {
                    doc = ae.FindFirst(TreeScope.Descendants, new PropertyCondition(AutomationElement.NameProperty, "Google Chrome"));
                    if (doc == null)
                        throw new Exception("unable to find element containing browser window");
                    doc = doc.FindFirst(TreeScope.Descendants, new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.Document));
                }
                break;
            case "internet explorer":
                doc = ae.FindFirst(TreeScope.Descendants, new AndCondition(new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.Pane),
                    new PropertyCondition(AutomationElement.ClassNameProperty, "TabWindowClass")));
                break;
        }

        if (doc == null)
            throw new Exception("unable to find element containing browser window");

        var iWinLeft = (int) doc.Current.BoundingRectangle.Left;
        var iWinTop = (int)doc.Current.BoundingRectangle.Top;

        var coords = ((ILocatable) element).Coordinates;
        var rect = new Rectangle(iWinLeft + coords.LocationInDom.X, iWinTop + coords.LocationInDom.Y, element.Size.Width, element.Size.Height);
        return rect;
    }

    public static IWebDriver GetDriver(this IWebElement e)
    {
        return ((IWrapsDriver)e).WrappedDriver;
    }

    public static IntPtr GetIntPtrHandle(this IWebDriver driver, int timeoutSeconds = Timeout)
    {
        var end = DateTime.Now.AddSeconds(timeoutSeconds);
        while(DateTime.Now < end)
        {
            // Searching by AutomationElement is a bit faster (can filter by children only)
            var ele = AutomationElement.RootElement;
            foreach (AutomationElement child in ele.FindAll(TreeScope.Children, Condition.TrueCondition))
            {
                if (!child.Current.Name.Contains(driver.Title)) continue;
                return new IntPtr(child.Current.NativeWindowHandle);
            }
        }
        return IntPtr.Zero;
    }
Anuran answered 9/4, 2014 at 23:26 Comment(2)
This seems to work correctly for Firefox and Internet Explorer. In Chrome it doesn't raise any exceptions, but doc.Current.BoundingRectangle appears to be empty. I've never worked with UIAutomation, so I'm unable to fix this by myself. Can you help? - Nolasco
There is a big problem with your code: it will return wrong coordinates if the element is inside a frame or iframe. You would have to get the parent frame and its coordinates with Driver.SwitchTo().ParentFrame(), but this will not work because the element itself then becomes invalid and you get a StaleElementReferenceException. Selenium really sucks. It is misdesigned from the beginning. I see clearly that the folks writing Selenium are no experts in automation. The only way to do this correctly is inside the driver, but it is not implemented there, although many people requested it in 2012. - Toon
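
The frame problem raised in the last comment comes down to adding up offsets. Below is a minimal sketch in Python, assuming the location of each enclosing frame element was recorded, outermost first, before switching into it (so no stale element has to be queried afterwards). `to_screen_rect` and its parameters are hypothetical names for illustration, and the sketch ignores frame borders, padding and scrolling inside frames.

```python
def to_screen_rect(viewport_origin, frame_offsets, element_rect):
    """Translate an element rectangle inside nested frames to screen coordinates.

    viewport_origin: (x, y) of the page's render area on screen, e.g. the
                     BoundingRectangle found via UIAutomation (assumed given).
    frame_offsets:   [(x, y), ...] location of each enclosing frame within
                     its parent document, outermost first (assumed recorded
                     before switching into the frame).
    element_rect:    (x, y, width, height) relative to the innermost frame.
    """
    origin_x, origin_y = viewport_origin
    # Each frame shifts the coordinate system of everything inside it.
    for frame_x, frame_y in frame_offsets:
        origin_x += frame_x
        origin_y += frame_y

    x, y, width, height = element_rect
    return origin_x + x, origin_y + y, width, height
```

With an empty `frame_offsets` list this reduces to the plain "window offset plus element location" arithmetic used in the answer above.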
2

The code posted by Zechtitus is amazing. I tried it under IE11 and Chrome version 39.0.2171.95 m and it worked like a charm, although I had to pass the real IWebDriver object instead of using WrappedDriver, because that doesn't work with Chrome. For reference, I'm on Windows 7 Ultimate x64 with Selenium WebDriver 2.44. This is the code I took from Zechtitus and modified:

    public static Rectangle GetAbsCoordinates(IWebDriver driver, IWebElement element)
    {
        var handle = GetIntPtrHandle(driver);
        var ae = AutomationElement.FromHandle(handle);
        AutomationElement doc = null;
        var caps = ((RemoteWebDriver)driver).Capabilities;
        var browserName = caps.BrowserName;
        switch (browserName)
        {
            case "safari":
                var conditions = (new AndCondition(new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.Pane),
                    new PropertyCondition(AutomationElement.ClassNameProperty, "SearchableWebView")));
                doc = ae.FindFirst(TreeScope.Descendants, conditions);
                break;
            case "firefox":
                doc = ae.FindFirst(TreeScope.Descendants, new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.Document));
                break;
            case "chrome":
                doc = ae.FindFirst(TreeScope.Descendants, new PropertyCondition(AutomationElement.NameProperty, "Chrome Legacy Window"));
                if (doc == null)
                {
                    doc = ae.FindFirst(TreeScope.Descendants, new PropertyCondition(AutomationElement.NameProperty, "Google Chrome"));
                    if (doc == null)
                        throw new Exception("unable to find element containing browser window");
                    doc = doc.FindFirst(TreeScope.Descendants, new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.Document));
                }
                break;
            case "internet explorer":
                doc = ae.FindFirst(TreeScope.Descendants, new AndCondition(new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.Pane),
                    new PropertyCondition(AutomationElement.ClassNameProperty, "TabWindowClass")));
                break;
        }

        if (doc == null)
            throw new Exception("unable to find element containing browser window");

        var iWinLeft = (int)doc.Current.BoundingRectangle.Left;
        var iWinTop = (int)doc.Current.BoundingRectangle.Top;

        var coords = ((ILocatable)element).Coordinates;
        var rect = new Rectangle(iWinLeft + coords.LocationInDom.X, iWinTop + coords.LocationInDom.Y, element.Size.Width, element.Size.Height);
        return rect;
    }

    public static IntPtr GetIntPtrHandle(this IWebDriver driver, int timeoutSeconds = 20)
    {
        var end = DateTime.Now.AddSeconds(timeoutSeconds);
        while (DateTime.Now < end)
        {
            // Searching by AutomationElement is a bit faster (can filter by children only)
            var ele = AutomationElement.RootElement;
            foreach (AutomationElement child in ele.FindAll(TreeScope.Children, Condition.TrueCondition))
            {
                if (!child.Current.Name.Contains(driver.Title)) continue;
                return new IntPtr(child.Current.NativeWindowHandle);
            }
        }
        return IntPtr.Zero;
    }

I used it like this:

Rectangle recView = GetAbsCoordinates(MyWebDriverObj, myIWebElementObj);

The correct X and Y are then stored in recView.X and recView.Y. As I said, it works for me in both IE11 and Chrome. Good luck.

Coastguardsman answered 28/1, 2015 at 16:8 Comment(0)
1

Hmm, I cannot comment directly to the user asking about Chrome, so I will have to add another answer here.

For UIAutomation you will want to get your hands on a tool called Inspect (it comes free in the Windows 8.1 SDK). Older tools like UISpy would probably work as well.

Fire up Chrome and then fire up the inspector tool. You're going to look at the tree-like structure and navigate down to the document that contains the DOM. Turn on highlighting in the tool to make this easier.

Chrome seems to be quite dynamic in the layout of its control tree - I have had to modify the code a few times to accommodate the control I am looking at. If you're using a different version than I had, find the document window in the tree and take a look at all of the control patterns associated with it - this is what I am passing into the PropertyCondition to search for the control. IntelliSense should bring up the different things you can query for, like AutomationElement.NameProperty. In the example I gave, I noticed there is a difference between running Chrome on a WinXP machine and on a Win8 machine, hence the checking for null.

Like I have said before, this is not elegant, and it would be awesome if it were built into Selenium (I imagine they have much better methods for determining the coords of the DOM area). I think this will also be problematic for people moving to Selenium Grid (like I am looking at doing) - I don't know whether you can shuttle a bunch of supporting DLLs over to the remote machine, at least without a lot of hacks.

If it still doesn't work for you, give me the specific OS and Chrome version and I'll try to take a look and give an exact property match. It's probably best if you fiddle with it yourself, though, as these things are unfortunately not static.

Galla answered 17/4, 2014 at 19:41 Comment(0)
1

Yes, it's possible with a little trick. Below is my code to get the on-screen top position of a web element.

public static long getScrollYPosition() {
    WebDriver driver = DriverFactory.getCurrentDriver();

    JavascriptExecutor jse = (JavascriptExecutor) driver;
    // window.scrollY may come back as a Long or a Double, so go through Number
    Number scrollYPos = (Number) jse.executeScript("return window.scrollY;");

    return scrollYPos.longValue();
}

long scrollPosition = getScrollYPosition();
long elemYPositionOnScreen = (long) elem.getLocation().getY() - scrollPosition;
Tegucigalpa answered 18/4, 2017 at 10:48 Comment(2)
window.scrollY gives 0 when scrolled to the top, regardless of where the window is - Medardas
@Medardas you are right, it gives 0. Can anybody provide an answer? - Steinke
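
As the comments point out, subtracting the scroll offset only yields a position relative to the viewport, not the screen. A small Python sketch of the missing step follows, assuming the viewport's on-screen origin has been obtained some other way (for example with one of the UIAutomation answers). `page_to_screen` is a hypothetical helper name, not a Selenium API.

```python
def page_to_screen(page_xy, scroll_xy, viewport_origin):
    """Convert a document (page) position to a screen position.

    page_xy:         element location in the document, e.g. getLocation().
    scroll_xy:       (window.scrollX, window.scrollY).
    viewport_origin: top-left of the viewport on screen (assumed known).
    """
    page_x, page_y = page_xy
    scroll_x, scroll_y = scroll_xy
    origin_x, origin_y = viewport_origin

    # Page -> viewport: remove what has been scrolled out of view.
    viewport_x = page_x - scroll_x
    viewport_y = page_y - scroll_y

    # Viewport -> screen: add where the viewport sits on the screen.
    return origin_x + viewport_x, origin_y + viewport_y
```

With no scrolling and the viewport at the screen origin, the result equals the page position, which is the only case the answer's code handles on its own.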
0

You can try it this way:

   WebDriver driver=new FirefoxDriver();
   driver.get("http://www.google.com");
   JavascriptExecutor js=(JavascriptExecutor) driver;
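   // The script result may be a Long or a Double depending on the value, so go through Number
   Number i = (Number) js.executeScript("var element = document.getElementById('hplogo');var position = element.getBoundingClientRect();return position.left");
   System.out.print(i.doubleValue());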
Inesita answered 24/12, 2013 at 17:3 Comment(1)
This doesn't help, unfortunately. JavaScript knows nothing about browser window position. Thanks anyway. - Nolasco
0

I took a quick look at chrome and you may have better luck with the following.

doc = win.Find.ByConditions(new PropertyCondition(AutomationElement.ClassNameProperty, "Chrome_RenderWidgetHostHWND"));

I think that class name is consistent for Chrome. It seems to work on older and newer OS versions for me, with Chrome version 34.0.1847.116m. Hope that helps.

Anuran answered 17/4, 2014 at 19:57 Comment(0)
0

This should work once it's supported:

  WebElement htmlElement = driver.findElement(By.tagName("html"));
  Point viewPortLocation = ((Locatable) htmlElement).getCoordinates().onScreen();
  int x = viewPortLocation.getX();
  int y = viewPortLocation.getY();

However right now it's throwing the following error:

java.lang.UnsupportedOperationException: Not supported yet.
at org.openqa.selenium.remote.RemoteWebElement$1.onScreen(RemoteWebElement.java:342)

(on org.seleniumhq.selenium:selenium-java:2.46.0)

Speedy answered 11/8, 2015 at 3:8 Comment(0)
0

I needed this in Robot Framework and was inspired by Jeyabal's solution, so here is an adaptation that works for me:

${verticalWindow}=     Execute Javascript          return window.scrollY;
${verticalElement} =   Get Vertical Position       /xpath
${hasScrolled} =       Evaluate                    (${verticalElement} - ${verticalWindow}) == 0
Covet answered 24/8, 2018 at 13:54 Comment(0)
0

Nothing from the above worked for me. A workaround is to use window.innerHeight and window.innerWidth and work your way up from the bottom-left corner. This assumes that the browser's bottom border is almost 0 (no horizontal scrollbar or thick window decoration).

win_pos = selenium.get_window_position()
win_size = selenium.get_window_size()
win_bottom_y = win_pos['y'] + win_size['height']

# We assume viewport x == window x. For y coordinate we take the bottom
# of the browser and subtract the viewport height 
viewport_height = selenium.execute_script('return window.innerHeight')
viewport_width = selenium.execute_script('return window.innerWidth')
viewport_y = win_bottom_y - viewport_height

This is not 100% accurate but it's a good workaround that can be tweaked for your case.

Medardas answered 2/10, 2020 at 10:9 Comment(0)
0

SOOO many factors have to be considered to get the element position relative to the screen. For the longest time I was using the UIAutomation code above, but UIAutomation is unreliable: it crashes or fails to find the browser (for some reason), and with EdgeDriver tabs crash consistently, so we now need a 'fallback' for the values we used to get via UIAutomation.

That said, when it works, its answer for the on-screen coordinates of the HTML page is GOLDEN. However, something that always works is using JavaScript. So we calculate that first, then also attempt to call UIAutomation. If UIAutomation fails, we use the JavaScript result; if it works, we use the UIAutomation values.

    // Use JavaScript to get our HTML document location. It's off by 2 pixels compared to UIAutomation,
    // but UIAutomation failed so often that it was unreliable.
    int outerHeight = Int32.Parse(BrowserHelper.ExecuteJavascript(browser, "return window.outerHeight"));
    int innerHeight = Int32.Parse(BrowserHelper.ExecuteJavascript(browser, "return window.innerHeight"));
    int outerWidth = Int32.Parse(BrowserHelper.ExecuteJavascript(browser, "return window.outerWidth"));
    int innerWidth = Int32.Parse(BrowserHelper.ExecuteJavascript(browser, "return window.innerWidth"));

    int browserNavHeight = outerHeight - innerHeight;
    int browserNavWidth = outerWidth - innerWidth;

    iWinLeft = browserNavWidth + 2;
    iWinTop = browserNavHeight + 2;
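
The outer/inner differences are sizes of the browser's own chrome, so to reach screen coordinates the window's position presumably still has to be added. A hedged Python sketch of one way to do that is shown below. It assumes `window.screenX` and `window.screenY` give the outer window's top-left corner on screen, that the left and right borders are equal, that the bottom border is as thick as a side border, and that page zoom is 100%. `viewport_origin` is a hypothetical helper, and the 2-pixel correction from the answer is left out.

```python
def viewport_origin(screen_x, screen_y, outer_w, outer_h, inner_w, inner_h):
    """Estimate the viewport's top-left corner in screen coordinates.

    All arguments are values read from JavaScript: window.screenX/screenY,
    window.outerWidth/outerHeight and window.innerWidth/innerHeight.
    """
    # Assumption: the extra width is split evenly between left and right.
    side_border = (outer_w - inner_w) // 2

    # Assumption: the bottom border matches the side border, so whatever
    # remains of the extra height (title bar, tabs, toolbar) sits on top.
    top_chrome = (outer_h - inner_h) - side_border

    return screen_x + side_border, screen_y + top_chrome
```

A docked developer-tools panel, a visible download bar or a non-default zoom level would break these assumptions, so the result is only an estimate.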


Molarity answered 29/3, 2022 at 14:59 Comment(0)
-1

Try this; I hope it helps:

Rectangle rec = new Rectangle(element.getLocation(), element.getSize());
Investment answered 4/7, 2016 at 7:7 Comment(0)
