Python Selenium - get href value

Asked 25/2, 2019 at 8:48 Answered 1/10, 2022 at 14:8

Solved python selenium xpath css-selectors webdriverwait

I am trying to copy the href value from a website, and the html code looks like this:

<p class="sc-eYdvao kvdWiq">
 <a href="https://www.iproperty.com.my/property/setia-eco-park/sale- 
 1653165/">Shah Alam Setia Eco Park, Setia Eco Park
 </a>
</p>

I've tried driver.find_elements_by_css_selector(".sc-eYdvao.kvdWiq").get_attribute("href") but it returned 'list' object has no attribute 'get_attribute'. Using driver.find_element_by_css_selector(".sc-eYdvao.kvdWiq").get_attribute("href") returned None. But i cant use xpath because the website has like 20+ href which i need to copy all. Using xpath would only copy one.

If it helps, all the 20+ href are categorised under the same class which is sc-eYdvao kvdWiq.

Ultimately i would want to copy all the 20+ href and export them out to a csv file.

Appreciate any help possible.

Isonomy answered 25/2, 2019 at 8:48 Comment(2)

Update the question with a bit more of the outerHTML – Scutcheon 25/2, 2019 at 9:9

Can you also include the url? – Gracegraceful 25/2, 2019 at 9:17

103

You want driver.find_elements if more than one element. This will return a list. For the css selector you want to ensure you are selecting for those classes that have a child href

elems = driver.find_elements_by_css_selector(".sc-eYdvao.kvdWiq [href]")
links = [elem.get_attribute('href') for elem in elems]

You might also need a wait condition for presence of all elements located by css selector.

elems = WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".sc-eYdvao.kvdWiq [href]")))

Gracegraceful answered 25/2, 2019 at 9:20 Comment(2)

There's a typo in the wait condition: it should be WebDriverWait(driver,10).until ... [with a closing bracket between '10' and '.until'). – Straw 11/1, 2021 at 11:3

Thank you worked beautifully also for my case although I am using find_elements_by_class_name – Hackneyed 28/12, 2021 at 17:14

As per the given HTML:

<p class="sc-eYdvao kvdWiq">
    <a href="https://www.iproperty.com.my/property/setia-eco-park/sale-1653165/">Shah Alam Setia Eco Park, Setia Eco Park</a>
</p>

As the href attribute is within the <a> tag ideally you need to move deeper till the <a> node. So to extract the value of the href attribute you can use either of the following Locator Strategies:

Using css_selector:

print(driver.find_element_by_css_selector("p.sc-eYdvao.kvdWiq > a").get_attribute('href'))

Using xpath:

print(driver.find_element_by_xpath("//p[@class='sc-eYdvao kvdWiq']/a").get_attribute('href'))

If you want to extract all the values of the href attribute you need to use find_elements* instead:

Using css_selector:

print([my_elem.get_attribute("href") for my_elem in driver.find_elements_by_css_selector("p.sc-eYdvao.kvdWiq > a")])

Using xpath:

print([my_elem.get_attribute("href") for my_elem in driver.find_elements_by_xpath("//p[@class='sc-eYdvao kvdWiq']/a")])

Dynamic elements

However, if you observe the values of class attributes i.e. sc-eYdvao and kvdWiq ideally those are dynamic values. So to extract the href attribute you have to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:

Using CSS_SELECTOR:

print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p.sc-eYdvao.kvdWiq > a"))).get_attribute('href'))

Using XPATH:

print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//p[@class='sc-eYdvao kvdWiq']/a"))).get_attribute('href'))

If you want to extract all the values of the href attribute you can use visibility_of_all_elements_located() instead:

Using CSS_SELECTOR:

print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "p.sc-eYdvao.kvdWiq > a")))])

Using XPATH:

print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//p[@class='sc-eYdvao kvdWiq']/a")))])

Note : You have to add the following imports :

from selenium.webdriver.support.ui import WebDriverWait     
from selenium.webdriver.common.by import By     
from selenium.webdriver.support import expected_conditions as EC

Scutcheon answered 9/7, 2020 at 18:34 Comment(0)

Get the whole element you want with driver.find_elements(By.XPATH, 'path'). To extract the href link use get_attribute('href'). Which gives,

driver.find_elements(By.XPATH, 'path').get_attribute('href')

Primal answered 1/10, 2022 at 14:8 Comment(0)

The XPATH

//p[@class='sc-eYdvao kvdWiq']/a

return the elements you are looking for.

Writing the data to CSV file is not related to the scraping challenge. Just try to look at examples and you will be able to do it.

Hornbeck answered 25/2, 2019 at 9:10 Comment(0)

To crawl any hyperlink or Href, proxycrwal API is ideal as it uses pre-built functions for fetching desired information. Just pip install the API and follow the code to get the required output. The second approach to fetch Href links using python selenium is to run the following code.

Source Code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
import time

list = ['https://www.heliosholland.com/Ampullendoos-voor-63-ampullen','https://www.heliosholland.com/lege-testdozen’]
driver = webdriver.Chrome()
wait = WebDriverWait(driver,29)

for i in list: 
  driver.get(i)
  image = wait.until(EC.visibility_of_element_located((By.XPATH,'/html/body/div[1]/div[3]/div[2]/div/div[2]/div/div/form/div[1]/div[1]/div/div/div/div[1]/div/img'))).get_attribute('src')
  print(image)

To scrape the link, use .get_attribute(‘src’).

Hydropathy answered 3/6, 2021 at 19:4 Comment(0)

-3

try something like:

elems = driver.find_elements_by_xpath("//p[contains(@class, 'sc-eYdvao') and contains(@class='kvdWiq')]/a")
for elem in elems:
   print elem.get_attribute['href']

Vd answered 25/2, 2019 at 9:1 Comment(0)

Dynamic elements

Recommended topics

Hot tags