The accepted answers that rely on Selenium's driver.find_elements_by_*() methods no longer work in current Selenium 4 releases: those methods were deprecated in 4.0 and removed in 4.3. The current approach is to call find_elements() with the By class.
Method 1: For loop
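The code below builds two lists, one located with By.XPATH and the other with By.TAG_NAME. Only one of them is needed; both are shown for comparison. By.XPATH is IMO the easiest, because //a[@href] matches only anchors that have an href attribute. By.TAG_NAME matches every <a> element, so get_attribute("href") returns None for anchors without an href, and those values have to be filtered out. The code also removes duplicates.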
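from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumption: any WebDriver (Firefox, Edge, ...) works the same way
driver.get("https://www.amazon.com/")

href_links = []
href_links2 = []

# //a[@href] matches only anchors that have an href attribute
elems = driver.find_elements(by=By.XPATH, value="//a[@href]")
# The tag-name locator matches every <a>, including those without an href
elems2 = driver.find_elements(by=By.TAG_NAME, value="a")

for elem in elems:
    link = elem.get_attribute("href")
    if link not in href_links:
        href_links.append(link)

for elem in elems2:
    link = elem.get_attribute("href")
    if link is not None and link not in href_links2:
        href_links2.append(link)

print(len(href_links))  # 360
print(len(href_links2))  # 360
print(href_links == href_links2)  # True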
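As a possible alternative to the `not in` checks in the loops, duplicates and None values can be dropped in one pass with dict.fromkeys, which keeps first-seen order. This is only a sketch: unique_hrefs is a hypothetical helper name, and the sample list stands in for the values that get_attribute("href") would return.

```python
from typing import Iterable, List, Optional


def unique_hrefs(hrefs: Iterable[Optional[str]]) -> List[str]:
    """Drop None values and duplicates while keeping first-seen order."""
    # dict preserves insertion order (Python 3.7+), so its keys act as an ordered set.
    return list(dict.fromkeys(h for h in hrefs if h is not None))


# Stand-in data; with Selenium the input would be
# (e.get_attribute("href") for e in elems2).
sample = ["https://a.example/", None, "https://b.example/", "https://a.example/"]
print(unique_hrefs(sample))  # ['https://a.example/', 'https://b.example/']
```

Membership tests on a list scan the whole list each time, so for pages with many links the dict-based version should scale better, though for a few hundred links either approach is fine.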
Method 2: List Comprehension
If duplicates are acceptable, a one-line list comprehension can be used.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumption: any WebDriver works the same way
driver.get("https://www.amazon.com/")

elems = driver.find_elements(by=By.XPATH, value="//a[@href]")
href_links = [e.get_attribute("href") for e in elems]

elems2 = driver.find_elements(by=By.TAG_NAME, value="a")
# href_links2 = [e.get_attribute("href") for e in elems2]  # Does not remove None values
href_links2 = [e.get_attribute("href") for e in elems2 if e.get_attribute("href") is not None]

print(len(href_links))  # 387
print(len(href_links2))  # 387
print(href_links == href_links2)  # True
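The By.TAG_NAME comprehension calls get_attribute("href") twice per element, and each call goes back to the browser. On Python 3.8+ an assignment expression can avoid the second call. The sketch below uses FakeElement, a hypothetical stand-in for a WebElement, so that it runs without a browser; with Selenium, elems2 would come from find_elements() as above.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, attrs):
        self._attrs = attrs

    def get_attribute(self, name):
        # Like WebElement.get_attribute, return None when the attribute is missing.
        return self._attrs.get(name)


elems2 = [
    FakeElement({"href": "https://a.example/"}),
    FakeElement({}),  # an <a> without href
    FakeElement({"href": "https://a.example/"}),
]

# The walrus operator binds the attribute once and reuses it as the list item.
href_links2 = [href for e in elems2 if (href := e.get_attribute("href")) is not None]
print(href_links2)  # ['https://a.example/', 'https://a.example/']
```

Duplicates are kept, as in the original comprehension; only the None values are filtered out.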