I am trying to extract the link inside an href attribute, but all I am getting is the text inside the element.
The website code is the following:
<div class="item-info-container ">
<a href="/imovel/32600863/" role="heading" aria-level="2" class="item-link xh-highlight"
title="Apartamento T3 na avenida da Liberdade, São José de São Lázaro e São João do Souto, Braga">
Apartamento T3 na avenida da Liberdade, São José de São Lázaro e São João do Souto, Braga
</a>
And the code I am using is:
element_handle = page.locator('//div[@class="item-info-container "]//a').all_inner_texts()
Whether or not I specify //a[@href], my output is always the title text:
Apartamento T3 na avenida da Liberdade, São José de São Lázaro e São João do Souto, Braga
When what I really want to achieve is:
/imovel/32600863/
Any ideas of where my logic is failing me?
You are locating the <a> element. Once you have that element, you need to use get_attribute to fetch its href attribute. Playwright was not designed for web scraping. Why are you using it? There are several packages that were designed specifically for scraping. – Intolerance
locator.wait_for(state='visible') and locator.scroll_into_view_if_needed(). What should be used instead of Playwright for scraping dynamic content? – Disapprobation
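A minimal sketch of the get_attribute suggestion from the comments, assuming Playwright's sync API for Python and a version recent enough to provide Locator.all() (1.29 or later, as far as I know). The helper names extract_hrefs and demo are hypothetical, and the inline HTML is just the markup from the question. all_inner_texts() only ever returns text content, so the attribute has to be read explicitly:

```python
from typing import List, Optional

# XPath taken from the question; note the trailing space inside the class value.
ITEM_LINK_XPATH = '//div[@class="item-info-container "]//a'


def extract_hrefs(page, xpath: str = ITEM_LINK_XPATH) -> List[Optional[str]]:
    """Return the href attribute of every element matched by xpath."""
    # Locator.all() yields one Locator per matched element.
    links = page.locator(xpath).all()
    # get_attribute() reads the attribute value as written in the markup,
    # so a relative link stays relative.
    return [link.get_attribute("href") for link in links]


def demo() -> None:
    """Hypothetical demo: load the markup from the question and print the hrefs."""
    # Imported here so extract_hrefs can be used with any page-like object.
    from playwright.sync_api import sync_playwright

    html = """
    <div class="item-info-container ">
      <a href="/imovel/32600863/" role="heading" aria-level="2" class="item-link"
         title="Apartamento T3 na avenida da Liberdade, Braga">
        Apartamento T3 na avenida da Liberdade, Braga
      </a>
    </div>
    """
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_content(html)
        print(extract_hrefs(page))
        browser.close()
```

Calling demo() needs Playwright and a Chromium build installed. If only the first match is wanted, page.locator(xpath).first.get_attribute("href") should work as well.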