How to add a waiting time with playwright
I am integrating Scrapy with Playwright but am having difficulty adding a delay after a click. As a result, when I take a screenshot of the page after the click, it still shows the log-in page.

How can I add a wait so that the page has a few seconds to load before the screenshot is taken?

Note: the selector .onetrust-close-btn-handler.onetrust-close-btn-ui.banner-close-button.onetrust-lg.ot-close-icon was shortened to .onetrust-close-btn-handler in the code below.

import scrapy
from scrapy_playwright.page import PageCoroutine

class DoorSpider(scrapy.Spider):
    name = 'door'
    start_urls = ['https://nextdoor.co.uk/login/']

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                meta=dict(
                    playwright=True,
                    playwright_include_page=True,
                    playwright_page_coroutines=[
                        PageCoroutine("click", selector=".onetrust-close-btn-handler"),
                        PageCoroutine("fill", "#id_email", 'my_email'),
                        PageCoroutine("fill", "#id_password", 'my_password'),
                        PageCoroutine('waitForNavigation'),
                        PageCoroutine("click", selector="#signin_button"),
                        PageCoroutine("screenshot", path="cookies.png", full_page=True),
                    ]
                )
            )

    def parse(self, response):
        yield {
            'data': response.body
        }

Kachine answered 19/2, 2022 at 20:58 Comment(0)
G
6

There are many waiting methods you can use, depending on your particular use case. Below is a sample, but you can read more in the docs:

  1. wait_for_event(event, **kwargs)
  2. wait_for_selector(selector, **kwargs)
  3. wait_for_load_state(**kwargs)
  4. wait_for_url(url, **kwargs)
  5. wait_for_timeout(timeout)

For your question: if you need to wait until the page loads, you can use one of the methods below, inserted at the appropriate place in your list:

...
PageMethod("wait_for_load_state", "load"),
...

or

...
PageMethod("wait_for_load_state", "domcontentloaded"),
...

You can try any of the other wait methods if the two above don't work, or you can use an explicit timeout value like 3 seconds (this is not recommended, as it is brittle and not optimal when web scraping):

...
PageMethod("wait_for_timeout", 3000),
...

Pass these methods inside meta, under the playwright_page_methods list, like this:

from scrapy_playwright.page import PageMethod
...

def start_requests(self):
    for url in self.start_urls:
        yield scrapy.Request(url, meta=dict(
            ...,
            playwright_page_methods=[
                PageMethod("wait_for_load_state", "domcontentloaded"),
                ...
            ]
        ))
Gabbert answered 21/2, 2022 at 2:38 Comment(1)
PageCoroutine is deprecated; replace it with PageMethod. – Ambulant
