How to simulate a button click in a request?

Please do not close this question; it is not a duplicate. I need to click the button using Python Requests, not Selenium as in the linked question.

I am trying to scrape the Reverso Context translation examples page, and I have a problem: I can get only 20 examples, and then I need to click the "Display more examples" button repeatedly, for as long as it exists on the page, to get the full list of results. This is easy to do in a web browser, but how can I do it with the Python Requests library?

I looked at the button's HTML code, but I couldn't find an onclick attribute that would point me to the JavaScript attached to it, and I don't understand what request I need to send:

<button id="load-more-examples" class="button load-more " data-default-size="14px">Display more examples</button>

And here is my Python code:

from bs4 import BeautifulSoup
import requests
import re


with requests.Session() as session:  # Create a Session
    # Log in
    login_url = 'https://account.reverso.net/login/context.reverso.net/it?utm_source=contextweb&utm_medium=usertopmenu&utm_campaign=login'
    session.post(login_url, "[email protected]&Password=sample",
           headers={"User-Agent": "Mozilla/5.0", "content-type": "application/x-www-form-urlencoded"})

    # Get the HTML
    html_text = session.get("https://context.reverso.net/translation/russian-english/cat", headers={"User-Agent": "Mozilla/5.0"}).content

    # And scrape it
    for word_pair in BeautifulSoup(html_text, "html.parser").find_all("div", id=re.compile("^OPENSUBTITLES")):
        print(word_pair.find("div", class_="src ltr").text.strip(), "=", word_pair.find("div", class_="trg ltr").text.strip())

Note: you need to log in, otherwise the page shows only the first 10 examples and does not show the button. You may use these real credentials:
E-mail: [email protected]
Password: sample

Vulgus answered 22/2, 2020 at 14:25 Comment(17)
Does this answer your question? invoking onclick event with beautifulsoup python - Evalyn
Thank you very much for your efforts, but unfortunately this doesn't answer my question. - Vulgus
I use requests, and they use Selenium WebDriver. - Vulgus
You can't do that with requests: https://mcmap.net/q/964665/-quot-clicking-quot-button-with-requests - Evalyn
"It can simply be done using a web browser, but how can I do it with Python Requests library?" You can't; Requests does not execute JavaScript or anything like that. - Gadgeteer
Does this answer your question? "Clicking" button with requests - Gadgeteer
How does the first question which @evgenifotia shared not answer the question, by the way? - Gadgeteer
@evgenifotia, thank you very much for your answer! :) I am having a very good day today and managed to do it with requests. I have answered this question below. - Vulgus
@AMC, thank you very much for your answer! :) I agree with you that it can't execute JS, but in some cases (like this one, for example) it is possible to explore the browser's behavior in more detail: capture the requests it sends and try to do the same thing using Python requests (I have described how to do this in more detail in the answer below). - Vulgus
"in some cases (like this one, for example), it is possible to explore the browser's behavior in more detail: get the requests it sends and try to do the same thing using Python requests" Of course, but I wouldn't call that simulating a button press. - Gadgeteer
@AMC, as for the link to the question you gave, it was very useful to me: it gave me the idea of exploring the browser's behavior. Thank you very much! :) - Vulgus
@AMC, I think you are right) - Vulgus
@AMC, as for the first question which evgenifotia gave, it is very useful, but not in my case, because my app shouldn't have any dependencies (except Python site-packages which can be installed using pip). And Selenium, unlike requests, is browser-dependent and requires the chrome-driver to be installed manually by the user. - Vulgus
@DemianWolf You're welcome, I'm glad you got lucky and were able to find a way to make the requests directly! - Gadgeteer
@DemianWolf Do you not count something as a dependency if it can be installed with pip, or am I misunderstanding something? - Gadgeteer
@AMC, I was very glad to talk to you :) - Vulgus
@AMC, no, I do count it as a dependency. That is why I wrote "(except Python...)". - Vulgus

Here is a solution that gets all the example sentences using requests and removes all the HTML tags from them using BeautifulSoup:

from bs4 import BeautifulSoup
import requests
import json


# Headers copied from the browser request.
# Content-Length is left out: requests computes it from the actual body.
headers = {
    "Connection": "keep-alive",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "X-Requested-With": "XMLHttpRequest",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
    "Content-Type": "application/json; charset=UTF-8",
    "Origin": "https://context.reverso.net",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-Mode": "cors",
    "Referer": "https://context.reverso.net/%D0%BF%D0%B5%D1%80%D0%B5%D0%B2%D0%BE%D0%B4/%D0%B0%D0%BD%D0%B3%D0%BB%D0%B8%D0%B9%D1%81%D0%BA%D0%B8%D0%B9-%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9/cat",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7",
}

data = {
    "source_text": "cat",
    "target_text": "",
    "source_lang": "en",
    "target_lang": "ru",
    "npage": 1,
    "mode": 0
}

url = "https://context.reverso.net/bst-query-service"

# The reply to the first request reports how many result pages exist
npages = requests.post(url, headers=headers, data=json.dumps(data)).json()["npages"]
for npage in range(1, npages + 1):
    data["npage"] = npage
    page = requests.post(url, headers=headers, data=json.dumps(data)).json()["list"]
    for word in page:
        # s_text and t_text contain HTML markup; BeautifulSoup strips the tags
        print(BeautifulSoup(word["s_text"], "html.parser").text, "=", BeautifulSoup(word["t_text"], "html.parser").text)
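
The same paging logic can also be written as a small generator, which may be easier to reuse and to test without touching the network. This is only a sketch: `iter_items` and `make_fetcher` are hypothetical helper names, not part of requests or of any Reverso library, and the code assumes the reply format shown above (a JSON object with "npages" and "list"). The session argument is anything with a `post` method that behaves like `requests.Session.post`.

```python
import json
from typing import Callable, Dict, Iterator

URL = "https://context.reverso.net/bst-query-service"


def iter_items(fetch_page: Callable[[int], Dict]) -> Iterator[Dict]:
    """Yield every entry of "list" across all result pages.

    fetch_page(npage) must return the decoded JSON reply for that page.
    Only the first reply is assumed to carry the "npages" key.
    """
    first = fetch_page(1)
    yield from first["list"]
    for npage in range(2, first["npages"] + 1):
        yield from fetch_page(npage)["list"]


def make_fetcher(session, headers: Dict, source_text: str,
                 source_lang: str = "en", target_lang: str = "ru"):
    """Build a fetch_page callable around a requests.Session-like object."""
    def fetch_page(npage: int) -> Dict:
        payload = {
            "source_text": source_text,
            "target_text": "",
            "source_lang": source_lang,
            "target_lang": target_lang,
            "npage": npage,
            "mode": 0,
        }
        # The body is sent as a JSON string, as in the code above
        return session.post(URL, headers=headers, data=json.dumps(payload)).json()
    return fetch_page
```

With requests installed, something like `for item in iter_items(make_fetcher(requests.Session(), headers, "cat")): ...` would walk all pages while requesting page 1 only once; stripping the HTML tags from `item["s_text"]` and `item["t_text"]` stays the same as above.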

First, I captured the request in Google Chrome DevTools:

  1. Pressed the F12 key to open DevTools and selected the Network tab
  2. Clicked the "Display more examples" button
  3. Found the last request ("bst-query-service")
  4. Right-clicked it and selected Copy > Copy as cURL (cmd)
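
A side note on step 4: `^` is the escape character of the Windows command prompt, so a command copied in the "cmd" variant can contain sequences such as `^%^D0` where the real value is `%D0`. If such carets survive into the Python code (for example inside a URL), a sketch like the following can strip them. `strip_cmd_carets` is a hypothetical helper written for this illustration, not a function of any library.

```python
import re
from urllib.parse import unquote


def strip_cmd_carets(text: str) -> str:
    """Remove cmd.exe escape carets: "^x" becomes "x" and "^^" becomes "^"."""
    return re.sub(r"\^(.)", r"\1", text)


escaped = "^%^D0^%^BF^%^D0^%^B5^%^D1^%^80^%^D0^%^B5^%^D0^%^B2^%^D0^%^BE^%^D0^%^B4"
clean = strip_cmd_carets(escaped)
print(clean)           # the percent-encoded path segment without carets
print(unquote(clean))  # the decoded text of that segment
```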

Then I opened an online tool that converts cURL commands to Python requests code, pasted the copied cURL command into the textbox on the left, and copied the output on the right (use the Ctrl-C hotkey for this, otherwise it may not work).

After that, I pasted it into the IDE and:

  1. Removed the cookies dict, since it is not necessary here
  2. Important: rewrote the data string as a Python dictionary and wrapped it with json.dumps(data); otherwise the response came back with an empty word list.
  3. Added code that reads the number of result pages ("npages") and a for loop that fetches each page and prints the words without HTML tags (using BeautifulSoup)
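
Why step 2 matters can be seen without sending anything: requests form-encodes a dict passed as `data`, while a string passed as `data` is sent as-is. The sketch below uses `requests.Request(...).prepare()` to build the requests and inspect their bodies offline. The `json=` variant at the end is shown only for comparison; whether this endpoint accepts it was not checked here.

```python
import json

import requests

URL = "https://context.reverso.net/bst-query-service"
data = {
    "source_text": "cat",
    "target_text": "",
    "source_lang": "en",
    "target_lang": "ru",
    "npage": 1,
    "mode": 0,
}

# prepare() builds the request without sending it, so nothing touches the network.
as_form = requests.Request("POST", URL, data=data).prepare()
as_json_string = requests.Request("POST", URL, data=json.dumps(data)).prepare()
via_json_kwarg = requests.Request("POST", URL, json=data).prepare()

print(as_form.body)            # form-encoded key=value pairs, not JSON
print(as_json_string.body)     # JSON text, as used in the answer
print(via_json_kwarg.headers["Content-Type"])  # set by requests for json=
```

In the form-encoded case the body no longer matches the "application/json" Content-Type header copied from the browser, which is a plausible reason for the empty result described in step 2.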

UPD:
For those who came to this question to learn how to work with Reverso Context (rather than how to simulate a button-click request on some other website), there is now a Python wrapper for the Reverso API: Reverso-API. It does the same thing as the code above, but much more simply:

from reverso_api.context import ReversoContextAPI


api = ReversoContextAPI("cat", "", "en", "ru")
for source, target in api.get_examples_pair_by_pair():
    print(source.text, "==", target.text)

Vulgus answered 22/2, 2020 at 19:50 Comment(0)
