I am trying to scrape a page on a website that requires a login, and I am consistently getting a 403 error.
I have adapted the code from these two posts for my site: "Using rvest or httr to log in to non-standard forms on a webpage" and "How to reuse a session to avoid repeated login when scraping with rvest?"
library(rvest)
pgsession <- html_session("https://www.optionslam.com/earnings/stocks/MSFT?page=-1")
pgform <- html_form(pgsession)[[1]]
filled_form <- set_values(pgform, 'username'='user', 'password'='pass')
s <- submit_form(pgsession, filled_form) # s is your logged in session
When the code is run, I get this message:
Submitting with 'NULL'
Warning message:
In request_POST(session, url = url, body = request$values, encode = request$encode, :
Forbidden (HTTP 403).
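In case it helps with diagnosis, here is a minimal sketch of how the forms on a page can be listed, to check whether `[[1]]` is really the login form and whether it carries hidden fields that must be posted back. The markup is a made-up stand-in, not the site's actual HTML, and the `/accounts/login/` action and `csrfmiddlewaretoken` field name are assumptions for illustration only.

```r
library(rvest)

# Hypothetical markup standing in for the real page (not the site's actual HTML).
html <- '<html><body>
  <form action="/search" method="get">
    <input type="text" name="q">
  </form>
  <form action="/accounts/login/" method="post">
    <input type="hidden" name="csrfmiddlewaretoken" value="abc123">
    <input type="text" name="username">
    <input type="password" name="password">
    <input type="submit" value="Log in">
  </form>
</body></html>'

page <- read_html(html)
form_nodes <- html_nodes(page, "form")

# For each form, collect its action, its named inputs, and its hidden inputs.
form_info <- lapply(form_nodes, function(f) {
  inputs <- html_nodes(f, "input")
  nm <- html_attr(inputs, "name")
  type <- html_attr(inputs, "type")
  list(
    action = html_attr(f, "action"),
    fields = nm[!is.na(nm)],
    hidden = nm[!is.na(nm) & type %in% "hidden"]
  )
})

# Index of the first form that contains a password field.
has_password <- vapply(form_nodes, function(f) {
  length(html_nodes(f, "input[type='password']")) > 0
}, logical(1))
login_idx <- which(has_password)[1]

print(login_idx)
print(form_info[[login_idx]])
```

With this stand-in markup the login form is the second form on the page, so `html_form(pgsession)[[1]]` would pick the wrong one. Whether that is also true of the real page is something I have not confirmed.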
I have also tried setting a browser user agent, as R.S. suggested in the comments, but I receive the same error as above.
library(rvest)
library(httr)
uastring <- "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36"
pgsession <- html_session("https://www.optionslam.com/earnings/stocks/MSFT?page=-1", user_agent(uastring))
pgform <- html_form(pgsession)[[1]]
filled_form <- set_values(pgform, 'username'='user', 'password'='pass')
s <- submit_form(pgsession, filled_form) # s is your logged in session
If you pull the page up without logging in, it shows only part of the data table at the bottom, right below the text: "Earnings Events Available: 65"
Once logged in, the page shows all 65 events and the full table, which is what I want to download. I already have the code to do that; I am stuck only on the login part.
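To make the success criterion concrete, here is a rough sketch of how I would check whether a session is really logged in: compare the advertised event count with the number of table rows actually returned. The HTML fragment is hypothetical (placeholder cells, made-up layout), so the selectors would need adjusting for the real page.

```r
library(rvest)

# Hypothetical fragment of the page as seen without logging in (placeholder cells).
html <- '<html><body>
  <p>Earnings Events Available: 65</p>
  <table>
    <tr><th>Col1</th><th>Col2</th></tr>
    <tr><td>a</td><td>b</td></tr>
    <tr><td>c</td><td>d</td></tr>
  </table>
</body></html>'

page <- read_html(html)
txt <- html_text(page)

# Pull the advertised count out of the "Earnings Events Available: N" text.
m <- regmatches(txt, regexpr("Earnings Events Available:[[:space:]]*[0-9]+", txt))
available <- as.integer(gsub("[^0-9]", "", m))

# Count the data rows in the first table (the header row is not counted).
shown <- nrow(html_table(html_node(page, "table")))

# The login is assumed to have worked only if every advertised event is present.
complete <- shown >= available
print(c(available = available, shown = shown))
print(complete)
```

Running the same check on the page returned by `jump_to(s, url)` after a successful `submit_form()` should, I assume, give `complete` as `TRUE`.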
Thank you for your help.
submit_form(pgsession, pgform) should be submit_form(pgsession, filled_form)
– Charleton