Scraping a password-protected forum in R

I have a problem with logging in from my script. I found several good answers on Stack Overflow, but none of the solutions worked for me.

I am scraping a web forum for my PhD research; its URL is http://forum.axishistory.com.

The page I want to scrape is the memberlist, a page that lists the links to all member profiles. It can only be accessed when logged in. If you try to access the memberlist without logging in, the site shows you the login form.

The URL of the memberlist is this: http://forum.axishistory.com/memberlist.php.

First I tried the httr package:

library(httr)
library(rvest)  # provides read_html()
members <- GET("http://forum.axishistory.com/memberlist.php", authenticate("username", "password"))
members_html <- read_html(members)

The output is the login form.

Then I tried RCurl:

library(RCurl)
library(XML)  # provides htmlParse()
members_html <- htmlParse(getURL("http://forum.axishistory.com/memberlist.php", userpwd = "username:password"))
members_html

The output is the login form again.

Then I tried the list() approach from this topic, Scrape password-protected website in R:

handle <- handle("http://forum.axishistory.com/")
path   <- "ucp.php?mode=login"

login <- list(
  amember_login = "username",
  amember_pass = "password",
  amember_redirect_url = "http://forum.axishistory.com/memberlist.php"
)

response <- POST(handle = handle, path = path, body = login)

And again, the output is the login form.
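
The field names in a POST body have to match the input names of the form being submitted, and amember_login / amember_pass were copied from the linked topic, so they may not apply to this forum. One way to check is to list the fields of each form on the page with rvest's html_form(). The sketch below does that on a made-up HTML snippet (sample_page and its markup are assumptions for illustration, not the forum's real page):

```r
library(rvest)

# Made-up HTML standing in for a login page; the real forum's markup may differ.
sample_page <- minimal_html('
  <form id="search" action="search.php" method="get">
    <input type="text" name="keywords">
  </form>
  <form id="login" action="ucp.php?mode=login" method="post">
    <input type="text" name="username">
    <input type="password" name="password">
    <input type="submit" name="login" value="Login">
  </form>
')

# One parsed form object per <form> element, in document order
forms <- html_form(sample_page)

# Input names of each form; a POST body has to use these names
field_names <- lapply(forms, function(f) names(f$fields))
print(field_names)
```

Running the same html_form() call on the real login page would show which names to use in place of the amember_* ones.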

The next thing I am working on is RSelenium, but after all these attempts I am wondering whether I am missing something (probably something completely obvious).

I have looked at other relevant posts on here, but couldn't figure out how to apply the code to my case:

How to use R to download a zipped file from a SSL page that requires cookies

Scrape password-protected website in R

https://stackoverflow.com/questions/27485311/scrape-password-protected-https-website-in-r

Web scraping password protected website using R

Decasyllable answered 7/9, 2015 at 8:44 Comment(1)
When I click on "edited x mins ago" I can still see your data... just a hint for your next post. Have you changed your login data in the forum and wherever else you might use it? :) - Buddy

Thanks to Simon I found the answer here: Using rvest or httr to log in to non-standard forms on a webpage

library(rvest)

url       <- "http://forum.axishistory.com/memberlist.php"
pgsession <- html_session(url)

# Without a login the page shows the login form; here it is the second form
pgform <- html_form(pgsession)[[2]]

filled_form <- set_values(pgform,
                          "username" = "username",
                          "password" = "password")

# Submit the login form and keep the returned session
logged_in  <- submit_form(pgsession, filled_form)
memberlist <- jump_to(logged_in, url)

# html() has been superseded by read_html()
page <- read_html(memberlist)

# Extract the member names from the memberlist table
usernames <- html_nodes(x = page, css = "#memberlist .username")

data_usernames <- html_text(usernames, trim = TRUE)
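
Since rvest 1.0.0, html_session(), set_values(), submit_form() and jump_to() are deprecated in favour of session(), html_form_set(), session_submit() and session_jump_to(). The following is a minimal sketch of the same steps with the newer names. The helpers fetch_logged_in() and extract_usernames() are hypothetical names introduced here, and the sketch assumes the login form is still the second form on the page with fields named username and password:

```r
library(rvest)

# Hypothetical helper: log in via the form at `form_index` and return the parsed page at `url`.
# Assumes the form has fields named "username" and "password", as in the code above.
fetch_logged_in <- function(url, user, pass, form_index = 2) {
  s    <- session(url)
  form <- html_form(s)[[form_index]]
  form <- html_form_set(form, username = user, password = pass)
  s    <- session_submit(s, form)    # returns the updated session
  s    <- session_jump_to(s, url)    # revisit the memberlist while logged in
  read_html(s)
}

# Hypothetical helper: pull the member names out of a parsed page (same selector as above)
extract_usernames <- function(page) {
  html_text(html_elements(page, "#memberlist .username"), trim = TRUE)
}

# Usage (needs network access and real credentials):
# page <- fetch_logged_in("http://forum.axishistory.com/memberlist.php", "username", "password")
# data_usernames <- extract_usernames(page)
```

Splitting the login from the extraction keeps the selector logic testable on saved or sample HTML without hitting the forum.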
Decasyllable answered 8/9, 2015 at 8:59 Comment(0)
