The following script takes me to a web page containing several links with similar names. I want only one of them, which can be differentiated from the others because it is printed in bold on the page. However, I could not find a way to select a bold link from the list.
Would anyone have a hint on this? Thanks in advance!
library(httr)
library(rvest)
sp="Alnus japonica"
res <- httr::POST(url = "http://apps.kew.org/wcsp/advsearch.do",
                  body = list(page = "advancedSearch",
                              AttachmentExist = "",
                              family = "",
                              placeOfPub = "",
                              genus = unlist(strsplit(as.character(sp), split = " "))[1],
                              yearPublished = "",
                              species = unlist(strsplit(as.character(sp), split = " "))[2],
                              author = "",
                              infraRank = "",
                              infraEpithet = "",
                              selectedLevel = "cont"),
                  encode = "form")
pg <- content(res, as="parsed")
lnks <- html_attr(html_nodes(pg,"a"),"href")
# How can I get the URL of the link with the accepted name (shown in bold)?
res2 <- try(GET(sprintf("http://apps.kew.org%s", lnks[grep("id=", lnks)][1])), silent = TRUE)
# This gets a link, but often not the bold one.
[...] <b> tag, but it doesn't seem to show up in the httr results, so it must be inserted after the fact somehow. – Thorley
[...] <b> tags, so you should be able to get them that way. Like alistaire said, not sure why httr is deleting them (I've no experience with httr, there may be an option...) – Liquor
libxml2 (which powers rvest & XML) is not as flexible as a browser. <b> outside a <p> is technically invalid HTML/XML and libxml2 parses it that way. – Tunnell
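If the <b> tags do survive parsing, one possible approach is an XPath query that keeps only anchors with a <b> ancestor or descendant. The sketch below is not run against the Kew site: the markup, the /demo?id= hrefs, and the names html, pg_demo, bold_links and bold_hrefs are all made up for illustration, and whether the real page keeps its <b> tags after libxml2 parses it is exactly the open question raised in the comments above.

```r
library(rvest)

# Hypothetical markup standing in for the results page: three similar links,
# one of them wrapped in <b>. The hrefs are invented for this example.
html <- '<html><body><p>
  <a href="/demo?id=1">Alnus japonica var. a</a><br/>
  <b><a href="/demo?id=2">Alnus japonica</a></b><br/>
  <a href="/demo?id=3">Alnus japonica var. b</a>
</p></body></html>'
pg_demo <- read_html(html)

# Keep anchors that sit inside a <b>, or that contain a <b> themselves.
bold_links <- html_nodes(pg_demo, xpath = "//a[ancestor::b or descendant::b]")
bold_hrefs <- html_attr(bold_links, "href")
print(bold_hrefs)
```

On the real page, the same XPath could be applied to pg in place of pg_demo. If it returns nothing, that would suggest the <b> tags are lost or moved during parsing, and checking the raw text from content(res, as = "text") for "<b>" might help confirm that.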