I have an HTML file on my desktop. (In Chrome, I right-clicked on the web page, chose "Save as" and then "Webpage, HTML".) How can I read this local file into R? Once it is in R, I'm going to need to write some regular expressions to parse the strings and extract certain values.
Read local HTML file into R
Check out this post about parsing with RegEx! – Thebes
Use readLines as follows:
rawHTML <- paste(readLines("path/to/file.html"), collapse="\n")
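Since the question mentions extracting values with regular expressions afterwards, here is a minimal sketch of that follow-up step using base R only. The temporary file, the `price` class, and the pattern are all made up for illustration; they stand in for whatever the saved page actually contains.

```r
# Write a small stand-in HTML file so the sketch is self-contained;
# in practice, point `path` at the saved page instead.
path <- tempfile(fileext = ".html")
writeLines(c(
  "<html><body>",
  "<span class=\"price\">19.99</span>",
  "<span class=\"price\">5.00</span>",
  "</body></html>"
), path)

# Read all lines and join them into a single string.
rawHTML <- paste(readLines(path, warn = FALSE), collapse = "\n")

# Hypothetical pattern: match <span class="price">...</span> and
# capture the text between the tags.
pattern <- "<span class=\"price\">([^<]*)</span>"

# gregexpr() finds every match; regmatches() returns the matched substrings.
matches <- regmatches(rawHTML, gregexpr(pattern, rawHTML))[[1]]

# Keep only the captured group and convert it to numeric.
prices <- as.numeric(sub(pattern, "\\1", matches))
prices
```

Collapsing to one string lets a pattern span line breaks; if each value sits on its own line, applying the regex to the vector returned by readLines directly may be simpler.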
Another possibility is htmltools's includeHTML():
rawHTML <- includeHTML('path/to/file.html')
class(rawHTML)
[1] "html" "character"
Today, a better (and faster) approach is to use xml2::read_html, which is installed as part of the tidyverse and can read HTML content from either a local file or a URL.
library(xml2)
rawHTML <- read_html(x = "path/to/file.html")
Because this function accepts either a local file or a URL, it offers input flexibility for automation built on the rvest package for HTML extraction.
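Once the file is parsed, values can often be pulled out with XPath queries instead of regular expressions. The sketch below uses only xml2 functions (read_html, xml_find_all, xml_find_first, xml_text); the sample markup and the `price` class are assumptions made up for the example.

```r
library(xml2)

# Write a small stand-in HTML file so the sketch is self-contained;
# in practice, point `path` at the saved page instead.
path <- tempfile(fileext = ".html")
writeLines(c(
  "<html><body>",
  "<h1>Report</h1>",
  "<span class=\"price\">19.99</span>",
  "<span class=\"price\">5.00</span>",
  "</body></html>"
), path)

# Parse the local file into an xml_document.
doc <- read_html(path)

# Select every <span class="price"> node and take its text content.
price_nodes <- xml_find_all(doc, "//span[@class='price']")
prices <- as.numeric(xml_text(price_nodes))

# Select the first <h1> node and take its text content.
title <- xml_text(xml_find_first(doc, "//h1"))

prices
title
```

Querying the parsed tree tends to be more robust than regexes when attribute order or whitespace in the markup varies, though a regex can still be applied to the extracted text if needed.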