Read local HTML file into R

I have an HTML file on my desktop. (In Chrome, I right-clicked the web page, chose "Save as..." and then "Webpage, HTML".) How can I read this local file into R? Once it is in R, I'll need to write some regular expressions to parse the strings and extract certain values.

Effortless answered 29/9, 2014 at 20:33
Comment: Check out this post about parsing with RegEx! (Thebes)

Use readLines() as follows:

 rawHTML <- paste(readLines("path/to/file.html"), collapse="\n")
Brucie answered 29/9, 2014 at 20:38
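The question's next step is pulling values out with regular expressions. A minimal base-R sketch of that step, assuming a small made-up page: the file contents, the "price" markup and all variable names are hypothetical, and a temporary file stands in for the saved page so the example runs anywhere.

```r
# Hypothetical sample page; in practice, point readLines() at your saved file.
html_file <- tempfile(fileext = ".html")
writeLines(c(
  "<html><head><title>Example page</title></head>",
  "<body>",
  "<p class=\"price\">Price: $19.99</p>",
  "<p class=\"price\">Price: $5.00</p>",
  "</body></html>"
), html_file)

# Read every line and glue them into one string, as in the answer above.
rawHTML <- paste(readLines(html_file), collapse = "\n")

# regexpr() finds the first match: grab the <title> element, then drop the tags.
title_tag <- regmatches(rawHTML, regexpr("<title>[^<]*</title>", rawHTML))
page_title <- gsub("</?title>", "", title_tag)

# gregexpr() finds all matches: every dollar amount with two decimals.
prices <- regmatches(rawHTML, gregexpr("\\$[0-9]+\\.[0-9]{2}", rawHTML))[[1]]
prices_num <- as.numeric(sub("\\$", "", prices))

print(page_title)
print(prices_num)
```

After running it, page_title is "Example page" and prices_num holds 19.99 and 5. Patterns like these are fine for small, predictable markup; for nested or inconsistent HTML, a real parser is usually more robust.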

Another possibility is includeHTML() from the htmltools package:

library(htmltools)
rawHTML <- includeHTML('path/to/file.html')

class(rawHTML)
[1] "html"      "character"
Swetiana answered 9/1, 2023 at 12:57
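One practical difference: the value returned by includeHTML() carries the "html" class shown above, and base R's sub() and gsub() keep the attributes of their input on their results. A sketch that strips the attributes before applying regular expressions, assuming a small hypothetical HTML fragment written to a temporary file (the fragment and variable names are made up for illustration):

```r
library(htmltools)

# Hypothetical stand-in for the saved page, so the sketch is self-contained.
html_file <- tempfile(fileext = ".html")
writeLines(c("<h1>Quarterly report</h1>", "<p>Revenue grew.</p>"), html_file)

# includeHTML() reads the whole file into one string tagged with class "html".
rawHTML <- includeHTML(html_file)

# Copy it and remove every attribute (including the class) to get a plain string.
rawText <- rawHTML
attributes(rawText) <- NULL

# Base R regex functions then return plain character results.
h1_tag <- regmatches(rawText, regexpr("<h1>[^<]*</h1>", rawText))
heading <- gsub("</?h1>", "", h1_tag)
print(heading)
```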

Today, a better (and faster) approach is xml2::read_html(), which is installed as part of the tidyverse and can read HTML content from either a local file or a URL:

library(xml2)
rawHTML <- read_html(x = "path/to/file.html")

Since the same call handles a local path or a URL, the input can be swapped without changing the rest of the code, which is convenient for automation built on the rvest package for HTML extraction.

Binucleate answered 28/3, 2022 at 23:20
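Note that read_html() returns a parsed document object rather than a character string, so the regex plan from the question changes shape. A sketch, assuming a hypothetical page saved to a temporary file, that queries the parsed tree with XPath through xml2's xml_find_first(), xml_find_all(), xml_text() and xml_attr(), and uses as.character() to turn the document back into text if regular expressions are still wanted (the sample markup and variable names are illustrative only):

```r
library(xml2)

# Hypothetical sample page standing in for the saved file.
html_file <- tempfile(fileext = ".html")
writeLines(c(
  "<html><head><title>Example page</title></head><body>",
  "<a href=\"https://example.com/a\">First</a>",
  "<a href=\"https://example.com/b\">Second</a>",
  "</body></html>"
), html_file)

# Parse the file into a document object (not a string).
doc <- read_html(html_file)

# Query the parsed tree with XPath instead of regular expressions.
page_title <- xml_text(xml_find_first(doc, "//title"))
link_urls  <- xml_attr(xml_find_all(doc, "//a"), "href")

# If regexes are still needed, serialise the document back to one string.
rawHTML <- as.character(doc)

print(page_title)
print(link_urls)
```

Querying the tree avoids the usual regex pitfalls with nested tags and attribute order, while as.character() keeps the string-based route open.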

© 2022 - 2024 — McMap. All rights reserved.