You can use tryCatch to catch errors and return a value of your choosing (try(read_html('http://tweg.com'), silent = TRUE) will work if you just want to return the error and continue). You'll need to pass tryCatch a handler function that defines what to return when an error is caught, which you can structure as you like.
library(rvest)
tryCatch(read_html('http://tweg.com'),
         error = function(e){'empty page'})    # just return "empty page"
#> [1] "empty page"
tryCatch(read_html('http://tweg.com'),
         error = function(e){list(result = 'empty page',
                                  error = e)})    # return error too
#> $result
#> [1] "empty page"
#>
#> $error
#> <Rcpp::exception in eval(substitute(expr), envir, enclos): Failed to parse text>
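To make the difference between try and tryCatch concrete without needing a network connection, here is a minimal sketch. The function risky is a hypothetical stand-in for a call like read_html that may fail; it is not part of rvest. The handler here stores only the error message via conditionMessage, which is one possible choice rather than the only one.

```r
# Hypothetical stand-in for a call that may fail (not part of rvest)
risky <- function(x) {
  if (!is.character(x)) stop("input must be a character string")
  toupper(x)
}

# try() returns an object of class "try-error" and lets the script continue
res <- try(risky(1), silent = TRUE)
inherits(res, "try-error")
#> [1] TRUE

# tryCatch() runs the handler and returns whatever the handler returns
out <- tryCatch(
  risky(1),
  error = function(e) list(result = "empty page", error = conditionMessage(e))
)
out$error
#> [1] "input must be a character string"
```

Checking inherits(res, "try-error") is the usual way to tell afterwards whether a try call failed.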
The purrr package also contains two functions, possibly and safely, that do the same thing but accept more flexible function definitions. Note that they are adverbs, and thus return a function that still must be called, which is why the URL is in parentheses after the call.
library(purrr)
possibly(read_html, 'empty page')('http://tweg.com')
#> [1] "empty page"
safely(read_html, 'empty page')('http://tweg.com')
#> $result
#> [1] "empty page"
#>
#> $error
#> <Rcpp::exception in eval(substitute(expr), envir, enclos): Failed to parse text>
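Because the adverbs return functions, it can be clearer to name the wrapped function once and then call it like any other. The sketch below uses base R's log as a stand-in for read_html, since log("a") reliably raises an error without any network access. The names maybe_log and safe_log are made up for illustration.

```r
library(purrr)

# Wrap once, then reuse; `otherwise` is the value returned on error
maybe_log <- possibly(log, otherwise = NA_real_)
safe_log  <- safely(log, otherwise = NA_real_)

maybe_log("a")    # log("a") errors, so the fallback is returned
#> [1] NA

bad  <- safe_log("a")    # list with $result (the fallback) and $error (the condition)
good <- safe_log(10)     # list with $result = log(10) and $error = NULL

inherits(bad$error, "error")
#> [1] TRUE
is.null(good$error)
#> [1] TRUE
```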
A typical usage would be to map the resulting function across a vector of URLs:
c('http://tweg.com', 'http://wikipedia.org') %>%
    map(safely(read_html, 'empty page'))
#> [[1]]
#> [[1]]$result
#> [1] "empty page"
#>
#> [[1]]$error
#> <Rcpp::exception in eval(substitute(expr), envir, enclos): Failed to parse text>
#>
#>
#> [[2]]
#> [[2]]$result
#> {xml_document}
#> <html lang="mul" dir="ltr" class="no-js">
#> [1] <head>\n <meta charset="utf-8"/>\n <title>Wikipedia</title>\n <me ...
#> [2] <body id="www-wikipedia-org">\n<h1 class="central-textlogo" style="f ...
#>
#> [[2]]$error
#> NULL
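After mapping a safely-wrapped function, you usually want to separate the successes from the failures. The following is one possible way to do that, again using log as a network-free stand-in for read_html; the variable names (inputs, results, ok, values, failed_inputs) are illustrative only.

```r
library(purrr)

safe_log <- safely(log, otherwise = NA_real_)

inputs  <- list("a", 10, 100)       # the first element will fail
results <- map(inputs, safe_log)    # each element is list(result = , error = )

# TRUE where the call succeeded, i.e. no error was captured
ok <- map_lgl(results, function(r) is.null(r$error))

# Pull out the results (the fallback NA fills in for failures)
values <- map_dbl(results, "result")

# The inputs that need to be retried or inspected
failed_inputs <- inputs[!ok]
```

With real pages the results are xml_document objects rather than numbers, so you would extract them with map(results, "result") instead of map_dbl.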