How to resolve a Java error when extracting tables from a PDF using tabulizer in R

I'm trying to extract tables from a PDF using the tabulizer package in R. I run the following line:

table <- extract_tables('https://fm.dk/media/17137/oekonomisk-redegoerelse-august-2019_weba.pdf', pages = 20)

However, I keep getting this error:

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.lang.IllegalAccessException: class RJavaTools cannot access a member of class java.util.ArrayList$Itr (in module java.base) with modifiers "public"

I'm able to extract metadata from the PDF, so I'm pretty certain the problem is not with the installation of the tabulizer package but with Java, which I'm not very experienced with.

Zwickau answered 30/9, 2021 at 15:19 Comment(1)
Sounds like a problem with dependencies. - Sweetbread

Okay, I got this figured out, at least on my machine. Following a hint from swsoyee in a sort-of similar open issue on tabulizer's GitHub page, I backed all the way down to Java 8. On the new arm64 MBPs, this means getting Java from Azul, since Oracle doesn't (yet?) put out an arm64 build for that version.
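If you want to check which Java version R is actually picking up (before and after the downgrade), something like the following may help. It's only a sketch: `java_major_version()` and `current_java_version()` are helper names I made up for illustration, and the second one assumes rJava is installed and able to start a JVM.

```r
# Hypothetical helper: turn a java.version string into a major version number.
# Java 8 and earlier report strings like "1.8.0_312"; Java 9+ report "17.0.1".
java_major_version <- function(version_string) {
  parts <- strsplit(version_string, "[._-]")[[1]]
  nums <- suppressWarnings(as.integer(parts))
  if (length(nums) == 0 || is.na(nums[1])) return(NA_integer_)
  # Old-style "1.x" strings carry the major version in the second field.
  if (nums[1] == 1L && length(nums) >= 2 && !is.na(nums[2])) nums[2] else nums[1]
}

# Hypothetical helper: ask the JVM started by rJava for its version string.
# Assumes rJava is installed; not called automatically here.
current_java_version <- function() {
  rJava::.jinit()
  rJava::.jcall("java/lang/System", "S", "getProperty", "java.version")
}

# Usage, in a session where rJava can start a JVM:
# v <- current_java_version()
# java_major_version(v)
```

For a string such as "1.8.0_312" the helper returns 8, which is what I'd expect to see once the downgrade has taken effect.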

I'm sure there's a more elegant way, but I don't use Java otherwise, so I trashed all the other Java versions I'd installed before installing zulu-8.jdk. (I also had to trash the plugin, but ymmv.) That did the trick:

library(tabulizer)
table <- extract_tables('https://fm.dk/media/17137/oekonomisk-redegoerelse-august-2019_weba.pdf', pages = 20)
table[[1]]
#>       [,1]                                                             [,2]  
#>  [1,] "Tabel 1.1"                                                      ""    
#>  [2,] "Centrale skøn vedrørende tilrettelæggelsen af finanspolitikken" ""    
#>  [3,] "2018"                                                           "2019"
#>  [4,] "Strukturel saldo, pct. af strukturelt BNP 0,2"                  "-0,1"
#>  [5,] "Faktisk saldo, pct. af BNP 0,6"                                 "1,9" 
#>  [6,] "ØMU-gæld, pct. af BNP 34,1"                                     "33,7"
#>  [7,] "Offentlig forbrugsvækst, pct.1) 0,7"                            "0,8" 
#>  [8,] "Ét-årig finanseffekt, pct. af BNP2) -0,2"                       "-0,1"
#>  [9,] "Outputgab, pct.3) 0,1"                                          "0,8" 
#> [10,] "Beskæftigelsesgab, pct.3) 0,2"                                  "0,7" 
#>       [,3]  
#>  [1,] ""    
#>  [2,] ""    
#>  [3,] "2020"
#>  [4,] "0,0" 
#>  [5,] "0,4" 
#>  [6,] "33,5"
#>  [7,] "0,7" 
#>  [8,] "0,0" 
#>  [9,] "1,0" 
#> [10,] "0,9"

Created on 2021-12-14 by the reprex package (v2.0.1)
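As for the "more elegant way": I haven't tested this, but one option that avoids deleting other Java versions might be to point the R session at the Java 8 JDK via `JAVA_HOME` before loading tabulizer. The sketch below is an assumption-laden illustration: `use_java_home()` and `find_macos_java8()` are hypothetical helper names, `/usr/libexec/java_home` exists only on macOS, and whether rJava honours `JAVA_HOME` at load time depends on how it was built (you may still need to run `R CMD javareconf` or reinstall rJava).

```r
# Hypothetical helper: point this R session at a specific JDK via JAVA_HOME.
# Must be called before rJava/tabulizer are loaded.
use_java_home <- function(path) {
  if (!dir.exists(path)) stop("No such directory: ", path)
  old <- Sys.getenv("JAVA_HOME", unset = NA)
  Sys.setenv(JAVA_HOME = path)
  invisible(old)  # previous value, so it can be restored later
}

# Hypothetical helper: on macOS, ask /usr/libexec/java_home for a Java 8 JDK.
# Returns NA if the tool is missing or no matching JDK is found.
find_macos_java8 <- function() {
  tool <- "/usr/libexec/java_home"
  if (!file.exists(tool)) return(NA_character_)
  out <- suppressWarnings(
    system2(tool, c("-v", "1.8"), stdout = TRUE, stderr = FALSE)
  )
  if (length(out) == 0) NA_character_ else out[1]
}

# Usage, in a fresh R session:
# jdk <- find_macos_java8()
# if (!is.na(jdk)) use_java_home(jdk)
# library(tabulizer)
```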

Sorbitol answered 16/12, 2021 at 15:55 Comment(0)
