R Tesseract error on Windows 10: Wrong JPEG library version: library is 90, caller expects 80

Asked 1/7, 2022 at 7:0 Answered 10/3, 2024 at 10:43

I am trying to use Tesseract in R to extract data from an image, but I get an error. This is the code I am using, followed by the error:

library(tesseract)

eng <- tesseract("eng")
text <- tesseract::ocr("https://cdn.who.int/media/images/default-source/emergencies/disease-outbreak-news/table19f24bf8a-7733-400f-abaa-150c481f876a.jpg", engine = eng)

Wrong JPEG library version: library is 90, caller expects 80
Error in pixReadStreamJpeg: internal jpeg error
Error in pixReadStream: jpeg: no pix returned
Error in pixRead: pix not read
Error in FUN(X[[i]], ...) : Failed to read image

I have tried looking up the problem, but almost everything on Stack Overflow refers to Linux, and I need this to work on Windows 10. Any help appreciated!

Oralee asked 1/7, 2022 at 7:0

I don't know why this works, but reading the image with magick::image_read first and passing the resulting object to tesseract::ocr (instead of passing the URL directly) fixed the error for me.
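A minimal sketch of that workaround, assuming the tesseract and magick packages are installed and the English language data is available. ocr_via_magick is a hypothetical helper name, not part of either package. As far as I can tell, when ocr() receives a magick image it writes it to a temporary PNG before running Tesseract, so Leptonica's JPEG reader (where the "library is 90, caller expects 80" mismatch is reported) is never called; treat that as a likely explanation rather than a confirmed one.

```r
library(tesseract)
library(magick)

# Hypothetical helper (not part of tesseract or magick):
# decode the image with ImageMagick first, then hand the resulting
# magick-image object to tesseract::ocr() instead of a path or URL.
ocr_via_magick <- function(src, engine = tesseract("eng")) {
  img <- image_read(src)     # src may be a local file path or a URL
  ocr(img, engine = engine)  # one character string per image frame
}
```

For the question's image, pass the WHO URL as src, e.g. text <- ocr_via_magick(url, engine = eng), where url holds the address from the original code.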

Yumuk answered 16/7, 2022 at 15:58

Rohan Mehta's solution still works!

To make this clear for other novices, this is an example that works:

library(tesseract)
eng <- tesseract("eng")
image_path <- "C:/..." # your path
img <- magick::image_read(image_path) # decode with ImageMagick first
text <- ocr(img, engine = eng)

Whereas this did not work:

image_path <- "C:/..." # your path
text <- ocr(image_path, engine = eng) # passing the path fails with the JPEG version error
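If you would rather keep passing a file path to ocr(), a possible variation is to re-encode the JPEG as a temporary PNG first. This is a sketch under the assumption that only the JPEG decoding path is affected (the error comes from pixReadStreamJpeg); jpeg_to_png_path is a hypothetical helper name, and I have not confirmed this on every Windows setup.

```r
library(tesseract)
library(magick)

# Hypothetical helper: read an image (e.g. a JPEG) with ImageMagick and
# write it back out as a temporary PNG, returning the new file path.
# Tesseract/Leptonica then reads PNG data and never touches libjpeg.
jpeg_to_png_path <- function(path) {
  png_path <- tempfile(fileext = ".png")
  image_write(image_read(path), png_path, format = "png")
  png_path
}
```

Usage: text <- ocr(jpeg_to_png_path(image_path), engine = eng). The temporary PNG is removed when the R session ends, or earlier with unlink().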

Pseudonymous answered 10/3, 2024 at 10:43
