Converting Images to Black and White for Image Recognition in R

Asked 31/1, 2018 at 22:34 Answered 16/12, 2023 at 18:37

Solved r image-processing tesseract contrast text-recognition

I'm trying to gain some experience with automatic text recognition, and I'm using the tesseract package to perform OCR on some images (screenshots I took).

To improve recognition of the prices in the image below, I preprocessed the image with the magick package, increasing its contrast by adjusting the brightness and saturation parameters.

However, I think the performance could be further increased by converting to a black and white image.

How can this be efficiently achieved in R?

Original Image

After preprocessing

Declan answered 31/1, 2018 at 22:34

You can convert the colorspace with magick::image_quantize:

library(magick)
#> Linking to ImageMagick 6.9.9.25
#> Enabled features: cairo, fontconfig, freetype, fftw, lcms, pango, rsvg, webp
#> Disabled features: ghostscript, x11

i <- image_read('https://i.sstatic.net/nn9k0.png')

i

i %>% image_quantize(colorspace = 'gray')

Depending on your desired image structure, you could also use image_convert to do the same thing:

i %>% image_convert(colorspace = 'gray')
# or
i %>% image_convert(type = 'Grayscale')

or to convert to true black and white (not grayscale),

i %>% image_convert(type = 'Bilevel')

which in this case returns an image with salt and pepper noise, which may or may not be useful.
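To check whether grayscale (or bilevel) input actually helps recognition, the converted image can be passed on to tesseract. The following is a minimal sketch rather than a tuned pipeline: it uses ImageMagick's built-in logo: image as a stand-in for the screenshot so that it runs offline, and both the digit whitelist and the helper name ocr_gray are illustrative assumptions.

```r
library(magick)
library(tesseract)

# Stand-in image so the sketch runs offline; replace with your own screenshot,
# e.g. image_read('https://i.sstatic.net/nn9k0.png').
img <- image_read("logo:")

# Assumption: prices consist only of digits and separators, so restrict the
# characters tesseract may return. Adjust or drop the whitelist as needed.
price_engine <- tesseract(
  language = "eng",
  options = list(tessedit_char_whitelist = "0123456789.,")
)

# Hypothetical helper: convert to grayscale, encode as PNG, run OCR.
ocr_gray <- function(image, engine = price_engine) {
  image %>%
    image_convert(type = "Grayscale") %>%
    image_write(format = "png") %>%  # returns a raw vector when no path is given
    ocr(engine = engine)
}

text <- ocr_gray(img)
cat(text)
```

Swapping type = "Grayscale" for type = "Bilevel" in the helper gives a quick way to compare the two conversions on the same image.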

Note, however, that while this may be good practice for OCR, it would be much simpler to get this data by web scraping (e.g. with rvest), if that is permissible; presumably the same restrictions apply to capturing these images. Better still, if it contains the information you need, use the appropriate RyanAir API.

Cavil answered 31/1, 2018 at 23:22

In ImageMagick command line, you can simply threshold at some percent. I used 50% here, but adjust as desired.

convert image.png -threshold 50% result.png

In Imagick (the PHP binding), the equivalent method is Imagick::thresholdImage. See http://php.net/manual/en/imagick.thresholdimage.php. Sorry, I do not know which "Magick" package you are using.

Emanative answered 31/1, 2018 at 23:31

To convert the image to black and white using the magick library in R, perform the following steps:

First, convert the image to the gray color space.
Then, force all pixels below the threshold to black, leaving all pixels at or above the threshold unchanged.
Finally, force all pixels above the threshold to white, leaving all pixels at or below the threshold unchanged.

The result is a black and white image. Here is a code snippet:

library(magick)
library(magrittr)

img <- image_read('https://i.sstatic.net/nn9k0.png')
img %>%
    image_convert(colorspace = "Gray") %>% 
    image_threshold(type = "black", threshold = "50%") %>%
    image_threshold(type = "white", threshold = "50%")

The 50% threshold is only a starting point; adjust it to suit your image.
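One way to pick a threshold is to render the same image at several candidate values and compare the results side by side. This is a minimal sketch, assuming ImageMagick's built-in logo: image as an offline stand-in for the screenshot; the helper name to_black_white and the candidate values are illustrative and not part of magick.

```r
library(magick)

# Hypothetical helper wrapping the three steps above.
to_black_white <- function(image, threshold = "50%") {
  image %>%
    image_convert(colorspace = "Gray") %>%
    image_threshold(type = "black", threshold = threshold) %>%
    image_threshold(type = "white", threshold = threshold)
}

# Stand-in image; replace with image_read() on your own screenshot.
img <- image_read("logo:")

# Candidate thresholds to compare (arbitrary starting points).
thresholds <- c("30%", "50%", "70%")

# Binarize the image once per candidate threshold.
variants <- lapply(thresholds, function(th) to_black_white(img, th))

# Join the variants into one multi-frame image, then lay them out left to right.
comparison <- image_append(image_join(variants))
comparison
```

Printing comparison in an interactive session shows the three versions next to each other, in the same order as thresholds; image_write(comparison, "comparison.png") would save them to a file instead.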

Novokuznetsk answered 16/12, 2023 at 18:37
