OCR: Image to text?

Before marking this as a duplicate question, please read the whole question first.

What I am able to do at present is as follows:

  1. Take an image and crop the desired part for OCR.
  2. Process the image using Tesseract and Leptonica.
  3. When the source document is cropped into chunks, i.e. one character per image, it gives 96% accuracy.
  4. If I skip the cropping, and the document background is white with black text, it gives almost the same accuracy.

For example, if the input is a photo like this:

[image: document with black text on a white background]

What I want is to get the same accuracy for this photo:

[image: vehicle number plate]

without generating per-character blocks.

The code I use to initialize Tesseract and extract text from an image is below:

For the initialization of Tesseract

In the .h file:

tesseract::TessBaseAPI *tesseract;
uint32_t *pixels;

In the .m file:

tesseract = new tesseract::TessBaseAPI();
tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], "eng");

// Expect a single line of text (the plate) per image.
tesseract->SetPageSegMode(tesseract::PSM_SINGLE_LINE);

// Restrict recognition to the characters that can appear on a plate.
tesseract->SetVariable("tessedit_char_whitelist", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
tesseract->SetVariable("language_model_penalty_non_freq_dict_word", "1");
// Note: the variable name must not contain a trailing space, or it is ignored.
tesseract->SetVariable("language_model_penalty_non_dict_word", "1");
tesseract->SetVariable("tessedit_flip_0O", "1");
tesseract->SetVariable("tessedit_single_match", "0");
tesseract->SetVariable("textord_noise_normratio", "5");
tesseract->SetVariable("matcher_avg_noise_size", "22");
tesseract->SetVariable("image_default_resolution", "450");
tesseract->SetVariable("editor_image_text_color", "40");
tesseract->SetVariable("textord_projection_scale", "0.25");
tesseract->SetVariable("tessedit_minimal_rejection", "1");
tesseract->SetVariable("tessedit_zero_kelvin_rejection", "1");

For getting text from the image

- (void)processOcrAt:(UIImage *)image
{
    [self setTesseractImage:image];

    tesseract->Recognize(NULL);
    char* utf8Text = tesseract->GetUTF8Text();
    int conf = tesseract->MeanTextConf();

    NSArray *arr = [[NSArray alloc]initWithObjects:[NSString stringWithUTF8String:utf8Text],[NSString stringWithFormat:@"%d%@",conf,@"%"], nil];

    [self performSelectorOnMainThread:@selector(ocrProcessingFinished:)
                           withObject:arr
                        waitUntilDone:YES];
    delete[] utf8Text; // GetUTF8Text() allocates with new[], so free with delete[]
}

- (void)ocrProcessingFinished:(NSArray *)result
{
    UIAlertView *alt = [[UIAlertView alloc]initWithTitle:@"Data" message:[result objectAtIndex:0] delegate:self cancelButtonTitle:nil otherButtonTitles:@"OK", nil];
    [alt show];
}

But I don't get proper output for the number plate image: it is either null, or it gives some garbage data.

If I use the first image, i.e. a white background with black text, the output is 89 to 95% accurate.

Please help me out.

Any suggestions will be appreciated.

Update

Thanks to @jcesar for providing the link, and to @konstantin pribluda for providing valuable information and guidance.

I am able to convert images into (almost) proper black-and-white form, so recognition is better for all images :)

I still need help with proper binarization of the images. Any ideas will be appreciated.

Brout answered 6/11, 2012 at 9:18 Comment(6)
Maybe you can try to manipulate the image before trying to recognize the text, for example by changing every non-black (or close to black) pixel to white. Right now I don't have the Objective-C code for doing this, but I'm sure it can be done.Fernery
I have thought about that, but likewise I am not able to implement it.Brout
Read the links in the accepted answer: https://mcmap.net/q/901391/-change-a-color-in-a-uiimage Fernery
Thanks for your reply. Now I've got some way to do it. TY :)Brout
@jcesar thanks for your suggestion. I got the code from the link you posted and am currently trying to make my code work correctly :)Brout
@Claric PWI, which OCR library did you use? I am starting to work on the same kind of project. Your help is appreciated.Weal

Hi all, thanks for your replies. From all of them I was able to reach the following conclusion:

  1. Get a single cropped image block containing only the number plate.
  2. From that plate, find the number portion using the data obtained with the method provided here.
  3. Convert that image data to almost pure black and white, using the RGB data found through the above method.
  4. Convert the resulting data back to an image using the method provided here.

The above four steps are combined into one method, shown below:

-(void)getRGBAsFromImage:(UIImage*)image
{
    // First get the image into your data buffer
    CGImageRef imageRef = [image CGImage];
    NSUInteger width = CGImageGetWidth(imageRef);
    NSUInteger height = CGImageGetHeight(imageRef);
    // Count pixels from the CGImage dimensions, not image.size,
    // which is in points and can differ from the pixel size.
    NSUInteger count = width * height;
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    unsigned char *rawData = (unsigned char*) calloc(height * width * 4, sizeof(unsigned char));
    NSUInteger bytesPerPixel = 4;
    NSUInteger bytesPerRow = bytesPerPixel * width;
    NSUInteger bitsPerComponent = 8;
    CGContextRef context = CGBitmapContextCreate(rawData, width, height,
                                                 bitsPerComponent, bytesPerRow, colorSpace,
                                                 kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
    CGColorSpaceRelease(colorSpace);

    CGContextDrawImage(context, CGRectMake(0, 0, width, height), imageRef);
    CGContextRelease(context);

    // Now your rawData contains the image data in the RGBA8888 pixel format.
    int byteIndex = 0;
    for (int ii = 0 ; ii < count ; ++ii)
    {
        CGFloat red   = (rawData[byteIndex]     * 1.0) ;
        CGFloat green = (rawData[byteIndex + 1] * 1.0) ;
        CGFloat blue  = (rawData[byteIndex + 2] * 1.0) ;
        CGFloat alpha = (rawData[byteIndex + 3] * 1.0) ;

        NSLog(@"red %f \t green %f \t blue %f \t alpha %f rawData [%d] %d",red,green,blue,alpha,ii,rawData[ii]);
        if(red > Required_Value_of_red || green > Required_Value_of_green || blue > Required_Value_of_blue)//all values are between 0 to 255
        {
            red = 255.0;
            green = 255.0;
            blue = 255.0;
            alpha = 255.0;
            // all value set to 255 to get white background.
        }
        rawData[byteIndex] = red;
        rawData[byteIndex + 1] = green;
        rawData[byteIndex + 2] = blue;
        rawData[byteIndex + 3] = alpha;

        byteIndex += 4;
    }

    colorSpace = CGColorSpaceCreateDeviceRGB();
    CGContextRef bitmapContext = CGBitmapContextCreate(
                                                       rawData,
                                                       width,
                                                       height,
                                                       8, // bitsPerComponent
                                                       4*width, // bytesPerRow
                                                       colorSpace,
                                                       kCGImageAlphaNoneSkipLast);

    CGColorSpaceRelease(colorSpace);

    CGImageRef cgImage = CGBitmapContextCreateImage(bitmapContext);

    UIImage *img = [UIImage imageWithCGImage:cgImage];

    // use img for the OCR pass

    // release the Core Graphics objects to avoid leaking them on every call
    CGImageRelease(cgImage);
    CGContextRelease(bitmapContext);
    free(rawData);
}

Note:

The only drawbacks of this method are the time it takes and choosing the RGB threshold that decides which pixels become white and which stay black.
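
One data-driven way to choose that threshold is Otsu's method, which picks the cut-off that best separates the gray-level histogram into two classes. A minimal sketch, assuming an 8-bit grayscale buffer; the helper name otsuThreshold is made up for illustration:

    // Hypothetical helper: choose a global threshold with Otsu's method,
    // so the white/black cut-off does not need to be hand-tuned.
    static uint8_t otsuThreshold(const uint8_t *gray, size_t count)
    {
        size_t hist[256] = {0};
        for (size_t i = 0; i < count; i++) hist[gray[i]]++;

        double total = (double)count, sumAll = 0.0;
        for (int t = 0; t < 256; t++) sumAll += t * (double)hist[t];

        double sumB = 0.0, wB = 0.0, maxVar = 0.0;
        uint8_t threshold = 0;
        for (int t = 0; t < 256; t++) {
            wB += hist[t];                    // weight of the "background" class
            if (wB == 0) continue;
            double wF = total - wB;           // weight of the "foreground" class
            if (wF == 0) break;
            sumB += t * (double)hist[t];
            double mB = sumB / wB;            // background mean
            double mF = (sumAll - sumB) / wF; // foreground mean
            double between = wB * wF * (mB - mF) * (mB - mF);
            if (between > maxVar) {           // maximize between-class variance
                maxVar = between;
                threshold = (uint8_t)t;
            }
        }
        return threshold;
    }

Pixels at or below the returned value would be forced to black and the rest to white, replacing the hand-picked Required_Value_of_* constants.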

UPDATE:

    CGImageRef imageRef = [plate CGImage];
    CIContext *context = [CIContext contextWithOptions:nil]; // 1
    CIImage *ciImage = [CIImage imageWithCGImage:imageRef]; // 2
    CIFilter *filter = [CIFilter filterWithName:@"CIColorMonochrome" keysAndValues:@"inputImage", ciImage, @"inputColor", [CIColor colorWithRed:1.f green:1.f blue:1.f alpha:1.0f], @"inputIntensity", [NSNumber numberWithFloat:1.f], nil]; // 3
    CIImage *ciResult = [filter valueForKey:kCIOutputImageKey]; // 4
    CGImageRef cgImage = [context createCGImage:ciResult fromRect:[ciResult extent]]; // 5
    UIImage *img = [UIImage imageWithCGImage:cgImage];
    CGImageRelease(cgImage); // createCGImage: returns a +1 reference, so release it

Just replace the code of the above method (getRGBAsFromImage:) with this one; the result is the same, but the time taken is only 0.1 to 0.3 seconds.
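
One caveat: CIColorMonochrome yields a grayscale image, not a strictly binary one. If a harder black/white split turns out to be necessary, a possible follow-up (a sketch, not something from the post above) is to chain a CIColorControls filter with exaggerated contrast before creating the CGImage:

    // Sketch: push the monochrome output toward pure black/white
    CIFilter *contrast = [CIFilter filterWithName:@"CIColorControls"];
    [contrast setValue:ciResult forKey:kCIInputImageKey];
    [contrast setValue:@0.0f forKey:kCIInputSaturationKey]; // drop residual color
    [contrast setValue:@4.0f forKey:kCIInputContrastKey];   // steepen the tone curve
    CIImage *hardened = [contrast valueForKey:kCIOutputImageKey];
    // then create the CGImage from `hardened` instead of `ciResult`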

Brout answered 7/11, 2012 at 9:10 Comment(5)
This takes an incredibly long time, but it appears to do what I want. Is there any way to use something like this with GPUImage or something similar?Southwestwardly
Yep, that's right: for a 250 x 55 px image it takes almost 1.5 minutes but gives 99% accuracy. Do you have any suggestion about how to lower the time required? :)Brout
I don't have any suggestions for lowering it with this method. I am using a combination of image pre-processing and Tesseract to get 100% accurate results on what I'm working on. I'll give it a try with your image and see if I can get similarly good results; if it works out, I'll post it here as an answer.Southwestwardly
Sure, I am waiting for it; I will test it ASAP, and if it works for me I will accept the answer.Brout
@TheiOSDev This code converts the image into black and white, but how did you get the text from the image?Ballon

I was able to achieve near-instant results using the demo photo provided, and it generated the correct letters.

I pre-processed the image using GPUImage:

// Pre-processing for OCR
GPUImageLuminanceThresholdFilter * adaptiveThreshold = [[GPUImageLuminanceThresholdFilter alloc] init];
[adaptiveThreshold setThreshold:0.3f];
[self setProcessedImage:[adaptiveThreshold imageByFilteringImage:_image]];

and then sent the processed image to Tesseract:

- (NSArray *)processOcrAt:(UIImage *)image {
    [self setTesseractImage:image];

    _tesseract->Recognize(NULL);
    char* utf8Text = _tesseract->GetUTF8Text();
    NSString *text = [NSString stringWithUTF8String:utf8Text];
    delete[] utf8Text; // GetUTF8Text() allocates with new[], so free with delete[]

    return [self ocrProcessingFinished:text];
}

- (NSArray *)ocrProcessingFinished:(NSString *)result {
    // Strip extra characters, whitespace/newlines
    NSString * results_noNewLine = [result stringByReplacingOccurrencesOfString:@"\n" withString:@""];
    NSArray * results_noWhitespace = [results_noNewLine componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
    NSString * results_final = [results_noWhitespace componentsJoinedByString:@""];
    results_final = [results_final lowercaseString];

    // Separate out individual letters
    NSMutableArray * letters = [[NSMutableArray alloc] initWithCapacity:results_final.length];
    for (int i = 0; i < [results_final length]; i++) {
        NSString * newTile = [results_final substringWithRange:NSMakeRange(i, 1)];
        [letters addObject:newTile];
    }

    return [NSArray arrayWithArray:letters];
}

- (void)setTesseractImage:(UIImage *)image {
    // free the previous buffer and clear the pointer, so an early return
    // below cannot leave a dangling pointer for the next call to free again
    free(_pixels);
    _pixels = NULL;

    CGSize size = [image size];
    int width = size.width;
    int height = size.height;

    if (width <= 0 || height <= 0)
        return;

    // the pixels will be painted to this array
    _pixels = (uint32_t *) malloc(width * height * sizeof(uint32_t));
    // clear the pixels so any transparency is preserved
    memset(_pixels, 0, width * height * sizeof(uint32_t));

    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();

    // create a context with RGBA pixels
    CGContextRef context = CGBitmapContextCreate(_pixels, width, height, 8, width * sizeof(uint32_t), colorSpace,
                                                 kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedLast);

    // paint the bitmap to our context, which fills in the pixels array
    CGContextDrawImage(context, CGRectMake(0, 0, width, height), [image CGImage]);

    // release the Core Graphics objects; Tesseract only needs the raw pixel buffer
    CGContextRelease(context);
    CGColorSpaceRelease(colorSpace);

    _tesseract->SetImage((const unsigned char *) _pixels, width, height, sizeof(uint32_t), width * sizeof(uint32_t));
}

This left ' marks in place of the -, but those are also easy to remove. Depending on the image set that you have, you may have to fine-tune it a bit, but it should get you moving in the right direction.
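
For example, the stray marks could be stripped inside ocrProcessingFinished: with one extra line (a sketch, not part of the code above):

    // hypothetical extra cleanup before splitting into letters
    results_final = [results_final stringByReplacingOccurrencesOfString:@"'" withString:@""];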

Let me know if you have problems using it; it's from a project I'm working on, and I didn't want to strip everything out or create a project from scratch for it.

Southwestwardly answered 29/11, 2012 at 19:15 Comment(3)
Thanks for your reply, I will surely try it out. But for the moment I got it working with Apple's default image-processing framework, CoreImage.framework: using its default filters I got my image in pure black and white very easily, it takes just 0.1 to 0.3 seconds, and it provides near-perfect results for almost all types of image I have tried it on.Brout
You should update your answer to include the new method you're using so that others can benefit from it.Southwestwardly
See the update in my answer, I have put the code for it there, dude. I know that this is a two-way site.Brout

I daresay that Tesseract is overkill for your purpose. You do not need dictionary matching to improve recognition quality (you do not have such a dictionary, though you may have a means to compute a checksum on the licence number), and you have a font optimised for OCR. Best of all, you have markers (the orange and blue colour areas nearby are good ones) to find the region in the image.

In my OCR apps I use human-assisted area-of-interest retrieval (just an aiming-help overlay over the camera preview). Usually one uses something like a Haar cascade to locate interesting features such as faces. You may also calculate the centroid of the orange area, or just the bounding box of the orange pixels, simply by traversing the whole image and storing the leftmost/rightmost/topmost/bottommost pixels of a suitable colour, as sketched below.
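
A rough sketch of that traversal, assuming the same RGBA8888 buffer layout as the rawData array in the question; the channel thresholds that define "orange-ish" are illustrative, not tuned values:

    // Scan the RGBA buffer and track the extremes of every orange-ish pixel.
    int minX = (int)width, minY = (int)height, maxX = -1, maxY = -1;
    for (int y = 0; y < (int)height; y++) {
        for (int x = 0; x < (int)width; x++) {
            const unsigned char *p = rawData + (y * (int)width + x) * 4;
            // illustrative "orange" test: strong red, medium green, low blue
            BOOL orangeish = (p[0] > 180 && p[1] > 80 && p[1] < 180 && p[2] < 100);
            if (orangeish) {
                if (x < minX) minX = x;
                if (x > maxX) maxX = x;
                if (y < minY) minY = y;
                if (y > maxY) maxY = y;
            }
        }
    }
    // If maxX >= 0, the rect (minX, minY) .. (maxX, maxY) bounds the marker
    // and can be used to crop the plate region before OCR.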

As for recognition itself, I would recommend using invariant moments (I am not sure whether they are implemented in Tesseract, but you can easily port them from our Java project: http://sourceforge.net/projects/javaocr/).
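
To illustrate the idea with the simplest such moment (a sketch of the standard definition, not code taken from the javaocr project): phi1 = eta20 + eta02 is invariant under translation, scale, and rotation of a glyph:

    // Sketch: first Hu invariant, phi1 = eta20 + eta02, of a binary glyph
    // stored as 0/1 bytes in `bin` (row-major, width * height entries).
    static double huPhi1(const uint8_t *bin, int width, int height)
    {
        double m00 = 0, m10 = 0, m01 = 0;
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                if (bin[y * width + x]) { m00 += 1; m10 += x; m01 += y; }
        if (m00 == 0) return 0;
        double xc = m10 / m00, yc = m01 / m00; // centroid

        double mu20 = 0, mu02 = 0;
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                if (bin[y * width + x]) {
                    mu20 += (x - xc) * (x - xc);
                    mu02 += (y - yc) * (y - yc);
                }
        // normalized central moments: eta_pq = mu_pq / m00^((p+q)/2 + 1) = mu / m00^2
        return (mu20 + mu02) / (m00 * m00);
    }

Comparing a vector of such moments against per-character templates sidesteps Tesseract's language model entirely.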

I tried my demo app on a monitor image and it recognised the digits on the spot (it is not trained for characters).

As for binarisation (separating black from white), I would recommend the Sauvola method, as it gives the best tolerance to luminance changes (it is also implemented in our OCR project); see the sketch below.
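
For reference, the Sauvola rule (sketched here from the published formula, not copied from that project) thresholds each pixel against the statistics of a local window around it:

    // Sauvola's local threshold: T = m * (1 + k * (s / R - 1)),
    // where m and s are the mean and standard deviation of a local window,
    // k is a sensitivity constant (typically 0.2 - 0.5), and R is the dynamic
    // range of the standard deviation (128 for 8-bit images).
    static BOOL sauvolaIsBlack(double pixel, double mean, double stddev)
    {
        const double k = 0.5;
        const double R = 128.0;
        double threshold = mean * (1.0 + k * (stddev / R - 1.0));
        return pixel < threshold;
    }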

Telstar answered 6/11, 2012 at 11:18 Comment(2)
Yes, that is right, but I don't know how to get the perfect area, or how to get text without doing block generation; i.e. I need to crop the image into one-character-per-image blocks, and then doing OCR gives good results, otherwise it just gives garbage values.Brout
Hi @Konstantin, I have updated my answer. I found a way to resolve the issue with an average time of just 0.3 to 0.5 seconds. And thanks again for your suggestion, as it helped me a lot in getting to the solution.Brout
