Scan and read a document with tick boxes

2

5

I have a request from a customer who wishes to provide meals to elderly people in different localities. To do this the people fill out a form for the week and tick boxes depending on their choices for each day (it also takes into account specific requirements).

For example:

 Name
 Commune

                  With salt ( )      Without salt []

Mon :       Meal 1 ( )                   Meal 2 ( )
           Dessert 1 ( )                 Dessert ( )

Tues :       Meal 1 ( )                   Meal 2 ( )
           Dessert 1 ( )                 Dessert ( )

The data from each sheet should then be compiled to tell us how many of each type of meal to prepare each day for each commune...

The sheets are all the same, so I am hoping to be able to scan them in and automatically read them.

I do not know of any software that allows me to do this. What is the best way of accomplishing this task? At the moment I am looking at Tesseract, but maybe there is a simpler technique?

EDIT: We are talking about several hundred forms a week. Ideally we will scan them all in one batch, extract the data and store the forms electronically.

Rankle answered 15/5, 2013 at 8:28 Comment(0)
11

You are not looking for OCR, which implies reading machine-printed characters. You are looking for ICR/OMR software, a category also known as form processing or data capture. ICR (Intelligent Character Recognition) covers handwritten characters. OMR stands for Optical Mark Recognition, which is what you are trying to do: recognize the state of checkmarks/checkboxes.

Additional info about handwriting recognition is here: ICR for machine printed text?

Because your forms are all the same, they fall into the category of "fixed forms", and a template-based software package can process them. Here is a short document explaining the differences between form types: www.wisetrend.com/files/Structured_vs_Semi-Structured.pdf

Your blank form itself should also be designed properly for machine recognition. It should have reference marks for better alignment of the template, a clear flow so users know how to fill it out naturally, check boxes of an appropriate size, etc.

I believe FlexiCapture will do everything you need: link. There are at least several other solutions that can perform a similar process. I work as an integrator/consultant for paper-based form-processing projects.

I removed your "mobile" tag, as I believe you are not planning to use a cell phone to capture these images. If you are, I would advise against that if you have other options. You mentioned scanning them on a conventional scanner, which is the best option to achieve good image quality. Trust me, you will have enough to deal with when processing human handwritten forms, so optimize your forms, scanning, software and process as much as possible.

If you are interested in developing it yourself, that is possible. The process is to compare each image area (one per checkbox) with some 'baseline' to see whether there is additional handwriting in that area. If the difference is over some threshold, the checkbox is treated as checked. Typical issues are alignment of the areas and borderline threshold levels (a small or light tick mark). Commercial packages handle that automatically.
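To make the baseline comparison concrete, here is a minimal sketch in plain Python. It assumes the scan is already aligned with the blank form and loaded as rows of grayscale values (0 = black, 255 = white). The box coordinates, the names `dark_fraction` and `read_checkboxes`, and both threshold values are illustrative assumptions, not taken from any particular package.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]            # grayscale rows, 0 = black, 255 = white
Box = Tuple[int, int, int, int]    # (top, left, height, width) in pixels

DARK_LEVEL = 128       # assumed cut-off between ink and paper
MARK_THRESHOLD = 0.10  # assumed: extra dark-pixel fraction that counts as a tick


def dark_fraction(image: Image, box: Box) -> float:
    """Return the share of pixels inside `box` that are darker than DARK_LEVEL."""
    top, left, height, width = box
    dark = sum(
        1
        for row in image[top:top + height]
        for pixel in row[left:left + width]
        if pixel < DARK_LEVEL
    )
    return dark / (height * width)


def read_checkboxes(blank: Image, scan: Image, boxes: Dict[str, Box],
                    threshold: float = MARK_THRESHOLD) -> Dict[str, bool]:
    """Treat a box as ticked when the scan has noticeably more ink than the blank form."""
    return {
        name: dark_fraction(scan, box) - dark_fraction(blank, box) > threshold
        for name, box in boxes.items()
    }


# Tiny synthetic example: an 8x16 white page with two 4x4 checkbox areas.
blank_form = [[255] * 16 for _ in range(8)]
filled_form = [row[:] for row in blank_form]
for i in range(4):                 # draw a diagonal stroke inside the first box
    filled_form[2 + i][2 + i] = 0

template = {"mon_meal_1": (2, 2, 4, 4), "mon_meal_2": (2, 10, 4, 4)}
result = read_checkboxes(blank_form, filled_form, template)
print(result)  # {'mon_meal_1': True, 'mon_meal_2': False}
```

On real scans the threshold would have to be tuned against sample forms, and the areas shifted using the reference marks before comparing. Those are exactly the alignment and borderline cases mentioned above.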

Please let me know if you need any additional guidance.

ilya evdokimov

Sufi answered 16/5, 2013 at 0:9 Comment(1)
Thanks, fantastic answer. We'll have a look at commercial software first, and if necessary I may develop something custom. - Rankle
0

10 years on, let me update this answer with recent developments. We now have ChatGPT, and I have played with it for scanning handwritten paper forms. It does pretty well at recognising handwritten characters in the form. I would be second-guessing some of the bad handwriting myself, but the OpenAI Vision API handles it and accurately recognises the written word or letter.

The only issue I am facing is with reading the checkboxes. Around 80% of the time it reads the checked box correctly, but I do not understand why it gets it wrong the rest of the time.

Borosilicate answered 21/12, 2023 at 1:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.