JS+Canvas and a basic implementation of the Viola-Jones face-detection technique.
With a manuscript like that? I think you'll get really bad results.
You first need to detect the global horizontal inclination. (Getting the inclination also gives you the line height.)
Create a 100% horizontal grid runner like:
0000000000...
1111111111...
0000000000...
where 0 checks for light and 1 for dark areas. Let it run over your image-selection data from top to bottom, and at all inclinations (i.e. ±15° max).
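To feed the runner you first need a grid of light/dark values. Here is a minimal sketch, assuming the pixels come from something shaped like a Canvas ImageData object (RGBA bytes). The function name toDarkGrid and the threshold of 128 are my own placeholders, not part of any library, and a real manuscript scan will probably need a smarter threshold.

```javascript
// Hypothetical helper: turns RGBA pixel data into rows of 0 (light) / 1 (dark).
// imageData: any object shaped like Canvas ImageData: { data, width, height }.
// threshold: assumed cut-off on a 0..255 brightness scale (tune for your scans).
function toDarkGrid(imageData, threshold = 128) {
  const { data, width, height } = imageData;
  const grid = [];
  for (let y = 0; y < height; y++) {
    const row = new Array(width);
    for (let x = 0; x < width; x++) {
      const i = (y * width + x) * 4; // 4 bytes per pixel: R, G, B, A
      // Weighted brightness (common luma weights); alpha is ignored.
      const luma = 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2];
      row[x] = luma < threshold ? 1 : 0;
    }
    grid.push(row);
  }
  return grid;
}
```

In the browser you would pass it the result of ctx.getImageData(x, y, w, h) for the selected region.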
A positive match is when the striped grid finds a contrast density above the threshold that lines up with its raster, i.e. the image is mostly dark under the 1 rows and mostly light under the 0 rows.
If the runner returns no match, increase its size and let it run again.
You need to account for mistakes, so store every possible positive match. After you're done with all sizes and inclinations, just pick the one that produced the most matches.
Now you'll have the general horizontal inclination and the line height.
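The horizontal pass could be sketched roughly like this. It is not a literal grid runner: it counts dark pixels per row along each candidate inclination (a projection profile) and scores that profile against the 0/1 stripe pattern, keeping the single best score instead of storing every match. All function names, the 1° angle step, and the minimum stripe period are assumptions for illustration, and the input is assumed to be rows of 0 (light) / 1 (dark) pixels.

```javascript
// Count dark pixels per row, sampling along lines tilted by angleDeg.
// Uses a shear (y - x * tan(angle)), which is close to a rotation for small angles.
function projectRows(img, angleDeg) {
  const h = img.length, w = img[0].length;
  const t = Math.tan(angleDeg * Math.PI / 180);
  const offset = Math.ceil(Math.abs(t) * w); // keeps indices non-negative
  const profile = new Array(h + 2 * offset + 1).fill(0);
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      if (img[y][x]) profile[Math.round(y - x * t) + offset]++;
    }
  }
  return profile;
}

// Score the profile against alternating dark/light stripes of the given period.
// Dark pixels under a "1" stripe add to the score, under a "0" stripe subtract.
function stripeScore(profile, period) {
  const half = period / 2;
  let best = -Infinity;
  for (let phase = 0; phase < period; phase++) {
    let score = 0;
    for (let i = 0; i < profile.length; i++) {
      const inDarkBand = ((i + phase) % period) < half;
      score += inDarkBand ? profile[i] : -profile[i];
    }
    best = Math.max(best, score);
  }
  return best;
}

// Try every inclination and stripe size; keep the best-scoring combination.
function estimateSkewAndLineHeight(img, maxAngle = 15, minPeriod = 4) {
  let best = { angle: 0, lineHeight: 0, score: -Infinity };
  for (let angle = -maxAngle; angle <= maxAngle; angle++) {
    const profile = projectRows(img, angle);
    for (let period = minPeriod; period <= img.length / 2; period += 2) {
      const score = stripeScore(profile, period);
      if (score > best.score) best = { angle, lineHeight: period, score };
    }
  }
  return best;
}
```

Here lineHeight is the full period (one dark band plus one light band), in pixels. On real handwriting the scores will be much noisier than on clean stripes, so treat the result as a starting estimate.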
Now you need to determine the vertical letter inclination. At the same time you can retrieve the blank spaces.
Same technique: run a vertical runner line by line (you already know the line height):
0101010
0101010
0101010
0101010
0101010
starting from 0 at the far left and moving to the far right. No match? Change the angle and let it run again.
Keep the run that collected the most matches. You now have the letter inclination.
Then let it run over the same line of text and collect all the information about the light gaps between the dark areas.
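A rough sketch of the vertical pass, under similar assumptions: one text line is given as rows of 0 (light) / 1 (dark) pixels, and instead of a literal 0101 runner it counts dark pixels per slanted column and treats completely empty columns as gaps. The slant that leaves the most empty columns between the first and last inked column wins. The names, the ±30° range, and the 5° step are placeholders I picked, not values from the method above.

```javascript
// Count dark pixels per column, sampling along strokes slanted by angleDeg.
// Positive angles mean letters leaning to the right (top further right than bottom).
function projectColumns(line, angleDeg) {
  const h = line.length, w = line[0].length;
  const t = Math.tan(angleDeg * Math.PI / 180);
  const offset = Math.ceil(Math.abs(t) * h); // keeps indices non-negative
  const profile = new Array(w + 2 * offset + 1).fill(0);
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      if (line[y][x]) profile[Math.round(x + y * t) + offset]++;
    }
  }
  return { profile, offset };
}

// Collect runs of empty columns between the first and last inked column.
// Margins are skipped so the padding added by the shear does not count as a gap.
function findGaps(profile, offset) {
  const gaps = [];
  const first = profile.findIndex(v => v > 0);
  if (first === -1) return gaps; // blank line: nothing to measure
  let last = profile.length - 1;
  while (profile[last] === 0) last--;
  let start = -1;
  for (let i = first; i <= last; i++) {
    if (profile[i] === 0) {
      if (start === -1) start = i;
    } else if (start !== -1) {
      gaps.push({ start: start - offset, width: i - start });
      start = -1;
    }
  }
  return gaps;
}

// Try each slant; keep the one that leaves the most empty columns.
function estimateSlantAndGaps(line, maxAngle = 30, step = 5) {
  let best = { angle: 0, gaps: [], emptyColumns: -1 };
  for (let angle = -maxAngle; angle <= maxAngle; angle += step) {
    const { profile, offset } = projectColumns(line, angle);
    const gaps = findGaps(profile, offset);
    const emptyColumns = gaps.reduce((n, g) => n + g.width, 0);
    if (emptyColumns > best.emptyColumns) best = { angle, gaps, emptyColumns };
  }
  return best;
}
```

The gap positions come back in the sheared coordinate system, so they map directly to x positions only when the chosen angle is 0. Wide gaps are candidates for word spaces, narrow ones for letter boundaries, but connected handwriting often has no empty columns at all.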
If your question is "Is it doable using JavaScript?", then the answer is yes! It's definitely possible. Doable in any Turing-complete language. If your question is how to do it well, that is really a broad question... – Andizhan
javascript optical character recognition. OCR is not an easy thing to do (usually commercial software) and you may not find a ready-to-consume open source package. Running the OCR server-side will give you better chances as you will not be constrained by the JavaScript platform – Lenzi