TensorFlow model for Arabic OCR

I am a beginner in TensorFlow and I want to build an OCR model that recognizes Arabic words in cursive Arabic fonts (i.e. joined Arabic script). Ideally, the model would recognize both Arabic and English. Please see the attached image of a dictionary page that I am currently trying to OCR. The other pages in the book have the same font and layout, with both English and Arabic.

I have two questions:

(1) Would I be training with individual characters in the joined/cursive Arabic text, or would I need bounding boxes for entire words or for individual characters?

(2) Are there any other TensorFlow (or Keras) OCR models available that deal with cursive writing, particularly Arabic?

A scanned page of an Arabic dictionary that I wish to apply OCR to

Dukie answered 20/1, 2018 at 16:15 Comment(0)

Tesseract, an OCR engine from Google, has an Arabic trained model.

Learn more about it here: https://github.com/tesseract-ocr/tesseract

Languages it supports are here: https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages

The Arabic trained data file is here: https://github.com/tesseract-ocr/tessdata/blob/master/ara.traineddata
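As a rough sketch of how the Arabic model could be used from Python: the snippet below assumes the third-party `pytesseract` wrapper and Pillow are installed, along with the Tesseract binary and the `ara` and `eng` trained data files. The helper names `build_lang` and `ocr_page` and the file name `page.png` are made up for illustration. Tesseract accepts several languages joined with `+`, which may suit a mixed Arabic/English page.

```python
def build_lang(codes):
    """Join Tesseract language codes, e.g. ["ara", "eng"] -> "ara+eng"."""
    return "+".join(codes)


def ocr_page(image_path, codes=("ara", "eng")):
    """Run Tesseract on one image and return the recognized text."""
    # Imported lazily so build_lang works even without the OCR stack installed.
    from PIL import Image
    import pytesseract

    # Assumes the tesseract binary and the matching .traineddata files are available.
    return pytesseract.image_to_string(Image.open(image_path), lang=build_lang(codes))


# Hypothetical usage:
# text = ocr_page("page.png")
# print(text)
```

Results on a dense dictionary layout may vary, so it is probably worth trying on a cropped region first.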

Hope this helps!

Waddell answered 18/2, 2018 at 1:27 Comment(1)
How can I convert .traineddata to the tflite format? Thanks in advance. - Shannanshannen

I don't think you can use the whole page as the input image. Working word by word is probably a better choice as a first solution. Take a look at these links:

https://hackernoon.com/latest-deep-learning-ocr-with-keras-and-supervisely-in-15-minutes-34aecd630ed8

http://ai.stanford.edu/~ang/papers/ICPR12-TextRecognitionConvNeuralNets.pdf

How to create dataset in the same format as the FSNS dataset?
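To illustrate the word-by-word idea, here is a minimal sketch of one possible way to cut a binarized page into lines and then words using projection profiles. This is a swapped-in classical technique, not something taken from the links above. The function names and the `min_gap` threshold are assumptions, and the input is assumed to be a 2-D NumPy array where nonzero pixels are ink.

```python
import numpy as np


def find_runs(mask):
    """Return (start, end) pairs for each run of True values; end is exclusive."""
    padded = np.concatenate(([False], mask, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(changes[::2].tolist(), changes[1::2].tolist()))


def segment_lines(binary):
    """Row ranges that contain ink (horizontal projection profile)."""
    return find_runs(binary.sum(axis=1) > 0)


def segment_words(line, min_gap=3):
    """Column ranges that contain ink, merging gaps narrower than min_gap."""
    merged = []
    for start, end in find_runs(line.sum(axis=0) > 0):
        if merged and start - merged[-1][1] < min_gap:
            # Small gap: treat it as part of the same word.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# Tiny synthetic page: two text lines, the first holding two words.
page = np.zeros((10, 20), dtype=np.uint8)
page[2:4, 1:5] = 1
page[2:4, 6:9] = 1     # 1-column gap, merged into the previous run
page[2:4, 14:18] = 1   # 5-column gap, kept as a separate word
page[7:9, 3:8] = 1

lines = segment_lines(page)
words = segment_words(page[lines[0][0]:lines[0][1]])
```

Because some Arabic letters do not connect to the letter that follows, small gaps also occur inside a word, so `min_gap` would need tuning per font and scan resolution.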

Shorn answered 20/1, 2018 at 16:28 Comment(0)
