I am building an OCR project using a .NET wrapper for Tesseract. The samples that ship with the wrapper don't show how to handle a PDF as input. Given a PDF as input, how do I produce a searchable PDF using C#?
- I have used the Ghostscript library to convert the PDF to images and then fed them to Tesseract. This works great for extracting the text, but it doesn't preserve the original layout of the PDF; I only get the plain text.
How can I get the text from the PDF while preserving the layout of the original PDF?
This is a page from the PDF. I don't want only the text; I want the text to be laid out as it is in the original PDF. Sorry for my poor English.
[Image: a sample page from the PDF]
. And then use that same library to create the searchable PDF. – Rothermere