Extract text with its font details (style, size, color, italic, etc.) from a PDF in Python [closed]
I am looking to extract text together with its font details (style, size, color, italic, etc.) from a PDF in Python.

I need the text and its metadata for translation purposes. Can anyone suggest libraries for this?

Chlamydate answered 21/2, 2014 at 6:20 Comment(1)
Is it possible to do that without using a library? - Saltzman

There is a Python library for that. Please have a look at PDFMiner:

http://www.unixuser.org/~euske/python/pdfminer/index.html

Its pdf2txt.py tool extracts the text from a PDF and can also report other information such as the font name and font size.

You can try that.
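For anyone who would rather do this from Python code than from the command-line tool, here is a minimal sketch. It assumes the pdfminer.six fork (the one that runs on Python 3) is installed. The names `StyledRun`, `group_runs` and `iter_chars` are made up for this example and are not part of PDFMiner. Bold and italic are only guessed from the font name, which works for many PDFs but not all, and color is not covered.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class StyledRun:
    """Consecutive characters that share one font name and size (hypothetical helper)."""
    text: str
    fontname: str
    size: float

    @property
    def bold(self) -> bool:
        # Heuristic only: many fonts encode the weight in their name.
        return "bold" in self.fontname.lower()

    @property
    def italic(self) -> bool:
        # Heuristic only: looks for "italic" or "oblique" in the font name.
        name = self.fontname.lower()
        return "italic" in name or "oblique" in name


def group_runs(chars: Iterable[Tuple[str, str, float]]) -> List[StyledRun]:
    """Merge (text, fontname, size) triples into runs of identical style."""
    runs: List[StyledRun] = []
    for text, fontname, size in chars:
        size = round(size, 1)  # sizes are floats; round to avoid spurious splits
        if runs and runs[-1].fontname == fontname and runs[-1].size == size:
            runs[-1].text += text
        else:
            runs.append(StyledRun(text, fontname, size))
    return runs


def iter_chars(pdf_path: str) -> Iterator[Tuple[str, str, float]]:
    """Yield (text, fontname, size) for each character found by pdfminer.six."""
    # Imported here so group_runs() can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTAnno, LTChar, LTTextContainer, LTTextLine

    last_font, last_size = None, None
    for page in extract_pages(pdf_path):
        for element in page:
            if not isinstance(element, LTTextContainer):
                continue
            for line in element:
                if not isinstance(line, LTTextLine):
                    continue
                for obj in line:
                    if isinstance(obj, LTChar):
                        last_font, last_size = obj.fontname, obj.size
                        yield obj.get_text(), obj.fontname, obj.size
                    elif isinstance(obj, LTAnno) and last_font is not None:
                        # LTAnno is whitespace inserted by layout analysis and has
                        # no font of its own; attach it to the preceding style.
                        yield obj.get_text(), last_font, last_size


# Usage ("example.pdf" is a placeholder path):
# for run in group_runs(iter_chars("example.pdf")):
#     print(repr(run.text), run.fontname, run.size, run.bold, run.italic)
```

Grouping by (font name, size) keeps the output manageable for translation work, since each run can be translated and then put back with the same style.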

Note: the original PDFMiner does not support Python 3.

Zoomorphism answered 21/2, 2014 at 6:59 Comment(2)
Python 3 is supported under PDFMiner.six. - Harmony
Currently PDFMiner does not allow extracting information about font color. - Periphrastic
