How can I convert PDF to AsciiDoc using pandoc?
I am trying to convert a PDF book to an AsciiDoc document. I have tried the following command:

pandoc -s s.pdf -t asciidoc -o example28.txt

I got an "Unknown reader" error:

q@q-ABRA-A5-V12-1:~/Downloads$ pandoc -s s.pdf -t asciidoc -o example28.txt
pandoc: Unknown reader: pdf
Pandoc can convert to PDF, but not from PDF.

How can I fix this, or is there another way to convert from PDF to AsciiDoc?

Ophthalmoscopy answered 5/9, 2018 at 11:40 Comment(4)
pandoc doesn't read PDFs, it only produces them. But you could try: less s.pdf | pandoc -t asciidoc - Persse
When I try this command I get a "pandoc: Unknown reader: plain" error. - Ophthalmoscopy
Ah right, if you leave out the -f option, it will default to markdown... But you probably want a dedicated tool anyway. Stack Overflow is probably the wrong place to ask for that, though, and it also depends on your platform and needs. - Persse
See also this more generic question, which has many more answers: Python module for converting PDF to text. - Customer

Have you tried pdf2txt? It's one of the tools provided by PDFMiner: https://pypi.org/project/pdfminer/

Sinus answered 5/9, 2018 at 11:55 Comment(3)
It seems to be able to output HTML, and then you can use pandoc to go from HTML to AsciiDoc: pdf2txt.py -t html input.pdf | pandoc -f html -t asciidoc - Persse
Thanks a lot. I have converted the PDF to AsciiDoc, but I have a problem with extra newlines, which is probably caused by extra <br> elements in the HTML. How can I fix this? From: i.imgur.com/QJ3Mx0n.png To: i.imgur.com/XoURhd9.png - Ophthalmoscopy
As of 2020, PDFMiner is not actively maintained. This is the community-maintained fork: pdfminer.six. - Customer
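
For the extra-newline problem raised in the comments, one possible approach is to clean up the intermediate HTML before handing it to pandoc. The following is only a rough sketch, not a tested recipe: it assumes pdf2txt.py and pandoc are both installed and on PATH, and it takes the commenter's guess at face value that the extra newlines come from repeated <br> tags. The function names collapse_breaks and pdf_to_asciidoc are made up for illustration and are not part of PDFMiner or pandoc.

```python
import re
import subprocess

# Matches a run of two or more consecutive <br> tags (any case, optional
# self-closing slash, optional whitespace between and after them).
_BR_RUN = re.compile(r"(?:<br\s*/?>\s*){2,}", re.IGNORECASE)


def collapse_breaks(html: str) -> str:
    """Replace each run of repeated <br> tags with a single <br>."""
    return _BR_RUN.sub("<br>", html)


def pdf_to_asciidoc(pdf_path: str) -> str:
    """Hypothetical helper: PDF -> HTML (pdf2txt.py) -> AsciiDoc (pandoc).

    Assumption: both command-line tools are installed and on PATH.
    """
    # Step 1: equivalent to `pdf2txt.py -t html input.pdf` from the comment.
    html = subprocess.run(
        ["pdf2txt.py", "-t", "html", pdf_path],
        check=True, capture_output=True, text=True,
    ).stdout

    # Step 2: thin out repeated line breaks before pandoc sees them.
    html = collapse_breaks(html)

    # Step 3: equivalent to `pandoc -f html -t asciidoc`, fed via stdin.
    return subprocess.run(
        ["pandoc", "-f", "html", "-t", "asciidoc"],
        input=html, check=True, capture_output=True, text=True,
    ).stdout


if __name__ == "__main__":
    # The cleanup step can be tried on its own, without either tool installed.
    print(collapse_breaks("line one<br><br/>\n<BR />line two"))
```

Calling pdf_to_asciidoc("s.pdf") would return the AsciiDoc text as a string, which you can then write to a file. If the output still has unwanted blank lines, the regular expression is the part to adjust, since the exact HTML that pdf2txt.py emits for a given PDF may differ from what is assumed here.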
