Scrapy crawl data inside pdf file

I would like to know how to crawl data inside a PDF file using Scrapy. Which module should I use, and what is the most effective way to do it? Could you please point me to some sample tutorials on this?

Thanks!!

Cynarra answered 8/7, 2015 at 9:10

I suggest you get the PDF with Scrapy and use PyPDF2 to get the content inside the PDF.

For a complete but somewhat dated example (it uses pyPDF, the predecessor of PyPDF2), take a look at this site.

Demars answered 8/7, 2015 at 9:15
Thank you for the answer. I tried the sample site you gave me, but I am still getting errors such as *** PdfReadError: EOF marker not found - Cynarra
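One possible cause of this error, not confirmed anywhere in this thread, is that the downloaded bytes are not a complete PDF, for example an HTML error page or a truncated download. A PDF normally starts with a `%PDF-` header and ends with a `%%EOF` marker, so a rough check before parsing might look like the sketch below. The helper name `looks_like_pdf` and the 1024-byte window are arbitrary choices for illustration.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Rough sanity check: PDF header near the start and %%EOF marker near the end."""
    has_header = b"%PDF-" in data[:1024]
    has_eof = b"%%EOF" in data[-1024:]
    return has_header and has_eof
```

In a Scrapy callback this could be called as `looks_like_pdf(response.body)`, logging and skipping the response when it returns `False` instead of passing it on to the PDF reader.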
