I need to extract tables from pdf, these tables can be of any type, multiple headers, vertical headers, horizontal header etc.
I have implemented the basic use cases for both and found tabula doing a bit better than camelot still not able to detect all tables perfectly, and I am not sure whether it will work for all kinds or not.
So seeking suggestions from experts who have implemented similar use case.
Tabula Implementation:
import tabula
tab = tabula.read_pdf('pdfs/PDF1.pdf', pages='all')
for t in tab:
print(t, "\n=========================\n")
Camelot Implementation:
import camelot
tables = camelot.read_pdf('pdfs/PDF1.pdf', pages='all', split_text=True)
tables
for tabs in tables:
print(tabs.df, "\n=================================\n")