I searched Stack Overflow extensively before posting this and could not find anything on Camelot page dimensions. There is this question, which suggests using `table_regions`, but that does not solve the OP's problem or mine. Unfortunately, I cannot comment to follow up with the OP and see whether they found a solution.
What I am trying to do:
I am using Camelot to identify tables. Sometimes, when I know the region of the page that might contain a table of interest, I want to search only in that region. This is easily done using the `table_regions` kwarg of `camelot.read_pdf()`: I just need to provide a pair of corner coordinates for Camelot to search.
The issue is that I get these coordinates from PyMuPDF, so they are in PyMuPDF's coordinate system. I have figured out how to translate these coordinates, but I am missing one key piece of information from Camelot: the dimensions of the page. These values are easy to get in PyMuPDF (the `Page` class's `.bound()` method), and I need the Camelot equivalent. I can provide a further explanation of the algebra here if anyone thinks maybe there is alternative between
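For reference, the translation described above can be sketched as a small pure function. This is a minimal sketch under stated assumptions: the page is unrotated, the page height in points is the same in both libraries (so PyMuPDF's `page.rect.height` can stand in for the missing Camelot value), and `table_regions` takes strings of the form `"x1,y1,x2,y2"` with the top-left corner first. The function name `mupdf_rect_to_camelot_region` is hypothetical.

```python
def mupdf_rect_to_camelot_region(rect, page_height):
    """Convert a PyMuPDF-style rectangle to a Camelot-style region string.

    rect: (x0, y0, x1, y1) with the origin in the top-left corner
          (x0, y0 = top-left corner; x1, y1 = bottom-right corner).
    page_height: page height in points (assumed equal in both libraries).
    Returns "x1,y1,x2,y2" with the origin in the bottom-left corner,
    top-left corner first, then bottom-right corner.
    """
    x0, y0, x1, y1 = rect
    # Only the y axis is flipped; x values are unchanged.
    top = page_height - y0
    bottom = page_height - y1
    return f"{x0},{top},{x1},{bottom}"


region = mupdf_rect_to_camelot_region((72, 100, 540, 300), 792)

# Hypothetical usage (file name and rectangle are placeholders):
# import fitz, camelot
# page = fitz.open("doc.pdf")[0]
# region = mupdf_rect_to_camelot_region(tuple(my_rect), page.rect.height)
# tables = camelot.read_pdf("doc.pdf", pages="1", table_regions=[region])
```

If the assumption about matching page heights does not hold for a given file (for example, a rotated page), the result will be off and the height would need to come from elsewhere.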
What I have tried so far:
I read the documentation. Because of this line in the documentation, I am wondering if this might provide a way to get the dimensions: "There might be cases while using Lattice when smaller lines don’t get detected. The size of the smallest line that gets detected is calculated by dividing the PDF page’s dimensions with a scaling factor called `line_scale`. By default, its value is 15."
I am open to alternatives. Essentially, I want to do one of two things: check whether a given region of the page contains a table, or check whether any tables found on the page fall within a given region, if that makes sense. The region is described in the PyMuPDF coordinate system, in which a PDF page typically has dimensions (612, 792) with the origin in the top-left corner. The origin for Camelot is in the bottom-left corner.
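The second alternative (checking whether a detected table falls in a given region) comes down to a rectangle-overlap test once both boxes share one coordinate system. The sketch below assumes the table's bounding box is already available as an `(x0, y0, x1, y1)` tuple with a bottom-left origin; how to obtain that box from Camelot is not covered here. The helper names are hypothetical.

```python
def to_bottom_left(rect, page_height):
    """Flip a top-left-origin rectangle (x0, y0, x1, y1) to bottom-left origin."""
    x0, y0, x1, y1 = rect
    return (x0, page_height - y1, x1, page_height - y0)


def boxes_overlap(a, b):
    """Return True if two axis-aligned boxes (x0, y0, x1, y1) overlap."""
    # Normalize so that (x0, y0) is the minimum corner of each box.
    ax0, ax1 = min(a[0], a[2]), max(a[0], a[2])
    ay0, ay1 = min(a[1], a[3]), max(a[1], a[3])
    bx0, bx1 = min(b[0], b[2]), max(b[0], b[2])
    by0, by1 = min(b[1], b[3]), max(b[1], b[3])
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1


# Region of interest given in top-left-origin coordinates on a 612 x 792 page.
region = to_bottom_left((72, 100, 540, 300), 792)

# Placeholder table boxes in bottom-left-origin coordinates.
inside = boxes_overlap(region, (100, 500, 500, 650))
outside = boxes_overlap(region, (100, 50, 500, 200))
```

Here `inside` is `True` and `outside` is `False`, since the first box lies within the flipped region and the second sits well below it.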
`shape` property gives the x and y dimensions – Fennie
`cv2.imread`) or just convert it to a `np.array`, then `img.shape[1]` is the width and `img.shape[0]` is the height – Fennie