I get a file via an HTTP upload and need to make sure it's a PDF file. The programming language is Python, but this should not matter.
I thought of the following solutions:
1. Check if the first bytes of the string are `%PDF`. This is not a good check but prevents the user from uploading other files accidentally.
2. Use `libmagic` (the `file` command in `bash` uses it). This does exactly the same check as in (1).
3. Use a library to try to read the page count out of the file. If the lib is able to read a page count it should be a valid PDF file. Problem: I don't know a Python library that can do this.
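For reference, the check in (1) could be sketched roughly like this. It assumes the uploaded file has already been read into a `bytes` object; the names `PDF_MAGIC` and `looks_like_pdf` are just placeholders I made up for illustration. It only looks at the signature, so it says nothing about whether the rest of the file is a valid PDF.

```python
# Signature from option (1); real PDF files start with b"%PDF-<version>".
PDF_MAGIC = b"%PDF"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the uploaded bytes start with the PDF signature.

    This is only a sanity check against accidental uploads of other
    file types; it does not validate the PDF structure.
    """
    return data.startswith(PDF_MAGIC)


if __name__ == "__main__":
    print(looks_like_pdf(b"%PDF-1.4\n..."))  # begins with the signature
    print(looks_like_pdf(b"PK\x03\x04"))     # ZIP signature, not a PDF
```

In a real upload handler it should be enough to read only the first few bytes of the stream instead of the whole file before calling this.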
Are there solutions using a library or another trick?
incorrect startxref pointer(1) – Bandaranaike