Background
The idea is this:
- Person provides contact information for online book purchase
- Book, as a PDF, is marked with a unique hash
- Person downloads book
- PDF passwords are not a sufficient deterrent, since they are easy to circumvent or share
The ideal process would be something like:
- Generate hash based on contact information
- Store contact information and hash in database
- Acquire book lock
- Update an "include" file with hash text
- Generate book as PDF (using pdflatex)
- Apply hash to book
- Release book lock
- Send email with book download link
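To make the steps above concrete, here is a minimal Java sketch of the hash, lock, include-file, and pdflatex steps. Several details are assumptions and not part of the process as described: the hash is computed as HMAC-SHA256 under a server-side secret (so a buyer cannot recompute someone else's mark), the class name BookMark, the macro name \buyermark, and the file paths are all hypothetical, and the database and email steps are left out.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class BookMark {
    /** Derives a 64-character hex mark from the contact details (assumed: HMAC-SHA256). */
    public static String markFor(String contactInfo, byte[] secret)
            throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        byte[] digest = mac.doFinal(contactInfo.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Overwrites the "include" file with a macro holding the mark (hypothetical macro name). */
    public static void writeInclude(Path includeFile, String mark) throws IOException {
        String tex = "\\newcommand{\\buyermark}{" + mark + "}\n";
        Files.write(includeFile, tex.getBytes(StandardCharsets.UTF_8));
    }

    /** Holds an exclusive file lock while the include file is updated and pdflatex runs. */
    public static void buildLocked(Path lockFile, Path includeFile, Path texFile, String mark)
            throws IOException, InterruptedException {
        try (FileChannel channel = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {
            writeInclude(includeFile, mark);
            Process p = new ProcessBuilder(
                    "pdflatex", "-interaction=nonstopmode", texFile.toString())
                    .inheritIO()
                    .start();
            if (p.waitFor() != 0) {
                throw new IOException("pdflatex exited with an error");
            }
        } // lock is released here, then the channel is closed
    }
}
```

The book source would then \input the include file and place \buyermark wherever the mark should appear. Note that the file lock only serializes builds across processes that use the same lock file; it does nothing to make the mark itself robust.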
Technologies
The following technologies can be used (other programming languages are possible, but libraries will likely be limited to those supplied by the host):
- C, Java, PHP
- LaTeX files
- PDF files
- Linux
Question
What programming techniques (or open source software) should I investigate to:
- Embed a unique hash (or other mark) in a PDF
- Create a collusion-attack resistant mark
- Develop a non-fragile solution (e.g., the mark survives PDF -> EPS -> PDF)
Research
I have looked at the following possibilities:
- Steganography
- Natural Language Processing (NLP)
- Convert blank pages in PDF to images; mark those images; reassemble PDF
- LaTeX watermark package
- ImageMagick
Issues
The possible solutions I have researched have the following issues:
- Steganography. (a) Requires a master copy of the images, which are converted to EPS, which is CPU-intensive and time-consuming; (b) would the watermark survive PDF -> EPS -> PDF, or other types of conversion; (c) most images are drawings or screen captures, not photographs in PNG format.
- LaTeX. Creates an image cache; any steganographic solution would have to intercept that process somehow.
- NLP. Introduces grammatical errors; could change meaning of technical words.
- Blank Pages. Immediately suspect; it is easy to replace suspicious blank pages.
- Watermark Package. Draws visible marks.
- ImageMagick. Draws visible marks.
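For context on the steganography concern, this is roughly what a naive least-significant-bit (LSB) image mark looks like in Java. It is only an illustrative sketch, not a scheme I have settled on; the class name LsbMark is hypothetical. Because the mark lives in exact pixel values, any resampling or lossy re-encoding during a PDF -> EPS -> PDF round trip can erase it, and the flat colour areas typical of drawings and screen captures make the changes easier to detect.

```java
import java.awt.image.BufferedImage;

public class LsbMark {
    /** Writes each bit of the mark into the lowest bit of one pixel's blue channel. */
    public static void embed(BufferedImage img, byte[] mark) {
        int bits = mark.length * 8;
        if (bits > img.getWidth() * img.getHeight()) {
            throw new IllegalArgumentException("image too small for mark");
        }
        for (int i = 0; i < bits; i++) {
            int x = i % img.getWidth();
            int y = i / img.getWidth();
            int bit = (mark[i / 8] >> (7 - i % 8)) & 1;
            int rgb = img.getRGB(x, y);
            img.setRGB(x, y, (rgb & ~1) | bit);
        }
    }

    /** Reads the mark back; only works if the pixel values are bit-for-bit unchanged. */
    public static byte[] extract(BufferedImage img, int length) {
        byte[] out = new byte[length];
        for (int i = 0; i < length * 8; i++) {
            int x = i % img.getWidth();
            int y = i / img.getWidth();
            int bit = img.getRGB(x, y) & 1;
            out[i / 8] |= bit << (7 - i % 8);
        }
        return out;
    }
}
```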
What other solutions are possible?
Thank you!