Metadata can live in more than just two locations within a PDF, so answers that attempt to remove ALL metadata will usually leave some behind.
The best dedicated tool is probably Coherent cpdf, which can use Ghostscript to fix and regenerate the file first and so removes many of the embedded metadata objects. It cannot remove everything, or else images with embedded metadata would be destroyed.
The simplest invocation is cpdf -remove-metadata in.pdf -o out.pdf
To make testing simple, I put the word "private" into many locations within a PDF; clearly, just parsing the file as if it were text will find some of them. The file also contains the words "Google Inc" embedded as 16-bit metadata.
>type PrivateMetaData.pdf |find /c /i "private"
12
>type PrivateMetaData.pdf |find /c /i "G o o g l e"
0
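A zero here does not prove the text is absent: if the 16-bit text is UTF-16, the bytes between the letters are NULs rather than spaces, so neither search pattern can match it. As a rough cross-check, here is a minimal Python sketch that counts a word in both forms. The function name count_word is my own, and UTF-16 is an assumption about how the string is stored in the file.

```python
import re


def count_word(data: bytes, word: str) -> dict:
    """Count case-insensitive hits of `word` as 8-bit text and as UTF-16.

    Assumption: the "16-bit" text is plain UTF-16 (either byte order).
    """
    counts = {}
    for label, codec in (
        ("8-bit", "latin-1"),
        ("utf-16-be", "utf-16-be"),
        ("utf-16-le", "utf-16-le"),
    ):
        # Encode the word the way it would appear in the file, then search raw bytes.
        pattern = re.compile(re.escape(word.encode(codec)), re.IGNORECASE)
        counts[label] = len(pattern.findall(data))
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_word.py PrivateMetaData.pdf private
    with open(sys.argv[1], "rb") as handle:
        print(count_word(handle.read(), sys.argv[2]))
```

The two UTF-16 byte orders can both match the same run of text (the NUL of the next character completes the other pattern), so treat those two numbers as a hint rather than adding them together.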
Text parsing does not see 16-bit encoded text. ExifTool, however, reports 17 instances of the word "private" when run against the file. Let's try to reduce that from the 12 found above:
exiftool -all:all= -overwrite_original privatemetadata.pdf
Warning: [minor] ExifTool PDF edits are reversible. Deleted tags may be recovered! - privatemetadata.pdf
1 image files updated
and test again
>type PrivateMetaData.pdf |find /c /i "G o o g l e"
0
>type PrivateMetaData.pdf |find /c /i "private"
15
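So I checked the file content: Google was not removed, it is still there, and there are now more instances of "private" than before ExifTool was used. Hence the warning: ExifTool COMPOUNDS the PDF XMP data and never removes it! It writes its changes as an incremental update appended to the file, so the old entries stay where they were.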
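To see why the warning says deleted tags may be recovered, it helps to look at the %%EOF markers: each incremental update appends a new section that ends with its own %%EOF. The Python sketch below (function names are my own, not part of any tool) lists those markers and cuts the file back to the previous one. It is only a rough illustration: linearized PDFs can also carry more than one %%EOF, so a count above one is a hint, not proof.

```python
from typing import List, Optional

EOF_MARKER = b"%%EOF"


def revision_ends(data: bytes) -> List[int]:
    """Return the byte offset just past each %%EOF marker in the file."""
    ends = []
    pos = data.find(EOF_MARKER)
    while pos != -1:
        ends.append(pos + len(EOF_MARKER))
        pos = data.find(EOF_MARKER, pos + 1)
    return ends


def previous_revision(data: bytes) -> Optional[bytes]:
    """Return the bytes up to the second-to-last %%EOF, or None if there is only one."""
    ends = revision_ends(data)
    if len(ends) < 2:
        return None
    return data[: ends[-2]]


if __name__ == "__main__":
    import sys

    # Usage: python revisions.py privatemetadata.pdf
    with open(sys.argv[1], "rb") as handle:
        print(len(revision_ends(handle.read())), "%%EOF marker(s) found")
```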
So I tried my suggestion above, to remove as much metadata as possible:
>cpdf -remove-metadata privatemetadata.pdf -o metaout.pdf
For non-commercial use only
To purchase a license visit http://www.coherentpdf.com/
>type metaout.pdf |find /c "private"
2
Well, that is far better, but some still remain, because I put them in places that are harder to clean or in standard entries. ExifTool will not normally remove these either.
cpdf metaout.pdf -info|find "private"
For non-commercial use only
To purchase a license visit http://www.coherentpdf.com/
Author: private
Subject: private
Keywords: private contains google image
Those will need to be altered individually. The last simple count was two entries, but the file had not been decoded, so let's check again on a decompressed copy (decoded.pdf):
type decoded.pdf |find /c "private"
10
So some were hiding in encoded data like scripts, bookmarks and many other PDF key objects.
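The jump from 2 to 10 comes from compressed streams, which a plain byte search cannot read. As a rough way to reproduce that count without a decoded copy, here is a minimal Python sketch that inflates each stream with zlib and searches the result. The function name is my own, it assumes Flate (zlib) compression only, it finds stream boundaries with a simple pattern instead of the /Length entry, and unlike the find command above it ignores case.

```python
import re
import zlib

# Naive stream finder: good enough for a quick check, not a real PDF parser.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def count_in_streams(data: bytes, word: str) -> int:
    """Count case-insensitive hits of `word` inside streams that inflate with zlib."""
    needle = word.lower().encode("latin-1")
    total = 0
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates the end-of-line bytes before "endstream".
            body = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG or an uncompressed stream)
        total += body.lower().count(needle)
    return total


if __name__ == "__main__":
    import sys

    # Usage: python streams.py metaout.pdf private
    with open(sys.argv[1], "rb") as handle:
        print(count_in_streams(handle.read(), sys.argv[2]))
```

Uncompressed streams are skipped here on purpose, since the plain search already sees them.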
What is the best solution?
Answer:
- 1 Decompress the file with qpdf into a pure text format (for example with its --qdf mode), then
- 2 Use a plain text editor to redact all the observed metadata entries.
We can still see the Google metadata, which cannot be removed without destroying the embedded JPEG image. We can also see copyright metadata for fonts, which likewise cannot be removed without hex-editor redaction.
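One caution when redacting by hand: a PDF's cross-reference table stores byte offsets, so inserting or deleting characters can break the file unless it is repaired afterwards. Overwriting with the same number of characters avoids that. A minimal Python sketch of the idea follows; the function name and the "x" filler are my own choices, and it is only meant for the decompressed text copy, not for compressed streams or image data.

```python
import re


def redact_same_length(data: bytes, word: str, filler: bytes = b"x") -> bytes:
    """Overwrite every case-insensitive hit of `word` with filler bytes of equal length."""
    if len(filler) != 1:
        raise ValueError("filler must be exactly one byte")
    pattern = re.compile(re.escape(word.encode("latin-1")), re.IGNORECASE)
    # Same-length replacement keeps every byte offset in the file unchanged.
    return pattern.sub(lambda m: filler * len(m.group(0)), data)


if __name__ == "__main__":
    import sys

    # Usage: python redact.py decoded.pdf redacted.pdf private
    with open(sys.argv[1], "rb") as source:
        cleaned = redact_same_length(source.read(), sys.argv[3])
    with open(sys.argv[2], "wb") as target:
        target.write(cleaned)
```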