I want to delete / remove all the images in a PDF, leaving only the text / fonts, using whatever command-line tool is possible.
I tried using -dGraphicsAlphaBits=1 in a Ghostscript command, but the images are still present, just rendered like big pixels.
No, AFAIK, it's not possible to remove all images in a PDF with a command-line tool.
What's the purpose of your request anyway? Save on file size? Remove information contained in images? Or ...?
Whatever you aim at, here is a command that downsamples all images to a resolution of 2 ppi (Update: 1 ppi doesn't work), which achieves two goals at once: it reduces the file size, and it makes the images unrecognizable.
Here's how to do it selectively, for page 33 of original.pdf only (note that the output file then contains only that page):
gs \
-o images-uncomprehendable.pdf \
-sDEVICE=pdfwrite \
-dDownsampleColorImages=true \
-dDownsampleGrayImages=true \
-dDownsampleMonoImages=true \
-dColorImageResolution=2 \
-dGrayImageResolution=2 \
-dMonoImageResolution=2 \
-dFirstPage=33 \
-dLastPage=33 \
original.pdf
If you want to do it for all images on all pages, just skip the -dFirstPage and -dLastPage parameters.
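As a minimal sketch, the command above can be wrapped in a reusable shell function. The function name downsample_images and the DRY_RUN switch are hypothetical conveniences, not part of Ghostscript; the gs parameters are exactly the ones used above. Leaving out the page argument processes all pages.

```shell
#!/bin/sh
# Hypothetical helper: downsample all raster images in a PDF with
# Ghostscript's pdfwrite device. DRY_RUN=1 prints the command instead.
# Usage: downsample_images INPUT OUTPUT [RESOLUTION] [PAGE]
downsample_images() {
    src=$1
    dst=$2
    res=${3:-2}   # 1 ppi reportedly fails with Ghostscript; 2 ppi works
    page=${4:-}   # empty: process all pages

    set -- -o "$dst" -sDEVICE=pdfwrite \
        -dDownsampleColorImages=true \
        -dDownsampleGrayImages=true \
        -dDownsampleMonoImages=true \
        -dColorImageResolution="$res" \
        -dGrayImageResolution="$res" \
        -dMonoImageResolution="$res"
    if [ -n "$page" ]; then
        # The output file will then contain only this one page.
        set -- "$@" -dFirstPage="$page" -dLastPage="$page"
    fi
    set -- "$@" "$src"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo gs "$@"   # print the command only
    else
        gs "$@"
    fi
}
```

For example, downsample_images original.pdf small.pdf 2 33 mirrors the page-33 command, while DRY_RUN=1 downsample_images original.pdf small.pdf only prints the all-pages variant.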
If you want to remove all color information from images, convert them to grayscale in the same command (search other answers on Stack Overflow where the details are discussed).
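A sketch of that grayscale variant, assuming a Ghostscript release whose pdfwrite device supports -sColorConversionStrategy=Gray. The extra -dProcessColorModel=/DeviceGray is taken from the widely used recipe and may be redundant on newer releases. The function name downsample_gray and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: downsample images AND convert colors to gray in a
# single pdfwrite pass. DRY_RUN=1 prints the command instead of running it.
# Usage: downsample_gray INPUT OUTPUT [RESOLUTION]
downsample_gray() {
    res=${3:-2}
    # All arguments are expanded before `set` runs, so "$1" and "$2"
    # still refer to INPUT and OUTPUT here.
    set -- -o "$2" -sDEVICE=pdfwrite \
        -sColorConversionStrategy=Gray \
        -dProcessColorModel=/DeviceGray \
        -dDownsampleColorImages=true \
        -dDownsampleGrayImages=true \
        -dDownsampleMonoImages=true \
        -dColorImageResolution="$res" \
        -dGrayImageResolution="$res" \
        -dMonoImageResolution="$res" \
        "$1"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo gs "$@"   # print the command only
    else
        gs "$@"
    fi
}
```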
Update: Originally, I had proposed a resolution of 1 ppi. It seems this doesn't work with Ghostscript. I have now tested with 2 ppi, and this works.
Update 2: See also the following (new) question and its answer:
It provides some sample PostScript code that completely removes all (raster) images from the PDF, leaving the rest of the page layout unchanged.
It also reflects the expanded capabilities of Ghostscript, which can now selectively remove all text, all raster images, or all vector objects from a PDF, or any combination of these three types.
…cpdf (his own self-made tool, which is excellent!), because cpdf is not as universally available as Ghostscript. (3) cpdf is a payware tool. Even though there is a free-of-charge version ("community edition"), it is only legal to use for non-commercial purposes. (4) I did not ask for your reason; I asked the OP, because it may be useful to know in... – Harrow
You can use the draft option of cpdf:
cpdf -draft in.pdf -o out.pdf
This should work in most situations, but file a bug report if it doesn't do the right thing for you.
Disclosure: I am the author of cpdf.
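To apply the draft option to many files at once, a small loop is enough. This is only a sketch: the helper name draft_all, the -draft.pdf output suffix and the DRY_RUN switch are hypothetical; the only cpdf call used is the cpdf -draft in.pdf -o out.pdf form shown above.

```shell
#!/bin/sh
# Hypothetical batch helper: run `cpdf -draft` on every PDF given and write
# NAME-draft.pdf next to each input. DRY_RUN=1 only prints the commands.
draft_all() {
    for pdf in "$@"; do
        out="${pdf%.pdf}-draft.pdf"   # strip .pdf, append -draft.pdf
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo cpdf -draft "$pdf" -o "$out"
        else
            cpdf -draft "$pdf" -o "$out" || return 1   # stop on first failure
        fi
    done
}
```

For example, draft_all *.pdf processes every PDF in the current directory.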
…cpdf -remove-fonts in.pdf -o out.pdf but it leaves corrupted fonts / black blobs. Will look into that. – Maieutic
Time has passed, and development of Ghostscript has progressed...
Recent releases have the following new command-line parameters, which can be added to the command line:
-dFILTERIMAGE: produces output where all raster images are removed.
-dFILTERTEXT: produces output where all text elements are removed.
-dFILTERVECTOR: produces output where all vector drawings are removed.
Any two of these options can be combined.
Example command:
gs -o noimage.pdf -sDEVICE=pdfwrite -dFILTERIMAGE input.pdf
More details (including some illustrative screenshots) can be found in my answer to "How can I remove all images from a PDF?".
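Since the switches name what to remove, it can be handy to derive them from what you want to keep. The helper below is a hypothetical sketch (the name filter_flags_keep is made up); it only prints the three -dFILTER* switches listed above and refuses to run without at least one type to keep.

```shell
#!/bin/sh
# Hypothetical helper: given the object types to KEEP (text, image, vector),
# print the Ghostscript -dFILTER* switches that remove all the other types.
filter_flags_keep() {
    if [ $# -eq 0 ]; then
        echo "usage: filter_flags_keep TYPE... (text, image, vector)" >&2
        return 1
    fi
    keep_text=0 keep_image=0 keep_vector=0
    for t in "$@"; do
        case $t in
            text)   keep_text=1 ;;
            image)  keep_image=1 ;;
            vector) keep_vector=1 ;;
            *) echo "unknown type: $t" >&2; return 1 ;;
        esac
    done
    flags=""
    [ "$keep_image" = 1 ]  || flags="$flags -dFILTERIMAGE"
    [ "$keep_text" = 1 ]   || flags="$flags -dFILTERTEXT"
    [ "$keep_vector" = 1 ] || flags="$flags -dFILTERVECTOR"
    echo "${flags# }"   # drop the leading space
}
```

Usage: gs -o onlyText.pdf -sDEVICE=pdfwrite $(filter_flags_keep text) input.pdf. The command substitution is deliberately left unquoted so the switches split into separate arguments.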
For example:
gs -o noImages.pdf -sDEVICE=pdfwrite -dFILTERIMAGE input.pdf
gs -o noText.pdf -sDEVICE=pdfwrite -dFILTERTEXT input.pdf
gs -o noVectors.pdf -sDEVICE=pdfwrite -dFILTERVECTOR input.pdf
gs -o onlyImages.pdf -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERTEXT input.pdf
gs -o onlyText.pdf -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERIMAGE input.pdf
gs -o onlyVectors.pdf -sDEVICE=pdfwrite -dFILTERIMAGE -dFILTERTEXT input.pdf
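To produce the single-filter variants for several files in one go, the commands can be looped. This is a sketch only: split_variants, the output naming scheme and the DRY_RUN switch are hypothetical choices; each gs call is the same as the noImages / noText / noVectors commands above.

```shell
#!/bin/sh
# Hypothetical batch helper: for each input PDF, write three variants,
# each with one object type removed via Ghostscript's -dFILTER* switches.
# DRY_RUN=1 prints the commands instead of running them.
split_variants() {
    for pdf in "$@"; do
        base=${pdf%.pdf}
        for kind in IMAGE TEXT VECTOR; do
            case $kind in
                IMAGE)  name=noImages ;;
                TEXT)   name=noText ;;
                VECTOR) name=noVectors ;;
            esac
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo gs -o "$base-$name.pdf" -sDEVICE=pdfwrite "-dFILTER$kind" "$pdf"
            else
                gs -q -o "$base-$name.pdf" -sDEVICE=pdfwrite "-dFILTER$kind" "$pdf" || return 1
            fi
        done
    done
}
```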
To separate images and text into different layers, unfortunately there is no Free/Open Source Software utility available, and no free-as-in-beer one either...
This task can only be achieved with various payware solutions. Since you didn't exclude these in your question, but asked for 'whatever command-line tool possible', I'll tell you my favorite one:
A version for CLI usage (which includes a powerful SDK enabling lots of low-level PDF manipulations) is available and is supported on all major OS platforms, including Linux.
callas offers a fully featured, gratis test license, which is enabled for (I believe) 14 days.
…-blur 0x0 will turn a mixed text/image PDF page into a file where you only see pixels from the image, and none from text.... – Harrow
…convert -blur 0x0 in.pdf out.png on the same PDF, but it doesn't produce the image-only output here. Looks like a bug on my work PC. – Maieutic
…convert in.pdf -blur 0x0 out.png. – Harrow