I want to remove all images from a PDF file.
The page layouts should not change. All images should be replaced by empty space.
- How can this be achieved with the help of Ghostscript and the appropriate PostScript code?
I'm putting up the answer myself, but the actual code is by courtesy of Chris Liddell, Ghostscript developer.
I used his original PostScript code and stripped off its other functions. Only the function which removes raster images remains. Other graphical page objects -- text sections, patterns and vector objects -- should remain untouched.
Copy the following code and save it as remove-images.ps:
%!PS

% Run as:
%
%      gs ..... -dFILTERIMAGE -dDELAYBIND -dWRITESYSTEMDICT \
%                 ..... remove-images.ps <your-input-file>
%
% derived from Chris Liddell's original 'filter-obs.ps' script
% Adapted by @pdfkungfoo (on Twitter)

currentglobal true setglobal

32 dict begin

% Print the string on the stack only if -dDUMPDEBUG is set; otherwise drop it.
/debugprint {
  systemdict /DUMPDEBUG .knownget
  { {print flush} if }
  { pop } ifelse
} bind def

% /SWITCH pushnulldevice: if SWITCH is defined as true in systemdict
% (e.g. via -dFILTERIMAGE), save the graphics state and switch to the
% null device, keeping the current matrix.
/pushnulldevice {
  systemdict exch .knownget not
  {
    //false
  } if
  {
    gsave
    matrix currentmatrix
    nulldevice
    setmatrix
  } if
} bind def

% /SWITCH popnulldevice: undo pushnulldevice, keeping the current point
% if there is one.
/popnulldevice {
  systemdict exch .knownget not
  {
    //false
  } if
  {
    % this is hacky - some operators clear the current point
    % i.e.
    { currentpoint } stopped
    { grestore }
    { grestore moveto } ifelse
  } if
} bind def

% /newname /opname sgd: define newname as systemdict's current value of opname.
/sgd {systemdict exch get def} bind def

systemdict begin

% Keep the original operators available under new names.
/_image /image sgd
/_imagemask /imagemask sgd
/_colorimage /colorimage sgd

% Redefine each image operator so that it paints to the null device
% when -dFILTERIMAGE is set.
/image {
  (\nIMAGE\n) //debugprint exec
  /FILTERIMAGE //pushnulldevice exec
  _image
  /FILTERIMAGE //popnulldevice exec
} bind def

/imagemask {
  (\nIMAGEMASK\n) //debugprint exec
  /FILTERIMAGE //pushnulldevice exec
  _imagemask
  /FILTERIMAGE //popnulldevice exec
} bind def

/colorimage {
  (\nCOLORIMAGE\n) //debugprint exec
  /FILTERIMAGE //pushnulldevice exec
  _colorimage
  /FILTERIMAGE //popnulldevice exec
} bind def

end
end

.bindnow

setglobal
Now run this command:
gs -o no-more-images-in-sample.pdf \
-sDEVICE=pdfwrite \
-dFILTERIMAGE \
-dDELAYBIND \
-dWRITESYSTEMDICT \
remove-images.ps \
sample.pdf
I tested the code with the official PDF specification as input, and it worked. The following two screenshots show page 750 of the input and output PDFs:
If you wonder why something that looks like an image is still on the output page: it is not really a raster image but a 'pattern' in the original file, and therefore it is not removed.
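One way to check the result without comparing screenshots is to list the raster images still embedded in the output file. The sketch below is only an illustration: it assumes the pdfimages tool from Poppler is installed (it is not part of Ghostscript) and that its -list output begins with two header lines. The helper name count_images is made up for this example.

```shell
# Hypothetical helper: count the image rows in `pdfimages -list` output read
# from stdin. Assumes the listing begins with two header lines (column names
# and a dashed rule), which are skipped.
count_images() {
    awk 'NR > 2 { n++ } END { print n + 0 }'
}

# Usage (needs Poppler's pdfimages on the PATH):
#   pdfimages -list sample.pdf | count_images
#   pdfimages -list no-more-images-in-sample.pdf | count_images
```

A count of 0 for the output file would suggest that no raster images remain. A non-zero count does not necessarily mean the script failed, since some image-like content may be stored in a form the script does not touch.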
/lib/ subdir. What you let me hope for, is even better :) – Solipsism
remove-images.ps is missing - it should be the second to last argument, before sample.pdf. – Telemotor
Meanwhile, the latest Ghostscript releases have a much nicer and easier-to-use method of removing all images from a PDF. The parameter to add to the command line is -dFILTERIMAGE:
gs -o noimages.pdf -sDEVICE=pdfwrite -dFILTERIMAGE input.pdf
Even better, you can also remove all text or all vector drawing elements from a PDF by specifying -dFILTERTEXT or -dFILTERVECTOR.
Of course, you can also combine any of these -dFILTER* parameters to achieve a required result. (Combining all three will of course result in "empty" pages.)
Here is the screenshot from an example PDF page which contains all 3 types of content mentioned above:
Screenshot of original PDF page containing "image", "vector" and "text" elements.
Running the following 6 commands will create all 6 possible variations of remaining contents:
gs -o noIMG.pdf -sDEVICE=pdfwrite -dFILTERIMAGE input.pdf
gs -o noTXT.pdf -sDEVICE=pdfwrite -dFILTERTEXT input.pdf
gs -o noVCT.pdf -sDEVICE=pdfwrite -dFILTERVECTOR input.pdf
gs -o onlyTXT.pdf -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERIMAGE input.pdf
gs -o onlyIMG.pdf -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERTEXT input.pdf
gs -o onlyVCT.pdf -sDEVICE=pdfwrite -dFILTERIMAGE -dFILTERTEXT input.pdf
The following image illustrates the results:
Top row, from left: all "text" removed; all "images" removed; all "vectors" removed. Bottom row, from left: only "text" kept; only "images" kept; only "vectors" kept.
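If you need these variations often, the flag combinations can be assembled by a small shell function instead of being typed out. This is just a sketch: gs_filter_cmd is a hypothetical helper, not something shipped with Ghostscript. It only prints the command, so you can inspect it before running it, and it does not quote file names, so it assumes names without spaces.

```shell
# Hypothetical helper (not part of Ghostscript): print the gs command that
# removes the given content types from a PDF.
# Usage: gs_filter_cmd INPUT OUTPUT [image|text|vector]...
gs_filter_cmd() {
    in=$1
    out=$2
    shift 2
    flags=""
    for kind in "$@"; do
        case $kind in
            image)  flags="$flags -dFILTERIMAGE" ;;
            text)   flags="$flags -dFILTERTEXT" ;;
            vector) flags="$flags -dFILTERVECTOR" ;;
            *) echo "unknown content type: $kind" >&2; return 1 ;;
        esac
    done
    # Nothing is executed here; the command is only printed.
    echo "gs -o $out -sDEVICE=pdfwrite$flags $in"
}

# Example: print the command that keeps only the text
#   gs_filter_cmd input.pdf onlyTXT.pdf vector image
```

To actually run a printed command, pipe the output to sh, for example: gs_filter_cmd input.pdf noIMG.pdf image | sh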