How can I remove all images from a PDF?

I want to remove all images from a PDF file.

The page layouts should not change. All images should be replaced by empty space.

  • How can this be achieved with the help of Ghostscript and the appropriate PostScript code?
Solipsism answered 15/4, 2015 at 17:52 Comment(1)
So who thought it was a good idea to downvote this question, and for what reason? Feel free to downvote, but please leave a comment telling me why.Solipsism

I'm putting up the answer myself, but the actual code is by courtesy of Chris Liddell, Ghostscript developer.

I used his original PostScript code and stripped off its other functions. Only the function which removes raster images remains. Other graphical page objects -- text sections, patterns and vector objects -- should remain untouched.

Copy the following code and save it as remove-images.ps:

%!PS

% Run as:
%
%      gs ..... -dFILTERIMAGE -dDELAYBIND -dWRITESYSTEMDICT \
%                 ..... remove-images.ps <your-input-file>
%
% derived from Chris Liddell's original 'filter-obs.ps' script
% Adapted by @pdfkungfoo (on Twitter)

currentglobal true setglobal

32 dict begin

/debugprint     { systemdict /DUMPDEBUG .knownget { {print flush} if} 
                {pop} ifelse } bind def

/pushnulldevice {
  systemdict exch .knownget not
  {
    //false
  } if

  {
    gsave
    matrix currentmatrix
    nulldevice
    setmatrix
  } if
} bind def

/popnulldevice {
  systemdict exch .knownget not
  {
    //false
  } if
  {
    % this is hacky - some operators clear the current point
    % i.e.
    { currentpoint } stopped
    { grestore }
    { grestore moveto} ifelse
  } if
} bind def

/sgd {systemdict exch get def} bind def

systemdict begin

/_image /image sgd
/_imagemask /imagemask sgd
/_colorimage /colorimage sgd

/image {
  (\nIMAGE\n) //debugprint exec
  /FILTERIMAGE //pushnulldevice exec
  _image
  /FILTERIMAGE //popnulldevice exec
} bind def

/imagemask
{
  (\nIMAGEMASK\n) //debugprint exec
  /FILTERIMAGE //pushnulldevice exec
  _imagemask
  /FILTERIMAGE //popnulldevice exec
} bind def

/colorimage
{
  (\nCOLORIMAGE\n) //debugprint exec
  /FILTERIMAGE //pushnulldevice exec
  _colorimage
  /FILTERIMAGE //popnulldevice exec
} bind def

end
end

.bindnow

setglobal

Now run this command:

gs -o no-more-images-in-sample.pdf \
   -sDEVICE=pdfwrite               \
   -dFILTERIMAGE                   \
   -dDELAYBIND                     \
   -dWRITESYSTEMDICT               \
    remove-images.ps               \
    sample.pdf

I tested the code on the official PDF specification document, and it worked. The following two screenshots show page 750 of the input and output PDFs:

If you wonder why something that looks like an image is still on the output page: it is not really a raster image but a 'pattern' in the original file, and therefore it is not removed.
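
One way to double-check the result is to count the raster images before and after. The sketch below assumes the pdfimages tool from Poppler is installed (it is not part of Ghostscript and is not mentioned above) and that `pdfimages -list` prints two header lines followed by one row per image. The function names are hypothetical. Note that image masks are listed as rows too, and patterns like the one described above are not.

```shell
# Count image rows in `pdfimages -list` output read from stdin.
# Assumption: the first two lines are the column header and a separator.
count_image_rows() {
  awk 'NR > 2 && NF > 0 { n++ } END { print n + 0 }'
}

# Count the raster images that pdfimages finds in a PDF file.
count_pdf_images() {
  pdfimages -list "$1" | count_image_rows
}

# Usage:
#   count_pdf_images sample.pdf
#   count_pdf_images no-more-images-in-sample.pdf
```

If the second count is lower than the first, the filter removed at least some images; a remaining nonzero count would be worth inspecting with `pdfimages -list` directly.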

Solipsism answered 15/4, 2015 at 18:0 Comment(7)
FWIW I'm hoping to have a system level version of Chris's code built into GS in a future release. So this will be possible on all devices without additional work. Don't hold your breath though....Pauly
@KenS: After I discovered the link to Chris' code in the IRC logs 2 hours ago, I was hoping that he'd include it alongside the other *.ps files in the GS /lib/ subdir. What you let me hope for is even better :)Solipsism
We won't include the PostScript as such, no. I'm working on some internal stuff which will work with all the interpreters. On the down side, I've been working on it for nearly a year now.Pauly
In the command given, the reference to remove-images.ps is missing - it should be the second to last argument, before sample.pdf.Telemotor
@perpeduumimmobile: Ha!, you are right! Thanks for spotting + reporting it.Solipsism
@KenS: is the "system level version of Chris's code" now in Git sources, as part of the "subclassing" stuff?Solipsism
Well spotted, it is indeed, committed this very afternoon. It's the 'object filtering', but be aware that it's not precisely the same as Chris' since it works in the graphics library, not the language. Though this does have the advantage that it works with all the possible input languages.Pauly

Meanwhile, the latest Ghostscript releases have a much nicer and easier-to-use method of removing all images from a PDF. The parameter to add to the command line is -dFILTERIMAGE:

 gs -o noimages.pdf -sDEVICE=pdfwrite -dFILTERIMAGE input.pdf

Even better, you can also remove all text or all vector drawing elements from a PDF by specifying -dFILTERTEXT or -dFILTERVECTOR.

You can also use any combination of these -dFILTER* parameters to achieve the required result. (Combining all three will of course result in "empty" pages.)
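
As a rough sketch of how such combinations could be scripted, the helper below maps content type names to the three -dFILTER* parameters and passes them to gs. The function names `gs_filter_flags` and `remove_from_pdf` are hypothetical and not part of Ghostscript; only the three flags and the gs command line come from this answer.

```shell
# Hypothetical helper: print the -dFILTER* flags that remove the named
# content types (image, text, vector), separated by spaces.
gs_filter_flags() {
  flags=""
  for kind in "$@"; do
    case "$kind" in
      image)  flag="-dFILTERIMAGE" ;;
      text)   flag="-dFILTERTEXT" ;;
      vector) flag="-dFILTERVECTOR" ;;
      *) echo "unknown content type: $kind" >&2; return 1 ;;
    esac
    flags="${flags:+$flags }$flag"
  done
  printf '%s\n' "$flags"
}

# Hypothetical wrapper: remove the named content types from a PDF.
# Usage: remove_from_pdf input.pdf output.pdf image text
remove_from_pdf() {
  in=$1; out=$2; shift 2
  flags=$(gs_filter_flags "$@") || return 1
  # $flags is left unquoted on purpose so that sh/bash split it
  # into separate arguments.
  gs -o "$out" -sDEVICE=pdfwrite $flags "$in"
}
```

For example, `remove_from_pdf input.pdf onlyVCT.pdf image text` would run gs with -dFILTERIMAGE and -dFILTERTEXT, leaving only the vector content.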

Here is the screenshot from an example PDF page which contains all 3 types of content mentioned above:


Screenshot of original PDF page containing "image", "vector" and "text" elements.


Running the following 6 commands will create all 6 possible variations of remaining contents:

 gs -o noIMG.pdf   -sDEVICE=pdfwrite -dFILTERIMAGE                input.pdf
 gs -o noTXT.pdf   -sDEVICE=pdfwrite -dFILTERTEXT                 input.pdf
 gs -o noVCT.pdf   -sDEVICE=pdfwrite -dFILTERVECTOR               input.pdf

 gs -o onlyTXT.pdf -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERIMAGE input.pdf 
 gs -o onlyIMG.pdf -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERTEXT  input.pdf
 gs -o onlyVCT.pdf -sDEVICE=pdfwrite -dFILTERIMAGE  -dFILTERTEXT  input.pdf

The following image illustrates the results:


Top row, from left: all "text" removed; all "images" removed; all "vectors" removed. Bottom row, from left: only "text" kept; only "images" kept; only "vectors" kept.


Solipsism answered 16/6, 2016 at 12:13 Comment(4)
Can we remove specific vectors? If so, how can the different vectors be identified in the PDF itself? I tested this and it works, but it also removes some vectors that I want to keep.Expanded
@JayChakra: No, you cannot remove specific vectors. (You could limit the removal of all vectors to a certain page or range of pages, though, and then re-insert these pages into the original PDF document.)Solipsism
Your images don't seem ordered the way you entered the commands above. "Filtering" X here means not including X in the output, right?Lichenology
@Geremia: You were right about the order of the commands. I've changed it now, thank you. (At least the image capture had already held the correct descriptions.) About the names of the parameters: I agree that the "FILTERxxx" is not the best choice -- maybe naming them "REMOVExxx" would have been more user friendly.Solipsism
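
The page-range idea from the comment above could be sketched roughly as follows. This is one possible reading of it, not a tested recipe: it splits the document into up to three segments with Ghostscript's -dFirstPage and -dLastPage options, applies -dFILTERVECTOR to the middle segment only, and concatenates the pieces with pdfwrite. The function names are hypothetical, the total page count must be supplied by the caller, and passing pages through pdfwrite several times may not preserve everything in the original file (bookmarks and links, for instance).

```shell
# Hypothetical helper: print one "FIRST LAST MODE" line per segment,
# where MODE is "keep" or "filter".
plan_segments() {
  first=$1; last=$2; total=$3
  if [ "$first" -gt 1 ]; then
    echo "1 $((first - 1)) keep"
  fi
  echo "$first $last filter"
  if [ "$last" -lt "$total" ]; then
    echo "$((last + 1)) $total keep"
  fi
}

# Hypothetical wrapper: remove vectors from pages FIRST..LAST only.
# Usage: filter_vector_pages input.pdf output.pdf FIRST LAST TOTAL_PAGES
filter_vector_pages() {
  in=$1; out=$2
  plan=$(plan_segments "$3" "$4" "$5")
  tmp=$(mktemp -d) || return 1
  set --            # reuse the positional parameters as the list of parts
  n=0
  while read -r a b mode; do
    n=$((n + 1))
    part="$tmp/part$n.pdf"
    flag=""
    if [ "$mode" = filter ]; then flag="-dFILTERVECTOR"; fi
    # $flag is unquoted on purpose: it expands to nothing for "keep".
    gs -o "$part" -sDEVICE=pdfwrite $flag \
       -dFirstPage="$a" -dLastPage="$b" "$in" </dev/null || return 1
    set -- "$@" "$part"
  done <<EOF
$plan
EOF
  # Concatenate the segments in order, then clean up.
  gs -o "$out" -sDEVICE=pdfwrite "$@" && rm -rf "$tmp"
}
```

For a 10-page document, `filter_vector_pages input.pdf output.pdf 3 4 10` would filter vectors on pages 3 and 4 and leave the other pages unfiltered.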