How to get a picture from a *.docx, with all changes applied, using POI?

Let's first review the relationship between a .docx file and its pictures:

As I understand it, a *.docx file stores the original pictures (the pictures as they were when you copied/pasted them into Word). Every time you use one of those pictures, Word makes a "link" to the original.
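You can check this yourself, because a .docx is a ZIP archive: the image bytes are stored as parts under `word/media/`, and each place a picture appears in `word/document.xml` points at one of them through a relationship ID such as `<a:blip r:embed="rId5"/>`. Below is a minimal sketch, using only the JDK's `java.util.zip`, that lists the stored image parts; the class name `DocxMedia` is hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class DocxMedia {
    /** Returns the names of all parts stored under word/media/ in a .docx stream. */
    public static List<String> listMedia(InputStream docx) throws IOException {
        List<String> names = new ArrayList<>();
        // Closing the ZipInputStream also closes the underlying stream
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Stored (unmodified) image parts live in this folder
                if (!entry.isDirectory() && entry.getName().startsWith("word/media/")) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            listMedia(in).forEach(System.out::println);
        }
    }
}
```

Running it on a document with a cropped picture should still list the full-size original file, which is consistent with what the question describes.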

If you then change that picture (for example, resize, crop, or recolor it), Word remembers your changes by modifying the "link" (adding some special tags) rather than the stored image. That's great, because you never lose the quality of your picture!
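In the document XML those "special tags" sit in the picture's DrawingML markup. For a crop, Word writes an element such as `<a:srcRect l="25000" r="10000"/>` inside `<pic:blipFill>`, where each attribute is the share of the original trimmed from that edge in thousandths of a percent (25000 = 25%), and the displayed size is written as `<wp:extent cx="914400" cy="457200"/>` in EMUs (914400 per inch). The following is a minimal sketch that reads those values from a drawing fragment with the JDK's namespace-aware DOM parser. `PictureTags` is a hypothetical class, and it assumes the transitional, integer form of the attributes (the strict format may use values such as `"25%"`, which this does not parse).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class PictureTags {
    static final String A = "http://schemas.openxmlformats.org/drawingml/2006/main";
    static final String WP = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing";

    /** Crop fractions (0..1) trimmed from each edge, plus displayed size in EMUs. */
    public static final class Tags {
        public double left, top, right, bottom;
        public long cx, cy;
    }

    public static Tags read(String drawingXml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // needed for getElementsByTagNameNS
        Document doc = f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(drawingXml.getBytes(StandardCharsets.UTF_8)));
        Tags t = new Tags();
        NodeList rects = doc.getElementsByTagNameNS(A, "srcRect");
        if (rects.getLength() > 0) {
            Element r = (Element) rects.item(0);
            t.left = fraction(r, "l");
            t.top = fraction(r, "t");
            t.right = fraction(r, "r");
            t.bottom = fraction(r, "b");
        }
        NodeList ext = doc.getElementsByTagNameNS(WP, "extent");
        if (ext.getLength() > 0) {
            Element e = (Element) ext.item(0);
            t.cx = Long.parseLong(e.getAttribute("cx"));
            t.cy = Long.parseLong(e.getAttribute("cy"));
        }
        return t;
    }

    // A missing attribute means "no crop on that edge"; 100000 = 100%
    private static double fraction(Element r, String name) {
        String v = r.getAttribute(name);
        return v.isEmpty() ? 0.0 : Integer.parseInt(v) / 100000.0;
    }
}
```

In a POI program, the input could come from the run's underlying XML (`XWPFRun.getCTR()`), but that wiring is an assumption and is not shown here.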

Let's get a picture from our *.docx file. To do that, I use this code snippet:

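import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import javax.imageio.ImageIO;
import org.apache.poi.xwpf.usermodel.*;

try (FileInputStream in = new FileInputStream(pathToFile);
     XWPFDocument wordDoc = new XWPFDocument(in)) {
    for (XWPFParagraph p : wordDoc.getParagraphs()) {
        for (XWPFRun run : p.getRuns()) {
            for (XWPFPicture pic : run.getEmbeddedPictures()) {
                // Bytes of the image part as stored in the .docx (the original)
                byte[] img = pic.getPictureData().getData();
                File outputfile = new File(pathToOutputFile);
                BufferedImage image = ImageIO.read(new ByteArrayInputStream(img));
                if (image != null) { // ImageIO cannot decode every format, e.g. EMF/WMF
                    ImageIO.write(image, "png", outputfile);
                }
            }
        }
    }
}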

But this way I get the original pictures from the *.docx. If, for example, you cropped a section out of your picture and gave me the result, I would still find the whole image in outputfile. That's not good.
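Because POI hands back only the stored original, one possible workaround is to re-apply the recorded edits yourself. Below is a minimal sketch for crop and resize only. It assumes you have already read the four crop fractions from the picture's `<a:srcRect>` element and the displayed size in EMUs from its `<wp:extent>` element, and it assumes a 96 DPI output resolution. Recoloring effects (e.g. `<a:grayscl/>` inside `<a:blip>`) are not handled, and the result may not match Word's rendering pixel for pixel. `CropApplier`, `applyCrop`, and `resizeToExtent` are hypothetical names.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class CropApplier {
    /** EMUs per pixel at an assumed 96 DPI (914400 EMU per inch / 96). */
    static final int EMU_PER_PX_96DPI = 9525;

    /** Trims the given fractions (0..1) off each edge of the original image. */
    public static BufferedImage applyCrop(BufferedImage src,
            double left, double top, double right, double bottom) {
        int x = (int) Math.round(src.getWidth() * left);
        int y = (int) Math.round(src.getHeight() * top);
        int w = (int) Math.round(src.getWidth() * (1.0 - left - right));
        int h = (int) Math.round(src.getHeight() * (1.0 - top - bottom));
        // Clamp so the sub-rectangle stays inside the source (negative crops extend it)
        x = Math.max(0, Math.min(x, src.getWidth() - 1));
        y = Math.max(0, Math.min(y, src.getHeight() - 1));
        w = Math.max(1, Math.min(w, src.getWidth() - x));
        h = Math.max(1, Math.min(h, src.getHeight() - y));
        // getSubimage shares pixel data with src rather than copying it
        return src.getSubimage(x, y, w, h);
    }

    /** Scales the cropped image to the displayed size given in EMUs. */
    public static BufferedImage resizeToExtent(BufferedImage img, long cx, long cy) {
        int w = (int) Math.max(1, Math.round(cx / (double) EMU_PER_PX_96DPI));
        int h = (int) Math.max(1, Math.round(cy / (double) EMU_PER_PX_96DPI));
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = out.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(img, 0, 0, w, h, null);
        g.dispose();
        return out;
    }
}
```

Cropping before scaling mirrors the order of the markup, where the crop applies to the original and the extent gives the final on-page size.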

Does anyone know how to get the picture with all the changes someone made to it in Word?

Hundredweight, 24/3/2014 at 19:26. Comments (2):
Babblement: Probably not 100% what you are looking for, but in Word you can use "Compress Pictures", which discards the originals and keeps only the processed pictures.
Hundredweight: You are absolutely right, and that should help, but it is not a solution, because we can't guarantee that users will do this trick for us :(
