UnrecognizedImageError - image insertion error - python-docx
I am trying to insert a WMF file into a docx document using python-docx, which produces the following traceback.

Traceback (most recent call last):
  File "C:/Users/ADMIN/PycharmProjects/ppt-to-word/ppt_reader.py", line 79, in <module>
    read_ppt(path, file)
  File "C:/Users/ADMIN/PycharmProjects/ppt-to-word/ppt_reader.py", line 73, in read_ppt
    write_docx(ppt_data, False)
  File "C:/Users/ADMIN/PycharmProjects/ppt-to-word/ppt_reader.py", line 31, in write_docx
    document.add_picture(slide_data.get('picture_location'), width=Inches(5.0))
  File "C:\Python34\lib\site-packages\docx\document.py", line 72, in add_picture
    return run.add_picture(image_path_or_stream, width, height)
  File "C:\Python34\lib\site-packages\docx\text\run.py", line 62, in add_picture
    inline = self.part.new_pic_inline(image_path_or_stream, width, height)
  File "C:\Python34\lib\site-packages\docx\parts\story.py", line 56, in new_pic_inline
    rId, image = self.get_or_add_image(image_descriptor)
  File "C:\Python34\lib\site-packages\docx\parts\story.py", line 29, in get_or_add_image
    image_part = self._package.get_or_add_image_part(image_descriptor)
  File "C:\Python34\lib\site-packages\docx\package.py", line 31, in get_or_add_image_part
    return self.image_parts.get_or_add_image_part(image_descriptor)
  File "C:\Python34\lib\site-packages\docx\package.py", line 74, in get_or_add_image_part
    image = Image.from_file(image_descriptor)
  File "C:\Python34\lib\site-packages\docx\image\image.py", line 55, in from_file
    return cls._from_stream(stream, blob, filename)
  File "C:\Python34\lib\site-packages\docx\image\image.py", line 176, in _from_stream
    image_header = _ImageHeaderFactory(stream)
  File "C:\Python34\lib\site-packages\docx\image\image.py", line 199, in _ImageHeaderFactory
    raise UnrecognizedImageError
docx.image.exceptions.UnrecognizedImageError

The image file is in .wmf format.

Any help or suggestion appreciated.

Rocky answered 1/6, 2019 at 8:15 Comment(0)
H
10

python-docx identifies the type of an image-file by "recognizing" its distinctive header. In this way it can distinguish JPEG from PNG, from TIFF, etc. This is much more reliable than mapping a filename extension and much more convenient than requiring the user to tell you the type. It's a pretty common approach.
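
As a rough illustration of header-based recognition, the sketch below checks the leading bytes of a file against a few well-known magic numbers. It is not python-docx's actual implementation; `MAGIC_NUMBERS`, `sniff_image_type` and `sniff_file` are hypothetical names used only for this example.

```python
# Minimal sketch of header ("magic number") sniffing.
# This is an illustration only, not python-docx's own code.

# (signature bytes, format name); all names here are hypothetical.
MAGIC_NUMBERS = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
    (b"BM", "bmp"),
)


def sniff_image_type(blob):
    """Return a format name for the leading bytes in `blob`, or None."""
    for signature, name in MAGIC_NUMBERS:
        if blob.startswith(signature):
            return name
    return None


def sniff_file(path):
    """Read only the first bytes of the file at `path` and sniff them."""
    with open(path, "rb") as f:
        return sniff_image_type(f.read(16))
```

A file whose leading bytes match none of the entries yields `None`, which is analogous to the situation in which python-docx raises UnrecognizedImageError.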

This error indicates python-docx is not finding a header it recognizes. Windows Metafile format (WMF) can be tricky this way; there's a lot of leeway in the proprietary spec and variation in file specimens in the field.
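
To see which kind of header a given file actually carries, a few bytes can be inspected by hand. The sketch below assumes the commonly documented layouts: a placeable WMF begins with the key 0x9AC6CDD7, a standard WMF begins with a type word of 1 or 2, a header size of 9 words and a version of 0x0100 or 0x0300, and an EMF carries the signature " EMF" at byte offset 40. `classify_metafile` is a hypothetical helper, not part of python-docx or Pillow.

```python
import struct

# Assumed header layouts (see the lead-in); all names are hypothetical.
PLACEABLE_WMF_KEY = b"\xd7\xcd\xc6\x9a"  # 0x9AC6CDD7 stored little-endian
EMF_RECORD_TYPE = b"\x01\x00\x00\x00"    # first record type of an EMF file
EMF_SIGNATURE = b" EMF"                  # expected at byte offset 40


def classify_metafile(blob):
    """Guess the metafile flavor from the leading bytes of `blob`."""
    if blob[:4] == PLACEABLE_WMF_KEY:
        return "placeable-wmf"
    if len(blob) >= 44 and blob[:4] == EMF_RECORD_TYPE and blob[40:44] == EMF_SIGNATURE:
        return "emf"
    if len(blob) >= 6:
        # Three little-endian 16-bit words: type, header size in words, version.
        file_type, header_words, version = struct.unpack_from("<HHH", blob)
        if file_type in (1, 2) and header_words == 9 and version in (0x0100, 0x0300):
            return "standard-wmf"
    return "unknown"
```

Calling it on the first 64 bytes or so of the problem file shows whether the file is a metafile at all before any conversion is attempted.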

To fix this, I recommend you read the file with something that does recognize it (I would start with Pillow) and have it "convert" it into the same or another format, hopefully correcting the header in the process.

First I would try just reading it and saving it as WMF (or perhaps EMF if that's an option). This might be enough to do the trick. If you have to change to an intermediate format and then back, that could be lossy, but maybe better than nothing.

ImageMagick might be another good choice to try because it probably has better coverage than Pillow does.
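
If ImageMagick is the tool of choice, the conversion can be driven from Python with `subprocess`. This is a minimal sketch under a few assumptions: ImageMagick is installed and on the PATH, its executable is `magick` (version 7) or `convert` (version 6), and the installed build can actually read WMF, which depends on how it was built. `build_convert_command` and `convert_with_imagemagick` are hypothetical helper names.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(src, dst, executable=None):
    """Return the argument list for an ImageMagick format conversion."""
    # Prefer the ImageMagick 7 entry point, fall back to the version 6 name.
    exe = executable or shutil.which("magick") or shutil.which("convert")
    if exe is None:
        raise FileNotFoundError("ImageMagick was not found on PATH")
    # ImageMagick infers the output format from the destination suffix.
    return [exe, str(src), str(dst)]


def convert_with_imagemagick(src, dst):
    """Convert `src` to `dst`; raises CalledProcessError if ImageMagick fails."""
    subprocess.run(build_convert_command(src, dst), check=True)
    return Path(dst)
```

For example, `convert_with_imagemagick('picture.wmf', 'picture.png')` would write a PNG that can then be passed to `add_picture()`. On Windows, note that `convert` is also the name of an unrelated system utility, so passing `executable` explicitly may be safer there.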

Hypnosis answered 1/6, 2019 at 16:38 Comment(1)
Thanks for the clarification @Scanny. I have used Pillow to solve the issue. - Rocky
A
3

Explanation

The image module of python-docx recognizes different picture file formats by matching each file against its SIGNATURES table.



Format

1.jpg

Use an image converter to convert 1.jpg to different file formats.

Use magic (the python-magic package) to get the MIME type of each file.

File format   MIME type                   add_picture()
.jpg          image/jpeg
.png          image/png
.jfif         image/jpeg
.exif
.gif          image/gif
.tiff         image/tiff
.bmp          image/x-ms-bmp
.eps          application/postscript      ×
.hdr          application/octet-stream    ×
.ico          image/x-icon                ×
.svg          image/svg+xml               ×
.tga          image/x-tga                 ×
.wbmp         application/octet-stream    ×
.webp         image/webp                  ×

(× means add_picture() fails for that format.)



How to solve

Plan A

Convert unsupported formats to a supported format such as .jpg.

Install

pip install pillow

Code

from pathlib import Path

from PIL import Image


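def image_to_jpg(image_path):
    """Return a path python-docx can load, converting to JPEG if needed."""
    path = Path(image_path)
    # Compare the suffix case-insensitively so that '1.JPG' is not re-encoded.
    if path.suffix.lower() not in {'.jpg', '.png', '.jfif', '.exif', '.gif', '.tiff', '.bmp'}:
        jpg_image_path = f'{path.parent / path.stem}_result.jpg'
        # JPEG has no alpha channel, so convert to RGB before saving.
        Image.open(image_path).convert('RGB').save(jpg_image_path)
        return jpg_image_path
    return image_path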


if __name__ == '__main__':
    from docx import Document

    document = Document()
    document.add_picture(image_to_jpg('1.jpg'))
    document.add_picture(image_to_jpg('1.webp'))
    document.save('test.docx')



Plan B

First, try to add the picture to a Word document manually. If that succeeds, Word supports the format. Then extend the library: subclass BaseImageHeader, implement its from_stream() method, and add the new image format to SIGNATURES.



Lack of file suffix

Rename 1.jpg to 1 (no file suffix).

from docx import Document

document = Document()
document.add_picture('1')
document.save('test.docx')

It will show this: (image not preserved)

Instead, pass the file as an open binary stream:

from docx import Document

document = Document()
document.add_picture(open('1', mode='rb'))
document.save('test.docx')



Conclusion

import io
from pathlib import Path

import magic
from PIL import Image


def image_to_jpg(image_path_or_stream):
    """Return a binary stream python-docx can load, converting to JPEG if needed."""
    f = io.BytesIO()
    if isinstance(image_path_or_stream, str):
        # A path: decide by file suffix (compared case-insensitively).
        path = Path(image_path_or_stream)
        if path.suffix.lower() in {'.jpg', '.png', '.jfif', '.exif', '.gif', '.tiff', '.bmp'}:
            f = open(image_path_or_stream, mode='rb')
        else:
            Image.open(image_path_or_stream).convert('RGB').save(f, format='JPEG')
    else:
        # A stream: decide by the MIME type sniffed from its content.
        buffer = image_path_or_stream.read()
        mime_type = magic.from_buffer(buffer, mime=True)
        if mime_type in {'image/jpeg', 'image/png', 'image/gif', 'image/tiff', 'image/x-ms-bmp'}:
            f = image_path_or_stream
        else:
            Image.open(io.BytesIO(buffer)).convert('RGB').save(f, format='JPEG')
    f.seek(0)  # rewind so the returned stream is read from the start
    return f


if __name__ == '__main__':
    from docx import Document

    document = Document()
    document.add_picture(image_to_jpg('1.jpg'))
    document.add_picture(image_to_jpg('1.webp'))
    document.add_picture(image_to_jpg(open('1.jpg', mode='rb')))
    document.add_picture(image_to_jpg(open('1', mode='rb')))  # copy 1.webp and rename it to 1
    document.save('test.docx')
Ammonium answered 23/12, 2021 at 9:6 Comment(1)
Thank you! I had a problem with docxtpl when attempting to render an image in a docx file. Adding .convert('RGB') before .save(...) solved the issue. - Manx
