How to reduce size of RTF with embedded images?
S

6

16

We have some code which produces an RTF document from an RTF template. It basically does a string search and replace of special tags within the RTF file. This is accessible via a web page.

Typically, the processing time for this is really quick.

However, we need to embed an image within a template. We've been embedding these as JPEG images using Word's "Insert/Picture/From File..." functionality. But we've found that the resulting RTF file size depends massively on the image.

For example, I've inserted a 20k JPEG logo (which is basically a solid background with some text). The RTF file increased in size from around 390k (without the image) to 510k (with the image).

Then we inserted a JPEG containing a screenshot, i.e. the image contains text, multiple colours, etc. The JPEG is around 150k. Using this image, the RTF file increased in size from 390k to 3.5MB.

So the encoding that Word uses for storing images in an RTF doesn't scale linearly with the JPEG's file size. I'm guessing it depends on what is in the JPEG image.

I need to keep the RTF templates as small as possible to keep our file processing times to a minimum.

  • Does anyone have any ideas on how to minimize the size of the RTF files with embedded images?
  • Is there any way of controlling the encoding that Word uses? I can't see any options anywhere.
  • Does anyone know what type of binary encoding Word/RTF uses?

Thanks in advance.

Shockey answered 10/9, 2009 at 12:39 Comment(1)
Not that I have an answer, but it is almost surely because it is being embedded as an uncompressed bitmap, rather than a compressed representation like JPEG. - Hrvatska
P
5

An image in an RTF file gets stored as a WMF, uncompressed. On a Mac, it would be macpict. Your best bet to keep the file size down is to link the image to the document rather than insert a copy in the document. The trade-off is that you have to keep the files together.

EDIT Is compressing the RTF an option? Using zip/rar, you'll get your file size back, but you'll have to uncompress first, obviously. There are supposed to be tools that will do RTF compression, but I have never used them.

Physiography answered 10/9, 2009 at 14:8 Comment(3)
Thanks. Zipping wouldn't help - I'd still need to unzip to process the file. It's not the file storage size that's my problem - it's the time taken to process the RTF. I don't understand about the linking - I'm probably lacking in Word skills... is it possible to get Word to hyperlink to a URL and display the contents of that URL in the document? I can easily make my images available via a URL. As long as the image appears in the document to the reader and the reader doesn't have to do anything to get the image, then I'd be happy (i.e. I don't want my users to have to click a link). - Shockey
Adding a hyperlink is easy either from within Word itself or VBA, but sorry, I don't know how to have the image be visible inside the RTF doc without having a copy of the WMF inside. Screenshots tend to be far larger than they need to be if you have non-white backgrounds, for example. You might consider editing your images and saving them as BMPs. The BMP size will give you an idea of how big the WMF will be. How much color information do you lose saving as a 16 bit image? - Physiography
Sorry - I meant 16 color image. Just saved a dump of my monitor - originally a 24bit 3.5M image. Saved as 16 color, it's 641K. The image does take some damage, but it's still 'serviceable'. - Physiography
E
18

Here is the best solution

http://support.microsoft.com/kb/224663

Excerpt:

SYMPTOMS

When you save a Microsoft Word document that contains an EMF, PNG, GIF, or JPEG graphic as a different file format (for example, Word 6.0/95 (.doc) or Rich Text Format (.rtf)), the file size of the document may dramatically increase.

For example, a Microsoft Word 2000 document that contains a JPEG graphic that is saved as a Word 2000 document may have a file size of 45,568 bytes (44.5KB). However, when you save this file as Word 6.0/95 (.doc) or as Rich Text Format (.rtf), the file size may grow to 1,289,728 bytes (1.22MB).

CAUSE

This functionality is by design in Microsoft Word. If an EMF, a PNG, a GIF, or a JPEG graphic is inserted into a Word document, when the document is saved, two copies of the graphic are saved in the document. Graphics are saved in the applicable EMF, PNG, GIF, or JPEG format and are also converted to WMF (Windows Metafile) format.

RESOLUTION

Warning If you use Registry Editor incorrectly, you may cause serious problems that may require you to reinstall your operating system. Microsoft cannot guarantee that you can solve problems that result from using Registry Editor incorrectly. Use Registry Editor at your own risk.

To prevent Word from saving two copies of the graphic in the document, and to reduce the file size of the document, add the ExportPictureWithMetafile=0 string value to the Microsoft Windows registry.
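One rough way to check whether a given RTF file still carries the duplicate metafile copy is to count the picture-format control words in it. The sketch below is only an illustration: the function name `count_picture_formats` and the sample string are made up for this example, and the scan is a plain regex over the text rather than a real RTF parser. The control words themselves (`\jpegblip`, `\pngblip`, `\emfblip`, `\wmetafileN`) are the standard RTF picture-type markers.

```python
import re

# Control words that mark the format of an embedded RTF picture.
# \wmetafileN carries a numeric parameter, e.g. \wmetafile8.
BLIP_PATTERN = re.compile(r"\\(jpegblip|pngblip|emfblip|wmetafile)(\d*)(?![a-zA-Z])")


def count_picture_formats(rtf_text: str) -> dict[str, int]:
    """Count how often each picture-format control word appears (rough scan)."""
    counts: dict[str, int] = {}
    for match in BLIP_PATTERN.finditer(rtf_text):
        name = match.group(1)
        counts[name] = counts.get(name, 0) + 1
    return counts


# Hypothetical sample: one JPEG copy plus one WMF copy of the same picture,
# with the hex data shortened to a few characters.
sample = (
    r"{\rtf1\ansi "
    r"{\*\shppict{\pict\jpegblip\picwgoal4000\pichgoal3000 ffd8ffe0}}"
    r"{\nonshppict{\pict\wmetafile8\picwgoal4000\pichgoal3000 0100090000}}"
    r"}"
)
print(count_picture_formats(sample))  # {'jpegblip': 1, 'wmetafile': 1}
```

If a file saved after the registry change still reports as many `wmetafile` entries as `jpegblip` entries, the setting probably did not take effect for that version of Word.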

Emotionality answered 18/1, 2010 at 18:12 Comment(4)
Linked page is about how Word saves two copies of the image (original file and uncompressed version) and gives a registry change that tells it to only save the original file. Interesting. - Somniloquy
I think this is a better answer than the one marked as answer. - Rustice
I don't suppose anyone knows how to accomplish the equivalent for WordPad? I tried adding the ExportPictureWithMetafile=0 string value to HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Applets\Wordpad\Options but it had no apparent effect. - Colo
WordPad seems to compress the images for me in Windows 10 now. For anyone reading this, try to open the file in WordPad and save before editing the registry. It may compress the file for you. - Bilander
T
1

We have done a similar project over at work. Only we're not using that "Insert/Picture/From File..." functionality. Our template has a tag named [photos], as I presume your own does also. When we process the document we replace the tag with the RTF codes needed to display images. We're putting them within a table and we're displaying two images on each row, plus a row on top for the title.

So, you might place a tag [photos] in your template. Then you replace the tag with the RTF codes. You can find some good references to these codes on the web, for example here.

Now, my code looks something like this:

\par {\rtf1\ansi\deff0{\trowd\cellx8810 {title}\intbl\qc\cell\row}{\trowd\cellx4405\cellx8810{\pict\jpegblip\picwgoal4000\pichgoal3000\piccropl-50\piccropr-50\piccropt-50\piccropb-50\hex Your image as an array of bytes in hexadecimal }\intbl\cell{\pict\jpegblip\picwgoal4000\pichgoal3000\piccropl-50\piccropr-50\piccropt-50\piccropb-50\hex Your other image }\intbl\cell\row}

If you get your image into a byte array, you can use BitConverter.ToString(array) to get the hex code. You'll just need to strip the dashes it inserts between bytes, i.e. replace "-" with "".
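For readers not on .NET, the same idea can be sketched in Python. This is a minimal illustration, not the code described above: the helper names `jpeg_to_rtf_pict` and `fill_photos_tag`, the 4000 x 3000 twip size and the four-byte stand-in for the image are all assumptions made for the example. It also leaves out the `\hex` word from the snippet above, on the understanding that RTF picture data is read as hexadecimal by default.

```python
def jpeg_to_rtf_pict(jpeg_bytes: bytes, width_twips: int, height_twips: int) -> str:
    """Wrap raw JPEG bytes in an RTF \\pict group as hexadecimal text."""
    hex_data = jpeg_bytes.hex()  # two hex characters per byte, no separators
    return (
        "{\\pict\\jpegblip"
        f"\\picwgoal{width_twips}\\pichgoal{height_twips} "
        f"{hex_data}}}"
    )


def fill_photos_tag(template: str, jpeg_bytes: bytes) -> str:
    """Replace the [photos] placeholder with a single embedded JPEG."""
    pict = jpeg_to_rtf_pict(jpeg_bytes, width_twips=4000, height_twips=3000)
    return template.replace("[photos]", pict)


# Tiny stand-in for real JPEG data (only the first four bytes of a JFIF file).
fake_jpeg = bytes([0xFF, 0xD8, 0xFF, 0xE0])
template = "{\\rtf1\\ansi\\deff0 Logo: [photos]\\par}"
print(fill_photos_tag(template, fake_jpeg))
```

In real use the bytes would come from reading the image file in binary mode. Because the JPEG is written only once, the output grows by roughly twice the image size and nothing more.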

Our files take up less than 1/10th of the space a "normal" RTF does. If we open the doc's code with an editor such as Notepad++, we can see the RTF codes, but if we open the document and save it as RTF (changing its name), it goes from 1.5 MB to 50 MB! I'm guessing DaveParillo's reply explains it: I'm only writing each image once.

Hope it helps. Cheers mate

Taxable answered 13/3, 2012 at 16:40 Comment(0)
R
1

First, keep in mind that each byte of the image is stored as two hexadecimal characters (two bytes), so embedding a picture adds at least twice the size of the original file.

You should also know that Word and WordPad each insert a different flavor (format) of the same image, plus other fields that the RTF can be displayed without.

Here are some scripts used to insert images in RTF (https://joseluisbz.wordpress.com/2011/06/22/script-de-clases-rtf-para-jsp-y-php/), and one example of use (https://joseluisbz.wordpress.com/2011/07/16/subiendo-imagenes-png-y-jpg-y-archivos-a-mysql-con-php-y-jsp-y-mostrarlos-en-rtf-usando-clases/)

You may also need to replace the original image with another (http://joseluisbz.wordpress.com/2013/07/26/exploring-a-wmf-file-0x000900/).
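Applying that doubling to the figures from the question gives a feel for how much of the growth the hex encoding alone explains. This is back-of-the-envelope arithmetic only: the function names are made up for the example, and 3.5MB is treated as roughly 3500k.

```python
def hex_encoded_size(byte_count: int) -> int:
    """Size of binary data once written as hexadecimal text (2 chars per byte)."""
    return 2 * byte_count


def unexplained_growth(jpeg_kb: int, before_kb: int, after_kb: int) -> int:
    """File growth (in KB) not accounted for by one hex-encoded JPEG copy."""
    return (after_kb - before_kb) - hex_encoded_size(jpeg_kb)


# Figures from the question: (JPEG size, RTF size before, RTF size after), in KB.
print(unexplained_growth(20, 390, 510))    # logo: 120 - 40 = 80
print(unexplained_growth(150, 390, 3500))  # screenshot: 3110 - 300 = 2810
```

For the screenshot, the hex-encoded JPEG would explain only about 300k of the roughly 3.1MB increase, which fits the explanation that a second, uncompressed copy of the picture is stored as well.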

Reproach answered 16/8, 2013 at 3:19 Comment(0)
I
0

The Swartbees answer worked perfectly for me. I first reduced the image quality to "0" using GIMP's save-as-JPEG functionality. After following the Microsoft solution suggested by Swartbees above, I reinserted the picture into the file and the size increase was negligible: 229k to 279k (as opposed to 29000kb).

Thanks for your suggestions guys.

Infringe answered 16/1, 2013 at 5:12 Comment(0)
M
-1

Yes, by removing the redundant characters. To do this, you must insert them back into your stream afterwards. For instance, if you have over twenty f characters in a row, you can replace them with f[20] in your stream. It is a start.

-Best of luck.

Mismate answered 26/12, 2010 at 4:34 Comment(0)
