Is it possible to use the .NET DeflateStream for PDF creation?

I'm experimenting with creating PDF files from C# code. Working from the PDF specification, I've been able to create a working PDF file by building it up as strings of data and encoding them into byte arrays with UTF8 encoding.

The problem comes when I try to apply DeflateStream to the PDF stream objects. It just doesn't seem to work.

Here is the text version of the PDF object in question (each line ends with \r\n, which isn't visible here):

5 0 obj
<</Length 45>>
stream
BT 70 50 TD /F1 12 Tf (Hello, world!) Tj ET
endstream
endobj

When I use the DeflateStream class to compress the line BT 70 50 TD /F1 12 Tf (Hello, world!) Tj ET, the PDF no longer works. I've also noticed that many other libraries, such as iTextSharp, use their own implementation of Deflate compression.

Is there any reason why Microsoft's implementation of the DeflateStream class isn't working? Am I using it incorrectly or is it implemented incorrectly or what?


I know that PDF files are binary (not text), but as long as nothing is compressed or encrypted, the whole file can be viewed as text. Here is the entire PDF file for reference, as plain text (again, each line ends with \r\n, which isn't visible here):

%PDF-1.7
1 0 obj
<</Type /Catalog /Pages 2 0 R>>
endobj
2 0 obj
<</Type /Pages /MediaBox [ 0 0 200 200 ] /Count 1 /Kids [ 3 0 R ]>>
endobj
3 0 obj
<</Type /Page /Parent 2 0 R /Resources <</Font <</F1 4 0 R>>>> /Contents 5 0 R>>
endobj
4 0 obj
<</Type /Font /Subtype /Type1 /BaseFont /Times-Roman>>
endobj
5 0 obj
<</Length 45>>
stream
BT 70 50 TD /F1 12 Tf (Hello, world!) Tj ET
endstream
endobj
xref
0 6
0000000000 65535 f
0000000017 00000 n
0000000067 00000 n
0000000153 00000 n
0000000252 00000 n
0000000325 00000 n
trailer
<</Size 6/Root 1 0 R>>
startxref
422
%%EOF
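A file like this can also be produced byte-for-byte in C#, so that the xref offsets and /Length are computed rather than counted by hand. What follows is a minimal sketch under some assumptions: .NET 6 or later with top-level statements, ASCII-only dictionaries, and a hypothetical BuildPdf helper and hello.pdf output name that are illustrative, not part of any library. Two details it tries to get right: per the PDF specification, /Length counts only the stream data and not the end-of-line marker before endstream (so this content gives 43 rather than the 45 above), and the stream bytes are written directly instead of going through a string, which matters once they are compressed binary data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: writes the one-page PDF from the question straight to
// bytes, recording each object's byte offset so xref/startxref are always right.
// 'filter' is null for an uncompressed stream, or e.g. "/FlateDecode".
static byte[] BuildPdf(byte[] contents, string? filter)
{
    var pdf = new MemoryStream();
    var offsets = new List<long>();   // offsets[i] is the byte offset of object i + 1

    void Ascii(string s)
    {
        byte[] b = Encoding.ASCII.GetBytes(s);
        pdf.Write(b, 0, b.Length);
    }
    void BeginObj()
    {
        offsets.Add(pdf.Position);
        Ascii($"{offsets.Count} 0 obj\r\n");
    }

    Ascii("%PDF-1.7\r\n");
    BeginObj(); Ascii("<</Type /Catalog /Pages 2 0 R>>\r\nendobj\r\n");
    BeginObj(); Ascii("<</Type /Pages /MediaBox [ 0 0 200 200 ] /Count 1 /Kids [ 3 0 R ]>>\r\nendobj\r\n");
    BeginObj(); Ascii("<</Type /Page /Parent 2 0 R /Resources <</Font <</F1 4 0 R>>>> /Contents 5 0 R>>\r\nendobj\r\n");
    BeginObj(); Ascii("<</Type /Font /Subtype /Type1 /BaseFont /Times-Roman>>\r\nendobj\r\n");

    // Object 5: /Length is the byte count of the stream data only. The bytes
    // are copied as-is, so binary (compressed) data is never turned into a string.
    BeginObj();
    string filterEntry = filter == null ? "" : $" /Filter {filter}";
    Ascii($"<</Length {contents.Length}{filterEntry}>>\r\nstream\r\n");
    pdf.Write(contents, 0, contents.Length);
    Ascii("\r\nendstream\r\nendobj\r\n");

    // xref: entry 0 plus one 20-byte entry per object ("nnnnnnnnnn ggggg n\r\n").
    long xref = pdf.Position;
    Ascii($"xref\r\n0 {offsets.Count + 1}\r\n0000000000 65535 f\r\n");
    foreach (long offset in offsets)
        Ascii($"{offset:D10} 00000 n\r\n");
    Ascii($"trailer\r\n<</Size {offsets.Count + 1}/Root 1 0 R>>\r\nstartxref\r\n{xref}\r\n%%EOF\r\n");
    return pdf.ToArray();
}

byte[] text = Encoding.ASCII.GetBytes("BT 70 50 TD /F1 12 Tf (Hello, world!) Tj ET");
File.WriteAllBytes("hello.pdf", BuildPdf(text, null));
```

Passing compressed bytes together with a filter name such as "/FlateDecode" adds the /Filter entry; the bytes themselves still have to be in the format that filter expects.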
Uniaxial asked 26/8, 2013 at 18:10

Is there any reason why Microsoft's implementation of the DeflateStream class isn't working? Am I using it incorrectly or is it implemented incorrectly or what?

DeflateStream actually implements RFC 1951 (raw DEFLATE), whereas PDF's FlateDecode filter expects the zlib format of RFC 1950, which wraps the DEFLATE data in a header (normally 2 bytes) and a 4-byte Adler-32 checksum trailer. This is detailed, with a workaround, in a related Microsoft Connect bug report.
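To illustrate the framing difference, here is a minimal sketch (assuming C# 9 top-level statements; the ToZlib and Adler32 helper names are hypothetical) that keeps the built-in DeflateStream but adds the RFC 1950 header and trailer around its output. The fixed 0x78 0x9C header is a common choice rather than the only valid one, and the trailer is the Adler-32 checksum of the uncompressed data, stored most significant byte first. It is a sketch of the format, not a drop-in replacement for a tested library.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: wraps the raw DEFLATE (RFC 1951) output of DeflateStream
// in the zlib (RFC 1950) framing that a PDF /FlateDecode filter expects.
static byte[] ToZlib(byte[] data)
{
    using var output = new MemoryStream();

    // 2-byte header. CMF 0x78: method 8 (deflate) with a 32 KB window, which is
    // an upper bound and so is safe even if the compressor used a smaller one.
    // FLG 0x9C: default level, no preset dictionary, check bits chosen so that
    // 0x789C is a multiple of 31.
    output.WriteByte(0x78);
    output.WriteByte(0x9C);

    // Raw DEFLATE body. Disposing the DeflateStream flushes the final block;
    // leaveOpen keeps 'output' usable afterwards.
    using (var deflate = new DeflateStream(output, CompressionMode.Compress, leaveOpen: true))
    {
        deflate.Write(data, 0, data.Length);
    }

    // 4-byte trailer: Adler-32 of the uncompressed data, most significant byte first.
    uint adler = Adler32(data);
    output.WriteByte((byte)(adler >> 24));
    output.WriteByte((byte)(adler >> 16));
    output.WriteByte((byte)(adler >> 8));
    output.WriteByte((byte)adler);
    return output.ToArray();
}

// Adler-32 checksum as defined in RFC 1950.
static uint Adler32(byte[] data)
{
    const uint Mod = 65521;
    uint a = 1, b = 0;
    foreach (byte x in data)
    {
        a = (a + x) % Mod;
        b = (b + a) % Mod;
    }
    return (b << 16) | a;
}

byte[] content = Encoding.ASCII.GetBytes("BT 70 50 TD /F1 12 Tf (Hello, world!) Tj ET");
byte[] stream = ToZlib(content);
// 'stream' goes between "stream\r\n" and "\r\nendstream", with
// <</Length {stream.Length} /Filter /FlateDecode>> as the stream dictionary.
Console.WriteLine($"{content.Length} bytes in, {stream.Length} bytes out");
```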

A simple workaround is to use a third-party compression library such as DotNetZip, which supports the proper format. That said, the Connect report suggests that skipping the first two bytes (the zlib header) when decompressing makes DeflateStream work in most cases.
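On the reading side, the "skip the first two bytes" idea can be sketched as follows, assuming the input is a complete zlib stream without a preset dictionary (the usual case for PDF streams). Rather than relying on how DeflateStream treats trailing bytes, this version hands it only the DEFLATE body and then checks the 4-byte Adler-32 trailer itself. InflateZlib and Adler32 are illustrative names, not library APIs.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: inflates zlib (RFC 1950) data, such as a /FlateDecode
// stream, with the raw-DEFLATE DeflateStream by stripping the zlib framing first.
static byte[] InflateZlib(byte[] zlib)
{
    if (zlib.Length < 6)
        throw new InvalidDataException("Too short to be a zlib stream.");

    // Header checks: method 8 (deflate), CMF*256 + FLG a multiple of 31,
    // and FDICT (bit 5 of FLG) clear.
    byte cmf = zlib[0], flg = zlib[1];
    if ((cmf & 0x0F) != 8 || ((cmf << 8) | flg) % 31 != 0)
        throw new InvalidDataException("Not a zlib header.");
    if ((flg & 0x20) != 0)
        throw new NotSupportedException("Preset dictionaries are not handled here.");

    // Skip the 2-byte header and hold back the 4-byte trailer.
    using var body = new MemoryStream(zlib, 2, zlib.Length - 6);
    using var inflate = new DeflateStream(body, CompressionMode.Decompress);
    using var result = new MemoryStream();
    inflate.CopyTo(result);
    byte[] data = result.ToArray();

    // Compare against the big-endian Adler-32 trailer.
    int n = zlib.Length;
    uint expected = ((uint)zlib[n - 4] << 24) | ((uint)zlib[n - 3] << 16)
                  | ((uint)zlib[n - 2] << 8) | zlib[n - 1];
    if (Adler32(data) != expected)
        throw new InvalidDataException("Adler-32 checksum mismatch.");
    return data;
}

// Adler-32 checksum as defined in RFC 1950.
static uint Adler32(byte[] data)
{
    const uint Mod = 65521;
    uint a = 1, b = 0;
    foreach (byte x in data)
    {
        a = (a + x) % Mod;
        b = (b + a) % Mod;
    }
    return (b << 16) | a;
}

// A tiny hand-built zlib stream holding "abc" in one stored (uncompressed) block.
byte[] sample = { 0x78, 0x01, 0x01, 0x03, 0x00, 0xFC, 0xFF, 0x61, 0x62, 0x63, 0x02, 0x4D, 0x01, 0x27 };
Console.WriteLine(Encoding.ASCII.GetString(InflateZlib(sample)));  // prints "abc"
```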

Applicator answered 26/8, 2013 at 18:17 (7 comments)
Hmm, that Connect article is interesting. It mostly talks about decompression (not compression), but I think the same issue affects compression too. I'm going to try DotNetZip for compression and see if that helps. – Uniaxial
@m-y Yes, the same issue happens on both sides. In general, though, I find DotNetZip far better to use than the built-in compression libraries. – Applicator
DotNetZip's ZlibStream made it work just fine. I wonder if you could still use DeflateStream (.NET 4.5) by prepending the two missing bytes when compressing and skipping them when decompressing? – Uniaxial
@m-y For compressing, I think you need to know the proper byte values, etc., but I'd just use ZlibStream in general. With the framework they may be constants, but I'm not sure they would be... – Applicator
I compared both compression results, and ZlibStream adds two extra bytes at the beginning and four extra bytes at the end, at least in this instance. Everything in between matches exactly. Quite interesting. – Uniaxial
The additional two bytes at the beginning are probably the zlib header. See here. – Mouse
The Microsoft Connect link is no longer available. The new one is social.msdn.microsoft.com/Forums/en-US/… – Mooneye
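The comparison from the comments can be reproduced with only built-in classes on .NET 6 or later, where System.IO.Compression.ZLibStream (not available in 2013, and distinct from DotNetZip's Ionic.Zlib.ZlibStream discussed above) writes the zlib format directly. This minimal sketch compresses the same content stream with DeflateStream and ZLibStream and prints where they differ; the Compress helper is a hypothetical name. The expectation is two extra header bytes, four extra trailer bytes, and an identical body in between, since both classes drive the same underlying zlib engine at the same level, but treat that as an expectation rather than a documented guarantee.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Hypothetical helper: compresses 'data' through whichever compressor 'wrap' creates.
static byte[] Compress(byte[] data, Func<Stream, Stream> wrap)
{
    var output = new MemoryStream();
    using (Stream s = wrap(output))   // disposing the compressor flushes the final block
    {
        s.Write(data, 0, data.Length);
    }
    return output.ToArray();
}

byte[] content = Encoding.ASCII.GetBytes("BT 70 50 TD /F1 12 Tf (Hello, world!) Tj ET");

// Same compression level for both, so the DEFLATE bodies are expected to match.
byte[] raw = Compress(content, o => new DeflateStream(o, CompressionLevel.Optimal, leaveOpen: true));
byte[] zlib = Compress(content, o => new ZLibStream(o, CompressionLevel.Optimal, leaveOpen: true));

Console.WriteLine($"DeflateStream: {raw.Length} bytes, ZLibStream: {zlib.Length} bytes");
Console.WriteLine($"Extra leading bytes (zlib header): {zlib[0]:X2} {zlib[1]:X2}");
Console.WriteLine($"Extra trailing bytes (Adler-32): {BitConverter.ToString(zlib, zlib.Length - 4)}");
Console.WriteLine($"Bytes in between match: {zlib.Skip(2).Take(zlib.Length - 6).SequenceEqual(raw)}");
```

For a PDF writer on a current runtime, the ZLibStream output can be used as-is for a /FlateDecode stream.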

© 2022 - 2024 — McMap. All rights reserved.