Difference in deflate function between Java and C#

I have two deflate functions, one written in C# and one in Scala. Given the same input, the returned byte arrays differ in their leading and trailing bytes. (The differences in the middle bytes are expected: they are the same values, shown as signed bytes in Scala and unsigned bytes in C#.)

Deflate function in Scala:

import java.io.ByteArrayOutputStream
import java.util.zip.{Deflater, DeflaterOutputStream}

import zio._


object ZDeflater {
  val deflater = ZManaged.makeEffectTotal(new Deflater(Deflater.DEFLATED, true))(_.end)

  val buffer = ZManaged.fromAutoCloseable(ZIO.succeed(new ByteArrayOutputStream()))

  val stream = for {
    d <- deflater
    b <- buffer
    s <- ZManaged.fromAutoCloseable(ZIO.succeed(new DeflaterOutputStream(b, d, true)))
  } yield (b, s)

  def deflate(input: Array[Byte]): RIO[blocking.Blocking, Array[Byte]] = stream.use { case (buffer, stream) =>
    for {
      ()    <- blocking.effectBlocking(stream.write(input))
      ()    <- blocking.effectBlocking(stream.flush())
      result = buffer.toByteArray
    } yield result
  }
}

Deflate function in C#:

private static byte[] Deflate(byte[] uncompressedBytes)
{
    using (var output = new MemoryStream())
    {
        using (var zip = new DeflateStream(output, CompressionMode.Compress, true))
        {
            zip.Write(uncompressedBytes, 0, uncompressedBytes.Length);
        }

        return output.ToArray();
    }
}

Outputs after deflating:

Scala:

ZDeflater.deflate(data.getBytes(StandardCharsets.UTF_8))

124, -111, …, 126, 1, 0, 0, -1, -1

C#:

Deflate(Encoding.UTF8.GetBytes(data))

125, 145, …, 126, 1

Does anyone know what causes the difference in the first and last bytes? Any suggestions would be very helpful. Thanks a bunch.

P.S. We have a case where C#'s Deflate output works for a specific third party and Scala's output doesn't, so I'm trying to figure out how to make Scala's output the same as C#'s.

Constitutionally answered 13/5, 2022 at 5:20 Comment(1)
A quick reaction without research: check out Byte Order Marks. - Fusee

According to its documentation, Java's Deflater class by default deflates byte sequences into the ZLIB compressed data format. The ZLIB format wraps data compressed in the DEFLATE format with a header in front and an ADLER-32 checksum after the compressed data.

Microsoft's documentation for DeflateStream is imprecise about the exact data format, but the class actually produces data in raw DEFLATE format and not in ZLIB format (dotnet-2236). As a result, its output is also incompatible with HTTP's "deflate" content coding, which refers to the ZLIB data format and not to raw DEFLATE (RFC 2616).

So how can you get the same output from Scala and C#?
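
The difference between the two formats can be observed from the Java side alone. The following is a minimal sketch, not code from the question: the class name ZlibWrapperSketch and its helper methods are made up for illustration, and it assumes the default compression level. It compresses the same input once with the ZLIB wrapper and once as raw DEFLATE, then inspects the header byte and the trailing checksum.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Hypothetical helper class, for illustration only.
class ZlibWrapperSketch {
    // nowrap = false produces ZLIB framing, nowrap = true produces raw DEFLATE.
    static byte[] compress(byte[] input, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the stream finishes the compressed data.
        try (DeflaterOutputStream out = new DeflaterOutputStream(buffer, deflater)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end(); // release native resources
        }
        return buffer.toByteArray();
    }

    // Reads the last four bytes as a big-endian unsigned 32-bit value.
    static long trailer(byte[] data) {
        int n = data.length;
        return ((data[n - 4] & 0xFFL) << 24) | ((data[n - 3] & 0xFFL) << 16)
             | ((data[n - 2] & 0xFFL) << 8) | (data[n - 1] & 0xFFL);
    }

    // Adler-32 checksum of the uncompressed input.
    static long adler32(byte[] data) {
        Adler32 checksum = new Adler32();
        checksum.update(data, 0, data.length);
        return checksum.getValue();
    }

    public static void main(String[] args) {
        byte[] input = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] zlib = compress(input, false);
        byte[] raw = compress(input, true);
        System.out.printf("ZLIB first byte: 0x%02x%n", zlib[0] & 0xFF);
        System.out.println("ZLIB trailer equals Adler-32 of input: "
                + (trailer(zlib) == adler32(input)));
        System.out.println("ZLIB length: " + zlib.length + ", raw length: " + raw.length);
    }
}
```

The ZLIB output starts with the header byte 0x78 and ends with the big-endian Adler-32 of the uncompressed input; the raw output carries neither.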

A) Write raw DEFLATE data from Scala as well

The Deflater class has an overloaded constructor with a nowrap parameter that omits the header and the checksum. Setting this parameter to true produces compressed data in raw DEFLATE format. If you also plan to decompress the data in Java, please read the Javadoc of the Inflater constructors carefully.
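
A minimal sketch of option A, written in Java against the same java.util.zip API that the Scala code uses. The class name RawDeflateSketch and its methods are hypothetical. Two points in it go beyond what the thread confirms and should be read as assumptions: it passes Deflater.DEFAULT_COMPRESSION as the level (Deflater.DEFLATED is the compression method constant, not a level), and it lets you compare a finished stream with one that was only sync-flushed. In the DEFLATE format the lowest bit of a block's first byte marks the final block, and a sync flush appends an empty stored block ending in 00 00 FF FF, which may be relevant to the differing first and last bytes shown in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;

// Hypothetical helper class, for illustration only.
class RawDeflateSketch {
    // Raw DEFLATE (nowrap = true). If finish is false, the stream is only sync-flushed.
    static byte[] deflateRaw(byte[] input, boolean finish) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            DeflaterOutputStream out = new DeflaterOutputStream(buffer, deflater, true);
            out.write(input);
            if (finish) {
                out.finish(); // emits the final block
            } else {
                out.flush();  // sync flush: the stream stays open-ended
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end(); // release native resources
        }
    }

    // Inflates raw DEFLATE data; the Inflater must also be created with nowrap = true.
    static byte[] inflateRaw(byte[] compressed) {
        Inflater inflater = new Inflater(true);
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && inflater.needsInput()) {
                    break; // input exhausted before a final block was seen
                }
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] input = "hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] finished = deflateRaw(input, true);
        byte[] flushed = deflateRaw(input, false);
        System.out.println("finished: " + Arrays.toString(finished));
        System.out.println("flushed:  " + Arrays.toString(flushed));
        System.out.println("round trip ok: " + Arrays.equals(input, inflateRaw(finished)));
    }
}
```

For a short input like this one, the two outputs differ in the lowest bit of the first byte and in the four trailing bytes of the flushed variant. Whether the third party in the question requires a finished stream is not established in the thread.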

B) Write ZLIB data from C# as well (recommended)

Use .NET's ZLibStream class or a third-party library instead of the DeflateStream class to write your data in ZLIB format.

C) Use GZIP format instead

The GZIP format is comparable to ZLIB, but uses a different header and a different checksum. Both .NET and Java provide dedicated stream classes for it. Although ZLIB's checksum is cheaper to compute and its header is smaller than GZIP's, GZIP is more common (especially on the web). The main reason for GZIP's popularity is that Microsoft always had trouble distinguishing between raw DEFLATE and ZLIB, that is, HTTP's deflate encoding (see ZLIB's FAQ-39) ;-)
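
A minimal sketch of the GZIP framing on the Java side, using the JDK's GZIPOutputStream. The class name GzipFramingSketch and its helpers are hypothetical. It shows the magic bytes at the start and the little-endian CRC-32 and input length at the end, which is what makes this output different from both ZLIB and raw DEFLATE.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only.
class GzipFramingSketch {
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the stream writes the GZIP trailer.
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads a little-endian unsigned 32-bit value starting at the given offset.
    static long le32(byte[] data, int offset) {
        return (data[offset] & 0xFFL) | ((data[offset + 1] & 0xFFL) << 8)
             | ((data[offset + 2] & 0xFFL) << 16) | ((data[offset + 3] & 0xFFL) << 24);
    }

    // CRC-32 checksum of the uncompressed input.
    static long crc32(byte[] data) {
        CRC32 checksum = new CRC32();
        checksum.update(data, 0, data.length);
        return checksum.getValue();
    }

    public static void main(String[] args) {
        byte[] input = "hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] gz = gzip(input);
        System.out.printf("magic: 0x%02x 0x%02x, method: %d%n",
                gz[0] & 0xFF, gz[1] & 0xFF, gz[2]);
        System.out.println("CRC-32 matches: " + (le32(gz, gz.length - 8) == crc32(input)));
        System.out.println("length matches: " + (le32(gz, gz.length - 4) == input.length));
    }
}
```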

Senzer answered 22/5, 2022 at 22:21 Comment(4)
As you can see in the Scala snippet, I've already set the nowrap parameter to true, so option A doesn't seem to be the cause. Since we have a case where C#'s Deflate output works for a specific third party and Scala's output doesn't, I'll try option C first. Thanks for the very detailed explanation. - Constitutionally
Deflate is a necessary step, since I use it while encoding the SAML request, so I don't think using gzip helps (I actually tried it and it broke the SAML SSO flow). - Constitutionally
@Constitutionally I overlooked that you had already set nowrap to true. In the meantime I've also checked Java's native code and Microsoft's usage of zlib, and it seems that the DeflateStream implementation uses a different default compression level. Please try setting the compression level of Deflater to 6 instead of 8 and then compare the output of Scala and C# again. - Senzer
@Senzer None of the available compression levels (0-9) returns the expected result. - Constitutionally
