How can I unzip a base64 encoded string in R?

Goal

The goal is to make configuration and code readable after it has been exported from an application that stores this data gzip-compressed and base64-encoded.

Test in Linux-shell

Example of a string with code

"H4sIAAAAAAAAAIWSS0vEMBSF9/0VIYvubHUnNGlhfIDCwOCMuCyhTeOVTBLzGPTfmzY60yKju+Tc8N1z7o2RQYBqmTESuGthaDuHXJpWTRknzsZfowK0DrSi+Ki4x4qrTPShB8fPu/uIaN3VGVsGB4s49BcnrDKGjsJlwaF5P0sMtxY/swLadBeN/6jda9eBjrxfwrytQvcMjLgI3zLI999FJEuYSGmHpNdp9Gk7xWyQXkilRbL2NXnGdS18twuTvQfsqJkqHU6x0n7KlY5MLX2UjYOyxZqacBFIeDZyxdGettusYiwn+h7X/QadBnadY7oNVaGDS8eoXciZMAyTlckNxh+Vyid//4Qv+y3JeLwIAAA=="

Decoded and gunzip-ped in a Linux shell with the command:

echo "$1" | base64 -d | gunzip -c

Which results in:

plugin_applies_if_config<split>plugin_config=<?xml version="1.0" encoding="UTF-8"?>
<BusinessRule>
  <BusinessPlugin BusinessRulePluginID="JavaScriptBusinessConditionWithBinds">
    <Parameters>
      <Parameter ID="Binds" Type="java.lang.String">&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;BindMap/&gt;
</Parameter>
      <Parameter ID="ErrorMessages" Type="java.lang.String"></Parameter>
      <Parameter ID="JavaScript" Type="java.lang.String">return false;</Parameter>
    </Parameters>
  </BusinessPlugin>
</BusinessRule>
<split>

Task accomplished. ...almost.

Turn into R-script

As I have several hundred of these strings, I want to run the equivalent of the Linux shell commands in a script. Because I only know some R, I tried using R. I successfully extracted the strings from the XML document that was exported from the application and turned them into a data frame with columns id, name and code.

The following is a simplified example where I try to reproduce the Linux commands step by step.

encoded = "H4sIAAAAAAAAAIWSS0vEMBSF9/0VIYvubHUnNGlhfIDCwOCMuCyhTeOVTBLzGPTfmzY60yKju+Tc8N1z7o2RQYBqmTESuGthaDutBhDERcHXJpWTRknzsZfowK0DrSi+Ki4x4qrTPShB8fPu/uIaN3VGVsGB4s49BcnrDKGjsJlwaF5P0sMtxY/swLadBeN/6jda9eBjrxfwrytQvcMjLgI3zLI999FJEuYSGmHpNdp9Gk7xWyQXkilRbL2NXnGdS18twuTvQfsqJkqHU6x0n7KlY5MLX2UjYOyxZqacBFIeDZyxdGettusYiwn+h7X/QadBnadY7oNVaGDS8eoXciZMAyTlckNxh+Vyid//4Qv+y3JeLwIAAA=="

decoded = base64enc::base64decode(what=encoded)
# decoded = openssl::base64_decode(encoded)
# decoded = jsonlite::base64_dec(encoded)
# 3 times the same result

str(decoded)
# a vector of raw bytes. Maybe I need to convert it to a string?
paste(decoded, collapse = "")

Doesn't look like the base64 decoded data in the Linux shell, but let's try to unzip...

decompressed <- 
  tryCatch({  
    memDecompress(from = paste(decoded, collapse = ""),
                  type = "gzip",
                  asChar = TRUE)
  },
  error = function(cond) {
    message(cond)
    return(NA)
  })
# fails with "internal error -3 in memDecompress(2)" 
(decompressed)

Clearly the input for 'gzip' is not what it expects. It must be some sort of binary string.

But how to get there? What am I doing wrong? Thanks for your advice!

Organzine answered 9/4, 2019 at 20:21 Comment(1)
You can always use system with the shell command. If you need to do it within R, you could try to write the raw object to a file and see if the file is a valid zip archive. If that is so, you could try the last step -- unzipping... Good luck! - Wallboard
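The file-based suggestion in this comment can be sketched as follows. This is only an illustration, assuming the base64enc package is installed; the sample text and the step that builds a gzip payload are made up so the snippet is self-contained, and the file is a gzip stream rather than a zip archive, so it is read through gzfile.

```r
# Hypothetical sample payload: gzip-compress and base64-encode a short string
# so that the sketch does not depend on the application export.
original <- "return false;"
src <- tempfile(fileext = ".gz")
con <- gzfile(src, "wb")
writeBin(charToRaw(original), con)
close(con)
encoded <- base64enc::base64encode(readBin(src, what = "raw", n = file.size(src)))

# Write the decoded raw bytes to a file, then read it back through gzfile().
tmp <- tempfile(fileext = ".gz")
writeBin(base64enc::base64decode(what = encoded), tmp)
con <- gzfile(tmp, "rt")
lines <- readLines(con, warn = FALSE)  # warn = FALSE: no final newline in the data
close(con)

# Clean up the temporary files.
unlink(c(src, tmp))
```

This goes through the file system, so for several hundred strings an in-memory approach is likely to be more convenient.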

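As of R 4.0.0, memDecompress also accepts data in the gzip file format (RFC 1952, i.e. including the gzip header). Note that it needs the raw vector returned by the base64 decoder, not the hex string produced by paste(). You should now be able to do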

memDecompress(base64enc::base64decode(what=encoded), "gzip", asChar=TRUE)
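A minimal round-trip sketch of the call above, assuming R >= 4.0.0 and the base64enc package. The sample text and the temporary-file step that builds a gzip payload are illustrative assumptions, not part of the question. It also shows that the decoded value is a raw vector starting with the gzip magic bytes 1f 8b, which is why such base64 strings begin with "H4sI".

```r
# Build a small gzip + base64 payload so the example is self-contained.
# (Hypothetical sample text; the real strings come from the application export.)
original <- "plugin_config=<BusinessRule/>"
tmp <- tempfile(fileext = ".gz")
con <- gzfile(tmp, "wb")
writeBin(charToRaw(original), con)
close(con)
encoded <- base64enc::base64encode(readBin(tmp, what = "raw", n = file.size(tmp)))
unlink(tmp)

# Decoding yields a raw vector; gzip data starts with the bytes 1f 8b.
decoded <- base64enc::base64decode(what = encoded)
is.raw(decoded)
decoded[1:2]

# Pass the raw vector directly; do not paste() it into a hex string first.
# Handling of the gzip header requires R >= 4.0.0.
result <- memDecompress(decoded, type = "gzip", asChar = TRUE)
identical(result, original)
```

Pasting the raw vector together, as in the question, produces the hexadecimal representation of the bytes as text, which is no longer gzip data.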

Earlier versions were troublesome because they did not handle the standard gzip header. Here's a workaround for older versions of R: wrap the raw bytes in a raw connection and let gzcon decompress them.

con <- rawConnection(base64enc::base64decode(what=encoded))
readLines(gzcon(con))
close(con)

You will get a warning about an "incomplete final line", but that only means the decompressed data does not end with a newline. The data seems fine otherwise.
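For a data frame of several hundred strings, the three lines above could be wrapped in a function along these lines. This is a sketch under assumptions: the helper names decode_gzip_base64 and encode_gzip_base64 are hypothetical, the sample data frame only imitates the id, name and code columns described in the question, and the base64enc package is assumed to be installed. Passing warn = FALSE to readLines avoids the "incomplete final line" warning without a tryCatch.

```r
# Hypothetical helper: decode one base64 string and gunzip it via gzcon().
decode_gzip_base64 <- function(encoded) {
  con <- gzcon(rawConnection(base64enc::base64decode(what = encoded)))
  on.exit(close(con))
  # warn = FALSE suppresses the "incomplete final line" warning.
  paste(readLines(con, warn = FALSE), collapse = "\n")
}

# Hypothetical helper, used only to create sample input for this sketch.
encode_gzip_base64 <- function(text) {
  tmp <- tempfile(fileext = ".gz")
  on.exit(unlink(tmp))
  con <- gzfile(tmp, "wb")
  writeBin(charToRaw(text), con)
  close(con)
  base64enc::base64encode(readBin(tmp, what = "raw", n = file.size(tmp)))
}

# Sample data frame imitating the columns id, name and code.
samples <- c("return false;", "<BusinessRule>\n  <split>")
df <- data.frame(
  id = 1:2,
  name = c("rule_a", "rule_b"),
  code = vapply(samples, encode_gzip_base64, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

# Decode every row; each result is a single string with embedded newlines.
df$decoded <- vapply(df$code, decode_gzip_base64, character(1), USE.NAMES = FALSE)
```

Joining the lines with "\n" keeps each decoded value as one string per row; drop the paste() call if a vector of lines per string is preferred.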

Eccrine answered 9/4, 2019 at 20:50 Comment(3)
Also see: #39707888 - Eccrine
I turned your 3 lines of code into a function, added a try-catch to get rid of the warning, applied it to the data frame and got readable code and configuration! :) Thanks! - Organzine
@Eccrine I liked your answer here because you helped me a lot. Thanks a bunch - Chengtu
