How do you set a string of bytes from an environment variable in Python?

Say that you have a string of bytes generated via os.urandom(24),

b'\x1b\xba\x94(\xae\xd0\xb2\xa6\xf2f\xf6\x1fI\xed\xbao$\xc6D\x08\xba\x81\x96v'

and you'd like to store that in an environment variable,

export FOO='\x1b\xba\x94(\xae\xd0\xb2\xa6\xf2f\xf6\x1fI\xed\xbao$\xc6D\x08\xba\x81\x96v'

and retrieve the value from within a Python program using os.environ.

foo = os.environ['FOO']

The problem is that, here, foo has the string literal value '\\x1b\\xba\\x94... instead of the byte sequence b'\x1b\xba\x94....

What is the proper export value to use, or means of using os.environ to treat FOO as a string of bytes?

Frankforter answered 11/6, 2017 at 2:23 Comment(3)
Could be because of the single quotation marks. - Dynamite
I'm confused; if you print (repr) foo in Python where it came from something like os.urandom and see b'\x1b\xba...' then it is (in Python) raw bytes. If you read it from the envvar and see '\\x1b\\xba' then it's a (Unicode) string that's still escaped. As per this question, it seems like bash won't interpret your export FOO line as real binary, but as a string with a bunch of \x's in it. - Domenech
An alternative option is to save the bytes in a binary file, and use the filename as an environment variable. - Astragal
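The file-based suggestion in the last comment could look roughly like the sketch below. The variable name FOO_FILE and the helper load_secret_from_env_path are hypothetical, chosen only for illustration. The environment variable holds a path, so the bytes themselves never pass through the shell.

```python
import os
from pathlib import Path


def load_secret_from_env_path(name: str = "FOO_FILE") -> bytes:
    """Read raw bytes from the file whose path is stored in the environment.

    FOO_FILE is a hypothetical variable name, not something from the question.
    """
    return Path(os.environ[name]).read_bytes()
```

One way to create such a file is Path("secret.bin").write_bytes(os.urandom(24)), followed by exporting FOO_FILE with that path in the shell. Unlike an environment variable, a file can also hold NUL bytes, which os.urandom may produce.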
S
8

You can 'unescape' your bytes in Python with:

import os
import sys

if sys.version_info[0] < 3:  # sadly, it's done differently in Python 2.x vs 3.x
    foo = os.environ["FOO"].decode('string_escape')  # since already in bytes...
else:
    foo = bytes(os.environ["FOO"], "utf-8").decode('unicode_escape')
Soffit answered 11/6, 2017 at 2:30 Comment(1)
Your Py3 solution produces a str, not a bytes object, and unnecessarily converts the string form to bytes. Replace that second line with: foo = os.environb[b'FOO'].decode('unicode-escape').encode('latin-1') to make it read from os.environb (the bytes-oriented view of the environment), decode the escapes, then convert back to raw bytes (latin-1 is a 1-1 mapping that maps the first 256 Unicode ordinals to their ordinal value as bytes). - Arista
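Putting the comment's correction together, a Python 3 sketch might look like this. The helper name unescape_env_bytes is made up for illustration, and it assumes the variable holds only ASCII text with \xNN-style escapes, as in the question's export line. It reads from os.environ rather than os.environb so that it also runs where os.environb is unavailable.

```python
import os


def unescape_env_bytes(name: str) -> bytes:
    """Turn an environment value full of literal '\\xNN' escapes into raw bytes.

    Hypothetical helper; assumes the value is plain ASCII escape text.
    """
    escaped = os.environ[name]  # holds backslash, 'x', two hex digits, not the byte itself
    # unicode_escape interprets each \xNN escape as the code point U+00NN ...
    text = escaped.encode("latin-1").decode("unicode_escape")
    # ... and latin-1 maps code points 0-255 back to single bytes.
    return text.encode("latin-1")
```

The final encode step is what turns the intermediate str into a bytes object; without it, a comparison against the original bytes value is always False in Python 3.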
S
12

The easiest option is to set it as binary data in your shell. This uses ANSI-C quoting ($'...'), in which the shell itself expands the \xNN escapes, so no conversion is needed on the Python side.

export FOO=$'\x1b\xba\x94(\xae\xd0\xb2\xa6\xf2f\xf6\x1fI\xed\xbao$\xc6D\x08\xba\x81\x96v'

NB: this type of string is not part of the POSIX specification at this time, but is in the process of being added. Support is present in almost all major shells, including Bash, ksh, and zsh. Ensure your shell supports it before relying on its use.

Silverfish answered 27/9, 2017 at 23:28 Comment(1)
This is a great approach, since it makes reading the data in Python as simple as os.environ['FOO'] (Py2) or os.environb[b'FOO'] (Py3), so you get the data in Python as raw bytes without needing to encode or decode at all. I'd completely forgotten about this feature of Bash, so thanks for the reminder! - Arista
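On the Python 3 side, the read described in the comment above can be sketched as follows. The helper name read_env_bytes is hypothetical, and the sketch assumes a POSIX system, since os.environb only exists where os.supports_bytes_environ is True.

```python
import os


def read_env_bytes(name: str) -> bytes:
    """Return the raw bytes of an environment variable (POSIX, Python 3).

    Hypothetical helper; os.environb is absent on platforms where
    os.supports_bytes_environ is False.
    """
    # os.environb is keyed by bytes, so encode the variable name first.
    return os.environb[os.fsencode(name)]
```

Keep in mind that an environment variable cannot contain a NUL byte, so a value from os.urandom that happens to include \x00 cannot be stored this way.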
L
3

With zwer's answer, I tried the following.

First, from bash (this is the same binary literal given by ybakos):

export FOO='\x1b\xba\x94(\xae\xd0\xb2\xa6\xf2f\xf6\x1fI\xed\xbao$\xc6D\x08\xba\x81\x96v'

Then I launched the Python shell (I have Python 3.5.2):

>>> import os
>>> # ybakos's original binary literal
>>> foo =  b'\x1b\xba\x94(\xae\xd0\xb2\xa6\xf2f\xf6\x1fI\xed\xbao$\xc6D\x08\xba\x81\x96v'
>>> # zwer's python 3.x solution
>>> FOO = bytes(os.environ["FOO"], "utf-8").decode('unicode_escape')
>>> foo == FOO
False
>>> ^D

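The last comparison, foo == FOO, should return True, but it returns False: the decode step yields a str, and in Python 3 a str never compares equal to a bytes object. So the solution does not work as written.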

I noticed that there is an os.environb dictionary, but I couldn't figure out how to set an environment variable to a binary literal, so I tried the following alternative, which uses Base64 encoding to get an ASCII version of the binary literal.

First, launch the Python shell:

>>> import os
>>> import base64
>>> foo = os.urandom(24)
>>> foo
b'{\xd9q\x90\x8b\xba\xecv\xb3\xcb\x1e<\xd7\xba\xf1\xb4\x99\xf056\x90U\x16\xae'
>>> foo_base64 = base64.b64encode(foo)
>>> foo_base64
b'e9lxkIu67Hazyx4817rxtJnwNTaQVRau'
>>> ^D

Then, in the bash shell:

export FOO_BASE64='e9lxkIu67Hazyx4817rxtJnwNTaQVRau'

Then, back in the Python shell:

>>> import os
>>> import base64
>>> # the original binary value from the first python shell session
>>> foo = b'{\xd9q\x90\x8b\xba\xecv\xb3\xcb\x1e<\xd7\xba\xf1\xb4\x99\xf056\x90U\x16\xae'
>>> dec_foo = base64.b64decode(bytes(os.environ.get('FOO_BASE64'), "utf-8"))
>>> # the values match!
>>> foo == dec_foo
True
>>> ^D

The last line shows that the two values are the same.

We first get a binary value from os.urandom() and Base64-encode it, then use the Base64-encoded value to set the environment variable. Note: base64.b64encode() returns a bytes object, but it contains only printable ASCII characters.

Then, in our program, we read the Base64-encoded string from the environment variable, convert the string into its binary form, and finally Base64-decode it back to its original value.
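The round trip described above can be wrapped in two small helpers. This is only a sketch: the names encode_for_env and decode_from_env are invented for illustration. It relies on base64.b64decode accepting an ASCII str directly in Python 3, so the explicit bytes(..., "utf-8") conversion is optional.

```python
import base64
import os


def encode_for_env(secret: bytes) -> str:
    """Return printable ASCII text suitable for an export NAME='...' line."""
    return base64.b64encode(secret).decode("ascii")


def decode_from_env(name: str) -> bytes:
    """Read Base64 text from the environment and recover the original bytes."""
    return base64.b64decode(os.environ[name])
```

For example, encode_for_env(os.urandom(24)) gives the text to export as FOO_BASE64, and decode_from_env("FOO_BASE64") recovers the bytes inside the program. Because Base64 output is plain ASCII, this also works when the random value contains NUL or other bytes that a shell or the environment cannot hold.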

Lacey answered 27/9, 2017 at 22:55 Comment(0)
