How about the Python 3 'memoryview' way?
memoryview is a sort of mishmash of the bytes/bytearray types and the struct module, with several benefits.
- Not limited to just text and bytes; handles 16- and 32-bit words too
- Copes with multi-byte words in the machine's native byte order (see the word of caution further down)
- Provides a very low overhead interface to linked C/C++ functions and data
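To illustrate the low-overhead point, here is a minimal sketch showing that a memoryview shares memory with the object it wraps rather than copying it. The variable names (buf, view, head) are only illustrative.

```python
# A memoryview refers to the underlying buffer; no bytes are copied.
buf = bytearray(b"some bytes")
view = memoryview(buf)

# Slicing a memoryview gives another view onto the same memory.
head = view[:4]
head[0] = ord("S")      # writes through to buf

print(buf)              # bytearray(b'Some bytes')
print(bytes(head))      # b'Some'
print(view.format, view.itemsize, view.nbytes)  # B 1 10
```

Because buf is a mutable bytearray, the view is writable; a memoryview over an immutable bytes object is read-only.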
The simplest example, for a bytes object:
memoryview(b"some bytes").tolist()
[115, 111, 109, 101, 32, 98, 121, 116, 101, 115]
Or for a Unicode string (which must first be encoded to bytes):
memoryview(bytes("\u0075\u006e\u0069\u0063\u006f\u0064\u0065\u0020", "UTF-16")).tolist()
[255, 254, 117, 0, 110, 0, 105, 0, 99, 0, 111, 0, 100, 0, 101, 0, 32, 0]
#Another way to do the same
memoryview("\u0075\u006e\u0069\u0063\u006f\u0064\u0065\u0020".encode("UTF-16")).tolist()
[255, 254, 117, 0, 110, 0, 105, 0, 99, 0, 111, 0, 100, 0, 101, 0, 32, 0]
Perhaps you need words rather than bytes?
memoryview(bytes("\u0075\u006e\u0069\u0063\u006f\u0064\u0065\u0020", "UTF-16")).cast("H").tolist()
[65279, 117, 110, 105, 99, 111, 100, 101, 32]
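memoryview(b"some  more  data").cast("I").tolist()  # 16 bytes (two spaces between words); "I" is 32 bits, whereas "L" is 64 bits on most 64-bit Unix builds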
[1701670771, 1869422624, 538994034, 1635017060]
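One thing worth knowing about cast(): the buffer length must be a whole multiple of the item size, otherwise it raises TypeError. The sketch below wraps that check in a hypothetical helper, cast_words (not part of the standard library), and uses struct.calcsize to look up the native item size. The exact word values depend on the machine's byte order, so only the counts are shown.

```python
import struct
import sys

def cast_words(raw, fmt):
    """Hypothetical helper: reinterpret raw bytes as native words of type fmt."""
    size = struct.calcsize(fmt)   # native size of one item, e.g. 2 for "H"
    if len(raw) % size:
        raise ValueError(f"{len(raw)} bytes is not a multiple of {size}")
    return memoryview(raw).cast(fmt).tolist()

data = b"some  more  data"        # 16 bytes (two spaces between words)

print(sys.byteorder)              # byte order used by cast()
print(len(cast_words(data, "H"))) # 8 sixteen-bit words
print(len(cast_words(data, "I"))) # 4 thirty-two-bit words
```

Passing a 14-byte string such as b"some more data" with "I" fails the length check, which mirrors what cast() itself would reject.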
A word of caution: be careful of multiple interpretations of byte order with data of more than one byte:
txt = "\u0075\u006e\u0069\u0063\u006f\u0064\u0065\u0020"
for order in ("", "BE", "LE"):
    mv = memoryview(bytes(txt, f"UTF-16{order}"))
    print(mv.cast("H").tolist())
[65279, 117, 110, 105, 99, 111, 100, 101, 32]
[29952, 28160, 26880, 25344, 28416, 25600, 25856, 8192]
[117, 110, 105, 99, 111, 100, 101, 32]
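This caught me out!! It appears to be intentional rather than a bug: cast() always reads the bytes in the machine's native byte order (little-endian here), so big-endian data comes out byte-swapped, and the plain "UTF-16" codec prepends a byte order mark (65279, i.e. 0xFEFF).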
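If the byte order of the data is known, one way around this is to state it explicitly instead of relying on cast(). The following is a sketch using struct.unpack with an explicit ">" (big-endian) prefix, plus an alternative that picks the codec matching the machine's native order; the variable names are illustrative.

```python
import struct
import sys

txt = "unicode "
raw_be = txt.encode("UTF-16BE")   # big-endian, no byte order mark

# ">" forces big-endian decoding regardless of the host machine.
count = len(raw_be) // 2          # number of 16-bit words
words_be = list(struct.unpack(f">{count}H", raw_be))
print(words_be)                   # [117, 110, 105, 99, 111, 100, 101, 32]

# Alternatively, encode in the native order so that cast("H") lines up.
native = "UTF-16LE" if sys.byteorder == "little" else "UTF-16BE"
words_native = memoryview(txt.encode(native)).cast("H").tolist()
print(words_native)               # [117, 110, 105, 99, 111, 100, 101, 32]
```

The struct route copies the data into a tuple, so it gives up the zero-copy benefit of memoryview in exchange for predictable results across machines.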
The examples use UTF-16; for a full list of codecs, see the Codec registry in the Python 3.10 documentation.
bytes(item, "utf8"), as explicit is better than implicit, so... str.encode() defaults silently to bytes, making you more Unicode-zen but less Explicit-Zen. Also, "common" is not a term that I like to follow. Also, bytes(item, "utf8") is more like the str() and b"string" notations. My apologies if I am too much of a noob to understand your reasons. Thank you. – Hesler
encode() doesn't call bytes(), it's the other way around. Of course that's not immediately obvious, which is why I asked the question. – Haldeman
some_string.encode(encoding), for example "string".encode("utf8"), which returns type bytes. For me, using the term bytes() makes much more sense. I tend to think that encode/decode is more charset related than data type. Again, I may be too much of a noob to think like that... but I love explicit, and there is no reference to "byte" in "some".encode("utf8"). Thank you, I've checked that str.encode() just doesn't default to anything. – Hesler
b = mystring.encode() – Crapulous