Is there a zero-copy way to create a bytearray from a memoryview?

I ran into what I thought was going to be a very simple problem (and I hope it is!), which is to take raw data out of memory, and decode it to a Unicode string.

Doing this is the obvious approach, and works:

the_string = mv.tobytes().decode("utf-8")

where mv is the memoryview in question. But that defeats the purpose of zero copy, because the tobytes() method generates a copy. So the next thing I tried was to "cast" the memoryview to a bytearray, that is, to create a bytearray that uses the memoryview mv as its backing data. I thought this would be simple, but I cannot figure out how to do it. Does anyone out there know how?

Shoup answered 6/5, 2020 at 14:59 Comment(1)
Unless this line runs many times in a loop, there is no need to worry about saving memory. Alibi

Use codecs.decode from the standard library; it accepts a memoryview directly.

For example:

>>> b = "Hello 你好".encode("utf-8")
>>> b
b'Hello \xe4\xbd\xa0\xe5\xa5\xbd'

>>> m = memoryview(b)
>>> m.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'memoryview' object has no attribute 'decode'

>>> import codecs
>>> codecs.decode(m, "utf-8")
'Hello 你好'
>>> codecs.decode(m[:-3], "utf-8")
'Hello 你'
Opprobrious answered 25/6, 2021 at 2:45 Comment(3)
Unfortunately this is not zero-copy. decode makes a copy of the underlying buffer. True zero-copy would mean the resulting string data references the memoryview itself (that would be possible, as CPython can understand an internal UTF-8 representation), similar to how numpy does it when returning slices. Belaud
@AndreasH. I'm not sure that is actually true, given the way Python strings are internally represented. The actual buffer may be ASCII, Latin-1, UCS-2, or UCS-4, depending on various things. See: peps.python.org/pep-0393. Possibly, though, it could try if the underlying representation allowed it and throw an error if not (e.g., have a nocopy=True parameter that does that). Wycliffite
@Wycliffite I guess this is not possible due to mutability: str is immutable, while memoryview / bytearray is mutable. So it has to create a copy for semantics, even if it would technically be possible to reference the buffer. So my previous comment was leading in the wrong direction. However, the original question is how to create a bytearray from a memoryview. This should be possible from a mutability standpoint, I guess. Belaud
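
For what it's worth, here is a small check of that last point. As far as I know, bytearray(mv) always allocates its own storage and copies the data, whereas slicing a memoryview yields another view onto the same memory. The names source, copy and view are just for this example.

```python
source = bytearray(b"abcdef")
mv = memoryview(source)

copy = bytearray(mv)  # new bytearray with its own storage
view = mv[2:5]        # memoryview slice, shares storage with source

# Mutate the original buffer after both objects were created.
source[2] = ord("X")

print(copy)         # unchanged, so the constructor copied the data
print(bytes(view))  # reflects the change, so the slice shares memory
print(view.obj is source)
```

If the goal is only to read or modify a region in place, keeping a memoryview slice around may be enough, without needing a bytearray at all.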
