Python mmap ctypes - read only

I think I have the opposite of the problem described here. I have one process writing data to a log, and I want a second process to read it, but I don't want the second process to be able to modify the contents. This is potentially a large file, and I need random access, so I'm using Python's mmap module.

If I create the mmap as read/write (for the second process), I have no problem creating a ctypes object as a "view" of the mmap object using from_buffer. From a cursory look at the C code, it looks like this is a cast, not a copy, which is what I want. However, this breaks if I make the mmap ACCESS_READ: it throws an exception saying that from_buffer requires write privileges.

I think I want to use ctypes from_address() method instead, which doesn't appear to need write access. I'm probably missing something simple, but I'm not sure how to get the address of the location within an mmap. I know I can use ACCESS_COPY (so write operations show up in memory, but aren't persisted to disk), but I'd rather keep things read only.

Any suggestions?

Gothar answered 9/6, 2011 at 14:31 Comment(2)
If you're using the Python mmap module, why do you need to create a ctypes object? - Commutative
The log isn't just text, it includes data structures that I have mapped to the ctypes Structure class. So I will be mapping the memory to the various Structure types, and using that to access the sub-elements and make decisions about how to process different parts of the log. - Gothar

OK, from looking at the mmap .c code, I don't believe it supports this use case. I also found that the performance pretty much sucks for my use case. I'd be curious what kind of performance others see, but it took about 40 seconds to walk through a 500 MB binary file in Python. That means creating an mmap, turning each location into a ctypes object with from_buffer(), and using the ctypes object to read the size of the record so I could step to the next one. I tried doing the same thing directly in C++ with MSVC. There I could cast directly to an object of the correct type, and it was fast: less than a second (this is with a Core 2 Quad and an SSD).

I did find that I could get a pointer with the following

from ctypes import pointer

# CEL_HEADER is a ctypes Structure; mm is the (writable) mmap object
firstHeader = CEL_HEADER.from_buffer(mm, 0)
pHeader = pointer(firstHeader)
# pHeader[ind] now yields the CEL_HEADER that starts
# ind * sizeof(CEL_HEADER) bytes into the file

This doesn't get around the original problem - the mmap isn't read-only, since I still need to use from_buffer for the first call. In this config, it still took around 40 sec to process the whole file, so it looks like the conversion from a pointer into ctypes structs is killing the performance. That's just a guess, but I don't see a lot of value in tracking it down further.
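For readers who can accept a copy instead of a cast, ctypes also provides from_buffer_copy(), which accepts read-only buffers such as an ACCESS_READ mmap. The following is only a minimal sketch, not the approach taken in this answer: RecordHeader and its two fields are hypothetical stand-ins for a real structure like CEL_HEADER, and each call copies sizeof(RecordHeader) bytes out of the mapping.

```python
import ctypes
import mmap
import tempfile


class RecordHeader(ctypes.Structure):
    """Hypothetical header layout, standing in for something like CEL_HEADER."""
    _fields_ = [("size", ctypes.c_uint32), ("kind", ctypes.c_uint32)]


def read_header(mm, offset=0):
    # from_buffer_copy only needs a readable buffer because it copies the
    # bytes; the returned structure is independent of the mapping.
    return RecordHeader.from_buffer_copy(mm, offset)


def demo():
    with tempfile.TemporaryFile() as f:
        # Write one header so there is something to map.
        f.write(bytes(RecordHeader(size=8, kind=2)))
        f.flush()
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            header = read_header(mm)
            return header.size, header.kind
```

Because the structure is a copy, the reader cannot modify the file through it, at the cost of one small copy per record.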

I'm not sure my plan will help anyone else, but I'm going to try to create a C module specific to my needs, based on the mmap code. I think I can use fast C code to index the binary file, then expose only small parts of the file at a time through calls into ctypes/Python objects. Wish me luck.
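The indexing step can also be sketched in pure Python with the struct module, whose unpack_from() accepts any buffer, including a read-only mmap. This is an illustration under assumptions, not the C module described above: the record layout (each record beginning with a 4-byte little-endian total size) and the name index_records are hypothetical, and no timing was measured for it.

```python
import struct

# Assumed layout: every record starts with its total size in bytes,
# stored as a 4-byte little-endian unsigned integer.
SIZE_FIELD = struct.Struct("<I")


def index_records(buf):
    """Return the byte offset of every complete record in buf."""
    offsets = []
    pos = 0
    end = len(buf)
    while pos + SIZE_FIELD.size <= end:
        (size,) = SIZE_FIELD.unpack_from(buf, pos)
        if size < SIZE_FIELD.size or pos + size > end:
            break  # truncated or corrupt record: stop indexing here
        offsets.append(pos)
        pos += size
    return offsets
```

With the offsets in hand, only the records of interest need to be turned into ctypes or Python objects.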

Also, as a side note, Python 2.7.2 was released today (6/12/11), and one of the changes is an update to the mmap code so that you can use a Python long to set the file offset. This lets you use mmap for files over 4 GB on 32-bit systems. See Issue #4681 here
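As a rough illustration of the offset parameter: the offset passed to mmap must be a multiple of mmap.ALLOCATIONGRANULARITY, so a window at an arbitrary position has to be rounded down and the difference added back when slicing. The helper name map_window is hypothetical and the sketch uses Python 3 syntax.

```python
import mmap
import tempfile


def map_window(fileobj, offset, length):
    """Map `length` bytes starting at `offset`, read-only.

    Returns the map and the position of `offset` inside it.
    """
    granularity = mmap.ALLOCATIONGRANULARITY
    aligned = (offset // granularity) * granularity  # round down to a valid offset
    delta = offset - aligned
    mm = mmap.mmap(fileobj.fileno(), length + delta,
                   access=mmap.ACCESS_READ, offset=aligned)
    return mm, delta


def demo():
    with tempfile.TemporaryFile() as f:
        start = mmap.ALLOCATIONGRANULARITY + 10
        f.write(b"\0" * start + b"payload")
        f.flush()
        mm, delta = map_window(f, start, 7)
        try:
            return bytes(mm[delta:delta + 7])
        finally:
            mm.close()
```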

Gothar answered 12/6, 2011 at 23:21 Comment(0)

I ran into the same problem: we needed the from_buffer interface but wanted read-only access. The Python docs (https://docs.python.org/3/library/mmap.html) say: "Assignment to an ACCESS_COPY memory map affects memory but does not update the underlying file." If an anonymous file backing is acceptable for you, you can use ACCESS_COPY.

An example: open two cmd.exe or terminals and in one terminal:

import ctypes
import mmap

# Note: the tagname argument is only supported on Windows
mm_file_write = mmap.mmap(-1, 4096, access=mmap.ACCESS_WRITE, tagname="shmem")
mm_file_read = mmap.mmap(-1, 4096, access=mmap.ACCESS_COPY, tagname="shmem")

write = ctypes.c_int.from_buffer(mm_file_write)
read = ctypes.c_int.from_buffer(mm_file_read)
try:
    while True:
        value = int(input('enter an integer using mm_file_write: '))
        write.value = value
        print('updated value')
        value = int(input('enter an integer using mm_file_read: '))
        # assigning read.value changes only this private copy, not the shared mapping
        read.value = value
        print('updated value')
except KeyboardInterrupt:
    print('got exit event')

In the other terminal do:

import mmap
import struct
import time

mm_file = mmap.mmap(-1, 4096, access=mmap.ACCESS_WRITE, tagname="shmem")
i = None
try:
    while True:
        # unpack returns a tuple; take its single element
        new_i = struct.unpack('i', mm_file[:4])[0]
        if i != new_i:
            print('i: {} => {}'.format(i, new_i))
            i = new_i
        time.sleep(0.1)
except KeyboardInterrupt:
    print('Stopped . . .')

You will see that the second process receives the values written through mm_file_write, but not the values written through the ACCESS_COPY mapping (mm_file_read).
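The same copy-on-write behaviour can be checked in a single process against an ordinary file, without tagname, so it is not tied to Windows. This is a variation on the example above, not part of it: the temporary file and the demo function are illustrative only.

```python
import ctypes
import mmap
import os
import struct
import tempfile


def demo():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, struct.pack("i", 1))  # file initially holds the int 1
        mm = mmap.mmap(fd, 0, access=mmap.ACCESS_COPY)
        # from_buffer is allowed because a copy-on-write map is writable
        view = ctypes.c_int.from_buffer(mm)
        view.value = 99  # changes the private copy only
        in_memory = view.value
        del view  # release the buffer export so the map can be closed
        mm.close()
        os.lseek(fd, 0, os.SEEK_SET)
        (on_disk,) = struct.unpack("i", os.read(fd, 4))
        return in_memory, on_disk
    finally:
        os.close(fd)
        os.remove(path)
```

The write is visible through the ctypes view but never reaches the file, which is the property the ACCESS_COPY suggestion relies on.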

Giavani answered 20/2, 2019 at 19:6 Comment(0)

I ran into a similar issue (unable to set up a read-only mmap), but I was using only the Python mmap module. See: Python mmap 'Permission denied' on Linux

I'm not sure it is of any help to you, though, since you don't want the mmap to be private.

Commutative answered 9/6, 2011 at 15:6 Comment(1)
I don't have any problem opening and accessing the file through mmap. The problem is that from_buffer() throws an exception if the buffer isn't writable, so I need an alternative to that call. - Gothar
