Read content of RAR file into memory in Python
Asked Answered
C

7

7

I'm looking for a way to read specific files from a rar archive into memory. Specifically they are a collection of numbered image files (I'm writing a comic reader). While I can simply unrar these files and load them as needed (deleting them when done), I'd prefer to avoid that if possible.

That all said, I'd prefer a solution that's cross platform (Windows/Linux) if possible, but Linux is a must. Just as importantly, if you're going to point out a library to handle this for me, please understand that it must be free (as in beer) or OSS.

Cooperage answered 27/7, 2009 at 0:25 Comment(0)
Z
6

The real answer is that there isn't a library, and you can't make one. You can use rarfile, or you can use 7zip unRAR (which is less free than 7zip, but still free as in beer), but both approaches require an external executable. The license for RAR basically requires this, as while you can get source code for unRAR, you cannot modify it in any way, and turning it into a library would constitute illegal modification.

Also, solid RAR archives (the best compressed) can't be randomly accessed, so you have to unarchive the entire thing anyhow. WinRAR presents a UI that seems to avoid this, but really it's just unpacking and repacking the archive in the background.

Zofiazoha answered 27/7, 2009 at 1:11 Comment(2)
Looks like I'll just have to call unrar and extract to a temp directory for use and clean it up when done. Thanks!Cooperage
Of course you can make one, but you'd have to reverse engineer the format, and it's a moving target (the format has changed over the years). It's probably not worth the bother, but I've seen (proprietary) apps that do it.Odilia
L
9

See the rarfile module:

Lustihood answered 27/7, 2009 at 0:54 Comment(3)
Yes, I was just about to post this. Although the OP could have just googled "python rar"...Jennefer
You unfortunately still need unrar for this to work - it's just a nice API on running an external utility.Zofiazoha
@kiv In my defense, I did in fact google similar but found mostly info on the Chilkat library. It also looks like rarfile still relies on unrar.Cooperage
Z
6

The real answer is that there isn't a library, and you can't make one. You can use rarfile, or you can use 7zip unRAR (which is less free than 7zip, but still free as in beer), but both approaches require an external executable. The license for RAR basically requires this, as while you can get source code for unRAR, you cannot modify it in any way, and turning it into a library would constitute illegal modification.

Also, solid RAR archives (the best compressed) can't be randomly accessed, so you have to unarchive the entire thing anyhow. WinRAR presents a UI that seems to avoid this, but really it's just unpacking and repacking the archive in the background.

Zofiazoha answered 27/7, 2009 at 1:11 Comment(2)
Looks like I'll just have to call unrar and extract to a temp directory for use and clean it up when done. Thanks!Cooperage
Of course you can make one, but you'd have to reverse engineer the format, and it's a moving target (the format has changed over the years). It's probably not worth the bother, but I've seen (proprietary) apps that do it.Odilia
D
3

The pyUnRAR2 library can extract files from RAR archives to memory (and disk if you want). It's available under the MIT license and simply wraps UnRAR.dll on Windows and unrar on Unix. Click "QuickTutorial" for usage examples.

On Windows, it is able to extract to memory (and not disk) with the (included) UnRAR.dll by setting a callback using RARSetCallback() and then calling RARProcessFile() with the RAR_TEST option instead of the RAR_EXTRACT option to avoid extracting any files to disk. The callback then watches for UCM_PROCESSDATA events to read the data. From the documentation for UCM_PROCESSDATA events: "Process unpacked data. It may be used to read a file while it is being extracted or tested without actual extracting file to disk."

On Unix, unrar can simply print the file to stdout, so the library just reads from a pipe connected to unrar's stdout. The unrar binary you need is the one that has the "p" for "Print file to stdout" command. Use "apt-get install unrar" to install it on Ubuntu.

Digamma answered 1/7, 2010 at 2:47 Comment(0)
P
2

It seems like the limitation that rarsoft imposes on derivative works is that you may not use the unrar source code to create a variation of the RAR COMPRESSION algorithm. From the context, it would appear that it's specifically allowing folks to use his code (modified or not) to decompress files, but you cannot use them if you intend to write your own compression code. Here is a direct quote from the license.txt file I just downloaded:

  1. The UnRAR sources may be used in any software to handle RAR archives without limitations free of charge, but cannot be used to re-create the RAR compression algorithm, which is proprietary. Distribution of modified UnRAR sources in separate form or as a part of other software is permitted, provided that it is clearly stated in the documentation and source comments that the code may not be used to develop a RAR (WinRAR) compatible archiver.

Seeing as everyone seemed to just want something that would allow them to write a comic viewer capable of handling reading images from CBR (rar) files, I don't see why people think there's anything keeping them from using the provided source code.

Philanthropic answered 14/12, 2010 at 5:19 Comment(2)
To follow myself up, I noticed the unrar source code archive actually can compile into both libunrar.dll as well as libunrar.so. You'd use the commandline:make -f makefile.unix libPhilanthropic
Just to see what would happen, I changed the code.google.com/p/py-unrar2 lib listed below to use libunrar.so that i built on my OSX box. I just had to change 3 or 4 things in windows.py, like replace the bits that referred to windows data types with the standard ctypes, and change to look for my .so instead of the .dll. One of the tests seems to segfault as well (seems to be the password callback). I'll see if I can figure that bit out.Philanthropic
O
1

RAR is a proprietary format; I don't think there are any public specs, so third-party tool and library support is poor to non-existant.

You're much better off using ZIP; it's completely free, has an accurate public spec, the compression library is available everywhere (zlib is one of the most widely-deployed libraries in the world), and it's very easy to code for.

http://docs.python.org/library/zipfile.html

Odilia answered 27/7, 2009 at 1:43 Comment(1)
While I agree that zip is a nice format for this, it unfortunately is only one of the two common formats used for distributing comics, rar being the other. I need to support both.Cooperage
G
0

Look at the Python "struct" module. You can then interpret the RAR file format directly in your Python program, allowing you to retrieve the content inside the RAR without depending on external software to do it for you.

EDIT: This is of course vanilla Python - there are alternatives which use third-party modules (as already posted).

EDIT 2: According to Wikipedia's article my answer would require you to have permission from the author.

Gigi answered 27/7, 2009 at 0:55 Comment(4)
I think this probably puts you in murky legal territory. (I suspect that what rarfile does is the limit of what you're permitted to do without licensing RAR).Zofiazoha
@Glenn I'm afraid so, or at least thats what google has told me.Cooperage
Google didn't tell me anything, and the legal notices in my Linux copy of RAR has nothing but copyright notices.Odilia
I edited my answer to add a link to Wikipedia's entry on RAR. It appears you have to have permission from the author, but it wouldn't hurt to ask the developers to get a conclusive answer (rarlab.com/feedback.htm, Sales section).Gigi
T
0

The free 7zip library is also able to handle RAR files.

Theatheaceous answered 27/7, 2009 at 0:58 Comment(2)
Is it able to load them into memory using Python? 7zip is good, but I'm not sure it answers the question.Jennefer
@Kiv: It's as able as rarfile is, really, since you could just use subprocess.popen to manage the file.Zofiazoha

© 2022 - 2024 — McMap. All rights reserved.