Python 2.x - Write binary output to stdout?
D

5

36

Is there any way to write binary output to sys.stdout in Python 2.x? In Python 3.x, you can just use sys.stdout.buffer (or detach stdout, etc...), but I haven't been able to find any solutions for Python 2.5/2.6.

EDIT: I'm trying to push a PDF file (in binary form) to stdout so that a web server can serve it. When I write the file using sys.stdout.write, carriage returns get added to the binary stream, which corrupts the PDF.

EDIT 2: Unfortunately, this project has to run on a Windows server, so Linux-only solutions are out.

Simple dummy example (reading from a file on disk instead of generating on the fly, just so we know that the generation code isn't the issue):

import sys
# Read the PDF in binary mode, then write the bytes to stdout unchanged.
file = open('C:\\test.pdf', 'rb')
pdfFile = file.read()
sys.stdout.write(pdfFile)
Diastase answered 3/3, 2010 at 19:47 Comment(7)
When you did sys.stdout.write() what didn't work?Vicarial
See above for explanation, but the issue is basically that python adds carriage returns when it tries to convert the binary stream to a string for writing.Diastase
Does sys.stdout = os.fdopen(1, "wb") work for you to eliminate text-mode conversions? (You'll still need to use sys.stdout.write if you don't want the NLs from print statements.) (docs.python.org/library/os.html#os.fdopen)Supernatant
Thanks for the great question. I learned something new today.Elise
@Roger, surprisingly, os.fdopen doesn't solve it, although running python with -u works. -u does bring extra overhead, though.Mornings
Maybe you want to check out the link again, I added another answer. A wrapper for the stdout using os.write() and os.read() seems to be working fine in my test cases.Charmeuse
Good question; I had the same issue when I wanted to serve a PNG file from a Python script under Windows Apache.Elfredaelfrida
B
29

Which platform are you on?

You could try this recipe if you're on Windows (the link suggests it's Windows specific anyway).

if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

Some references on the web suggest that Python 3.1 would (or should) get a function to reopen sys.stdout in binary mode, but I don't know of a better alternative than the above for Python 2.x.
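The same call can be applied to any of the standard streams. Below is a minimal sketch that wraps the recipe in a helper; the name set_binary_mode is hypothetical (it is not part of the linked recipe), and the function deliberately does nothing on platforms other than Windows, where no text-mode translation is expected.

```python
import os
import sys


def set_binary_mode(stream):
    """Switch `stream` to binary mode on Windows.

    Returns True if the mode was changed, False on other platforms,
    where this helper is a no-op. The helper name is an assumption
    made for illustration only.
    """
    if sys.platform == "win32":
        import msvcrt  # Windows-only module, so import it lazily
        msvcrt.setmode(stream.fileno(), os.O_BINARY)
        return True
    return False
```

Calling set_binary_mode(sys.stdout) once, before the first write, should be enough; set_binary_mode(sys.stdin) would be the matching call for a script that also reads binary input.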

Been answered 3/3, 2010 at 20:1 Comment(3)
I did a test just reading the PDF in from a file and writing it straight back out, the carriage returns are still added.Diastase
The windows solution link you give is the perfect solution. I can't thank you enough; this was driving me absolutely up the wall.Diastase
Great! The same works for stdin as well, and both are required to make e.g. a functional cat clone that can handle binary files.Yearround
S
10

You can use unbuffered mode: python -u script.py.

-u     Force  stdin,  stdout  and stderr to be totally unbuffered.
       On systems where it matters, also put stdin, stdout and stderr
       in binary mode.
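If the script is launched from another Python process, the flag can be passed on the command line of the child. This is only a sketch under assumptions: the child program and the name run_unbuffered_child are made up for illustration, and the getattr fallback exists only so the snippet also runs on Python 3, where (as far as I know) -u affects buffering only and raw bytes go through sys.stdout.buffer anyway.

```python
import subprocess
import sys

# Hypothetical child program: writes the byte values 0..255 to stdout.
CHILD = (
    "import sys\n"
    "out = getattr(sys.stdout, 'buffer', sys.stdout)\n"
    "out.write(bytes(bytearray(range(256))))\n"
)


def run_unbuffered_child():
    """Run CHILD under `python -u` and return the raw bytes it wrote."""
    proc = subprocess.Popen([sys.executable, "-u", "-c", CHILD],
                            stdout=subprocess.PIPE)
    data, _ = proc.communicate()  # read everything, then wait for exit
    return data
```

If the length of the returned data is 256, no extra carriage return was inserted on the way out.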
Stereotaxis answered 17/2, 2014 at 20:12 Comment(0)
M
8

You can use argopen.argopen(): it handles a dash as stdin/stdout and fixes binary mode on Windows.

import argopen
stdout = argopen.argopen('-', 'wb')
stdout.write(some_binary_data)
Minatory answered 4/2, 2013 at 8:21 Comment(2)
This is much neater than the ActiveState recipe. How did you figure it out? The module is barely documented.Concertante
Didn't work for me -- my distro doesn't have argopen. Didn't want to install it since "msvcrt.setmode()" mentioned above worked for me.Elfredaelfrida
E
7

In Python 2.x, plain str objects are byte strings, so I believe you should be able to just call

>>> sys.stdout.write(data)

EDIT: I've confirmed your experience.

I created one file, gen_bytes.py

import sys
for char in range(256):
    sys.stdout.write(chr(char))

And another read_bytes.py

import subprocess
import sys

proc = subprocess.Popen([sys.executable, 'gen_bytes.py'], stdout=subprocess.PIPE)
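# communicate() drains the pipe and then waits; calling wait() first can
# deadlock once the child writes more than the pipe buffer holds.
bytes, _ = proc.communicate()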
if not len(bytes) == 256:
    print 'Received incorrect number of bytes: {0}'.format(len(bytes))
    raise SystemExit(1)
if not map(ord, bytes) == range(256):
    print 'Received incorrect bytes: {0}'.format(map(ord, bytes))
    raise SystemExit(2)
print "Everything checks out"

Put them in the same directory and run read_bytes.py. Sure enough, it appears as if Python is in fact converting newlines on output. I suspect this only happens on a Windows OS.

> .\read_bytes.py
Received incorrect number of bytes: 257
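The count of 257 is what you would expect if the single newline byte (0x0A) among the 256 values were expanded to a carriage return plus newline. The sketch below only emulates that translation in pure Python so the arithmetic can be checked on any platform; emulate_text_mode is a hypothetical name, and the real conversion happens inside the Windows C runtime, not in code like this.

```python
def emulate_text_mode(data):
    """Mimic Windows text-mode output: every LF byte becomes CR LF."""
    return data.replace(b"\n", b"\r\n")


# The 256 byte values that gen_bytes.py writes.
payload = bytes(bytearray(range(256)))
translated = emulate_text_mode(payload)

# One extra byte appears, for the single 0x0A in the payload.
extra = len(translated) - len(payload)
```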

Following ChristopheD's lead, changing gen_bytes.py to the following corrects the issue.

import sys

if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

for char in range(256):
    sys.stdout.write(chr(char))

I include this for completeness. ChristopheD deserves the credit.

Elise answered 3/3, 2010 at 19:50 Comment(6)
This works if you're only trying to add string data, but python tries to stringify binary data when just calling write, corrupting the data.Diastase
I ran your gen_bytes.py and read_bytes.py on Mac OS X (Python 2.5 with minor modifications for the missing "format" method) and it printed "Everything checks out".Dragrope
It looks like it's a Windows-only issue.Diastase
On Windows, I found that after just running gen_bytes.py > bytes.bin, a dir showed that the file was 257 bytes.Mornings
Unless you're using PowerShell, in which case gen_bytes.py > bytes.bin generates a Unicode-encoded file of 522 bytes.Elise
If I reverse the two processes, such that the parent writes and child reads, then I have to set sys.stdin to be binary, on the child. Perhaps the PIPEs that subprocess sets up are always binary, but stdin/stdout are not?Xylotomous
C
0

I solved this using a wrapper for a file-descriptor. (Tested in Python 3.2.5 on Cygwin)

import io
import os

class BinaryFile(object):
    ''' Wraps a file-descriptor to binary read/write. The wrapped
    file can not be closed by an instance of this class, it must
    happen through the original file.

    :param fd: A file-descriptor (integer) or file-object that
        supports the ``fileno()`` method. '''

    def __init__(self, fd):
        super(BinaryFile, self).__init__()
        fp = None
        if not isinstance(fd, int):
            fp = fd
            fd = fp.fileno()
        self.fd = fd
        self.fp = fp

    def fileno(self):
        return self.fd

    def tell(self):
        if self.fp and hasattr(self.fp, 'tell'):
            return self.fp.tell()
        else:
            raise io.UnsupportedOperation(
                'can not tell position from file-descriptor')

    def seek(self, pos, how=os.SEEK_SET):
        try:
            return os.lseek(self.fd, pos, how)
        except OSError as exc:
            raise io.UnsupportedOperation('file-descriptor is not seekable')

    def write(self, data):
        if not isinstance(data, bytes):
            raise TypeError('must be bytes, got %s' % type(data).__name__)
        return os.write(self.fd, data)

    def read(self, length=None):
        if length is not None:
            return os.read(self.fd, length)
        else:
            result = b''
            while True:
                data = self.read(1024)
                if not data:
                    break
                result += data
            return result
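The core of the wrapper is the call to os.write on the raw file descriptor. As a smaller sketch of just that idea, the helper below writes a byte string to a descriptor and retries on partial writes; write_all is a hypothetical name, not part of the class above. Note that, according to the comment on this answer, Python 2.7 on Windows may still need msvcrt.setmode on the descriptor for the extra \r bytes to go away.

```python
import os


def write_all(fd, data):
    """Write all of `data` (a byte string) to file descriptor `fd`.

    os.write may write fewer bytes than requested, so loop until
    everything has been written. Returns the number of bytes written.
    """
    total = 0
    while total < len(data):
        total += os.write(fd, data[total:])
    return total
```

For stdout this would be called as write_all(sys.stdout.fileno(), data), after flushing anything already buffered in sys.stdout.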
Charmeuse answered 26/11, 2014 at 13:30 Comment(1)
The code in this answer doesn't solve the problem in Python 2.7: the \r bytes still appear on standard output on Windows. By adding msvcrt.setmode(self.fd, os.O_BINARY) (as indicated in other answers), the \r bytes disappear.Rolle
