Reading the binary output of an external program in Common Lisp

I'm trying to run an external program in SBCL and capture its output. The output is binary data (a PNG image), but SBCL insists on interpreting it as a string.

I tried a number of ways, like

(trivial-shell:shell-command "/path/to/png-generator" :input "some input")

(with-input-from-string (input "some input")
  (with-output-to-string (output)
    (run-program "/path/to/png-generator" () :input input :output output)))

(with-input-from-string (input "some input")
  (flexi-streams:with-output-to-sequence (output)
    (run-program "/path/to/png-generator" () :input input :output output)))

But I get errors like

Illegal :UTF-8 character starting at byte position 0.

It seems to me that SBCL is trying to interpret the binary data as text and decode it. How do I change this behaviour? I'm only interested in obtaining a vector of octets.

Edit: Since it is not clear from the text above, I'd like to add that, at least in the flexi-streams case, the element-type of the stream is flexi-streams:octet (which is (unsigned-byte 8)). I would expect run-program, at least in this case, to read the raw bytes without trouble. Instead I get a message like: Don't know how to copy to stream of element-type (UNSIGNED-BYTE 8)

Zeuxis answered 11/1, 2012 at 8:51 Comment(0)

Edit: I got angry at not being able to do this very simple task and solved the problem.

Functionally, the ability to send a stream of type UNSIGNED-BYTE into run-program and have it work correctly is severely limited, for reasons I don't understand. I tried gray streams, flexi-streams, fd streams, and a few other mechanisms, like you.

However, perusing run-program's source (for the fifth or sixth time), I noticed that you can pass :STREAM as the value of :output. Given that, I wondered if read-byte would work... and it did. For more performant work, one could determine how to get the length of a non-file stream and run READ-SEQUENCE on it.

(let* (;; Get random bytes.
       (proc-var (sb-ext:run-program "head" '("-c" "10" "/dev/urandom")
                                     :search t
                                     ;; Let SBCL figure out the storage type.
                                     ;; This is what solved the problem.
                                     :output :stream
                                     ;; Needed so that PROCESS-ERROR below
                                     ;; returns a stream rather than NIL.
                                     :error :stream))
       ;; Obtain the streams from the process object.
       (output (sb-ext:process-output proc-var))
       (err (sb-ext:process-error proc-var)))
  (values
   ;; Return both stdout and stderr, just for polish.
   ;; Do a byte read and turn it into a vector.
   (concatenate 'vector
                ;; A byte with value 0 is *not* value nil. Yay for Lisp!
                (loop for byte = (read-byte output nil)
                      while byte
                      collect byte))
   ;; Repeat for stderr.
   (concatenate 'vector
                (loop for byte = (read-byte err nil)
                      while byte
                      collect byte))))
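
The READ-SEQUENCE idea mentioned above does not strictly need the stream length: one can read fixed-size chunks until a short read signals end of file. The following is only a sketch under assumptions, not a tested recipe. READ-PROGRAM-OCTETS and its CHUNK-SIZE parameter are hypothetical names, and it assumes the stream returned for :output :stream accepts READ-SEQUENCE into an (unsigned-byte 8) vector, just as it accepts READ-BYTE. It also passes :wait nil because, as far as I understand, a child that writes more than the pipe can buffer may otherwise block before run-program returns.

```lisp
;; Sketch only: READ-PROGRAM-OCTETS is a hypothetical helper, not part of SBCL.
(defun read-program-octets (program args &key (chunk-size 4096))
  "Run PROGRAM with ARGS and return its standard output as a vector of octets."
  (let* ((process (sb-ext:run-program program args
                                      :search t
                                      :wait nil ; read while the child is running
                                      :output :stream))
         (stream (sb-ext:process-output process))
         (chunk (make-array chunk-size :element-type '(unsigned-byte 8)))
         (result (make-array 0 :element-type '(unsigned-byte 8)
                               :adjustable t :fill-pointer 0)))
    (unwind-protect
         ;; READ-SEQUENCE returns the index of the first element it did not
         ;; fill, so a value below CHUNK-SIZE means end of file was reached.
         (loop for end = (read-sequence chunk stream)
               do (loop for i below end
                        do (vector-push-extend (aref chunk i) result))
               while (= end chunk-size))
      ;; Reap the child, then release the pipe streams.
      (sb-ext:process-wait process)
      (sb-ext:process-close process))
    result))
```

For example, (read-program-octets "head" '("-c" "10" "/dev/urandom")) should return a vector of ten octets. Standard error is left at its default here; capturing it as well would need a second stream and some care about the order in which the two pipes are drained.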
Jacks answered 11/1, 2012 at 17:34 Comment(3)
Yes, this seems to work, thank you very much! In any case, I'm not sure where the problem lies. Using a file stream as output works fine, so the problem is not entirely in run-program, but rather in the interaction between a string stream and run-program. But then I would expect with-output-to-sequence to work fine. Anyway, at least I have a solution now. Thanks again. - Zeuxis
@MarcoRighele: on SO, if you care to accept an answer, it marks the question as answered in the SO system - it's the check mark by the vote buttons. - Jacks
I was waiting to see if the other solution worked too. In any case, I prefer this one as it has fewer external dependencies. - Zeuxis

If you're willing to use some external libraries, this can be done with babel-streams. This is a function I use to safely get content from a program. I use :latin-1 because it maps each of the 256 possible byte values directly to the character with the same code. If you remove the octets-to-string call, you get the octet vector instead.

If you wanted stderr as well, you could use nested 'with-output-to-sequence' to get both.

(defun safe-shell (command &rest args)
  ;; OCTETS-TO-STRING must be Babel's; SBCL's own version takes different keywords.
  (babel:octets-to-string
   (babel-streams:with-output-to-sequence (stream :external-format :latin-1)
     (let ((proc (sb-ext:run-program command args :search t :wait t :output stream)))
       (case (sb-ext:process-status proc)
         (:exited (unless (zerop (sb-ext:process-exit-code proc))
                    (error "Error in command")))
         (t (error "Unable to terminate process")))))
   :encoding :latin-1))
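
The reason Latin-1 is a safe choice for carrying arbitrary bytes in a string can be checked with a small round trip. This is a sketch that deliberately uses SBCL's own sb-ext:octets-to-string and sb-ext:string-to-octets (which take :external-format rather than Babel's :encoding) so that it needs no external library. LATIN-1-ROUND-TRIP-P is a hypothetical name chosen for illustration.

```lisp
;; Sketch only: LATIN-1-ROUND-TRIP-P is a hypothetical helper.
(defun latin-1-round-trip-p ()
  "Return true if every octet value survives a Latin-1 decode and re-encode."
  (let ((octets (make-array 256 :element-type '(unsigned-byte 8))))
    ;; Fill the vector with every possible byte value, 0 through 255.
    (dotimes (i 256)
      (setf (aref octets i) i))
    (let* ((text (sb-ext:octets-to-string octets :external-format :latin-1))
           (back (sb-ext:string-to-octets text :external-format :latin-1)))
      ;; One character per byte, and the same bytes come back out.
      (and (= (length text) 256)
           (equalp octets back)))))
```

The same check with :utf-8 would not be expected to hold, since many byte sequences are not valid UTF-8, which matches the decoding error reported in the question.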
Coverage answered 12/1, 2012 at 6:47 Comment(3)
I have problems running your example. With SBCL under Linux I get the warning :ENCODING is not a known argument keyword., and running safe-shell gives me "Unknown character encoding: #<BABEL:EXTERNAL-FORMAT ISO-8859-1 LF {1004CDC241}>". Am I missing something? - Zeuxis
Not completely sure without knowing which versions of SBCL and Babel you're using. You can try :iso-8859-1 as well, since that's the canonical name for it. Make sure that OCTETS-TO-STRING is coming from BABEL. - Coverage
Ah yes, I was using sb-ext:octets-to-string. With the right function and the most recent version of SBCL it seems to work correctly. Many thanks. - Zeuxis

Paul Nathan already gave a pretty complete answer as to how to read I/O from a program as binary, so I'll just add why your code didn't work: you explicitly asked SBCL to interpret the I/O as a string of UTF-8 characters by using with-input-from-string and with-output-to-string.
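
A quick way to see that a string stream is a character stream, and so not a suitable target for raw bytes, is to inspect it directly. The two functions below are hypothetical helpers written only for illustration. On SBCL I would expect the first to return CHARACTER and the second to return :ERROR, but treat that as an assumption to verify on your own version.

```lisp
;; Sketch only: both helper names are hypothetical.
(defun string-stream-element-type ()
  "Return the element type of the stream created by WITH-OUTPUT-TO-STRING."
  (let (type)
    (with-output-to-string (out)
      (setf type (stream-element-type out)))
    type))

(defun try-write-byte-to-string-stream ()
  "Return :OK if a byte can be written to a string stream, :ERROR otherwise."
  (handler-case
      (progn
        (with-output-to-string (out)
          ;; 137 is the first byte of a PNG file.
          (write-byte 137 out))
        :ok)
    (error () :error)))
```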

Also, I'd like to point out that you don't need to go as far as run-program's source code to get to the solution. It's clearly documented in SBCL's manual.

Inoculation answered 9/8, 2013 at 16:10 Comment(2)
That's true for with-output-to-string (which has an element-type of character) of course, but not for the flexi-streams case, where the stream is made of octets. I expected that run-program would read elements of the right element-type depending on the stream, but it seems that's not the case. Anyway, I realize now that the examples are not very clear; I'll add some more details and error messages. - Zeuxis
But you'll note that you don't get the same error with flexi-streams. If you look at the error message and the stack trace, you'll see that a fair guess is that SBCL doesn't use any write function but some implementation-specific optimization, and it fails with the flexi-stream. - Inoculation
