Capturing terminal output into pandas dataframe without creating external text file

Asked 23/2, 2018 at 11:9 Answered 4/9, 2024 at 18:21

Solved python pandas ffmpeg terminal popen

I am using ffmpeg's extract_mvs example program to generate some text information. I would use a command like this in the terminal:

./extract_mvs input.mp4 > output.txt

I would like to run this command with Popen or another subprocess function in Python so that, instead of going to output.txt, the data is passed straight to a pandas DataFrame without the text file ever being written.

The idea is to automate this over many inputs, so I want to avoid generating many .txt files and then having to open() them one by one.

I thought of something like this:

import subprocess
import pandas as pd

cmd = ['./extract_mvs', 'input.mp4']
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)
df = pd.read_csv(a.communicate()[0], sep=',')

But then I get an error: OSError: Expected file path name or file-like object, got <class 'bytes'> type

Can this be fixed so that the output is read straight from the subprocess into pandas?

Redd asked 23/2, 2018 at 11:9

I found a workaround, using part of Keith's answer and the one found here, to pass the output from a string into a pandas DataFrame.

The final working code is:

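import sys
import subprocess
import pandas as pd

# StringIO lives in the io module on Python 3
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

cmd = ['./extract_mvs', 'input.mp4']
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)

# communicate() returns (stdout, stderr); stdout is bytes, so decode it
# and wrap the text in a file-like object that read_csv can consume
b = StringIO(a.communicate()[0].decode('utf-8'))

df = pd.read_csv(b, sep=",")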
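On Python 3.7+ the same idea can be written more compactly with subprocess.run, which waits for the process and captures stdout in a single call. The sketch below wraps it in a helper; run_to_dataframe is a hypothetical name, not part of pandas or subprocess, and it assumes the command prints comma-separated text with a header row.

```python
import io
import subprocess

import pandas as pd


def run_to_dataframe(cmd):
    """Run cmd and parse its CSV-formatted stdout into a DataFrame (hypothetical helper)."""
    # check=True raises CalledProcessError on a non-zero exit status;
    # encoding='utf-8' makes result.stdout a str instead of bytes
    result = subprocess.run(cmd, stdout=subprocess.PIPE, encoding='utf-8', check=True)
    # read_csv wants a path or file-like object, so wrap the string in StringIO
    return pd.read_csv(io.StringIO(result.stdout))
```

To process many videos without writing any .txt files, call it in a loop, e.g. `frames = {v: run_to_dataframe(['./extract_mvs', v]) for v in ['a.mp4', 'b.mp4']}` (the filenames are placeholders).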

Redd answered 23/2, 2018 at 13:43

Updated answer:

The more I think about your question and the output from my first suggestion, the more I think your problem is not a decoding issue but rather a failure to give pd.read_csv() the right kind of input. As an alternative, you could skip pd.read_csv() entirely and read the subprocess output line by line into a DataFrame.

Something like this:

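import subprocess
import pandas as pd

cmd = ['./extract_mvs', 'input.mp4']

a = subprocess.Popen(cmd, stdout=subprocess.PIPE)

# Each line arrives as bytes; concatenating bytes onto a DataFrame fails,
# so decode and split the lines first and build the DataFrame once
rows = [line.decode('utf-8').rstrip('\r\n').split(',') for line in a.stdout]

a.wait()

# Assumes the first line is a header row; values stay strings (use pd.to_numeric if needed)
df = pd.DataFrame(rows[1:], columns=rows[0])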

Again, I haven't tested this code myself (because I'm traveling and using my phone right now), but I hope this gets you a little closer to a solution.

Original answer:

I haven't tested this, but I think you just need to decode the output returned by your subprocess. Specifically, it comes back as bytes, which you need to decode as UTF-8 into a string.

You can try: pd.read_csv(a.communicate()[0].decode('utf-8'))
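Decoding alone is not quite enough: when pd.read_csv receives a str, it treats it as a file path rather than as CSV data. Below is a sketch of two ways to hand it in-memory data instead, using a made-up bytes literal as a stand-in for a.communicate()[0].

```python
import io

import pandas as pd

# Stand-in for a.communicate()[0]: raw bytes from the subprocess's stdout (sample data)
raw = b"col1,col2\n1,-1\n2,1\n"

# Option 1: decode to str, then wrap it in an in-memory text buffer
df_text = pd.read_csv(io.StringIO(raw.decode("utf-8")))

# Option 2: skip decoding and wrap the bytes in an in-memory binary buffer
df_bytes = pd.read_csv(io.BytesIO(raw))

print(df_text.equals(df_bytes))  # True
```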

Worm answered 23/2, 2018 at 12:6

Thanks for the input. It's one step closer to the solution I guess. When I try the above I get: prntscr.com/iipcq6 It prints the information in the console while it was supposed to store in df. When calling df, it says it is not defined. – Redd 23/2, 2018 at 12:17

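import subprocess
import sys
import pandas as pd

# StringIO lives in the io module on Python 3
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

# A plain string command runs directly on Windows; on POSIX pass a list
# (e.g. ['nslookup', 'email.fullcontact.com']) or use shell=True
cmd = 'NSLOOKUP email.fullcontact.com'
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)

# Decode stdout and wrap it in a file-like object for read_csv
b = StringIO(a.communicate()[0].decode('utf-8'))

df = pd.read_csv(b, sep=",")
column = list(df.columns)
# Take the text after the first colon; str.strip('Name:') would strip those
# characters from both ends of the line instead of removing the prefix
name = list(df.iloc[1])[0].split(':', 1)[1].strip()
print(name)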
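A caveat when pulling the hostname out of a "Name:" line: str.strip('Name:') removes any of the characters N, a, m, e and : from both ends of the string, not the literal prefix "Name:", so it can also eat trailing letters of the hostname. The small demonstration below uses a made-up nslookup-style line. Note that str.removeprefix needs Python 3.9+, while split(':', 1) works on any version.

```python
line = "Name:    email.fullcontact.com"

# strip() treats its argument as a set of characters and trims them from BOTH ends,
# so the trailing "m" of ".com" is removed too
via_strip = line.strip('Name:').strip()

# removeprefix() (Python 3.9+) removes only the exact leading substring
via_removeprefix = line.removeprefix('Name:').strip()

# Splitting on the first colon works on older Python versions as well
via_split = line.split(':', 1)[1].strip()

print(via_strip)         # email.fullcontact.co
print(via_removeprefix)  # email.fullcontact.com
print(via_split)         # email.fullcontact.com
```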

Outbound answered 15/11, 2019 at 21:40

Please don't just leave code behind as an answer. Explain what your code does in English and explain why it is the correct answer. – Streamlet 15/11, 2019 at 21:58

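import subprocess
import pandas as pd

cmd = ['./extract_mvs', 'input.mp4']
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
# p.stdout is a file-like object, so read_csv can read from the pipe directly
df = pd.read_csv(p.stdout)
# Reap the process and raise if it exited with a non-zero status
if p.wait() != 0:
    raise subprocess.CalledProcessError(p.returncode, cmd)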

This code builds up the DataFrame while the output is being generated. For long-running programs with large output, it avoids buffering the entire raw output in memory before handing it to read_csv, which may significantly reduce your program's peak memory use.

In this snippet, p.stdout is a readable stream object (aka a file-like object).
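The resulting DataFrame itself still has to fit in memory. If even that is too large, read_csv's chunksize parameter returns an iterator of smaller DataFrames, so each piece can be reduced before the next one is read. The sketch below assumes the goal is a per-column sum; both the aggregation and the helper name sum_columns_streaming are illustrative choices, not part of the original answer.

```python
import subprocess

import pandas as pd


def sum_columns_streaming(cmd, chunksize=10_000):
    """Sum each numeric column of a command's CSV output, one chunk at a time."""
    total = None
    # Leaving the Popen block closes the pipe and waits for the process
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as p:
        # With chunksize set, read_csv yields DataFrames of up to `chunksize` rows
        with pd.read_csv(p.stdout, chunksize=chunksize) as reader:
            for chunk in reader:
                part = chunk.sum(numeric_only=True)
                # Accumulate the partial sums so only one chunk is held at a time
                total = part if total is None else total.add(part, fill_value=0)
    if p.returncode != 0:
        raise subprocess.CalledProcessError(p.returncode, cmd)
    return total
```

Only one chunk plus the running totals are held at any moment, so memory use depends on chunksize rather than on the total size of the output.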

Jonson answered 4/9, 2024 at 18:21
