How to preprocess a text stream on the fly in Python?

I need a Python 3 function (or some other construct) that takes a text stream (such as sys.stdin or the object returned by open(file_name, "rt")) and returns a text stream for some other function to consume. The returned stream should remove all spaces, replace all tabs with commas, and convert all letters to lowercase on the fly (lazily), as the consumer code reads the data.

I assume there is a reasonably easy way to do this in Python 3, perhaps something similar to list comprehensions, but so far I don't know what exactly it might be.

Fauver answered 4/2, 2018 at 6:45 Comment(1)
A generator such as (e.replace(" ", "").replace("\t", ",").lower() for e in file) might work. It processes the stream the "lazy" way, one line at a time. - Pierrepierrepont
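The generator from the comment can be wrapped in a small function. This is only a minimal sketch, assuming the consumer just needs to iterate over lines rather than call read(); the name preprocess_lines and the sample input are hypothetical and not part of the original question.

```python
import io


def preprocess_lines(stream):
    """Lazily yield each line with spaces removed, tabs replaced by commas, and letters lowercased."""
    for line in stream:
        # Each line is transformed only when the consumer asks for it.
        yield line.replace(" ", "").replace("\t", ",").lower()


# Hypothetical in-memory input standing in for sys.stdin or an opened file.
source = io.StringIO("Hello World\tFOO\nA b\tC\n")
result = list(preprocess_lines(source))
```

Because this is a generator, nothing is read from the source until the consumer starts iterating, and only one line is held in memory at a time.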

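I am not sure this is what you mean, but the easiest way I can think of is to inherit from the type that open() returns and override its read method to process the data after reading it. Python 3 has no built-in file type; in text mode open() returns an io.TextIOWrapper. A subclass of it wraps a binary stream, e.g. MyFile(open(file_name, "rb")), and only read() is affected: readline() and iteration still return unprocessed text. A simple implementation would be:

import io

class MyFile(io.TextIOWrapper):
    def read(self, *args, **kwargs):
        data = super().read(*args, **kwargs)
        # Remove spaces, turn tabs into commas, and lowercase the text.
        return data.replace(' ', '').replace('\t', ',').lower()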
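An alternative to subclassing is to wrap the source stream, so that read(), readline() and iteration all go through the same clean-up step. The following is a sketch under stated assumptions, not a definitive implementation: PreprocessedStream and _clean are hypothetical names, and the wrapper relies on io.TextIOBase implementing iteration in terms of readline(). Since spaces are dropped, read(size) may return fewer than size characters.

```python
import io


class PreprocessedStream(io.TextIOBase):
    """Read-only text stream that cleans data from `source` as it is read (hypothetical helper)."""

    def __init__(self, source):
        self._source = source

    def readable(self):
        return True

    @staticmethod
    def _clean(text):
        # Remove spaces, turn tabs into commas, and lowercase the text.
        return text.replace(" ", "").replace("\t", ",").lower()

    def read(self, size=-1):
        while True:
            chunk = self._source.read(size)
            cleaned = self._clean(chunk)
            # An empty result must only be returned at real end of input,
            # so keep reading if the chunk consisted solely of spaces.
            if cleaned or not chunk:
                return cleaned

    def readline(self, size=-1):
        while True:
            chunk = self._source.readline(size)
            cleaned = self._clean(chunk)
            if cleaned or not chunk:
                return cleaned


# Hypothetical in-memory input standing in for sys.stdin or an opened file.
stream = PreprocessedStream(io.StringIO("Hello World\tFOO\nA b\tC\n"))
first_line = stream.readline()
remainder = stream.read()
```

The wrapper never reads more from the source than the consumer requests, so it stays lazy even for large inputs or for sys.stdin.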
Arnaldo answered 4/2, 2018 at 7:6 Comment(0)

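I believe what you are looking for is the io module, more specifically io.StringIO.

You can use the open() function to read the initial data, modify it, and wrap the result in a stream that you can pass around. Note that f.read() loads the whole file into memory, so this approach is not lazy: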

import io

with open(file_name, 'rt') as f:
    stream = io.StringIO(f.read().replace(' ', '').replace('\t', ',').lower())
Skyla answered 4/2, 2018 at 7:22 Comment(0)
