Is there a way to read a multi-line CSV file using the ReadFromText
transform in Python? My file contains a single CSV record that spans two physical lines. I am trying to make Apache Beam read it as one element, but I cannot get it to work.
import apache_beam

def print_each_line(line):
    print(line)

path = './input/testfile.csv'
# Here are the contents of testfile.csv:
# foo,bar,"blah blah
# more blah blah",baz

# Using the pipeline as a context manager runs it when the block exits.
with apache_beam.Pipeline() as p:
    (p
     | 'ReadFromFile' >> apache_beam.io.ReadFromText(path)
     | 'PrintEachLine' >> apache_beam.Map(print_each_line))

# Here is the output:
# foo,bar,"blah blah
# more blah blah",baz
The code above splits the input into two elements, even though the CSV convention (RFC 4180) is that a field containing line breaks is enclosed in double quotes and belongs to a single record.
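ReadFromText is a line-oriented source: it splits on newline characters without tracking CSV quoting, so the quoted line break ends the element. Python's standard-library `csv` module does track quoting. The sketch below is plain Python (Beam is not needed to run it) and contrasts the two on the file contents from the question. The helper name `parse_csv_contents` is hypothetical, and the commented pipeline wiring is an untested assumption that reads each file whole via `apache_beam.io.fileio`.

```python
import csv
import io


def parse_csv_contents(contents):
    """Yield one list of fields per CSV record, honouring quoted line breaks."""
    # csv.reader tracks quote state, so a newline inside "..." does not end the record.
    yield from csv.reader(io.StringIO(contents))


# Same contents as testfile.csv in the question.
CONTENTS = 'foo,bar,"blah blah\nmore blah blah",baz\n'

# A plain line split (roughly what a line-oriented reader does) gives two pieces.
print(len(CONTENTS.splitlines()))  # 2

# CSV-aware parsing gives a single record with four fields.
print(list(parse_csv_contents(CONTENTS)))
# [['foo', 'bar', 'blah blah\nmore blah blah', 'baz']]

# Hypothetical pipeline wiring (assumption, not run here): match the files,
# read each one whole, and parse it with the helper above.
#   from apache_beam.io import fileio
#   (p
#    | fileio.MatchFiles('./input/testfile.csv')
#    | fileio.ReadMatches()
#    | apache_beam.FlatMap(lambda f: parse_csv_contents(f.read_utf8())))
```

The trade-off of reading whole files is that a single large file can no longer be split across workers, because a split point chosen at an arbitrary newline might land inside a quoted field.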