Parsing a tab delimited file into separate lists or strings

I am trying to take a tab delimited file with two columns, Name and Age, which reads in as this:

'Name\tAge\nMark\t32\nMatt\t29\nJohn\t67\nJason\t45\nMatt\t12\nFrank\t11\nFrank\t34\nFrank\t65\nFrank\t78\n'

I simply want to create two lists from it: one with the names (called names, without the heading) and one with the ages (called ages, also without the heading).

Encincture answered 30/9, 2011 at 2:27 Comment(0)
C
20

Using the csv module, you might do something like this:

import csv

names=[]
ages=[]
with open('data.csv','r') as f:
    next(f) # skip headings
    reader=csv.reader(f,delimiter='\t')
    for name,age in reader:
        names.append(name)
        ages.append(age) 

print(names)
# ['Mark', 'Matt', 'John', 'Jason', 'Matt', 'Frank', 'Frank', 'Frank', 'Frank']
print(ages)
# ['32', '29', '67', '45', '12', '11', '34', '65', '78']
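
Note that csv.reader yields every field as a string, so the ages come back as text. If numeric ages are wanted, a minimal sketch along the same lines could convert them while appending. The io.StringIO object here is only a stand-in for the real file so the snippet runs on its own, and the int() call assumes every Age cell holds a whole number:

```python
import csv
import io

# Stand-in for the real file; io.StringIO lets the sketch run without data.csv.
sample = io.StringIO('Name\tAge\nMark\t32\nMatt\t29\nJohn\t67\n')

names = []
ages = []
next(sample)  # skip headings
for name, age in csv.reader(sample, delimiter='\t'):
    names.append(name)
    ages.append(int(age))  # assumption: every Age cell is a whole number
```

A non-numeric cell would raise ValueError here, so real data may need a try/except around the conversion.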
Cadman answered 30/9, 2011 at 2:34 Comment(0)
T
10

Tab-delimited data is within the domain of the csv module:

>>> corpus = 'Name\tAge\nMark\t32\nMatt\t29\nJohn\t67\nJason\t45\nMatt\t12\nFrank\t11\nFrank\t34\nFrank\t65\nFrank\t78\n'
>>> import io
>>> infile = io.StringIO(corpus)

Pretend infile was just a regular file...

>>> import csv
>>> r = csv.DictReader(infile, 
...                    dialect=csv.Sniffer().sniff(infile.read(1000)))
>>> infile.seek(0)
0

You don't even have to tell the csv module about the headings or the delimiter format: the Sniffer works out the delimiter from the sample, and DictReader takes the field names from the first row.

>>> names, ages = [],[]
>>> for row in r:
...     names.append(row['Name'])
...     ages.append(row['Age'])
... 
>>> names
['Mark', 'Matt', 'John', 'Jason', 'Matt', 'Frank', 'Frank', 'Frank', 'Frank']
>>> ages
['32', '29', '67', '45', '12', '11', '34', '65', '78']
>>> 
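
To see what was actually detected, the sniffed dialect and the reader can be inspected directly. This is a small sketch using the corpus string from the question; detected_delimiter and header_fields are just illustrative names:

```python
import csv
import io

corpus = ('Name\tAge\nMark\t32\nMatt\t29\nJohn\t67\nJason\t45\nMatt\t12\n'
          'Frank\t11\nFrank\t34\nFrank\t65\nFrank\t78\n')
infile = io.StringIO(corpus)

# Sniff the dialect from a sample, then rewind before handing the file to the reader.
dialect = csv.Sniffer().sniff(infile.read(1000))
infile.seek(0)
reader = csv.DictReader(infile, dialect=dialect)

detected_delimiter = dialect.delimiter  # delimiter guessed by the Sniffer
header_fields = reader.fieldnames       # taken from the first row of the file
```

Sniffing is a heuristic, so on small or irregular samples it can raise csv.Error; passing delimiter='\t' explicitly is the safer choice when the format is known.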
Travel answered 30/9, 2011 at 2:41 Comment(0)
E
5

I would use the split and splitlines methods of strings:

names = []
ages = []
# data holds the file contents as a single string
for name_age in data.splitlines()[1:]:  # [1:] skips the heading row
    name, age = name_age.strip().split("\t")
    names.append(name)
    ages.append(age)

If you were parsing a more complex format, I would suggest using the csv module, which can also handle TSV, but it seems like a bit of overkill here.
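
As an illustration of what "more complex" might mean, here is a small sketch with a made-up row whose name field contains a tab inside quotes. The row is hypothetical and not from the question's data; it only shows where plain split and the csv module diverge:

```python
import csv

# Hypothetical row: the quoted name field itself contains a tab.
line = '"Smith\tJr"\t40'

# Plain split cuts at every tab, including the one inside the quotes.
split_fields = line.split('\t')

# csv.reader honours the quotes and keeps the name as one field.
csv_fields = next(csv.reader([line], delimiter='\t'))
```

Here split_fields ends up with three pieces while csv_fields has the expected two, so the tuple unpacking in the loop above would fail on such a row.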

Eriha answered 30/9, 2011 at 2:32 Comment(0)
S
2

Unutbu's answer compressed using a list comprehension:

import csv

names = [x[0] for x in csv.reader(open(filename,'r'),delimiter='\t')][1:]  # [1:] drops the heading row
ages = [x[1] for x in csv.reader(open(filename,'r'),delimiter='\t')][1:]
Stapes answered 22/4, 2015 at 20:26 Comment(0)
A
0

marvin's answer, but without reading the entire file twice:

data = [ (x[0],x[1]) for x in csv.reader(open(filename,'r'),delimiter='\t')][1:]  # [1:] drops the heading row

This works if you are OK with a list of tuples instead of two lists.

You could still read the data into two lists in a single pass, but that would be unutbu's answer.
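
If two lists are wanted after all, a list of row tuples can be transposed with zip once the file has been read. A minimal sketch, again using io.StringIO as a stand-in for the real file:

```python
import csv
import io

# Stand-in for the real file so the sketch runs on its own.
sample = io.StringIO('Name\tAge\nMark\t32\nMatt\t29\nJohn\t67\n')

# Read every row once, dropping the heading row.
rows = list(csv.reader(sample, delimiter='\t'))[1:]

# zip(*rows) regroups the rows by column; each column is turned into a list.
names, ages = (list(column) for column in zip(*rows))
```

This assumes at least one data row; with an empty file zip(*rows) yields nothing and the unpacking fails.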

Agha answered 23/8, 2018 at 21:29 Comment(0)
