How to read a file line-by-line into a list?

How do I read every line of a file in Python and store each line as an element in a list?

I want to read the file line by line and append each line to the end of the list.

Ofay answered 18/7, 2010 at 22:25 Comment(0)
2953

This code will read the entire file into memory and remove all whitespace characters (newlines and spaces) from the end of each line:

with open(filename) as file:
    lines = [line.rstrip() for line in file]

If you're working with a large file, then you should instead read and process it line-by-line:

with open(filename) as file:
    for line in file:
        print(line.rstrip())

In Python 3.8 and up you can use a while loop with the walrus operator like so:

with open(filename) as file:
    while line := file.readline():
        print(line.rstrip())

Depending on what you plan to do with your file and how it was encoded, you may also want to manually set the access mode and character encoding:

with open(filename, 'r', encoding='UTF-8') as file:
    while line := file.readline():
        print(line.rstrip())
Pinwork answered 18/7, 2010 at 22:28 Comment(15)
I checked the memory profile of different ways given in the answers using the procedure mentioned here. The memory usage is far better when each line is read from the file and processed, as suggested by @DevShark here. Holding all lines in a collection object is not a good idea if memory is a constraint or the file is large. The execution time is similar in both the approaches.Consort
I think that readlines() is deprecated.Genital
@Genital It's not. See the docs: io.IOBase.readlines(). Why do you think it is?Bronchiole
I think the walrus version would stop on empty linesDeserted
What is that := operator?!Wharve
@AlexisWilke basically, while (x := something): do ... means while something: x = something; do ...Haas
Is there any potential downside of squeezing the first approach into one line? Like so: lines = [line.rstrip() for line in f.readlines()]Wobbly
plus one for the walrus. Noice!Nuremberg
@wjandrea: It's not deprecated, but it should have been; fileobj.readlines() is a micro-optimization equivalent to just doing list(fileobj), and its mere existence makes people use it when they really just wanted to iterate fileobj directly, rather than making an unnecessary huge temporary list. The obvious way to do it ends up being the wrong (or at least, inefficient) way to do it so often.Melina
@AlexisWilke: See ":=" syntax and assignment expressions: what and why?Melina
@ketza: No downside. In fact, it's better as just lines = [line.rstrip() for line in f] (avoiding a needless temporary list on top of the one the listcomp generates; file objects are already iterables of their lines, and you can begin processing faster and save on peak memory utilization by avoid .readlines() in this, and most other, cases); I've edited to use that approach.Melina
def read(filename): with open(filename) as file: lines = [line.rstrip() for line in file] return"\n".join(lines)Niphablepsia
As @Deserted suggests, the walrus operator versions terminate on the first empty or blank line. readline will return \n for empty lines, or <whitespace>\n for blank lines. Using an rstrip in the condition like this will cause these to be an empty string, which is Falsey, which terminates the loop. As mentioned in this answer, readline only returns an empty string at EOF so as to be unambiguous as to blank lines (\n) and EOF (''). The first two examples that have rstrip in the print function correctly.Balbo
Instead of writing print(line.rstrip()), another option is print(line, end="").Roofing
In Python 3.9+, open the file with newline = None (default) and use line.removesuffix('\n') instead of rstrip if you want to discard only newlines and keep everything else.Sis
1220

See Input and Output:

with open('filename') as f:
    lines = f.readlines()

or with stripping the newline character:

with open('filename') as f:
    lines = [line.rstrip('\n') for line in f]
Joby answered 18/7, 2010 at 22:28 Comment(6)
Better, use f.read().splitlines(), which does remove newlinesMasqat
Is the second version, with for line in open(filename) safe? That is, will the file be automatically closed?Identification
Best to read the file one line at a time rather than reading the whole file into memory all at once. Doing so doesn't scale well with large input files. See below answer by robert.Ascus
lines = [x.rstrip('\n') for x in open('data\hsf.txt','r')] If I write this way, how can I close the file after reading?Stationer
Yes, to the point others are making here, while it's not "best practice" to use open without the context manager (or some other guaranteed way to close it), this is not really one of those cases - when the object has no more references to it it will be garbage collected and the file closed, which should happen immediately on error or not, when the list comprehension is done processing.Felicio
@AaronHall "when the object has no more references to it it will be garbage collected and the file closed" - this is true of CPython, but not true of PyPy. Not all Python implementations immediately destruct objects when they are no longer referenced. As such, the best practice of using with with open is relevant even in this case.Tedda
715

This is more explicit than necessary, but does what you want.

with open("file.txt") as file_in:
    lines = []
    for line in file_in:
        lines.append(line)
Gratifying answered 18/7, 2010 at 22:27 Comment(6)
I prefer this answer since it doesn't require to load the whole file into memory (in this case it is still appended to array though, but there might be other circumstances). Certainly for big files this approach might mitigate problems.Anthea
Appending to an array is slow. I cannot think of a use case where this is the best solution.Argybargy
Note: This solution does not strip newlines.Whalebone
This solution does load the whole file to memory. I don't know why people think it does not.Allen
@Allen It loads the whole file into lines[] by choice, but can just load line by line.Isotonic
@João That's true, if you added an if-statement in the for-loop, it'd be worthwhile, but as it's written, it's equivalent to lines = file.readlines() but more verbose than necessary.Bronchiole
318

This will yield an "array" of lines from the file.

lines = tuple(open(filename, 'r'))

open returns a file which can be iterated over. When you iterate over a file, you get the lines from that file. tuple can take an iterator and instantiate a tuple instance for you from the iterator that you give it. lines is a tuple created from the lines of the file.

Conduce answered 18/7, 2010 at 22:27 Comment(11)
This is the nicest answer if you want the newline characters in there. Any way to modify it to take those out without ruining the beautiful simplicity of this version?Roxannroxanna
@MarshallFarrier Try lines = open(filename).read().split('\n') instead.Conduce
does it close the file?Deaconry
@Deaconry Since there is no remaining reference to the file after the line is run, the destructor should automatically close the file.Conduce
@NoctisSkytower I find lines = open(filename).read().splitlines() a little cleaner, and I believe it also handles DOS line endings better.Lendlease
@dal102 Yes, I agree with you and wish that I had knowledge of the splitlines method sooner. However, note that the newline argument of the open function is None, so universal newlines mode is enabled, and splitting on '\n' is valid in this case. Especially interesting, though, is that there is a bytes.splitlines method. This gives one the ability to emulate universal newlines mode when opening a file in binary mode. You do not actually need to open a file in text mode to easily split the file's data on line boundaries and can avoid importing the re module.Conduce
This is elegant (except it's worth noting in the answer itself that the trailing \n is retained in each element), but I'm curious why you chose tuple() over list(). Based on my informal tests, list() performs slightly better (probably won't matter much). list(), unlike tuple() will return a mutable sequence (which may or may not be desired).Mallorymallow
@Mallorymallow Assuming a file of 1000 lines, a list takes up about 13.22% more space than a tuple. Results come from from sys import getsizeof as g; i = [None] * 1000; round((g(list(i)) / g(tuple(i)) - 1) * 100, 2). Creating a tuple takes about 4.17% more time than creating a list (with a 0.16% standard deviation). Results come from running from timeit import timeit as t; round((t('tuple(i)', 'i = [None] * 1000') / t('list(i)', 'i = [None] * 1000') - 1) * 100, 2) 30 times. My solution favors space over speed when the need for mutability is unknown.Conduce
If the file were very large, wouldn't that create a very large tuple? Wouldn't looping line by line be more memory-efficient in such cases? Thanks.Dismissal
Semantically, you should use a list for this and not a tuple. Performance considerations are premature: if it turns out later you need to add or remove elements from lines, the choice of using a tuple is going to come back to bite you.Bronchiole
@Dismissal Consider the original question, "How to read a file line-by-line into a list?" If the goal is to store the contents of a file in an array-like data structure, my answer should be sufficient. However, in acknowledgment of your question, you are correct that it may be better to read a file line-by-line (or even block-by-block). If you can process a file incrementally without the need of loading it entirely in RAM, you could use a more memory-efficient solution for solving whatever problem you are tackling.Conduce
246

According to Python's Methods of File Objects, the simplest way to convert a text file into a list is:

with open('file.txt') as f:
    my_list = list(f)
    # my_list = [x.rstrip() for x in f] # remove line breaks

If you just need to iterate over the text file lines, you can use:

with open('file.txt') as f:
    for line in f:
       ...

Old answer:

Using with and readlines() :

with open('file.txt') as f:
    lines = f.readlines()

If you don't care about closing the file, this one-liner will work:

lines = open('file.txt').readlines()

The traditional way:

f = open('file.txt') # Open file on read mode
lines = f.read().splitlines() # List with stripped line-breaks
f.close() # Close file
Kharkov answered 20/4, 2015 at 5:53 Comment(2)
The commented line in the first example # my_list = [x.rstrip() for x in f] # remove line breaks should instead be # my_list = [x.rstrip() for x in my_list] # remove line breaksNewlin
@Newlin no, he's correct. he's looping through the lines in the file. You would be correct if the line is after the 'with' clauseChery
235

If you want the \n included:

with open(fname) as f:
    content = f.readlines()

If you do not want \n included:

with open(fname) as f:
    content = f.read().splitlines()
Tiresias answered 2/3, 2014 at 4:22 Comment(2)
great, it contains empty string between each line. '1\n2\n3\n' => [ '1', '', '2', '', '3', '' ]Portend
@Joke You must be doing something wrong (no offense). With s = '1\n2\n3\n', s.splitlines() returns ['1', '2', '3']. Maybe your input actually contains blank lines? s = '1\n\n2\n\n3\n\n'Bronchiole
171

You could simply do the following, as has been suggested:

with open('/your/path/file') as f:
    my_lines = f.readlines()

Note that this approach has 2 downsides:

1) You store all the lines in memory. In the general case, this is a very bad idea. The file could be very large, and you could run out of memory. Even if it's not large, it is simply a waste of memory.

2) This does not allow processing of each line as you read them. So if you process your lines after this, it is not efficient (requires two passes rather than one).

A better approach for the general case would be the following:

with open('/your/path/file') as f:
    for line in f:
        process(line)

Where you define your process function any way you want. For example:

def process(line):
    if 'save the world' in line.lower():
         superman.save_the_world()

(The implementation of the Superman class is left as an exercise for you).

This will work nicely for any file size and you go through your file in just 1 pass. This is typically how generic parsers will work.
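For instance, a one-pass variant that collects only the matching lines could look like this (a minimal sketch; the filename and the matching condition are purely illustrative):

def matching_lines(path, needle):
    # Yield stripped lines containing `needle`, one at a time,
    # without ever holding the whole file in memory.
    with open(path) as f:
        for line in f:
            if needle in line.lower():
                yield line.rstrip('\n')

for hit in matching_lines('/your/path/file', 'save the world'):
    print(hit)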

Remorseless answered 25/2, 2016 at 9:13 Comment(13)
This was exactly what I needed - and thanks for explaining the downsides. As a beginner in Python, it's awesome to understand why a solution is the solution. Cheers!Buskined
the question doesn't state the need to process every line, so this answer gives irrelevant informationDietrich
Think a bit more Corey. Do you really ever want your computer to read each line, without ever doing anything with these lines? Surely you can realize you always need to process them one way or another.Remorseless
@Remorseless always? that's just false.Dietrich
You always need to do something with the lines. It can be as simple as printing the lines, or counting them. There is no value in having your process read the lines in memory, but not doing anything with it.Remorseless
of course you don't always need to process items as you read them from a file at the moment you read them... that's nonsense. Perhaps you need to generate a list of items stored in a file as input to another function? Is that such an outrageous idea?Dietrich
You always need to do something with them. I think the point you are trying to make is that you might want to apply a function to all of them at once, rather than one by one. That is indeed the case sometimes. But it is very inefficient from a memory standpoint to do so, and prevents you from reading files if its footprint is larger than your Ram. That's why typically generic parsers operate in the way I described.Remorseless
Good approach, but just to be precise: in this context, "processing the lines" won't alter them in the original file. You need to copy them to another file if you need them to be modified and stored.Penthouse
@PierreOcinom that is correct. Given that the file is opened in read only mode, you couldn't modify the original file with the code above. To open a file for both reading and writing, use open('file_path', 'r+')Remorseless
I checked the memory profile of both the ways using the procedure mentioned here. The memory usage is far better when each line is read from the file and processed, as suggested by @DevShark. Holding all lines in a collection object is not a good idea if memory is a constraint or the file is large. The execution time is similar in both the approaches.Consort
Thanks for running the numbers. This is what was expected.Remorseless
@Remorseless Loading the lines into a set to use as a filter list during execution. They must all be loaded into RAM, and there's no per-line processing necessary.Ginoginsberg
My statement was “you always need to do something with the lines”, and your example illustrate it: you add them to a set. I think your point is that the code I wrote is not the only way to do things. That is correct. To load them all into a set, the other approach of reading all the lines in one go might be more to your liking.Remorseless
105

Given a text file with the following content:

line 1
line 2
line 3

We can use this Python script in the same directory as the text file above:

>>> with open("myfile.txt", encoding="utf-8") as file:
...     x = [l.rstrip("\n") for l in file]
>>> x
['line 1', 'line 2', 'line 3']

Using append:

x = []
with open("myfile.txt") as file:
    for l in file:
        x.append(l.strip())

Or:

>>> x = open("myfile.txt").read().splitlines()
>>> x
['line 1', 'line 2', 'line 3']

Or:

>>> x = open("myfile.txt").readlines()
>>> x
['line 1\n', 'line 2\n', 'line 3\n']

Or:

def print_output(lines_in_textfile):
    print("lines_in_textfile =", lines_in_textfile)

y = [x.rstrip() for x in open("001.txt")]
print_output(y)

with open('001.txt', 'r', encoding='utf-8') as file:
    file = file.read().splitlines()
    print_output(file)

with open('001.txt', 'r', encoding='utf-8') as file:
    file = [x.rstrip("\n") for x in file]
    print_output(file)

output:

lines_in_textfile = ['line 1', 'line 2', 'line 3']
lines_in_textfile = ['line 1', 'line 2', 'line 3']
lines_in_textfile = ['line 1', 'line 2', 'line 3']
Elmaelmajian answered 26/4, 2017 at 4:57 Comment(7)
is the encoding="utf-8" required?Sinciput
@Sinciput no, but when you read a text file, you can have some strange characters (especially in Italian)Elmaelmajian
read().splitlines() is provided to you by Python: it's simply readlines() (which is probably faster, as it is less wasteful).Esch
@EricOLebigot from the examples shown, it looks like read().splitlines() and readlines() don't produce the same output. Are you sure they're equivalent?Litigate
If you use readlines only, you need to use the strip method to get rid of the \n in the text, so I changed the last examples to use a list comprehension to have the same output in both cases. So, if you use read().splitlines() you will have a "clean" item with the line and without the newline character; otherwise, you must do what you see in the code above.Elmaelmajian
Indeed. Note that in the code above all the strip() should be rstrip("\n") or spaces around a line are deleted. Also, there is no point in doing readlines() in a list comprehension: simply iterating over the file is better, as it doesn't waste time and memory by creating an intermediate list of the lines.Esch
with open("Beautify.txt") as file_in: lines = [] for line in file_in: lines.append(line.replace('\n',''))Sheathe
55

Introduced in Python 3.4, pathlib has a really convenient method for reading in text from files, as follows:

from pathlib import Path
p = Path('my_text_file')
lines = p.read_text().splitlines()

(The splitlines call is what turns it from a string containing the whole contents of the file to a list of lines in the file.)

pathlib has a lot of handy conveniences in it. read_text is nice and concise, and you don't have to worry about opening and closing the file. If all you need to do with the file is read it all in one go, it's a good choice.
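If you prefer the file to be read lazily, line by line, rather than as one big string, the Path object can still hand you an ordinary file object (a small sketch; stripping the newline with rstrip is just one option):

from pathlib import Path

p = Path('my_text_file')
with p.open() as file:    # Path.open() returns a regular file object
    lines = [line.rstrip('\n') for line in file]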

Beeman answered 30/4, 2018 at 17:41 Comment(0)
47

To read a file into a list you need to do three things:

  • Open the file
  • Read the file
  • Store the contents as list

Fortunately Python makes it very easy to do these things so the shortest way to read a file into a list is:

lst = list(open(filename))

However I'll add some more explanation.

Opening the file

I assume that you want to open a specific file and you don't deal directly with a file-handle (or a file-like-handle). The most commonly used function to open a file in Python is open; it takes one mandatory argument and two optional ones in Python 2.7:

  • Filename
  • Mode
  • Buffering (I'll ignore this argument in this answer)

The filename should be a string that represents the path to the file. For example:

open('afile')   # opens the file named afile in the current working directory
open('adir/afile')            # relative path (relative to the current working directory)
open('C:/users/aname/afile')  # absolute path (windows)
open('/usr/local/afile')      # absolute path (linux)

Note that the file extension needs to be specified. This is especially important for Windows users because file extensions like .txt or .doc, etc. are hidden by default when viewed in the explorer.

The second argument is the mode, it's r by default which means "read-only". That's exactly what you need in your case.

But in case you actually want to create a file and/or write to a file you'll need a different argument here. There is an excellent answer if you want an overview.

For reading a file you can omit the mode or pass it in explicitly:

open(filename)
open(filename, 'r')

Both will open the file in read-only mode. In case you want to read in a binary file on Windows you need to use the mode rb:

open(filename, 'rb')

On other platforms the 'b' (binary mode) is simply ignored.


Now that I've shown how to open the file, let's talk about the fact that you always need to close it again. Otherwise it will keep an open file-handle to the file until the process exits (or Python garbage-collects the file handle).

While you could use:

f = open(filename)
# ... do stuff with f
f.close()

That will fail to close the file when something between open and close throws an exception. You could avoid that by using a try and finally:

f = open(filename)
# nothing in between!
try:
    # do stuff with f
finally:
    f.close()

However Python provides context managers that have a prettier syntax (but for open it's almost identical to the try and finally above):

with open(filename) as f:
    # do stuff with f
# The file is always closed after the with-scope ends.

The last approach is the recommended approach to open a file in Python!

Reading the file

Okay, you've opened the file, now how to read it?

The open function returns a file object and it supports Python's iteration protocol. Each iteration will give you a line:

with open(filename) as f:
    for line in f:
        print(line)

This will print each line of the file. Note however that each line will contain a newline character \n at the end (you might want to check if your Python is built with universal newlines support - otherwise you could also have \r\n on Windows or \r on Mac as newlines). If you don't want that, you could simply remove the last character (or the last two characters on Windows):

with open(filename) as f:
    for line in f:
        print(line[:-1])

But the last line doesn't necessarily have a trailing newline, so one shouldn't use that. One could check if it ends with a trailing newline and, if so, remove it:

with open(filename) as f:
    for line in f:
        if line.endswith('\n'):
            line = line[:-1]
        print(line)

But you could simply remove all whitespaces (including the \n character) from the end of the string, this will also remove all other trailing whitespaces so you have to be careful if these are important:

with open(filename) as f:
    for line in f:
        print(line.rstrip())

However if the lines end with \r\n (Windows "newlines") that .rstrip() will also take care of the \r!

Store the contents as list

Now that you know how to open the file and read it, it's time to store the contents in a list. The simplest option would be to use the list function:

with open(filename) as f:
    lst = list(f)

In case you want to strip the trailing newlines you could use a list comprehension instead:

with open(filename) as f:
    lst = [line.rstrip() for line in f]

Or even simpler: The .readlines() method of the file object by default returns a list of the lines:

with open(filename) as f:
    lst = f.readlines()

This will also include the trailing newline characters, if you don't want them I would recommend the [line.rstrip() for line in f] approach because it avoids keeping two lists containing all the lines in memory.

There's an additional option to get the desired output, however it's rather "suboptimal": read the complete file in a string and then split on newlines:

with open(filename) as f:
    lst = f.read().split('\n')

or:

with open(filename) as f:
    lst = f.read().splitlines()

These take care of the trailing newlines automatically because the split character isn't included. However they are not ideal because you keep the file as string and as a list of lines in memory!

Summary

  • Use with open(...) as f when opening files because you don't need to take care of closing the file yourself and it closes the file even if some exception happens.
  • file objects support the iteration protocol so reading a file line-by-line is as simple as for line in the_file_object:.
  • Always browse the documentation for the available functions/classes. Most of the time there's a perfect match for the task or at least one or two good ones. The obvious choice in this case would be readlines() but if you want to process the lines before storing them in the list I would recommend a simple list-comprehension.
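For example, such a list comprehension can filter and strip in one pass (a minimal sketch; skipping comment lines with startswith is only an illustration):

with open(filename) as f:
    lst = [line.rstrip() for line in f if not line.startswith('#')]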
Tindle answered 16/1, 2018 at 22:33 Comment(4)
The last approach is the recommended approach to open a file in Python! Why is it last, then? Won't the vast majority of people just glance at the first few lines of an answer before moving on?Whalebone
@Whalebone I haven't put much thought into it when I wrote the answer. Do you think I should put it at the top of the answer?Tindle
It might be best, yeah. I also just noticed that you mention Python 2, so that could be updated, too.Whalebone
Ah the question was originally tagged python-2.x. It may make sense to update it more generally. I'll see if I come to that in the next time. Thanks for your suggestions. Much appreciated!Tindle
45

Clean and Pythonic Way of Reading the Lines of a File Into a List


First and foremost, you should focus on opening your file and reading its contents in an efficient and pythonic way. Here is an example of the way I personally DO NOT prefer:

infile = open('my_file.txt', 'r')  # Open the file for reading.

data = infile.read()  # Read the contents of the file.

infile.close()  # Close the file since we're done using it.

Instead, I prefer the below method of opening files for both reading and writing as it is very clean, and does not require an extra step of closing the file once you are done using it. In the statement below, we're opening the file for reading, and assigning it to the variable 'infile.' Once the code within this statement has finished running, the file will be automatically closed.

# Open the file for reading.
with open('my_file.txt', 'r') as infile:

    data = infile.read()  # Read the contents of the file into memory.

Now we need to focus on bringing this data into a Python List because they are iterable, efficient, and flexible. In your case, the desired goal is to bring each line of the text file into a separate element. To accomplish this, we will use the splitlines() method as follows:

# Return a list of the lines, breaking at line boundaries.
my_list = data.splitlines()

The Final Product:

# Open the file for reading.
with open('my_file.txt', 'r') as infile:

    data = infile.read()  # Read the contents of the file into memory.

# Return a list of the lines, breaking at line boundaries.
my_list = data.splitlines()

Testing Our Code:

  • Contents of the text file:
     A fost odatã ca-n povesti,
     A fost ca niciodatã,
     Din rude mãri împãrãtesti,
     O prea frumoasã fatã.
  • Print statements for testing purposes:
    print my_list  # Print the list.

    # Print each line in the list.
    for line in my_list:
        print line

    # Print the fourth element in this list.
    print my_list[3]
  • Output (different-looking because of unicode characters):
     ['A fost odat\xc3\xa3 ca-n povesti,', 'A fost ca niciodat\xc3\xa3,',
     'Din rude m\xc3\xa3ri \xc3\xaemp\xc3\xa3r\xc3\xa3testi,', 'O prea
     frumoas\xc3\xa3 fat\xc3\xa3.']

     A fost odatã ca-n povesti, A fost ca niciodatã, Din rude mãri
     împãrãtesti, O prea frumoasã fatã.

     O prea frumoasã fatã.
Talanta answered 20/12, 2014 at 18:31 Comment(0)
30

Here's one more option, using a list comprehension on the file:

lines = [line.rstrip() for line in open('file.txt')]

This should be a more efficient way, as most of the work is done inside the Python interpreter.

Jemy answered 27/5, 2014 at 12:21 Comment(4)
rstrip() potentially strips all trailing whitespace, not just the \n; use .rstrip('\n').Mallorymallow
This also doesn't guarantee that the file will be closed after reading in all Python implementations (although in CPython, the main Python implementation, it will be).Tedda
This should be more efficient way as the most of the work is done inside the Python interpreter. What does that mean?Whalebone
@AMC: The wording used is wrong, but building the same list via a listcomp allows for using some special purpose bytecodes that operate more efficiently than a manual loop repeatedly calling .append(line.rstrip()) on some list created outside the loop. It's still doing most of the work in the bytecode interpreter loop, it just does it a little faster. To push the per-item work entirely to the C layer on the CPython reference interpreter, you'd do with open('file.txt') as f: lines = list(map(str.rstrip, f)), which would cut the bytecode interpreter out of the loop entirely.Melina
29
f = open("your_file.txt",'r')
out = f.readlines() # read every line of the file into the list out

Now variable out is a list (array) of what you want. You could either do:

for line in out:
    print (line)

Or:

for line in f:
    print (line)

You'll get the same results, provided you haven't already consumed the file object: f.readlines() leaves the file position at the end, so rewind with f.seek(0) (or reopen the file) before iterating over f again.
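A minimal sketch of the rewind, assuming f is still open from above:

f.seek(0)   # move back to the start of the file
for line in f:
    print (line)
f.close()   # close the file once you are done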

Cumuliform answered 12/1, 2014 at 10:58 Comment(0)
28

Another option is numpy.genfromtxt, for example:

import numpy as np
data = np.genfromtxt("yourfile.dat",delimiter="\n")

This will make data a NumPy array with one row per line in your file. Note that genfromtxt parses each line with its default float dtype, so lines of arbitrary text come back as nan.
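If the file contains arbitrary text rather than numbers, one hedged variant is to request strings explicitly (a sketch; dtype=str keeps each line as a string instead of attempting float conversion):

import numpy as np
data = np.genfromtxt("yourfile.dat", delimiter="\n", dtype=str)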

Chace answered 18/6, 2013 at 10:17 Comment(0)
27

Read and write text files with Python 2 and Python 3; it works with Unicode

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# Define data
lines = ['     A first string  ',
         'A Unicode sample: €',
         'German: äöüß']

# Write text file
with open('file.txt', 'w') as fp:
    fp.write('\n'.join(lines))

# Read text file
with open('file.txt', 'r') as fp:
    read_lines = fp.readlines()
    read_lines = [line.rstrip('\n') for line in read_lines]

print(lines == read_lines)

Things to notice:

  • with is a so-called context manager. It makes sure that the opened file is closed again.
  • All solutions here that simply use .strip() or .rstrip() will fail to reproduce the lines exactly, because they also strip whitespace other than the newline.

Common file endings

.txt

More advanced file writing/reading

For your application, the following might be important:

  • Support by other programming languages
  • Reading/writing performance
  • Compactness (file size)

See also: Comparison of data serialization formats

In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python.

Couvade answered 16/1, 2018 at 19:42 Comment(0)
25

If you'd like to read a file from the command line or from stdin, you can also use the fileinput module:

# reader.py
import fileinput

content = []
for line in fileinput.input():
    content.append(line.strip())

fileinput.close()

Pass files to it like so:

$ python reader.py textfile.txt 

Read more here: http://docs.python.org/2/library/fileinput.html

Cuirbouilli answered 22/11, 2013 at 14:57 Comment(0)
20

The simplest way to do it

A simple way is to:

  1. Read the whole file as a string
  2. Split the string line by line

In one line, that would give:

lines = open('C:/path/file.txt').read().splitlines()

However, this is quite an inefficient way, as it stores two versions of the content in memory (probably not a big issue for small files, but still). [Thanks Mark Amery].

There are 2 easier ways:

  1. Using the file as an iterator
lines = list(open('C:/path/file.txt'))
# ... or if you want to have a list without EOL characters
lines = [l.rstrip() for l in open('C:/path/file.txt')]
  2. If you are using Python 3.4 or above, it is better to use pathlib to create a path for your file that you can use for other operations in your program:
from pathlib import Path
file_path = Path("C:/path/file.txt") 
lines = file_path.read_text().splitlines()
# ... or ... 
lines = [l.rstrip() for l in file_path.open()]
Affection answered 6/2, 2015 at 3:34 Comment(2)
This is a bad approach. For one thing, calling .read().splitlines() isn't in any way "simpler" than just calling .readlines(). For another, it's memory-inefficient; you're needlessly storing two versions of the file content (the single string returned by .read(), and the list of strings returned by splitlines()) in memory at once.Tedda
@MarkAmery True. Thanks for highlighting this. I have updated my answer.Affection
15

Just use the splitlines() function. Here is an example.

inp = "file.txt"
data = open(inp)
dat = data.read()
lst = dat.splitlines()
print lst
# print(lst) # for python 3

In the output you will have the list of lines.

Funicular answered 9/9, 2016 at 9:13 Comment(2)
Memory-inefficient compared to using .readlines(). This puts two copies of the file content in memory at once (one as a single huge string, one as a list of lines).Tedda
But data.read().splitlines() is much easier to read, and memory is not always a concern compared to ease of reading the code.Seaplane
11

If you are faced with a very large file and want to read it faster (imagine you are in a TopCoder or HackerRank coding competition), you might read a considerably bigger chunk of lines into a memory buffer at a time, rather than just iterating line by line at the file level.

buffersize = 2**16
with open(path) as f:
    while True:
        lines_buffer = f.readlines(buffersize)
        if not lines_buffer:
            break
        for line in lines_buffer:
            process(line)
Gyneco answered 11/3, 2017 at 8:49 Comment(3)
what does process(line) do? I get an error that there is not such variable defined. I guess something needs importing and I tried to import multiprocessing.Process, but that's not it I guess. Could you please elaborate? ThanksAllaallah
process(line) is a function that you need to implement to process the data. for example, instead of that line, if you use print(line), it will print each line from the lines_buffer.Dutchman
f.readlines(buffersize) returns an immutable buffer. If you want to read directly into your buffer you need to use the readinto() function. It will be much faster.Stretch
7

The easiest ways to do that with some additional benefits are:

lines = list(open('filename'))

or

lines = tuple(open('filename'))

or

lines = set(open('filename'))

In the case of set, keep in mind that line order is not preserved and duplicate lines are removed.
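A tiny illustration of that behaviour (the file name and contents are hypothetical):

# Suppose 'colors.txt' contains the text 'red\nblue\nred\n'
lines = set(open('colors.txt'))
print(lines)   # {'red\n', 'blue\n'} - duplicates removed, order not guaranteed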

Below I added an important supplement from @MarkAmery:

Since you're not calling .close on the file object nor using a with statement, in some Python implementations the file may not get closed after reading and your process will leak an open file handle.

In CPython (the normal Python implementation that most people use), this isn't a problem since the file object will get immediately garbage-collected and this will close the file, but it's nonetheless generally considered best practice to do something like:

with open('filename') as f: lines = list(f) 

to ensure that the file gets closed regardless of what Python implementation you're using.

Spindly answered 14/3, 2019 at 14:28 Comment(5)
Since you're not calling .close on the file object nor using a with statement, in some Python implementations the file may not get closed after reading and your process will leak an open file handle. In CPython (the normal Python implementation that most people use), this isn't a problem since the file object will get immediately garbage-collected and this will close the file, but it's nonetheless generally considered best practice to do something like with open('filename') as f: lines = list(f) to ensure that the file gets closed regardless of what Python implementation you're using.Tedda
Thank you for your great comment @MarkAmery! I really appreciate it.Spindly
@Spindly Why have the best (correct) solution last?Whalebone
@Whalebone because first, I wanted to show the simplest ways and for consistency of reasoning.Spindly
Besides, I hope my answer is made so that it is short and easy to read.Spindly
5

Use this:

import pandas as pd
data = pd.read_csv(filename) # You can also add parameters such as header, sep, etc.
array = data.values

data is a DataFrame; use .values to get an ndarray. You can also get a list by using array.tolist().

Tatary answered 30/3, 2016 at 15:50 Comment(1)
pandas.read_csv() is for reading CSV data, how is it appropriate here?Whalebone
5

In case there are also empty lines in the document, I like to read in the content and pass it through filter to drop the empty string elements:

with open(myFile, "r") as f:
    excludeFileContent = list(filter(None, f.read().splitlines()))
Rouse answered 16/1, 2019 at 21:30 Comment(2)
This is unpythonic, be careful.Whalebone
Save some large intermediate temporaries with excludeFileContent = list(filter(None, map(str.rstrip, f))), or, to preserve non-newline trailing whitespace (using str.rstrip as the mapper function strips any and all types of trailing whitespace), add an import (from operator import methodcaller) and do excludeFileContent = list(filter(None, map(methodcaller('rstrip', '\n'), f))).Melina
4

Outline and Summary

With a filename, handling the file from a Path(filename) object, or directly with open(filename) as f, do one of the following:

  • list(fileinput.input(filename))
  • using with path.open() as f, call f.readlines()
  • list(f)
  • path.read_text().splitlines()
  • path.read_text().splitlines(keepends=True)
  • iterate over fileinput.input or f and list.append each line one at a time
  • pass f to a bound list.extend method
  • use f in a list comprehension

I explain the use-case for each below.

In Python, how do I read a file line-by-line?

This is an excellent question. First, let's create some sample data:

from pathlib import Path
Path('filename').write_text('foo\nbar\nbaz')

File objects are lazy iterators, so just iterate over it.

filename = 'filename'
with open(filename) as f:
    for line in f:
        line # do something with the line

Alternatively, if you have multiple files, use fileinput.input, another lazy iterator. With just one file:

import fileinput

for line in fileinput.input(filename): 
    line # process the line

or for multiple files, pass it a list of filenames:

for line in fileinput.input([filename]*2): 
    line # process the line

Again, f and fileinput.input above both are/return lazy iterators. You can only use an iterator one time, so to provide functional code while avoiding verbosity I'll use the slightly more terse fileinput.input(filename) where apropos from here.
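A quick illustration of that one-shot behaviour (a sketch reusing the sample 'filename' created above):

with open(filename) as f:
    first = list(f)     # consumes the iterator
    second = list(f)    # the file object is now exhausted

print(first)    # ['foo\n', 'bar\n', 'baz\n']
print(second)   # []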

In Python, how do I read a file line-by-line into a list?

Ah but you want it in a list for some reason? I'd avoid that if possible. But if you insist... just pass the result of fileinput.input(filename) to list:

list(fileinput.input(filename))

Another direct answer is to call f.readlines, which returns the contents of the file (up to an optional hint number of characters, so you could break this up into multiple lists that way).

You can get to this file object two ways. One way is to pass the filename to the open builtin:

filename = 'filename'

with open(filename) as f:
    f.readlines()

or using the new Path object from the pathlib module (which I have become quite fond of, and will use from here on):

from pathlib import Path

path = Path(filename)

with path.open() as f:
    f.readlines()

list will also consume the file iterator and return a list - a quite direct method as well:

with path.open() as f:
    list(f)

If you don't mind reading the entire text into memory as a single string before splitting it, you can do this as a one-liner with the Path object and the splitlines() string method. By default, splitlines removes the newlines:

path.read_text().splitlines()

If you want to keep the newlines, pass keepends=True:

path.read_text().splitlines(keepends=True)

I want to read the file line by line and append each line to the end of the list.

Now this is a bit silly to ask for, given that we've demonstrated the end result easily with several methods. But you might need to filter or operate on the lines as you make your list, so let's humor this request.

Using list.append would allow you to filter or operate on each line before you append it:

line_list = []
for line in fileinput.input(filename):
    line_list.append(line)

line_list

Using list.extend would be a bit more direct, and perhaps useful if you have a preexisting list:

line_list = []
line_list.extend(fileinput.input(filename))
line_list

Or more idiomatically, we could instead use a list comprehension, and map and filter inside it if desirable:

[line for line in fileinput.input(filename)]
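And, as one hedged example of mapping and filtering inside the comprehension (dropping blank lines is only an illustration):

[line.rstrip('\n') for line in fileinput.input(filename) if line.strip()]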

Or even more directly, to close the circle, just pass it to list to create a new list directly without operating on the lines:

list(fileinput.input(filename))

Conclusion

You've seen many ways to get lines from a file into a list, but I'd recommend you avoid materializing large quantities of data into a list and instead use Python's lazy iteration to process the data if possible.

That is, prefer fileinput.input or with path.open() as f.

Felicio answered 16/5, 2018 at 20:17 Comment(0)
3

I would try one of the methods mentioned below. The example file that I use has the name dummy.txt. You can find the file here. I presume that the file is in the same directory as the code (you can change fpath to include the proper file name and folder path).

In both of the examples below, the list that you want is given by lst.

1. First method

fpath = 'dummy.txt'
with open(fpath, "r") as f: lst = [line.rstrip('\n \t') for line in f]

print lst
>>>['THIS IS LINE1.', 'THIS IS LINE2.', 'THIS IS LINE3.', 'THIS IS LINE4.']

2. In the second method, one can use csv.reader from the Python Standard Library's csv module:

import csv
fpath = 'dummy.txt'
with open(fpath) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter='\t')  # the delimiter must be a single character; choose one that does not occur in the lines
    lst = [row[0] for row in csv_reader] 

print lst
>>>['THIS IS LINE1.', 'THIS IS LINE2.', 'THIS IS LINE3.', 'THIS IS LINE4.']

You can use either of the two methods. The time taken for the creation of lst is almost equal for the two methods.

Chanteuse answered 19/12, 2018 at 1:47 Comment(2)
What’s the advantage of the second approach? Why invoke an additional library, which adds in edge cases (the delimiter, and quotes)?Benildas
What is the delimiter=' ' argument for?Whalebone
2

You could also use the loadtxt command in NumPy. This checks for fewer conditions than genfromtxt, so it may be faster.

import numpy
data = numpy.loadtxt(filename, delimiter="\n")
Fatma answered 20/7, 2015 at 17:33 Comment(0)
2

I like to use the following, reading the lines in immediately:

contents = []
for line in open(filepath, 'r').readlines():
    contents.append(line.strip())

Or using list comprehension:

contents = [line.strip() for line in open(filepath, 'r').readlines()]
Rase answered 29/3, 2018 at 10:30 Comment(2)
There is no need for readlines(), which even incurs a memory penalty. You can simply remove it, as iterating over a (text) file gives each line in turn.Esch
You should use a with statement to open (and implicitly close) the file.Aalii
1

Here is a Python(3) helper library class that I use to simplify file I/O:

import os

# handle files using a callback method, prevents repetition
def _FileIO__file_handler(file_path, mode, callback = lambda f: None):
  f = open(file_path, mode)
  try:
    return callback(f)
  except Exception as e:
    raise IOError("Failed to %s file" % ["write to", "read from"][mode.lower() in "r rb r+".split(" ")])
  finally:
    f.close()


class FileIO:
  # return the contents of a file
  def read(file_path, mode = "r"):
    return __file_handler(file_path, mode, lambda rf: rf.read())

  # get the lines of a file
  def lines(file_path, mode = "r", filter_fn = lambda line: len(line) > 0):
    return [line for line in FileIO.read(file_path, mode).strip().split("\n") if filter_fn(line)]

  # create or update a file (NOTE: can also be used to replace a file's original content)
  def write(file_path, new_content, mode = "w"):
    return __file_handler(file_path, mode, lambda wf: wf.write(new_content))

  # delete a file (if it exists)
  def delete(file_path):
    return os.remove(file_path) if os.path.isfile(file_path) else None

You would then use the FileIO.lines function, like this:

file_ext_lines = FileIO.lines("./path/to/file.ext")
for i, line in enumerate(file_ext_lines):
  print("Line {}: {}".format(i + 1, line))

Remember that the mode ("r" by default) and filter_fn (checks for empty lines by default) parameters are optional.

You could even remove the read, write and delete methods and just leave the FileIO.lines, or even turn it into a separate method called read_lines.
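For instance, such a standalone function might look like this (a sketch; the defaults mirror the class above, everything else is an assumption):

def read_lines(file_path, mode="r", filter_fn=lambda line: len(line) > 0):
    # Return the file's lines without trailing newlines, keeping only
    # those that pass filter_fn (non-empty lines by default).
    with open(file_path, mode) as f:
        return [line for line in (raw.rstrip('\n') for raw in f) if filter_fn(line)]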

Dorsal answered 20/4, 2019 at 14:44 Comment(2)
Is lines = FileIO.lines(path) really enough simpler than with open(path) as f: lines = f.readlines() to justify this helper's existence? You're saving, like, 17 characters per call. (And most of the time, for performance and memory reasons, you'll want to loop over a file object directly instead of reading its lines into a list anyway, so you won't even want to use this often!) I'm often a fan of creating little utility functions, but this one feels to me like it's just needlessly creating a new way to write something that's already short and easy with the standard library gives us.Tedda
In addition to what @MarkAmery said, why use a class for this?Whalebone
0

Command line version

#!/bin/python3
import os
import sys
abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
filename = os.path.join(dname, sys.argv[1])
arr = open(filename).read().split("\n") 
print(arr)

Run with:

python3 somefile.py input_file_name.txt
Zante answered 29/8, 2017 at 23:53 Comment(1)
Why on earth would you want to require the text file be in the same directory your python script is in? Just open(sys.argv[1]) instead and it'll work regardless of a relative path or absolute path being specified, and it won't care where your script lives.Rosena
