Break string into list of characters in Python [duplicate]
C

10

81

Essentially I want to suck a line of text from a file, assign the characters to a list, and create a list of all the separate characters in a list -- a list of lists.

At the moment, I've tried this:

fO = open(filename, 'rU')
fL = fO.readlines()

That's all I've got. I don't quite know how to extract the single characters and assign them to a new list.

The line I get from the file will be something like:

fL = 'FHFF HHXH XXXX HFHX'

I want to turn it into this list, with each single character on its own:

['F', 'H', 'F', 'F', 'H', ...]
Consequential answered 23/3, 2012 at 2:32 Comment(0)
C
30

Strings are iterable (just like a list).

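I take it you really want something like:

# 'rU' mode was removed in Python 3.11; the default text mode already handles universal newlines
chars = []
with open(filename) as fd:
    for line in fd:
        for c in line:
            chars.append(c)

or

chars = []
with open(filename) as fd:
    for line in fd:
        chars.extend(line)

or

chars = []
with open(filename) as fd:
    # map() is lazy in Python 3, so wrap it in list() to force the extend calls
    list(map(chars.extend, fd))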

chars then contains every character in the file, including spaces and newline characters.
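For something that can be run as-is, here is a minimal sketch of the same idea using itertools.chain.from_iterable, which flattens the lines of a file into one stream of characters. The helper name chars_from_file and the temporary demo file are assumptions made for illustration only.

```python
import itertools
import os
import tempfile


def chars_from_file(path):
    """Return every character in the file at `path` (hypothetical helper)."""
    with open(path) as fd:
        # Each line is a string; chain.from_iterable yields its characters in order.
        return list(itertools.chain.from_iterable(fd))


# Demo on a throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('FHFF HHXH\nXXXX HFHX\n')
    demo_path = tmp.name

chars = chars_from_file(demo_path)
os.remove(demo_path)
print(chars[:6])  # spaces and newlines are kept in the result
```

Opening the file in a with block closes it as soon as the list has been built.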

Cavalry answered 23/3, 2012 at 2:37 Comment(2)
@Consequential itertools.chain is really the simplest for this -- chars = list(itertools.chain.from_iterable(open(filename))). - Pachalic
The code above does not filter out whitespace, i.e., " " - Evert
I
168

You can do this using list:

new_list = list(fL)

Be aware that any spaces in the line will be included in this list.
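A small sketch using the sample line from the question, assuming you may also want the spaces dropped or the characters grouped into a list of lists. The variable names are made up for illustration.

```python
line = 'FHFF HHXH XXXX HFHX'  # sample line from the question

all_chars = list(line)                            # spaces included
no_spaces = [c for c in line if not c.isspace()]  # spaces dropped
grouped = [list(word) for word in line.split()]   # one inner list per group

print(grouped)
```

Which of the three you want depends on whether the spaces carry meaning in your data.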

Irreconcilable answered 23/3, 2012 at 2:34 Comment(2)
With UTF-8 characters it doesn't work as expected. For the string "zyć", I was expecting a list of 3 characters, but instead I got this list: ['z', 'y', '\xc4', '\x87']. What could be done to resolve this? Thanks - Prudery
I've got my answer: I forgot to add 'u' before my string, so it was not being treated as Unicode. Thanks. - Prudery
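The split into '\xc4' and '\x87' comes from iterating over UTF-8 bytes rather than text. In Python 3, str is already Unicode, so no 'u' prefix is needed. A rough sketch of the difference (variable names are illustrative):

```python
text = "zyć"

chars = list(text)                  # three characters: str is Unicode in Python 3
raw = text.encode('utf-8')          # the same text as UTF-8 bytes
byte_values = list(raw)             # iterating bytes yields integers, one per byte
decoded_chars = list(raw.decode('utf-8'))  # decode first to get characters back

print(chars, byte_values)
```

If the data arrives as bytes (for example from a file opened in binary mode), decoding it before calling list gives one entry per character.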
L
62

I'm a bit late, it seems, but...

a = 'hello'
print(list(a))
# ['h', 'e', 'l', 'l', 'o']
Logic answered 11/6, 2016 at 16:12 Comment(0)
C
20

Python >= 3.5

Version 3.5 onwards allows the use of PEP 448 - Additional Unpacking Generalizations:

>>> string = 'hello'
>>> [*string]
['h', 'e', 'l', 'l', 'o']

This is language syntax rather than a function call, so it skips the lookup and call of list and is faster in this measurement:

>>> from timeit import timeit
>>> timeit("list('hello')")
0.3042821969866054
>>> timeit("[*'hello']")
0.1582647830073256
Clavius answered 8/6, 2019 at 7:28 Comment(0)
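To repeat the timing comparison from the answer above on your own machine, something like the sketch below should work. No timings are quoted because they vary by machine and Python version; the variable names and the number of runs are arbitrary choices.

```python
from timeit import timeit

sample = 'hello'

# Both spellings build the same list.
assert [*sample] == list(sample)

# Pass the sample in through `globals` so both statements see the same string.
t_list = timeit('list(s)', globals={'s': sample}, number=100_000)
t_unpack = timeit('[*s]', globals={'s': sample}, number=100_000)

print(f'list(s): {t_list:.4f}s   [*s]: {t_unpack:.4f}s')
```

For strings of realistic length the difference is likely to be small, so readability may be the better guide.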
D
10

To add the string hello to a list as individual characters, try this:

newlist = []
newlist[:0] = 'hello'  # slice assignment inserts each character at the front
print(newlist)

['h', 'e', 'l', 'l', 'o']

However, it is easier to pass the string to list directly:

splitlist = list('hello')
print(splitlist)
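Slice assignment matters more when the list already has items in it. A short sketch contrasting it with extend, using made-up variable names:

```python
letters = ['x', 'y']

front = list(letters)
front[:0] = 'hi'    # empty slice at index 0: the characters go in front

back = list(letters)
back.extend('hi')   # extend appends the characters at the end

print(front, back)
```

Both accept any iterable, which is why a string is split into its characters.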
Dowdy answered 14/1, 2014 at 16:21 Comment(2)
But even easier is: newlist = list('hello') - Culicid
@Culicid Yeah, just noticed I hadn't put that in :) - Dowdy
I
7
with open(filename) as fO:
    lst = list(fO.read())
Indigence answered 23/3, 2012 at 3:4 Comment(0)
E
5

Or use a list comprehension, which is often said to be more efficient when working with very large files or lists:

with open(filename, 'r') as fd:
    # compare with != rather than "is not": identity checks against a string literal are unreliable
    chars = [c for line in fd for c in line if c != " "]

Btw: The answer that was accepted does not account for the whitespaces...
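Filtering on a single space still leaves the newline at the end of each line. If all whitespace should go, str.isspace is one option. A sketch, with io.StringIO standing in for the opened file:

```python
import io

fd = io.StringIO('FHFF HHXH\nXXXX HFHX\n')  # stand-in for an opened text file

# Dropping only the space character keeps the '\n' at the end of each line.
only_spaces_removed = [c for line in fd for c in line if c != ' ']

fd.seek(0)  # rewind so the data can be read a second time

# str.isspace() is true for spaces, tabs and newlines alike.
all_whitespace_removed = [c for line in fd for c in line if not c.isspace()]

print(all_whitespace_removed)
```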

Evert answered 25/7, 2013 at 4:33 Comment(0)
M
4
a = 'hello world'
list(map(lambda x: x, a))  # in Python 3, map() returns an iterator, so wrap it in list()

['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']

An easy way is to use the map() function.

Manipulate answered 22/7, 2015 at 2:55 Comment(0)
C
3

In Python many things are iterable, including files and strings. Iterating over a file handle yields the lines of that file one at a time. Iterating over a string yields its characters one at a time.

charsFromFile = []
filePath = r'path\to\your\file.txt' #the r before the string lets us use backslashes

for line in open(filePath):
    for char in line:
        charsFromFile.append(char) 
        #apply code on each character here

or if you want a one liner

#the [0] at the end is the line you want to grab.
#the [0] can be removed to grab all lines
[list(a) for a in list(open('test.py'))][0]  

Edit: as agf mentions, you can use itertools.chain.from_iterable.

His method is better, unless you want the ability to specify which lines to grab:

list(itertools.chain.from_iterable(open(filename)))

This does, however, require familiarity with itertools, and as a result loses some readability.

If you only want to iterate over the chars, and don't care about storing a list, then I would use the nested for loops. This method is also the most readable.
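If no list is needed, the nested loops can be wrapped in a generator so characters are produced one at a time. A minimal sketch: the name iter_chars is hypothetical, and io.StringIO stands in for the opened file.

```python
import io
from collections import Counter


def iter_chars(lines):
    """Yield characters one at a time from an iterable of lines (hypothetical helper)."""
    for line in lines:
        for char in line:
            yield char


fake_file = io.StringIO('FHFF HHXH\nXXXX HFHX\n')  # stand-in for open(filePath)

# Count the non-whitespace characters without ever building a full list.
counts = Counter(ch for ch in iter_chars(fake_file) if not ch.isspace())
print(counts.most_common())
```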

Corliss answered 23/3, 2012 at 3:23 Comment(0)
U
0

Because strings are (immutable) sequences, they can be unpacked much like lists:

with open(filename) as fd:
    multiLine = fd.read()
    *lst, = multiLine

Running map(lambda x: x, multiLine) looks more efficient at first, but only because it returns a lazy map object instead of a list.

with open(filename) as fd:
    multiLine = fd.read()
    list(map(lambda x: x, multiLine))

Turning the map object into a list will take longer than the unpacking method.
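A quick sketch to check that the two approaches give the same result, using a literal string in place of the file contents (the names are illustrative):

```python
multi_line = 'ab\ncd\n'  # stand-in for the text returned by fd.read()

*unpacked, = multi_line               # starred assignment always builds a list
lazy = map(lambda x: x, multi_line)   # nothing is computed yet
is_list = isinstance(lazy, list)      # a map object is an iterator, not a list
from_map = list(lazy)                 # consuming the iterator builds the list

print(unpacked == from_map == list(multi_line))
```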

Urion answered 26/3, 2019 at 12:11 Comment(0)
