How can I get a flat result from a list comprehension instead of a nested list?

S

18

120

I have a list A, and a function f which takes an item of A and returns a list. I can use a list comprehension to convert everything in A like [f(a) for a in A], but this returns a list of lists. Suppose my input is [a1,a2,a3], resulting in [[b11,b12],[b21,b22],[b31,b32]].

How can I get the flattened list [b11,b12,b21,b22,b31,b32] instead? In other words, in Python, how can I get what is traditionally called flatmap in functional programming languages, or SelectMany in .NET?

(In the actual code, A is a list of directories, and f is os.listdir. I want to build a flat list of subdirectories.)

See also: How do I make a flat list out of a list of lists? for the more general problem of flattening a list of lists after it's been created.

Solubilize answered 2/7, 2009 at 22:40 Comment(0)

T

166

You can have nested iterations in a single list comprehension:

[filename for path in dirs for filename in os.listdir(path)]

which is equivalent (at least functionally) to:

filenames = []
for path in dirs:
    for filename in os.listdir(path):
        filenames.append(filename)
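
For anyone unsure about the clause order, here is a small runnable sketch. The function f and the sample data are made up for illustration (they stand in for os.listdir and a list of directories). The for clauses read left to right, in the same order as the nested loops above.

```python
# Hypothetical stand-in for os.listdir: maps one item to a list of results.
def f(a):
    return [a + "1", a + "2"]

A = ["x", "y", "z"]

# Outer loop first, inner loop second, exactly as in the nested-loop form.
flat = [b for a in A for b in f(a)]

# The same thing written with explicit loops.
expected = []
for a in A:
    for b in f(a):
        expected.append(b)

print(flat)  # ['x1', 'x2', 'y1', 'y2', 'z1', 'z2']
```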

Twentyfour answered 2/7, 2009 at 23:32 Comment(13)

Although clever, that is hard to understand and not very readable. – Sundsvall 20/10, 2014 at 23:1

Doesn't really answer the question as asked. This is rather a workaround for not encountering the problem in the first place. What if you already have a list of lists. For example, what if your list of lists is a result of the multiprocessing module's map function? Perhaps the itertools solution or the reduce solution is best. – Baccy 22/1, 2015 at 3:43

Dave31415: [ item for list in listoflists for item in list ] – Commercial 6/2, 2015 at 19:53

'readability' is a subjective judgment. I find this solution quite readable. – Mckellar 22/5, 2015 at 23:13

Can you please explain your code a bit? Which for executes first, and, as @Commercial pointed out, how do we get listoflists in the middle? – Leopoldine 28/1, 2016 at 14:44

I thought it was readable too, until I saw the order of the terms... :( – Blackdamp 10/5, 2017 at 15:14

@cz Same order as if you write nested loops with list.append instead of a comprehension. See here (french wikipedia, but the relevant part is the Python example). – Misquotation 5/7, 2017 at 14:22

This is completely useless in any real use case. Python is really lacking this functionality. How would I, for example, get every href attribute of every a tag in a list of BeautifulSoup objects? – Imparadise 11/6, 2018 at 20:0

@Imparadise [a.href for soup in soups for a in soup] – Pistol 28/8, 2018 at 19:18

I'm no expert here, but it seems like Python vs LINQ would be c for a in source for b in a for c in b vs from a in source from b in a from c in b select c, so it's really just taking the "select c" at the end and moving it to the beginning, removing the word "select". I hate it, but it is what it is. – Furfuraceous 8/12, 2018 at 0:0

Just adding a simple running example that helped me: [x for y in ((1,2,3), (4,5,6)) for x in y] – Solander 8/5, 2019 at 7:4

This directly answers the question by creating a flat list directly from the comprehension, rather than flattening a list of lists after the fact. Other answers here really belong on the bigger canonical stackoverflow.com/questions/952914 instead. – Undressed 10/9, 2022 at 8:37

@Baccy on the contrary; this is one of the few answers that actually answers the question as asked; "not encountering the problem in the first place" is the question. "flat map" operations in other languages accept a function and directly create a flat list of results, rather than flattening a list of lists after the fact. See for example https://mcmap.net/q/182832/-fmap-and-quot-flat-map-quot-in-haskell (the Python equivalent syntax for \x -> [x,x] is lambda x: [x,x]). – Undressed 10/9, 2022 at 8:44

R

96

>>> from functools import reduce  # not needed on Python 2
>>> list_of_lists = [[1, 2],[3, 4, 5], [6]]
>>> reduce(list.__add__, list_of_lists)
[1, 2, 3, 4, 5, 6]

The itertools solution is more efficient, but this feels very pythonic.
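
One caveat worth a sketch: without a starting value, reduce raises TypeError on an empty input. Passing [] as the third argument avoids that. Here operator.concat is swapped in for list.__add__ (for lists it does the same concatenation), and the sample data is made up.

```python
from functools import reduce
import operator

list_of_lists = [[1, 2], [3, 4, 5], [6]]

# The third argument is the starting value, so an empty input yields [].
flat = reduce(operator.concat, list_of_lists, [])
empty = reduce(operator.concat, [], [])

# Without the starting value, an empty input raises TypeError.
try:
    reduce(operator.concat, [])
    raised = False
except TypeError:
    raised = True

print(flat)   # [1, 2, 3, 4, 5, 6]
print(empty)  # []
```

This does not change the cost: each step still builds a new list, so the itertools approach remains the better choice for long inputs.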

Revkah answered 17/1, 2010 at 18:32 Comment(1)

For a list of 1,000 sublists, this is 100 times slower than the itertools way, and the difference only gets worse as your list grows. – Roethke 7/5, 2022 at 7:54

G

73

You can find a good answer in the itertools recipes:

import itertools

def flatten(list_of_lists):
    return list(itertools.chain.from_iterable(list_of_lists))

Gradin answered 2/7, 2009 at 22:50 Comment(1)

The same approach can be used to define flatmap, as proposed by this answer and this external blog post – Cottonmouth 11/7, 2017 at 19:53

E

40

The question asked for flatmap. Some implementations have been proposed, but they may unnecessarily create intermediate lists. Here is an implementation that's based on iterators.

import itertools

def flatmap(func, *iterable):
    return itertools.chain.from_iterable(map(func, *iterable))

In [148]: list(flatmap(os.listdir, ['c:/mfg','c:/Intel']))
Out[148]: ['SPEC.pdf', 'W7ADD64EN006.cdr', 'W7ADD64EN006.pdf', 'ExtremeGraphics', 'Logs']

In Python 2.x, use itertools.imap in place of map.
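
A filesystem-independent way to try this, as a sketch: words below is a hypothetical helper standing in for os.listdir, and the input strings are made up. Note that flatmap returns a lazy iterator, so nothing is computed until it is consumed.

```python
import itertools

def flatmap(func, *iterables):
    # map() is lazy in Python 3, and so is chain.from_iterable.
    return itertools.chain.from_iterable(map(func, *iterables))

# Hypothetical function that returns a list for each input item.
def words(line):
    return line.split()

lines = ["a b", "", "c d e"]
result = list(flatmap(words, lines))
print(result)  # ['a', 'b', 'c', 'd', 'e']

# With several iterables, func receives one argument from each.
pairs = list(flatmap(lambda x, y: [x, y], [1, 2], [10, 20]))
print(pairs)  # [1, 10, 2, 20]
```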

Empoison answered 17/11, 2013 at 23:7 Comment(0)

L

24

You could just do the straightforward:

subs = []
for d in dirs:
    subs.extend(os.listdir(d))

Lear answered 2/7, 2009 at 23:37 Comment(1)

Yep, this is fine (though not quite as good as @Ants') so I'm giving it a +1 to honor its simplicity! – Epizoon 3/7, 2009 at 1:31

M

16

You can concatenate lists using the normal addition operator:

>>> [1, 2] + [3, 4]
[1, 2, 3, 4]

The built-in function sum will add the numbers in a sequence and can optionally start from a specific value:

>>> sum(range(10), 100)  # xrange on Python 2
145

Combine the above to flatten a list of lists:

>>> sum([[1, 2], [3, 4]], [])
[1, 2, 3, 4]

You can now define your flatmap:

>>> def flatmap(f, seq):
...   return sum([f(s) for s in seq], [])
... 
>>> flatmap(range, [1,2,3])
[0, 0, 1, 0, 1, 2]

Edit: I just saw the critique in the comments for another answer and I guess it is correct that Python will needlessly build and garbage collect lots of smaller lists with this solution. So the best thing that can be said about it is that it is very simple and concise if you're used to functional programming :-)

Mimas answered 3/7, 2009 at 12:47 Comment(0)

U

10

subs = []
# Python 2 only; in Python 3, map() is lazy, so wrap this call in list(...) to actually run it
map(subs.extend, (os.listdir(d) for d in dirs))

(but Ants's answer is better; +1 for him)

Unfolded answered 2/7, 2009 at 22:48 Comment(2)

Using reduce (or sum, which saves you many characters and an import;-) for this is just wrong -- you keep uselessly tossing away old lists to make a new one for each d. @Ants has the right answer (smart of @Steve to accept it!). – Epizoon 3/7, 2009 at 1:28

You can't say in general that this is a bad solution. It depends on whether performance is even an issue. Simple is better unless there is a reason to optimize. That's why the reduce method could be best for many problems. For example you have a slow function that produces a list of a few hundred objects. You want to speed it up by using multiprocessing 'map' function. So you create 4 processes and use reduce to flat map them. In this case the reduce function is fine and very readable. That said, it's good that you point out why this can be suboptimal. But it is not always suboptimal. – Baccy 22/1, 2015 at 3:48

T

10

import itertools
x=[['b11','b12'],['b21','b22'],['b31']]
y=list(itertools.chain(*x))
print(y)

itertools is available from Python 2.3 onward.

Tyrolienne answered 21/11, 2012 at 16:48 Comment(0)

E

5

You could try itertools.chain(), like this:

import itertools
import os
dirs = ["c:\\usr", "c:\\temp"]
subs = list(itertools.chain(*[os.listdir(d) for d in dirs]))
print(subs)

itertools.chain() returns an iterator, hence the passing to list().
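
A side note, as a sketch: chain(*iterable) has to unpack its argument up front, while chain.from_iterable pulls the sublists one at a time, so it also accepts a generator without building the intermediate list. fake_listdir and the directory names are hypothetical stand-ins so the example runs anywhere.

```python
import itertools

# Hypothetical stand-in for os.listdir that records which directories were read.
calls = []

def fake_listdir(d):
    calls.append(d)
    return [d + "/a", d + "/b"]

dirs = ["usr", "temp"]

# The generator expression is consumed lazily by chain.from_iterable.
it = itertools.chain.from_iterable(fake_listdir(d) for d in dirs)
calls_before = list(calls)  # nothing has been listed yet
subs = list(it)

print(calls_before)  # []
print(subs)          # ['usr/a', 'usr/b', 'temp/a', 'temp/b']
```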

Egyptian answered 2/7, 2009 at 22:47 Comment(0)

A

4

This is the simplest way to do it:

from functools import reduce  # needed on Python 3

def flatMap(array):
  return reduce(lambda a,b: a+b, array)

The 'a+b' refers to concatenation of two lists

Assertion answered 12/1, 2021 at 17:37 Comment(0)

F

1

You can use pyxtension:

from pyxtension.streams import stream
list(stream([ [1,2,3], [4,5], [], [6] ]).flatMap()) == [1, 2, 3, 4, 5, 6]

Fuchsin answered 3/11, 2017 at 22:41 Comment(4)

Can this directly replace a list comprehension like [f(a) for a in A] (where f returns a list)? Or does it only flatten a list of lists after the fact? – Undressed 10/9, 2022 at 8:39

@KarlKnechtel - it actually does both: it can replace the list comprehension by applying a function f like this: stream([ [1,2,3], [4,5], [], [6] ]).flatMap( f ), AND it returns a flattened list after that, with f applied over the elements of the flattened list – Fuchsin 12/9, 2022 at 17:44

As far as I can tell, the question is specifically about replacing a list comprehension, since otherwise it would be a duplicate of stackoverflow.com/questions/952914. Mind editing to show a more appropriate example? – Undressed 13/9, 2022 at 11:26

@KarlKnechtel Yes, you are right - I indeed missed the spec that f returns a list. Can't edit the answer, so will post a new answer here: No - it can't directly replace that list comprehension, as [f(a) for a in A] (where f returns a list) is simply a mapping, which would be equivalent to stream(A).map( f ), while stream(A).flatMap( f ) would be the equivalent of stream(A).map( f ).flatMap() - I hope this is slightly more clear. – Fuchsin 14/9, 2022 at 16:44

E

0

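Google led me to the following solution:

def flatten(l):
   if isinstance(l,list):
      return sum(map(flatten,l), [])  # start from [] so the sublists are concatenated
   else:
      return [l]  # wrap leaves in a list so they can be concatenated too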

Epic answered 2/7, 2009 at 22:52 Comment(2)

Would be a little better if it handled generator expressions too, and would be a lot better if you explained how to use it... – Valdis 3/7, 2009 at 4:45

This answer belongs on stackoverflow.com/questions/2158395 instead, but it would likely be a duplicate there. – Undressed 10/9, 2022 at 8:39

E

0

I was looking for flatmap and found this question first. flatmap is basically a generalization of what the original question asks for. If you are looking for a concise way of defining flatmap for summable collections such as lists you can use

sum(map(f,xs),[])

It's only a little longer than just writing

flatmap(f,xs)

but also potentially less clear at first.

The sanest solution would be to have flatmap as a basic function inside the programming language but as long as it is not, you can still define it using a better or more concrete name:

# `function` must turn the element type of `xs` into a summable type.
# `function` must be defined for arguments constructed without parameters.
def aggregate(function, xs):
    return sum( map(function, xs), type(function( type(xs)() ))() )

# or only for lists
aggregate_list = lambda f,xs: sum(map(f,xs),[])

Strings are not summable unfortunately, it won't work for them.
You can do

assert( aggregate_list( lambda x: x * [x], [2,3,4] ) == [2,2,3,3,3,4,4,4,4] )

but you can't do

def get_index_in_alphabet(character):
    return (ord(character) & ~0x20) - ord('A')

assert(aggregate( lambda x: get_index_in_alphabet(x) * x, "abcd") == "bccddd")

For strings, you need to use

aggregate_string = lambda f,s: "".join(map(f,s))  # looks almost like sum(map(f,s),"")

assert( aggregate_string( lambda x: get_index_in_alphabet(x) * x, "abcd" ) == "bccddd" )

It's obviously a mess, requiring different function names and even syntax for different types. Hopefully, Python's type system will be improved in the future.

Excitability answered 18/6, 2023 at 14:43 Comment(0)

D

0

You can also use the flatten function using numpy:

import numpy as np

matrix = [[i+k for i in range(10)] for k in range(10)]
matrix_flat = np.array(matrix).flatten()

numpy documentation flatten

Domoniquedomph answered 5/8, 2023 at 15:47 Comment(0)

D

0

Why not write flatten and flat_map functions, applicable to any iterable, using generators?

def flatten(iters):
    for it in iters:
        for elem in it:
            yield elem

def flat_map(fn, it):
    return flatten(map(fn, it))

Usage is very simple:

for e in flat_map(range, [1, 2, 3]):
    print(e, end=" ")
# Output: 0 0 1 0 1 2

As a bit of trivia, you can also write flatten recursively. Analysis is left to the interested reader!

def flatten(it):
    try:
        yield from next(it)
        yield from flatten(it)
    except StopIteration:
        pass
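
A usage note for the recursive version, as a sketch with made-up data: it calls next(it), so it needs an iterator rather than a list, and each sublist costs one level of recursion, so very long inputs will hit Python's recursion limit.

```python
def flatten(it):
    # `it` must be an iterator: next() fails on a plain list.
    try:
        yield from next(it)
        yield from flatten(it)
    except StopIteration:
        pass

nested = [[1, 2], [], [3]]
result = list(flatten(iter(nested)))
print(result)  # [1, 2, 3]

# Passing the list itself raises TypeError because a list is not an iterator.
try:
    list(flatten(nested))
    failed = False
except TypeError:
    failed = True
```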

Dagney answered 8/10, 2023 at 22:23 Comment(0)

H

0

Recursion can effectively flatten any nested list structure. Below is an example code snippet:

lst = [1,2,4, [3,2,5,6], [1,5,6,[9,20,23,45]]]

def flatten_list(l):
  flat_data = []
  for i in l:
    if type(i) != list:    # or you can use:  if not isinstance(i, list):
      flat_data.append(i)
    else:
      flat_data.extend(flatten_list(i))
  return flat_data

print(flatten_list(lst))

In this code, the flatten_list function recursively traverses the nested list, appending non-list elements to the flat_data list. When encountering nested lists, it calls itself recursively until all elements are flattened. This approach ensures that any level of nesting within the list is handled effectively, resulting in a single flattened list as the output.

Hankypanky answered 8/3, 2024 at 10:44 Comment(0)

S

-2

If listA=[list1,list2,list3]
from functools import reduce  # needed on Python 3
flattened_list=reduce(lambda x,y:x+y,listA)

This will do.

Sipple answered 18/7, 2015 at 8:39 Comment(1)

This is a very inefficient solution if the sublists are large. The + operator between two lists is O(n+m) – Lancer 27/4, 2017 at 23:42
