Python list comprehension - want to avoid repeated evaluation
Asked Answered
S

12

68

I have a list comprehension which approximates to:

[f(x) for x in l if f(x)]

Where l is a list and f(x) is an expensive function which returns a list.

I want to avoid evaluating f(x) twice for every non-empty occurance of f(x). Is there some way to save its output within the list comprehension?

I could remove the final condition, generate the whole list and then prune it, but that seems wasteful.

Edit:

Two basic approaches have been suggested:

An inner generator comprehension:

[y for y in (f(x) for x in l) if y]

or memoization.

I think the inner generator comprehension is elegant for the problem as stated. In actual fact I simplified the question to make it clear, I really want:

[g(x, f(x)) for x in l if f(x)]

For this more complicated situation, I think memoization produces a cleaner end result.

Suchlike answered 4/4, 2013 at 13:32 Comment(3)
well you can indeed solve it with a generator comprehension even in this case, simply by [g(x, fx) for x, fx in ((x,f(x)) for x in l) if fx]. the main point is if there is any duplication in x.Lupus
Thanks, it seems everything can be solved with comprehensions! I still think once the expression has gotten that complicated that memoization makes for more readable code.Suchlike
Yes, use a generator (with parentheses, not square brackets). If you prefer memoization that's fine, but a generator is much better than building and then filtering the entire list, as you have now. (For example, it could be used if the inner generator is infinite, and the outer comprehension stops when it finds a certain value).Arielle
L
11

A solution (the best if you have repeated value of x) would be to memoize the function f, i.e. to create a wrapper function that saves the argument by which the function is called and save it, than return it if the same value is asked.

a really simple implementation is the following:

storage = {}
def memoized(value):
    if value not in storage:
        storage[value] = f(value)
    return storage[value]

[memoized(x) for x in l if memoized(x)]

and then use this function in the list comprehension. This approach is valid under two condition, one theoretical and one practical. The first one is that the function f should be deterministic, i.e. returns the same results given the same input, and the other is that the object x can be used as a dictionary keys. If the first one is not valid than you should recompute f each timeby definition, while if the second one fails it is possible to use some slightly more robust approaches.

You can find a lot of implementation of memoization around the net, and I think that the new versions of python have something included in them too.

On a side note, never use the small L as a variable name, is a bad habit as it can be confused with an i or a 1 on some terminals.

EDIT:

as commented, a possible solution using generators comprehension (to avoid creating useless duplicate temporaries) would be this expression:

[g(x, fx) for x, fx in ((x,f(x)) for x in l) if fx]

You need to weight your choice given the computational cost of f, the number of duplication in the original list and memory at you disposition. Memoization make a space-speed tradeoff, meaning that it keep tracks of each result saving it, so if you have huge lists it can became costly on the memory occupation front.

Lupus answered 4/4, 2013 at 13:40 Comment(5)
Pretty much my answer, only you could just use a @memoize decorator.Gond
this is a rough implementation of the memoization decorator for explaination purpose indeed. about performances, it is worst of the other solution only if each x is different from each others. if even one is the same as the other it will just suffer from the overhead of the dictionary that is marginal given that f has been defined as a costly function. this mean that even with one repetition it will gain a lot in terms of function call reductions. it's not essential as other solutions but it's more robust...Lupus
I edited my comment for a better explanation. I don't personally like memoization, but in this kind of problems I guess it is the more robust solution, and I didn't use it like a silver bullet without thinking.Lupus
Is this really useful though? Your function globally changes storage which means that you need to reset storage before you call this on a new set of data -- All so you can use a list-comprehension easily. IMHO, this isn't worth it. Just use a loop.Catching
as before, if f is deterministic it would be even better, as any value in the second iteration that has been seen before won't need a second evaluation. In fact, the most often you need to perform the cycle, the more the memoization is useful. Each tecnique has it's good and bad side.Lupus
M
53
[y for y in (f(x) for x in l) if y]

Will do.

Moramorabito answered 4/4, 2013 at 13:35 Comment(0)
S
16

Starting Python 3.8, and the introduction of assignment expressions (PEP 572) (:= operator), it's possible to use a local variable within a list comprehension in order to avoid calling twice the same function:

In our case, we can name the evaluation of f(x) as a variable y while using the result of the expression to filter the list but also as the mapped value:

[y for x in l if (y := f(x))]
Sitnik answered 27/4, 2019 at 15:26 Comment(0)
L
11

A solution (the best if you have repeated value of x) would be to memoize the function f, i.e. to create a wrapper function that saves the argument by which the function is called and save it, than return it if the same value is asked.

a really simple implementation is the following:

storage = {}
def memoized(value):
    if value not in storage:
        storage[value] = f(value)
    return storage[value]

[memoized(x) for x in l if memoized(x)]

and then use this function in the list comprehension. This approach is valid under two condition, one theoretical and one practical. The first one is that the function f should be deterministic, i.e. returns the same results given the same input, and the other is that the object x can be used as a dictionary keys. If the first one is not valid than you should recompute f each timeby definition, while if the second one fails it is possible to use some slightly more robust approaches.

You can find a lot of implementation of memoization around the net, and I think that the new versions of python have something included in them too.

On a side note, never use the small L as a variable name, is a bad habit as it can be confused with an i or a 1 on some terminals.

EDIT:

as commented, a possible solution using generators comprehension (to avoid creating useless duplicate temporaries) would be this expression:

[g(x, fx) for x, fx in ((x,f(x)) for x in l) if fx]

You need to weight your choice given the computational cost of f, the number of duplication in the original list and memory at you disposition. Memoization make a space-speed tradeoff, meaning that it keep tracks of each result saving it, so if you have huge lists it can became costly on the memory occupation front.

Lupus answered 4/4, 2013 at 13:40 Comment(5)
Pretty much my answer, only you could just use a @memoize decorator.Gond
this is a rough implementation of the memoization decorator for explaination purpose indeed. about performances, it is worst of the other solution only if each x is different from each others. if even one is the same as the other it will just suffer from the overhead of the dictionary that is marginal given that f has been defined as a costly function. this mean that even with one repetition it will gain a lot in terms of function call reductions. it's not essential as other solutions but it's more robust...Lupus
I edited my comment for a better explanation. I don't personally like memoization, but in this kind of problems I guess it is the more robust solution, and I didn't use it like a silver bullet without thinking.Lupus
Is this really useful though? Your function globally changes storage which means that you need to reset storage before you call this on a new set of data -- All so you can use a list-comprehension easily. IMHO, this isn't worth it. Just use a loop.Catching
as before, if f is deterministic it would be even better, as any value in the second iteration that has been seen before won't need a second evaluation. In fact, the most often you need to perform the cycle, the more the memoization is useful. Each tecnique has it's good and bad side.Lupus
G
10

You should use a memoize decorator. Here is an interesting link.


Using memoization from the link and your 'code':

def memoize(f):
    """ Memoization decorator for functions taking one or more arguments. """
    class memodict(dict):
        def __init__(self, f):
            self.f = f
        def __call__(self, *args):
            return self[args]
        def __missing__(self, key):
            ret = self[key] = self.f(*key)
            return ret
    return memodict(f)

@memoize
def f(x):
    # your code

[f(x) for x in l if f(x)]
Gond answered 4/4, 2013 at 13:38 Comment(1)
This is my favorite of the memoization options provided here. You still have access to the function via f.f(...), it keeps state on a per-function basis (as opposed to on a global basis), it uses dict.__missing__ which is so useful. You have my upvote.Catching
S
9
[y for y in [f(x) for x in l] if y]

For your updated problem, this might be useful:

[g(x,y) for x in l for y in [f(x)] if y]
Stearoptene answered 4/4, 2013 at 13:34 Comment(4)
Fair enough ... You win, there is a way to do this. I would still use a loop though :-PCatching
You can even make the inner comprehension a generator to save on duplicate list creation.Lorant
I might use this if I factored the inner generator into a temporary variable ahead of time...Catching
This looks like @Mahdi's solution, but it will build the whole list before it filters it. Mahdi's is better: It will create a generator and pump it one value at a time.Arielle
C
8

Nope. There's no (clean) way to do this. There's nothing wrong with a good-old-fashioned loop:

output = []
for x in l:
    result = f(x)
    if result: 
        output.append(result)

If you find that hard to read, you can always wrap it in a function.

Catching answered 4/4, 2013 at 13:34 Comment(0)
A
8

As the previous answers have shown, you can use a double comprehension or use memoization. For reasonably-sized problems it's a matter of taste (and I agree that memoization looks cleaner, since it hides the optimization). But if you're examining a very large list, there's a huge difference: Memoization will store every single value you've calculated, and can quickly blow out your memory. A double comprehension with a generator (round parens, not square brackets) only stores what you want to keep.

To come to your actual problem:

[g(x, f(x)) for x in series if f(x)]

To calculate the final value you need both x and f(x). No problem, pass them both like this:

[g(x, y) for (x, y) in ( (x, f(x)) for x in series ) if y ]

Again: this should be using a generator (round parens), not a list comprehension (square brackets). Otherwise you will build the whole list before you start filtering the results. This is the list comprehension version:

[g(x, y) for (x, y) in [ (x, f(x)) for x in series ] if y ] # DO NOT USE THIS
Arielle answered 4/4, 2013 at 14:36 Comment(3)
beside readability, whould be [g(x, y) for x in series for y in [f(x)] if y] different in memory occupation?when the memorized list will be cleaned up (depends on where memorized function is defined)?Antarctic
The eventual list computed is the same, but you'll take longer to get there (and garbage collecting takes time too).Arielle
PS. But don't take my word for it, try it out and time it!Arielle
A
4

There have been a lot of answers regarding memoizing. The Python 3 standard library now has a lru_cache, which is a Last Recently Used Cache. So you can:

from functools import lru_cache

@lru_cache()
def f(x):
    # function body here

This way your function will only be called once. You can also specify the size of the lru_cache, by default this is 128. The problem with the memoize decorators shown above is that the size of the lists can grow well out of hand.

Amniocentesis answered 10/9, 2014 at 3:56 Comment(1)
This looks useful. If I understand correctly how the list comprehension works, I could use an lru_cache size of 1, since I only ever want to re-use the most recently used value?Suchlike
K
3

You can use memoization. It is a technique which is used in order to avoid doing the same computation twice by saving somewhere the result for each calculated value. I saw that there is already an answer that uses memoization, but I would like to propose a generic implementation, using python decorators:

def memoize(func):
    def wrapper(*args):
        if args in wrapper.d:
            return wrapper.d[args]
        ret_val = func(*args)
        wrapper.d[args] = ret_val
        return ret_val
    wrapper.d = {}
    return wrapper

@memoize
def f(x):
...

Now f is a memoized version of itself. With this implementation you can memoize any function using the @memoize decorator.

Kenwood answered 4/4, 2013 at 13:44 Comment(0)
T
3

Use map() !!

comp = [x for x in map(f, l) if x]

f is the function f(X), l is the list

map() will return the result of f(x) for each x in the list.

Townswoman answered 15/9, 2017 at 19:26 Comment(0)
L
1

Here is my solution:

filter(None, [f(x) for x in l])
Lepage answered 4/4, 2013 at 14:17 Comment(1)
As with the others, you can use a generator inside filter -- And I usually prefer filter(bool,...) to filter(None,...). The do the same thing, but I find the former to be more explicit. e.g. I think this would be better as filter(bool,(f(x) for x in l))Catching
T
-2

How about defining:

def truths(L):
    """Return the elements of L that test true"""
    return [x for x in L if x]

So that, for example

> [wife.children for wife in henry8.wives]
[[Mary1], [Elizabeth1], [Edward6], [], [], []]

> truths(wife.children for wife in henry8.wives) 
[[Mary1], [Elizabeth1], [Edward6]]
Telecast answered 4/4, 2013 at 19:22 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.