You can use map() with a generator function, but map() will then only collect the generator objects that the function returns; it will not descend into the generators themselves.
A possible approach is to have a generator do the looping the way you want and have a function operate on the objects.
This has the added advantage of separating the looping from the computation more neatly.
So, something like this should work:
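
    # Python 3 (Win10)
    from concurrent.futures import ThreadPoolExecutor
    import os


    def read_samples(samples):
        # The generator only does the looping: each path is yielded 10 times.
        for sample in samples:
            path = os.path.join('samples', sample)
            for _ in range(10):
                yield path


    def read_file(path):
        # Each task opens its own handle, so no file object is shared between
        # threads or closed by the generator while a worker still needs it.
        with open(path) as fff:
            return fff.read()


    def main():
        with ThreadPoolExecutor(10) as exc:
            files = os.listdir('samples')
            files = list(exc.map(read_file, read_samples(files)))
        print(str(len(files)), end="\r")


    if __name__ == "__main__":
        main()
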
Another approach is to nest an extra map call to consume the generators:

    # Python 3 (Win10)
    from concurrent.futures import ThreadPoolExecutor
    import os


    def read_sample(sample):
        # Generator for a single file: yields 10 reads from the same handle.
        with open(os.path.join('samples', sample)) as fff:
            for _ in range(10):
                yield str(fff.read())


    def main():
        with ThreadPoolExecutor(10) as exc:
            files = os.listdir('samples')
            # The inner map creates one generator per file;
            # the outer map consumes each generator with list().
            files = exc.map(list, exc.map(read_sample, files))
            files = [f for fs in files for f in fs]  # flattening the results
        print(str(len(files)), end="\r")


    if __name__ == "__main__":
        main()

A more minimal example
To make this easier to reproduce, the essential traits of your code can be captured in a more minimal example (one that does not rely on files lying around on your system):

    from concurrent.futures import ThreadPoolExecutor


    def foo(n):
        for i in range(n):
            yield i


    with ThreadPoolExecutor(10) as exc:
        k = 8
        x = list(exc.map(foo, range(k)))
    print(x)
    # [<generator object foo at 0x7f1a853d4518>, <generator object foo at 0x7f1a852e9990>, <generator object foo at 0x7f1a852e9db0>, <generator object foo at 0x7f1a852e9a40>, <generator object foo at 0x7f1a852e9830>, <generator object foo at 0x7f1a852e98e0>, <generator object foo at 0x7f1a852e9fc0>, <generator object foo at 0x7f1a852e9e60>]

    from concurrent.futures import ThreadPoolExecutor


    def foos(ns):
        for n in range(ns):
            for i in range(n):
                yield i


    with ThreadPoolExecutor(10) as exc:
        k = 8
        x = list(exc.map(lambda x: x ** 2, foos(k)))
    print(x)
    # [0, 0, 1, 0, 1, 4, 0, 1, 4, 9, 0, 1, 4, 9, 16, 0, 1, 4, 9, 16, 25, 0, 1, 4, 9, 16, 25, 36]

    from concurrent.futures import ThreadPoolExecutor


    def foo(n):
        for i in range(n):
            yield i ** 2


    with ThreadPoolExecutor(10) as exc:
        k = 8
        x = exc.map(list, exc.map(foo, range(k)))
        print([z for y in x for z in y])
    # [0, 0, 1, 0, 1, 4, 0, 1, 4, 9, 0, 1, 4, 9, 16, 0, 1, 4, 9, 16, 25, 0, 1, 4, 9, 16, 25, 36]
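
As a variation on the flattening step, here is a minimal sketch that consumes each generator inside the worker thread and flattens the per-input lists with itertools.chain.from_iterable instead of a nested comprehension. The names squares, squares_list and flat_squares are hypothetical and only used for illustration; this is one possible arrangement, not the only way to do it.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def squares(n):
    # Generator: yields i ** 2 for every i in range(n).
    for i in range(n):
        yield i ** 2


def squares_list(n):
    # Hypothetical helper: consume the generator inside the worker thread,
    # so the task returns a plain list rather than a generator object.
    return list(squares(n))


def flat_squares(k, max_workers=10):
    # Executor.map returns results in input order; chain.from_iterable
    # then flattens the list of lists into a single list.
    with ThreadPoolExecutor(max_workers) as exc:
        return list(chain.from_iterable(exc.map(squares_list, range(k))))


if __name__ == "__main__":
    print(flat_squares(4))
```

Because the generator is exhausted inside squares_list, the actual work happens in the pool threads, and only one map call is needed.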

[...] yield function") inside map, but as you have observed this will just instantiate that generator. Is there a reason why read_sample does not just produce a list? What are you trying to achieve by using generators? Note that you can get the results by using list(itertools.chain(*exc.map(read_sample, files))) instead, but it will benefit from neither threads nor generator. – Operon

[...] files is list with 100 names then result is also list with 100 elements and you print len() which means number of elements on list, not summary size of all elements (which could be 1000 - like sum(len(x) for x in files)) – Ulaulah

[...] map peek inside the generator, something like an non-existent map_from or something? – Commissure

[...] map_from to run generator 10 times for every filename. Normal map will run function/generator only once for every filename. BTW: using read() it reads all data from file in first execution and next executions would create only empty results - so it seems useless. You would have to use ie. read(5) to read only part of file. – Ulaulah

[...] A objects which has a regular expression. Those each A objects will have to generate 10 B objects. Therefore I am yielding 10 B objects. I am trying to make it multithreaded as they are all file operations and using the map function. If this POC is successful, I will apply the same concept in my product. However, I believe there should be a way in python to achieve this, regardless of whatever the business logic is. – Cid

[...] map. I am not bound to use generator. But the same problem lies in returning list as well, At the end I will get list of list which has to be flattened before using. – Cid