I am writing a Python function that makes a lot of requests to an API. The function works like this:
import asyncio
import aiohttp

async def get_one(session, url):
    try:
        # session.get() returns an async context manager, so it needs "async with"
        async with session.get(url) as resp:
            resp = await resp.json()
    except Exception:
        resp = None
    return resp, url

async def get_all(session, urls):
    tasks = [asyncio.create_task(get_one(session, url)) for url in urls]
    results = await asyncio.gather(*tasks)
    return results

async def make_requests(urls):
    timeout = aiohttp.ClientTimeout(sock_read=10, sock_connect=10, total=0.1 * len(urls))
    connector = aiohttp.TCPConnector(limit=125)
    async with aiohttp.ClientSession(connector=connector, skip_auto_headers=['User-Agent'], timeout=timeout) as session:
        data = await get_all(session, urls)
    return data

def main(urls):
    results = []
    while urls:
        retry = []
        response = asyncio.run(make_requests(urls))
        for resp, url in response:
            if resp is not None:
                results.append(resp)
            else:
                retry.append(url)
        urls = retry
    return results
The problem is that my function keeps building up memory: the more errors the try/except block inside get_one catches, the more times I have to retry, and the more memory it consumes (something is preventing Python from collecting the garbage).
I have come across an old answer (Asyncio with memory leak (Python)) stating that create_task() (or ensure_future()) is responsible for this, as it keeps a reference to the original task.
But it is still not clear to me whether this is really the case, or how to solve the issue if it is. Any help will be appreciated, thank you!
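One thing worth noting is that asyncio.gather() already wraps bare coroutines in tasks for you, so the explicit create_task() call is not strictly needed here; a quick way to test whether it is involved is to pass the coroutines straight to gather(). A minimal sketch of that variant, using a hypothetical fake_get coroutine as a stand-in for the real aiohttp call (no network I/O):

```python
import asyncio

async def fake_get(url):
    # Stand-in for get_one(): fails for some URLs to mimic request errors.
    await asyncio.sleep(0)
    if url.endswith("bad"):
        return None, url
    return {"url": url}, url

async def get_all(urls):
    # Passing coroutines directly lets gather() create (and release) the
    # tasks itself, so no extra task references linger in our own code.
    return await asyncio.gather(*(fake_get(u) for u in urls))

results = asyncio.run(get_all(["https://a", "https://b/bad"]))
```

If memory still grows with this version, the leak is more likely elsewhere (for example, objects kept alive across the repeated asyncio.run() calls) than in create_task() itself.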