I am using pandas and xlsxwriter
to create excel reports in AWS lambda. I have been having an issue where the disk space fills up when a reasonably large excel file in the region of 60MB is created, but there is 512MB of disk space available.
I have been trying to work out what and why this is happening and I came across a promising looking property I could pass, constant_memory
, so I tried this out using the following code:
with pd.ExcelWriter(output, options={"constant_memory": True}) as writer:
It seems this actually ensures my lambda runs to completion consistently and if I remove it then it no longer does, but the issue is I get the following warning:
FutureWarning: Use of **kwargs is deprecated, use engine_kwargs instead.
with pd.ExcelWriter(output, options={"constant_memory": True}) as writer:
I then read a post on SO, which I think is by the author of xlsxwriter
suggesting this property doesn't actually work when using pandas. This is sort of evident in that I only get the first column in my excel report and the last row of 360000 rows. I am confused as to how passing this is ensuring my lambda completes0? And why is it that if I do not pass it then my lambda runs out of disk space?
The second strange thing is the documentation for pandas doesn't mention any argument called options
at all, and if I try and pass constant_memory
using engine_kwargs
:
with pd.ExcelWriter(output, engine_kwargs={'constant_memory': True}) as writer:
I get the following error:
TypeError: __init__() got an unexpected keyword argument 'constant_memory'
I hope someone who is more familiar with xlsxwriter can help me out in understanding these few points.
The documentation for xlsxwriter
also mentions a property in_memory
, which seems like it could resolve my issue but I am unable to pass this to pd.ExcelWriter
:
TypeError: __init__() got an unexpected keyword argument 'in_memory'
**Edit: Passing in_memory
does actually solve the issues I have but I get a deprecation warning passing it in using the kwargs options
.
with pd.ExcelWriter(output, options={"in_memory": True}) as writer:
FutureWarning: Use of **kwargs is deprecated, use engine_kwargs instead.
with pd.ExcelWriter(output, options={"in_memory": True}) as writer:
How can I pass this property to the xlsxwriter
using pandas without getting the deprecation warning as it seems to resolve my issues?
constant_memory
didnt solve my issue butin_memory
does, thanks for the pointer towards the incorrect syntax, not sure how I missed that. – Metapsychology