Is there a temporary folder I can use to hold files while a job is running in AWS Glue? For example, in Lambda we have access to a /tmp directory for as long as the process is executing. Is there something similar in AWS Glue where we can store files while the job is executing?
Are you asking about this? AWS Glue recognizes a number of special argument names that you can use to set up the script environment for your Jobs and JobRuns:
- --TempDir — Specifies an S3 path to a bucket that can be used as a temporary directory for the Job.
Here is a link you can refer to.
Hope this helps.
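As a rough illustration of how a script might pick up that argument, here is a minimal sketch. It assumes the job arguments arrive on the command line as `--TempDir s3://...` pairs; the helper name `resolve_temp_dir` is hypothetical, and inside a real Glue job you would more commonly use `getResolvedOptions` from `awsglue.utils`.

```python
import argparse
from urllib.parse import urlparse


def resolve_temp_dir(argv):
    """Return (bucket, prefix) parsed from the --TempDir job argument.

    Hypothetical helper: assumes arguments look like
    ['--JOB_NAME', 'demo', '--TempDir', 's3://bucket/prefix/'].
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--TempDir", required=True)
    # Other job arguments may be present, so ignore anything unrecognised.
    args, _unknown = parser.parse_known_args(argv)
    parsed = urlparse(args.TempDir)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3:// URI, got {args.TempDir!r}")
    # netloc is the bucket name; the path (minus its leading slash) is the key prefix.
    return parsed.netloc, parsed.path.lstrip("/")
```

In a job script this would be called as `resolve_temp_dir(sys.argv[1:])`. Keep in mind that the result points at S3, not at a local disk.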
Yes, there is a tmp directory which you can use to move files to and from S3.
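import boto3

s3 = boto3.resource('s3')
# Download the file into the local Spark tmp directory
# (bucket_name, DATA_DIR and file are assumed to be defined earlier in the script)
s3.Bucket(bucket_name).download_file(DATA_DIR + file, 'tmp/' + file)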
You can also upload files from 'tmp/' to S3.
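To illustrate the full download, process, upload cycle, here is a minimal sketch. The function name `round_trip` and the `transform` callback are hypothetical, and the S3 client is passed in rather than created inside, so the example does not depend on credentials. It assumes the client exposes `download_file(Bucket, Key, Filename)` and `upload_file(Filename, Bucket, Key)`, as a boto3 S3 client does.

```python
import os
import shutil
import tempfile


def round_trip(s3_client, bucket, src_key, dst_key, transform):
    """Download src_key to local scratch space, transform it, upload as dst_key."""
    # mkdtemp creates a private directory under the system temp dir (usually /tmp).
    workdir = tempfile.mkdtemp(prefix="glue-job-")
    local_in = os.path.join(workdir, "input")
    local_out = os.path.join(workdir, "output")
    try:
        s3_client.download_file(bucket, src_key, local_in)
        # Reads the whole file into memory; fine for a sketch, not for huge files.
        with open(local_in, "rb") as fin, open(local_out, "wb") as fout:
            fout.write(transform(fin.read()))
        s3_client.upload_file(local_out, bucket, dst_key)
    finally:
        # Free the local disk even if one of the steps fails.
        shutil.rmtree(workdir, ignore_errors=True)
```

In a job you would pass something like `boto3.client("s3")` as `s3_client`. Cleaning up in `finally` matters because local disk space on the worker is limited.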
'tmp', therefore the prepending of 'tmp/' is unnecessary. – Challenge
OP clarified in a comment:
I was hoping to have a temp dir local to the system
My experience in Oct 2023...
For AWS Glue 4.0 Spark jobs (not tested with lower Glue versions or with Python Shell jobs), the folder /tmp is usable. Note that this is NOT the temporary location that you specify in the job details tab (that location is in S3).
I have successfully used /tmp to extract a large (9GB) CSV file from a zip archive before uploading it to S3.
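The extraction step can be sketched roughly as follows. This is not the exact code I used; `extract_member` is a hypothetical helper, and it assumes the zip archive is already on local disk.

```python
import os
import tempfile
import zipfile


def extract_member(zip_path, member, dest_dir=None):
    """Extract one member of a zip archive to local disk and return its path."""
    # Default to the system temp directory, which is usually /tmp on Linux.
    dest_dir = dest_dir or tempfile.gettempdir()
    with zipfile.ZipFile(zip_path) as archive:
        # extract() copies the member to disk in chunks rather than
        # loading the whole file into memory.
        return archive.extract(member, path=dest_dir)
```

The returned path can then be handed to an S3 upload call, and the local file removed with `os.remove` once the upload has finished.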
How much space is available?
The table in this AWS post lists disk sizes for different worker types but it's unclear how much of this is actually available to jobs. As I said, I've gone up to 9GB.
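Since the documented disk sizes don't tell you what is actually free, one option is to measure it from inside the job. A minimal sketch using only the standard library (the helper names `free_gib` and `has_room` are hypothetical):

```python
import shutil


def free_gib(path="/tmp"):
    """Return the free space, in GiB, on the filesystem that holds path."""
    return shutil.disk_usage(path).free / 1024**3


def has_room(needed_bytes, path="/tmp"):
    """Return True if at least needed_bytes are currently free under path."""
    return shutil.disk_usage(path).free >= needed_bytes
```

Logging `free_gib()` at the start of a job, or calling `has_room(...)` before a large extraction, gives a concrete number for your worker type instead of a guess.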
/tmp is also usable in Python Shell jobs – Carbonaceous
For anyone looking for an answer to this for pythonshell jobs: yes, this is possible. I've been using the /tmp folder with Python 3.9 jobs, but the disk space is somewhat limited. The doc says:
You can set the value to 0.0625 or 1. The default is 0.0625. In either case, the local disk for the instance will be 20GB.
However, this test suggests that around 5GB of those 20GB are already in use.