AWS Glue job hangs when calling the AWS Glue client API using boto3 from the context of a running AWS Glue Job?

I'm trying to create a Glue Job that enumerates all tables in a database in my catalog. In order to do so I use the following code snippet:

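import boto3

# Create a Glue client in the region that hosts the Data Catalog
session = boto3.Session(region_name='us-east-2')
glue = session.client('glue')
# List the tables in the 'customer1' database
tables = glue.get_tables(
    DatabaseName='customer1'
)
print(tables)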

The job hangs for about 15 minutes and the connection never seems to be established, because I eventually get the following timeout error:

botocore.vendored.requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='glue.us-east-2.amazonaws.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(, 'Connection to glue.us-east-2.amazonaws.com timed out. (connect timeout=60)'))

This issue is specific to the glue API. I can use the S3 API with no problems.
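One way to confirm that the difference between the two services is at the network level is to probe each regional endpoint with a plain TCP connection from inside the job. This is only a minimal sketch, assuming Python 3; the helper name can_connect is made up for illustration, and the S3 hostname is assumed to follow the same regional pattern as the Glue host in the error message.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS failures
        return False


# Example use inside the job (hostnames assumed for us-east-2):
# for host in ("glue.us-east-2.amazonaws.com", "s3.us-east-2.amazonaws.com"):
#     print(host, "reachable" if can_connect(host) else "NOT reachable")
```

If the Glue host is not reachable while the S3 host is, the problem is most likely routing from the subnet rather than anything in the boto3 call itself.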

I've gone through all my security groups and opened up all the ports to traffic from anywhere. I've even added self-referencing rules, all to no avail.

I can't figure out what could be causing the connection to be blocked. Is AWS specifically blocking glue requests?

Hallie answered 13/6, 2018 at 22:27 Comment(2)
I am running into the same issue. - Chase
I have the same problem when running Glue boto3 client commands from a Glue dev endpoint. However, when running as a normal Glue job, all boto3 commands run successfully. - Acentric

I was facing the same problem: boto3 calls to Glue or S3 were hanging and eventually timing out.

I fixed it by changing the subnet ID used when creating the dev endpoint. Initially I was using a subnet that routed traffic to an Internet Gateway. I switched to a subnet that routes traffic to an internal NAT gateway. Hope this helps.
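To see which kind of subnet you are in, you can inspect where its route table sends default (0.0.0.0/0) traffic. The sketch below is an illustration, not part of the original fix: the helper name default_route_target is made up, and it expects one entry of the "RouteTables" list that EC2 describe_route_tables returns.

```python
def default_route_target(route_table):
    """Classify where a route table sends 0.0.0.0/0 traffic.

    `route_table` is one entry from the "RouteTables" list in an
    EC2 describe_route_tables response.
    """
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue  # only the default route matters here
        if route.get("NatGatewayId"):
            return "nat-gateway"
        if route.get("GatewayId", "").startswith("igw-"):
            return "internet-gateway"
        return "other"
    return "none"


# Example use (the subnet ID is a placeholder, not a real value):
# import boto3
# ec2 = boto3.client("ec2", region_name="us-east-2")
# resp = ec2.describe_route_tables(
#     Filters=[{"Name": "association.subnet-id",
#               "Values": ["subnet-0123456789abcdef0"]}])
# for table in resp["RouteTables"]:
#     print(table["RouteTableId"], default_route_target(table))
```

Note that a subnet with no explicit association uses the VPC's main route table, so an empty result from this filter does not mean the subnet has no routes.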

Acentric answered 6/7, 2018 at 12:2 Comment(1)
This may have worked for you because you had not assigned a public IP to your instance. - Milne

Glue job times out when calling the AWS boto3 client API

Solution: To restate @darius matonas's reply plainly: when a Glue job needs to fetch information about the job you just created or about other jobs, then BEFORE it calls boto3 (something like get_job_run or get_job_runs), make sure you create a new endpoint in the VPC and assign it to the same subnet and security group that your Glue connection uses.
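The endpoint described above could be created with boto3 roughly as follows. This is a sketch under assumptions rather than the answerer's exact steps: it assumes an interface VPC endpoint for the Glue service (service name com.amazonaws.<region>.glue), the function names are made up for illustration, and the VPC, subnet and security group IDs must be the ones your Glue connection uses.

```python
def glue_endpoint_params(region, vpc_id, subnet_id, security_group_id):
    """Build the arguments for EC2 create_vpc_endpoint for a Glue interface endpoint."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": "com.amazonaws.{}.glue".format(region),
        "SubnetIds": [subnet_id],                 # same subnet as the Glue connection
        "SecurityGroupIds": [security_group_id],  # same security group as well
        # Lets glue.<region>.amazonaws.com resolve to the endpoint;
        # requires DNS support and DNS hostnames to be enabled on the VPC.
        "PrivateDnsEnabled": True,
    }


def create_glue_endpoint(region, vpc_id, subnet_id, security_group_id):
    """Create the endpoint; needs boto3 and permission to call ec2:CreateVpcEndpoint."""
    import boto3  # imported here so the helper above works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    params = glue_endpoint_params(region, vpc_id, subnet_id, security_group_id)
    return ec2.create_vpc_endpoint(**params)
```

Run create_glue_endpoint once from a machine with suitable credentials, not from the Glue job itself, since the job cannot reach the API until the endpoint exists.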

Maragretmarala answered 13/9, 2022 at 19:23 Comment(0)
