Unable to run AWS Glue Crawler due to IAM Permissions
I am unable to run a newly created AWS Glue crawler. I followed the IAM role guide at https://docs.aws.amazon.com/glue/latest/dg/create-an-iam-role.html?icmpid=docs_glue_console

  1. Created a new crawler role AWSGlueServiceRoleDefault with the AWSGlueServiceRole and AmazonS3FullAccess managed policies attached
  2. Trust Relationship contains:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "glue.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
  3. The user executing the crawler signs in via SSO and inherits arn:aws:iam::aws:policy/AdministratorAccess
  4. I even tried creating a new AWS user with all permissions
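
For reference, the same setup expressed as a rough boto3 sketch (the role name, policy ARNs, and trust policy are exactly the ones listed above; nothing else is assumed):

# Sketch of the role setup described in the steps above, using boto3.
import json
import boto3

iam = boto3.client("iam")

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Create the crawler role with the Glue service trust relationship.
iam.create_role(
    RoleName="AWSGlueServiceRoleDefault",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the managed policies mentioned above.
for policy_arn in (
    "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
):
    iam.attach_role_policy(RoleName="AWSGlueServiceRoleDefault", PolicyArn=policy_arn)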

After executing the crawler, it fails within 8 seconds with the following error:

Crawler cannot be started. Verify the permissions in the policies attached to the IAM role defined in the crawler

What other IAM permissions are needed?

Lithesome answered 8/1, 2023 at 6:4 Comment(4)
Can you share the role, with all the policies? Is your bucket encrypted by KMS? - Overplus
Regarding 4) - Did you attach these policies to your role or really create a new user? The user won't help you here as the crawler will use the permissions of the role you give it. - Jarvis
Did you have any luck with this? I'm having the same issue here. - Pity
I got this error, tried again without changing anything, and it worked the second time. Guessing I did not have enough IP addresses available. - Countdown

If you're crawling tables and schemas via a JDBC connection to an external data store, make sure you have specified network options on the Glue connection. I got exactly the same error when those options were not specified; I think the error message is somewhat misleading here.
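
As a rough boto3 sketch of what I mean (the connection name, JDBC URL, credentials, subnet, security group, and availability zone below are all placeholders, not real values):

# Sketch: creating a Glue connection with the network options set, via boto3.
import boto3

glue = boto3.client("glue")

glue.create_connection(
    ConnectionInput={
        "Name": "my-jdbc-connection",
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": "jdbc:postgresql://db.example.com:5432/mydb",
            "USERNAME": "crawler_user",
            "PASSWORD": "change-me",
        },
        # These are the "network options": without them the crawler fails
        # with the misleading permissions error described in the question.
        "PhysicalConnectionRequirements": {
            "SubnetId": "subnet-0123456789abcdef0",
            "SecurityGroupIdList": ["sg-0123456789abcdef0"],
            "AvailabilityZone": "us-east-1a",
        },
    }
)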

Here's what I have defined for my crawlers:

  1. A role, e.g. AWSGlueServiceRoleDefault, with the AWSGlueServiceRole managed policy attached.

  2. The network options (subnet and security groups) specified on your connection, as in the sketch above.

  3. A NAT gateway created and routed to the subnet you defined in step 2, so that there is a public IP available for your crawler to connect to the external data store (see the sketch after this list).
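
Here's a rough boto3 sketch of step 3 (all resource IDs are placeholders; the NAT gateway itself goes in a public subnet, and the route table is the one associated with the connection's private subnet):

# Sketch: NAT gateway so the crawler can reach an external data store.
import boto3

ec2 = boto3.client("ec2")

# Allocate an Elastic IP and create the NAT gateway in a public subnet.
eip = ec2.allocate_address(Domain="vpc")
nat = ec2.create_nat_gateway(
    SubnetId="subnet-public-0123456789abcdef0",
    AllocationId=eip["AllocationId"],
)
nat_id = nat["NatGateway"]["NatGatewayId"]

# Wait until the NAT gateway is available before adding the route.
ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])

# Route outbound traffic from the private subnet (the one in the Glue
# connection, step 2) through the NAT gateway.
ec2.create_route(
    RouteTableId="rtb-0123456789abcdef0",  # route table of the private subnet
    DestinationCidrBlock="0.0.0.0/0",
    NatGatewayId=nat_id,
)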

If you're attempting to connect to RDS, a NAT gateway is not needed, since the crawler and the database are both inside the AWS network. Just define the security group rules to allow the connections. Check the document here.
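
A sketch of the kind of rule that's needed, assuming a single self-referencing security group shared by the connection and the database (the group ID is a placeholder):

# Sketch: security group rule for an in-VPC source such as RDS (no NAT needed).
import boto3

ec2 = boto3.client("ec2")
sg_id = "sg-0123456789abcdef0"

# Allow all TCP traffic from the security group to itself, so the crawler's
# elastic network interfaces can talk to the database and to each other.
ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 0,
            "ToPort": 65535,
            "UserIdGroupPairs": [{"GroupId": sg_id}],
        }
    ],
)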

If S3 is the target data source, a VPC endpoint for S3 is recommended. Check the document here.
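
A sketch of creating a gateway endpoint for S3 with boto3 (the VPC ID, route table ID, and region are placeholders):

# Sketch: gateway VPC endpoint for S3, so the crawler reaches S3 without a NAT.
import boto3

ec2 = boto3.client("ec2")

ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
)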

Aragonite answered 2/2, 2023 at 5:50 Comment(1)
This should be marked as an accepted answer. - Segregationist
