Unable to run AWS Glue Crawler due to IAM Permissions
I am unable to run a newly created AWS Glue crawler. I followed the IAM role guide at https://docs.aws.amazon.com/glue/latest/dg/create-an-iam-role.html?icmpid=docs_glue_console

  1. Created a new crawler role AWSGlueServiceRoleDefault with the AWSGlueServiceRole and AmazonS3FullAccess managed policies attached
  2. Trust Relationship contains:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "glue.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
  3. The user executing the crawler signs in via SSO and inherits arn:aws:iam::aws:policy/AdministratorAccess
  4. I even tried creating a new AWS user with all permissions
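
For reference, the same setup expressed as a rough boto3 sketch (the role name, policy ARNs, and trust policy are exactly the ones listed above; nothing else is assumed):

# Sketch of the role setup described in the steps above, using boto3.
import json
import boto3

iam = boto3.client("iam")

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Create the crawler role with the Glue service trust relationship.
iam.create_role(
    RoleName="AWSGlueServiceRoleDefault",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the managed policies mentioned above.
for policy_arn in (
    "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
):
    iam.attach_role_policy(RoleName="AWSGlueServiceRoleDefault", PolicyArn=policy_arn)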

After executing the crawler, it fails within 8 seconds with the following error:

Crawler cannot be started. Verify the permissions in the policies attached to the IAM role defined in the crawler

What other IAM permissions are needed?

Lithesome answered 8/1, 2023 at 6:4 Comment(4)
Can you share the role, with all the policies? Is your bucket encrypted by KMS? - Overplus
Regarding 4) - Did you attach these policies to your role or really create a new user? The user won't help you here as the crawler will use the permissions of the role you give it. - Jarvis
Did you have any luck with this? I'm having the same issue here. - Pity
I got this error, tried again without changing anything, and it worked the second time. Guessing I did not have enough IP addresses available. - Countdown

If you're crawling tables and schemas via a JDBC connection to an external data store, make sure you have specified network options on the Glue connection. I got exactly the same error when those options were not specified; I think the error message is somewhat misleading here.
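
As a rough boto3 sketch of what I mean (the connection name, JDBC URL, credentials, subnet, security group, and availability zone below are all placeholders, not real values):

# Sketch: creating a Glue connection with the network options set, via boto3.
import boto3

glue = boto3.client("glue")

glue.create_connection(
    ConnectionInput={
        "Name": "my-jdbc-connection",
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": "jdbc:postgresql://db.example.com:5432/mydb",
            "USERNAME": "crawler_user",
            "PASSWORD": "change-me",
        },
        # These are the "network options": without them the crawler fails
        # with the misleading permissions error described in the question.
        "PhysicalConnectionRequirements": {
            "SubnetId": "subnet-0123456789abcdef0",
            "SecurityGroupIdList": ["sg-0123456789abcdef0"],
            "AvailabilityZone": "us-east-1a",
        },
    }
)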

Here's what I have defined for my crawlers:

  1. A role, e.g. AWSGlueServiceRoleDefault, with the AWSGlueServiceRole managed policy attached.

  2. The network options (subnet and security groups) specified on your connection, as in the sketch above.

  3. A NAT gateway created and routed to the subnet you defined in step 2, so that there is a public IP available for your crawler to connect to the external data store (see the sketch after this list).
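
Here's a rough boto3 sketch of step 3 (all resource IDs are placeholders; the NAT gateway itself goes in a public subnet, and the route table is the one associated with the connection's private subnet):

# Sketch: NAT gateway so the crawler can reach an external data store.
import boto3

ec2 = boto3.client("ec2")

# Allocate an Elastic IP and create the NAT gateway in a public subnet.
eip = ec2.allocate_address(Domain="vpc")
nat = ec2.create_nat_gateway(
    SubnetId="subnet-public-0123456789abcdef0",
    AllocationId=eip["AllocationId"],
)
nat_id = nat["NatGateway"]["NatGatewayId"]

# Wait until the NAT gateway is available before adding the route.
ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])

# Route outbound traffic from the private subnet (the one in the Glue
# connection, step 2) through the NAT gateway.
ec2.create_route(
    RouteTableId="rtb-0123456789abcdef0",  # route table of the private subnet
    DestinationCidrBlock="0.0.0.0/0",
    NatGatewayId=nat_id,
)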

If you're attempting to connect to RDS, a NAT gateway is not needed, since the crawler and the database are both inside the AWS network. Just define the security group rules to allow the connections. Check the document here.
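
A sketch of the kind of rule that's needed, assuming a single self-referencing security group shared by the connection and the database (the group ID is a placeholder):

# Sketch: security group rule for an in-VPC source such as RDS (no NAT needed).
import boto3

ec2 = boto3.client("ec2")
sg_id = "sg-0123456789abcdef0"

# Allow all TCP traffic from the security group to itself, so the crawler's
# elastic network interfaces can talk to the database and to each other.
ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 0,
            "ToPort": 65535,
            "UserIdGroupPairs": [{"GroupId": sg_id}],
        }
    ],
)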

If S3 is the target data source, a VPC endpoint for S3 is recommended. Check the document here.
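
A sketch of creating a gateway endpoint for S3 with boto3 (the VPC ID, route table ID, and region are placeholders):

# Sketch: gateway VPC endpoint for S3, so the crawler reaches S3 without a NAT.
import boto3

ec2 = boto3.client("ec2")

ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
)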

Aragonite answered 2/2, 2023 at 5:50 Comment(1)
This should be marked as an accepted answer. - Segregationist
