Cross-account access to S3 for AWS Glue in another account

I want to set up cross-account access to an S3 bucket so that AWS Glue in another account can crawl it. We have two accounts in our environment (A & B):

  • AccountA has an S3 bucket with ACL permissions (i.e. the administrator prefers not to use bucket policies) allowing AccountB to both 'List objects' and 'Read bucket permissions'.
  • AccountB wants to use Glue (in AccountB) to crawl the data in the S3 bucket residing in AccountA and thereby populate its own data catalog.

I've verified that I can list the contents of AccountA's S3 bucket using the AWS CLI with AccountB credentials, i.e. aws s3 ls AccountA-S3-Bucket

Within AccountB, I've set up a role ("Allows Glue to call AWS services on your behalf") with the following inline policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": "arn:aws:s3:::AccountA-S3-Bucket/*"
        }
    ] 
}
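
For what it's worth, bucket-level actions such as s3:ListBucket apply to the bucket ARN itself rather than to object ARNs, so a policy that also names the bare bucket ARN may be needed for listing. A minimal sketch of that variant:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
                "arn:aws:s3:::AccountA-S3-Bucket",
                "arn:aws:s3:::AccountA-S3-Bucket/*"
            ]
        }
    ]
}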

The role also has the AmazonS3FullAccess, AWSGlueServiceRole and CloudWatchLogsFullAccess managed policies attached, for good measure. I set up a Glue crawler with this role attached as its service role.

When I look at the CloudWatch logs after the crawler stops, I get the following error:

[3c81da32-b1eb-49f8-8e51-123fa94f789b] ERROR : Not all read errors will be logged. com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 4C75D2487246DC4B; S3 Extended Request ID: GoXpY+6XC0pL73qJDmHGt3/4Mp/HeFXNiNFU3QGxVxt2ltTV4W41/LuJCBDVCcqc6Hep+tlG+Wg=), S3 Extended Request ID: GoXpY+6XC0pL73qJDmHGt3/4Mp/HeFXNiNFU3QGxVxt2ltTV4W41/LuJCBDVCcqc6Hep+tlG+Wg=

I've also tried to follow this blog post on getting the above working: "How to provide cross-account access to objects that are in Amazon S3 buckets to AWS Glue & Athena in another account".

The only real difference between what I'm doing and what the blog post does is that the post sets up a bucket policy on the S3 bucket, whereas my administrator has set up ACL permissions on the bucket. I'm wondering if this is the cause of the problem. Any help would be greatly appreciated.

Kosaka asked 2/10, 2020 at 16:49

The issue was that the admin set an ACL on the bucket but didn't set an ACL (Read Object) on the objects within the bucket. The ACL approach was discarded because of the large number of objects in the bucket and the need to place an ACL on each one. A bucket policy was used instead, which solved the problem.
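
For reference, a bucket policy along these lines grants the cross-account Glue role read access to both the bucket and its objects. This is only a sketch: <AccountB-ID> and <GlueCrawlerRole> are placeholders, not the actual values used.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<AccountB-ID>:role/<GlueCrawlerRole>"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::AccountA-S3-Bucket/*",
                "arn:aws:s3:::AccountA-S3-Bucket"
            ]
        }
    ]
}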

Kosaka answered 6/10, 2020 at 20:16

You are looking in the right direction. An ACL is different from an S3 bucket policy. To make sure the objects of an S3 bucket are accessible to a particular IAM role, you need to explicitly allow access for that IAM role inside your bucket policy.
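
One way to verify this is to assume the Glue role from the CLI and repeat the listing, since testing with your own user credentials does not exercise the role's permissions. A sketch, assuming the role's trust policy lets your user assume it (<AccountB-ID> and <GlueCrawlerRole> are placeholders):

# Assume the crawler's service role and capture temporary credentials
aws sts assume-role \
    --role-arn arn:aws:iam::<AccountB-ID>:role/<GlueCrawlerRole> \
    --role-session-name s3-access-test

# Export the returned AccessKeyId/SecretAccessKey/SessionToken as
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN, then:
aws s3 ls s3://AccountA-S3-Bucket/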

Bathometer answered 6/10, 2020 at 10:53

I have the same problem, but I use a bucket policy on S3, not an ACL.

The S3 bucket policy contains:

{ "Sid": "CID", "Effect": "Allow", "Principal": { "AWS": "" }, "Action": [ "s3:GetObject", "s3:List" ], "Resource": [ "arn:aws:s3:::mybucket/*", "arn:aws:s3:::mybucket" ], "Condition": { "StringEquals": { "aws:PrincipalOrgID": "o-xxxxxx" } } }

The crawler role policy contains:

{
    "Action": [
        "s3:*"
    ],
    "Resource": [
        "arn:aws:s3:::mybucket/*",
        "arn:aws:s3:::mybucket"
    ],
    "Effect": "Allow"
}

Crawler logs:

ERROR : Not all read errors will be logged. com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: EWGRJND7S5XZ17GG; S3 Extended Request ID: kYnTruEJwe0dCgNvYdUBUoeWzKR9notex3cwLVdCrIBkOOdpb3F775q2mCKOL6zEJpI11L1G1Ps=; Proxy: null), S3 Extended Request ID: kYnTruEJwe0dCgNvYdUBUoeWzKR9notex3cwLVdCrIBkOOdpb3F775q2mCKOL6zEJpI11L1G1Ps=

BENCHMARK : Classification complete, writing results to database cid_cur

INFO : Crawler configured with Configuration {"Version":1.0,"CrawlerOutput":{"Tables":{"AddOrUpdateBehavior":"MergeNewColumns"}},"Grouping":{"TableGroupingPolicy":"CombineCompatibleSchemas"}} and SchemaChangePolicy {"UpdateBehavior":"LOG","DeleteBehavior":"LOG"}. Note that values in the Configuration override values in the SchemaChangePolicy for S3 Targets.

INFO : Found table cid with no matching schema at the table's S3 location

BENCHMARK : Finished writing to Catalog

BENCHMARK : Crawler has finished running and is in state READY

Heredity answered 27/3, 2023 at 15:53
Comment: "Use code block please" (Intermeddle)
