AWS Data Pipeline: Issue with permissions S3 Access for IAM role
Asked Answered
V

2

5

I'm using the Load S3 data into RDS MySql table template in AWS Data Pipeline to import csv's from a S3 bucket into our RDS MySql. However I (as IAM user with full-admin rights) run into a warning I can't solve:

Object:Ec2Instance - WARNING: Could not validate S3 Access for role. Please ensure role ('DataPipelineDefaultRole') has s3:Get*, s3:List*, s3:Put* and sts:AssumeRole permissions for DataPipeline.

Google told me not to use the default policies for the DataPipelineDefaultRole and DataPipelineDefaultResourceRole. Based on the documentation of IAM Roles for AWS Data Pipeline and topic at this AWS support forum I've used an inline policy and edited the Trust Relationships for both roles.

Policy DataPipelineDefaultRole:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:*",
                "datapipeline:DescribeObjects",
                "datapipeline:EvaluateExpression",
                "dynamodb:BatchGetItem",
                "dynamodb:DescribeTable",
                "dynamodb:GetItem",
                "dynamodb:Query",
                "dynamodb:Scan",
                "dynamodb:UpdateTable",
                "ec2:AuthorizeSecurityGroupIngress",
                "ec2:CancelSpotInstanceRequests",
                "ec2:CreateSecurityGroup",
                "ec2:CreateTags",
                "ec2:DeleteTags",
                "ec2:Describe*",
                "ec2:ModifyImageAttribute",
                "ec2:ModifyInstanceAttribute",
                "ec2:RequestSpotInstances",
                "ec2:RunInstances",
                "ec2:StartInstances",
                "ec2:StopInstances",
                "ec2:TerminateInstances",
                "ec2:AuthorizeSecurityGroupEgress",
                "ec2:DeleteSecurityGroup",
                "ec2:RevokeSecurityGroupEgress",
                "ec2:DescribeNetworkInterfaces",
                "ec2:CreateNetworkInterface",
                "ec2:DeleteNetworkInterface",
                "ec2:DetachNetworkInterface",
                "elasticmapreduce:*",
                "iam:GetInstanceProfile",
                "iam:GetRole",
                "iam:GetRolePolicy",
                "iam:ListAttachedRolePolicies",
                "iam:ListRolePolicies",
                "iam:ListInstanceProfiles",
                "iam:PassRole",
                "rds:DescribeDBInstances",
                "rds:DescribeDBSecurityGroups",
                "redshift:DescribeClusters",
                "redshift:DescribeClusterSecurityGroups",
                "s3:CreateBucket",
                "s3:DeleteObject",
                "s3:Get*",
                "s3:List*",
                "s3:Put*",
                "sdb:BatchPutAttributes",
                "sdb:Select*",
                "sns:GetTopicAttributes",
                "sns:ListTopics",
                "sns:Publish",
                "sns:Subscribe",
                "sns:Unsubscribe",
                "sqs:CreateQueue",
                "sqs:Delete*",
                "sqs:GetQueue*",
                "sqs:PurgeQueue",
                "sqs:ReceiveMessage"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": "iam:CreateServiceLinkedRole",
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "iam:AWSServiceName": [
                        "elasticmapreduce.amazonaws.com",
                        "spot.amazonaws.com"
                    ]
                }
            }
        }
    ]
}

Trust Relationship DataPipelineDefaultRole:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "ec2.amazonaws.com",
          "elasticmapreduce.amazonaws.com",
          "datapipeline.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Policy DataPipelineDefaultResourceRole:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:*",
                "datapipeline:*",
                "dynamodb:*",
                "ec2:Describe*",
                "elasticmapreduce:AddJobFlowSteps",
                "elasticmapreduce:Describe*",
                "elasticmapreduce:ListInstance*",
                "rds:Describe*",
                "redshift:DescribeClusters",
                "redshift:DescribeClusterSecurityGroups",
                "s3:*",
                "sdb:*",
                "sns:*",
                "sqs:*"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

Trust Relationship DataPipelineDefaultResourceRole:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

I tried several options/combinations but the warning remains. Is there anyone who knows how to solve this permissions issue?

Varico answered 1/2, 2019 at 9:27 Comment(0)
M
5

I might be a bit late answering this question, but I just found out the warning message you saw might be misleading. If you configured the pipeline to put the logs into an S3 bucket, the warning would appear if you specified just the root of the bucket, instead of a path. For instance, if I set the configuration field "Pipeline Log Uri" (that I found in the Default Configuration) to be s3://bucket-name/ then I see the warning. On the other hand, if I specify a path, such as s3://bucket-name/logs, the warning disappears.

The following thread in the AWS forum was really helpful to figure this out: https://forums.aws.amazon.com/thread.jspa?threadID=164635.

Magpie answered 23/11, 2020 at 10:53 Comment(2)
thank you! this worked after more than 1 hour of debugging!Premeditate
thank you so much, this was the culprit over here tooHun
G
1

I don't see any issues with how your policies & roles are defined. It all looks good. The only thing I can think of is how fast you are creating your pipeline after defining roles ?

Just remember the IAM policies are global whereas data-pipeline exists in a specific region, so give it some sleep time between creating the policy/role & creating the datapipeline, it takes time for AWS to replicate IAM policy changes in all regions.

Ex. if you are using bash aws-cli to create/update role & then create/activate data-pipeline, insert `sleep Xs` between role & datapipeline creation.

Nitpick you don't require ec2.amazonaws.com in trust relationship for DataPipelineDefaultRole.

Gingras answered 21/2, 2019 at 23:15 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.