Nested Step Function in a Step Function: Unknown Error: "...not authorized to create managed-rule"
I have a Step Function (Parent) created in a SAM/CloudFormation template that, among other things, calls another Step Function (Child). I'm following the instructions on calling Child from Parent using the service integration pattern, but when deploying via the CLI I get an error that I think is IAM-related and that I can't resolve. (The error shows up in the CLI output, so the change never actually makes it into AWS. There have been plenty of prior deployments, so this changeset is only trying to modify the Step Function.)

'arn:aws:iam::{Account-Number}:role/{Parent-Step-Function-Role-Name}' is not authorized to create managed-rule. (Service: AWSStepFunctions; Status Code: 400; Error Code: AccessDeniedException; Request ID: {Long-Id-Number})

To get the synchronous behavior I want (Parent calls Child, waits for execution of Child to complete, then moves onto the next State) I use the suggestion (from the service integration pattern link above) to create a task (in my SAM template) that looks like the following:

...More States...

"Call Child State": {
  "Type": "Task",
  "Next": "The Next State",
  "Resource": "arn:aws:states:::states:startExecution.sync",
  "Parameters": {  
    "Input": {
      "comment": "Hello World!"
    },
    "StateMachineArn": "${ChildStepFunction}",
    "Name": "ChildExecutionFromParent"
  }
},

...More States...

I've defined the IAM role for Parent as follows, making sure that it has Lambda execution privileges only for the Lambda functions in Parent and, more relevant to the problem, permission to StartExecution of Child. I followed the instructions in the link just below, which stated that StartExecution was the only permission needed when using the service integration pattern.

https://docs.aws.amazon.com/step-functions/latest/dg/stepfunctions-iam.html

ParentStepFunctionRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: 2012-10-17
      Statement:
        -
          Effect: Allow
          Principal:
            Service:
              - !Sub states.${AWS::Region}.amazonaws.com
          Action: sts:AssumeRole
    Policies:
      -
        PolicyName: ChildStepFunctionExecution
        PolicyDocument:
          Version: 2012-10-17
          Statement:
            -
              Effect: Allow
              Action: states:StartExecution
              Resource: !Ref ChildStepFunction
            -
              Effect: Allow
              Action: lambda:InvokeFunction
              Resource:
                  - !GetAtt Function1.Arn
                  ...
                  - !GetAtt FunctionX.Arn

I've tried replacing the above State with a simple Pass State to make sure there were no other errors in the Step Function blocking the deployment, and it deployed fine. So I know it has to do with that State. (Also of note: when deploying with the Pass State for testing, I left the role as defined above, so I know it's not a syntax error in the Policies that is causing this. Obviously, that doesn't rule out wrong or missing policies.)

Boondoggle answered 10/3, 2020 at 7:5 Comment(1)
I still get this problem with the permissions fixed, and it is also intermittent. I tried adding a bunch of DependsOn, but it doesn't help! – Pianette
[Updated 5/22/2020 based on the post from @Matt and the comment from @Joe.CK to reduce the scope to the specific Resource required.]

This Stack Overflow question pointed me in the right direction: "botocore.exceptions.ClientError: An error occurred (AccessDeniedException) when calling the CreateStateMachine operation".

The issue appears to stem from CloudWatch Events, and I was able to get past it by adding the following statement to my IAM policy.

- Effect: Allow
  Action:
  - events:PutTargets
  - events:PutRule
  - events:DescribeRule
  Resource: 
  - !Sub arn:${AWS::Partition}:events:${AWS::Region}:${AWS::AccountId}:rule/StepFunctionsGetEventsForStepFunctionsExecutionRule

The AWS Step Functions sample project "Start a workflow within a workflow" includes something similar but restricted to a single Lambda function it invokes.
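To see exactly what that !Sub expression resolves to, here is a small Python sketch that builds the same statement as a plain dict. The partition, region, and account ID passed in are placeholders and the helper names are hypothetical; it only illustrates the ARN shape and does not call AWS.

```python
import json

# Managed rule that Step Functions uses for startExecution.sync (name taken from this answer).
MANAGED_RULE = "StepFunctionsGetEventsForStepFunctionsExecutionRule"


def managed_rule_arn(partition, region, account_id, rule_name=MANAGED_RULE):
    """Mirror of the !Sub expression: arn:${AWS::Partition}:events:...:rule/<name>."""
    return f"arn:{partition}:events:{region}:{account_id}:rule/{rule_name}"


def events_statement(partition, region, account_id):
    """The events statement from the answer, as an IAM policy statement dict."""
    return {
        "Effect": "Allow",
        "Action": ["events:PutTargets", "events:PutRule", "events:DescribeRule"],
        "Resource": [managed_rule_arn(partition, region, account_id)],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(json.dumps(events_statement("aws", "us-east-1", "123456789012"), indent=2))
```

Using ${AWS::Partition} rather than a hard-coded "aws" is what keeps the ARN valid in other partitions (e.g. "aws-us-gov").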

Helprin answered 10/3, 2020 at 17:36 Comment(5)
A-ha. That helped me a lot! Thanks. It made me read the link above more carefully (here: docs.aws.amazon.com/step-functions/latest/dg/…), which I originally said only needed states:StartExecution. However, scrolling down, you'll find that it outlines what you show there (plus a bit more) as needed for synchronous calls. I guess when it's asynchronous it's just "fire and forget" with a startExecution, but to track completion you need to let CloudWatch send a message back to Step Functions. I'll try it tonight and mark this as solved once tested. – Boondoggle
See below for how this answer was fully incorporated into the correct Role definition. – Boondoggle
You don't actually need all the resources (*). (It is generally a bad idea to give any role access to * resources.) The one you are looking for is StepFunctionsGetEventsForStepFunctionsExecutionRule. It is used for nested workflows within Step Functions (i.e. starting another state machine execution as a Task). – Megrim
I had a similar problem when calling ECS. To figure it out, I initially allowed "*" on the resource and then noticed in the console that it was referring to StepFunctionsGetEventsForECSTaskRule. In fact, I had both problems, because it was a Step Function calling a Step Function that runs an ECS task. – Spann
In my case, I needed to permit StepFunctionsGetEventsForECSTaskRule for my EC2 tasks, and to make things easier in case things change, I'll just use "rule/StepFunctionsGetEvents*". – Froehlich
Here is the full Role definition that solved the problem, combining what Andrew provided and what was in the documentation. It's in four parts:

  1. Allow the Child Step Function to run via states:StartExecution.
  2. Allow the Parent to Describe and Stop any Step Function execution. (I'd presume this could be tailored more closely with the resource; however, this is copied and pasted from AWS' documentation.)
  3. Allow the Parent to create/modify (Put) a rule in CloudWatch Events (a specific, system-generated/managed resource) so that it can wait until the Child's execution is complete (because of the synchronous execution).
  4. Allow the Parent to invoke all the applicable Lambda functions in the Step Function. (This isn't really part of the problem I had, but it relates to the Step Function overall. This could also include other integrations, e.g. SNS, if you have them.)

  ParentStepFunctionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          -
            Effect: Allow
            Principal:
              Service:
                - !Sub states.${AWS::Region}.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        -
          PolicyName: ParentStepFunctionExecutionPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              -
                Effect: Allow
                Action: states:StartExecution
                Resource: !Ref ChildStepFunction
              -
                Effect: Allow
                Action:
                  - states:DescribeExecution
                  - states:StopExecution
                Resource: "*"
              -
                Effect: Allow
                Action:
                  - events:PutTargets
                  - events:PutRule
                  - events:DescribeRule
                Resource: !Sub arn:${AWS::Partition}:events:${AWS::Region}:${AWS::AccountId}:rule/StepFunctionsGetEventsForStepFunctionsExecutionRule
              -
                Effect: Allow
                Action: lambda:InvokeFunction
                Resource:
                  - !GetAtt Function1.Arn
                  ...
                  - !GetAtt FunctionX.Arn
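Putting the four parts together, here is a Python sketch that assembles the same inline policy as a dict and drops parts 2 and 3 when the Task uses plain startExecution (fire and forget) instead of startExecution.sync. That split follows the reading of the docs in the comments above; the function name and the ARNs in the example are hypothetical placeholders.

```python
import json
from typing import List


def parent_policy(child_arn: str, rule_arn: str, lambda_arns: List[str],
                  sync: bool = True) -> dict:
    """Build the Parent role's inline policy (hypothetical helper)."""
    statements = [
        # Part 1: start the Child state machine (needed in both patterns).
        {"Effect": "Allow", "Action": "states:StartExecution", "Resource": child_arn},
    ]
    if sync:
        # Part 2: track and, if needed, stop the Child execution.
        statements.append({
            "Effect": "Allow",
            "Action": ["states:DescribeExecution", "states:StopExecution"],
            "Resource": "*",
        })
        # Part 3: let Step Functions manage its CloudWatch Events rule.
        statements.append({
            "Effect": "Allow",
            "Action": ["events:PutTargets", "events:PutRule", "events:DescribeRule"],
            "Resource": rule_arn,
        })
    if lambda_arns:
        # Part 4: the Lambda functions invoked elsewhere in Parent.
        statements.append({
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": list(lambda_arns),
        })
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # Placeholder ARNs for illustration only.
    policy = parent_policy(
        child_arn="arn:aws:states:us-east-1:123456789012:stateMachine:Child",
        rule_arn="arn:aws:events:us-east-1:123456789012:rule/StepFunctionsGetEventsForStepFunctionsExecutionRule",
        lambda_arns=["arn:aws:lambda:us-east-1:123456789012:function:Function1"],
    )
    print(json.dumps(policy, indent=2))
```

The Resource: "*" on part 2 is kept as in the answer; as noted in the list above, it could probably be tailored more closely.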
Boondoggle answered 10/3, 2020 at 21:11 Comment(1)
I added the managed policy CloudWatchEventsFullAccess and everything is working. Thanks. – Swaziland
For Step Functions to properly listen for the tasks/jobs it submits to some other services, it needs to be able to create CloudWatch Events rules. Those rules follow the naming pattern:

arn:${AWS::Partition}:events:${AWS::Region}:${AWS::AccountId}:rule/StepFunctions*

Allowing events:PutTargets, events:PutRule and events:DescribeRule on that resource should enable Step Functions to create and manage the required rules.

See e.g. https://docs.aws.amazon.com/step-functions/latest/dg/batch-job-notification.html: for a Batch job, the rule is named StepFunctionsGetEventsForBatchJobsRule.
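As a rough way to check which managed rules a StepFunctions* resource covers, Python's fnmatchcase can stand in for IAM's * wildcard (any run of characters). This is an approximation, not IAM's actual evaluation logic: fnmatch also treats [ ] as special, which IAM does not, but the pattern here contains none. The account ID and region are placeholders, and MyOwnRule is a hypothetical rule not managed by Step Functions.

```python
from fnmatch import fnmatchcase

# Wildcard resource from this answer, with placeholder partition/region/account.
PREFIX = "arn:aws:events:us-east-1:123456789012:rule/"
PATTERN = PREFIX + "StepFunctions*"


def covered(rule_name: str, pattern: str = PATTERN) -> bool:
    """True if the rule's ARN matches the wildcard resource pattern."""
    # fnmatchcase avoids the OS-dependent case normalization of fnmatch.fnmatch.
    return fnmatchcase(PREFIX + rule_name, pattern)


if __name__ == "__main__":
    for name in [
        "StepFunctionsGetEventsForStepFunctionsExecutionRule",  # nested workflows
        "StepFunctionsGetEventsForBatchJobsRule",               # AWS Batch
        "StepFunctionsGetEventsForECSTaskRule",                 # ECS tasks
        "MyOwnRule",                                            # hypothetical
    ]:
        print(f"{name}: {covered(name)}")
```

This prints True for the three StepFunctions... rules and False for MyOwnRule. Passing PREFIX + "StepFunctionsGetEvents*" as the pattern (the narrower variant from one of the comments above) also covers all three.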

Sarsenet answered 11/3, 2022 at 11:49 Comment(0)
I added the "CloudWatchEventsFullAccess" managed policy, and that error went away. Thank you to the answers above. I wanted to add my code example here because it would not fit inside a comment.

  NetworkFactory:
    Type: AWS::Serverless::StateMachine
    Properties:
      DefinitionUri: statemachine/network-factory.asl.json
      DefinitionSubstitutions:
        CreateHubStateMachineArn: !Ref CreateHubStateMachine
        CreateVpcStateMachineArn: !Ref CreateVpcStateMachine
      Policies:
        - StepFunctionsExecutionPolicy:
            StateMachineName: !GetAtt CreateHubStateMachine.Name
        - StepFunctionsExecutionPolicy:
            StateMachineName: !GetAtt CreateVpcStateMachine.Name
        - "CloudWatchEventsFullAccess"
Swaziland answered 22/7, 2020 at 13:17 Comment(3)
Note that this is not considered good practice. It's one thing to remove the error, and another to do it correctly. Why not give it, or all resources for that matter, full admin access and never worry about permissions again? ;-) – Osman
Full admin access to all resources? I'm not sure what you are referring to. – Swaziland
I'm pointing out that your solution grants too many permissions with the "CloudWatchEventsFullAccess" managed policy. We wouldn't want to use the "AdministratorAccess" policy every time we run into a permission issue, right? Giving it "CloudWatchEventsFullAccess" is going down that road: over-allocating permissions to alleviate a problem. – Osman
This is slightly different: an inline policy authorizing the events:PutRule action (along with events:PutTargets and events:DescribeRule) on the StepFunctionsGetEventsForStepFunctionsExecutionRule managed-rule resource.

  StateMachine:
    Type: AWS::Serverless::StateMachine
    Properties:
      DefinitionUri: statemachine/parentstatemachine.asl.json
      DefinitionSubstitutions:
        ChildWorkflowArn: !Ref ChildStateMachine
      Policies: 
        - Version: 2012-10-17
          Statement:
            - Effect: Allow
              Action:
                - events:PutTargets
                - events:PutRule
                - events:DescribeRule
              Resource: !Sub arn:${AWS::Partition}:events:${AWS::Region}:${AWS::AccountId}:rule/StepFunctionsGetEventsForStepFunctionsExecutionRule
        - StepFunctionsExecutionPolicy:
            StateMachineName: !GetAtt ChildStateMachine.Name  # !Ref returns the ARN; this policy template expects the name

To make sure wires aren't crossed, the following is roughly (though not verbatim) the error that CloudFormation reports without the inline policy statement:

'arn:aws:iam::xxxxxxxx:role/xxxxxxxx' is not authorized to create managed-rule.
(
 Service: AWSStepFunctions; 
 Status Code: 400; 
 Error Code: AccessDeniedException; 
 Request ID: xxxxxxx;
 Proxy: null
)

role/xxxxxxxx is the role generated by the SAM CloudFormation transform for the AWS::Serverless::StateMachine resource; it is created automatically rather than declared in the template.

Amends answered 3/6, 2020 at 16:25 Comment(0)
StepFunctionsGetEventsForStepFunctionsExecutionRule is definitely key to the solution, but for my situation it wasn't enough. When using Terraform, I also had to bump the AWS provider to >= 2.69, since that is the release in which the provider picks up the retry logic for AccessDeniedExceptions. Additionally, I ran into issues with the resource dependency graph Terraform built to apply the changes: the graph had Terraform trying to create the state machine before the policy was created, because the policy had dependencies on the state machine. The solution was to break the uber policy up into three policy attachments on the role used by the state machine: one with the StepFunctionsGetEventsForStepFunctionsExecutionRule, a second with the states actions, and the third being the original uber policy. With that in place, the dependency graph had Terraform create the two new policies first, then the state machine, then the original uber policy, and all was well.
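The ordering problem can be modeled as a small dependency graph. Here is a Python sketch using the standard library's graphlib (Python 3.9+), under simplifying assumptions: node names are hypothetical labels rather than Terraform resource addresses, and this is not how Terraform's own graph engine works. If you write down both constraints (the state machine needs the permissions at creation time, and the uber policy references the state machine), the graph has a cycle and no valid order exists; splitting out the permissions needed at creation time removes the cycle.

```python
from graphlib import CycleError, TopologicalSorter

# Each key maps to the set of nodes it depends on (which must be created first).

# One uber policy: the state machine needs it at creation time,
# but the policy also references the state machine.
before = {
    "state_machine": {"uber_policy"},
    "uber_policy": {"state_machine"},
}

# Split: the two permissions needed at creation time no longer depend
# on the state machine; only the original uber policy still does.
after = {
    "events_rule_policy": set(),
    "states_actions_policy": set(),
    "state_machine": {"events_rule_policy", "states_actions_policy"},
    "uber_policy": {"state_machine"},
}


def creation_order(graph):
    """Return a valid creation order, or None if the graph has a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


if __name__ == "__main__":
    print("before:", creation_order(before))
    print("after: ", creation_order(after))
```

In the split graph, the two new policies always come before the state machine and the uber policy always comes after it, which matches the apply order described above.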

Meghan answered 29/1, 2021 at 14:7 Comment(0)
