Accessing AWS ElastiCache (Redis CLUSTER mode) from different AWS accounts via AWS PrivateLink
I have a business case where I need to access a clustered Redis cache that lives in one AWS account (account A) from another account (account B).

I have used the solution described in the link below, and for the most part it works: Base Solution

The base solution works fine if I access the clustered Redis via redis-py; however, if I try to use it with redis-py-cluster, it fails.

I am testing all this in a staging environment where the Redis cluster has only one node, but in production it has two nodes, so the redis-py approach will not work for me.
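(For context: in a multi-node cluster, keys stored on another node come back as MOVED redirects, which the plain redis-py 3.x client does not follow; that is why it only works against a single-node cluster. As an illustration of what such a redirect carries, here is a small parser I wrote for this post; it is not part of my actual setup:)

```python
# Hypothetical illustration: parsing a Redis Cluster MOVED redirect.
# A plain (non-cluster) client surfaces this as an error instead of
# reconnecting to the node that owns the hash slot.

def parse_moved(error_message):
    """Parse a 'MOVED <slot> <host>:<port>' error into its parts."""
    kind, slot, addr = error_message.split()
    assert kind == "MOVED"
    host, port = addr.rsplit(":", 1)
    return int(slot), host, int(port)

slot, host, port = parse_moved("MOVED 3999 10.0.12.34:6379")
print(slot, host, port)  # 3999 10.0.12.34 6379
```

A cluster-aware client like redis-py-cluster follows these redirects automatically, which is exactly why it must be able to reach each node's announced IP.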

Below is my sample code

redis = "3.5.3"
redis-py-cluster = "2.1.3"
==============================


from redis import Redis
from rediscluster import RedisCluster

respCluster = 'error'
respRegular = 'error'

host = "vpce-XXX.us-east-1.vpce.amazonaws.com"
port = "6379"

try:
    ru = RedisCluster(startup_nodes=[{"host": host, "port": port}], decode_responses=True, skip_full_coverage_check=True)
    respCluster = ru.get('ABC')
except Exception as e:
    print(e)

try:
    rr = Redis(host=host, port=port, decode_responses=True)
    respRegular = rr.get('ABC')
except Exception as e:
    print(e)

print({"respCluster": respCluster, "respRegular": respRegular})

The above code works perfectly in account A, but in account B the output that I got was

{'respCluster': 'error', 'respRegular': '123456789'}

And the error that I am getting is

rediscluster.exceptions.ClusterError: TTL exhausted

In account A we run this with AWS ECS + EC2 + Docker, and

in account B we run the code in an AWS EKS (Kubernetes) pod.

What should I do to make redis-py-cluster work in this case? Or is there an alternative to redis-py-cluster in Python for accessing a multi-node Redis cluster?

I know this is a highly specific case, any help is appreciated.

EDIT 1: Upon further research, it seems that "TTL exhausted" is a generic error; in the logs the initial error is

redis.exceptions.ConnectionError: 
Error 101 connecting to XX.XXX.XX.XXX:6379. Network is unreachable

Here XX.XXX.XX.XXX is the IP of the Redis cluster in account A. This is strange, since redis-py also connects to the same IP and port; this error should not occur.
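(A quick way to confirm this kind of failure, sketched here as a diagnostic I added and not part of the original post, is to probe the addresses directly with a TCP connect: the endpoint DNS name from account B connects fine, while the node IPs the cluster announces belong to account A's VPC and are unreachable:)

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# The PrivateLink endpoint DNS name is reachable from account B...
# print(can_connect("vpce-XXX.us-east-1.vpce.amazonaws.com", 6379))
# ...but the node IP returned by the cluster (account A's VPC) is not:
# print(can_connect("XX.XXX.XX.XXX", 6379))
```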

Brandebrandea answered 2/7, 2021 at 11:29 Comment(9)
try these steps: docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/… – Neddy
@Neddy This was not possible because the VPCs in account A and account B had the same CIDR range. Peered VPCs can't have the same CIDR range. – Brandebrandea
@AbhishekPatil It seems your Redis data port is 6379, and you probably added that as a rule in the VPC. Did you also allow port 16379? That is the Redis cluster bus port, calculated by adding 10000 to your data port. I'm really interested in knowing if this works out for you. – Chappell
How did I infer this? When I read "TTL exhausted", it seemed to me that node communication was not even established. A quick search found this: "The TTL exhausted error is a very generic error that usually happens if the network goes down or the client can't get in touch with the node it tries to talk to." (source) So it seems node-to-node communication is down because the bus port is not opened in the VPC rules. – Chappell
@Chappell I do not know if the port is the issue; the same IP and port are accessed by redis-py but not by redis-py-cluster. However, just to be sure, I will try your solution as well. – Brandebrandea
@AbhishekPatil What you are doing is correct, keep doing it. We need the Redis data port 6379 open, but the Redis bus port 16379 is required to be opened as well. – Chappell
@AbhishekPatil Did it work eventually? – Chappell
@om-ha Unfortunately, no. I tried to follow the same steps mentioned in the base solution for port 16379, so we had the configuration for both the client data port and the node bus port at the same time. But in the NLB health check, the 16379 port shows as unhealthy. – Brandebrandea
@Chappell Maybe it's because my STG Redis cluster has only one node at this moment. Do I have to open it manually from somewhere? ElastiCache is a managed AWS service, so I do not know if AWS will allow us to do this or not. – Brandebrandea
It turns out the issue was due to how redis-py-cluster manages hosts and ports.

When a new redis-py-cluster object is created, it gets a list of host IPs from the Redis server (i.e. the Redis cluster host IPs from account A), after which the client tries to connect to those new hosts and ports.

In normal cases this works, as the initial host and the IPs from the response are one and the same (i.e. the host and port supplied at object creation).

In our case, the host and port used at object creation come from the DNS name of account B's VPC endpoint service.

This leads to the code trying to access the actual IPs from account A instead of the DNS name from account B.

The issue was resolved using host-port remapping: we bound the IPs returned by the Redis server in account A to the DNS name of account B's endpoint service.
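Concretely, redis-py-cluster (2.1.0 and later) exposes this via the host_port_remap argument to RedisCluster. Below is a sketch of the fix; the node IPs are placeholders, and you must substitute the IPs your cluster actually announces plus your own endpoint DNS name:

```python
# Sketch of host/port remapping. The node IPs below are placeholders;
# use the IPs that CLUSTER SLOTS actually returns in account A and
# your own VPC endpoint DNS name.

ENDPOINT_DNS = "vpce-XXX.us-east-1.vpce.amazonaws.com"  # account B's endpoint

def build_remap(node_ips, endpoint_dns, port=6379):
    """Build a host_port_remap list mapping each cluster node IP
    (as announced by the Redis server in account A) to the
    PrivateLink endpoint DNS name reachable from account B."""
    return [
        {"from_host": ip, "from_port": port,
         "to_host": endpoint_dns, "to_port": port}
        for ip in node_ips
    ]

remap = build_remap(["10.0.12.34", "10.0.56.78"], ENDPOINT_DNS)

# Passing the remap to the cluster client (redis-py-cluster >= 2.1.0):
# from rediscluster import RedisCluster
# ru = RedisCluster(
#     startup_nodes=[{"host": ENDPOINT_DNS, "port": 6379}],
#     host_port_remap=remap,
#     decode_responses=True,
#     skip_full_coverage_check=True,
# )
```

With the remap in place, every IP the cluster announces is rewritten to the endpoint DNS name before the client opens a connection, so traffic always flows through PrivateLink.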

Brandebrandea answered 11/7, 2021 at 16:33 Comment(2)
Glad you were able to resolve this, thanks for sharing the valuable insight. Did you actually need to map/open the Redis bus port as I said, or was that managed automatically by AWS? (sources for the cluster Redis bus port are one, two, and three) – Chappell
@Chappell The Redis bus port was managed automatically by AWS. – Brandebrandea
Based on your comment:

this was not possible because the VPCs in account A and account B had the same CIDR range. Peered VPCs can't have the same CIDR range.

I think what you are looking for is impossible. Routing within a VPC always happens first, before any route tables are considered at all. Said another way: if the destination of a packet lies within the sending VPC's CIDR, it will never leave that VPC, because AWS will try to route it within its own VPC, even if the IP isn't in use in the VPC at that time.

So if you are trying to communicate with another VPC that has the same IP range as yours, even if you specifically add a route to egress the traffic to a different IP (in the same range), the rule will be silently ignored and AWS will try to deliver the packet within the originating VPC, which is not what you are trying to accomplish.

Cosby answered 8/7, 2021 at 19:59 Comment(2)
Thank you for the answer. I would like to point out that it's not impossible: it's already working for redis-py, just not for redis-py-cluster. – Brandebrandea
You should clarify your question and include networking details: VPC CIDRs, subnets, NACLs, security groups, and how the VPCs communicate with each other. From what you have written, this appears to be a networking problem. Your comment about overlapping CIDRs is definitely a point of concern, but perhaps you are right and the additional information would clarify it. – Cosby
