Does AWS load-balance connections between master and read-replica DB
Asked Answered
F

4

11

I created a read-replica for an RDS instance, Does AWS handles the load-balancing automatically or it needs another configuration or implementation in the code ? and if so Is there a reference for that ?

Edit for more clarification When I create a read-replica how it's going to work and handle requests ? I'm asking because it gives me an endpoint, so there must be a sort of implementation for it work properly right? (alongside the master RDS)

Fraenum answered 25/7, 2019 at 14:9 Comment(4)
There is no magic here. You need to use endpoint of read-replica in connection string when you fire a query that will only read. otherwise you use master node endpoint in connection string to update the data.Parts
Note that what you are probably asking about is actually called read/write splitting and not load balancing.Boraginaceous
@Michael-sqlbot it sounds like OP is asking about balancing the load between different servers in the read cluster, not splitting reads and writes based on server roleAppositive
@AaronNewman I assume the question is about read/write splitting, since it says "created a read replica" (as opposed to some number of read replicas).Boraginaceous
T
2

Amazon provides connection load balancing for RDS Aurora database clusters, via a specific load-balanced endpoint. For any other RDS database configuration you will have to handle the connections yourself.

Edit for more clarification When I create a read-replica how it's going to work and handle requests ? I'm asking because it gives me an endpoint, so there must be a sort of implementation for it work properly right? (alongside the master RDS)

At that point you just have 2 database servers, and each server has a separate endpoint URL that you can connect to. Amazon is keeping the data between those servers in sync for you, that's all. There is no load balancing happening here. You seem to be making an assumption that there is more going on here than there actually is. If you are using an RDS database engine other than Aurora, you will have to implement the load balancing yourself.

Tapley answered 25/7, 2019 at 15:22 Comment(0)
U
5

Unfortunately, load balancing of Aurora cluster is quite limited. First of all, it balances connection at the moment of connection being established and does not re-balance them under any circumstances, unless the connection is dropped and re-opened (but that has limitations too, see below).

Second, when Aurora does distribute the connections, it, however, does not take into account the number of connections already established to each read-replica.

Third, it does it using DNS switch process with TTL of 1 second. As result of all of the above, it is, unfortunately, very common case, when some replicas receive MUCH more connections, than others. Not to mention absence of mechanism of drying out instances for switching off etc.

Because all of the above limitations and highly unbalanced load, we've developed a fairlink JDBC driver (or rather envelop-driver) for MySQL and PostgreSQL. You can find it here: https://github.com/DiceTechnology/dice-fairlink

Unfeigned answered 4/12, 2019 at 10:50 Comment(0)
T
2

Amazon provides connection load balancing for RDS Aurora database clusters, via a specific load-balanced endpoint. For any other RDS database configuration you will have to handle the connections yourself.

Edit for more clarification When I create a read-replica how it's going to work and handle requests ? I'm asking because it gives me an endpoint, so there must be a sort of implementation for it work properly right? (alongside the master RDS)

At that point you just have 2 database servers, and each server has a separate endpoint URL that you can connect to. Amazon is keeping the data between those servers in sync for you, that's all. There is no load balancing happening here. You seem to be making an assumption that there is more going on here than there actually is. If you are using an RDS database engine other than Aurora, you will have to implement the load balancing yourself.

Tapley answered 25/7, 2019 at 15:22 Comment(0)
O
0

AWS handles RDS failover automatically, not load balancing on replication. You can configure how the load balancing between your read replicas should work using:

Okelley answered 25/7, 2019 at 14:22 Comment(1)
I'm not asking about scaling mechanisms or failovers, I edited the question for more clarificationFraenum
A
0

Andrey's answer above is the most correct - the 'load balancing' is just DNS randomly picking on of the servers in the cluster. This results in the connections being very lopsided most of the time, if you are using the RO-cluster endpoint in conjunction with connection pooling, which most people are.

I recently opened a ticket with AWS support center and was pointed towards this 'smart driver' solution. I think the original post was looking for a way to keep the connections load-balanced.

There is a table in Aurora INFORMATION_SCHEMA that you can use to get the servers in your readonly cluster. Then you can make a connection string with those servers, and do the round-robin yourself.

EDIT: I updated my 1-line sql with something similar to the C# function we use to implement this. It returns the updated connection string.

public static string UpdateReadOnlyConnectionString()
{
    List<string> hostList = new List<string>();
    string connectionStringOrig = "Data Source=cluster-ro-endpoint;Initial Catalog=mydb;";
    builder = new MySqlConnectionStringBuilder(connectionStringOrig);
    // MySqlHelper.GetPreparedStatement gets a command with the main RDS cluster connections string.
    using (MySqlCommand cmd = MySqlHelper.GetPreparedStatement("SELECT rhs.SERVER_ID FROM INFORMATION_SCHEMA.REPLICA_HOST_STATUS rhs " +
        "WHERE rhs.SESSION_ID <> 'MASTER_SESSION_ID'"))
    {
        // simplifying from my wrapper classes, you get the idea
        using (MySqlConnector.MySqlDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                string host = reader.GetString("SERVER_ID");
                // this is a map between the SERVER_ID and the endpoint
                string server = AuroraEndpoints[host].ServerEndpoint;
                hostList.Add(server);
            }
        }
    }
    builder.Server = string.Join(",", hostList);
    return builder.ConnectionString;    
}

this will give you a live snapshot of the servers in the read-only cluster. We are using MySqlConnector drivers that will take a list of endpoints in the connection string and balance connections between those. If your driver doesn't have this capability, you can just pick one at random for each connection. You will need to either construct the endpoints from the instance ID or map them manually, and use them in the connection string instead of the ro cluster endpoint (you still use the primary cluster endpoint for the writer node).

You need to continue to monitor the REPLICA_HOST_STATUS table. If the set of readers changes due to a failover, or any other reason, we clear the connection pool so any new commands get a connection from the current set of readers.

I have tried this and it does result in the number of connections being balanced. It does not balance the actual load, but in our case the number of web connections follow the load. You could use the IOS or CPU columns in that same table to add true performance load balancing, to taste.

I realize this is 5 years after OP question, but this seems the most relevant and recent post that I can find, and I was not able to find this solution easily until AWS pointed me to it. I also found these similar questions, older and not as concisely asked: Amazon Aurora DB Cluster Not Auto Balancing Correctly and AWS Aurora - load balance read operation between read replica and writer instance.

Appositive answered 6/2 at 22:44 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.