AWS DynamoDB vs RDS for Lambda serverless architecture

I am part of a team currently developing a Proof of Concept architecture/application for a communication service between governmental offices and the public (narrowed down to the health-sector for now). The customer has specifically requested a mainly serverless approach through AWS services, and I am in need of advice for how to set up this architecture, namely the Lambda to Database relationship.

Roughly, the architecture would make use of API Gateway to handle requests, which would invoke different Lambdas, as micro-services, that access the DB.

The following image depicts a quick relationship schema. Basically, a Patient inputs a description of his Condition which forms the basis for a Case. That Case is handled during one or many Sessions by one or many Nurses that take Notes related to the Case.

[Image: DB Schema (not included; not enough reputation to embed)]

From my research, I've gathered that in the case of RDS, there is a trade-off between security (keeping the Lambdas outside of a public VPC containing an RDS instance, foregoing security best-practices, a no-no for public sector) and performance (putting the Lambda in a private VPC with an RDS instance, and incurring heavy cold-start times due to the provisioning of an ENI, or Elastic Network Interface). The cold-start times can however be mitigated by pinging the Lambdas on a schedule with CloudWatch to keep them warm, which may or may not be optimal.

In the case of DynamoDB, I am personally very inexperienced (more so than in MySQL) and unsure whether the data fits a NoSQL model. If it does, DynamoDB seems like the better approach. From my understanding though, NoSQL has less support for complex queries that involve JOINs etc., which might eliminate it as an option.

It feels as if SQL/RDS is more appropriate in terms of the data/relations, but DynamoDB causes fewer problems for Lambda/AWS services if a decent data model is found. So my question is, would it be preferable to go for a private RDS instance and try to mitigate the cold-starts by warming up the most critical Lambdas, or is there a NoSQL model that wouldn't cause headaches for complex queries, among other things? Am I missing any key aspects that could tip the scale?

Chromoprotein answered 15/2, 2019 at 19:59 Comment(2)
One important consideration when using DynamoDB is to lay out your required access patterns and make the DynamoDB schema support them. - Torray
If filtering is a key aspect of this project I would avoid DynamoDB, IMO. If you look at DynamoDB's documentation you'll see that there are various ways to filter data when you query it. They have some good examples to give you an idea of what it can do. However, it's not built for that purpose. It is also important to note that you'll need to rewrite your schema to fit a NoSQL DB should you go that route. NoSQL typically doesn't require such a large breakdown of data, especially if you want to maintain some filtering and querying flexibility. - Raper

Let's start by clearing up some rather drastic misconceptions on your part:

From my research, I've gathered that in the case of RDS, there is a trade-off between security (keeping the Lambdas outside of a public RDS instance, foregoing security best-practices, a no-no for public sector) and performance (putting the Lambda in a private RDS instance, and incurring heavy cold-start times). The cold-start times can however be negated by pinging them with CloudWatch, which may or may not be optimal

  1. RDS is a database server. You don't run anything inside or outside of it.
  2. You may be thinking of a VPC, or Virtual Private Cloud. This is an isolated network in which you can run your RDS instances and Lambdas.
  3. Running inside or outside of a VPC has no impact on cold start times. You pay the cold start penalty when AWS has to start a new container to run your Lambda. This can happen either because it hasn't been running recently, or because it needs to scale to meet concurrent requests. The actual cold start time will depend on your language: Java is significantly slower than Python, for example, because it needs to start the JVM and load classes before doing anything.
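To illustrate the point about containers, here is a minimal Python sketch of a handler that can tell a cold start from a warm invocation. It relies on the fact that module-level code runs once per container, while the handler function runs on every request. The names `_get_connection` and `_connection` are hypothetical, and the dictionary is only a stand-in for a real database connection.

```python
import time

# Module-level state is created once per container (the cold start)
# and survives across warm invocations of the same container.
_connection = None   # hypothetical stand-in for a real DB connection
_invocations = 0


def _get_connection():
    """Open the stand-in connection on first use, then reuse it while warm."""
    global _connection
    if _connection is None:
        # Assumption: a real handler would call its DB driver's connect here.
        _connection = {"opened_at": time.monotonic()}
    return _connection


def handler(event, context):
    """Lambda-style entry point reporting whether this call was a cold start."""
    global _invocations
    _invocations += 1
    conn = _get_connection()
    return {
        "cold_start": _invocations == 1,
        "connection_id": id(conn),
    }
```

Calling `handler({}, None)` twice in the same process mimics one cold and one warm invocation: only the first reports `cold_start` as true, and both reuse the same connection object.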

Now for your actual question

Basically, a Patient inputs a description of his Condition which forms the basis for a Case. That Case is handled during one or many Sessions by one or many Nurses that take Notes related to the Case.

This could be implemented in a NoSQL database such as DynamoDB. Without more information, I would probably make the Session the base document, using case ID as partition key and session ID as the sort key. If you don't understand what those terms mean, and how you would structure a document based around that key, then you probably shouldn't use DynamoDB.
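As a rough sketch of that key design, the Python below builds the parameters one might pass to boto3's `create_table` call, with case ID as the partition (HASH) key and session ID as the sort (RANGE) key. The table name, attribute names, and the `nurse_ids`/`notes` fields are assumptions made for illustration, and `sessions_for_case` only imitates in memory what a DynamoDB Query on the partition key would return. Nothing here talks to AWS.

```python
def session_table_definition(table_name="CaseSessions"):
    """Parameters for a table keyed by case (partition) and session (sort).

    Assumption: these would be passed as
    boto3.client("dynamodb").create_table(**definition).
    """
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "case_id", "AttributeType": "S"},
            {"AttributeName": "session_id", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "case_id", "KeyType": "HASH"},      # partition key
            {"AttributeName": "session_id", "KeyType": "RANGE"},  # sort key
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def session_item(case_id, session_id, nurse_ids, notes):
    """One Session document; nurse_ids and notes are hypothetical fields."""
    return {
        "case_id": case_id,
        "session_id": session_id,
        "nurse_ids": list(nurse_ids),
        "notes": list(notes),
    }


def sessions_for_case(items, case_id):
    """In-memory imitation of a Query on the partition key, ordered by sort key."""
    matching = [item for item in items if item["case_id"] == case_id]
    return sorted(matching, key=lambda item: item["session_id"])
```

With this layout, fetching every session of one case is a single-partition read. Anything keyed differently, such as lookups by nurse, would need a secondary index or a second copy of the data.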

A bigger reason to not use DynamoDB has to do with access patterns. Will you ever want to find all cases worked by a given nurse? Or related to a given patient? Those types of queries are what a relational database is designed for.
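For comparison, here is a small sketch of those two queries against a relational schema. It uses Python's built-in `sqlite3` as a stand-in for MySQL on RDS. The table and column names are assumptions, and the schema is simplified so that each session row has a single nurse.

```python
import sqlite3

SCHEMA = """
CREATE TABLE patient  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE nurse    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE cases    (id INTEGER PRIMARY KEY,
                       patient_id INTEGER NOT NULL REFERENCES patient(id),
                       description TEXT NOT NULL);
CREATE TABLE sessions (id INTEGER PRIMARY KEY,
                       case_id INTEGER NOT NULL REFERENCES cases(id),
                       nurse_id INTEGER NOT NULL REFERENCES nurse(id));
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patient VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
    conn.executemany("INSERT INTO nurse VALUES (?, ?)", [(1, "Nora"), (2, "Nils")])
    conn.executemany(
        "INSERT INTO cases VALUES (?, ?, ?)",
        [(1, 1, "cough"), (2, 1, "rash"), (3, 2, "fever")],
    )
    conn.executemany(
        "INSERT INTO sessions VALUES (?, ?, ?)",
        [(1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 2)],
    )
    return conn


def cases_for_nurse(conn, nurse_id):
    """All cases a given nurse has worked on, via a JOIN through sessions."""
    query = """
        SELECT DISTINCT c.id, c.description
        FROM cases c JOIN sessions s ON s.case_id = c.id
        WHERE s.nurse_id = ?
        ORDER BY c.id
    """
    return conn.execute(query, (nurse_id,)).fetchall()


def cases_for_patient(conn, patient_id):
    """All cases belonging to a given patient."""
    query = "SELECT id, description FROM cases WHERE patient_id = ? ORDER BY id"
    return conn.execute(query, (patient_id,)).fetchall()
```

Both lookups work off the same tables without reshaping the data, which is the flexibility a key-based DynamoDB design gives up unless the access patterns are planned in advance.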

In the case of DynamoDB, I am personally very inexperienced (more so than in MySQL)

Do you have anyone on your team who is familiar with NoSQL databases? If not, then I think you should stick with MySQL. You will have enough challenges learning how to use Lambda.

Frore answered 15/2, 2019 at 21:34 Comment(2)
"Running inside or outside of a VPC has no impact on cold start times." - A Lambda running in a VPC currently has a very large impact on cold start times, because it may have to provision an ENI (Elastic Network Interface), which usually adds around 10 seconds to the cold start. At the last re:Invent, the Lambda team said that this will be fixed this year (the ENI will be created during Lambda creation, not invocation). - Guck
I realize I misphrased my post regarding how a VPC contains an RDS instance, and I've edited the post accordingly. Your point regarding the effect of language on cold-starts has already been taken into account, so good to have it confirmed. However, as @MarcinSucharski mentions, the ENI also has an effect; this post shows it in practice. Still, your answer really highlights my lack of knowledge in DynamoDB/NoSQL, and if it's true that the ENI issue is getting fixed, then I'll probably go for MySQL. - Chromoprotein

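In terms of Lambda + VPC, AWS has since released a new version of the Lambda service that addresses the ENI instantiation problem. Instead of creating and attaching an ENI while an execution environment cold-starts, Lambda now sets up the network interface through AWS Hyperplane when the function is created or its VPC settings change, and execution environments reuse that existing ENI. See more details here.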

Also in terms of performance, I found this interesting article where the author ran a benchmark comparing four setups: Lambda + RDS in a VPC, Lambda + RDS, Lambda + DynamoDB in a VPC, and Lambda + DynamoDB. Although performance has improved a lot for Lambda in a VPC, Lambda + RDS still incurs a longer initialization time than Lambda + DynamoDB.

Pratique answered 5/12, 2023 at 21:13 Comment(0)
