Local development and staging with Amazon Redshift

I like to set up tools and services with production, staging, and local development. I'd like to use Amazon Redshift, and starting at $180 a month seems pretty reasonable for a columnar store database, but do I actually have to think about it as $180 x # of environments / month? Is there any way to have a free staging and local environment for Redshift?

It's also nice to be able to do development against a local instance rather than relying on the network. I assume that's not possible with Redshift.

What do you do to make local development easier, faster and cheaper when working with Redshift?

Taille asked 24/1, 2015 at 19:52

Amazon Redshift was specifically created to run on AWS infrastructure. It is not available as a download. (Interestingly, Amazon DynamoDB does have a downloadable version for development purposes.)

The cheapest option might be to shut down your Dev & Test clusters each night and on weekends. Take a snapshot before deleting the cluster, then restore a new cluster from that snapshot the next morning. This can be automated via the AWS Command-Line Interface (CLI), making it easy to schedule with cron or Windows Scheduled Tasks.
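The nightly cycle can be scripted around two AWS CLI calls, `aws redshift delete-cluster` with `--final-cluster-snapshot-identifier` and `aws redshift restore-from-cluster-snapshot`. Below is a minimal Python sketch, assuming the CLI is installed and configured with credentials; the cluster name `dev-cluster` and the `<cluster>-nightly-YYYY-MM-DD` snapshot naming are hypothetical.

```python
import datetime
import subprocess
import sys

def snapshot_id(cluster_id: str, day: datetime.date) -> str:
    # Hypothetical naming scheme: one snapshot per night, e.g. "dev-cluster-nightly-2015-01-24".
    return f"{cluster_id}-nightly-{day.isoformat()}"

def teardown_cmd(cluster_id: str, day: datetime.date) -> list[str]:
    # Evening: delete the cluster, taking a final snapshot first so no data is lost.
    return ["aws", "redshift", "delete-cluster",
            "--cluster-identifier", cluster_id,
            "--final-cluster-snapshot-identifier", snapshot_id(cluster_id, day)]

def restore_cmd(cluster_id: str, snapshot_day: datetime.date) -> list[str]:
    # Morning: create a new cluster with the same name from that snapshot.
    return ["aws", "redshift", "restore-from-cluster-snapshot",
            "--cluster-identifier", cluster_id,
            "--snapshot-identifier", snapshot_id(cluster_id, snapshot_day)]

if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage from cron:  nightly.py teardown dev-cluster
    #                   nightly.py restore dev-cluster [YYYY-MM-DD]
    action, cluster = sys.argv[1], sys.argv[2]
    today = datetime.date.today()
    if action == "teardown":
        cmd = teardown_cmd(cluster, today)
    elif action == "restore":
        # Default to last night's snapshot; pass Friday's date on a Monday morning.
        day = (datetime.date.fromisoformat(sys.argv[3]) if len(sys.argv) > 3
               else today - datetime.timedelta(days=1))
        cmd = restore_cmd(cluster, day)
    else:
        sys.exit(f"unknown action: {action}")
    subprocess.run(cmd, check=True)
```

Building the argument lists in plain functions keeps the naming logic testable without touching AWS; only the `__main__` branch actually invokes the CLI.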

You could also keep a snapshot of known-good Test data and restore from that snapshot each morning, so the test database doesn't fill up with data left over from test cases.
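For the test cluster, the variant is to discard the day's data instead of snapshotting it, then rebuild from the fixed baseline. A minimal sketch, assuming a manually created snapshot with the hypothetical name `test-baseline`; it uses the CLI's `--skip-final-cluster-snapshot` flag and the `aws redshift wait cluster-deleted` waiter, and only prints the commands unless `--execute` is given.

```python
import subprocess
import sys

# Hypothetical names: a long-lived snapshot of known-good test data.
TEST_CLUSTER = "test-cluster"
BASELINE_SNAPSHOT = "test-baseline"

def reset_commands(cluster_id: str = TEST_CLUSTER,
                   baseline: str = BASELINE_SNAPSHOT) -> list[list[str]]:
    """AWS CLI calls that throw away the test cluster and rebuild it from the baseline."""
    return [
        # No final snapshot: whatever the tests wrote is deliberately discarded.
        ["aws", "redshift", "delete-cluster",
         "--cluster-identifier", cluster_id,
         "--skip-final-cluster-snapshot"],
        # Block until the deletion finishes so the identifier can be reused.
        ["aws", "redshift", "wait", "cluster-deleted",
         "--cluster-identifier", cluster_id],
        # Recreate the cluster in its known-good state.
        ["aws", "redshift", "restore-from-cluster-snapshot",
         "--cluster-identifier", cluster_id,
         "--snapshot-identifier", baseline],
    ]

if __name__ == "__main__":
    for cmd in reset_commands():
        if "--execute" in sys.argv[1:]:
            subprocess.run(cmd, check=True)
        else:
            print(" ".join(cmd))  # dry run: show what would be executed
```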

Another cost saving might be to reduce the number of nodes for the non-production systems. Queries will run slower and the total amount of storage will be smaller, but it could be more cost-effective. Or even use a single "Dense Storage" 2TB node instead of several "Dense Compute" SSD nodes. It provides more storage on fewer nodes.
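A back-of-the-envelope model helps compare the options above: monthly cost ≈ nodes × hourly rate × hours running. In the sketch below, the $0.25/hour rate is an assumption back-derived from the question's "$180 a month" (180 / 720 hours), and the 10-hours-a-day, 22-working-days schedule is hypothetical; snapshot storage charges are ignored. Substitute current pricing for your region and node type.

```python
# Rough monthly cost model: nodes x hourly rate x hours the cluster is running.
# HOURLY_RATE is an assumption derived from the question's "$180 a month"
# for one node running around the clock (180 / 720 hours = $0.25/hour).
HOURLY_RATE = 0.25
HOURS_PER_MONTH = 24 * 30  # 720, a 30-day month

def monthly_cost(nodes: int, hours_running: float, rate: float = HOURLY_RATE) -> float:
    return nodes * rate * hours_running

# Always-on single-node cluster (the question's baseline).
always_on = monthly_cost(1, HOURS_PER_MONTH)

# Shut down nights and weekends: hypothetically 10 hours a day, 22 working days.
office_hours = monthly_cost(1, 10 * 22)

# A 4-node cluster, always on: what mirroring a larger production cluster would cost.
four_nodes = monthly_cost(4, HOURS_PER_MONTH)

if __name__ == "__main__":
    print(f"always on:    ${always_on:.2f}")     # $180.00
    print(f"office hours: ${office_hours:.2f}")  # $55.00
    print(f"4 nodes:      ${four_nodes:.2f}")    # $720.00
```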

Odom answered 25/1, 2015 at 3:11
One thing to note: while this is nice and all, it doesn't help me if I don't want to or cannot connect to AWS for development. For most of the larger services (SQS, Dynamo, S3, etc.) I have a local development analogue. – Junk

In addition to John Rotenstein's answer, which lays out how to reduce costs if you have decided to run a second cluster for staging, there are some other options for when your use case is not mission-critical.

As Redshift is a fork of PostgreSQL 8, you can use the Amazon-provided PostgreSQL 8.4 JDBC or ODBC drivers and point them at a locally running PostgreSQL 8 instance. This works well during development, since what works locally will usually work on your production system (there are some exceptions).
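Switching between a local Postgres and Redshift then becomes mostly a configuration concern. A minimal Python sketch, assuming an `APP_ENV` environment variable and hypothetical host and database names; only the ports (5432 for PostgreSQL, 5439 for Redshift) are the usual defaults. The answer itself uses the JDBC/ODBC drivers; a Python PostgreSQL driver such as psycopg2 is swapped in here for illustration.

```python
import os
from typing import Optional

# Hypothetical host names; 5432 and 5439 are the default PostgreSQL and Redshift ports.
SETTINGS = {
    "local": {"host": "localhost", "port": 5432, "dbname": "analytics"},
    "staging": {"host": "staging-cluster.example.com", "port": 5439, "dbname": "analytics"},
    "production": {"host": "prod-cluster.example.com", "port": 5439, "dbname": "analytics"},
}

def connection_settings(env: Optional[str] = None) -> dict:
    """Connection parameters for env (defaults to $APP_ENV, then "local")."""
    env = env or os.environ.get("APP_ENV", "local")
    if env not in SETTINGS:
        raise ValueError(f"unknown environment: {env!r}")
    # Return a copy so callers can add credentials without mutating SETTINGS.
    return dict(SETTINGS[env])

# Usage with a PostgreSQL driver such as psycopg2 (not executed here):
#   conn = psycopg2.connect(**connection_settings(), user="...", password="...")
```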

The other option is to have separate tables on your Redshift cluster for non-production activities. This might be good for your test suite and "final testing" development.
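One way to keep such tables apart is a schema per environment. This is an interpretation, since the answer doesn't say how the tables are separated. A minimal sketch with hypothetical schema names; it relies on `CREATE SCHEMA IF NOT EXISTS` and `SET search_path TO`, which Redshift supports, and accepts only simple lowercase identifiers to avoid quoting issues.

```python
import re

# Hypothetical convention: one schema per non-production environment
# (e.g. "test", "staging") inside the shared Redshift cluster.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")

def _check(name: str) -> str:
    # Accept only simple lowercase identifiers so nothing needs quoting or escaping.
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name

def qualified(table: str, env: str) -> str:
    """Fully qualified name, e.g. qualified("events", "staging") -> "staging.events"."""
    return f"{_check(env)}.{_check(table)}"

def setup_sql(env: str) -> list[str]:
    """Statements that create the environment's schema and make it the session default."""
    schema = _check(env)
    return [
        f"CREATE SCHEMA IF NOT EXISTS {schema};",
        # Unqualified table names in this session now resolve to the env's schema.
        f"SET search_path TO {schema};",
    ]
```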

Then you can stage your deploy into production and monitor the staging environment for issues before the full deploy.

Brownell answered 3/2, 2015 at 4:16

Another cost-cutting solution is to treat each database in a single cluster as an environment. Databases cost nothing extra, and you're allowed 60 of them in a cluster.
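Creating the per-environment databases is a few `CREATE DATABASE` statements. A minimal sketch with a hypothetical `<app>_<env>` naming scheme; it checks against the 60-database limit quoted above (verify the current quota) and assumes one database already exists, since a new cluster is created with an initial database (named `dev` by default).

```python
import re

MAX_DATABASES_PER_CLUSTER = 60  # limit quoted in the answer; check current AWS quotas
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")

def create_database_sql(app: str, envs: list[str], existing: int = 1) -> list[str]:
    """One CREATE DATABASE per environment, e.g. analytics_dev, analytics_staging.

    `existing` counts databases already on the cluster (assumed: the initial one).
    """
    names = [f"{app}_{env}" for env in envs]
    for name in names:
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsupported database name: {name!r}")
    total = existing + len(names)
    if total > MAX_DATABASES_PER_CLUSTER:
        raise ValueError(f"{total} databases would exceed the limit of {MAX_DATABASES_PER_CLUSTER}")
    return [f"CREATE DATABASE {name};" for name in names]
```

Run each statement with autocommit enabled: Redshift, like PostgreSQL, does not allow `CREATE DATABASE` inside a transaction block.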

We've tried the Postgres-as-emulator solution, and it's been kind of OK, but

  • The performance characteristics are radically different
  • It's easy to use Postgres features that are not in Redshift (or vice versa)
  • It's a pain to maintain a schema that has optional parts (for example, indexes for Postgres and sort keys for Redshift).
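One way to keep those optional parts in a single place is to describe each table once and render dialect-specific DDL from it. A minimal sketch, not the authors' approach: the `Table` description is hypothetical, and mapping a Redshift sort key to a Postgres B-tree index on the same columns is just one possible approximation (the dist key is simply dropped for Postgres).

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Table:
    name: str
    columns: list[tuple[str, str]]                     # (column name, SQL type)
    sort_key: list[str] = field(default_factory=list)
    dist_key: Optional[str] = None                     # Redshift-only; ignored for Postgres

def render_ddl(table: Table, dialect: str) -> list[str]:
    cols = ", ".join(f"{name} {sqltype}" for name, sqltype in table.columns)
    if dialect == "redshift":
        # Redshift: distribution and sort keys are table attributes.
        attrs = ""
        if table.dist_key:
            attrs += f" DISTKEY ({table.dist_key})"
        if table.sort_key:
            attrs += f" SORTKEY ({', '.join(table.sort_key)})"
        return [f"CREATE TABLE {table.name} ({cols}){attrs};"]
    if dialect == "postgres":
        stmts = [f"CREATE TABLE {table.name} ({cols});"]
        if table.sort_key:
            # Postgres: approximate the sort key with an index on the same columns.
            idx = f"{table.name}_{'_'.join(table.sort_key)}_idx"
            stmts.append(f"CREATE INDEX {idx} ON {table.name} ({', '.join(table.sort_key)});")
        return stmts
    raise ValueError(f"unknown dialect: {dialect!r}")
```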

We've backed away from that for the moment, though as we get larger we may have to use a hybrid solution where acceptance testing & staging are databases in Redshift, while developers go back to using Postgres.

Mide answered 6/10, 2016 at 17:10
That's super helpful. Thanks! – Taille

Here's an alternative for accessing AWS services locally, offline, without paying for cloud services: LocalStack!

https://localstack.cloud/

https://github.com/localstack/localstack

Major AWS services such as Redshift, S3, DynamoDB and CloudWatch are supported.

You can use this for all your non-production environments and only pay for production AWS services.
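A minimal boto3 sketch of pointing the Redshift client at LocalStack. Assumptions: a current LocalStack version listening on its default edge endpoint `http://localhost:4566` (older releases used per-service ports), LocalStack's default of not validating credentials, and boto3 installed; the cluster name and master credentials are placeholders. Against LocalStack this only exercises the management API (create/describe cluster), not a SQL engine.

```python
import sys

LOCALSTACK_URL = "http://localhost:4566"  # assumed default LocalStack edge endpoint

def redshift_client_kwargs(use_localstack: bool, region: str = "us-east-1") -> dict:
    """Keyword arguments for boto3.client("redshift", ...)."""
    kwargs = {"region_name": region}
    if use_localstack:
        kwargs.update(
            endpoint_url=LOCALSTACK_URL,
            # By default LocalStack does not validate credentials; dummy values suffice.
            aws_access_key_id="test",
            aws_secret_access_key="test",
        )
    return kwargs

def create_dev_cluster(client, cluster_id: str = "dev-cluster") -> dict:
    # Management API call only; there is no query engine behind it on LocalStack.
    return client.create_cluster(
        ClusterIdentifier=cluster_id,
        NodeType="dc2.large",
        ClusterType="single-node",
        MasterUsername="awsuser",
        MasterUserPassword="Example1Password",  # placeholder value
    )

if __name__ == "__main__" and "--run" in sys.argv[1:]:
    import boto3  # third-party: pip install boto3
    client = boto3.client("redshift", **redshift_client_kwargs(use_localstack=True))
    create_dev_cluster(client)
    for cluster in client.describe_clusters()["Clusters"]:
        print(cluster["ClusterIdentifier"])
```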

Slipknot answered 23/12, 2018 at 7:33
Unfortunately, though, the Redshift service only mocks the Redshift management endpoints (create cluster, etc.) and not the actual query engine, so you're still stuck using Postgres locally (with all the drawbacks of doing so). – Zealous
To simulate Redshift, check out the redshift-fake-driver project, which emulates Redshift (only certain features, such as UNLOAD and COPY) on top of PostgreSQL by translating Redshift-specific commands on the fly in the JDBC driver itself. You can also use the JDBC driver from Python via the JayDeBeApi package. – Untruth

Redshift now has a serverless option (currently in preview), which lets you pay as you go. You only pay when you are loading or querying data, so make sure that any automated processes you have do not keep it running 24/7.

https://aws.amazon.com/redshift/redshift-serverless/

Parchment answered 9/6, 2022 at 10:57

Amazon Redshift has announced a serverless offering (https://aws.amazon.com/redshift/redshift-serverless/). For DEV and QA environments, you can use serverless and pay only when you run queries or loads. There is no charge while it sits idle: billing stops as soon as query execution stops. This is a great way to save costs.

Note: The minimum billing period is 60 seconds.
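The 60-second minimum matters for DEV/QA workloads made of many tiny queries. A simplified Python sketch of the arithmetic: it treats each separate burst of activity as billed independently, rounded up to whole seconds with a 60-second floor, at a fixed capacity in RPUs (Redshift Processing Units, the unit Redshift Serverless bills in as RPU-hours). This simplifies how AWS actually meters capacity, and the price per RPU-hour is a hypothetical parameter, not a quoted rate.

```python
import math

MIN_BILLED_SECONDS = 60  # minimum charge mentioned in the answer

def billed_seconds(active_seconds: float) -> int:
    """Seconds charged for one burst of activity: rounded up, never below 60 s."""
    if active_seconds <= 0:
        return 0  # idle time is not billed
    return max(MIN_BILLED_SECONDS, math.ceil(active_seconds))

def estimated_cost(bursts: list[float], rpus: int, price_per_rpu_hour: float) -> float:
    """Cost of several separate bursts at a fixed capacity and a hypothetical price."""
    seconds = sum(billed_seconds(s) for s in bursts)
    return seconds / 3600 * rpus * price_per_rpu_hour

if __name__ == "__main__":
    # A 5 s query, a 90 s load and a 10-minute test run, on separate occasions:
    bursts = [5, 90, 600]
    print(sum(billed_seconds(s) for s in bursts))  # 750
```

The 5-second query is billed as 60 seconds, so batching small ad-hoc queries together is cheaper than spreading them out.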

Deportment answered 16/8, 2023 at 12:23
