Create MySQL and Postgres instances using AWS CloudFormation

With these two simple CloudFormation templates, we will learn how to deploy Postgres and MySQL Aurora database instances with a single CLI command. I have optimised the template files, reducing the number of properties to a minimum, so they are easier to comprehend.
I know that Infrastructure as Code might seem a little hard for users who are not yet familiar with the concept, but believe me, this is the right way to go, and career-wise it will bring a lot of benefits.
Who is this tutorial for?
This story is for beginner and intermediate-level data and software engineers who would like to get into data engineering and learn a few advanced things, such as infrastructure as code.
I hope this tutorial will be useful for everyone involved in software engineering where a cloud database is required to store your application data.
Prerequisites
- Basic knowledge of Bash/Shell scripting. It might seem a bit advanced for beginners, but it is necessary to deploy AWS services, i.e. RDS.
- AWS command line tool installed (AWS CLI). All shell commands provided in this article can be run either directly from the command line or as a .sh script, i.e. simply run a file from your command line like so: $ ./deploy.sh (a minimal sketch of such a script follows this list).
- An AWS account. Registration is free and there is a free tier, so there shouldn't be any significant charges unless you deploy a really big database and forget to delete it.
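Here is a minimal sketch of what such a deploy.sh could look like; the stack name, template file and region are illustrative assumptions rather than values from this article:

#!/bin/bash
# deploy.sh - deploy (create or update) a CloudFormation stack from a template file.
# Placeholder values below: rename the stack, template file and region to match yours.
aws cloudformation deploy \
  --stack-name my-rds-stack \
  --template-file rds.yaml \
  --region us-east-1

Make it executable once with chmod +x deploy.sh and it will run as shown above.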
Why Infrastructure as Code (IaC)?
IaC is quickly gaining popularity as it helps to manage cloud platform resources consistently and reproducibly using template files (JSON or YAML code). We can apply CI/CD pipelines and use GitHub integration to create production and staging environments, for example. Infrastructure as code is the deployment and resource management standard these days, and I previously wrote about it here:
That story explained how to deploy and provision AWS resources for our data lake, i.e. AWS S3 and AWS Lambda.
Data-wise, Infrastructure as code is a brilliant way to separate data environments and provision Data Pipeline resources. Using it we can create any data pipeline we need.
Why and when do we use Relational databases in data pipelines?
Typically, relational databases consist of row-based tables with columns that connect related data pieces using a normalised schema.
This data architecture type aims to capture data quickly and to optimise the system so the application can swiftly retrieve the most recent data it needs.
Because RDS database tables and joins are "normalised," they are more complex compared to a typical data warehouse design. However, this is a trade-off required by functional requirements.
Hence, the main difference between a "conventional database" and a "data warehouse" is that the former is created and optimised to "record" data, whilst the latter is created and built to "react to analytics". I previously wrote about it here:
Popular relational databases are MySQL, Microsoft SQL Server, Oracle and PostgreSQL.
Create MySQL Aurora instance
To create a database instance with the Aurora engine, we need a cluster first. The thing is that Aurora instances have to be associated with an AWS::RDS::DBCluster via DBClusterIdentifier. If there is no cluster in our stack file, we will get generic CloudFormation errors that don't make much sense.
You can use this CloudFormation template file:
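The embedded template file isn't reproduced here, so below is a minimal sketch of the shape it has to take: an AWS::RDS::DBCluster plus an AWS::RDS::DBInstance linked through DBClusterIdentifier. The resource names, parameters and instance class are illustrative assumptions, not the article's exact file:

AWSTemplateFormatVersion: '2010-09-09'
Description: Minimal Aurora MySQL cluster with one instance (sketch)
Parameters:
  DBUser:
    Type: String
  DBPassword:
    Type: String
    NoEcho: true    # keeps the password out of console output
Resources:
  RDSCluster:
    Type: 'AWS::RDS::DBCluster'
    Properties:
      Engine: aurora
      MasterUsername: !Ref DBUser
      MasterUserPassword: !Ref DBPassword
  RDSDBInstance:
    Type: 'AWS::RDS::DBInstance'
    Properties:
      Engine: aurora
      DBClusterIdentifier: !Ref RDSCluster    # the association that Aurora requires
      DBInstanceClass: db.t2.small

It deploys with the same single-command pattern, e.g. aws cloudformation deploy --stack-name aurora-mysql --template-file mysql.yaml --parameter-overrides DBUser=root DBPassword=<your password>.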
If everything is correct, you will see something like this in the AWS console:

Create a Postgres database instance in AWS
To create a simple Postgres instance we can use a single command and the template below:
And the template would be this:
So as you can see, Postgres is a bit easier, as it doesn't require a cluster.
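Since the embedded command and template aren't shown here either, a minimal sketch of both follows; the stack name, parameters and instance class are assumptions:

$ aws cloudformation deploy --stack-name postgres-db --template-file postgres.yaml --parameter-overrides DBUser=root DBPassword=<your password>

AWSTemplateFormatVersion: '2010-09-09'
Description: Minimal standalone Postgres instance (sketch)
Parameters:
  DBUser:
    Type: String
  DBPassword:
    Type: String
    NoEcho: true
Resources:
  PostgresInstance:
    Type: 'AWS::RDS::DBInstance'
    Properties:
      Engine: postgres
      EngineVersion: '13.3'
      DBInstanceClass: db.t3.micro    # smallest class; see the troubleshooting note below
      AllocatedStorage: '20'
      MasterUsername: !Ref DBUser
      MasterUserPassword: !Ref DBPassword

Note that no DBCluster resource is needed, which is exactly why Postgres is simpler here.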

What if something goes wrong?
We can always take a look at the Events section of the stack in the CloudFormation console, then Google the error. I know, I know… AWS CloudFormation documentation is not perfect. This is life and we have to comply. Unfortunately.
For example, if something goes wrong and you bump into an error like:
"DS does not support creating a DB instance with the following combination: DBInstanceClass=db.t2.micro, Engine=postgres, EngineVersion=13.3…"
You will have to find the supported engine and version 13.3 for your region. To do so you can use this bash command:
aws rds describe-orderable-db-instance-options --engine Postgres --engine-version 13.3 --query "*[].{DBInstanceClass:DBInstanceClass,StorageType:StorageType}|[?StorageType=='gp2']|[].{DBInstanceClass:DBInstanceClass}" --output texts
Best practices to deploy RDS instances
The templates we used above are very simple. This is intentional so we can deploy them successfully with ease and start adding new features from there.
What else can we do?
Keep passwords in Secrets Manager
This is a really good idea, so we would probably want to change the user and password properties in the template. First, a secret for the master user name:
RootUser:
  Type: 'AWS::SecretsManager::Secret'
  Properties:
    Name: '/rds/EnvType/RootUser-secret'
    Description: 'Aurora database master user'
    SecretString: !Ref DBUser
And then we can use it in our template like so:
MasterUsername: !Join ['', ['{{resolve:secretsmanager:', !Ref RootUser, '}}']]
We can do the same thing with the database password:
RootPassword:
  Type: 'AWS::SecretsManager::Secret'
  Properties:
    Name: '/rds/EnvType/RootPassword-secret'
    Description: 'Aurora database master password'
    GenerateSecretString:
      PasswordLength: 16
      ExcludePunctuation: true
And then refer to it from the template:
RootUserPassword: !Join ['', ['{{resolve:secretsmanager:', !Ref RootPassword, '}}']]
What is RDSDBClusterParameterGroup?
It is a resource of Type: 'AWS::RDS::DBClusterParameterGroup'.
It defines the parameter group that the database cluster will use; by referencing it from the cluster, we associate our database instances and the Aurora database cluster with those parameter settings.
We then reference it in AWS::RDS::DBCluster resource, i.e.:
Resources:
  RDSCluster:
    Properties:
      DBClusterParameterGroupName:
        Ref: RDSDBClusterParameterGroup
This is important because every time you need to update a parameter in the future, you must build a new parameter group and link it to the cluster.
To implement the modification, the primary database instance in the cluster has to be restarted.
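The parameter group resource itself looks like this in outline; the aurora5.6 family matches the instance-level groups shown later in this article, and the time_zone value is an illustrative assumption:

RDSDBClusterParameterGroup:
  Type: 'AWS::RDS::DBClusterParameterGroup'
  Properties:
    Description: Cluster-level parameters for the Aurora database
    Family: aurora5.6
    Parameters:
      time_zone: UTC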
Deletion protection
It's always a good idea to set DeletionProtection: true on our database cluster.
This prevents the accidental deletion of an RDS instance. It also prevents instance replacement when we change template properties that would normally require the resource to be replaced.
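A short sketch of where the property sits; the DeletionPolicy attribute alongside it is CloudFormation's own resource-level safeguard, and pairing the two is a judgement call on my part, not something the original template requires:

RDSCluster:
  Type: 'AWS::RDS::DBCluster'
  DeletionPolicy: Retain          # keep the cluster even if the stack itself is deleted
  Properties:
    DeletionProtection: true      # RDS-level guard against accidental deletion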
DBInstanceClass
This is an important one, as it has a direct impact on the costs related to our database. It goes under Type: 'AWS::RDS::DBInstance' in the template.
Always choose the smallest one for testing purposes. Search for db.r6g.large in your template and change it to db.t2.small or db.t3.micro:
DBInstanceClass: db.t2.small
Also, if we don't need big storage for the database in staging, it is always a good idea to provision only the required minimum:
AllocatedStorage: '20'
DBInstanceClass: db.t2.small
Provision VPC resources in a separate stack template
We probably want to deploy our database cluster into a Virtual Private Cloud (VPC). It's a good idea to create a separate stack for these kinds of resources and then reference them from our database stack like so:
Resources:
  RDSCluster:
    Type: 'AWS::RDS::DBCluster'
    Properties:
      DBClusterParameterGroupName:
        Ref: RDSDBClusterParameterGroup
      # DBSubnetGroupName:
      #   Ref: DBSubnetGroup
      Engine: aurora
      MasterUserPassword:
        Ref: DBPassword
      MasterUsername:
        Ref: DBUser
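One way to wire the two stacks together is CloudFormation exports; the export and resource names below are illustrative assumptions. In the VPC stack:

Outputs:
  DBSubnetGroupName:
    Description: Subnet group for database stacks to import
    Value: !Ref DBSubnetGroup
    Export:
      Name: vpc-stack-DBSubnetGroupName

Then the commented-out property in the cluster above becomes:

DBSubnetGroupName: !ImportValue vpc-stack-DBSubnetGroupName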
DBInstanceIdentifier
The database instance is identified by this name. AWS CloudFormation generates a default physical ID and uses it as the name if you don't provide one.
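If you prefer a predictable name, it is a single property on the instance (the value here is an illustrative assumption); keep in mind that CloudFormation cannot perform updates that require replacing an explicitly named resource unless you also change the name:

DBInstanceIdentifier: staging-aurora-writer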
AWS::RDS::DBParameterGroup
We should create this resource type for the same reason we created the 'AWS::RDS::DBClusterParameterGroup'.
It might be problematic to associate a new database parameter group with a production database later, since doing so requires a database restart.
We might also want to create separate AWS::RDS::DBParameterGroup resources for the writer and replica instances (how the instances reference them is sketched after this snippet), i.e.:
WriterRDSDBParameterGroup:
  Type: 'AWS::RDS::DBParameterGroup'
  Properties:
    Description: Custom db parameter group for writer
    Family: aurora5.6
    Parameters:
      sql_mode: STRICT_ALL_TABLES
      transaction_isolation: READ-COMMITTED
RDSDBParameterGroup:
  Type: 'AWS::RDS::DBParameterGroup'
  Properties:
    Description: CloudFormation Sample Aurora Parameter Group
    Family: aurora5.6
    Parameters:
      sql_mode: IGNORE_SPACE
      max_allowed_packet: 1024
      innodb_buffer_pool_size: '{DBInstanceClassMemory*3/4}'
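Each instance then picks up its own group via DBParameterGroupName; a sketch with illustrative resource names and the small instance class suggested earlier:

WriterInstance:
  Type: 'AWS::RDS::DBInstance'
  Properties:
    Engine: aurora
    DBClusterIdentifier: !Ref RDSCluster
    DBParameterGroupName: !Ref WriterRDSDBParameterGroup
    DBInstanceClass: db.t2.small
ReaderInstance:
  Type: 'AWS::RDS::DBInstance'
  Properties:
    Engine: aurora
    DBClusterIdentifier: !Ref RDSCluster
    DBParameterGroupName: !Ref RDSDBParameterGroup
    DBInstanceClass: db.t2.small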
Database monitoring
It is highly recommended to set up database monitoring during the initial setup. Aurora sends many metrics to CloudWatch by default, both at the cluster and the database level, so we can subscribe to RDS events using the AWS::RDS::EventSubscription resource:
ClusterEventSubscription:
  Type: 'AWS::RDS::EventSubscription'
  Properties:
    EventCategories:
      - configuration change
      - creation
      - failure
      - deletion
      - failover
      - maintenance
      - notification
    SnsTopicArn: arn:aws:sns:us-east-1:09876543:example-topic
    SourceIds:
      - !Ref myAuroraDBCluster
    SourceType: db-cluster
    Enabled: true
After that we can set CloudWatch alarms on these metrics like so:
DBCPUUtilizationHighAlarm:
  Type: 'AWS::CloudWatch::Alarm'
  Properties:
    AlarmDescription: 'CPU utilization over last 5 minutes higher than 95%'
    Namespace: 'AWS/RDS'
    MetricName: CPUUtilization
    Statistic: Average
    Period: 60
    EvaluationPeriods: 5
    ComparisonOperator: GreaterThanThreshold
    Threshold: 95
    TreatMissingData: breaching
    AlarmActions:
      - arn:aws:sns:us-east-1:9876543:example-topic
    OKActions:
      - arn:aws:sns:us-east-1:9876543:example-topic
    Dimensions:
      - Name: DBInstanceIdentifier
        Value: !Ref myDatabaseInstance
EnablePerformanceInsights
It's a helpful tool for understanding database performance.
There are no extra charges if it is enabled with the seven-day data retention only.
EnablePerformanceInsights: true
This feature has to be enabled separately on both primary and replica instances.
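On an AWS::RDS::DBInstance the pair of properties looks like this; PerformanceInsightsRetentionPeriod is the matching CloudFormation property, and seven days is the retention the free tier covers:

EnablePerformanceInsights: true
PerformanceInsightsRetentionPeriod: 7    # days; 7 stays within the free tier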
Conclusion
In this story, you found two simplified CloudFormation templates to deploy Postgres and MySQL database instances in the cloud. I hope you find the best practices covered in this article useful and that they help you grow these templates into something bigger that suits your needs.
RDS databases are crucial for any data platform design and it is important to understand how they are being created. They are used in many data pipeline design patterns, i.e. batch and change data capture:
Applying some of the features mentioned in best practices might be useful to gain database performance insights as well.
Recommended read:
Amazon RDS Performance Insights | Cloud Relational Database | Amazon Web Services