Create MySQL and Postgres instances using AWS CloudFormation


With these two simple CloudFormation templates, we will learn how to deploy Postgres and MySQL Aurora database instances with just one CLI command. I have optimised the template files, reducing the number of properties to a minimum so they are easier to comprehend.

I know that Infrastructure as Code might seem a little hard for intermediate users who are not familiar with the concept, but believe me, it is the right way to go. Career-wise, it will bring a lot of benefits.

Who is this tutorial for?

This story is for beginner and intermediate-level data and software engineers who would like to get into data engineering and learn a few advanced things, such as infrastructure as code.

I hope this tutorial will be useful for everyone involved in software engineering where a cloud database is required to store your application data.

Prerequisites

  • Basic knowledge of Bash/Shell scripting. It might seem a bit advanced for beginners, but it is necessary to deploy AWS services such as RDS.

  • AWS command line tool installed (AWS CLI). All shell commands provided in this article can be run either from the command line or as a .sh script, i.e. simply run a file from your command line like so: $ ./deploy.sh (a minimal sketch of such a script follows this list).

  • AWS account. Registration is free and there is a free tier, so you shouldn't incur any significant charges unless you deploy a really big database and forget to delete it.
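A minimal sketch of such a deploy.sh, assuming an illustrative stack name, template file and parameter names (none of these values come from the original article), could look like this:

#!/bin/bash
# deploy.sh - deploy a CloudFormation stack from the command line.
# Stack, file and parameter names below are placeholders.
aws cloudformation deploy \
    --template-file rds_stack.yaml \
    --stack-name my-rds-stack \
    --parameter-overrides DBUser=root DBPassword=SomePassword123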

Why Infrastructure as Code (IaC)?

IaC is gaining popularity very quickly as it helps to manage cloud platform resources consistently and reproducibly using template files (JSON or YAML code). We can apply CI/CD pipelines and use GitHub integration to create production and staging environments, for example. Infrastructure as code is the deployment and resource management standard these days, and I previously wrote about it here:

Infrastructure as Code for Beginners

That story explained how to deploy and provision AWS resources for our data lake, i.e. AWS S3 and AWS Lambda.

Data-wise, Infrastructure as code is a brilliant way to separate data environments and provision Data Pipeline resources. Using it we can create any data pipeline we need.

Why and when do we use relational databases in data pipelines?

Typically, relational databases consist of row-based tables with columns that connect related data pieces through a normalised schema.

This type of data architecture aims to capture data quickly and to optimise the system so that the most recent data required by the application can be retrieved swiftly.

Because relational database tables and joins are "normalised", they are more complex than a typical data warehouse design. However, this is a trade-off imposed by functional requirements.

Hence, the main difference between a "conventional database" and a "data warehouse" is that the former is created and optimised to "record" data, whilst the latter is created and built to "react to analytics". I previously wrote about it here:

Data Platform Architecture Types

Popular relational databases are MySQL, Microsoft SQL Server, Oracle and PostgreSQL.

Create MySQL Aurora instance

To create a database instance with the Aurora engine, we need a cluster first. The thing is that Aurora instances have to be associated with an AWS::RDS::DBCluster via DBClusterIdentifier. If there is no cluster in our stack file, we will get generic CloudFormation errors that don't make much sense.

You can use this CloudFormation template file:
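The cluster and instance would be declared together, roughly like this minimal sketch (the resource names, engine flavour and instance class are illustrative assumptions, not the article's exact file):

AWSTemplateFormatVersion: '2010-09-09'
Description: Minimal Aurora (MySQL-compatible) cluster and instance - a sketch
Parameters:
  DBUser:
    Type: String
    NoEcho: true
  DBPassword:
    Type: String
    NoEcho: true
Resources:
  # The cluster the Aurora instance must belong to
  RDSCluster:
    Type: 'AWS::RDS::DBCluster'
    Properties:
      Engine: aurora
      MasterUsername: !Ref DBUser
      MasterUserPassword: !Ref DBPassword
  # The instance, linked to the cluster via DBClusterIdentifier
  RDSDBInstance:
    Type: 'AWS::RDS::DBInstance'
    Properties:
      Engine: aurora
      DBClusterIdentifier: !Ref RDSCluster
      DBInstanceClass: db.t2.small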

If everything is correct then in AWS Console you will see something like this:

CloudFormation events. Image by author

Create a Postgres database instance in AWS

To create a simple Postgres instance we can use the command and the template below:
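A sketch of that command, with illustrative stack and file names:

aws cloudformation deploy \
    --template-file postgres_stack.yaml \
    --stack-name my-postgres-stack \
    --parameter-overrides DBUser=root DBPassword=SomePassword123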

And the template would be this:
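Here is a minimal sketch (the resource name is illustrative, and the instance class and engine version must be a combination supported in your region; see the troubleshooting section below):

AWSTemplateFormatVersion: '2010-09-09'
Description: Minimal Postgres instance - a sketch
Parameters:
  DBUser:
    Type: String
    NoEcho: true
  DBPassword:
    Type: String
    NoEcho: true
Resources:
  # A standalone instance: no cluster resource is needed for Postgres
  MyPostgresDB:
    Type: 'AWS::RDS::DBInstance'
    Properties:
      Engine: postgres
      EngineVersion: '13.3'
      DBInstanceClass: db.t3.micro
      AllocatedStorage: '20'
      MasterUsername: !Ref DBUser
      MasterUserPassword: !Ref DBPassword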

So as you can see, Postgres is a bit easier, as it doesn't require a cluster.

Postgres instance created. Image by author.

What if something goes wrong?

We can always take a look at the events section. Then Google it. I know, I know… AWS CloudFormation documentation is not perfect. This is life and we have to comply. Unfortunately.

For example, if something goes wrong and you bump into an error like:

"DS does not support creating a DB instance with the following combination: DBInstanceClass=db.t2.micro, Engine=postgres, EngineVersion=13.3…"

You will have to find an instance class that supports engine version 13.3 in your region. To do so you can use this Bash command:

aws rds describe-orderable-db-instance-options --engine postgres --engine-version 13.3 \
    --query "*[].{DBInstanceClass:DBInstanceClass,StorageType:StorageType}|[?StorageType=='gp2']|[].{DBInstanceClass:DBInstanceClass}" \
    --output text

Best practices to deploy RDS instances

The templates we used above are very simple. This is intentional so we can deploy them successfully with ease and start adding new features from there.

What else can we do?

Keep passwords in Secrets Manager

This is a really good idea, so we would probably want to store the database credentials as secrets instead of plain template values. First, the master username:

  RootUser:
    Type: 'AWS::SecretsManager::Secret'
    Properties:
      Name: '/rds/EnvType/RootUser-secret'
      Description: 'Aurora database master user'
      SecretString: !Ref DBUser

And then we can use it in our template like so:

MasterUsername: !Join ['', ['{{resolve:secretsmanager:', !Ref RootUser, '}}']]

We can do the same thing with the database password:

  RootPassword:
    Type: 'AWS::SecretsManager::Secret'
    Properties:
      Name: '/rds/EnvType/RootPassword-secret'
      Description: 'Aurora database master password'
      GenerateSecretString:
        PasswordLength: 16
        ExcludePunctuation: true

And then refer to it from the template:

MasterUserPassword: !Join ['', ['{{resolve:secretsmanager:', !Ref RootPassword, '}}']]

What is RDSDBClusterParameterGroup?

It has the type 'AWS::RDS::DBClusterParameterGroup'.

In this resource, we specify engine parameters for the database cluster that will be connected to it. By referencing the parameter group from the cluster, we associate it with our Aurora database cluster and its instances.
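A minimal definition of that resource could look like this sketch (the family must match your engine version, and the parameter values are illustrative):

  RDSDBClusterParameterGroup:
    Type: 'AWS::RDS::DBClusterParameterGroup'
    Properties:
      Description: Custom cluster parameter group for Aurora
      Family: aurora5.6
      Parameters:
        # Cluster-level parameter; value shown only as an example
        time_zone: UTC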

We then reference it in AWS::RDS::DBCluster resource, i.e.:

Resources:
  RDSCluster: 
    Properties: 
      DBClusterParameterGroupName: 
        Ref: RDSDBClusterParameterGroup

This is important because every time you need to update a parameter in the future, you must build a new parameter group and link it to the cluster.

To apply the modification, the primary database instance in the cluster has to be restarted.

Deletion protection

It's always a good idea to set DeletionProtection: true on our database cluster.

This prevents accidental deletion of an RDS instance. It also prevents instance replacement if we change template properties that would normally require the resource to be replaced.
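In the template it is a single property on the cluster (or instance) resource, e.g.:

  RDSCluster:
    Type: 'AWS::RDS::DBCluster'
    Properties:
      DeletionProtection: true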

DBInstanceClass

This is an important one, as it has a direct impact on the costs related to our database. It goes under Type: 'AWS::RDS::DBInstance' in the template.

Always choose the smallest one for testing purposes. Search for db.r6g.large in your template and change it to db.t2.small or db.t3.micro:

DBInstanceClass: db.t2.small

Also, if we don't need big storage for the database in staging, it is always a good idea to provision only the required minimum:

AllocatedStorage: '20'
DBInstanceClass: db.t2.small  

Provision VPC resources in a separate stack template

We would probably want to deploy our database cluster into a Virtual Private Cloud (VPC). It's a good idea to create a separate stack for these types of resources and then use it in our database stack like so:

Resources:
  RDSCluster: 
    Properties: 
      DBClusterParameterGroupName: 
        Ref: RDSDBClusterParameterGroup
      # DBSubnetGroupName: 
      #   Ref: DBSubnetGroup
      Engine: aurora
      MasterUserPassword: 
        Ref: DBPassword
      MasterUsername: 
        Ref: DBUser
    Type: "AWS::RDS::DBCluster"

DBInstanceIdentifier

The database instance is identified by this name. If you don't provide one, AWS CloudFormation will generate a default physical ID and use it as the name.

AWS::RDS::DBParameterGroup

We should create this resource type for the same reason we created the 'AWS::RDS::DBClusterParameterGroup'.

It might be problematic for a production database to associate a new database parameter group later since doing so requires a database restart.

We also might want to create different AWS::RDS::DBParameterGroup resources for writer and replica instances, e.g.:

  WriterRDSDBParameterGroup:
    Type: 'AWS::RDS::DBParameterGroup'
    Properties:
      Description: Custom db parameter group for writer
      Family: aurora5.6
      Parameters:
        sql_mode: STRICT_ALL_TABLES
        transaction_isolation: READ-COMMITTED
  RDSDBParameterGroup:
    Type: 'AWS::RDS::DBParameterGroup'
    Properties:
      Description: CloudFormation Sample Aurora Parameter Group
      Family: aurora5.6
      Parameters:
        sql_mode: IGNORE_SPACE
        max_allowed_packet: 1024
        innodb_buffer_pool_size: '{DBInstanceClassMemory*3/4}'

Database monitoring

It is highly recommended to set up database monitoring during the initial setup. Aurora sends many metrics, both at the cluster and the database level, to CloudWatch by default. We can also subscribe to RDS events by using the AWS::RDS::EventSubscription resource:

  ClusterEventSubscription:
    Type: 'AWS::RDS::EventSubscription'
    Properties:
      EventCategories:
        - configuration change
        - creation
        - failure
        - deletion
        - failover
        - maintenance
        - notification
      SnsTopicArn: arn:aws:sns:us-east-1:09876543:example-topic 
      SourceIds:
        - !Ref myAuroraDBCluster
      SourceType: db-cluster
      Enabled: true

After that, we can create CloudWatch metric filters and set CloudWatch alarms like so:

  DBCPUUtilizationHighAlarm:
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmDescription: 'CPU utilization over last 5 minutes higher than 95%'
      Namespace: 'AWS/RDS'
      MetricName: CPUUtilization
      Statistic: Average  
      Period: 60
      EvaluationPeriods: 5
      ComparisonOperator: GreaterThanThreshold
      Threshold: 95
      TreatMissingData: breaching
      AlarmActions: 
       - arn:aws:sns:us-east-1:9876543:example-topic
      OKActions:
       - arn:aws:sns:us-east-1:9876543:example-topic
      Dimensions:
      - Name: DBInstanceIdentifier
        Value: !Ref myDatabaseInstance

EnablePerformanceInsights

It's a helpful tool for gaining an understanding of database performance.

There are no extra charges if it is enabled with data retention of 7 days only.

EnablePerformanceInsights: true

This feature has to be enabled separately on both primary and replica instances.
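On each AWS::RDS::DBInstance that could look like this sketch (the instance class is illustrative; Performance Insights is not available on the smallest classes, and 7 days is the free retention tier):

  WriterInstance:
    Type: 'AWS::RDS::DBInstance'
    Properties:
      Engine: aurora
      DBClusterIdentifier: !Ref RDSCluster
      DBInstanceClass: db.r5.large
      EnablePerformanceInsights: true
      PerformanceInsightsRetentionPeriod: 7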

Conclusion

In this story, you can find two simplified CloudFormation templates to deploy Postgres and MySQL database instances in the cloud. I hope you will find the best practices covered in this article useful and that they help you develop these templates into something bigger that suits your needs.

RDS databases are crucial for any data platform design, and it is important to understand how they are created. They are used in many data pipeline design patterns, e.g. batch and change data capture:

Data pipeline design patterns

Applying some of the features mentioned in the best practices section might also help you gain insights into database performance.

Recommended read:

Data pipeline design patterns

AWS::SecretsManager::Secret

Working with parameter groups

AWS::RDS::EventSubscription

Amazon RDS Performance Insights | Cloud Relational Database | Amazon Web Services

Tags: AWS Data Engineering Data Pipeline MySQL Postgres
