Deploying LLM Apps to AWS, the Open-Source Self-Service Way

LLM applications built on third-party hosted LLMs such as OpenAI's do not require MLOps overhead. Such containerized LLM-powered apps or microservices can be deployed with standard DevOps practices. In this article, let's explore how to deploy an LLM app to a cloud provider such as AWS, fully automated with infrastructure and application pipelines. LlamaIndex offers RAGs, a ready-made RAG chatbot for the community; we will use RAGs as the sample app to deploy.
IaC Self-Service
IaC, short for Infrastructure as Code, automates infrastructure provisioning, ensuring that configurations are consistent and repeatable. There are many tools to accomplish IaC. We will focus on HashiCorp's Terraform in this article, mainly because Terraform is cloud-agnostic.
The primary purpose of IaC self-service is to empower developers with more access, control, and ownership over their pipelines to boost productivity.
For those interested, I wrote a five-part series about a year ago detailing all aspects of the DevOps self-service model.
High-Level Deployment Diagram
There are many different options for deploying a containerized application to AWS. ECS Fargate stands out for a few good reasons:
- Serverless computing for containers, no server management required
- Increased elasticity and scalability
- Simplified deployments
We first flesh out the high-level deployment diagram for our RAGs app.

To deploy RAGs into AWS, we need pipelines.
Overview of Pipelines
Let's first explore the self-service pipeline architecture based on a 3–2–1 rule (a term coined by me):
- 3 types of source code: terraform code, app source code, and GitHub Actions workflow code.
- 2 types of pipelines: infrastructure pipeline and application pipeline.
- 1 pipeline integration glue: GitHub secrets creation automation.

Let's break it down.
Infrastructure Pipeline
We use Terraform core features such as `terraform init`, `terraform plan`, and `terraform apply` in our infrastructure pipeline; see the diagram below. Despite the license change for Terraform in August 2023, Terraform's core features remain open source.
We add a few steps before `terraform init`:
- Harden Runner for workflow security
- Infracost for cloud cost management
- TFLint for linting
- Checkov for static IaC code analysis
For more details on these tools, check out my article on pipeline security and guardrails.
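To make this concrete, here is a minimal sketch of how these guardrail steps might sit in front of the Terraform commands in a GitHub Actions job. The action versions and inputs below are illustrative assumptions, not prescriptive choices:

```yaml
jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: step-security/harden-runner@v2      # workflow security
        with:
          egress-policy: audit
      - uses: actions/checkout@v4
      - uses: infracost/actions/setup@v3          # cloud cost management
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}
      - run: infracost breakdown --path=.
      - uses: terraform-linters/setup-tflint@v4   # linting
      - run: tflint --recursive
      - uses: bridgecrewio/checkov-action@v12     # static IaC analysis
        with:
          directory: .
      # (AWS OIDC authentication omitted for brevity; see the prerequisites below)
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform plan -out=tfplan
      - run: terraform apply -auto-approve tfplan
```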

Do we need to write our IaC code in Terraform from scratch? No. We turn to the well-known open-source Terraform reusable modules, `terraform-aws-modules`.
terraform-aws-modules
[terraform-aws-modules](https://github.com/terraform-aws-modules) is a diverse collection of pre-built, reusable, open-source Terraform modules designed explicitly for managing resources on AWS. Led by Anton Babenko, `terraform-aws-modules` has 57 modules so far! These modules aim to simplify and automate infrastructure provisioning on AWS, standardize best practices, and let you focus on writing less infrastructure code and achieving faster deployments.
For GCP, there is terraform-google-modules, and for Azure, there is Azure-Verified-Modules.
You can write your own reusable module if you so choose, but these open-source reusable modules are community-supported and tested. They are great to jump-start your infrastructure pipeline development.
For our RAGs app, assuming we will deploy it into a new AWS account, we choose the following modules from `terraform-aws-modules` at the bare minimum. I say "bare minimum" because you can add many other resources to this stack depending on your project needs, such as authentication/authorization. For this POC demo app, we stick with the minimum requirements, as the article aims to demonstrate the self-service model and showcase the open-source IaC reusable modules. Mastering both ingredients lets you pick and choose reusable modules to provision additional resources based on your project requirements. A minimal example of calling one of these modules follows the list below.

- `terraform-aws-vpc`: the networking module provisioning a new VPC, public/private subnets, internet gateway, NAT gateway, route tables, etc.
- `terraform-aws-s3-bucket`: the S3 bucket for our ALB logs.
- `terraform-aws-alb`: the application load balancer (ALB) for our ECS cluster.
- `terraform-aws-ecs`: the Elastic Container Service (ECS) Fargate service to which we will deploy RAGs.
- `terraform-aws-ecr`: the Elastic Container Registry (ECR) housing the Docker image for our app.
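As a taste of what calling one of these modules looks like, here is a minimal sketch of the `terraform-aws-vpc` call. The name, CIDR ranges, availability zones, and version pin are illustrative assumptions:

```hcl
# main.tf (excerpt) -- provision the network for the RAGs app
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"                       # illustrative version pin

  name = "rags-dev"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b"]
  public_subnets  = ["10.0.1.0/24", "10.0.2.0/24"]
  private_subnets = ["10.0.101.0/24", "10.0.102.0/24"]

  enable_nat_gateway = true                # private subnets reach the internet via NAT
}
```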
Implementation Prerequisites
- Configure OpenID Connect (OIDC) in AWS: We will use GitHub Actions workflows to kick off Terraform modules for infrastructure provisioning. OIDC allows our GitHub Actions workflows to access AWS without storing the AWS credentials on the GitHub side. GitHub has detailed instructions on how to configure OIDC in AWS. Keep in mind this step only needs to be done once per AWS account.
- Terraform remote state management: The state of infrastructure is a crucial part of Terraform's operations, as it maps real-world resources to our configuration, tracks metadata, and improves performance for large infrastructures. Terraform remote state lets users store the state of their infrastructure in a remote data store for centralization, security, consistency, and other benefits. Again, this step only needs to be done once per AWS account. I have developed a Terraform reusable module to handle remote state management through an S3 bucket, with DynamoDB for state locking. The source code is located in my GitHub repo. To kick it off, you can use a GitHub Actions workflow like my sample workflow; a minimal backend configuration is sketched after this list. For those unfamiliar with GitHub Actions, see more in the "Application Pipeline" section below.
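For reference, a minimal remote-state backend configuration might look like the sketch below, assuming the state bucket and DynamoDB lock table already exist (the names are placeholders):

```hcl
# backend.tf -- remote state in S3 with DynamoDB locking
terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"   # placeholder bucket name
    key            = "rags/dev/terraform.tfstate"  # per-app, per-environment state path
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"             # placeholder lock table
    encrypt        = true
  }
}
```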
Step 1: Create GitHub environments
GitHub environments play an essential role in our pipelines. GitHub can store secrets/variables at three levels: environment, repository, and organization. These secrets/variables can be passed through the pipelines during infrastructure provisioning or application CI/CD to aid pipeline operations.
For our RAGs app, let's create a GitHub environment named `dev` and two environment variables: `ROLE_TO_ASSUME` for the application pipeline and `TERRAFORM_ROLE_TO_ASSUME` for the infrastructure pipeline, with their values pointing to the respective IAM roles' ARNs, assuming you already created the IAM roles by following the instructions in the prerequisites section above. We use two different roles so they can be assigned different permissions. Note that you need admin rights to see the "Settings" tab in your repo.
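If you prefer the GitHub CLI to the web UI, a sketch like the following achieves the same result. The role ARNs are placeholders; run the commands from inside the repo after creating the `dev` environment:

```bash
# Requires a recent gh CLI (>= 2.31) and admin rights on the repo
gh variable set ROLE_TO_ASSUME --env dev \
  --body "arn:aws:iam::123456789012:role/rags-app-deploy"
gh variable set TERRAFORM_ROLE_TO_ASSUME --env dev \
  --body "arn:aws:iam::123456789012:role/rags-terraform-provisioning"
```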

Under the same "Settings" tab, we create a few secrets at the repository level, which means they can be applied to different environments for the same app.

- `NPM_TOKEN`: You need this token to call the Terraform reusable module(s), as the application doesn't pass such credentials when calling them. A token with the `repo` scope is required for the calling app to connect to the repo where the Terraform reusable modules reside. This is especially important if your repo is private.
- `PIPELINE_TOKEN`: You need this token for Terraform to call the GitHub provider to auto-create GitHub secrets/variables such as `ECS_CLUSTER`, `ECS_SERVICE`, etc., based on the resources Terraform provisioned (see the sketch after this list). This automation integrates the infrastructure pipeline with the application pipeline, making the transition from infrastructure provisioning to application CI/CD seamless. The token needs the `repo` and `read:public_key` scopes.
- `OPENAI_KEY`: This is where you store your OpenAI API key. Stored as a secret here, it doesn't leak into your source code. We will explore in the "Application Pipeline" section how to retrieve this secret and pass it into the CI pipeline.
- `INFRACOST_API_KEY`: The API key for Infracost, an infrastructure cost management tool that helps automate cloud cost management.
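To illustrate how `PIPELINE_TOKEN` gets used, here is a hedged sketch of Terraform pushing a provisioned value back to the repo via the integrations/github provider. The repository name and wiring are illustrative assumptions:

```hcl
# github.tf (excerpt) -- auto-create GitHub variables from provisioned resources
provider "github" {
  token = var.pipeline_token          # fed from the PIPELINE_TOKEN secret
}

# Expose the ECS cluster name to the application pipeline
resource "github_actions_variable" "ecs_cluster" {
  repository    = "rags"              # illustrative repo name
  variable_name = "ECS_CLUSTER"
  value         = module.ecs.cluster_name
}
```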
Step 2: Add infrastructure pipeline code
Finally, let's add our infrastructure pipeline code to our repo. See the files/folder below related to the infrastructure pipeline. The sample code can be found in my repo. For a detailed dive into why and how our Terraform code is structured this way, refer to my article on Terraform project structure.

The `main.tf` file is the main wrapper for the Terraform reusable modules. Depending on your stack, you could call one or multiple reusable modules in this file. For our RAGs app, we call five reusable modules to provision our infrastructure, as mentioned above in the terraform-aws-modules section.
For each reusable module in `terraform-aws-modules`, refer to that module's example code for usage patterns. Depending on your use case, you can pick either the simple or the complete example and use it as the base for that reusable module in your `main.tf`.
You then parameterize the example code, externalizing certain variables to the `terraform.tfvars` file under the `.env` folder for the specific environment. For example, the CPU/memory values for your prod environment will most likely differ from those for your dev environment, so CPU/memory are good candidates for parameterization.
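For instance, a dev-flavored `terraform.tfvars` might look like the sketch below; the variable names and values are illustrative assumptions, not the repo's actual contents:

```hcl
# .env/dev/terraform.tfvars
cluster_name = "rags-dev"
cpu          = 512    # modest sizing for dev
memory       = 1024   # prod would typically use larger values
```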
Let's look at a few key points in the sample Terraform code for ECS provisioning below.
- Line 174 is where we call the reusable module `terraform-aws-ecs`.
- Line 177 onwards is where we pass variables such as `cluster_name` to the reusable module (a minimal sketch of this module call follows).
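For reference, a minimal version of that module call might look like the following sketch; the version pin and Fargate weighting are illustrative assumptions:

```hcl
# main.tf (excerpt) -- the ECS cluster hosting RAGs
module "ecs" {
  source  = "terraform-aws-modules/ecs/aws"
  version = "~> 5.0"                  # illustrative version pin

  cluster_name = var.cluster_name     # externalized to terraform.tfvars

  # Run everything on Fargate -- no EC2 instances to manage
  fargate_capacity_providers = {
    FARGATE = {
      default_capacity_provider_strategy = {
        weight = 100
      }
    }
  }
}
```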

Depending on your use case, if you have many applications sharing similar AWS stacks, you could move most of the logic in `main.tf` into a centralized reusable-modules repo, adding another abstraction layer on top of the original `terraform-aws-modules`. This approach allows further reuse of your IaC code, leaving the caller repos with minimal IaC code for parameterization. My article on the Terraform project structure details the implementation of such a central repo holding reusable modules within an organization. Feel free to check it out.
Step 3: Add GitHub Actions workflow for infrastructure pipeline
I have created a reusable GitHub Actions workflow for Terraform provisioning, capturing steps such as workflow security, cloud cost management, IaC linting and scanning, and eventually `terraform init`, `plan`, and `apply`. It serves as a sample workflow, and you are welcome to revise it according to your needs.
From our RAGs repo, we add `terraform-aws.yml` under the `.github/workflows` directory. The key logic in this workflow is highlighted in red below:

- `permissions`: it's important to specify `id-token: write`, as the GitHub Actions workflow needs it to authenticate to AWS using OIDC.
- `uses`: this line calls the reusable workflow, saving us from duplicating the same logic from one workflow to another and from one repo to another.
- `secrets: inherit`: this line carries the secrets/variables configured at the environment/repository/organization level into the reusable workflow living in a different repo (see the caller-workflow sketch below).
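Putting those pieces together, a sketch of the caller workflow might look like this; the reusable workflow's repo path is an illustrative assumption:

```yaml
# .github/workflows/terraform-aws.yml
name: Terraform AWS provisioning
on:
  workflow_dispatch:        # trigger manually from the Actions tab

permissions:
  id-token: write           # required for OIDC authentication to AWS
  contents: read

jobs:
  provision:
    # Illustrative path to the reusable workflow in another repo
    uses: my-org/reusable-workflows/.github/workflows/terraform.yml@main
    secrets: inherit        # pass env/repo/org secrets and variables through
```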
Step 4: Kick off the infrastructure pipeline
Now all our ducks are in a row. Let's kick off the infrastructure pipeline by triggering the "Terraform AWS provisioning" workflow in the RAGs repo.

This workflow provisions the AWS resources in our dev environment. Once it completes, pay attention to the output from the `Terraform Apply` step; see the screenshot below. We will use the `alb_dns` value to launch our RAGs app later, so note it down.

Let's log into AWS and peek at the VPC resource map; see the screenshot below. The networking (VPC, subnets, route tables, etc.) has been successfully provisioned. Also verify that our new ECS cluster, ECR, and ALB are ready.

Application Pipeline (CI/CD)
Now that our infrastructure in AWS is ready for our app, let's move on to building and deploying the app to the brand-new ECS cluster.
The diagram below lists the main steps in a CI (Continuous Integration) pipeline.

And our sample CD (Continuous Deployment) pipeline looks like this:

Step 1: Containerize the app if it's not yet containerized
We first need to add a `Dockerfile` to our RAGs repo so we can build the code into a Docker image and push it to the newly provisioned ECR in AWS. See the sample `Dockerfile` snippet below for our RAGs app.
```dockerfile
FROM python:3.9-slim
WORKDIR /app
# Install dependencies first to take advantage of Docker layer caching
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
# RAGs is a Streamlit app; its home page script is the entry point
CMD ["streamlit", "run", "1_🏠_Home.py"]
```