Running a Stable Diffusion cluster on GCP with tensorflow-serving (Part 1)

Author:Murphy | View: 30012 | Time: 2025-03-23 19:31:10

In the first part of this two-part tutorial, we will learn to create a Kubernetes cluster that deploys a Stable Diffusion model on GCP. Stable Diffusion (a form of generative AI) is the new cool kid on the block. It allows us to generate realistic images from a given text prompt. Because the model is both new and computationally heavy, deploying it raises some unique challenges that are well worth learning to address.

Note: You can follow this tutorial end-to-end even if you're a free-tier user (as long as you have some free-tier credit left).

Github: https://github.com/thushv89/tf-serving-gke/tree/master/infrastrcture

But to create the perfect storm (or the perfect product), having access to the latest version of the model weights won't cut it. It takes effort to create a reliable production system around your model to support the user requests and serve them reliably, with a reasonable latency.

A few examples of images I obtained from the deployed model. Can you guess the prompts? (Image by author)

To do that, we will learn how to run a Stable Diffusion model on a GKE cluster. This two-part tutorial consists of four steps:

Setting up the accounts & roles (Part 1)
Setting up the cluster (Part 1)
Deploying a prediction service in the provisioned cluster (Part 2)
Generating new images with the deployed endpoint (Part 2)

Before you get started, make sure you have created a GCP project and logged in to your user account via gcloud auth login. You can use gcloud config set project PROJECT_ID and gcloud config set compute/region REGION to make sure you are in the correct project and region.

Note: Most of the IAM (Identity & Access Management) I'm talking through here is based on my (limited) personal experience on the subject. If you see anything that could be improved let me know!

terraform: Manage infrastructure in style

If you're already familiar with terraform, go ahead and jump to the section "Defining the accounts & roles (IAM)".

Overview

For all of the infrastructure setup on GCP, we'll be using Terraform, an IaC (Infrastructure as Code) tool that allows us to codify all of our infrastructure requirements. Why manage cloud resources through code, rather than through error-prone, laborious manual operations, you may ask? There are many reasons:

Code (written in a human readable fashion) makes it easier to understand the architecture, improves reusability, etc.
terraform automatically manages dependencies and performs operations in the correct order
Version controlled code provides you the ability to get a snapshot of the state of your system at a given point in time (for troubleshooting)

terraform provides a comprehensive out-of-the-box API to build infrastructure quickly for all of the common providers such as GCP, AWS, Azure, etc.

terraform concepts

terraform organizes code into configurations. A terraform configuration operates on a working directory, which holds the configuration files ending with the extension .tf or .tf.json:

variables.tf – Contains all the variable definitions that are used by the configuration
outputs.tf – Any outputs that need to be written out
Apart from these, you can include any number of .tf files containing resource definitions, providers, etc. In our simple scenario, we only need a single file, which we'll call main.tf.

Next, let's look at how terraform enables compartmentalizing of code.

terraform is a declarative language, meaning you tell terraform what to do (like SQL), not how to do it (like Python). It's up to terraform to build a plan (e.g. in the form of a graph) and execute it.

We can then compose our terraform configuration using modules. Modularizing is optional; however, it breaks complex infrastructure into logical components/sub-systems and greatly enhances reusability. In our case, we'll be defining three modules:

Manages accounts and roles (modules/iam)
Manages the GKE cluster (modules/gke_cluster)
Manages storage – setting up the GCS bucket (modules/storage)
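As a rough sketch of how a root configuration might wire these three modules together (the module paths come from the list above; the input names such as node_service_account, and the boolean type of include_module_storage, are assumptions made for illustration and may differ from the repository):

```hcl
# Root main.tf (sketch). Input names are hypothetical.
variable "gcp_user" {
  type        = string
  description = "Your username for GCP"
}

variable "include_module_storage" {
  type        = bool
  default     = false
  description = "Assumed toggle for the optional storage module"
}

# Accounts and roles
module "iam" {
  source   = "./modules/iam"
  gcp_user = var.gcp_user
}

# GKE cluster; consumes an output of the iam module, so terraform
# infers that iam must be created first.
module "gke_cluster" {
  source               = "./modules/gke_cluster"
  node_service_account = module.iam.service_account_gke_node
}

# Optional GCS bucket (count on modules needs Terraform 0.13 or later)
module "storage" {
  source = "./modules/storage"
  count  = var.include_module_storage ? 1 : 0
}
```

Passing one module's output into another is what lets terraform work out the order of operations without being told explicitly.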

When you go into these modules in the code, you'll see the following basic building blocks used in harmony to reach the desired state of infrastructure we need (see the Appendix for specific examples).

Resource blocks – Describes infrastructure objects (e.g. VMs, a cluster, VPCs)
Data source / data block – represents a source of data (e.g. a file) and the data associated with it
Provider plugin – Provides access to resource types and data sources associated with a certain provider.
Input & output variables of the modules

Once you define your configuration, you can run terraform plan to see what terraform will be executing. Next, terraform apply can be used to apply those changes. Once applied, terraform will record the changes made in a terraform.tfstate file. So if you want to make a change (or destroy resources), terraform is aware of the current state of your infrastructure and can create a plan for the required changes.

If you need further reinforcing of terraform concepts, you can read the documentation here or go through this GCP tutorial. Now that we understand the basics of terraform, let's move on to understanding the logic.

Defining the accounts & roles (IAM)

For our operations relating to setting up the GKE cluster, we'll be creating a service account. As the name suggests, a service account is typically used by applications and workloads, and not an actual person. For example, GKE nodes can run an application under the identity of a service account. A service account can be assigned permissions and roles (i.e. a collection of permissions grouped in a meaningful way) just like a user account. A few advantages of service accounts are:

We can quickly bind a user to a service account (or remove the binding), allowing us to provide the necessary permissions to a user without repetitively assigning roles/permissions to individual users.
Service accounts can be set up with short-lived credentials, making them more secure.

We'll be setting up two service accounts with the following IDs:

gke-admin – Has the required permission to create a GKE cluster and provision nodes
gke-node – Has the necessary permission to execute a workload successfully (e.g. read from a GCS bucket)

Though a service account is not directly used by or attached to a person, one can impersonate a service account, allowing the user to execute commands as if they were the service account. This is the method we'll be using to set up the cluster.

High-level view of what the identities & resources look like (Image by author)

Here's the process we'll be outlining in our terraform code:

Create service accounts: gke-admin and gke-node
Assign the required roles for the created accounts
- gke-admin: container.admin (e.g. creating the cluster), compute.viewer (e.g. creating the node pool), iam.serviceAccountUser
- gke-node: container.nodeServiceAccount (permissions for a typical Kubernetes workload)
- You can have a look in GCP console → IAM → Roles to learn what permissions are provided by each role.
Assign required roles to the user account to create a short-lived access token (iam.serviceAccountTokenCreator)
Create a binding from the user account to the service account so that the user can impersonate the service account
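The steps above could be sketched in terraform roughly as follows. This is an illustration under assumptions, not the repository's exact code: the gcp_project_id variable, the resource names admin_roles and node_roles, and the display name of the node account are hypothetical, and the last two steps are shown here as a single binding on the service account.

```hcl
variable "gcp_project_id" {
  type        = string
  description = "Target GCP project (hypothetical variable)"
}

variable "gcp_user" {
  type        = string
  description = "Your username for GCP"
}

# Step 1: create the two service accounts
resource "google_service_account" "sa_gke_admin" {
  account_id   = "gke-admin"
  display_name = "GKE Service Account (Admin)"
}

resource "google_service_account" "sa_gke_node" {
  account_id   = "gke-node"
  display_name = "GKE Service Account (Node)"
}

# Step 2: grant project-level roles to each account
resource "google_project_iam_member" "admin_roles" {
  for_each = toset([
    "roles/container.admin",
    "roles/compute.viewer",
    "roles/iam.serviceAccountUser",
  ])
  project = var.gcp_project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.sa_gke_admin.email}"
}

resource "google_project_iam_member" "node_roles" {
  project = var.gcp_project_id
  role    = "roles/container.nodeServiceAccount"
  member  = "serviceAccount:${google_service_account.sa_gke_node.email}"
}

# Steps 3 and 4: let the user mint short-lived tokens for gke-admin,
# which is what makes impersonation possible.
resource "google_service_account_iam_binding" "admin_account_iam" {
  service_account_id = google_service_account.sa_gke_admin.name
  role               = "roles/iam.serviceAccountTokenCreator"
  members            = ["user:${var.gcp_user}"]
}
```

Note that the token-creator role is bound on the service account itself rather than on the project, so the user can only impersonate gke-admin and not every service account in the project.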

Finally, we will declare the names of the two service accounts we created as outputs (in outputs.tf), so they can be referenced by the configuration and other child modules.

To provision the infrastructure, we'll be using two forms of authentication:

The typical authentication you obtain by running gcloud auth login, which will be used to create the service accounts and the bindings.
After that, we'll impersonate the service account to set up the cluster.
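A common way to express these two forms of authentication in terraform is a pair of provider blocks: a helper provider that uses your own credentials, and a default provider that uses a short-lived token for the service account. The sketch below assumes a module named iam that exposes a service_account_gke_admin output; the variables gcp_project_id and gcp_region and the 300-second lifetime are illustrative assumptions.

```hcl
variable "gcp_project_id" {
  type        = string
  description = "Target GCP project (hypothetical variable)"
}

variable "gcp_region" {
  type        = string
  description = "Default region (hypothetical variable)"
}

# Helper provider: authenticates as you, the logged-in user.
provider "google" {
  alias = "impersonation_helper"
}

# Ask for a short-lived access token on behalf of gke-admin.
# module.iam is assumed to be declared elsewhere in the configuration.
data "google_service_account_access_token" "default" {
  provider               = google.impersonation_helper
  target_service_account = module.iam.service_account_gke_admin
  scopes                 = ["userinfo-email", "cloud-platform"]
  lifetime               = "300s"
  depends_on             = [module.iam]
}

# Default provider: everything created through it runs as gke-admin.
provider "google" {
  project      = var.gcp_project_id
  region       = var.gcp_region
  access_token = data.google_service_account_access_token.default.access_token
}
```

With this arrangement, the resources that create the service accounts need to go through the helper provider, since the token for the default provider cannot exist before the account does.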

Note 1: I have the owner role (i.e. project owner) attached to my user account, if you don't, you'd need the permission required to create a service account, etc.

Note 2: It might seem redundant to do all of this service account creation and binding when you have the owner role. However, when you're collaborating with a team (or within an organization), you'll need to set up permissions with a least-privilege mindset to avoid security exploits.

We won't run terraform apply yet, as we'll be creating the service accounts and the GKE cluster all at once.

Defining the GKE cluster

We'll be creating a GKE cluster that can be set up by even a free-tier user. A cluster consists of a control plane and one or more worker nodes. The control plane provides access to the cluster, so that you can inspect nodes, pods, services, etc. Each node can run one or more pods (with a specific resource requirement – e.g. CPU/memory). A pod (which may run one or more containers) will be running a specified workload (e.g. a tensorflow-serving image to serve a model). You can refer here to learn about the GKE architecture.

High-level architecture of a GKE cluster (Image by author)

We'll be creating a cluster in the standard mode with:

machine_type: n2-standard-4
max_node_count(number of max nodes to provision): 2
preemptible: true (You could also use Spot VMs, the newer version of preemptible VMs, which follow the same pricing model but have no 24-hour maximum runtime. Learn the differences here.)
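For orientation, a cluster with these settings might be declared roughly like this. It is a sketch under assumptions rather than the repository's configuration: the variables gcp_zone and node_service_account, the node pool name, and the minimum node count are hypothetical; the cluster name sd-cluster is the one used later in this tutorial.

```hcl
variable "gcp_zone" {
  type        = string
  description = "Zone for a zonal cluster (hypothetical variable)"
}

variable "node_service_account" {
  type        = string
  description = "Email of the gke-node service account (hypothetical variable)"
}

resource "google_container_cluster" "sd_cluster" {
  name     = "sd-cluster"
  location = var.gcp_zone

  # Drop the default pool so the pool below is the only one.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "sd_nodes" {
  name               = "sd-node-pool"
  cluster            = google_container_cluster.sd_cluster.name
  location           = var.gcp_zone
  initial_node_count = 1

  # Scale between one and two nodes.
  autoscaling {
    min_node_count = 1
    max_node_count = 2
  }

  node_config {
    machine_type    = "n2-standard-4"
    preemptible     = true
    service_account = var.node_service_account
    oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]
  }
}
```

Setting location to a zone gives a zonal cluster; setting it to a region would give a regional one, where node counts apply per zone.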

Note that we're not using GPUs, since as a free-tier user you may not be eligible for a GPU quota. But if you are, feel free to follow the process here for setting up a node pool with GPUs.

Note 1: If you're a free-tier user, you will be restricted by three important quotas:

all_regions_cpus: defaults to 12
all_n2_cpus: defaults to 8
all_regions_gpus: defaults to 0

Since we're using N2 instances, each with 4 vCPUs, we can only spin up 2 of these instances within the quota. You can experiment with other instances such as n2-standard-2 or N1 instances if you'd like to have more nodes in the cluster.
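The same arithmetic can be written down as terraform locals, which is handy if you want to derive the maximum node count from the quotas instead of hard-coding it. The local names below are made up for this illustration; the numbers are the default quotas quoted above.

```hcl
locals {
  all_regions_cpus = 12 # default free-tier quota
  all_n2_cpus      = 8  # default free-tier quota
  vcpus_per_node   = 4  # n2-standard-4

  # The tighter of the two CPU quotas decides how many nodes fit.
  max_nodes = min(
    floor(local.all_n2_cpus / local.vcpus_per_node),
    floor(local.all_regions_cpus / local.vcpus_per_node)
  )
}

output "max_nodes" {
  value = local.max_nodes
}
```

Here the N2 quota allows 8 / 4 = 2 nodes and the all-regions quota allows 12 / 4 = 3, so max_nodes evaluates to 2.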

Note 2: These are global quotas, meaning that if you have, for example, a Vertex AI notebook running on another N2 instance, it will count towards this quota as well.

If you exceed these quotas, you'll run into "Quota exceeded" errors when you apply this infrastructure with terraform.

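You can read the full configuration here. I won't go into the details as it's straightforward. However, one caveat I'd like to raise is the distinction between regional and zonal clusters. A zonal cluster runs its nodes in a single zone, whereas a regional cluster replicates each node pool across multiple zones of the region, so node counts (and quota usage) are multiplied by the number of zones. Ignoring this distinction can lead to somewhat mysterious errors such as the one in this Stack Overflow question.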

Creating the infrastructure & resources on GCP

We have some housekeeping to do before we apply the discussed terraform changes. First, run

./setup.sh -u  -p  -r

This will create a config file that holds the supplied arguments so they can be imported into the terraform code. Next, run:

terraform init

This will install the provider plugin as well as the local modules we have defined. Following this, we can run the following command to learn what terraform would be doing.

terraform plan [-var="include_module_storage="]

The plan would look like this.

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the
following symbols:
  + create
 <= read (data resources)

Terraform will perform the following actions:

  # data.google_service_account_access_token.default will be read during apply
  # (config refers to values not yet known)
 <= data "google_service_account_access_token" "default" {
      + access_token           = (sensitive value)
      + id                     = (known after apply)
      + scopes                 = [
          + "cloud-platform",
          + "userinfo-email",
        ]
      + target_service_account = (known after apply)
    }

  ...

  # module.iam.google_service_account_iam_binding.admin_account_iam will be created
  + resource "google_service_account_iam_binding" "admin_account_iam" {
      + etag               = (known after apply)
      + id                 = (known after apply)
      + members            = [
          + "user:[email protected]",
        ]
      + role               = "roles/iam.serviceAccountTokenCreator"
      + service_account_id = (known after apply)
    }

Plan: 9 to add, 0 to change, 0 to destroy.

If we're happy with the plan, we can run the command below to apply the changes.

terraform apply [-var="include_module_storage="]

If everything is successful, you should see a terraform.tfstate file appearing in your working directory, delineating all the changes that were applied. Visit the README here for detailed instructions. You can go to the GCP console → IAM → Service accounts and make sure the service accounts are created properly.

The created service accounts after applying the terraform transformation (Image by author)

You'll also see a cluster named sd-cluster in the GCP console → Kubernetes Engine → Clusters.

The cluster has been initialized with a single node (Image by author)

Once you go inside the cluster you can see more information about the node pool and nodes (Image by author)

Great, now we have everything we need to deploy our ML model as a service. We'll look at how we can do this in the next part of the tutorial.

So far, you:

Learned what terraform is and how it can make infrastructure management easy
Created the identities (service accounts) and set them up with the correct roles
Understood what a GKE cluster is and created one by impersonating the required service account

Troubleshooting & Caveats

Error: Unable to connect to the server: x509: certificate has expired or is not yet valid
Solution 1: This can be due to gcloud session expiring. Simply run gcloud auth login and complete the login process.
Solution 2: There is a bug in WSL where the clock within WSL is out of sync with the Windows clock. You can run sudo hwclock -s to trigger the sync

Caveat: If you're using bash within PowerShell (powered by WSL), you may not be able to export environment variables (to be used by terraform). So I'd recommend not relying on environment variables in that setup.

Appendix

Resource blocks

Describes one or more infrastructure objects (e.g. VMs, a cluster, VPCs). Each resource is identified by a resource type and a unique name.

resource "google_service_account" "sa_gke_admin" {
  account_id   = "gke-admin"
  display_name = "GKE Service Account (Admin)"
}

Data source / data block

Represents a source of data and the data associated with it

data "google_service_account_access_token" "default" {
  provider               = google.impersonation_helper
  target_service_account = module.iam.service_account_gke_admin
  scopes                 = ["userinfo-email", "cloud-platform"]
  depends_on             = [module.iam]
}

Provider plugin

Provides access to resource types and data sources associated with a certain provider.

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "3.5.0"
    }
  }
}

Input & output variables

Act as arguments and return types for modules.

variable "gcp_user" {
  type = string
  description = "Your username for GCP"
}

output "service_account_gke_node" {
  description = "GKE node service account"
  value       = google_service_account.sa_gke_node.email
}

Acknowledgement

I'd like to acknowledge the ML Developer Programs and the team for the GCP credits provided to make this tutorial a success.

Tags: Cloud Computing Gcp Machine Learning TensorFlow Terraform
