Running a Stable Diffusion cluster on GCP with tensorflow-serving (Part 1)

In the first part of this two-part tutorial, we will learn to create a Kubernetes cluster that deploys a Stable Diffusion model on GCP. Stable Diffusion (a form of generative AI) is the new cool kid on the block. It allows us to generate realistic images from a given text prompt. Due to its novelty and the computational load it imposes, the Stable Diffusion model provides invaluable opportunities to address some unique challenges.
Note: You can follow this tutorial end-to-end even if you're a free user (as long as you have some free-tier credit left).
Github: https://github.com/thushv89/tf-serving-gke/tree/master/infrastrcture
But to create the perfect storm (or the perfect product), having access to the latest version of the model weights won't cut it. It takes effort to create a reliable production system around your model to support the user requests and serve them reliably, with a reasonable latency.

To do that, we will learn how to run a Stable Diffusion model on a GKE cluster. This two-part tutorial consists of four sections:
- Setting up the accounts & roles (Part 1)
- Setting up the cluster (Part 1)
- Deploying a prediction service in the provisioned cluster (Part 2)
- Generating new images with the deployed endpoint (Part 2)
Before you get started, make sure you have created a GCP project and logged in to your user account via gcloud auth login. You can use gcloud config set project and gcloud config set region to make sure you are in the correct project and region.
Note: Most of the IAM (Identity & Access Management) I'm talking through here is based on my (limited) personal experience on the subject. If you see anything that could be improved let me know!
terraform: Manage infrastructure in style
If you're already familiar with terraform, go ahead and jump to the section "Defining the accounts & roles (IAM)".
Overview
For all of the infrastructure setup on GCP, we'll be using Terraform, an IaC (Infrastructure as Code) tool that allows us to codify all of our infrastructure requirements. Why manage cloud resources through code, rather than through error-prone, laborious manual operations, you may ask? There are many (other) reasons:
- Code (written in a human readable fashion) makes it easier to understand the architecture, improves reusability, etc.
- terraform automatically manages dependencies and performs operations in the correct order
- Version controlled code gives you the ability to get a snapshot of the state of your system at a given point in time (for troubleshooting)
- terraform provides a comprehensive out of the box API to build infrastructure quickly for all of the common providers such as GCP, AWS, Azure, etc.
terraform concepts
terraform parlance organizes code into configurations. A terraform configuration operates on a working directory, which holds the configuration files ending with the extension .tf or .tf.json:
- variables.tf – Contains all the variable definitions that are used by the configuration
- outputs.tf – Any outputs that need to be written out
- Apart from these, you can include any number of .tf files containing resource definitions, providers, etc. In our simple scenario, we only need a single file, which we'll call main.tf.
Next, let's look at how terraform enables compartmentalizing of code. terraform is a declarative language, meaning you tell terraform what to do (like SQL), not how to do it (like Python). It's up to terraform to build a plan (e.g. in the form of a graph) and execute it.
We can then compose our terraform configuration using modules. Modularizing is optional; however, it breaks complex infrastructure into logical components/sub-systems and greatly enhances reusability. In our case, we'll be defining three modules:
- Manages accounts and roles (modules/iam)
- Manages the GKE cluster (modules/gke_cluster)
- Manages storage – setting up the GCS bucket (modules/storage)
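As a rough illustration of how these three modules could be wired together in main.tf, here is a minimal sketch. The module paths come from the list above, and include_module_storage and the service_account_gke_node output are names that appear elsewhere in this tutorial. The input names (project_id, node_service_account) and the boolean type of include_module_storage are assumptions made for illustration; the actual repository may declare and pass them differently.

```hcl
# main.tf (sketch): input names below are illustrative assumptions

variable "project_id" {
  type        = string
  description = "GCP project ID (hypothetical input name)"
}

variable "gcp_user" {
  type        = string
  description = "Your username for GCP"
}

variable "include_module_storage" {
  type        = bool # assumed type
  default     = false
  description = "Whether to also create the GCS bucket"
}

# Accounts and roles
module "iam" {
  source     = "./modules/iam"
  project_id = var.project_id
  gcp_user   = var.gcp_user
}

# GKE cluster: it consumes an output of the iam module, so terraform
# knows to create the service accounts first
module "gke_cluster" {
  source               = "./modules/gke_cluster"
  project_id           = var.project_id
  node_service_account = module.iam.service_account_gke_node
}

# Storage is optional; using count on a module requires Terraform 0.13 or later
module "storage" {
  source     = "./modules/storage"
  count      = var.include_module_storage ? 1 : 0
  project_id = var.project_id
}
```

Because gke_cluster reads an output of iam, the ordering between the two falls out of the dependency graph and does not need to be spelled out.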
When you go into these modules in the code, you'll see the following basic building blocks used in harmony to reach the desired state of infrastructure we need (see the Appendix for specific examples).
- Resource blocks – Describes infrastructure objects (e.g. VMs, a cluster, VPCs)
- Data source / data block – represents a source of data (e.g. a file) and the data associated with it
- Provider plugin – Provides access to resource types and data sources associated with a certain provider.
- Input & output variables of the modules
Once you define your configuration, you can run terraform plan to see what terraform will be executing. Next, terraform apply can be used to apply those changes. Once applied, terraform will record the changes made in a terraform.tfstate file. So if you want to make a change (or destroy), terraform is aware of the current state of your infrastructure, so it can create a plan for the required changes.
If you need further reinforcing of terraform concepts, you can read the documentation here or go through this GCP tutorial. Now that we understand the basics of terraform, let's move on to understanding the logic.
Defining the accounts & roles (IAM)
For our operations relating to setting up the GKE cluster, we'll be creating a service account. As the name suggests, a service account is typically used by applications and workloads, not by an actual person. For example, GKE nodes can use a service account as the identity under which an application runs. A service account can be assigned permissions and roles (i.e. a collection of permissions collated in a meaningful way) just like a user account. A few advantages of service accounts are:
- We can quickly bind a user to a service account (or remove that binding), allowing us to give a user the necessary permissions without repeatedly assigning roles/permissions to individual users.
- Service accounts can be set up with short-lived credentials, making them more secure.
We'll be setting up two service accounts with the following IDs:
- gke-admin – Has the required permissions to create a GKE cluster and provision nodes
- gke-node – Has the necessary permissions to execute a workload successfully (e.g. read from a GCS bucket)
Though a service account is not directly used by or attached to a person, one can impersonate a service account, allowing the user to execute commands as if they were the service account. This is the method we'll be using to set up the cluster.

Here's the process we'll be outlining in our terraform code:
- Create service accounts: gke-admin and gke-node
- Assign the required roles for the created accounts
  - gke-admin: container.admin (e.g. create cluster), compute.viewer (e.g. creating the node pool), iam.serviceAccountUser
  - gke-node: container.nodeServiceAccount (permission for a typical Kubernetes workload)
  - You can have a look in GCP console → IAM → Roles to learn what permissions are provided by each role.
- Assign required roles to the user account to create a short-lived access token (iam.serviceAccountTokenCreator)
- Create a binding from the user account to the service account so that the user can impersonate the service account
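The steps above could be expressed in the iam module roughly as follows. This is a sketch, not the repository's actual code: the account IDs, role names and the admin_account_iam binding come from this tutorial, while the variable names, the gke-node display name and the choice of google_project_iam_member with for_each are assumptions. It also covers the last two steps with a single binding of the token-creator role, which is how the plan output later in this post presents it.

```hcl
# modules/iam (sketch): variable names are illustrative assumptions

variable "project_id" { type = string }
variable "gcp_user" { type = string }

# 1. Create the two service accounts
resource "google_service_account" "sa_gke_admin" {
  account_id   = "gke-admin"
  display_name = "GKE Service Account (Admin)"
}

resource "google_service_account" "sa_gke_node" {
  account_id   = "gke-node"
  display_name = "GKE Service Account (Node)" # assumed display name
}

# 2. Grant project-level roles to gke-admin, one IAM member per role
resource "google_project_iam_member" "admin_roles" {
  for_each = toset([
    "roles/container.admin",
    "roles/compute.viewer",
    "roles/iam.serviceAccountUser",
  ])

  project = var.project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.sa_gke_admin.email}"
}

# ... and the node role to gke-node
resource "google_project_iam_member" "node_role" {
  project = var.project_id
  role    = "roles/container.nodeServiceAccount"
  member  = "serviceAccount:${google_service_account.sa_gke_node.email}"
}

# 3 and 4. Allow the user to create short-lived tokens for gke-admin,
# which is what makes impersonation possible
resource "google_service_account_iam_binding" "admin_account_iam" {
  service_account_id = google_service_account.sa_gke_admin.name
  role               = "roles/iam.serviceAccountTokenCreator"
  members            = ["user:${var.gcp_user}"]
}
```

Note that the token-creator role is bound on the service account itself rather than on the whole project, so the user can only impersonate gke-admin and not every service account in the project.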
Finally, we will declare the names of the two service accounts we created as outputs (in outputs.tf), so they can be referenced by the configuration and other child modules.
To provision the infrastructure, we'll be using two forms of authentication:
- The typical authentication you obtain by running gcloud auth login, which will be used to create the service accounts and the bindings.
- After that, we'll impersonate the service account to set up the cluster.
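In terraform, these two forms of authentication are commonly modelled as two provider configurations. The sketch below assumes that pattern: the impersonation_helper alias and the access-token data source appear in the Appendix, whereas the project_id and region variables and the module inputs are hypothetical names.

```hcl
# Sketch of provider-level impersonation; variable names are assumptions

variable "project_id" { type = string }
variable "region" { type = string }
variable "gcp_user" { type = string }

# Helper provider: authenticates with your own gcloud credentials
provider "google" {
  alias   = "impersonation_helper"
  project = var.project_id
  region  = var.region
}

# The iam module must run with the helper provider, since the service
# accounts it creates do not exist yet and cannot be impersonated
module "iam" {
  source = "./modules/iam"
  providers = {
    google = google.impersonation_helper
  }
  project_id = var.project_id
  gcp_user   = var.gcp_user
}

# Request a short-lived access token for the gke-admin service account
data "google_service_account_access_token" "default" {
  provider               = google.impersonation_helper
  target_service_account = module.iam.service_account_gke_admin
  scopes                 = ["userinfo-email", "cloud-platform"]
  depends_on             = [module.iam]
}

# Default provider: resources that use it are created as gke-admin
provider "google" {
  project      = var.project_id
  region       = var.region
  access_token = data.google_service_account_access_token.default.access_token
}
```

With this layout, any resource or module that does not name a provider explicitly (the cluster, for example) is created with the impersonated identity.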
Note 1: I have the owner role (i.e. project owner) attached to my user account. If you don't, you'd need the permissions required to create a service account, etc.
Note 2: It might seem redundant to do all of this service account creation and binding when you have the owner role. However, when you're collaborating with a team (or in an organization), you'll need to think about and set up permissions with a least-privilege mindset to avoid security exploits.
We won't run terraform apply yet, as we'll be creating the service accounts and the GKE cluster all at once.
Defining the GKE cluster
We'll be creating a GKE cluster that can be set up by even a free-tier user. A cluster consists of a control plane and one or more worker nodes. The control plane provides access to the cluster, so that you can inspect nodes, pods, services, etc. Each node can run one or more pods (with a specific resource requirement – e.g. CPU/memory). A pod (which may run one or more containers) will be running a specified workload (e.g. a tensorflow-serving image to serve a model). You can refer here to learn about the GKE architecture.

We'll be creating a cluster in the standard mode with:
- machine_type: n2-standard-4
- max_node_count (number of max nodes to provision): 2
- preemptible: true (You could also use spot instances, the newer variant of preemptible instances, which follow the same pricing model but have no 24-hour maximum runtime. Learn the differences here.)
Note that we're not using GPUs, since as a free-user, you may not have the eligibility for a GPU quota. But if you do, feel free to follow the process here for setting up a node pool with GPUs.
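For orientation, the settings listed above could translate into a cluster and a node pool along these lines. This is a sketch under assumptions, not the repository's configuration: the cluster name sd-cluster and the three settings come from this tutorial, while the variable names, the node pool name, the autoscaling minimum and the use of a separately managed node pool are illustrative choices.

```hcl
# modules/gke_cluster (sketch): variable names are illustrative assumptions

variable "location" { type = string }             # e.g. a zone such as "us-central1-a"
variable "node_service_account" { type = string } # email of the gke-node account

resource "google_container_cluster" "sd_cluster" {
  name     = "sd-cluster"
  location = var.location

  # Remove the default pool so that the pool below is the only one
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "sd_nodes" {
  name     = "sd-node-pool" # hypothetical name
  cluster  = google_container_cluster.sd_cluster.name
  location = var.location

  initial_node_count = 1

  autoscaling {
    min_node_count = 1 # assumed
    max_node_count = 2
  }

  node_config {
    machine_type    = "n2-standard-4"
    preemptible     = true
    service_account = var.node_service_account
    oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]
  }
}
```

The nodes run as gke-node, so workloads on them get that account's permissions (such as reading from a GCS bucket) rather than those of gke-admin.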
Note 1: If you're a free-tier user, you will be restricted by three important quotas:
- all_regions_cpus: defaults to 12
- all_n2_cpus: defaults to 8
- all_regions_gpus: defaults to 0
Since we're using N2 instances, each with 4 vCPUs, we can only spin up 2 of these instances within the quota. You can experiment with other instances such as n2-standard-2 or n1 instances if you'd like to have more nodes in the cluster.
Note 2: These are global quotas, meaning that if you, for example, have a Vertex AI notebook running on another n2 type instance, it will count towards this quota as well.
If you don't respect these quotas, you'll run into Quota exceeded type of errors when you apply this infrastructure with terraform.
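The quota arithmetic can be written down as terraform locals, which is one way to keep the node limit tied to the numbers it is derived from. The quota values are the free-tier defaults quoted above; the local names are made up for this sketch.

```hcl
# Quota arithmetic as locals (names are illustrative)
locals {
  all_regions_cpus = 12
  all_n2_cpus      = 8
  vcpus_per_node   = 4 # n2-standard-4

  nodes_by_global_quota = floor(local.all_regions_cpus / local.vcpus_per_node) # 12 / 4 = 3
  nodes_by_n2_quota     = floor(local.all_n2_cpus / local.vcpus_per_node)      # 8 / 4 = 2

  # The tighter of the two CPU quotas decides how many nodes fit
  max_nodes = min(local.nodes_by_global_quota, local.nodes_by_n2_quota) # 2
}

output "max_nodes" {
  value = local.max_nodes
}
```

Switching vcpus_per_node to 2 (n2-standard-2) gives min(6, 4) = 4 nodes, which matches the suggestion to try smaller instances if you want more nodes.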
You can read the full configuration here. I won't go into the details as it's straightforward. However, one caveat I'd like to raise is the distinction between regional and zonal clusters. Ignoring this distinction can lead to somewhat mysterious errors such as this Stack Overflow question.
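In the Google provider, whether a cluster is zonal or regional is decided by the value passed to location. The sketch below contrasts the two; the resource names and location values are examples only, not the tutorial's actual settings.

```hcl
# Zonal vs. regional: the only difference here is the value of `location`

# Zonal cluster: nodes live in a single zone
resource "google_container_cluster" "zonal" {
  name               = "sd-cluster-zonal" # hypothetical name
  location           = "us-central1-a"
  initial_node_count = 1 # 1 node in total
}

# Regional cluster: node counts apply per zone
resource "google_container_cluster" "regional" {
  name               = "sd-cluster-regional" # hypothetical name
  location           = "us-central1"
  initial_node_count = 1 # 1 node in each of the region's zones
}
```

Because node counts in a regional cluster are per zone, a pool capped at 2 n2-standard-4 nodes can request up to 2 nodes in each zone (typically three zones), which is well beyond the 8 vCPU N2 quota. This is one plausible way the distinction bites, and not necessarily the error described in the linked question.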
Creating the infrastructure & resources on GCP
We have some housekeeping to do before we apply the discussed terraform changes. First, run
./setup.sh -u -p -r
This will create a config file that has the defined arguments so they can be imported into the terraform code. Next, run
terraform init
This will install the provider plugin as well as the local modules we have defined. Following this, we can run the following command to learn what terraform would be doing.
terraform plan [-var="include_module_storage="]
The plan would look like this.
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the
following symbols:
  + create
 <= read (data resources)

Terraform will perform the following actions:

  # data.google_service_account_access_token.default will be read during apply
  # (config refers to values not yet known)
 <= data "google_service_account_access_token" "default" {
      + access_token           = (sensitive value)
      + id                     = (known after apply)
      + scopes                 = [
          + "cloud-platform",
          + "userinfo-email",
        ]
      + target_service_account = (known after apply)
    }

...

  # module.iam.google_service_account_iam_binding.admin_account_iam will be created
  + resource "google_service_account_iam_binding" "admin_account_iam" {
      + etag               = (known after apply)
      + id                 = (known after apply)
      + members            = [
          + "user:[email protected]",
        ]
      + role               = "roles/iam.serviceAccountTokenCreator"
      + service_account_id = (known after apply)
    }

Plan: 9 to add, 0 to change, 0 to destroy.
If we're happy with the plan, we can run the command below to apply the changes.
terraform apply [-var="include_module_storage="]
If everything is successful, you should see a terraform.tfstate file appearing in your working directory, delineating all the changes that were applied. Visit the README here for detailed instructions. You can go to the GCP console → IAM → Service accounts and make sure the service accounts are created properly.

You'll also see a cluster named sd-cluster in the GCP console → Kubernetes Engine → Clusters.


Great, now we have everything we need to deploy our ML model as a service. We'll look at how we can do this in the next part of the tutorial.
So far, you have:
- Learned what terraform is and how it can make infrastructure management easy
- Created the identities (service accounts) and set them up with the correct roles
- Understood what a GKE cluster is and created one by impersonating the required service account
Troubleshooting & Caveats
- Error: Unable to connect to the server: x509: certificate has expired or is not yet valid
  - Solution 1: This can be due to the gcloud session expiring. Simply run gcloud auth login and complete the login process.
  - Solution 2: There is a bug in WSL where the clock within WSL is out of sync with the Windows clock. You can run sudo hwclock -s to trigger the sync.
Caveat: If you're using bash within PowerShell (powered by WSL), you may not be able to export environment variables (to be used by terraform). So I'd recommend not relying on environment variables in that setup.
Appendix
Resource blocks
Describes one or more infrastructure objects (e.g. VMs, a cluster, VPCs). Each resource is identified by a resource type and a unique name.
resource "google_service_account" "sa_gke_admin" {
  account_id   = "gke-admin"
  display_name = "GKE Service Account (Admin)"
}
Data source / data block
Represents a source of data and the data associated with it
data "google_service_account_access_token" "default" {
  provider               = google.impersonation_helper
  target_service_account = module.iam.service_account_gke_admin
  scopes                 = ["userinfo-email", "cloud-platform"]
  depends_on             = [module.iam]
}
Provider plugin
Provides access to resource types and data sources associated with a certain provider.
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "3.5.0"
    }
  }
}
Input & output variables
Act as arguments and return types for modules.
variable "gcp_user" {
  type        = string
  description = "Your username for GCP"
}

output "service_account_gke_node" {
  description = "GKE node service account"
  value       = google_service_account.sa_gke_node.email
}
Acknowledgement
I'd like to acknowledge the ML Developer Programs and the team for the GCP credits provided to make this tutorial a success.