Running Jaffle Shop dbt Project in Docker

Author: Murphy  |  Time: 2025-03-23 18:46:20

If you are new to data build tool (dbt), you have probably come across the so-called Jaffle Shop, a project used for testing and learning purposes.

jaffle_shop is a fictional ecommerce store. This dbt project transforms raw data from an app database into customers and orders models that are ready for analytics.

One fundamental issue I observed with the Jaffle Shop project is that it expects users, who may be newcomers to dbt, to configure and host a local database for the dbt models to materialize.

In this tutorial, I'll demonstrate how to create a containerized version of the project using Docker. This will allow us to deploy a Postgres instance and configure the dbt project to read from and write to that database. I'll also provide a link to a GitHub project I've created that will help you get all the services up and running in no time.


Creating the Dockerfile and docker-compose.yml

Let's begin by defining the services we want to run through Docker. First, we'll create a [docker-compose.yml](https://github.com/gmyrianthous/jaffle_shop/blob/main/docker-compose.yml) file where we'll define two services. The first service will be the Postgres database, and the second will be a custom service that we'll create in the next step using a Dockerfile.

# docker-compose.yml

version: "3.9"

services:
  postgres:
    container_name: postgres
    image: postgres:15.2-alpine
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
  dbt:
    container_name: dbt
    build: .
    image: dbt-jaffle-shop
    volumes:
      - ./:/usr/src/dbt
    depends_on:
      postgres:
        condition: service_healthy

The file specifies the Compose file format version (3.9) and defines two services, postgres and dbt, each with its own settings.

The postgres service is based on the official postgres Docker image version 15.2-alpine. It sets the container name to postgres, maps port 5432 (the default port for Postgres) to the host machine, and sets environment variables for the Postgres user and password. The healthcheck section specifies a command to test if the container is healthy, and sets a timeout and retries for the check.
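
Conceptually, the healthcheck (together with the `condition: service_healthy` dependency) just amounts to polling the database until it accepts connections. Here is a minimal Python sketch of that idea — `wait_for_port` is a hypothetical helper for illustration, not part of Docker or dbt:

```python
import socket
import time

def wait_for_port(host, port, timeout=5.0):
    # Poll until a TCP connection succeeds, mimicking a pg_isready-style healthcheck
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            time.sleep(0.2)
    return False

# Demo against a throwaway local listener standing in for the postgres container
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
_, port = server.getsockname()
ready = wait_for_port("127.0.0.1", port)
server.close()
print(ready)  # True
```

Docker runs the real check inside the container with `pg_isready`, retrying up to 5 times at 5-second intervals before marking the service unhealthy.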

The dbt service builds a custom Docker image from the current directory, using the Dockerfile we'll create next. It mounts the current directory as a volume within the container and declares a dependency on the postgres service, so it will only start once the postgres service is healthy.

In order to containerize the Jaffle Shop project, we need to create a [Dockerfile](https://github.com/gmyrianthous/jaffle_shop/blob/main/Dockerfile) that installs the necessary dependencies for both Python and dbt, and ensures that the container remains active once the environment has been set up.

# Dockerfile

FROM --platform=linux/amd64 python:3.10-slim-buster

RUN apt-get update \
    && apt-get install -y --no-install-recommends

WORKDIR /usr/src/dbt

# Install the dbt Postgres adapter. This step will also install dbt-core
RUN pip install --upgrade pip
RUN pip install dbt-postgres==1.2.0
RUN pip install pytz

# Install dbt dependencies (as specified in packages.yml file)
# Build seeds, models and snapshots (and run tests wherever applicable)
CMD dbt deps && dbt build --profiles-dir ./profiles && sleep infinity

Configuring Postgres with dbt

To interact with dbt, we'll use the dbt Command Line Interface (CLI). A directory containing a [dbt_project.yml](https://github.com/gmyrianthous/jaffle_shop/blob/main/dbt_project.yml) file is considered a dbt project by the dbt CLI.

We'll create one and specify some basic configurations, such as the dbt project name and the profile to use (which we'll create in the next step). Additionally, we'll specify the paths containing the various dbt entities and provide configuration about their materialization.

# dbt_project.yml

name: 'jaffle_shop'

config-version: 2
version: '0.1'

profile: 'jaffle_shop'

model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
analysis-paths: ["analysis"]
macro-paths: ["macros"]

target-path: "target"
clean-targets:
    - "target"
    - "dbt_modules"
    - "logs"

require-dbt-version: [">=1.0.0", "<2.0.0"]

models:
  jaffle_shop:
      materialized: table
      staging:
        materialized: view
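
With this configuration, every model in jaffle_shop materializes as a table, except the models under the staging folder, which are built as views — the more specific (deeper) setting wins over the project-level default. The following Python sketch illustrates that precedence rule; `resolve_materialization` is a hypothetical helper for illustration, not dbt's actual implementation:

```python
# Hypothetical sketch of dbt-style config precedence: the most specific
# (deepest) setting along a model's folder path overrides project defaults.
config = {"materialized": "table", "staging": {"materialized": "view"}}

def resolve_materialization(config, path_parts):
    resolved = config.get("materialized")
    node = config
    for part in path_parts:
        node = node.get(part, {})
        if "materialized" in node:
            resolved = node["materialized"]
    return resolved

print(resolve_materialization(config, ["staging"]))  # view
print(resolve_materialization(config, ["marts"]))    # table
```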

The profiles.yml file is used to store dbt profiles. A profile consists of targets, each of which specifies the connection details and credentials for the database or data warehouse.

# profiles.yml

jaffle_shop:
  target: dev
  outputs:
    dev:
      type: postgres
      host: postgres
      user: postgres
      password: postgres
      port: 5432
      dbname: postgres
      schema: public
      threads: 1

This file defines a profile named jaffle_shop that specifies the connection details for a Postgres database running on a Docker container named postgres.

  • jaffle_shop: This is the name of the profile. It's an arbitrary name chosen by the user to identify the profile.
  • target: dev: This specifies the default target for the profile, which in this case is named dev.
  • outputs: This section lists the output configurations for the profile, with the default output configuration named dev.
  • dev: This specifies the connection details for the dev target, which uses a Postgres database.
  • type: postgres: This specifies the type of the output, which in this case is a Postgres database.
  • host: postgres: This specifies the hostname or IP address of the Postgres database server.
  • user: postgres: This specifies the username used to connect to the Postgres database.
  • password: postgres: This specifies the password used to authenticate with the Postgres database.
  • port: 5432: This specifies the port number on which the Postgres database is listening.
  • dbname: postgres: This specifies the name of the Postgres database to connect to.
  • schema: public: This specifies the schema name to use when executing queries against the database.
  • threads: 1: This specifies the number of threads to use when running dbt tasks.
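
Taken together, these settings are equivalent to a standard Postgres connection URL. The small Python sketch below assembles one from the profile values, purely for illustration — dbt builds the connection itself from profiles.yml, so you never need to do this yourself:

```python
# Assemble a libpq-style connection URL from the profile's dev target values
profile = {
    "type": "postgres",
    "host": "postgres",
    "user": "postgres",
    "password": "postgres",
    "port": 5432,
    "dbname": "postgres",
}

url = "postgresql://{user}:{password}@{host}:{port}/{dbname}".format(**profile)
print(url)  # postgresql://postgres:postgres@postgres:5432/postgres
```

Note that the host is postgres — the Compose service name — rather than localhost, because the dbt container reaches the database over Docker's internal network.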

Jaffle Shop dbt models and seeds

The source data for the Jaffle Shop project consists of CSV files for customers, payments and orders. In dbt, we can load this data into our database through seeds, and then build dbt models on top of it.
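
Under the hood, loading a seed simply turns each CSV file into a database table. The effect is roughly what this Python sketch does, with an in-memory SQLite database standing in for Postgres and made-up sample rows in the shape of the raw_customers seed:

```python
import csv
import io
import sqlite3

# Made-up rows in the shape of jaffle_shop's raw_customers seed file
raw_customers_csv = """id,first_name,last_name
1,Michael,P.
2,Shawn,M.
"""

conn = sqlite3.connect(":memory:")
rows = list(csv.DictReader(io.StringIO(raw_customers_csv)))

# A seed load: create a table named after the file and insert every row
conn.execute("CREATE TABLE raw_customers (id INTEGER, first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO raw_customers VALUES (:id, :first_name, :last_name)", rows
)

count = conn.execute("SELECT count(*) FROM raw_customers").fetchone()[0]
print(count)  # 2
```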

Here's an example model that generates some metrics for our customers:

with customers as (

    select * from {{ ref('stg_customers') }}

),

orders as (

    select * from {{ ref('stg_orders') }}

),

payments as (

    select * from {{ ref('stg_payments') }}

),

customer_orders as (

    select
        customer_id,

        min(order_date) as first_order,
        max(order_date) as most_recent_order,
        count(order_id) as number_of_orders
    from orders

    group by customer_id

),

customer_payments as (

    select
        orders.customer_id,
        sum(amount) as total_amount

    from payments

    left join orders on
         payments.order_id = orders.order_id

    group by orders.customer_id

),

final as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order,
        customer_orders.most_recent_order,
        customer_orders.number_of_orders,
        customer_payments.total_amount as customer_lifetime_value

    from customers

    left join customer_orders
        on customers.customer_id = customer_orders.customer_id

    left join customer_payments
        on  customers.customer_id = customer_payments.customer_id

)

select * from final
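
To make the joins and aggregations above concrete, here is the same logic replayed in plain Python on a few made-up rows — purely illustrative, since dbt runs the SQL directly against Postgres:

```python
# Tiny made-up inputs mirroring stg_orders and stg_payments for one customer
orders = [
    {"order_id": 1, "customer_id": 1, "order_date": "2018-01-01"},
    {"order_id": 2, "customer_id": 1, "order_date": "2018-02-10"},
]
payments = [
    {"order_id": 1, "amount": 10},
    {"order_id": 2, "amount": 20},
]

# customer_orders CTE: first/most recent order and order count per customer
dates = [o["order_date"] for o in orders if o["customer_id"] == 1]
customer_orders = {
    "first_order": min(dates),
    "most_recent_order": max(dates),
    "number_of_orders": len(dates),
}

# customer_payments CTE: total paid per customer, via the order_id join
order_to_customer = {o["order_id"]: o["customer_id"] for o in orders}
lifetime_value = sum(
    p["amount"] for p in payments if order_to_customer[p["order_id"]] == 1
)

print(customer_orders["number_of_orders"], lifetime_value)  # 2 30
```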

Running the services via Docker

Now let's build and spin up our Docker services. To do so, we'll simply need to run the following commands:

$ docker-compose build
$ docker-compose up

The commands above will spin up a Postgres instance and then build the Jaffle Shop dbt resources as specified in the repository. These containers will remain up and running so that you can:

  • Query the Postgres database and the tables created out of dbt models
  • Run further dbt commands via dbt CLI

Running dbt commands via CLI

The dbt container has already built the specified models. However, we can still access the container and run dbt commands via the dbt CLI, either for new or modified models. To do so, we first need to access the container.

The following command will list all active containers:

$ docker ps

Copy the id of the dbt container and use it in the next command:

$ docker exec -it <container-id> /bin/bash

The command above will essentially give you access to the container's bash which means you are now able to run dbt commands.

# Install dbt deps (might not be required if you have no -or an empty- `packages.yml` file)
dbt deps

# Build seeds
dbt seed --profiles-dir profiles

# Build data models
dbt run --profiles-dir profiles

# Build snapshots
dbt snapshot --profiles-dir profiles

# Run tests
dbt test --profiles-dir profiles
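
Each of these commands has the same shape: a dbt subcommand plus `--profiles-dir` pointing at the folder containing profiles.yml. If you ever script the sequence (say, from CI), a sketch like the following builds the argument lists to hand to a process runner — `dbt_command` is a hypothetical helper and nothing is executed here:

```python
import shlex

def dbt_command(subcommand, profiles_dir="profiles"):
    # Build the argv for one dbt invocation, e.g. to pass to subprocess.run
    return ["dbt", subcommand, "--profiles-dir", profiles_dir]

sequence = [dbt_command(c) for c in ["deps", "seed", "run", "snapshot", "test"]]
for argv in sequence:
    print(shlex.join(argv))  # e.g. dbt run --profiles-dir profiles
```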

Note that since we have mounted the local directory to the running container, any changes in the local directory will be reflected in the container immediately. This means you can also create new models or modify existing ones, then go into the running container and build models, run tests, etc.


Querying the dbt models on Postgres database

You are also able to query the postgres database and the dbt models or snapshots created on it. In the same way, we will have to enter the running postgres container in order to be able to query the database directly.

# Get the container id for `postgres` service
$ docker ps

# Then copy the container id to the following command to enter the 
# running container
$ docker exec -it <container-id> /bin/bash

We will then use psql, a terminal-based interface for PostgreSQL that allows us to query the database:

$ psql -U postgres

The two commands shared below can be used to list tables and views respectively:

postgres=# \dt
             List of relations
 Schema |     Name      | Type  |  Owner   
--------+---------------+-------+----------
 public | customers     | table | postgres
 public | orders        | table | postgres
 public | raw_customers | table | postgres
 public | raw_orders    | table | postgres
 public | raw_payments  | table | postgres
(5 rows)

postgres=# \dv
            List of relations
 Schema |     Name      | Type |  Owner   
--------+---------------+------+----------
 public | stg_customers | view | postgres
 public | stg_orders    | view | postgres
 public | stg_payments  | view | postgres
(3 rows)

And you can now query dbt models via a SELECT query:

SELECT * FROM customers;

Getting the full code

I've created a GitHub repository that you can clone on your local machine to get the containerised version of the Jaffle Shop dbt project running quickly. You can find the project, as well as the code shared in this tutorial, at the following link.

GitHub – gmyrianthous/jaffle_shop: This is a containerised version of Jaffle Shop dbt project


Final Thoughts

Data build tool (dbt) is one of the most rapidly growing technologies in the modern data stack. If you're just starting to learn how to use dbt, I highly recommend experimenting with the Jaffle Shop project. It's a self-contained project created by dbt Labs for testing and experimentation purposes.

dbt is a tool commonly used by data analysts and analytics engineers (in addition to data engineers), and it requires a connection to a database or data warehouse. However, many analysts might not be comfortable configuring and initializing a local database.

In this article, we demonstrated how to get started with dbt and run all the services required to materialize dbt models on a local Postgres database. I hope this tutorial helps you get your dbt project and database up and running as quickly as possible. If you experience any issues running the project, please let me know in the comments, and I'll do my best to help you debug your code and configuration.


Tags: Data Engineering Data Science Dbt Programming Technology