Decision Science Meets Design
The design process has changed dramatically over the last few decades. What was once a domain driven by human intuition, judgment, and aesthetic preferences is now augmented by computational methods and data-driven processes. This transition is exemplified by the intersection of data science and design, a crossroads where precision meets creativity.
The utility of data-driven techniques is well demonstrated in one of design's subdomains, generative design, a methodology that uses computational algorithms to produce multiple design variations based on predefined criteria. However, as design problems become more intricate and multidimensional, more sophisticated techniques are needed to find satisfactory solutions. This is where decision science, and specifically reinforcement learning, comes into play.
Applying Decision Science to Generative Design
The core of design is not so much creation as it is the series of purposeful decisions that lead to that creation.
Decision science is fundamentally about making informed choices by assessing the options available within a context based on their predicted or known consequences. It combines quantitative statistical approaches with optimization processes. Applied to generative design, decision science can help determine the design decisions, or sequences of decisions, that improve a given configuration or design instance. This process requires three components:
- Evaluating Designs: assessing the performance or quality of each variation to understand the contribution of each design choice towards the desired outcome
- Optimization: synthesizing the sequence of design choices that would yield feasible and satisfactory design variants
- Scenario Analysis: exploring various design possibilities by making design decisions in different contexts and constraints
Framing Generative Design Problems as Markov Decision Processes (MDPs)
Before delving deeper into deep reinforcement learning (DRL) for generative design, framing these design problems in the context of Markov Decision Processes (MDPs) is vital. But what is an MDP?
MDPs are a mathematical framework for modeling decision-making in settings where outcomes are determined partly by probabilistic dynamics and partly by the actions of the decision-maker. An MDP comprises the following major components:
- States (S): represent different scenarios or conditions.
- Actions (A): represent choices available in each state.
- Transitions (P): represent the probability of moving from one state to another after taking an action.
- Rewards (R): represent feedback or outcome from taking an action in a state.
In the context of generative design, we can think of states as design configurations, actions as design modifications, and transitions as the likelihood of a design modification leading from one initial design configuration to another. The reward, on the other hand, is the feedback that conveys the performance measure of the design instance to the designer and guides the whole design process.
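To make the mapping concrete, here is a toy design MDP written out explicitly; every state name, probability, and reward below is invented for illustration and is not part of the building-placement experiment that follows.
# Toy design MDP (illustrative values only): states are design configurations,
# actions are modifications, transitions give the probability of reaching a new
# configuration, and rewards score the outcome of each modification.
toy_states = ['initial_layout', 'shifted_layout', 'rotated_layout']
toy_actions = ['shift_building', 'rotate_building']
# P[state][action] -> list of (next_state, probability)
toy_transitions = {
    'initial_layout': {
        'shift_building':  [('shifted_layout', 0.9), ('initial_layout', 0.1)],
        'rotate_building': [('rotated_layout', 1.0)],
    },
}
# R[state][action] -> immediate reward (e.g., negative earthmoving volume)
toy_rewards = {
    'initial_layout': {'shift_building': -12.0, 'rotate_building': -7.5},
}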
Tackling Generative Design with Deep Reinforcement Learning (DRL)
Reinforcement learning (RL) aims to learn the best strategy of action to perform a task through a trial-and-error process. The agent, which in our context is an adaptive design policy, learns from the environment by taking actions (modifying designs) and receiving rewards or penalties based on the outcomes (efficiency or performance).
The challenge arises when dealing with large state and action spaces in design problems. This is where deep learning, specifically deep reinforcement learning (DRL), becomes invaluable. DRL combines the decision-making capabilities of RL with the powerful function approximation of deep learning. In simpler terms, it uses neural networks to predict the best actions to take in large and complex design scenarios.
DRL in Action: Optimizing Building Placement on Topography
Consider the challenge of constructing a building on uneven terrain. A designer might want to situate the building so that earthmoving (cutting and filling) is minimized. Every possible position within the terrain represents an action, and the resulting cut-and-fill volume represents the reward (here, effectively a penalty).
We'll go through a workflow to showcase how DRL agents can be trained to place buildings on topography while minimizing the cut and fill volume required.
Defining the Observation and Action Spaces
We first define the DRL agent's action space. The agent controls three parameters of the building mass: its x and y coordinates and its rotation angle (theta), represented as a 3-dimensional discrete action space. For the observation space, the state of the environment is represented by an image frame of the terrain with the building's current position on it.
import numpy as np
import torch

# Discretized 3-dim action space: 17 candidate values per parameter.
# param1/param2 are the normalized x and y coordinates of the building mass,
# param3 is its rotation angle (theta).
param1_space = np.linspace(start=0.1, stop=0.9, num=17)
param2_space = np.linspace(start=0.1, stop=0.9, num=17)
param3_space = np.linspace(start=0, stop=160, num=17)

# Convert the parameter grids to tensors so they can be indexed by sampled actions
param1_space = torch.from_numpy(param1_space)
param2_space = torch.from_numpy(param2_space)
param3_space = torch.from_numpy(param3_space)
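Each action the agent takes is a triple of indices into these grids. As a quick check of the mapping (illustrative only), index 8 selects the midpoint of each range:
# Each action dimension is an index (0-16) into its parameter grid.
print(param1_space[8])   # 0.5  -> normalized x coordinate
print(param3_space[8])   # 80.0 -> rotation angle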
Reward Function
The agent's primary objective is to minimize the earthmoving required to construct the building. For this, the reward function penalizes the agent according to the cut and fill volume.
The agent receives a penalty value equivalent to the cut and fill volume necessary to place the building at each step. In the multi-building setting, there is an additional -5 penalty if the building mass intersects with any previously positioned building masses during the training episode. The reward signal is computed at each step in the Rhinoceros 3D Grasshopper environment according to the following code:
# Grasshopper reward computation code
try:
    from ladybug_rhino.grasshopper import all_required_inputs
except ImportError as e:
    raise ImportError('\nFailed to import ladybug_rhino:\n\t{}'.format(e))

if all_required_inputs(ghenv.Component):
    reward = 0
    reward -= Soil_volume / 1000
    done = False
    bInter_relationList = [list(i) for i in bInter_relation.Branches]
    if len(bInter_relationList[0]) > 1:
        for i in bInter_relationList[0]:
            # building mass is inside some previously placed one
            if i == 0:
                reward -= 5
            # building mass intersects with some previously placed one
            elif i == 1:
                reward -= 5
        # compensate for self-intersection
        reward += 5
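As an illustration (the volume figure below is invented, not a result from the experiment), a single step's reward could be tallied like this:
# Illustrative step-reward tally: 12,000 units of cut/fill plus one clash
# with a previously placed building mass (numbers are made up).
Soil_volume = 12000
reward = 0
reward -= Soil_volume / 1000   # earthmoving penalty -> -12.0
reward -= 5                    # intersection penalty -> -17.0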
Connection between Agent and Environment
With the observation and action spaces and reward function established, it is necessary to facilitate the interaction between the DRL agent and the Grasshopper simulation environment. This is orchestrated using sockets, a popular method for inter-process communication.
# Define socket connection between Grasshopper and RL agent in Python
import socket

HOST = '127.0.0.1'
timeout = 20

def done_from_gh_client(socket):
    socket.listen()
    conn, _ = socket.accept()
    with conn:
        return_byt = conn.recv(5000)
        return_str = return_byt.decode()
        return eval(return_str)

def reward_from_gh_client(socket):
    socket.listen()
    conn, _ = socket.accept()
    with conn:
        return_byt = conn.recv(5000)
        return_str = return_byt.decode()
        if return_str == 'None':
            return_float = 0
        else:
            return_float = float(return_str)
        return return_float

def fp_from_gh_client(socket):
    socket.listen()
    conn, _ = socket.accept()
    with conn:
        return_byt = conn.recv(5000)
        fp = return_byt.decode()
        return fp

def send_ep_count_to_gh_client(socket, message):
    message_str = str(message)
    message_byt = message_str.encode()
    socket.listen()
    conn, _ = socket.accept()
    with conn:
        conn.send(message_byt)

def send_to_gh_client(socket, message):
    message_str = ''
    for item in message:
        listToStr = ' '.join(map(str, item))
        message_str = message_str + listToStr + '\n'
    message_byt = message_str.encode()
    socket.listen()
    conn, _ = socket.accept()
    with conn:
        conn.send(message_byt)
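On the Grasshopper side, a matching client connects to the corresponding port and exchanges the same plain-string messages. The helper below is a hypothetical sketch of that counterpart, not the code from the repository; the port number simply mirrors the reward port used in the training loop later on.
# Hypothetical Grasshopper-side counterpart: connect as a client and send the
# step reward back to the agent as a plain string.
import socket

def send_reward_to_agent(reward, host='127.0.0.1', port=8081):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((host, port))
        s.send(str(reward).encode())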
DRL Actor-Critic Model Definition
After defining the utility functions for communication, we define and then initialize the DRL model and the Adam optimizer used for training:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.autograd import Variable
from torch.distributions import Categorical

# Actor-Critic Model Architecture
def enc_block(in_c, out_c, BN=True):
    if BN:
        conv = nn.Sequential(
            nn.Conv2d(in_c, out_c, kernel_size=4, stride=2,
                      padding=1, bias=True),
            nn.BatchNorm2d(out_c),
            nn.LeakyReLU(negative_slope=0.2, inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        return conv
    else:
        conv = nn.Sequential(
            nn.Conv2d(in_c, out_c, kernel_size=4, stride=2,
                      padding=1, bias=True),
            nn.LeakyReLU(negative_slope=0.2, inplace=True)
        )
        return conv

class GRUpolicy(nn.Module):
    def __init__(self, n_gru_layers, hidden_size, lin_size1, lin_size2,
                 enc_size1, enc_size2, enc_size3):
        super(GRUpolicy, self).__init__()
        # critic: convolutional encoder followed by a linear value head
        self.critic_enc1 = enc_block(3, enc_size1, BN=False)
        self.critic_enc2 = enc_block(enc_size1, enc_size2, BN=True)
        self.critic_enc3 = enc_block(enc_size2, enc_size3, BN=True)
        self.critic_enc4 = enc_block(enc_size3, 128, BN=True)
        self.critic_linear1 = nn.Linear(512, lin_size1)
        self.critic_linear2 = nn.Linear(lin_size1, lin_size2)
        self.critic_linear3 = nn.Linear(lin_size2, 1)

        # actor: three GRUs, one per action dimension (x, y, rotation)
        self.gru1 = nn.GRU(4, hidden_size, n_gru_layers, batch_first=True)
        self.gru2 = nn.GRU(4, hidden_size, n_gru_layers, batch_first=True)
        self.gru3 = nn.GRU(4, hidden_size, n_gru_layers, batch_first=True)
        self.actor_linear = nn.Linear(hidden_size, 17)

    def forward(self, state):
        state = Variable(state.unsqueeze(0))

        # critic
        enc = self.critic_enc1(state)
        enc = self.critic_enc2(enc)
        enc = self.critic_enc3(enc)
        enc = self.critic_enc4(enc)
        value = F.relu(self.critic_linear1(torch.flatten(enc)))
        value = F.relu(self.critic_linear2(value))
        value = self.critic_linear3(value)

        # actor
        seq = torch.reshape(enc, (1, 128, 4))
        out1, h_1 = self.gru1(seq)
        out_s1 = torch.squeeze(out1[:, -1, :])
        out_l1 = self.actor_linear(out_s1)
        prob1 = F.softmax(out_l1, dim=-1)
        dist1 = Categorical(prob1)

        out2, h_2 = self.gru2(seq, h_1)
        out_s2 = torch.squeeze(out2[:, -1, :])
        out_l2 = self.actor_linear(out_s2)
        prob2 = F.softmax(out_l2, dim=-1)
        dist2 = Categorical(prob2)

        out3, _ = self.gru3(seq, h_2)
        out_s3 = torch.squeeze(out3[:, -1, :])
        out_l3 = self.actor_linear(out_s3)
        prob3 = F.softmax(out_l3, dim=-1)
        dist3 = Categorical(prob3)

        return value, dist1, dist2, dist3

# Set device
is_cuda = torch.cuda.is_available()
device = torch.device('cuda' if is_cuda else 'cpu')
print(f'Used Device: {device}')

# Initialize DRL model
actorcritic = GRUpolicy(config.n_gru_layers, config.hidden_size,
                        config.lin_size1, config.lin_size2,
                        config.enc_size1, config.enc_size2,
                        config.enc_size3).to(device)

# Initialize optimizer
ac_optimizer = optim.Adam(actorcritic.parameters(), lr=config.lr, weight_decay=1e-6)
The agent architecture, defined in the GRUpolicy class, is an actor-critic architecture: the actor outputs a probability distribution over actions given a state, and the critic estimates the value of that state, i.e., the expected return starting from that state and following the agent's policy.
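As a quick smoke test of the architecture (the hyperparameter values below are illustrative, not the ones used in the experiment), the model can be instantiated directly and run on a random 3 x 256 x 256 terrain image; at that input size the four encoder blocks reduce the feature map to 128 x 2 x 2 = 512 values, which matches critic_linear1:
# Hypothetical smoke test; hyperparameter values are illustrative only.
model = GRUpolicy(n_gru_layers=2, hidden_size=64, lin_size1=256, lin_size2=64,
                  enc_size1=16, enc_size2=32, enc_size3=64)
dummy_state = torch.rand(3, 256, 256)   # stand-in terrain observation
value, dist1, dist2, dist3 = model(dummy_state)
print(value.shape)      # torch.Size([1]) -> state-value estimate
print(dist1.sample())   # index (0-16) into param1_space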
Agent Training
Once the socket connection between the agent and the environment is defined, the model architecture is implemented, and an instance of the model is initialized, we are ready to train the DRL agent to place building masses on topography properly.
The core of the experiment is the training loop. Here, the agent interacts with the environment iteratively over several training episodes, following these steps:
- Once the agent receives the current state observation, it selects an action based on its current policy.
# model forward pass
value, dist1, dist2, dist3 = actorcritic.forward(state)
# get action from probability distributions
param1 = param1_space[dist1.sample()]
param2 = param2_space[dist2.sample()]
param3 = param3_space[dist3.sample()]
action = [param1, param2, param3]
- The agent then sends the action to Grasshopper and gets the resulting reward and new state.
# Send action through socket
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind((HOST, 8080))
    s.settimeout(timeout)
    send_to_gh_client(s, action)

# Send episode count through socket
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind((HOST, 8083))
    s.settimeout(timeout)
    send_ep_count_to_gh_client(s, episode)

####### Awaiting Grasshopper script response #######

# Receive observation from Grasshopper Client
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind((HOST, 8084))
    s.settimeout(timeout)
    fp = fp_from_gh_client(s)

# Receive Reward from Grasshopper Client
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind((HOST, 8081))
    s.settimeout(timeout)
    reward = reward_from_gh_client(s)

# Receive done from Grasshopper Client
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind((HOST, 8082))
    s.settimeout(timeout)
    done = done_from_gh_client(s)
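The file path fp returned by the environment points to the rendered observation image of the current terrain and building configuration. One plausible way to turn it into the next state tensor is sketched below; the exact preprocessing in the original experiment may differ, and the 256 x 256 size is an assumption.
# Hypothetical preprocessing of the observation image received via fp
# (the actual preprocessing in the experiment may differ).
from PIL import Image
import torchvision.transforms as T

to_tensor = T.Compose([T.Resize((256, 256)), T.ToTensor()])
next_state = to_tensor(Image.open(fp).convert('RGB')).to(device)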
- Then, it updates its policy based on the episodic reward it receives.
# compute loss functions
returns = []
for t in reversed(range(len(rewards))):
    Qval = rewards[t] + config.gamma * Qval
    returns.insert(0, Qval)

returns = torch.cat(returns).detach()
values = torch.cat(values)
log_probs = torch.cat(log_probs)
advantage = returns - values

actor_loss = -(log_probs * advantage.detach()).mean()
critic_loss = 0.5 * advantage.pow(2).mean()
ac_loss = actor_loss + critic_loss - config.beta * entropy

# update actor critic
ac_optimizer.zero_grad()
ac_loss.backward()
ac_optimizer.step()
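For the update above to be computable, the episode loop also has to record the critic value, the log-probabilities of the three sampled action components, and the policy entropy at every step, and to initialize Qval at the end of the episode. A minimal sketch of that bookkeeping, with names assumed to mirror the snippets above (details may differ from the original training script):
# Per-episode bookkeeping assumed by the loss computation above (sketch only).
values, log_probs, rewards = [], [], []
entropy = 0
for step in range(max_steps):   # hypothetical episode length
    value, dist1, dist2, dist3 = actorcritic.forward(state)
    a1, a2, a3 = dist1.sample(), dist2.sample(), dist3.sample()
    log_prob = dist1.log_prob(a1) + dist2.log_prob(a2) + dist3.log_prob(a3)
    entropy += dist1.entropy() + dist2.entropy() + dist3.entropy()
    # ... send [param1_space[a1], param2_space[a2], param3_space[a3]] to
    # Grasshopper and receive reward, fp, and done via the sockets above ...
    values.append(value)
    log_probs.append(log_prob.unsqueeze(0))
    rewards.append(reward)
    if done:
        break
Qval = torch.zeros(1).to(device)   # bootstrap value for the terminal state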
This process continues over many training episodes until the agent converges on a placement of the building mass that minimizes the terrain's cut and fill volume. This iterative trial-and-error learning process, guided by reward feedback, results in optimized design configurations.
Through this experiment, you can see how DRL agents can be tailored to address real-world design challenges. This approach to generative design, enabled by DRL, presents promising avenues for future exploration at the intersection of data science and design.
_All implementations and details on this experiment, including implementations of the environment in Grasshopper and its associated Rhinoceros 3D files, are available in the CutnFill_DeepRL repository on GitHub._
Conclusion
By framing generative design problems as MDPs and harnessing the power of deep reinforcement learning, we can explore design spaces more extensively, evaluate designs more objectively, and optimize them more efficiently. Computational techniques like DRL are becoming more common in design practice. The convergence of data science and design promises a future in which aesthetically pleasing and rigorously informed design proposals are generated and evaluated autonomously, enabling a quick and effective iterative design process in which humans and machines collaborate to produce intelligent design solutions.