Simulated Data, Real Learnings : Simulating Systems

INTRODUCTION
Simulation is a powerful tool in the Data Science tool box. In this article, we'll talk about how simulating systems can help us formulate better strategies and make better decisions.
The specific topics of this article are:
- What is system simulation?
- System simulation optimization
- System simulation risk assessment
This is the fourth part in multi-part series that discusses simulation in data science. The first article covered how simulation can be used to test machine learning approaches, the second covered using simulation to estimate the power of a designed experiment and the third article discusses how we can make strategies by simulating scenarios.
Here are the links to those articles:
Simulated Data, Real Learnings : Part 1
WHAT IS DATA SIMULATION?
The first article spends a lot more time defining simulation. Reference that for a more in-depth conversation on what simulation is. To avoid redundancy, I'll just give a quick definition here:
Data simulation is the creation of fictitious data that mimics the properties of the real-world.
Okay, with that out of the way, let's talk about system simulation!
SYSTEM MODELLING
In many domains, data scientists need to answer questions about systems. Systems are a set of connected components that are designed to accomplish an objective. Systems are all around us – airline operations, supply chains, public transportation, electrical and plumbing systems – to name a few.
When we simulate systems, we need a good idea of how the system works; a really good idea of how the system works. We need to have a solid understanding of the system's components and how they are connected. We can get this by consulting with business partners or domain experts, or by being/becoming domain experts ourselves (there is no rule that says data scientists can't be experts of their domain and data science!).

System simulation has two primary elements; (1) simulating the system and (2) simulating data to pass through the system. Once we have a good understanding of the framework of the system, we can program a simulated system using our favorite language. Then, we can simulate any kind of data we want to pass through the system. The value add of this exercise is gathering output KPIs from the simulated data we input to the system. We can simulate any valid input we can imagen, and we can get an estimate of the KPIs we would observe under the simulated input. This can be very powerful!
By tweaking both the simulated system and the simulated data, we can generate a lot of valuable insights. System modelling with simulated data is often used for (1) optimization and (2) risk assessments. We'll touch on these points next!
Before we continue, a disclaimer. Sometimes simulation can feel like cheating — because we are just making data up — and it can be, if we are not thorough in ensuring that our simulation is based on reasonable assumptions. Remember, the insights we generate are only as good as the simulations we create. Garbage in, garbage out! Be sure you and your stake holders have thoroughly validated the assumptions that go into simulation before using the results for decision making.
Okay, enough of the stern words! Let's set up an example and then go through how we can use system simulation for optimization and risk assessment!
System Modeling Simulation Example : Grocery Store Lines
The rest of the article is going to demonstrate the power of system simulation via example. I'm a big fan of the early 2000's TV show Myth Busters. I'm taking my inspiration for the example from an experiment they ran in season 14, episode 5. In this episode, they compare two grocery store line set ups. We are going to test those two set ups using simulated systems rather than the physical system the Myth Busters used.
Here is the set up: We want to compare two different grocery store line configurations; (1) multiple lines, each with a single register and (2) a single line with multiple registers. The first configuration has a line for each register, the second has a single line that feeds into multiple registers – see the visual representation below. You've probably seen both of these in your shopping experiences.

We are going to do our system simulation in Python. The main metric we are going to keep track of is the amount of time that the last customer in the line has to wait. So, the code populates the line or lines with customers that have random check out times (how long it takes to scan their items and pay) and then keeps track of how long the last customer in the line has to wait.
Here the function to simulate the multiple lines, with multiple registers:
def multiple_lines_trial(number_lines, customers):
    '''
      Simulates the wait time of the last customer in line
      for a check out configuration with a line for each register.
      Inputs:
        number_lines (int) : number of lines and registers at checkout
        customers (list)   : list of customers checkout times, one time
                             for each customer
      Returns:
        total_wait_time (float) : total time that the last customer waited
                                  to get to a register
    '''
    # make empty list of lists, one list for each register to
    # hold customers
    lines = [[] for i in range(number_lines)]
    # remove last customer since we are calculating the time
    # the customer waits to get to the register - ignore
    # that customers actual check out time
    customers_cp = copy(customers)
    last_customer = customers_cp.pop()
    # each customr goes to the shortest line they see
    while customers_cp:
        # add customer to shortest line
        temp_cust = customers_cp[0]
        customers_cp = customers_cp[1:]
        line_length_list = [len(lines[i]) for i in range(number_lines)]
        shortest_line = line_length_list.index(min(line_length_list))
        lines[shortest_line].append(temp_cust)
    # last customer picks the shortest line as well,
    # total checkout time of shortest line at the end is the
    # last customers wait time
    line_length_list = [len(lines[i]) for i in range(number_lines)]
    shortest_line = line_length_list.index(min(line_length_list))
    total_wait_time = sum(lines[shortest_line])
    return total_wait_timeHere the function to simulate the single line, with multiple registers:
def single_line_trial(cashiers, customers):
    '''
      Simulates the wait time of the last customer in line
      for a check out configuration with a single that 
      feeds to multiple registers.
      Inputs:
        number_lines (int) : number of lines and registers at checkout
        customers (list)   : list of customers checkout times, one time
                             for each customer
      Returns:
        total_wait_time (float) : total time that the last customer waited
                                  to get to a register
    '''
    # empty list of wait times
    wait_time = []
    #make list of customers at checkout
    custm_at_checkout = customers[0:cashiers]
    #cut out number of cashiers from customers that are now at register
    customers = customers[cashiers:len(customers)]
    #go thru each customer 
    for i in customers:
        #pick shortest time at checkout
        done_at_checkout = min(custm_at_checkout)
        #get index of customer that is done
        done_index = custm_at_checkout.index(done_at_checkout)
        #subtract shortest time from each customer at checkout
        for j, cust in enumerate(custm_at_checkout):
            custm_at_checkout[j] -= done_at_checkout
        #remove customer that is done 
        custm_at_checkout.pop(done_index)
        #add the next customer in line at register
        if customers:
            custm_at_checkout.append(customers[0])
        else:
            # if no customers left, exit
            break
        # remove customer that is now at register from waiting customers
        customers = customers[1:]
        #add waiting time to list
        wait_time.append(done_at_checkout)
    #sum total wait time
    total_wait_time = sum(wait_time)
    return total_wait_timeOkay, now we have the code to simulate a single customer's wait time under both configurations! Of course, we don't want to make big decisions with a Simulation of just a single customer's experience. Let's put together some code to simulate multiple customer's wait time.
Here's the code to iteratively simulate a single customer's wait time to create multiple customer's wait times:
def run_trials(cashiers,
                customer_number,
                num_trials,
                line_type_func):
    '''
      Runs multiple customers through simulation and gather each run's
      wait time.
      Inputs
        number_lines (int)        : number of lines and registers at checkout
        customers (list)          : list of customers checkout times, one time
                                    for each customer
        num_trials (int)          : number of simulation iterations to run
        line_type_func (function) : the function that corresponds to the 
                                    grocery line configuration simulation
      Returns
        cust_wait_times (list) : list of customer wait times, one wait
                                 time for each simulation
    '''
    cust_wait_times = []
    # create list of customer check out durations based on exponential dist
    customers = list(np.random.exponential(0.5, customer_number))
    # run multiple trials
    for _ in range(num_trials):
        temp_wait = line_type_func(cashiers, customers)
        cust_wait_times.append(temp_wait)
    return cust_wait_timesNow, we have our system simulation all set up. The next step is to start running simulations to make optimal decision and understand the risks that our system is subject to!
A quick note on assumptions – With the way we wrote the system's simulation, we are making multiple assumptions. For example, we are assuming that customer duration at checkout is exponentially distributed with a lambda of 0.5; we are assuming that customers always go to the shortest line when there are multiple lines; all cashiers have the same proficiency; etc. Make sure that you understand the assumptions you are making (some assumptions are implicit in your simulation design choices, make sure you think through your design to discover and understand them!) and that you are okay with them.
Optimization
Alright, so we have everything set up and ready to start generating insights! As I've mentioned multiple times, we can use our set up to make optimal decisions.
We need to decide which line configuration is most optimal. With the framework we have now, we can run simulations to get to this answer! Let's run multiple simulations, varying the number of cashiers and the number of customers in line and see how the two configurations compare across multiple possible circumstances.
Let's write another function that calls the other functions to compare the two approaches:
def compare_trials(cashiers,
                   customer_number,
                   num_trials):
    '''
      Compares multiple trials of the two line configurations.
      Note that each trial is a paired comparison, it passes the
      same customer input into both lines and compares the KPIs
      Inputs: 
        cashiers (int)        : number of cashiers or cash 
                                      registers
        customer_number (int) : number of customers in the 
                                      system
         num_trials (int)      : number of trials to be run
      Outputs:
        single_line_watis (list) : list of wait times for the 
                                   last customer of each single
                                   line trial
        multi_line_waits (list)  : list of wait times for the 
                                   last customer of each multi
                                   line trial
    '''
    single_line_waits = []
    multi_line_waits = []
    for _ in range(num_trials):
        customers = list(np.random.exponential(0.5, customer_number))
        single_line_wait = single_line_trial(cashiers, customers)
        single_line_waits.append(single_line_wait)
        multi_line_wait = multiple_lines_trial(cashiers, customers)
        multi_line_waits.append(multi_line_wait)        
    return single_line_waits, multi_line_waitsWith this code now written, let's finally run the simulation and create some visualizations of the results:
cashier_based_waits = {}
# iterate thru different numbers of cashiers
for cashier in [2, 5, 8]:
    temp_single_wait_times = []
    temp_multi_wait_times = []
    # iterate thru different numbers of customers
    for cust_num in range(10, 150):
        temp_single_line, temp_multi_line = compare_trials(cashier,
                                                           cust_num,
                                                           150)
        # gather and save KPIs in lists
        temp_single_mean = np.mean(temp_single_line)
        temp_multi_mean = np.mean(temp_multi_line)
        temp_single_wait_times.append(temp_single_mean)
        temp_multi_wait_times.append(temp_multi_mean)
    cashier_based_waits[cashier] = (temp_single_wait_times,
                                    temp_multi_wait_times)Here's the code to make the plots:
# Create subplots
fig, axes = plt.subplots(1, 3, figsize=(15, 5))  # 1 row, 3 columns
x = range(10, 150)
# Plot data on each subplot
axes[0].plot(x, cashier_based_waits[2][0], label = 'single line')
axes[0].plot(x, cashier_based_waits[2][1], label = 'multi lines')
axes[0].legend()
axes[0].set_title('2 Cashiers')
axes[0].set_xlabel('Num Customers')
axes[0].set_ylabel('Wait times')
axes[1].plot(x, cashier_based_waits[5][0], label = 'single line')
axes[1].plot(x, cashier_based_waits[5][1], label = 'multi lines')
axes[1].set_title('5 Cashiers')
axes[1].set_xlabel('Num Customers')
axes[1].set_ylabel('Wait times')
axes[1].legend()
axes[2].plot(x, cashier_based_waits[8][0], label = 'single line')
axes[2].plot(x, cashier_based_waits[8][1], label = 'multi lines')
axes[2].set_title('8 Cashiers')
axes[2].set_xlabel('Num Customers')
axes[2].set_ylabel('Wait times')
axes[2].legend()
# Adjust layout
plt.tight_layout()
plt.savefig('wait_times.png')
# Show plot
plt.show()Alright, that was a lot of work, we finally have the results of our analysis, let's take a look!

What is the take away here? It looks like with 2 cashiers, the two approaches are essentially the same, but as the number of cashiers increases, differences start to emerge. The multi-line approach starts showing more variance and is almost always higher than the single-line approach. This make sense, because as there are more cashiers, each line is shorter (given the same number of customers) which means that there is a higher likelihood that the line may randomly have more slow customers (think about the variance of data with small sample sizes). Since the single line approach just has one line that is serviced by multiple registers, it is not subject to the same ‘small sample size' phenomena. Given these results, if there are no external factors that need to be considered and we feel that our simulation is representative of the real world we should use the single-line configuration to have shorter and less volatile wait times.
Note that we don't have to make this decision and never look back! We can decide to move forward with the single line strategy and then observe the actual wait times in our stores to see if we observe expected results. Perhaps we don't have 100% faith that our simulation is correct in recommending the single line over the multi-line. We could use the simulation's results to put most of our stores on the single-line set up (because we think it is the best) while putting a smaller fraction on the multi-line. We can then observe to see if our simulated conclusions play out in the real world. Even if we don't feel 100% good about our simulation, we can still use our learnings to inform how we gather more data in our real world experimentation.
Risk Assessments
System modelling with simulated data can also be used to find potential issues, weak points and bottle necks in a system. In this discipline, after we create the modelled system, we simulate extreme data that is meant to expose the system's weaknesses.
If we are running a system analysis for an airline, we could simulate data that represents the Denver airport closing for two days due to bad weather. We could then observe where and how badly this extreme scenario hits in our system. In the example of the grocery store configuration, we could simulate what would happen if a small percent of customers randomly took a really long time at a register (e.g. paying with checks, using a coupon for each item, buying groceries for a daycare).
From the insights we get from simulating and inputting extreme data to our system models, we can have a better understanding of where our system will break or strain under pressure. From this knowledge, we can place safeguards or other provisions in our system to protect against extreme circumstances.
Let's modify our simulated data to make 10% of customers take 3 times as long at the check out and see how that impacts our wait times.
Here, I added stress_factor and stress_pct parameters to the compare_trials function. These parameters allow the user to add a wait multiplier (stress_factor) to a randomly selected percentage of the customers (stress_pct).
Here is the updated code:
def compare_trials(cashiers,
                   customer_number,
                   num_trials,
                   stress_factor = 0,
                   stress_pct = 0.10):
 '''
      Compares multiple trials of the two line configurations.
      Has the capacity to stress the system by creating customers
      that have a checkout time that is increased by a multiplier.
      Note that each trial is a paired comparison, it passes the
      same customer input into both lines and compares the KPIs.
      Inputs: 
        cashiers (int)                 : number of cashiers or cash 
                                         registers
        customer_number (int)          : number of customers in the 
                                         system
        num_trials (int)               : number of trials to 
                                         be run
        stress_factor (float, def = 0) : multiplier to customers
                                         wait time if customer
                                         is selected as a stress
                                         customer
        stress_pct (float, def = 0.1)  : percent of customers that
                                         are randomly selected to
                                         be stress customers
      Outputs:
        single_line_watis (list) : list of wait times for the 
                                   last customer of each single
                                   line trial
        multi_line_waits (list)  : list of wait times for the 
                                   last customer of each multi
                                   line trial
    '''
    single_line_waits = []
    multi_line_waits = []
    for _ in range(num_trials):
        customers = list(np.random.exponential(0.5, customer_number))
        # add stress factor
        if stress_factor > 0:
            num_slow_custs = int(customer_number * stress_pct)
            slow_cust_index = np.random.choice(range(0, customer_number),
                                               num_slow_custs)
            for i in slow_cust_index:
                customers[i] = customers[i]*stress_factor
        single_line_wait = single_line_trial(cashiers, customers)
        single_line_waits.append(single_line_wait)
        multi_line_wait = multiple_lines_trial(cashiers, customers)
        multi_line_waits.append(multi_line_wait)        
    return single_line_waits, multi_line_waitsWith this new capability in place, let's run a scenario with 3 cashiers and 100 customers with and without stress – we'll then look at out the stress impacts the wait times for the our favorite configuration, the single line.
single_line_stress, multi_line_stress = compare_trials(3,
                                                   100,
                                                   100,
                                                   stress_factor = 3,
                                                   stress_pct = 0.1)
single_line_no_stress, multi_line_no_stress = compare_trials(3,
                                                   100,
                                                   100)
plt.hist(single_line_stress, label = 'Stressed Scenario', alpha = 0.5)
plt.hist(single_line_no_stress, label = 'Baseline Scenario', alpha = 0.5)
plt.legend()
plt.savefig('stressed_dist.png')
plt.show()
Wow, those 10% of slow customers really causes issues in our lines! We not only have a higher average wait time, but we also have higher variance! Now we can quantify the challenges associated with slower customers. If we feel like the results of the stress testing are above our tolerance threshold, we can look into potential mitigating strategies. Perhaps we could have an extra employee that typically works in another area but can be on call take slow customers at a register that is usually not open. Or maybe we could find what causes extremely slow wait times and train cashiers in techniques to speed up the process.
The main takeaway is that we can quantify the impact of specific stresses to our system and from that understanding, we can create proportionate counter measures to make sure our system performs within tolerance even in extreme situations.
Wrapping it up
System simulation is a low-cost, flexible way to generate insights about real world systems. Simulated systems and data can be used to optimize systems to increase efficiency. They can also be used to stress test systems for extreme scenarios. It is important to make sure that the simulated scenario and data are representative of the real world. We can take the insights we create and implement them immediately if we are confident in their validity, or we can use them to inform testing strategies to gather more confidence in the insights.

