Hyperparameter Optimization With Hyperopt - Intro & Implementation
Hyperopt is an open-source hyperparameter optimization tool that I personally use to improve my Machine Learning projects and have found quite easy to implement. Hyperparameter optimization is the process of identifying the best combination of hyperparameters for a machine learning model with respect to an objective function (this is usually framed as "minimizing" the objective function for consistency). To use an analogy, each machine learning model comes with various knobs and levers that we can tune until we get the outcome we are looking for from the model; the act of finding the right combination of those knobs and levers is hyperparameter optimization. Examples of such hyperparameters include the learning rate, the architecture of a neural network (e.g. the number of hidden layers), and the choice of optimizer.
If you are interested in exploring other hyperparameter optimization strategies, such as grid search, random search and Bayesian optimization, check out the post below:
Hyperparameter Optimization – Intro and Implementation of Grid Search, Random Search and Bayesian…
Let's get started!
1. Basics
1.1. Concepts and Installation
Let's first define some relevant concepts for using Hyperopt.
- Objective Function: This is the function that hyperparameter optimization tries to minimize. More specifically, the objective function accepts a combination of hyperparameters as input and returns the model's error level (a.k.a. loss) given those hyperparameters. The goal of hyperparameter optimization is to find the combination of hyperparameters that minimizes this error/loss.
- Search Space: The range of input values (i.e. hyperparameters) that the Objective Function accepts as arguments (a short sketch of how these are expressed in code follows this list).
- Optimization Algorithm: As the name suggests, this is the algorithm used to minimize the Objective Function. Hyperopt utilizes different search algorithms, such as random search and Tree of Parzen Estimators (TPE) (documentation).
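To make the Search Space concept more concrete, here is a small sketch of some common Hyperopt space primitives. The hyperparameter names are purely illustrative, and the snippet runs once Hyperopt is installed in the next step:

from hyperopt import hp

# A few common ways to express a search space (names are illustrative)
space = {
    'learning_rate': hp.uniform('learning_rate', 0.001, 0.1),  # sampled uniformly from [0.001, 0.1]
    'reg_strength': hp.loguniform('reg_strength', -5, 0),      # exp(uniform(-5, 0)); useful for scale-like parameters
    'optimizer': hp.choice('optimizer', ['adam', 'sgd']),      # a categorical choice among options
    'batch_size': hp.quniform('batch_size', 16, 256, 16),      # uniform, rounded to multiples of 16
}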
Now that we are familiar with the concepts, let's install Hyperopt by running the command below:
pip install hyperopt
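To verify that the installation worked, you can print the installed version (a quick check, assuming the package exposes __version__ as recent releases do):

import hyperopt
print(hyperopt.__version__)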
Now that we have the library installed, we will first walk through a very simple example to get a handle on how Hyperopt works. After that, we will move on to more interesting and complicated examples.
1.2. Simple Example
Let's start with a very simple one to help us understand how the overall process of hyperparameter optimization with Hyperopt works. We will use the quadratic function f(x) = (x - 1)², whose minimum is at x = 1 (its derivative f′(x) = 2(x - 1) vanishes there), so we know what to expect. Since it's been a while since any of us took a calculus course, let's also look at a plot of the function, which helps us better understand how that point minimizes it. The code block below generates the plot:
# Import libraries
import numpy as np
import matplotlib.pyplot as plt

# Define the function
def f(x):
    return (x - 1) ** 2

# Generate x values from -4 to 6
x = np.linspace(-4, 6, 100)

# Calculate corresponding y values
y = f(x)

# Find the minimum point
min_point = np.min(y)

# Create the plot
plt.plot(x, y, label='f(x) = (x-1)^2')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('f(x) = (x-1)^2')

# Set the x-axis limits
plt.xlim(-4, 6)

# Add a horizontal dashed line at the minimum point
plt.axhline(y=min_point, color='red', linestyle='dashed', label='Minimum Point')

# Add a legend
plt.legend()

# Display the plot
plt.show()
Results:
As we can see, the minimum occurs at x = 1. Let's implement this using Hyperopt and see how it works.
In order to do so, we will take the following steps:
- Import necessary libraries and packages
- Define the objective function and the search space
- Run the optimization process
- Print the results (i.e. the optimized point, which we expect to be x = 1)
The code block below follows the steps above:
# 1. Import necessary libraries and packages
from hyperopt import hp, fmin, tpe, Trials

# 2. Define the objective function and the search space
def objective_function(x):
    return (x - 1) ** 2

search_space = hp.uniform('x', -2, 2)

# 3. Run the optimization process
# Trials object to store the results
trials = Trials()

# Run the optimization
best = fmin(fn=objective_function,
            space=search_space,
            algo=tpe.suggest,
            trials=trials,
            max_evals=100)

# 4. Print the results
print(best)
Results:
"best" returns the best combination of hyperparameters that the optimization was able to find, and in this case it is almost exactly x = 1, as we expected! The process of implementing Hyperopt is generally the same from here on, so now that we have walked through a simple example, let's move on to a more advanced one.
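Before we do, note that the Trials object we passed to fmin recorded every evaluation along the way. A quick sketch of inspecting it, using attributes that Trials exposes:

# Inspect the recorded trials
print(len(trials.trials))           # number of evaluations performed
print(min(trials.losses()))         # lowest loss observed across trials
print(trials.best_trial['result'])  # result dictionary of the best trial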
2. Hyperopt Implementation
We will implement two separate examples as follows:
- A classification with Support Vector Machine
- A regression with Random Forest Regressor
We will walk through the details of each of these two examples.
2.1. Support Vector Machines and Iris Data Set
In a previous post I used Grid Search, Random Search and Bayesian Optimization for hyperparameter optimization using the Iris data set provided by scikit-learn. The Iris data set includes petal and sepal measurements for three iris species and is a commonly-used data set for classification exercises. In this post, we will use the same data set, but this time with a Support Vector Machine (SVM) as the model and two hyperparameters that we can optimize as follows:
- C: Regularization parameter, which trades off misclassification of training examples against simplicity of the decision surface.
- gamma: Kernel coefficient, which defines how much influence a single training example has. The larger gamma is, the closer other examples must be to be affected.
Since the goal of this exercise is to walk through hyperparameter optimization, I will not go deeper into how SVMs work, but if you are interested, I find this scikit-learn post helpful.
We will generally follow the same steps that we used in the simple example earlier but will also visualize the process at the end:
- Import necessary libraries and packages
- Define the objective function and the search space
- Run the optimization process
- Visualize the optimization
2.1.1. Step 1 – Import Libraries and Packages
Let's import the libraries and packages and then load the data set.
# Import libraries and packages
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Load Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
2.1.2. Step 2 – Define Objective Function and Search Space
Let's first start with defining the objective function, which will train an SVM and return the negative of the cross-validation score – that is what we want to minimize. Note that we minimize the negative of the cross-validation score to stay consistent with the general goal of "minimizing" the objective function (instead of "maximizing" the cross-validation score).
def objective_function(parameters):
    # Initiate SVC with the sampled hyperparameters
    clf = SVC(**parameters)

    # Calculate the mean cross-validation score using 5 folds
    score = cross_val_score(clf, X, y, cv=5).mean()

    return -score
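Before launching the full search, it can be worth sanity-checking the objective on a single hand-picked combination (the values here are arbitrary, purely for illustration):

# Quick sanity check with arbitrary, hand-picked values;
# prints the negative mean cross-validation accuracy
print(objective_function({'C': 1.0, 'gamma': 0.1}))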
Next we will define the search space, which consists of the values that our parameters C and gamma can take. Note that we will use Hyperopt's hp.uniform(label, low, high), which returns a value sampled uniformly between "low" and "high" (source).
# Search space
search_space = {
    'C': hp.uniform('C', 0.1, 10),
    'gamma': hp.uniform('gamma', 0.01, 1)
}
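As an aside, C and gamma are scale-like parameters that often span orders of magnitude, so a log-uniform prior can be a reasonable alternative. A sketch, not used in the rest of this post:

import numpy as np
from hyperopt import hp

# hp.loguniform(label, low, high) returns exp(uniform(low, high)),
# so we pass log-scale bounds
search_space_log = {
    'C': hp.loguniform('C', np.log(0.1), np.log(10)),
    'gamma': hp.loguniform('gamma', np.log(0.01), np.log(1)),
}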
2.1.3. Step 3 – Run Optimization
Same as the simple example earlier, we will use the TPE algorithm and store the results in a Trials object.
# Trials object to store the results
trials = Trials()

# Run optimization
best = fmin(fn=objective_function,
            space=search_space,
            algo=tpe.suggest,
            trials=trials,
            max_evals=100)
Results:
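As another aside, instead of relying solely on max_evals, newer versions of Hyperopt can stop the search once the loss stops improving. A sketch that assumes hyperopt >= 0.2.4, which ships hyperopt.early_stop.no_progress_loss:

from hyperopt.early_stop import no_progress_loss

# Stop if the best loss has not improved for 20 consecutive trials
# (assumes hyperopt >= 0.2.4)
best_early = fmin(fn=objective_function,
                  space=search_space,
                  algo=tpe.suggest,
                  trials=Trials(),
                  max_evals=200,
                  early_stop_fn=no_progress_loss(20))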
2.1.4. Step 4 – Visualize Optimization
As we remember from the simple example, "best" includes the selected set of hyperparameters that Hyperopt found based on the implemented optimization strategy. Let's look at the results!
print(best)
As expected, we now have a combination of hyperparameters that minimizes the objective function, found using Hyperopt.
Let's visually look at how the objective function values change as the hyperparameters change. We will start by defining a function named plot_obj_vs_hp() that accomplishes this visualization, and then use it to visualize the results. Make sure to look for the red dot – it indicates the best combination of hyperparameters, according to our hyperparameter optimization!
# Import libraries
import matplotlib.pyplot as plt

def plot_obj_vs_hp(trials, search_space, best):
    # Extract the results
    results = trials.trials

    # Create a list of hyperparameters
    hyperparameters = list(search_space.keys())

    # Create a new figure with 2 subplots side by side
    fig, axes = plt.subplots(1, 2, figsize=(12, 6))

    # Loop through hyperparameters and generate plots
    # (hp_name avoids shadowing hyperopt's hp module)
    for idx, hp_name in enumerate(hyperparameters):
        # Extract the values of a given hyperparameter
        hp_values = [res['misc']['vals'][hp_name] for res in results]

        # Flatten the list of values
        hp_values = [item for sublist in hp_values for item in sublist]

        # Extract the corresponding objective function values
        objective_values = [res['result']['loss'] for res in results]

        # Create the scatter plot
        axes[idx].scatter(hp_values, objective_values, label='Trial Hyperparameter Combinations')

        # Highlight the best hyperparameters
        axes[idx].scatter(best[hp_name], min(objective_values), color='red', label='Best Hyperparameter Combination')

        axes[idx].set_xlabel(f'{hp_name}')
        axes[idx].set_ylabel('Loss')
        axes[idx].set_title(f'Loss vs. {hp_name}')
        axes[idx].legend(loc='upper right')

    plt.tight_layout()
    plt.show()
# Plot optimization vs. hyperparameters in 2D plot_obj_vs_hp(trials, search_space, best)
Results:
Note that since C and gamma are not directly related to each other, we show each of them separately against the objective function values. Since we want the objective function to be minimized, we are looking for the lowest points in the plots above; based on the results of the hyperparameter optimization, that is where {'C': 5.164418859504847, 'gamma': 0.07084064498886927}, which results in an objective function loss of around -0.986 and is indicated by the red dot.
I was also curious to look at these plots in three dimensions, so I created the function below to accomplish that. Let's look at the plot.
# Import libraries
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Define 3D plot function
def plot_obj_vs_hp_3d(trials, search_space, best):
    # Extract the results
    results = trials.trials

    # Create a list of hyperparameters
    hyperparameters = list(search_space.keys())

    # Extract the values of hyperparameters
    hp_values_0 = [res['misc']['vals'][hyperparameters[0]] for res in results]
    hp_values_1 = [res['misc']['vals'][hyperparameters[1]] for res in results]

    # Flatten the lists of values
    hp_values_0 = [item for sublist in hp_values_0 for item in sublist]
    hp_values_1 = [item for sublist in hp_values_1 for item in sublist]

    # Extract the corresponding objective function values
    objective_values = [res['result']['loss'] for res in results]

    # Create a new figure
    fig = plt.figure(figsize=(10, 7))

    # Add a 3D subplot
    ax = fig.add_subplot(111, projection='3d')

    # Create the scatter plot
    scatter = ax.scatter(hp_values_0, hp_values_1, objective_values,
                         c=objective_values, cmap='viridis', label='Trial hyperparameters')

    # Highlight the best hyperparameters
    ax.scatter(best[hyperparameters[0]], best[hyperparameters[1]], min(objective_values),
               color='red', label='Best hyperparameters')

    # Add labels using hyperparameters from search_space
    ax.set_xlabel(hyperparameters[0])
    ax.set_ylabel(hyperparameters[1])
    ax.set_zlabel('Loss')
    ax.set_title('Loss Across Hyperparameters')

    fig.colorbar(scatter)
    ax.legend(loc='upper right')
    plt.show()
# Plot optimization vs. hyperparameters in 3D plot_obj_vs_hp_3d(trials, search_space, best)
Results:
Admittedly, this is not very easy to read, but let's give it a shot. We are looking for the lowest loss, which corresponds to the darkest dots on the plot (the red dot is almost hidden behind one of them). Visually, this aligns with the two-dimensional plots we generated before.
Next, let's focus on a regression example.
2.2. Random Forest and Diabetes Data Set
This example focuses on a regression model that attempts to predict a measure of diabetes disease progression one year after baseline. This data set is also taken from scikit-learn, but the difference is that it is mainly used for regression (instead of classification, which we looked at with the Iris example).
If you are interested in learning more about the differences between regression and classification, check out the post below.
Classification vs. Regression in Machine Learning – Which One Should I Use?
We will use a Random Forest Regressor model for this example and will optimize the objective function for two hyperparameters as follows:
- n_estimators: Number of trees in the random forest
- max_depth: Maximum depth of the trees in the random forest
The overall process of optimization is the same as what we have done so far. So, let's break it down into our four usual steps!
2.2.1. Step 1 – Import Libraries and Packages
We will start with importing the libraries and packages and then loading the data set.
# Import libraries and packages
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from hyperopt import fmin, tpe, hp, Trials
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Load Diabetes dataset
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target
2.2.2. Step 2 – Define Objective Function and Search Space
Similar to last time, let's start with defining the objective function, which will train our Random Forest Regressor and return the negative of the cross-validation score.
Next we will define the search space, which consists of the values that our parameters n_estimators and max_depth can take. Note that we will use Hyperopt's hp.choice(label, options), which takes a name for the hyperparameter (i.e. label) and the possible values (i.e. options) for that hyperparameter (source).
# Define objective function
def objective_function(parameters):
    # Initiate RandomForestRegressor
    regressor = RandomForestRegressor(**parameters)

    # Calculate the mean cross-validation score using 5 folds
    score = cross_val_score(regressor, X, y, cv=5).mean()

    return -score

# Define search space
search_space = {
    'n_estimators': hp.choice('n_estimators', range(10, 300)),
    'max_depth': hp.choice('max_depth', range(1, 30)),
}
2.2.3. Step 3 – Run Optimization
Same as the examples earlier, we will use the TPE algorithm and store the results in a Trials object.
# Trials object to store the results
trials = Trials()

# Run optimization
best = fmin(fn=objective_function,
            space=search_space,
            algo=tpe.suggest,
            trials=trials,
            max_evals=100)
Results:
2.2.4. Step 4 – Visualize Optimization
As we remember from the simple example, "best" includes the selected set of hyperparameters that Hyperopt found based on the implemented optimization strategy. Let's look at the results!
print(best)
Results:
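One subtlety worth flagging: because we defined this search space with hp.choice, the dictionary returned by fmin contains the indices of the selected options, not the option values themselves. Hyperopt's space_eval helper maps the indices back to actual values:

from hyperopt import space_eval

# hp.choice stores the *index* of the chosen option in "best";
# space_eval converts those indices back into the actual values
best_params = space_eval(search_space, best)
print(best_params)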
As expected, we now have a combination of hyperparameters that minimizes the objective function, found using Hyperopt. Let's visually look at how the objective function values change as the hyperparameters change, using the functions we defined before to create the plots in two and three dimensions.
# Plot optimization vs. hyperparameters in 2D plot_obj_vs_hp(trials, search_space, best)
Results:
# Plot optimization vs. hyperparameters in 3D plot_obj_vs_hp_3d(trials, search_space, best)
Results:
Conclusion
In this post, we introduced Hyperopt, a powerful open-source hyperparameter optimization tool, and then walked through implementation examples in the context of classification via a Support Vector Machine and regression via a Random Forest Regressor. Then we looked at the best combinations of hyperparameters found through these processes and visualized the results in two and three dimensions.
Thanks for Reading!
If you found this post helpful, please follow me on Medium and subscribe to receive my latest posts!
(All images, unless otherwise noted, are by the author.)