Introduction to PyTorch: from training loop to prediction

In this post we will cover how to implement a logistic regression model using PyTorch in Python.
PyTorch is one of the most popular deep learning frameworks among data scientists and machine learning engineers, so learning this tool is an essential step if you want to build a career in applied AI.
It sits alongside TensorFlow, another well-known deep learning framework developed by Google.
There are no notable fundamental differences between the two, apart from the structure and organization of their APIs, which can differ considerably.
While both frameworks allow us to create very complex neural networks, PyTorch is generally preferred for its more Pythonic style and the freedom it gives the developer to integrate custom logic into the software.
We will use the Sklearn breast cancer dataset, an open source dataset I have already used in some of my previous articles, to train a binary classification model.
The goal is to explain how to:
- go from a pandas dataframe to PyTorch's Datasets and DataLoaders
- create a neural network for binary classification in PyTorch
- train the network and use it to make predictions
- evaluate the performance of our model with utility functions and matplotlib
By the end of this article we will have a clear idea of how to create a neural network in PyTorch and how the training loop works.
Let's get started!
Install PyTorch and other dependencies
We start our project by creating a virtual environment in a dedicated folder.
Visit this link to learn how to create a virtual environment with Conda.
How to Set Up a Development Environment for Machine Learning
Once our virtual environment has been created, we can run the following command in the terminal:
$ pip install torch -U
This will install the latest version of PyTorch, which as of this writing is version 2.0.
In a notebook, we can check the library version with torch.__version__ after running import torch.
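As a quick sanity check, something like this should print the installed version (the exact version string will depend on your installation):
import torch
print(torch.__version__)  # e.g. 2.0.0, depending on your installation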
We can verify that PyTorch is correctly installed in the environment by importing and launching a small test script, as shown in the official guide.
import torch
x = torch.rand(5, 3)
print(x)
>>> tensor([[0.3890, 0.6087, 0.2300],
        [0.1866, 0.4871, 0.9468],
        [0.2254, 0.7217, 0.4173],
        [0.1243, 0.1482, 0.6797],
        [0.2430, 0.4608, 0.8886]])
If the script executes correctly, we are ready to proceed with the project. Otherwise, I suggest the reader refer to the official guide at https://pytorch.org/get-started/locally/.
Let's continue with the installation of the additional dependencies:
- Sklearn: pip install scikit-learn
- Pandas: pip install pandas
- Matplotlib: pip install matplotlib
Libraries like Numpy are installed automatically when you install PyTorch.
Import and explore the dataset
Let's start by importing the installed libraries and breast cancer dataset from Sklearn with the following code snippet
import torch
import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt
breast_cancer_dataset = load_breast_cancer(as_frame=True, return_X_y=True)
Let's create a dataframe dedicated to holding our X and y like this
df = breast_cancer_dataset[0]
df['target'] = breast_cancer_dataset[1]
df

Our goal is to create a model that can predict the target column based on the characteristics in the other columns.
Let's do some minimal exploratory analysis to get familiar with the dataset. We will use the sweetviz library to automatically create an analysis report.
We can install sweetviz with pip install sweetviz and create an EDA (exploratory data analysis) report with this piece of code
import sweetviz as sv
eda_report = sv.analyze(df)
eda_report.show_notebook()

Sweetviz will create a report right in our notebook for us to explore.

We see how several columns are strongly associated with a value of 0 or 1 in our target column.
Being a multidimensional dataset and having variables with different distributions, a neural network is a valid option to model this data. That said, this dataset can also be modeled by simpler models, such as decision trees.
We will now import two other libraries in order to visualize the dataset. We will use PCA (Principal Component Analysis) from Sklearn and Seaborn to visualize the multidimensional dataset.
PCA will help us compress the large number of variables into just two, which we will use as the X and Y axis in a Seaborn scatterplot. Seaborn takes an additional parameter called hue to color the dots based on an additional variable. We will use our target.
import seaborn as sns
from sklearn import decomposition
pca = decomposition.PCA(n_components=2)
X = df.drop("target", axis=1).values
y = df['target'].values
vecs = pca.fit_transform(X)
x0 = vecs[:, 0]
x1 = vecs[:, 1]
sns.set_style("whitegrid")
sns.scatterplot(x=x0, y=x1, hue=y)
plt.title("Proiezione PCA")
plt.xlabel("PCA 1")
plt.ylabel("PCA 2")
plt.xticks([])
plt.yticks([])
plt.show()

We see how class 1 data points group together based on common characteristics. It will be the goal of our neural network to classify the rows as target 0 or 1.
Create the datasets and dataloaders classes
PyTorch provides Dataset and DataLoader objects that allow us to efficiently organize and load our data into the neural network.
It would be possible to use pandas directly, but iterating over a dataframe batch by batch is slower and more cumbersome, so our code would be less efficient.
The Dataset class allows us to specify the right format for our data and apply the retrieval and transformation logic that is often fundamental (think of the data augmentation applied to images).
Let's see how to create a PyTorch Dataset object.
from torch.utils.data import Dataset

class BreastCancerDataset(Dataset):
    def __init__(self, X, y):
        # create feature tensors
        self.features = torch.tensor(X, dtype=torch.float32)
        # create label tensors
        self.labels = torch.tensor(y, dtype=torch.long)

    def __len__(self):
        # we define a method to retrieve the length of the dataset
        return self.features.shape[0]

    def __getitem__(self, idx):
        # necessary override of the __getitem__ method, which helps to index our data
        x = self.features[idx]
        y = self.labels[idx]
        return x, y
This class inherits from Dataset and allows the DataLoader, which we will create shortly, to efficiently retrieve batches of data. The class takes X and y as input.
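To illustrate how indexing works, here is a small sketch with made-up arrays (the random data is a placeholder, not our real dataset):
x_demo = np.random.rand(10, 30)               # 10 rows, 30 features
y_demo = np.random.randint(0, 2, size=10)     # 10 binary labels
demo_dataset = BreastCancerDataset(x_demo, y_demo)
print(len(demo_dataset))       # 10, via __len__
features, label = demo_dataset[0]              # via __getitem__
print(features.shape, label)   # torch.Size([30]) and a scalar label tensor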
Training, validation and test datasets
Before proceeding to the following steps, it is important to create training, validation and test sets.
These will help us evaluate the performance of our model and understand the quality of the predictions.
For the interested reader, I suggest reading the article 6 Things You Should Do Before Training Your Model and what is cross-validation in machine learning to better understand why splitting our data into three partitions is an effective method for performance evaluation.
With Sklearn this becomes easy with the train_test_split method.
from sklearn import model_selection

train_ratio = 0.50
validation_ratio = 0.25
test_ratio = 0.25

x_train, x_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=1 - train_ratio)
x_val, x_test, y_val, y_test = model_selection.train_test_split(x_test, y_test, test_size=test_ratio / (test_ratio + validation_ratio))

print(x_train.shape, x_val.shape, x_test.shape)
>>> (284, 30) (142, 30) (143, 30)
With this small snippet of code we created our training, validation and test sets according to controllable splits.
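One optional refinement, not used in the snippet above: passing stratify and random_state to train_test_split keeps the 0/1 class proportions identical across partitions and makes the splits reproducible. A hedged sketch of the first split:
x_train, x_temp, y_train, y_temp = model_selection.train_test_split(
    X, y,
    test_size=1 - train_ratio,
    stratify=y,       # preserve the class balance in both partitions
    random_state=42,  # arbitrary seed, only for reproducibility
)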
Data normalization
When doing deep learning, even for a simple task like binary classification, it is always necessary to normalize our data.
Normalizing means bringing all the values of the various columns in the dataset to the same numerical scale. This helps the neural network converge more effectively and thus make accurate predictions faster.
We will use Sklearn's StandardScaler.
from sklearn import preprocessing
scaler = preprocessing.StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_val_scaled = scaler.transform(x_val)
x_test_scaled = scaler.transform(x_test)
Notice how fit_transform is applied only to the training set, while transform is applied to the other two datasets. This is to avoid data leakage – information from our validation or test set unintentionally influencing training. We want our training set to be the only source of learning, unaffected by test data.
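To make the point concrete, here is a tiny sketch (the numbers are placeholders) showing that transform reuses the statistics learned from the training data:
demo_scaler = preprocessing.StandardScaler()
demo_train = np.array([[1.0], [2.0], [3.0]])
demo_test = np.array([[2.0]])
demo_scaler.fit_transform(demo_train)    # learns mean=2.0 and std from the training data only
print(demo_scaler.mean_)                 # [2.]
print(demo_scaler.transform(demo_test))  # [[0.]] — scaled with the *training* statistics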
This data is now ready to be input to the BreastCancerDataset class.
train_dataset = BreastCancerDataset(x_train_scaled, y_train)
val_dataset = BreastCancerDataset(x_val_scaled, y_val)
test_dataset = BreastCancerDataset(x_test_scaled, y_test)
We import the dataloader and initialize the objects.
from torch.utils.data import DataLoader

train_loader = DataLoader(
    dataset=train_dataset,
    batch_size=16,
    shuffle=True,
    drop_last=True,
)

val_loader = DataLoader(
    dataset=val_dataset,
    batch_size=16,
    shuffle=False,
    drop_last=True,
)

test_loader = DataLoader(
    dataset=test_dataset,
    batch_size=16,
    shuffle=False,
    drop_last=True,
)
The power of the DataLoader is that it allows us to specify whether to shuffle our data and in what batch size the data should be supplied to the model. The batch size is considered a hyperparameter of the model and can therefore impact the results of our inferences.
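A quick way to see what the loader yields is to pull a single batch; a minimal sketch:
features, labels = next(iter(train_loader))
print(features.shape)  # torch.Size([16, 30]) — batch_size x num_features
print(labels.shape)    # torch.Size([16]) — one label per example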
Neural network implementation in PyTorch
Creating a model in PyTorch might sound complex, but it really only requires understanding a few basic concepts.
- When writing a model in PyTorch, we use an object-based approach, as with datasets. It means that we create a class like class MyModel, which inherits from PyTorch's nn.Module class.
- PyTorch is autodifferentiation software. It means that when we write a neural network based on the backpropagation algorithm, the derivatives needed to minimize the loss are computed automatically behind the scenes. This requires writing some dedicated code that might get confusing the first time around.
I advise the reader who wants to know the basics of how neural networks work to consult the article Introduction to neural networks – weights, biases and activation
That said, let's see what the code for writing a logistic regression model looks like.
import torch.nn as nn

class LogisticRegression(nn.Module):
    """
    Our neural network accepts num_features and num_classes.

    num_features: number of features to learn from
    num_classes: number of outputs to produce (here 1, a single probability, since the target is binary: 0 or 1)
    """
    def __init__(self, num_features, num_classes):
        super().__init__()  # initialize the __init__ method of nn.Module
        self.num_features = num_features
        self.num_classes = num_classes
        # create a single layer of neurons on which to apply the log reg
        self.linear1 = nn.Linear(in_features=num_features, out_features=num_classes)

    def forward(self, x):
        logits = self.linear1(x)  # pass our data through the layer
        probs = torch.sigmoid(logits)  # apply a sigmoid to obtain the probability of belonging to class 1
        return probs  # return probabilities
Our class inherits from nn.Module. This class provides, behind the scenes, the methods that make the model work.
init method
The __init__ method of a class contains the logic that runs when instantiating a class in Python. Here we pass two arguments: the number of features and the number of classes to predict.
num_features corresponds to the number of columns that make up our dataset minus the target variable, while num_classes corresponds to the number of results that the neural network must return.
In addition to the two arguments and their class variables, we see the super().__init__() line. The super function initializes the init method of the parent class. This gives us the functionality of nn.Module within our model.
Still in the init block, we implement a linear layer called self.linear1, which takes as arguments the number of features and the number of results to return.
forward() method
By writing the forward method we tell Python to override the same method in PyTorch's nn.Module parent class. In fact, this method is called when performing a forward pass – that is, when our data passes from one layer to another.
forward accepts an input x which contains the features on which the model will calibrate its performance.
The input passes through the first layer, creating the logits variable. The logits are the raw outputs of the neural network, not yet converted into probabilities by the final activation function, which in this case is a sigmoid. They are the internal representation of the neural network before being mapped to a function that makes them interpretable.
The sigmoid function maps the logits to probabilities between 0 and 1. If the output is less than 0.5, the class will be 0, otherwise it will be 1. This happens in the line probs = torch.sigmoid(logits).
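To see this mapping concretely, here is a tiny sketch with made-up logits (the values are arbitrary placeholders):
demo_logits = torch.tensor([-2.0, 0.0, 3.0])
demo_probs = torch.sigmoid(demo_logits)
print(demo_probs)                           # tensor([0.1192, 0.5000, 0.9526])
print(torch.where(demo_probs > 0.5, 1, 0))  # tensor([0, 0, 1])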
Utility functions for plotting and accuracy calculation
Let's create utility functions to use in the training loop that we will see shortly. These two are used to compute the accuracy at the end of each epoch and to display the performance curves at the end of the training.
def compute_accuracy(model, dataloader):
    """
    This function puts the model in evaluation mode (model.eval()) and calculates the accuracy with respect to the input dataloader
    """
    model = model.eval()
    correct = 0
    total_examples = 0
    for idx, (features, labels) in enumerate(dataloader):
        with torch.no_grad():
            # our model returns probabilities (sigmoid output), not raw logits
            probs = model(features)
        predictions = torch.where(probs > 0.5, 1, 0)
        lab = labels.view(predictions.shape)
        comparison = lab == predictions
        correct += torch.sum(comparison)
        total_examples += len(comparison)
    return correct / total_examples
def plot_results(train_loss, val_loss, train_acc, val_acc):
    """
    This function takes lists of values and creates side-by-side graphs to show training and validation performance
    """
    fig, ax = plt.subplots(1, 2, figsize=(15, 5))
    ax[0].plot(
        train_loss, label="train", color="red", linestyle="--", linewidth=2, alpha=0.5
    )
    ax[0].plot(
        val_loss, label="val", color="blue", linestyle="--", linewidth=2, alpha=0.5
    )
    ax[0].set_xlabel("Epoch")
    ax[0].set_ylabel("Loss")
    ax[0].legend()
    ax[1].plot(
        train_acc, label="train", color="red", linestyle="--", linewidth=2, alpha=0.5
    )
    ax[1].plot(
        val_acc, label="val", color="blue", linestyle="--", linewidth=2, alpha=0.5
    )
    ax[1].set_xlabel("Epoch")
    ax[1].set_ylabel("Accuracy")
    ax[1].legend()
    plt.show()
Model training
Now we come to the part where most deep learning newcomers struggle: the PyTorch training loop.
Let's look at the code and then comment on it.
import torch.nn.functional as F

model = LogisticRegression(num_features=x_train_scaled.shape[1], num_classes=1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

num_epochs = 10

train_losses, val_losses = [], []
train_accs, val_accs = [], []

for epoch in range(num_epochs):
    model = model.train()
    t_loss_list, v_loss_list = [], []
    for batch_idx, (features, labels) in enumerate(train_loader):
        train_probs = model(features)
        # binary_cross_entropy expects float targets, so we cast the integer labels
        train_loss = F.binary_cross_entropy(train_probs, labels.view(train_probs.shape).float())
        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print(
                f"Epoch {epoch+1:02d}/{num_epochs:02d}"
                f" | Batch {batch_idx:02d}/{len(train_loader):02d}"
                f" | Train Loss {train_loss:.3f}"
            )
        t_loss_list.append(train_loss.item())

    model = model.eval()
    for batch_idx, (features, labels) in enumerate(val_loader):
        with torch.no_grad():
            val_probs = model(features)
            val_loss = F.binary_cross_entropy(val_probs, labels.view(val_probs.shape).float())
            v_loss_list.append(val_loss.item())

    train_losses.append(np.mean(t_loss_list))
    val_losses.append(np.mean(v_loss_list))

    train_acc = compute_accuracy(model, train_loader)
    val_acc = compute_accuracy(model, val_loader)
    train_accs.append(train_acc)
    val_accs.append(val_acc)
    print(
        f"Train accuracy: {train_acc:.2f}"
        f" | Val accuracy: {val_acc:.2f}"
    )
Unlike TensorFlow, PyTorch requires us to write a training loop in pure Python.
Let's see the procedure step by step:
- We instantiate the model and the optimizer
- We decide on a number of epochs
- We create a for loop that iterates through the epochs
- For each epoch, we set the model to training mode with model.train() and cycle through the train_loader
- For each batch of the train_loader, we compute the loss, reset the gradients to zero with optimizer.zero_grad(), backpropagate with train_loss.backward(), and update the weights of the network with optimizer.step()
At this point the core training loop is complete; a condensed skeleton of just these steps follows below, and you can apply the same logic to the validation dataloader, as written in the code.
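Stripped of logging and validation, the canonical pattern reduces to a handful of lines. A minimal sketch, assuming model, optimizer, F, and train_loader are already defined as above:
for features, labels in train_loader:
    outputs = model(features)                    # forward pass
    loss = F.binary_cross_entropy(outputs, labels.view(outputs.shape).float())
    optimizer.zero_grad()                        # reset gradients from the previous step
    loss.backward()                              # backpropagation: compute gradients
    optimizer.step()                             # update the weights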
Here is the result of training after running this code

Neural network performance evaluation
We use the previously created utility function to plot the training and validation loss and accuracy curves.
plot_results(train_losses, val_losses, train_accs, val_accs)

Our binary classification model quickly converges to high accuracy, and we see how the loss drops epoch after epoch.
The dataset turns out to be simple to model, and the small number of examples means we don't see the more gradual convergence toward high performance that a larger dataset would show.
I emphasize that it is possible to integrate the TensorBoard software into PyTorch to be able to log performance metrics automatically between the various experiments.
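As a hedged sketch of what that integration can look like (it assumes the tensorboard package is installed, and the run directory name is arbitrary), one creates a SummaryWriter and logs the epoch metrics from inside the loop:
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/logreg_breast_cancer")  # hypothetical run name
# inside the epoch loop one would log, for example:
# writer.add_scalar("Loss/train", np.mean(t_loss_list), epoch)
# writer.add_scalar("Loss/val", np.mean(v_loss_list), epoch)
writer.close()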
Create predictions
We have reached the end of this guide. Let's see the code to create predictions for our entire dataset.
# we transform all our features with the scaler
X_scaled_all = scaler.transform(X)

# transform into tensors
X_scaled_all_tensors = torch.tensor(X_scaled_all, dtype=torch.float32)

# we set the model in inference mode and create the predictions
with torch.inference_mode():
    probs = model(X_scaled_all_tensors)  # the model outputs probabilities
    predictions = torch.where(probs > 0.5, 1, 0)

df['predictions'] = predictions.numpy().flatten()
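As a quick sanity check before the full report, comparing the two columns directly gives the overall accuracy:
print((df['predictions'] == df['target']).mean())  # fraction of correctly classified rows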
Now let's import the metrics package from Sklearn, which allows us to quickly calculate the confusion matrix and classification report directly on our pandas dataframe.
from sklearn import metrics

# classification_report returns a preformatted string, so a plain print renders it correctly
print(metrics.classification_report(y_pred=df.predictions, y_true=df.target))

And the confusion matrix, which shows the number of correct predictions on the diagonal
metrics.confusion_matrix(y_pred=df.predictions, y_true=df.target)
>>> array([[197,  15],
       [ 13, 344]])
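If you prefer a visual version, Sklearn also ships a display helper for this; a minimal sketch:
cm = metrics.confusion_matrix(y_pred=df.predictions, y_true=df.target)
metrics.ConfusionMatrixDisplay(confusion_matrix=cm).plot()
plt.show()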
Here is a small function to create a classification line that separates the classes in the PCA graph
def plot_boundary(model):
    w1 = model.linear1.weight[0][0].detach()
    w2 = model.linear1.weight[0][1].detach()
    b = model.linear1.bias[0].detach()

    x1_min = -1000
    x2_min = (-(w1 * x1_min) - b) / w2

    x1_max = 1000
    x2_max = (-(w1 * x1_max) - b) / w2

    return x1_min, x1_max, x2_min, x2_max

# capture the boundary endpoints before plotting
x1_min, x1_max, x2_min, x2_max = plot_boundary(model)

sns.scatterplot(x=x0, y=x1, hue=y)
plt.title("PCA Projection")
plt.xlabel("PCA 1")
plt.ylabel("PCA 2")
plt.xticks([])
plt.yticks([])
plt.plot([x1_min, x1_max], [x2_min, x2_max], color="k", label="Classification", linestyle="--")
plt.legend()
plt.show()
And here's how the model separates benign from malignant cells

Conclusions
In this article we have seen how to create a binary classification model with PyTorch, starting from a Pandas dataframe.
We've seen what the training loop looks like, how to evaluate the model, and how to create predictions and visualizations to aid interpretation.
With PyTorch it is possible to create very complex neural networks … just think that Tesla, the electric car manufacturer, uses PyTorch to build the models behind its AI features.
For those who want to start their deep learning journey, learning PyTorch as early as possible becomes a high priority task as it allows you to build important technologies that can solve complex data-driven problems.
If you want to support my content creation activity, feel free to follow my referral link below and join Medium's membership program. I will receive a portion of your investment and you'll be able to access Medium's plethora of articles on Data Science and more in a seamless way.
Recommended Reads
For those interested, here is a list of books I recommend for each ML-related topic. These are ESSENTIAL books in my opinion and have greatly impacted my professional career. Disclaimer: these are Amazon affiliate links. I will receive a small commission from Amazon for referring you to these items. Your experience won't change and you won't be charged more, but it will help me scale my business and produce even more content around AI.
- Intro to ML: Confident Data Skills: Master the Fundamentals of Working with Data and Supercharge Your Career by Kirill Eremenko
- Sklearn / TensorFlow: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurelien Géron
- NLP: Text as Data: A New Framework for Machine Learning and the Social Sciences by Justin Grimmer
- Sklearn / Pytorch: Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python by Sebastian Raschka
- Data Viz: Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole Knaflic
Useful Links (written by me)
- Learn how to perform a top-tier Exploratory Data Analysis in Python: Exploratory Data Analysis in Python – A Step-by-Step Process
- Learn the basics of TensorFlow: Get started with TensorFlow 2.0 – Introduction to deep learning
- Perform text clustering with TF-IDF in Python: Text Clustering with TF-IDF in Python