How does a decision tree know the next best question to ask from the data?

Author: Murphy | Time: 2025-03-23 12:02:00

Introduction

Decision trees are versatile machine learning algorithms that can perform both classification and regression tasks. They make decisions by asking questions about the features of the data, following an IF-ELSE structure along a path that ultimately leads to the final prediction. The challenge is to work out which question to ask at each step of the decision-making process, which is equivalent to determining the best split at each decision node.
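To make the IF-ELSE structure concrete, here is a minimal sketch of what a small tree amounts to once it has been built. The function name `predict_fish` and both thresholds are hypothetical values chosen purely for illustration; they are not learned from the article's dataset.

```python
def predict_fish(length, weight):
    """Toy hand-written tree. The thresholds are hypothetical, not learned."""
    if length <= 4.0:            # question asked at the root node
        if weight <= 5.0:        # question asked at the next decision node
            return "salmon"      # leaf: final prediction
        else:
            return "tuna"
    else:
        return "tuna"


print(predict_fish(length=3.0, weight=4.0))
print(predict_fish(length=6.5, weight=4.0))
```

Each `if` corresponds to a decision node and each `return` to a leaf. Building a tree means choosing which feature and which threshold go into each `if`.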

In this article, we will attempt to build a decision tree for a simple binary classification task. The objective of this article is to understand how an impurity measure (e.g. entropy) is used at each node to determine the best split, eventually constructing a tree-like structure that uses a rule-based approach to get to the final prediction.

For intuition about entropy and Gini impurity (another metric used to measure randomness and judge the quality of a split in decision trees), check out this article.
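As a rough illustration of how these impurity measures can score a candidate split, the sketch below uses the standard textbook definitions of entropy, Gini impurity, and information gain. The helper names, the toy `lengths` and `types` lists, and the thresholds are assumptions made up for this example; they are not the article's dataset or code.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing, so nothing is gained
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted


# Made-up toy data for illustration only.
lengths = [2.0, 3.0, 4.0, 6.0, 7.0, 8.0]
types = ["salmon", "salmon", "salmon", "tuna", "tuna", "tuna"]

print(entropy(types))  # evenly mixed two-class node
print(gini(types))
print(information_gain(lengths, types, 5.0))  # separates the classes cleanly
print(information_gain(lengths, types, 3.5))  # leaves one side mixed
```

On this toy data the threshold of 5.0 yields a higher information gain than 3.5, which is the sense in which one question is "better" than another at a node.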

Problem definition and data

Problem: Given its length and weight measurements, predict whether a fish is tuna or salmon.

The target variable is the type of fish, and the features are its weight and length. This is a binary classification task because the target variable, type, can take only two values: tuna or salmon.

You can download the dataset from here.

You are highly encouraged to code along as you read this article to get the most out of it.
