Elliot Activation Function: What Is It and Is It Effective?

Elliot Activation Function (Image from Author)

Introduction

Are you in the middle of creating a new machine-learning model and unsure of what activation function you should be using?

But wait, what is an activation function?

Activation functions allow machine learning models to understand and solve nonlinear problems. In neural networks specifically, the activation function controls how much of each neuron's signal is passed on to the next layer. Today, the ReLU Activation Function is the default choice in most Neural Network architectures; however, that does not necessarily mean it is always the best one. (Check out my post below on the ReLU and LReLU Activations.)

Leaky ReLU vs. ReLU Activation Functions: Which is Better?

I recently came across the Elliot Activation Function, which has been praised as a possible alternative to various activation functions, including the Sigmoid and Hyperbolic Tangent. Today we will run an experiment to test the Elliot Activation Function's performance.

Experiment 1: Test the Elliot Activation Function's performance against the Sigmoid Activation Function and Hyperbolic Tangent Activation Function.

Experiment 2: Test the Elliot Activation Function's performance against the ReLU Activation Function.

The goal is to answer the question: Is the Elliot Activation Function effective or not?

Elliot Activation Function

The Elliot Activation Function produces a curve that closely approximates the Sigmoid and Hyperbolic Tangent Activation Functions. Because it avoids the exponential, some have found that it computes roughly 2x faster than the Sigmoid Activation Function [3]. In its sigmoid form, 0.5·x/(1+|x|) + 0.5, the Elliot Activation Function is constrained between 0 and 1, just like the Sigmoid Activation Function (the implementation used below omits the +0.5 shift, so its outputs lie between -0.5 and 0.5).

Elliot Activation Function (Image from Author)
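
To get a feel for the shape, here is a quick NumPy sketch (illustrative only, and separate from the Keras code below) comparing the shifted Elliot sigmoid with the logistic Sigmoid and tanh at a few points:

import numpy as np

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])

# Shifted Elliot sigmoid: bounded in (0, 1) and free of exponentials
elliot_sigmoid = 0.5 * x / (1 + np.abs(x)) + 0.5

# Logistic sigmoid and tanh for comparison
sigmoid = 1 / (1 + np.exp(-x))
tanh = np.tanh(x)

for xi, e, s, t in zip(x, elliot_sigmoid, sigmoid, tanh):
    print(f"x = {xi:+.1f} | elliot = {e:.3f} | sigmoid = {s:.3f} | tanh = {t:+.3f}")

The curves follow the same S-shape, although the Elliot approximation saturates more slowly than the true Sigmoid.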

Experiment

PROBLEM: Keras currently does not have the Elliot Activation Function in its repository.

SOLUTION: We can use the Keras backend and create it ourselves!

def elliot(x):
    # Elliot activation: (0.5 * x) / (1 + |x|) -- a fast, sigmoid-like curve
    # (keras.backend is imported as K in the setup section below)
    return (0.5 * x) / (1 + K.abs(x))

# The function can be passed straight to a layer via activation=elliot, or
# wrapped in a Keras Activation layer:
elliot_layer = Activation(elliot)

For this experiment, let's see how the Elliot Activation Function compares to its similar counterparts as well as the ReLU Activation, a fundamental activation function used in neural networks today.

Dataset and Setup

The first step of any Python project is to import the required packages.

import keras.backend as K
from keras.layers import Activation
from keras import layers
from keras import Sequential

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

The dataset used today is the Iris dataset, which can be found here. This dataset is publicly available for use (there is also an option to load it into Python through sklearn; a sketch of that alternative follows the preprocessing code below).

# The raw UCI file has no header row, so tell pandas not to treat the first sample as one
iris = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data", header=None)

# Preprocess the data
X = iris.iloc[:, :-1].values
y = iris.iloc[:, -1].values

# Encode the categorical output labels
encoder = LabelEncoder()
y = encoder.fit_transform(y)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
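
As mentioned above, the same dataset can also be pulled directly from scikit-learn. A minimal sketch of that alternative (load_iris already returns integer-encoded labels, so the LabelEncoder step is not needed):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# load_iris returns the features and integer-encoded species labels directly
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)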

Next, let's create the four models. These will be fairly simple: each has one hidden layer of 8 neurons with the activation function under test, followed by a final layer of 3 neurons with a Softmax Activation Function.

#Model 1 (Sigmoid)
model = Sequential()
model.add(layers.Dense(8, input_dim=4, activation='sigmoid'))
model.add(layers.Dense(3, activation='softmax'))

#Model 2 (Tanh)
model = Sequential()
model.add(layers.Dense(8, input_dim=4, activation='tanh'))
model.add(layers.Dense(3, activation='softmax'))

#Model 3 (ReLU)
model = Sequential()
model.add(layers.Dense(8, input_dim=4, activation='relu'))
model.add(layers.Dense(3, activation='softmax'))

#Model 4 (Elliot)
model = Sequential()
model.add(layers.Dense(8, input_dim=4, activation=elliot))
model.add(layers.Dense(3, activation='softmax'))
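
Since the four architectures differ only in the hidden-layer activation, one tidy way to build them (a sketch of an alternative organization, not necessarily how the runs below were produced) is with a small helper and a dictionary of activations:

def build_model(activation):
    # One hidden layer of 8 neurons, softmax output over the 3 iris classes
    model = Sequential()
    model.add(layers.Dense(8, input_dim=4, activation=activation))
    model.add(layers.Dense(3, activation='softmax'))
    return model

# 'elliot' is the custom function defined earlier
models = {name: build_model(act) for name, act in
          [('sigmoid', 'sigmoid'), ('tanh', 'tanh'), ('relu', 'relu'), ('elliot', elliot)]}

Each entry can then be compiled and trained with the same two calls shown next.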

Next, compile and train each model (these two steps are repeated for each of the four models), then analyze the results.

# Compile the model
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model (Pick the number of epochs)
model.fit(X_train, y_train, epochs=1, batch_size=10)
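
The metrics printed by model.fit are computed on the training data. To also check performance on the 20% test split created earlier, model.evaluate can be used; a quick sketch:

# Evaluate on the held-out test split created earlier
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f} | Test loss: {test_loss:.4f}")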

Results

The results were… pleasantly surprising, actually. As hoped, the Elliot Activation Function produced models with performance similar to (and sometimes better than) those using the Sigmoid and Hyperbolic Tangent Activation Functions.

Elliot versus Sigmoid

1 Epoch

  • Sigmoid @ 1: Accuracy: 0.3109 | Loss: 2.0030
  • Elliot @ 1: Accuracy: 0.3361 | Loss: 1.0866

At 1 epoch, the Elliot Activation Function model outperformed the Sigmoid Activation Function model, with roughly 2.5 percentage points higher accuracy and about half the loss.

10 Epochs

  • Sigmoid @ 10: Accuracy: 0.3529 | Loss: 1.0932
  • Elliot @ 10: Accuracy: 0.6891 | Loss: 0.9434

At 10 epochs, the model with the Elliot Activation Function reached roughly 34 percentage points higher accuracy, with a lower loss, than the model using the Sigmoid Activation Function.

100 Epochs

  • Sigmoid @ 100: Accuracy: 0.9496 | Loss: 0.4596
  • Elliot @ 100: Accuracy: 0.9580 | Loss: 0.5485

At 100 epochs, the Sigmoid model achieved a lower loss while the Elliot model reached slightly higher accuracy; overall, their performances were nearly identical.

1000 Epochs

  • Sigmoid @ 1000: Accuracy: 0.9832 | Loss: 0.0584
  • Elliot @ 1000: Accuracy: 0.9832 | Loss: 0.0433

At 1000 Epochs, the performances of the two different models were almost exactly the same.

Overall, the model using the Elliot Activation Function performed slightly better than the model using the Sigmoid Activation Function.

Elliot versus Hyperbolic Tangent

1 Epoch

  • tanh @ 1: Accuracy: 0.3361 | Loss: 1.1578
  • Elliot @ 1: Accuracy: 0.3361 | Loss: 1.0866

At 1 epoch, the Elliot Activation Function model performs essentially the same as the Hyperbolic Tangent Activation Function model (identical accuracy, slightly lower loss). I expected these functions to produce similarly performing models since each constrains the values passed on to the next layer of the neural network in a similar way.

10 Epochs

  • tanh @ 10: Accuracy: 0.3277 | Loss: 0.9981
  • Elliot @ 10: Accuracy: 0.6891 | Loss: 0.9434

The model with the Elliot Activation Function greatly outperforms the model with the Hyperbolic Tangent Activation Function, just as the Elliot Model did at 10 Epochs when compared to the Sigmoid Model.

100 Epochs

  • tanh @ 100: Accuracy: 0.9916 | Loss: 0.2325
  • Elliot @ 100: Accuracy: 0.9580 | Loss: 0.5485

At 100 epochs, the Hyperbolic Tangent model performs much better than the Elliot model. At higher epoch counts, the Elliot Activation Function seems to underperform the tanh activation function, so let's see how the two compare at 1000 epochs.

1000 Epochs

  • tanh @ 1000: Accuracy: 0.9748 | Loss: 0.0495
  • Elliot @ 1000: Accuracy: 0.9832 | Loss: 0.0433

Well, at 1000 epochs, the Elliot Activation Function model slightly outperforms the Hyperbolic Tangent Activation Function model.

Overall, I would say the Hyperbolic Tangent and Elliot Activation Functions behave almost identically in the layers of a neural network. There could be a difference in training time; however, these models were very simple, and time may become a bigger factor with larger datasets and larger network architectures.

Elliot versus ReLU

1 Epoch

  • ReLU @ 1: Accuracy: 0.6639 | Loss: 1.0221
  • Elliot @ 1: Accuracy: 0.3361 | Loss: 1.0866

At 1 epoch, the model with the ReLU Activation Function does much better, which suggests that the Elliot Activation Function causes the model to train more slowly in the early epochs.

10 Epochs

  • ReLU @ 10: Accuracy: 0.6471 | Loss: 0.9413
  • Elliot @ 10: Accuracy: 0.6891 | Loss: 0.9434

Wow! The model containing the Elliot activation actually performed better than the model with the ReLU Activation Function, with 4.2 percentage points higher accuracy and a nearly identical loss (0.9434 vs. 0.9413).

100 Epochs

  • ReLU @ 100 : Accuracy: 0.9160 | Loss: 0.4749
  • Elliot @ 100: Accuracy: 0.9580 | Loss: 0.5485

Even though the model adopting the Elliot Activation Function had a higher loss, it achieved 4.2 percentage points higher accuracy. Again, this shows the strength of the Elliot Activation Function when placed within a neural network.

1000 Epochs

  • ReLU @ 1000: Accuracy: 0.9916 | Loss: 0.0494
  • Elliot @ 1000: Accuracy: 0.9832 | Loss: 0.0433

While the model with the Elliot Activation Function did not do better in terms of accuracy, its loss was lower, and I was still happy with the results. As shown at 1000 epochs, the Elliot Activation Function was almost as good as the ReLU Activation Function, and with the right problem and hyperparameter tuning, it could be the better choice.

Conclusion

Today, we looked at a lesser-known activation function: the Elliot Activation Function. To test its performance, it was first compared against two activation functions with a similar shape: the Sigmoid and Hyperbolic Tangent Activation Functions. In that trial, the Elliot Activation Function performed as well as, if not better than, either of those two functions within the body of a neural network. Next, we compared the Elliot Activation Function to the ReLU Activation Function, the standard in Neural Networks today. In 2 of the 4 trials, the model adopting the Elliot Activation Function performed better; in the trials where it underperformed, its results were still almost exactly the same as those of the model using the ReLU Activation Function. I recommend trying the Elliot Activation Function in your next Neural Network because there is a chance it may perform better!

If you enjoyed today's reading, PLEASE give me a follow and let me know if there is another topic you would like me to explore! If you do not have a Medium account, sign up through my link here! I will receive a small commission when you use my link. Additionally, add me on LinkedIn, or feel free to reach out! Thanks for reading!

  1. Dubey, Shiv Ram, Satish Kumar Singh, and Bidyut Baran Chaudhuri. "A comprehensive survey and performance analysis of activation functions in deep learning." arXiv preprint arXiv:2109.14545 (2021).
  2. Sharma, Sagar, Simone Sharma, and Anidhya Athaiya. "Activation functions in neural networks." Towards Data Science 6.12 (2017): 310–316.
  3. https://www.gallamine.com/2013/01/a-sigmoid-function-without-exponential_31.html

Tags: Activation Functions Data Science Deep Learning Machine Learning Python
