Understand SQL Injection and Learn to Avoid It in Python with SQLAlchemy

Author:Murphy | View: 25215 | Time: 2025-03-23 18:58:45

Image by mohamed_hassan (Hosting Web Man) on Pixabay

SQL Injection is one of the most common and also most dangerous web security vulnerabilities which allows hackers to inject malicious SQL code into unvalidated and unsanitized plain SQL queries. It is also a common issue that new developers overlook.

The cause and solution for SQL Injection are actually pretty simple. In this post, we will explore SQL Injection with some simple queries and will pretend to be an attacker to exploit our database. At the end of this post, you will fully understand SQL Injection and will never make the mistake again after realizing its power and danger.

Preparation

As usual, we will create a MySQL database using Docker:

# Create a volume to persist the data.
$ docker volume create mysql8-data

# Create the container for MySQL.
$ docker run --name mysql8 -d -e MYSQL_ROOT_PASSWORD=root -p 13306:3306 -v mysql8-data:/var/lib/mysql mysql:8

# Connect to the local MySQL server in Docker.
$ docker exec -it mysql8 mysql -u root -proot

mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.31    |
+-----------+
1 row in set (0.00 sec)

Note that the root user is used for simplicity in this post, but it should never be used directly in our web applications in practice.

Then let's create some database and table to play with. The data set is the same as the one used in the previous series of posts for simplicity.

CREATE DATABASE `data`;

CREATE TABLE `data`.`student_scores` (
  `student_id` smallint NOT NULL,
  `subject` varchar(50) NOT NULL,
  `score` tinyint DEFAULT '0',
  PRIMARY KEY (`student_id`,`subject`),
  KEY `ix_subject` (`subject`),
  KEY `ix_score` (`score`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci
;

INSERT INTO `data`.student_scores
    (student_id, subject, score)
VALUES
    (1, 'Literature', 90),
    (1, 'Math', 60),
    (2, 'Literature', 80),
    (2, 'Math', 80),
    (3, 'Literature', 70),
    (3, 'Math', 95)
;

Since we will use SQLAlchemy in this post for database connection, we need to install the necessary libraries. As usual, it's recommended to create a separate virtual environment so that the libraries won't impact the system also other virtual environments.

conda create --name sql python=3.11
conda activate sql

pip install -U "SQLAlchemy>=2.0.0,<2.1.0"
pip install -U "pymysql>=1.0.0,<1.1.0"
pip install -U "cryptography>=40.0.0,<40.1.0"

Explore SQL Injection

Let's now create a simple function to read some data:

from sqlalchemy import create_engine, text

db_url = "mysql+pymysql://root:root@localhost:13306/data"
engine = create_engine(db_url, pool_size=5, pool_recycle=3600)
conn = engine.connect()

def read_student_scores(student_id):
    sql_text = text(f"""
        SELECT subject, score
        FROM data.student_scores
        WHERE student_id = {student_id}
    """)
    result = list(conn.execute(sql_text))

    print(result)

The read_student_scores() function looks seemingly normal from a simple coding point of view. However, there is a huge security issue that can be exploited by malicious users.

If we use it normally, it will just work normally:

read_student_scores(1)
# [('Literature', 90), ('Math', 60)]

However, it can also return something that is not supposed to be returned by evil users. The first thing to hack is to return all the records, even those that a user is not supposed to see:

read_student_scores('-1 OR 1')
# [('Literature', 90), ('Math', 60), ('Literature', 80), ('Math', 80), ('Literature', 70), ('Math', 95)]

This is possible because the read_student_scores() function doesn't clean and validate the input parameter and simply concatenates the input data with the original query.

This is not uncommon for many developers. Actually, I have seen quite some legacy code written in this way. It is lucky that they have not been hacked before. Or maybe they have …

SQL Injection can be more harmful than shown above and can actually return anything for hackers.

Now, let's pretend we are malicious users and try to get some information that is not supposed to be returned by this function.

The first thing a hacker wants to know is how many columns are returned. It is obvious in this example that two columns are returned. However, when the output is displayed with some user interface, it may not be that obvious.

There are many ways to guess how many columns are returned, two common ones are to use ORDER BY and UNION. Let's see how it works:

read_student_scores('-1 ORDER BY 1')
# []
read_student_scores('-1 ORDER BY 2')
# []
read_student_scores('-1 ORDER BY 3')
# OperationalError: (pymysql.err.OperationalError) (1054, "Unknown column '3' in 'order clause'")

From the above queries, the results, and the errors, we know that two columns are returned.

We can arrive at the same conclusion using UNION:

read_student_scores('-1 UNION SELECT 1')
# OperationalError: (pymysql.err.OperationalError) (1222, 'The used SELECT statements have a different number of columns')
read_student_scores('-1 UNION SELECT 1,2')
# [('1', 2)]

Using UNION we are able to guess the correct number of columns with a smaller number of tests. Actually, UNION is the most commonly used Hacking tool to exploit a database.

Let's try to read something that's not supposed to be returned normally:

Python">read_student_scores('-1 UNION SELECT DATABASE(), @@VERSION')
# [('data', '8.0.31')]

The database name and version are returned!

Let's see something even scarier:

read_student_scores('-1 UNION SELECT user, authentication_string FROM mysql.user')
# [('root', '$A$005$jx1cZx1aj*tx16_aIt.tkx1a0b8,6nT16rTboTxEGJsq8R.xLN1dlygQWOe12XurOijG5v9'), ('mysql.infoschema', '$A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED'), ('mysql.session', '$A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED'), ('mysql.sys', '$A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED'), ('root', '$A$005$x0c=x10gEx7f]gx18WQNnB`Y&I1x18zPIQ3wM3cj43wk4Qq4/Tt88B0ypKrwYLYnD3BpGqfY5')]

The username and authentication strings of all DB users are returned! Using some brute guessing tools, the hacker can hack the passwords in a short time, especially if simple passwords are used.

How to avoid SQL Injection?

Now that we have understood what Sql Injection is and how dangerous it can be, let's see how it can be avoided in practice.

The most efficient way to prevent SQL Injection is to use parametrized queries, which is achieved with the :param_name syntax in SQLAlchemy:

def read_student_scores(student_id):
    sql_text = text("""
        SELECT subject, score
        FROM data.student_scores
        WHERE student_id = :student_id
    """)
    result = list(conn.execute(sql_text, parameters={"student_id": student_id}))

    print(result)

Note that Sqlalchemy 2.0 is used in this post and thus the syntax for specifying parameters will be different from that in SQLAlchemy 1.x (normally 1.4).

Let's see what the evil queries will return with parametrized queries:

read_student_scores('-1 OR 1')
# []
read_student_scores('-1 UNION SELECT DATABASE(), @@VERSION')
# []
read_student_scores('-1 UNION SELECT user, authentication_string FROM mysql.user')
# []

All these evil queries return an empty result, which is much safer than before.

In this post, we have introduced what SQL Injection is, how it works, and how to avoid it using simple examples.

Even though using parameterized queries can prevent most instances of SQL injections, to make our applications even more robust, we should apply the following strategies as well:

Limit the permissions for the DB user to query the database. A root user is used in this example for simplicity, but it should never be used in practice. Actually, we should create a dedicated DB user (with a strong password) for our web application and only grant minimum permissions for it.
Clean and validate the input queries. If the type of the input data is not consistent or if it contains suspicious characters like hash (#), semicolon (;), minus (-), or even the word UNION, the case should be handled in a safe, robust, but also user-friendly way.
Never show debugging logs directly to end users. Debugging logs should only be used for internal users because they can contain sensitive information that can be used by malicious users to exploit the system.