Python: Computing Integrals the Right Way
When dealing with scientific computing or physics problems, it is very common to need to compute the integral of a function.

In this short post, I want to demonstrate 3 ways you can compute the integral of a 1D function in Python. We will split the approaches into 2 cases:
- First case: computing integral of a sampled function
- Second case: computing integral of a generic function
In the first case, the function we want to integrate has already been sampled at some sample points, and we don't have access to the underlying "true" function. In other words, we don't know the "formula" of that function, and we cannot sample it at any other point: we only have arrays of x values and the corresponding y values.
In the second case, we will consider that we have a function object, to which we can pass a sample point and it'll return the value of that function at that point. This is the ideal situation because we have access to all the information about the function.
Note that we can always fall back on the sampled approach: if we still have access to the function, we can choose sample points, sample the function, and use the first approach. But as we will see, the choice of those sample points is precisely the important part of computing clean integrals.
Integral definition
In this first post, we will focus on a single-variable function that returns a single value. The integral we want to compute is the standard definition:

I = ∫ₐᵇ f(x) dx
Now our goal is to compute I as accurately as possible, within the limits of what we know about f. As an example, we'll try to compute the following integral:

I = ∫₀¹ (2x + cos(πx/2)) dx
Using basic math, you can show that the value of that integral is

I = 1 + 2/π ≈ 1.6366197723675813
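If you want to double-check that closed-form value, a quick symbolic computation confirms it. This is just a sketch, assuming sympy is available:

import sympy as sp

x = sp.symbols('x')
# integrate the same integrand symbolically over [0, 1]
I_exact = sp.integrate(2*x + sp.cos(2*sp.pi/4*x), (x, 0, 1))
print(I_exact)

sympy returns the exact expression, which simplifies to 1 + 2/π, matching the hand computation.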
In this post, we'll see different approaches to compute the integral, and see how close we are to the true value.
First approach: Integrate sampled functions
Let's say we retrieve data from a sampled function, in the form of an X array and the corresponding values in a Y array. How would you compute the integral under that (X, Y) curve?
As the function is already sampled, we are not dealing with a continuous signal but with a discrete one. In other words, we don't know "f", we only know a sequence of its values:

yᵢ = f(xᵢ), for i = 0, …, N−1
Hence, it seems logical to transpose the integral formula to discrete notation, where the integral sign becomes a sum and "dx" becomes the distance between consecutive x samples:

I ≈ Σᵢ (xᵢ₊₁ − xᵢ) · yᵢ₊₁
Seems logical, right? Let's see how that works for our example. To compute the differences of the xs, we'll use numpy's diff function. To keep things simple, we'll use equally spaced xs generated with numpy's linspace function:
import numpy as np

def f(x):
    return 2*x + np.cos(2*np.pi/4*x)

expected_value = 1 + 2/np.pi

N = 101
xs = np.linspace(0, 1, N)
ys = f(xs)

# rectangle rule: each interval width times the value at its right endpoint
I = np.sum(np.diff(xs) * ys[1:])
print(f"Integral estimated to: {I}")
print(f"Error: {I/expected_value-1:.4%}")
Integral estimated to: 1.641606682344361
Error: 0.3047%
Only 0.3% error is pretty good. We should be able to improve the estimate using more points, right?
N = 1000
xs = np.linspace(0, 1, N)
ys = f(xs)
I = np.sum(np.diff(xs) * ys[1:])
print(f"Integral estimated to: {I}")
print(f"Error: {I/expected_value-1:.4%}")
Integral estimated to: 1.6371201417061898
Error: 0.0306%
Better, but still not perfect.
The error is due to boundary effects: notice that we never use the first value of y (y_0), and that each interval width (x_i+1 − x_i) is multiplied by the value of f at the right endpoint x_i+1. This approach is called the "rectangle rule" or "Riemann sum", and corresponds to the following geometry:

It's basically the simplest, and worst, approach to estimate an integral from sampled values.
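To make the rectangle rule concrete, here's a small sketch (reusing the same f and sample points as above) comparing the right-endpoint sum we just used with its left-endpoint mirror:

import numpy as np

def f(x):
    return 2*x + np.cos(2*np.pi/4*x)

expected_value = 1 + 2/np.pi

xs = np.linspace(0, 1, 101)
ys = f(xs)
dx = np.diff(xs)

# right-endpoint rectangles (the sum used above): overshoots our increasing f
I_right = np.sum(dx * ys[1:])
# left-endpoint rectangles: drops the last sample instead of the first
I_left = np.sum(dx * ys[:-1])

print(f"right-endpoint error: {I_right/expected_value-1:.4%}")
print(f"left-endpoint error:  {I_left/expected_value-1:.4%}")

The two variants err in opposite directions on this function, and averaging them gives exactly the trapezoidal rule, which is the natural next step.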
To improve on this significantly, we should instead use numpy's trapz, which is specifically designed for this problem: it uses the trapezoidal rule to compute the integral from an X vector and a Y vector. Basically, it computes the integral using the following geometry:

Let's see how it performs:
I = np.trapz(ys, x=xs)
print(f"Integral estimated to: {I}")
print(f"Error: {I/expected_value-1:.8%}")
Integral estimated to: 1.636606682344361
Error: -0.00079982% # compared to 0.3047% for the rectangle rule
So about 1e-3% error! That's pretty good compared to the 0.3% of the rectangle rule.
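For intuition, the trapezoidal rule is easy to write by hand: each interval contributes its width times the average of its two endpoint values. A minimal sketch, using the same f and samples as above:

import numpy as np

def f(x):
    return 2*x + np.cos(2*np.pi/4*x)

xs = np.linspace(0, 1, 101)
ys = f(xs)

# width of each interval times the average of its two endpoint values
I_manual = np.sum(np.diff(xs) * (ys[:-1] + ys[1:]) / 2)
print(f"Integral estimated to: {I_manual}")

This reproduces np.trapz's result; the library call just saves you the boilerplate.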
Using more points (assuming we can have more, which is not always possible) we get:
# for N = 1000
Integral estimated to: 1.6366196412056895
Error: -0.00000801%
About 1e-5% error, starting to be pretty decent.
To improve even further, here's what we can do:
- Increase the number of sampled points (if possible): as a general rule, increasing the number of points will always decrease the error, but with diminishing returns; past a certain point, adding samples becomes less and less efficient. Note that this requires that we still have "access" to the function, and if that is the case, you'd better use the second approach. Also, evaluating f can sometimes be time-consuming.
- Make assumptions: when using the rectangle or trapezoidal rules, we make the underlying assumption that the function is "constant" (or linear, for the trapezoidal rule) between consecutive x samples. If we "allow" it, we can make other assumptions, like the function behaving as a polynomial of a certain order, and use that information to compute the integral. Given that we use the "sampled approach", we usually don't have access to more information about the true function, so no assumption (constant, linear, polynomial) is inherently better than another; it's just different.
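As an example of the polynomial assumption above, scipy's integrate module provides simpson, which fits a parabola through each pair of consecutive intervals of the sampled data. A sketch reusing the same 101 samples as before:

import numpy as np
from scipy.integrate import simpson

def f(x):
    return 2*x + np.cos(2*np.pi/4*x)

expected_value = 1 + 2/np.pi

xs = np.linspace(0, 1, 101)
ys = f(xs)

# Simpson's rule: quadratic interpolation over each pair of intervals
I = simpson(ys, x=xs)
print(f"Error: {I/expected_value-1:.10%}")

With the very same samples, the error drops by several more orders of magnitude, because a parabola tracks this smooth integrand much better than the straight segments of the trapezoidal rule.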
In the end, both of these improvements require access to, and/or more knowledge of, the true function. Which is why we'll move on to the second approach!
Second approach: integrate a continuous function
In this case, we still have access to the function f.
For cases like that, there are a lot of possibilities to compute integrals. Trying to understand them, how they work, their pros and cons, is a good start to grasp the complexity of those problems.
Which is where scipy.integrate comes into play: this module provides lots of integration schemes, and the default settings will be quite enough most of the time.
In our case, we are going to use the quad function, which is the general purpose integration function for 1-dimensional functions.
Let's see how it does for our problem:
from scipy.integrate import quad
I = quad(f, 0, 1)[0] # integrate f between 0 and 1
print("Best integral value", I)
print(f"Error {I/expected_value-1:.16%}")
Best integral value 1.6366197723675815
Error 0.0000000000000000%
Tada! Numerical-precision integration, right out of the box. So remember: rather than sampling the function f yourself and using a trapezoid to compute the integral, let Scipy do the hard work!
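One more thing worth knowing: quad actually returns a pair, the integral estimate and an estimate of the absolute error, and you can tighten the tolerances if the defaults aren't enough. A short sketch:

from scipy.integrate import quad
import numpy as np

def f(x):
    return 2*x + np.cos(2*np.pi/4*x)

# quad returns (value, estimated absolute error)
value, abserr = quad(f, 0, 1)
print(f"I = {value}, estimated absolute error = {abserr:.2e}")

# optional knobs: absolute/relative tolerances and the subdivision limit
value_tight, _ = quad(f, 0, 1, epsabs=1e-12, epsrel=1e-12, limit=200)

Checking abserr is a cheap sanity test before trusting the result, especially for integrands less well-behaved than this one.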
Scipy is truly filled with amazingly useful tools for many applications. Check out its documentation from time to time.
Wrap up
So remember:
- if you only have X and Y vectors sampled from an unknown function f, use numpy's trapz to quickly and safely compute the integral using the trapezoidal rule. It's probably the best compromise between complexity and accuracy.
- if you have access to f, use scipy's quad function to compute the integral: it'll probably give you amazing results out of the box, and you can further customize the integration scheme with parameters if needed.
In the next post, I'll show how to compute integrals of complex-valued functions, so stay tuned!
If you're considering joining Medium, use this link to quickly subscribe and become one of my referred members:
and subscribe to get a notification when I publish a new post:
Finally, you can check out some of my other posts, on the Fourier transform or linear algebra techniques for data science:
Fourier-transform for time-series : detrending
PCA/LDA/ICA : a components analysis algorithms comparison
PCA-whitening vs ZCA-whitening : a numpy 2d visual
300-times faster resolution of Finite-Difference Method using numpy