Optimization: Geometrical Interpretation of the Newton-Raphson Method


Numerical optimization rests on a handful of fundamental techniques, and alongside gradient descent, the Newton-Raphson method stands out as one of the most significant. This method possesses notable qualities in terms of its simplicity, elegance, and computational power, warranting an in-depth exploration.

Within this article, our objective is to elucidate the geometric principles underlying the functioning of the Newton-Raphson method. This elucidation aims to provide readers with an intuitive understanding of its mechanics and dispel any potential complexities associated with its mathematical foundations.

Subsequently, in order to establish a robust mathematical framework for our discussion, we will delve into the mathematical intricacies of the method, accompanied by a practical implementation in the Python programming language.

Following this presentation, we will distinguish between the two primary applications of the Newton-Raphson method: root finding and optimization. This differentiation will clarify the distinct contexts in which the method finds utility.

To conclude, we will conduct a comparative analysis between the Newton-Raphson method and the gradient descent method, offering insights into their respective strengths and weaknesses.

If you're interested in mathematical concepts and want to learn them quickly thanks to Python, have a look at my book:

Unveiling 70 Mathematical Concepts with Python: A Practical Guide to Exploring Mathematics Through…


A graphical overview

An iterative process for finding a root. Image by the author.

Fundamentally, the Newton-Raphson method is an iterative procedure designed for the numerical determination of the root of a mathematical equation. Its primary objective is to ascertain a value, denoted as x_root, such that the equation f(x_root) = 0 is satisfied. It is imperative to note that due to its numerical nature, the value of x_root obtained through this method is an approximation.

The core principle governing this method is as follows: commencing with an initial point denoted as x_0, the aim is to generate a sequence of values, denoted as x_i, which progressively converges towards the sought-after value of x_root. To achieve this, the function is locally approximated in the vicinity of the point (x_i, f(x_i)) through a process known as linearization. This approximation is accomplished by computing the tangent line to the curve of the function at the point (x_i, f(x_i)).

The determination of the root for this localized approximation is straightforward: it corresponds to the point of intersection between this tangent line and the x-axis. The abscissa value x_i+1 of this intersection serves as the updated approximation for the root, x_root.

This iterative process is reiterated until the value of f(x) approaches sufficiently close to zero, indicating proximity to the root. The figure provided above serves as an illustrative representation of this process, depicting the tangents drawn at each iteration and their intersections with the x-axis.

In this specific example, the underlying function employed is a polynomial, (x-6)(x-6)(x-6), which possesses a triple root at x=6. It is evident that after a few iterations, approximately 10 in this instance, the method converges to a point in close proximity to the theoretical root, x_root=6.

The code used to draw this figure follows:

<script src="https://gist.github.com/kayhman/0bdf7e11d9fdb738fb3f03b26f5b6fb7.js"></script>

True to my usual practices, I rely on automatic differentiation in the code above, particularly using the JAX library to efficiently compute the derivative of f.
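
For readers who cannot view the embedded gist, here is a minimal sketch of the underlying iteration (without the plotting code); the starting point, tolerance, and iteration cap are assumptions of mine and are not taken from the original gist:

```python
import jax
import jax.numpy as jnp

def f(x):
    # Cubic with a triple root at x = 6, as in the figure above
    return (x - 6.0) ** 3

df = jax.grad(f)  # derivative of f obtained by automatic differentiation

def newton_root(f, df, x0, tol=1e-6, max_iter=50):
    # Iterate x_{i+1} = x_i - f(x_i) / f'(x_i) until f(x) is close enough to zero
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if jnp.abs(fx) < tol:
            break
        x = x - fx / df(x)
    return x

print(newton_root(f, df, x0=9.0))  # converges towards the triple root at x = 6
```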

Mathematical derivation of the method

After giving you a feel for how the Newton-Raphson method works, it's time to see how it can be derived mathematically.

Given a function f(x) and a starting point x_i, the idea is to linearize the function at the point (x_i, f(x_i)). This is exactly what we did in the previous section by drawing the tangent at this point.

Formally speaking, this can be formulated as a first-order Taylor expansion of f at x_i:

f(x_i + delta) ≈ f(x_i) + f'(x_i) · delta

First-order Taylor expansion. Formula by the author.

This gives the linearization of f around x_i, with delta as the variable.

Given this formula, following the geometrical method depicted above amounts to computing delta, the update to add to x_i so that the linearization intersects the x-axis, i.e.:

f(x_i) + f'(x_i) · delta = 0, which gives delta = -f(x_i) / f'(x_i) and thus x_i+1 = x_i - f(x_i) / f'(x_i)

Compute x_i+1. Formula by the author.
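
To make this update rule concrete, take the cubic from the first figure, f(x) = (x-6)(x-6)(x-6), whose derivative is f'(x) = 3(x-6)**2. The update becomes x_i+1 = x_i - (x_i-6)/3: each iteration removes one third of the remaining distance to the root, so the error shrinks by a factor of 2/3 per step, which is consistent with the roughly ten iterations needed to get close to x_root = 6 in that example.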

Newton-Raphson method for root finding

As mentioned in the introduction, the Newton-Raphson method can be applied to address two primary issues: root finding and function optimization. In this section, we will specifically focus on root finding.

First, let's revisit the concept of root finding. In a geometric context, it essentially entails identifying the abscissas, denoted as x, where a function f intersects the x-axis. This aligns with the fundamental idea illustrated in the initial figure of this article. From an algebraic standpoint, this translates to the task of determining x such that f(x) = 0.

Consequently, the Newton-Raphson method has been designed to tackle this precise challenge: pinpointing the values of x where the curve of a function f intersects the x-axis.

Newton-Raphson method for optimization

As previously elucidated, the primary problem addressed by this method is root finding. However, in many instances, the Newton-Raphson method is not employed to locate roots but rather for optimization. In other words, it is used to identify the values of x at which the function f reaches either a maximum or a minimum point.

How is this possible? Initially, it may appear that root finding and optimization are entirely distinct challenges. However, they are intricately connected. This connection arises from the fact that a function reaches its extremum at a point where its slope changes sign, meaning its derivative becomes zero. Consequently, seeking the extremum of a function is equivalent to identifying the root of its derivative.

The application of the Newton-Raphson method to this latter scenario is relatively straightforward. The process involves substituting f with its derivative f' and replacing f' with its second derivative f'', which yields the update x_i+1 = x_i - f'(x_i) / f''(x_i).

The snippet below shows how to adapt the method to maximize the highly non-linear function -(-(x-6)**2 / 3.0 + log(6+x) + 2*sin(x)):

<script src="https://gist.github.com/kayhman/29626769dc51e9d0cae04632c053b3d9.js"></script>

This code finds the x where the function reaches its maximum by applying the Newton-Raphson method to find the root of f', and plots the successive points explored:

Optimizing a clearly non-linear function. Image by the author.
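
In case the gist above does not render, the loop could be sketched as follows (without the plotting); the starting point and the number of iterations are illustrative assumptions of mine, not values taken from the original snippet:

```python
import jax
import jax.numpy as jnp

def f(x):
    # Objective used in the text; its stationary point lies near x ≈ 7.407
    return -(-(x - 6.0) ** 2 / 3.0 + jnp.log(6.0 + x) + 2.0 * jnp.sin(x))

df = jax.grad(f)    # first derivative f'
d2f = jax.grad(df)  # second derivative f''

x = 8.0  # assumed starting point
for _ in range(20):
    # Newton-Raphson step applied to f': find the root of the derivative
    x = x - df(x) / d2f(x)

print(x)  # ≈ 7.4074, the stationary point of f
```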

Geometrical interpretation of optimization using the Newton-Raphson method

There are essentially two ways to interpret the adaptation of the Newton-Raphson method for optimization. The first approach involves applying the geometrical interpretation of the method used for root finding but extending it to the derivative, denoted as f', of the function f. You can find this detailed in the earlier section of this article.

The second, more intriguing perspective asserts that in the context of optimization, the function f is no longer approximated locally by a straight line (i.e., a first-order polynomial) but instead by a second-order polynomial. In a geometric sense, this means that the curve of f is approximated by a parabola. In this scenario, the objective is not to determine the intersection of the local approximation with the x-axis, as we do in root finding, but rather to locate the extremum of this parabola.

As the saying goes, "an image is worth a thousand words," and the visual representation provided below helps illustrate this concept effectively.

Optimizing by fitting a parabola. Figure by the author.

Basically, the algorithm works as follows:

  • Pick a point
  • Fit a parabola, i.e. a polynomial p(x) = ax**2 + bx + c, at this point so that it goes through the same point and has the same first and second derivatives as f.
  • Use the abscissa x of the extremum of the parabola as the new starting point
  • Iterate

Pythonically speaking, this gives:

<script src="https://gist.github.com/kayhman/2f8707126c0f25dd58322278b691e54d.js"></script>

Please note that this geometrical re-interpretation of the Newton-Raphson method gives exactly the same result (7.4074383 in this case) as the previous implementation, and converges as quickly. It's just a bit less efficient, as we need to compute the three weights of the polynomial p.

Conditions of application

The Newton-Raphson method is a very elegant and powerful method. However, when applied to optimization, it requires some conditions to work.

First, the function to optimize must be twice differentiable. Second, it is necessary to start from a good approximation of the optimal solution, to avoid converging towards an undesired local extremum.

Newton-Raphson is not Gradient Descent

Another very popular method for optimization is gradient descent. However, the two methods differ on several points.

First, Newton-Raphson, when applied to optimization, requires that the function f is twice differentiable. In contrast, Gradient Descent only requires first-order differentiation.

Second, Newton-Raphson requires less tuning, as there is no learning rate to configure. The next point is automatically given by the extremum of the fitted parabola.

Third, this direct estimation of the next point, without the need to set a learning rate, generally ensures faster convergence. More precisely, the convergence is quadratic for the Newton-Raphson method (near the solution), whereas it is only linear for Gradient Descent.
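
To make the contrast concrete, here is a sketch of plain gradient descent on the same objective as above; unlike the Newton-Raphson loops shown earlier, it only uses the first derivative but needs a hand-picked learning rate, and it typically takes noticeably more iterations to reach comparable accuracy (the learning rate and iteration count below are illustrative assumptions):

```python
import jax
import jax.numpy as jnp

def f(x):
    # Same objective as before; gradient descent minimises it directly
    return -(-(x - 6.0) ** 2 / 3.0 + jnp.log(6.0 + x) + 2.0 * jnp.sin(x))

df = jax.grad(f)  # only first-order information is required

x = 8.0
learning_rate = 0.1  # must be tuned by hand, unlike Newton-Raphson
for _ in range(200):
    x = x - learning_rate * df(x)  # follow the negative gradient

print(x)  # approaches the same stationary point, but only at a linear rate
```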


In summary, this article provided an exploration of the Newton-Raphson Method, a fundamental numerical optimization technique, with a focus on its geometrical interpretation. The method's elegance and computational power were highlighted, and the article aimed to elucidate its mechanics in an intuitive manner.

The Newton-Raphson Method was described as an iterative procedure for root finding, where its objective was to approximate a root of a mathematical equation. The core principles were discussed, emphasizing linearization and tangent line intersections as key concepts.

The article then distinguished between the two primary applications of the Newton-Raphson Method: root finding and optimization. Root finding was explained as finding values where a function intersects the x-axis, while optimization was described as finding extrema where a function's derivative is zero. The method's adaptation for optimization, using second-order polynomials to approximate the function, was outlined.

The conditions for applying the Newton-Raphson Method for optimization were mentioned, including the requirement for the function to be twice differentiable and the importance of starting with a good initial approximation.

Lastly, a comparison was made between the Newton-Raphson Method and the Gradient Descent Method in terms of their differentiation requirements, tuning, and convergence speed. The article concluded by emphasizing the unique strengths of the Newton-Raphson Method in optimization.

In a soon-to-come follow-up article, we are going to explore the practical applications of the Newton-Raphson Method in greater detail.

If you're interested in mathematical concepts and want to learn them quickly thanks to Python, have a look at my book:

Unveiling 70 Mathematical Concepts with Python: A Practical Guide to Exploring Mathematics Through…

Tags: Hands On Tutorials Machine Learning Newtons Method Optimization Algorithms Python
