The Matrix Algebra of Linear Regression in R


Introduction

I recently wrote an article that explored the matrix algebra and mathematical operations that sit behind linear regression. While it's certainly important to have a firm grasp of the theoretical principles, nothing beats actually doing the calculations. So, in this follow-up article, we're going to look at how to implement those matrix operations using R.

This article should be treated as a companion to my earlier post, so if you haven't read it already, I encourage you to check it out. That said, you'll still be able to follow along without it.

The Data

As our working example I've chosen the cars dataset from the datasets package in R. It's a nice, simple set of data comprising stopping distances for cars travelling at various speeds. Accordingly, the dataset contains just two variables: speed and dist. The observations were made during the 1920s, so it's certainly not the most current data! Nevertheless, it's perfect for building a simple linear regression model.

First, let's just take a quick look at the data:

> str( cars )
'data.frame': 50 obs. of  2 variables:
 $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
 $ dist : num  2 10 4 22 16 10 18 26 34 17 ...

There's nothing complicated here. We have a total of 50 observations and two numeric variables. The only point to note, perhaps, is that both speed and dist take integer values. This does introduce some discretisation, but that doesn't matter so much for now.

To get a sense of the relationship between dist and speed, below I've plotted stopping distance as a function of speed. There's a decent positive association between these variables, suggesting that stopping distances increase with higher speeds. I've also overlaid the best-fitting regression line using ggplot's geom_smooth function.

Our goal, then, is to estimate the parameters of this line without using the inbuilt lm function. Instead, we're going to apply matrix operations to obtain regression coefficients and then generate fitted values that should fall along this line.

Plot of the stopping distance as a function of speed. From the cars dataset in R (image by author).
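For reference, a plot like this can be produced with ggplot2 along the following lines. This is just a sketch: the original figure's exact styling, and whether the standard error ribbon was suppressed, are my assumptions.

# Scatter plot of stopping distance against speed, with a fitted regression line
library( ggplot2 )
ggplot( cars, aes( x = speed, y = dist ) ) +
  geom_point() +
  geom_smooth( method = "lm", se = FALSE ) +
  labs( x = "Speed (mph)", y = "Stopping distance (ft)" )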

A Quick Review

It's worth reminding ourselves what we're actually aiming to do. We are attempting to model a response vector Y containing n observations as a weighted linear combination of m predictor variables plus an intercept term. For the example data here we only have a single predictor, speed, so we can let m = 1.

The predictor and intercept together form an n × p design matrix, where p = m + 1 reflects the number of unknown regression coefficients in the model, which must be estimated from the data. Estimation requires finding a solution to the following normal equation:

The normal equation (image by author).
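For readers who can't see the image, the normal equation shown in the figure is, in standard notation,

X^\top X \, b = X^\top Y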

Rearranging the equation to solve for the unknown coefficients, we arrive at the following solution:

Estimation equation (image by author).
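In the same notation, the estimation equation in the figure is

b = (X^\top X)^{-1} X^\top Y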

where b is a p-dimensional vector containing the best fitting regression coefficients.

In approaching this problem, I'm going to break down the estimation equation into parts that each deal with a different operation, before bringing them back together to estimate the model parameters. In effect, there are three operations we need to do, each of which returns a matrix we need to use at some point.

Matrix Operations in R

Before diving in, I'll just quickly review the matrix operations that we'll be using. There are only three operations that are important for this example. The first is matrix multiplication. If you haven't already come across this, matrix multiplication in R is performed using the %*% operator. The second operation involves finding the inverse of a matrix, which in R is done using the solve function (no, we won't be doing this part by hand!). The final operation is t, which is the transpose operator.
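As a tiny illustration of those three operations, here they are applied to a made-up 2 × 2 matrix (the values are arbitrary and chosen only so the matrix is invertible):

# A small worked example of the three operators
M <- matrix( c( 2, 0, 1, 3 ), nrow = 2 )
t( M )       # transpose of M
M %*% M      # matrix multiplication of M with itself
solve( M )   # inverse of M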

Setting Up

First, we need to define a response vector and create the design matrix. For our toy example, we're modelling distance as a function of speed, so dist is our response variable. Let's assign those values to their own vector called y_vec:

# Assign dependent variable to the response vector
y_vec <- cars$dist

Note the $ is used here to access the dist column in the cars data frame. I find that this method isn't always convenient, or tidy, so you'll see that I soon switch to using with for some calculations.

We next need to assign speed to the design matrix, which can be done using the model.matrix function. Here I'm creating a variable called x_mat:

# Create a design matrix with the single independent variable
x_mat <- model.matrix( dist ~ speed, data = cars )

Note that the call to model.matrix is very similar to an lm function call. There's obviously a good reason for that, but here the formula just lets model.matrix know that speed is a predictor variable. Let's just take a quick look at what this produces:

> head(x_mat)
  (Intercept) speed
1           1     4
2           1     4
3           1     7
4           1     7
5           1     8
6           1     9

We can see that we have two columns: one for our speed variable and another containing only ones. Recall that the design matrix includes an additional column to accommodate the intercept term.

Now that we have y_vec and x_mat loaded into our workspace, we can move onto the matrix operations.

Step 1

If we look at the right-hand side of the estimation equation, the first term is the inverse of XᵀX, the design matrix premultiplied by its own transpose. I'm going to deal with the inverse in Step 2, so we first need to compute XᵀX.

For our simple linear regression model here, XᵀX will be a square 2 × 2 matrix. For practical purposes I'm just going to denote this as matrix A. Using the matrix multiplication (%*%) and transpose (t) operators to compute this matrix, I'll assign the output to an object called A:

# Compute cross-product of the design matrix
A  <- t( x_mat ) %*% x_mat
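As an aside, base R also provides the crossprod function, which computes the same product a little more directly:

# Equivalent computation using base R's crossprod
A <- crossprod( x_mat )   # same result as t( x_mat ) %*% x_mat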

Now, in my earlier post we learned a little bit about the elements contained in this matrix by working through the matrix operations. I won't revisit those specific details here and will just note the result below:

The cross product of the design matrix (image by author).
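In symbols, writing x for speed, the matrix shown in the figure is

X^\top X = \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}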

We can see that each element in A is quite easy to interpret, and indeed compute. Notice that a₁₁ is just the total number of observations, n, which is 50 for the cars dataset. The off-diagonal elements, a₁₂ and a₂₁, are just the sum of speed, while a₂₂ is the sum of speed squared.

We can do some quick checks to verify that our object A does indeed contain these values. Let's first print A to see what we have:

> A
            (Intercept) speed
(Intercept)          50   770
speed               770 13228

The value at A[1,1] matches the total number of observations in cars so we're off to a good start. We can next check the values in A[1,2] and A[2,1] by simply summing speed:

> sum( cars$speed )
[1] 770

Another match – things are looking good! Finally, we can check the value in A[2,2] by computing the square of speed and summing those values:

> sum( cars$speed ^ 2 )
[1] 13228

That's correct, too!

Step 2

The next step is to find the inverse of the matrix XᵀX. In R, we use the solve function to perform this operation on A. Let's do that first and assign the output to an object called A_inv:

# Inverting the matrix A
A_inv <- solve( A )

Now, before we take a look at A_inv, let's first examine the solution for the inverse of XᵀX:

The inverse of the design matrix cross product (image by author).
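Again writing x for speed, the inverse shown in the figure is

(X^\top X)^{-1} = \frac{1}{n \sum_i (x_i - \bar{x})^2} \begin{pmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & n \end{pmatrix}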

If we ignore the first term for now, then what we see is that some of the elements in the original matrix A have been swapped around, while others have been transformed slightly. Specifically, elements a₁₂ and a₂₁ have had their signs reversed and now reflect the negative sum of speed. Also, the values at a₁₁ and a₂₂ in the original matrix have swapped locations, so now a₂₂ is the total number of observations and a₁₁ is the sum of speed squared.

Let's compile a matrix that contains these adjusted elements and assign it to an object called A_rev, then check that everything worked:

# Shifting the elements in A
n_obs <- nrow( cars )
A_rev <- with(cars, matrix( c(sum( speed ^ 2), -sum( speed ), -sum( speed ), n_obs), nrow = 2))
> A_rev
      [,1] [,2]
[1,] 13228 -770
[2,]  -770   50

Everything looks good! We can now multiply A_rev by the first term, which is a scalar constant equal to the reciprocal of n times the sum of squared deviations around the mean of speed. Let's quickly compute that constant by creating a variable called c:

# Computing the scalar constant for matrix inversion
c <- (n_obs * with( cars, sum( (speed - mean(speed)) ^ 2) )) ^ -1

I won't print this out because it's an exceedingly small value. All that's left to do now is to multiply c and A_rev together and see what we get:

# Computation of the matrix inverse
> c * A_rev
            [,1]         [,2]
[1,]  0.19310949 -0.011240876
[2,] -0.01124088  0.000729927

Let's now compare these values to the solution we got using solve by printing A_inv to the console:

# Print output from solve() function
> A_inv
            (Intercept)        speed
(Intercept)  0.19310949 -0.011240876
speed       -0.01124088  0.000729927

Fantastic – each element in A_inv matches with its corresponding element in A_rev.

Before moving on to the next step, there is one additional check we could do. If for some reason we were suspicious that the output from solve was errant in some way, we can lean on the fact that an n × n square matrix A is invertible if and only if there exists another n × n square matrix B satisfying the following identity:

Condition for matrix invertibility (image by author).
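That is, the condition in the figure is

A B = B A = I_n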

What this implies is that, if A_inv is indeed the multiplicative inverse of A, then multiplying A and A_inv together should give us back the identity matrix:

> round( A %*% A_inv )
            (Intercept) speed
(Intercept)           1     0
speed                 0     1

It looks like the solve function is doing sensible things.

Just a note before moving on: these checks on the matrix outputs aren't necessary for any practical purpose. Rather, what I'm hoping to do by including these little tests is further enhance your understanding of these concepts by actually putting the mathematics into action.

Step 3

Now that we have the inverse of XᵀX taken care of, we can shift our attention to the last term in the estimation equation: XᵀY. This operation involves multiplying the transpose of the design matrix with the response vector. Given the data we're working with, this computation will produce a p × 1 vector that we'll simply denote as B. Again, we'll just apply the matrix multiplication operator and assign the output to a variable, though this time we'll call the variable B:

# Matrix multiplying the design matrix and response vector
B <- t( x_mat ) %*% y_vec

As we've done above, let's consider what the expected output of this operation is:

The cross product of the design matrix with the response vector (image by author).
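With x again denoting speed and y denoting dist, the vector shown in the figure is

X^\top Y = \begin{pmatrix} \sum_i y_i \\ \sum_i x_i y_i \end{pmatrix}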

Again, we have some elements that are fairly easy to intuit. One is simply the sum of the response variable dist and the other is just the sum of the product of speed and dist. Let's quickly compute those values and see whether they match up with the values already stored in B:

# Compute vector and compare with B
> with( cars, matrix( c( sum( dist ), sum( speed * dist) ) , ncol = 1 ))
      [,1]
[1,]  2149
[2,] 38482
>
> B
             [,1]
(Intercept)  2149
speed       38482

We have a match!

Parameter Estimation

Alright. We now have everything we need to estimate the model coefficients. In fact, the only objects we need are A_inv and B, and all we need to do is multiply these together:

# Estimate the parameter values using solution to normal equation
b_vec <- A_inv %*% B

Let's print this to the console and see what we have:

> b_vec
                  [,1]
(Intercept) -17.579095
speed         3.932409

Taking a quick look at the slope parameter, we've got a positive value. This is a good sign given the positive association observed earlier and hopefully means we're on the right track. Let's now see how these estimates compare to the coefficients returned using the lm function:

> coef( lm( dist ~ speed, data = cars ) )
(Intercept)       speed 
 -17.579095    3.932409 

Identical!

There's just one last thing I want to point out. I demonstrated in my earlier post that if you do some algebra with the estimation equation you arrive at a very convenient solution. That is, the slope parameter is equivalent to:

The slope parameter (image by author).
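In symbols, with x denoting speed and y denoting dist, the slope shown in the figure is

b_1 = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}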

and the intercept parameter is equivalent to:

The intercept parameter (image by author).
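In the same notation, the intercept shown in the figure is

b_0 = \bar{y} - b_1 \bar{x}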

Both these are very straightforward to compute and can be done in R as follows:

# The slope parameter
b_1 <- with(cars, cov(speed, dist) / var(speed) )

# The intercept parameter
b_0 <- with(cars, mean(dist) - mean(speed) * b_1 )

Printing these objects to console shows that we get the same parameter values:

> c( "intercept" = b_0, "slope" = b_1 )
 intercept      slope 
-17.579095   3.932409

Interpreting the Coefficients

Looking at our estimates, we can see that we have a negative intercept term, which doesn't make much sense. The intercept here is the value of dist when speed is set to zero, implying that when the vehicle has zero speed (i.e., it's stationary) it has a negative stopping distance. I'm not going to address that issue here, so for now we'll leave this alone and just accept that it's reasonable (it's not, though).

The slope parameter, on the other hand, is certainly more useful. What this indicates is that for every unit increase in speed (which is in miles per hour) the stopping distance increases by nearly 4 feet. So, if a vehicle increases its speed from 10mph to 11mph, the distance required to stop jumps from 21.7ft to 25.7ft. However, the same increase in stopping distance would also be expected if the car increased its speed from 60mph to 61mph, which might not be as reasonable.
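If you want to verify those numbers yourself, the fitted line can be evaluated directly from the coefficients stored in b_vec:

# Quick check: predicted stopping distances at 10 mph and 11 mph
b_vec[1] + b_vec[2] * c( 10, 11 )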

Computing the Fitted Values

The last thing we need to do is compute the fitted values. To do so, we just grab our parameter estimates held in b_vec and multiply those with the design matrix:

# Compute the fitted values
y_hat <- x_mat %*% b_vec

# Compute the residuals (observed minus fitted)
e_vec <- y_vec - y_hat

I have also computed the residual vector in the above code, but that's mainly just to show you how it's done. We're not going to be doing anything with those values here.

The very last thing we can do is overlay our fitted values on the plot we produced earlier. All things being good, we should see that our fitted values fall nicely along the regression line:

Overlaying the fitted values on the plot produced earlier (image by author).

And indeed they do.
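For completeness, here's a sketch of how that overlay might be produced, assuming the earlier plot was built with ggplot2 as shown above; the colour choice is just for illustration.

# Overlay the fitted values on the original scatter plot
library( ggplot2 )
cars$fitted <- as.vector( y_hat )
ggplot( cars, aes( x = speed, y = dist ) ) +
  geom_point() +
  geom_smooth( method = "lm", se = FALSE ) +
  geom_point( aes( y = fitted ), colour = "red" )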

Wrapping Up

The focus here has very much been on parameter estimation, so there is much I haven't touched on. For example, model diagnostics, computing standard errors for the parameter estimates, and measures of model fit like the coefficient of determination, R². Because this post is intended to be a companion to my earlier article I wanted to avoid introducing new concepts here, but I'll cover those topics another time. Nevertheless, I hope this article helps you along your linear algebra journey in some way. If you want to check out the code used in this article, it's available at my Git page.


Thanks for reading!

If you enjoyed this post and would like to stay up to date then please consider following me on Medium. This will ensure you don't miss out on any new content.

To get unlimited access to all content consider signing up for a Medium subscription.

You can also follow me on Twitter, LinkedIn, or check out my GitHub if that's more your thing.
