Autocorrelation

Introduction

Autocorrelation is the situation in which time series data for one period is significantly correlated with the time series data for a previous or subsequent period. The term serial correlation helps us to understand what this definition means.

In fact, it's not just correlation between period n and period n+1 but also between period n and period n+2. That is, between current data and at a lagged by one period or two periods or three periods and more.

In this article, I am concerned initially with data that are lagged by one period only. When we discuss the ACF and the PACF, however, we will look at lagging data over several periods.

Why is this a problem?

If a model suffers from autocorrelation, then the interpretations of RSQ, STDERR, TRENDLINE and the COEFFICIENTS of a model are invalid. In other words, the presence of autocorrelation means that we cannot use a model

Let's look at some data and assess the presence of autocorrelation of those data

You can download the spreadsheet I have created for this file by clicking on the link at the end of this page.

The data and analysis in the screenshot below relate to data for amazon.com for the years shown.

Zoom in or open my spreadsheet file to see this image more clearly.

What we are doing here, for data with a lag of one period, is 

taking the revenue data for the 10 years from 2008 to 2017

using regression analysis to find the regression equation for the data where X is Time and Y is revenue

Y = a + bX = 32,961,382.4727 + 16,416.655*X

apply that equation along row 4 to create the predicted values of revenue

in row 5 subtract your predicted revenue from the actual revenue to find the residuals

in row 6 square the residuals for all years

We use those calculations and results to find what is called the Durbin-Watson Statistic, DW:

DW = Sum of Squared Difference of Residuals/Sum of Squared Residuals

Again, loo at the spreadsheet to find the formulas for these two values:

R10=SUMXMY2(R5:Z5,S5:AA5)

R11=SUM(R6:AA6)

So DW=SUMXMY2(R5:Z5,S5:AA5)/SUM(R6:AA6) = 0.6759 in this case

We now need to know that k' is the number of variables we use EXCEPT the intercept, 1 in this case

and n is the number of values we are using, 10 in this case: 10 years' worth of data

dL and dU are taken from tables of DW statistics and they are given to be 0.8790 and 1.3200 respectively

Finally, we can now say whether our model suffers from autocorrelation by using the formula:

R15=IF(R12<R13,"yes",IF(R12>R14,"no","unclear"))

That is, if DW is less than dL, yes, there is autocorrelation in the model

otherwise, if DW is greater than dU, no, there is no autocorrelation in the model

if neither of these two conditions is true, the situation is unclear

That's it: time series, one variable over time and we now know with a lag of one that this model does suffer from autocorrelation.

Two Variable Model with Lag of One Period

Look at the following screenshot in which I use two variables to create my model, X v Y but this time X is NOT time. In fact, X is an income statement variable and Y is a Balance Sheet variable.

In this screenshot, you can see that I have used Other Operating Expenses from the Income Statement and Total Assets from the Balance Sheet as my X and Y variables. The formula I used, starting in cell R20 to find my predictions of Other Operating Expenses is =INTERCEPT($R$18:$AA$18,$R$19:$AA$19)+SLOPE($R$18:$AA$18,$R$19:$AA$19)*R19. see my file to appreciate what I did there.

When you review my spreadsheet file, you will appreciate that I can use ANY of the Income Statement line items and ANY of the Balance Sheet item on the amzn_fins worksheet. I used Data Validation and VLOOKUP() to manage that aspect of this model.

Once I have found the predicted values for my Income Statement chosen variable and my Balance Sheet chosen variable, the rest of the analysis we need to carry out is EXACTLY the same as the first example that we have already discussed. Let me confirm that, in this case, my regression equation is found by using this formula in R20 =INTERCEPT($R$18:$AA$18,$R$19:$AA$19)+SLOPE($R$18:$AA$18,$R$19:$AA$19)*R19, then fill right to AA20

In conclusion, here, we find that there is no lag one autocorrelation for the model using Other Operating Expenses and total Assets. Download and open the file and change the variables, say, to Revenue and Total Stockholders' Equity and confirm that there is autocorrelation in that case.

Two Variable Model with Lag of One Period

Finally, look at the following screenshot and see how we can make the example even more useful by working with three variables: one from each of the Income Statement, the Balance Sheet and the Cash Flow Statement. You need to know how I created my regression equation in this case: I used the LINEST() function, starting in cell R53. I have not labelled the outputs of this function so that is a small and useful bit of research for you: to find out what all of those numbers mean. Please note, there are three cells that say #N/A ... it is not an accident: it means LINEST() has nothing to report in those cells.

What this third model says is that it is not clear whether there is autocorrelation in this model. However, I have not mentioned the Residuals Sign Change entries that you might have seen in rows 9, 25 and 42. This is a method of assessing whether there is an autocorrelation problem with a model by finding how many times the residuals move from being positive to negative and vice versa. The more those signs change, the greater the chance that there is autocorrelation present. In this third example, that technique says it is not present.

Download the Excel file that accompanies this page blogger autocorrelation



Duncan Williamson

19th November 2020

No comments: