Engineering LibreTexts

5.3: Time Series Forecasting Methods

    Learning Objectives

    By the end of this section, you should be able to:

    • 5.3.1 Analyze a time series by decomposing it into its components, including trend, seasonal and cyclic variation, and residual noise.
    • 5.3.2 Compute various kinds of moving averages, weighted moving averages, and related smoothing techniques.
    • 5.3.3 Describe the ARIMA procedure and how autocorrelation functions can be used to find appropriate parameters.
    • 5.3.4 Use ARIMA to make forecasts, with the aid of Python.

    This section focuses on analysis of a time series into its components for the purpose of building a robust model for forecasting. Although much of the analysis can be done by hand, numerous computer software applications have been developed to handle this task. Ultimately, we will see how Python can be used to decompose a time series and create a general-purpose model called the autoregressive integrated moving average model (commonly known as ARIMA) to make forecasts.

    Detrending the Data and Smoothing Techniques

    Detrending a time series refers to the process of separating the underlying trend component \(t_n\) from the time series as a whole through one of two complementary operations: isolating the trend and then removing it, or smoothing out the trend so that only the seasonal and error components remain. Methods that reduce the error term and flatten out seasonal variations result in a trend-cycle component \(t_n\) that still closely follows the long-term ups and downs in the time series. In this case, the other components of the time series can be found by removing the trend component:

    \( x_n - t_n = s_n + \varepsilon_n \)

    On the other hand, there are detrending methods that transform the data into a new time series with a trendline closer to a constant line at the level of 0. In other words, after applying such a transformation, the new time series, \(z_n\), would consist of only the seasonal and noise components:

    \( z_n = s_n + \varepsilon_n \)

    Common methods for isolating or removing the trend or trend-cycle component include the following:

    1. Simple moving average (SMA). The simple moving average (SMA) is found by taking the average (arithmetic mean) of a fixed number of consecutive data points. This has the effect of smoothing out short-term variations in the data and isolating the trend. If the data has seasonal variation, then an SMA of the same length can effectively remove the seasonal component completely—there is more about this in Time Series Forecasting Methods.
    2. Weighted moving average (WMA). There are several variations on the moving average idea, in which previous terms of the series are given different weights, thus termed weighted moving average (WMA). One of the most common weighted moving averages is the exponential moving average (EMA), in which the weights follow exponential decay, giving more recent data points much more weight than those further away in time. These techniques are more often used in forecasting as discussed later in this section, but they may also be employed as a smoothing method to isolate a trend in the data.
    3. Differencing. The (first-order) difference of a time series is found by differencing, or taking differences of consecutive terms—that is, the \(n\)th term of the difference would be \(x_{n+1} - x_n\). If the original time series had a linear trend, then the difference would display a relatively constant trend with only noise and seasonal variation present. Similarly, the difference of the difference, or second-order difference, can transform a quadratic trend into a constant trend. Higher-order polynomial trends can be identified using higher-order differences. Thus, we can use differencing to check for an overall trend of a certain type (linear, quadratic, etc.) while at the same time removing the original trend from the time series.
    4. Regression. As mentioned in Example 5.1, linear regression may be applied to a time series that seems to display a uniform upward or downward trend. After using differencing to identify whether there is a trend of some particular order, linear, quadratic, or higher-order polynomial regression can be used to find that trend.
    5. Seasonal-Trend decomposition using LOESS (STL). Seasonal-Trend decomposition using LOESS (STL) is a powerful tool that decomposes a time series into the trend-cycle, seasonal, and noise components. Later in this section we will find out how to perform STL using Python. (Note: LOESS is a regression technique that fits a low-order polynomial—typically linear or quadratic—to small portions of the data at a time. A full understanding of LOESS is not required for using STL.)

    There are, of course, more complex methods for analyzing and isolating the trends of time series, but in this text we will stick to the basics and make use of technology when appropriate.

    Simple Moving Averages (SMA)

    Recall that we use the term window to refer to consecutive terms of a time series. In what follows, assume that the size of the window is a constant, \(T\). Given a time series with values \((x_n)\), that is, a sequence \((x_1, x_2, x_3, x_4, \ldots, x_N)\), the simple moving average with window size \(T\) is defined as a new series \((y_n)\) with values given by the formula

    \( y_n = \frac{x_n + x_{n-1} + x_{n-2} + \cdots + x_{n-T+1}}{T} = \frac{1}{T}\sum_{k=1}^{T} x_{n-T+k} \)

    This formula is not as difficult as it looks. Just think of it as the average of the last \(T\) terms. For example, given the sequence (5, 4, 2, 6, 2, 3, 1, 7), the SMA of length 4 would be found by averaging 4 consecutive terms at a time. Note that the first term of the SMA is \(y_4\) because at least 4 terms are needed in the average.

    Term \(k=4\): \( y_4 = \frac{5+4+2+6}{4} = \frac{17}{4} = 4.25 \)

    Term \(k=5\): \( y_5 = \frac{4+2+6+2}{4} = \frac{14}{4} = 3.5 \)

    Term \(k=6\): \( y_6 = \frac{2+6+2+3}{4} = \frac{13}{4} = 3.25 \)

    Term \(k=7\): \( y_7 = \frac{6+2+3+1}{4} = \frac{12}{4} = 3 \)

    Term \(k=8\): \( y_8 = \frac{2+3+1+7}{4} = \frac{13}{4} = 3.25 \)

    Thus, the SMA of length 4 is the sequence (4.25, 3.5, 3.25, 3, 3.25). Note how the values of the SMA do not jump around as much as the values of the original sequence.
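    The computation above takes only a few lines of Python. This is a minimal sketch; the helper function `sma` is introduced here for illustration and is not from the text.

```python
# Simple moving average (SMA): average each run of T consecutive terms.
# The result has len(values) - T + 1 terms, starting at index T of the original.
def sma(values, T):
    return [sum(values[i:i + T]) / T for i in range(len(values) - T + 1)]

print(sma([5, 4, 2, 6, 2, 3, 1, 7], 4))  # prints [4.25, 3.5, 3.25, 3.0, 3.25]
```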

    The number of terms to use in a simple moving average depends on the dataset and desired outcome. If the intent is to reduce noise, then a small number of terms (say, 3 to 5) may be sufficient. Sometimes inconsistencies in data collection can be mitigated by a judicious use of SMA. For example, suppose data is collected on the number of walk-in customers at the department of motor vehicles, which only operates Monday through Friday. No data is collected on the weekends, while the number of walk-ins on Mondays may be disproportionately high because people have had to wait until Monday to use their services. Here, an SMA of 7 days would smooth out the data, spreading the weekly walk-in data more evenly over all the days of the week.

    Example 5.3

    Find the simple moving average of window size 3 for the data from Table 5.1 and compare with the original line graph.

    Answer

    The SMA will begin at index \(k=3\).

    \( y_3 = \frac{1,848.36 + 2,058.9 + 2,043.94}{3} = 1,983.73 \)

    \( y_4 = \frac{2,058.9 + 2,043.94 + 2,238.83}{3} = 2,113.89 \)

    \( y_5 = \frac{2,043.94 + 2,238.83 + 2,673.61}{3} = 2,318.79 \)

    \( y_6 = \frac{2,238.83 + 2,673.61 + 2,506.85}{3} = 2,473.1 \)

    \( y_7 = \frac{2,673.61 + 2,506.85 + 3,230.78}{3} = 2,803.75 \)

    \( y_8 = \frac{2,506.85 + 3,230.78 + 3,756.07}{3} = 3,164.57 \)

    \( y_9 = \frac{3,230.78 + 3,756.07 + 4,766.18}{3} = 3,917.68 \)

    \( y_{10} = \frac{3,756.07 + 4,766.18 + 3,839.5}{3} = 4,120.58 \)

    \( y_{11} = \frac{4,766.18 + 3,839.5 + 4,769.83}{3} = 4,458.5 \)

    The SMA is graphed together with the original time series in Figure 5.10.

    A line chart titled S&P Index at the End of Year 2013-2023. The X axis has years from 2013 to 2023 and the Y axis ranges from 0 to 6,000. A jagged blue line representing the S&P 500 shows a general upward trend over the past decade. An orange line representing the simple moving average of length 3 starts at 2015 and rises along with the S&P 500 line. Both lines show an upward trend, with the S&P 500 experiencing greater volatility. The S&P 500 starts at around 1,500 in 2013, rises to around 2,500 in 2016, then fluctuates between 2,000 and 3,000 until 2021, when it rises sharply to 5,000 in 2023. The SMA (3) line shows a smoother upward trend, reaching around 4,500 in 2023.
    Figure 5.10 Line Chart of the S&P 500 Index Together with the Simple Moving Average of Length 3 (data source: adapted from S&P 500 [SPX] Historical Data)

    Simple moving averages provide an easy way to identify a trend-cycle curve in the time series. In effect, the SMA can serve as an estimate of the \(t_n\) component of the time series. In practice, when using SMA to isolate the trend-cycle component, the terms of the SMA are centered, which means that when computing the terms of the SMA, \(y_k\), the \(x_k\) term shows up right in the middle of the window, with the same number of terms to the left and to the right.

    For example, the SMA found in Example 5.3 may be shifted so that \( y_k = \frac{x_{k-1} + x_k + x_{k+1}}{3} \). This has the effect of shifting the index of the SMA terms back 1 so that the SMA begins at index \(k=2\).

    \( y_2 = \frac{x_1 + x_2 + x_3}{3} = \frac{1,848.36 + 2,058.9 + 2,043.94}{3} = 1,983.73 \)

    This procedure works well when the window size is odd, but when it is even, a common practice is to compute two overlapping SMAs, with \(x_k\) just right of center in one window and just left of center in the other, and then average the results. Returning to the data of Example 5.3, if we wanted to compute a centered SMA of length 4, then the first term would be \(y_3\), and we would find two averages.

    \( \frac{x_1 + x_2 + x_3 + x_4}{4} = \frac{1,848.36 + 2,058.9 + 2,043.94 + 2,238.83}{4} = 2,047.51 \)

    \( \frac{x_2 + x_3 + x_4 + x_5}{4} = \frac{2,058.9 + 2,043.94 + 2,238.83 + 2,673.61}{4} = 2,253.82 \)

    \( y_3 = \frac{2,047.51 + 2,253.82}{2} = 2,150.67 \)

    The average of the two averages is equivalent to the following more efficient formula. First consider the previous example.

    \( y_3 = \frac{1}{2}\left( \frac{x_1 + x_2 + x_3 + x_4}{4} + \frac{x_2 + x_3 + x_4 + x_5}{4} \right) = \frac{x_1}{8} + \frac{x_2}{4} + \frac{x_3}{4} + \frac{x_4}{4} + \frac{x_5}{8} \)

    This suggests a general pattern for even window sizes. In summary, we have two closely related formulas for the centered SMA, depending on whether \(T\) is odd or even.

    \( y_k = \frac{1}{T}\left( x_{k-(T-1)/2} + \cdots + x_{k+(T-1)/2} \right), \quad \text{for } T \text{ odd} \)

    \( y_k = \frac{x_{k-T/2}}{2T} + \frac{1}{T}\left( x_{k-T/2+1} + \cdots + x_{k+T/2-1} \right) + \frac{x_{k+T/2}}{2T}, \quad \text{for } T \text{ even} \)

    Of course, there are functions in statistical software that can work out the centered SMA automatically for you.
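    In pandas, for instance, centered moving averages can be computed directly. Here is a sketch using the values from Example 5.3; for an even window, averaging two adjacent window means reproduces the half-weight endpoint formula above.

```python
import pandas as pd

# S&P 500 year-end values from Example 5.3
x = pd.Series([1848.36, 2058.90, 2043.94, 2238.83, 2673.61,
               2506.85, 3230.78, 3756.07, 4766.18, 3839.50, 4769.83])

# Odd window (T = 3): center=True places each average at the middle of its window
sma3 = x.rolling(window=3, center=True).mean()

# Even window (T = 4): average two adjacent length-4 means (a "2x4-MA"), then shift
# so the result lands at the center index; this matches the half-weight formula
sma4 = x.rolling(window=4).mean().rolling(window=2).mean().shift(-2)

print(round(sma3.iloc[1], 2))  # prints 1983.73, matching y_2 above
print(round(sma4.iloc[2], 2))  # about 2150.66, matching y_3 up to rounding
```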

    Differencing

    In mathematics, the first-order difference of a sequence \((x_n) = (x_1, x_2, x_3, x_4, \ldots, x_N)\) is another sequence, denoted \(\Delta(x_n)\), consisting of the differences of consecutive terms. That is,

    \( \Delta(x_n) = (x_{n+1} - x_n) = (x_2 - x_1, x_3 - x_2, x_4 - x_3, \ldots, x_N - x_{N-1}) \)

    Note that there is one fewer term in the difference \(\Delta(x_n)\) compared to the original time series \((x_n)\). Differencing has the effect of leveling out a uniform upward or downward trend without affecting other factors such as seasonal variation and noise.

    Moreover, the average (arithmetic mean) of the terms of the difference series measures the rate of change of the original time series, much as the slope of the linear regression does. (Indeed, if you have seen some calculus, you might notice that there is a strong connection between differencing and computing the derivative of a continuous function.)

    Example 5.4

    Find the first-order difference time series for the data from Table 5.1 and compare with the original line graph. Find the average of the difference series and compare the value to the slope of the linear regression found in Example 5.1.

    Answer

    Since the original time series from Table 5.1 has \(N = 11\) terms, the difference series will have 10 terms. The computations are shown in Table 5.3.

    Term n | S&P Index Term \(x_n\) | \(n\)th Term of \(\Delta(x_n)\)
    1 | 1848.36 | N/A
    2 | 2058.9 | 2058.9 - 1848.36 = 210.54
    3 | 2043.94 | 2043.94 - 2058.9 = -14.96
    4 | 2238.83 | 2238.83 - 2043.94 = 194.89
    5 | 2673.61 | 2673.61 - 2238.83 = 434.78
    6 | 2506.85 | 2506.85 - 2673.61 = -166.76
    7 | 3230.78 | 3230.78 - 2506.85 = 723.93
    8 | 3756.07 | 3756.07 - 3230.78 = 525.29
    9 | 4766.18 | 4766.18 - 3756.07 = 1010.11
    10 | 3839.5 | 3839.5 - 4766.18 = -926.68
    11 | 4769.83 | 4769.83 - 3839.5 = 930.33
    Table 5.3 First-Order Difference Time Series for the Data in Table 5.1 (source: adapted from www.nasdaq.com/market-activi...spx/historical)

    Figure 5.11 shows the graph of the original data together with its difference series.

    A line chart titled S&P Index at the End of Year 2013-2023. The X axis has years from 2013 to 2023 and the Y axis ranges from -2,000 to 6,000. The S&P 500 index starts at around 1,900 in 2013, rises to around 2,200 in 2016, then fluctuates between 2,000 and 4,000 until 2020; it rises  to 4,800 in 2021, falls back down to 4,000 in 2022, and rises again to 4,800 in 2023. The first-order difference shows the change in the S&P 500 index from one year to the next, with positive values indicating an increase and negative values indicating a decrease. The first-order difference fluctuates between -2,000 and 1,000, with a general trend of increasing volatility over time.
    Figure 5.11 Comparison of S&P Index Values and Their Difference Series (data source: adapted from S&P 500 [SPX] Historical Data)

    The average value of \(\Delta(x_n)\) is:

    \( \frac{1}{10}\left( 210.54 + (-14.96) + 194.89 + 434.78 + (-166.76) + 723.93 + 525.29 + 1,010.11 + (-926.68) + 930.33 \right) = 292.15 \)

    Comparing with the slope of the linear regression, which is about 304 (recall Example 5.1), we find the two values are relatively close to each other.
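    The computations in Example 5.4 are quick to verify with pandas. This is a minimal sketch using the values from Table 5.1.

```python
import pandas as pd

# S&P 500 year-end values from Table 5.1
x = pd.Series([1848.36, 2058.90, 2043.94, 2238.83, 2673.61,
               2506.85, 3230.78, 3756.07, 4766.18, 3839.50, 4769.83])

diff = x.diff().dropna()  # first-order difference: one fewer term than the original

print(len(diff))               # prints 10
print(round(diff.mean(), 2))   # prints 292.15, the average yearly change
```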

    Differencing can be used to check whether a time series trend curve is essentially linear, quadratic, or polynomial of higher order. First, we need to recall a definition from algebra that is typically applied to polynomial functions but can also be applied to sequences. If a sequence \((x_n)\) is given by a polynomial formula of the form

    \( x_n = f(n) = a_r n^r + a_{r-1} n^{r-1} + \cdots + a_1 n + a_0 \)

    where \(r\) is a whole number, each of the coefficients \(a_k\) (for \(k = 0, 1, 2, \ldots, r\)) is a constant, and the leading coefficient is nonzero (\(a_r \neq 0\)), then the sequence has order or degree \(r\). (That is, the order or degree of the sequence is the highest power of \(n\) appearing in its formula.)

    For example, any sequence of the form \(x_n = an + b\) has order 1 (and may be called linear). A sequence of the form \(x_n = an^2 + bn + c\) has order 2 (and is called quadratic). Quadratic data fits well to a parabola, which would indicate a curving trend in the data.

    If a sequence \((x_n)\) has order \(r > 0\), then the difference sequence \(\Delta(x_n)\) has order \(r-1\). If the original sequence is already of order 0, which means the sequence is a constant sequence (all terms being the same constant), then the difference sequence simply produces another constant sequence (order 0) in which every term is equal to 0. This important result can be proved mathematically, but we will not present the proof in this text.

    As an example, consider the sequence \((x_n) = (2, 5, 8, 11, 14, 17, 20)\), which has order 1, since you can find the values using the formula \(x_n = 3n - 1\). The difference sequence is \(\Delta(x_n) = (3, 3, 3, 3, 3, 3)\), which is a constant sequence, so it has order 0. Notice that taking the difference one more time yields the constant 0 sequence, \(\Delta(\Delta(x_n)) = (0, 0, 0, 0, 0)\), and taking further differences would only reduce the number of terms, but each term would remain at the value 0. (For simplicity, we will assume that there are always enough terms in a sequence so that the difference is also a non-empty sequence.)

    The second-order difference, \(\Delta^2(x_n) = \Delta(\Delta(x_n))\), or difference of the difference, would reduce the order of an \(r\)-order sequence twice, to \(r-2\) if \(r \geq 2\), or to the constant zero sequence if \(r \leq 1\). In general, a \(d\)-order difference, defined by \(\Delta^d(x_n) = \Delta(\Delta^{d-1}(x_n))\), reduces the order of an \(r\)-order sequence to \(r-d\) if \(r \geq d\), or to order 0 otherwise. With a little thought, you may realize that we now have a simple procedure to determine the order of a sequence that does not require us to have the formula for \(x_n\) in hand.

    If a sequence \((x_n)\) has a well-defined order (in other words, there is a polynomial formula that is a good model for the terms of the sequence, even if we do not know what that formula is yet), then the order \(r\) is the smallest whole number such that the \((r+1)\)-order difference, \(\Delta^{r+1}(x_n)\), is equal to either a constant 0 sequence or white noise.

    Example 5.5

    Determine the order of the sequence (10, 9, 6, 7, 18, 45, 94, 171) using differencing.

    Answer

    Let (xn)(xn) stand for the given sequence.

    First-order difference: \( \Delta(x_n) = (-1, -3, 1, 11, 27, 49, 77) \)

    Second-order difference: \( \Delta^2(x_n) = (-2, 4, 10, 16, 22, 28) \)

    Third-order difference: \( \Delta^3(x_n) = (6, 6, 6, 6, 6) \)

    Fourth-order difference: \( \Delta^4(x_n) = (0, 0, 0, 0) \)

    The given sequence has order 3, since the \((3+1)\)-order difference resulted in all zeros.
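    The same procedure is easy to automate with repeated differencing in numpy. This is a sketch; the helper `order_of` is introduced here for illustration and assumes a noise-free polynomial sequence.

```python
import numpy as np

def order_of(seq, max_order=10):
    """Smallest r such that the (r+1)-order difference is all zeros, or None."""
    x = np.asarray(seq, dtype=float)
    for r in range(max_order + 1):
        d = np.diff(x, n=r + 1)          # the (r+1)-order difference
        if len(d) > 0 and np.allclose(d, 0):
            return r
    return None

print(order_of([10, 9, 6, 7, 18, 45, 94, 171]))  # prints 3, as in Example 5.5
print(order_of([2, 5, 8, 11, 14, 17, 20]))       # prints 1 (a linear sequence)
```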

    We cannot hope to encounter real-world time series data that is exactly linear, quadratic, or of any other given degree. Every time series will have some noise in addition to possible seasonal patterns. However, the trend-cycle curve may still be well approximated by a polynomial model of some order.

    SMA and Differencing in Excel

    Excel can easily do simple moving averages and differencing. The ability to write one equation in a cell and then drag it down so that it works on all the rows of data is a very powerful and convenient feature. By way of example, let’s explore the dataset MonthlyCoalConsumption.xlsx. First, you will need to save the file as an Excel worksheet, as CSV files do not support equations.

    Let’s include a centered 13-term window SMA (see Figure 5.12). In cell C8, type “=AVERAGE(B2:B14)” as shown in Figure 5.12. Note: There should be exactly 6 rows above and 6 rows below the center row at which the formula is entered.

    A screenshot of an Excel worksheet with two columns of text labeled Month and Value. 13 data points in the Value column (B) are selected and the formula =AVERAGE(B2:B14) is applied in the formula bar.
    Figure 5.12 Screenshot of a 13-term Excel Window SMA (Used with permission from Microsoft)

    Next, click on the square in the lower right corner of cell C8 and drag it all the way down to cell C173, which is 6 rows before the final data value. Column C now contains the SMA time series for your data, as shown in Figure 5.13.

    A screenshot of an Excel worksheet with three columns of text labeled Month, Value, and SMA. Columns A and B have 19 rows of data. Column C, starting in row 8, shows the SMA time series.
    Figure 5.13 Screenshot Showing the SMA Time Series in Column C (Used with permission from Microsoft)

    You can generate the graph of the time series and the SMA together, which is shown in Figure 5.14. Compare your computed SMA with the trendline created by Excel in Figure 5.7.

    A line chart titled Monthly consumption of coal for electricity generation in the United States from 2016 to 2022. The X axis has months from January 2016 to May 2022. The Y axis ranges from 0 to 80,000. The blue line represents the actual coal consumption, which fluctuates seasonally with peaks in winter and valleys in summer. The orange line represents the trend of coal consumption over time, showing a general downward trend from around 80,000 tons per month in 2016 to around 40,000 tons per month in 2022.
    Figure 5.14 Monthly Consumption of Coal for Electricity Generation in the United States from 2016 to 2022. The SMA of window length 13 has been added to the graph in orange. (data source: https://www.statista.com/statistics/...ion-in-the-us/)

    Let’s now work out the first-order difference. The idea is very similar. Put the formula for the difference, which is nothing more than “=B3-B2,” into cell D3, as in Figure 5.15.

    A screenshot of an Excel worksheet with four columns labeled Month, Value, SMA, and Difference. Cells B2 and B3 are selected and the Difference formula =B3-B2 is applied in the formula bar.
    Figure 5.15 Screenshot of Excel Showing Formula for Difference in Column D (Used with permission from Microsoft)

    Then drag the formula down to the end of the column. The result is shown in Figure 5.16.

    A screenshot of an Excel worksheet with four columns labeled Month, Value, SMA, and Difference. Row D shows that the Difference formula =B3-B2 has been applied to the whole column.
    Figure 5.16 Screenshot of Excel Showing Difference in Column D (Used with permission from Microsoft)

    If you need to compute the second-order differences, just do the same procedure to find the difference of column D. Higher-order differences can easily be computed in the analogous way.

    SMA and Differencing in Python

    Computing an SMA in Python takes only a single line of code. Here is how it works on the Coal Consumption dataset (found in MonthlyCoalConsumption.xlsx), using a window size of 12. (Before, we used a window of 13 because odd-length windows are easier to work with than even-length ones. However, Python easily handles all cases automatically.) The key here is the function rolling() that creates a DataFrame of rolling windows of specified length. Then the mean() command computes the average of each window of data. Computing the first-order difference is just as easy, using the diff() command.

    Python Code

        
        # Import libraries
        import pandas as pd # for dataset management
        import matplotlib.pyplot as plt # for visualizations
        from matplotlib.ticker import MaxNLocator, FuncFormatter # for graph formatting and y-axis formatting
        import matplotlib.dates as mdates # for date formatting
        
        # Read the Excel file into a Pandas DataFrame
        df = pd.read_excel('MonthlyCoalConsumption.xlsx')
        
        # Convert 'Month' column to datetime format
        df['Month'] = pd.to_datetime(df['Month'], format='%m/%d/%Y')
        
        # Calculate a simple moving average with a window size of 12
        df['SMA'] = df['Value'].rolling(window=12).mean()
        
        # Calculate the first-order difference
        df['Diff'] = df['Value'].diff()
        
        # Function to format the Y-axis values
        def y_format(value, tick_number):
          return f'{int(round(value)):,}'
        
        # Create the plot
        plt.figure(figsize=(12, 6))
        
        # Plot the time series, SMA, and difference
        plt.plot(df['Month'], df['Value'], marker='', linestyle='-', color='b', label='Value')
        plt.plot(df['Month'], df['SMA'], marker='', linestyle='-', color='r', label='SMA')
        plt.plot(df['Month'], df['Diff'], marker='', linestyle='-', color='g', label='Diff')
        
        # Set the x-axis limits to start from the first date and end at the maximum date in the data
        plt.xlim(pd.to_datetime('01/01/2016', format='%m/%d/%Y'), df['Month'].max())
        
        # Format the x-axis to show dates as MM/DD/YYYY and set the locator for a fixed number of ticks
        ax = plt.gca()
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y'))
        ax.xaxis.set_major_locator(MaxNLocator(nbins=12)) # Set the number of ticks to a reasonable value
        
        # Manually set the rotation of date labels to 45 degrees
        plt.xticks(rotation=45)
        
        # Title and labels
        plt.title('Monthly Consumption of Coal for Electricity Generation in the United States from 2016 to 2022')
        plt.xlabel('Month')
        plt.legend()
        
        # Apply the formatter to the Y-axis
        ax.yaxis.set_major_formatter(FuncFormatter(y_format))
        
        plt.tight_layout() # Adjust layout to ensure labels fit well
        plt.show() 
        

    The resulting output will look like this:

    A screenshot of the Python line graph output: a chart titled Monthly consumption of coal for electricity generation in the United States from 2016 to 2022. The X axis has months from July 2016 to January 2022. The Y axis ranges from -20,000 to 60,000. The blue line represents the actual coal consumption, which fluctuates seasonally. The red line represents the 12-month simple moving average (SMA), which smooths out the seasonal fluctuations. The green line represents the first-order difference, highlighting the month-to-month changes. Coal consumption shows a general downward trend from around 80,000 tons per month in 2016 to around 40,000 tons per month in 2022.
    Figure 5.17 Pure Seasonal Variation. The same pattern repeats after 64 time steps, and so the period is 64.

    While it is easy to spot the repeating pattern in a time series that does not have noise, the same cannot be said for the general case. There are several tools available for isolating the seasonal component of a detrended time series.

    Identifying the Period

    First we need to identify the period of the seasonal data. The simplest method is to use prior knowledge about the time series data, along with a visual inspection. Often, a seasonal pattern is exactly that—a pattern that repeats over the course of one year. Thus, the period would be \(P = 1\) year, as in Example 5.2.

    If repetition of a seasonal pattern is not apparent from the data, then it may be discovered using standard statistical measures such as autocorrelation. Autocorrelation is the measure of correlation between a time series and the same time series shifted by some number of time steps. (Correlation and Linear Regression Analysis discussed correlation between two sets of data. Autocorrelation is the same idea applied to the same dataset with a shifted copy of itself.) The size of the shift is called the lag.

    The autocorrelation function (ACF) measures the correlation at each lag, starting with 0 or 1. Note that lag 0 always produces an autocorrelation of 1, since the time series is being compared to an exact copy of itself. Thus, we are only interested in the correlation coefficient at positive lags. However, small lags generally show high correlation with the original time series as well, simply because data measured within shorter time periods are generally similar to each other. So, even if the ACF shows relatively large correlation at lags of 1, 2, 3, etc., there may not be a true seasonal component present. The positive lag corresponding to the first significant peak, or autocorrelation value that is much larger than the previous values, may indicate a seasonal component with that period.
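    As an illustration of the idea, the ACF can be estimated with a few lines of numpy. This is a sketch on synthetic data with a built-in period of 12; the helper `acf` is introduced here for illustration and is not a library function.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation of x at lags 0 through max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    c0 = np.dot(x, x)                       # lag-0 autocovariance (unnormalized)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / c0
                     for k in range(max_lag + 1)])

# Synthetic monthly-style series: a sine wave with period 12 plus mild noise
rng = np.random.default_rng(0)
n = np.arange(120)
series = np.sin(2 * np.pi * n / 12) + 0.2 * rng.standard_normal(len(n))

r = acf(series, max_lag=24)
peak = int(np.argmax(r[2:]) + 2)   # first big peak beyond the trivially high small lags
print(peak)                        # the seasonal period, 12, should emerge here
```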

    Later, we will see how to compute ACF of a time series using Python, but first let’s explore further topics regarding seasonal components.

    Seasonally Adjusted Data

    Once you have determined the period of the seasonal component, you can then extract sn from the time series. As usual, there are a number of different ways to proceed.

    Let’s assume that you have detrended the time series so that what remains consists of a seasonal component and noise term:

    zn = xn − tn = sn + εn

    Moreover, suppose that the period of sn is P. Of course, with the noise term still present in the time series, we cannot expect that sn+P = sn holds exactly, but it should be the case that sn+P ≈ sn for every n. In other words, the data should look approximately the same when shifted by P time steps. Our job now is to isolate one complete period of data that models the seasonal variation throughout the time series. In other words, we want to produce an estimate s^n that is exactly periodic (s^n+P = s^n for all n) and models the underlying seasonal component as closely as possible.

    For any given index n ≤ P, we expect the data points (zn, zn+P, zn+2P, zn+3P, …) to hover around the same value or, in statistical language, to have rather low variance (much lower than the time series as a whole, that is). So it seems reasonable to use the mean of this slice of data to define the value of s^n. For simplicity, assume that the time series has exactly M·P data points (if not, then the data can be padded in an appropriate way). Then the estimate for the seasonal component would be defined for 1 ≤ n ≤ P by the formula

    s^n = (1/M)(zn + zn+P + zn+2P + zn+3P + ⋯ + zn+(M−1)P)

    As usual, the formula looks more intimidating than the concept that it’s based upon. As previously mentioned, this represents nothing more than the average of all the values of the time series that occur every PP time steps.
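    As a sketch of this averaging (assuming, as above, a detrended series with exactly M·P points), reshaping the data into M rows of length P turns each s^n into a column mean:

```python
import numpy as np

def seasonal_estimate(z, P):
    """Average the detrended series over each position within the period.

    Assumes len(z) is an exact multiple of the period P."""
    z = np.asarray(z, dtype=float)
    M = len(z) // P
    # Reshape into M rows of length P; column n holds z_n, z_(n+P), z_(n+2P), ...
    return z.reshape(M, P).mean(axis=0)

# Detrended toy data: a period-3 pattern plus a little noise
z = [2.1, -1.0, -1.1,  1.9, -0.9, -1.0,  2.0, -1.1, -0.9]
print(seasonal_estimate(z, 3))  # → [ 2. -1. -1.]
```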

    Once s^n has been computed, you can assess how well it captures the seasonal variation in the observed data. (Methods for evaluating the quality of a model's fit are discussed in Forecast Evaluation Methods.)

    Finally, the estimate s^n can then be used in conjunction with the original time series (xn) to smooth out the seasonal variation. Note that s^n essentially measures the fluctuations of observed data xn above or below the trend. We expect the mean of s^n to be close to zero in any window of time. Define the seasonally adjusted data as the time series

    xn − s^n

    A seasonally adjusted time series is closely related to the underlying trend-cycle curve, tn, though it generally still contains the noise and other random factors that tn does not.

    Example 5.6

    The revenue data from Walt’s Water Adventures (see Table 5.2) seems to display seasonality of period 4. Using the level as the trend, detrend the data, compute the seasonal component s^n, and find the seasonally adjusted time series.

    Answer

    First, we need to detrend the data. The level is the average of all 12 observations. Using a calculator or software, the level is found to be L = 2,275.42 (see Table 5.4). For example, the first term of the detrended data would be z1 = x1 − L = 1,799 − 2,275.42 = −476.42.

    There are a total of 12 observations, so there will be M = 12/4 = 3 periods of size P = 4. Using the values of zn in the formula for s^n, we find:

    s^1 = (1/3)(z1 + z5 + z9) = (1/3)((−476.42) + 8.58 + 428.58) = −13.08
    s^2 = (1/3)(z2 + z6 + z10) = (1/3)(1,028.58 + 1,183.58 + 1,573.58) = 1,261.92
    s^3 = (1/3)(z3 + z7 + z11) = (1/3)((−996.42) + (−311.42) + (−144.42)) = −484.08
    s^4 = (1/3)(z4 + z8 + z12) = (1/3)((−1,276.42) + (−681.42) + (−336.42)) = −764.75

    Then repeat this block of 4 values as needed to obtain the entire seasonal component. Finally, the seasonally adjusted time series is the difference between the observed data and seasonal component (see Table 5.4).

    k Quarter Revenues Detrended zk Seas. Comp. s^k Seasonally Adj. xk − s^k
    1 2021 Spring 1799 −476.42 −13.08 1812.08
    2 2021 Summer 3304 1028.58 1261.92 2042.08
    3 2021 Fall 1279 −996.42 −484.08 1763.08
    4 2021 Winter 999 −1276.42 −764.75 1763.75
    5 2022 Spring 2284 8.58 −13.08 2297.08
    6 2022 Summer 3459 1183.58 1261.92 2197.08
    7 2022 Fall 1964 −311.42 −484.08 2448.08
    8 2022 Winter 1594 −681.42 −764.75 2358.75
    9 2023 Spring 2704 428.58 −13.08 2717.08
    10 2023 Summer 3849 1573.58 1261.92 2587.08
    11 2023 Fall 2131 −144.42 −484.08 2615.08
    12 2023 Winter 1939 −336.42 −764.75 2703.75
    Table 5.4 Quarterly Revenues for Kayak and Canoe Rentals (all columns in $). Columns for detrended data, seasonal component estimate, and seasonally adjusted data are included.

    Figure 5.18 shows the data along with the seasonal component and seasonally adjusted data.

    A line chart titled Quarterly Revenues for Kayak and Canoe Rentals. The X axis has dates from Spring 2021 to Winter 2023. The Y axis ranges from -$1,000 to $5,000. A blue revenues line shows ups and downs from $1,000 to $4,000 with peaks in the summers. A green line with the same shape as the blue line represents the seasonal component, from -$1,000 to a $1,000. An orange seasonal adjustment line is relatively flat from $2,000 to $3,000.
    Figure 5.18 Quarterly Revenues for Kayak and Canoe Rentals. Seasonal component and seasonally adjusted time series are included on the graph.
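    The arithmetic in Example 5.6 can be double-checked with a few lines of Python (a sketch using the revenue figures from Table 5.4):

```python
revenues = [1799, 3304, 1279, 999, 2284, 3459, 1964, 1594, 2704, 3849, 2131, 1939]
L = sum(revenues) / len(revenues)              # level, about 2275.42
z = [x - L for x in revenues]                  # detrended data
P, M = 4, 3
s_hat = [sum(z[n::P]) / M for n in range(P)]   # one period of the seasonal estimate
adjusted = [x - s_hat[k % P] for k, x in enumerate(revenues)]

print([round(s, 2) for s in s_hat])   # → [-13.08, 1261.92, -484.08, -764.75]
print(round(adjusted[0], 2))          # → 1812.08
```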

    Decomposing a Time Series into Components Using Python

    There is a powerful Python library known as statsmodels that collects many useful statistical functions together. We will use the STL module to decompose the time series MonthlyCoalConsumption.xlsx into its components. First, we plot an ACF (autocorrelation function) to determine the best value for periodicity.

    Python Code

        # Import libraries
        import pandas as pd ## for dataset management
        from statsmodels.graphics.tsaplots import plot_acf
        
        # Read the Excel file into a Pandas DataFrame
        df = pd.read_excel('MonthlyCoalConsumption.xlsx')
        
        # Plot the ACF
        plot_acf(df['Value'], lags=20, title='ACF');
        

    The resulting output will look like this:

    ACF plot with 21 lags. The y-axis ranges from -1 to 1 and the x-axis from 0 to 20. The plot shows autocorrelation values for each lag, with confidence intervals indicated by shaded areas.
    Table 5.5 EMA Models (source: adapted from www.nasdaq.com/market-activi...spx/historical)

    Therefore, we find estimates of 4,349 and 4,541 for the S&P value at the end of 2024, by EMA models with α = 1/2 and α = 0.7, respectively. Note that larger values of α emphasize the most recent data, while smaller values allow more of the past data points to influence the average. If the time series trend experiences more variability (due to external factors that we have no control over), then an EMA with a relatively large α-value would be more appropriate than one with a small α-value.

    EMA models may also serve as smoothing techniques, isolating the trend-cycle component. In this context, it’s better to use an equivalent form of the EMA model:

    x^n+1 = αxn + (1 − α)x^n

    In this form, we build a series of estimates (x^n) recursively by incorporating the terms of the original time series one at a time. The idea is that the next estimate, x^n+1, is found by taking a weighted average of the current data point, xn, and the current estimate, x^n. Note, we may set x^1 = x1 since we need to begin the process somewhere. It can be shown mathematically that this procedure is essentially the same as working out the sum of exponentially weighted terms directly, and it is much more efficient.

    Example 5.8

    Use the recursive EMA formula with α = 0.75 on the S&P Index data (see Table 5.1), and then use the EMA model to estimate the value of the S&P Index at the end of 2024. Plot the EMA together with the original time series.

    Answer

    With α = 0.75, Table 5.6 shows the results of the computation.

    Year S&P Index at Year-End EMA Estimate
    2013 x1 = 1848.36 x^1 = x1 = 1848.36
    2014 x2 = 2058.9 x^2 = 0.75x1 + 0.25x^1 = 0.75(1848.36) + 0.25(1848.36) = 1848.36
    2015 x3 = 2043.94 x^3 = 0.75x2 + 0.25x^2 = 0.75(2058.9) + 0.25(1848.36) = 2006.27
    2016 x4 = 2238.83 x^4 = 0.75x3 + 0.25x^3 = 0.75(2043.94) + 0.25(2006.27) = 2034.52
    2017 x5 = 2673.61 x^5 = 0.75x4 + 0.25x^4 = 0.75(2238.83) + 0.25(2034.52) = 2187.75
    2018 x6 = 2506.85 x^6 = 0.75x5 + 0.25x^5 = 0.75(2673.61) + 0.25(2187.75) = 2552.15
    2019 x7 = 3230.78 x^7 = 0.75x6 + 0.25x^6 = 0.75(2506.85) + 0.25(2552.15) = 2518.17
    2020 x8 = 3756.07 x^8 = 0.75x7 + 0.25x^7 = 0.75(3230.78) + 0.25(2518.17) = 3052.63
    2021 x9 = 4766.18 x^9 = 0.75x8 + 0.25x^8 = 0.75(3756.07) + 0.25(3052.63) = 3580.21
    2022 x10 = 3839.5 x^10 = 0.75x9 + 0.25x^9 = 0.75(4766.18) + 0.25(3580.21) = 4469.69
    2023 x11 = 4769.83 x^11 = 0.75x10 + 0.25x^10 = 0.75(3839.5) + 0.25(4469.69) = 3997.05
    2024 N/A x^12 = 0.75x11 + 0.25x^11 = 0.75(4769.83) + 0.25(3997.05) = 4576.63
    Table 5.6 EMA Smoothing for the S&P Index Time Series (source: adapted from www.nasdaq.com/market-activi...spx/historical)

    According to this model, the S&P Index will be about 4577 at the end of 2024. Figure 5.19 shows the time series together with the exponential smoothing model.

    A line chart titled S&P Index at the End of Year 2013-2023. The X axis has years from 2013 to 2024 and the Y axis ranges from 0 to 6,000. A blue line representing the SP 500 shows a general upward trend  over the past decade. An orange line representing the exponential smoothing model follows a similar but slightly delayed track.
    Figure 5.19 Exponential Smoothing Model for the S&P Index (data source: adapted from S&P 500 [SPX] Historical Data)
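    The recursion in Table 5.6 is only a few lines of plain Python; this sketch (not the text's own code) reproduces the final forecast of about 4,576.63:

```python
def ema_forecast(xs, alpha):
    """Recursive EMA: x^(n+1) = alpha*x_n + (1 - alpha)*x^n, seeded with x^1 = x1."""
    est = xs[0]            # x^1 = x1
    estimates = [est]
    for x in xs:
        est = alpha * x + (1 - alpha) * est
        estimates.append(est)
    return estimates       # the last entry is the one-step-ahead forecast

sp500 = [1848.36, 2058.9, 2043.94, 2238.83, 2673.61, 2506.85,
         3230.78, 3756.07, 4766.18, 3839.5, 4769.83]
print(round(ema_forecast(sp500, 0.75)[-1], 2))  # → 4576.63
```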

    Note, EMA smoothing can be done very easily in Excel. Simply type in the formulas =B2 into cell C2 and =0.75*B2+0.25*C2 into cell C3. Then drag cell C3 down to the end of the data to compute the EMA. You can also do EMA smoothing in Python using the ewm() function applied to a dataset. Note, the “span” value should be set to 2/α − 1, as shown in this code.

    Python Code

        # Import libraries
        import pandas as pd ## for dataset management
        
        # Creating a dictionary with the given data
        data = {
          'Year': [2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023],
          'SP500': [1848.36, 2058.9, 2043.94, 2238.83, 2673.61, 2506.85, 3230.78, 3756.07, 4766.18, 3839.5, 4769.83]
        }
        # Creating DataFrame
        df = pd.DataFrame(data)
        
        alpha = 0.75
        df['ema']=df['SP500'].ewm(span=2/alpha-1, adjust=False).mean()
        
        print(df['ema'])
        

    The resulting output will look like this:

    0   1848.360000
    1   2006.265000
    2   2034.521250
    3   2187.752812
    4   2552.145703
    5   2518.173926
    6   3052.628481
    7   3580.209620
    8   4469.687405
    9   3997.046851
    10  4576.634213
    Name: ema, dtype: float64

    Autoregressive Integrated Moving Average (ARIMA)

    In this section, we develop a very powerful tool that combines multiple models together and is able to capture trend and seasonality. The method is called autoregressive integrated moving average, or ARIMA for short, and it consists of three main components:

    1. Autoregressive (AR) model. This component captures patterns in the time series that are repeated, including seasonal components as well as correlation with recent values.
    2. Integrative (I) component. This component represents the differencing operation discussed in the section on detrending a time series.
    3. Moving average (MA) model. Despite the name, this component is not the same as the moving average discussed previously, although there is a mathematical connection. Instead of focusing on averages of existing data points, MA models incorporate the influence of past forecast errors to predict the next value of the time series.

    Before getting into the specifics of each component of ARIMA, we need to talk about the term stationary, which refers to a time series in which the variance is relatively constant over time, an overall upward or downward trend cannot be found, and no seasonal patterns exist. White noise is an example of a stationary time series; however, the term stationary is a bit more general, as it includes the possibility of a cyclic component in the data. Recall, a cyclic component (despite its name) is a longer-term rise or fall in the values of a time series that may occur from time to time, but the occurrences are unpredictable.

    The AR and MA models work best on stationary time series. The integrative component is the tool that can detrend the data, transforming a nonstationary time series into a stationary one so that AR and/or MA may be used.

    Autoregressive Models

    The autoregressive models take into account a fixed number of previous values of the time series in order to predict the next value. In this way, AR models are much like WMA models previously discussed, in the sense that the next term of the time series depends on a sum of weighted terms of previous values; however, in the AR model, the terms used are the terms of the model itself.

    Suppose we want to model a time series, (xn), using an AR model. First, we should pick a value p, called the order of the model, which specifies how many previous terms to include in the model. Then we may call the model an AR(p) model. If there is a strong seasonal component of period p, then it makes sense to choose that value as the order.

    An AR(p) model is a recursive formula of the form

    AR(p)n = w1yn−1 + w2yn−2 + w3yn−3 + ⋯ + wpyn−p + εn

    Here, the parameters or weights, w1, w2, w3, …, wp, are constants that must be chosen so that the values of the time series (yn) fit the known data as closely as possible (i.e., yn ≈ xn for all known observations). There are also certain restrictions on the possible values of each weight, but we do not need to delve into those specifics in this text, as we will rely on software to do the work of finding the parameters for us.
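    As a hedged illustration of how the weights can be estimated (libraries such as statsmodels automate this), the sketch below simulates an AR(2) series with known weights w1 = 0.5 and w2 = 0.3, then recovers them by least-squares regression on the lagged values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(2) series: y_n = 0.5*y_(n-1) + 0.3*y_(n-2) + eps_n
N = 2000
y = np.zeros(N)
for n in range(2, N):
    y[n] = 0.5 * y[n - 1] + 0.3 * y[n - 2] + rng.normal(scale=0.1)

# Recover the weights: regress y_n on (y_(n-1), y_(n-2))
X = np.column_stack([y[1:-1], y[:-2]])   # lagged copies of the series
w, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
print(np.round(w, 2))  # close to the true weights (0.5, 0.3)
```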

    Integrative Component

    The “I” in ARIMA stands for integrative, which is closely related to the word integral in calculus. Now if you have seen some calculus, you might recall that the integral is just the opposite of a derivative. The derivative, in turn, is closely related to the idea of differencing in a time series. Of course, you don’t have to know calculus to understand the idea of an integrative model.

    Rather than producing another component of the time series, the integrative component of ARIMA represents the procedure of taking differences, Δd(xn) (up to some specific order d), until the time series becomes stationary. Often, only the first-order difference is needed, and you can check for stationarity visually. A more sophisticated method, called the Augmented Dickey-Fuller (ADF) test, sets up a test statistic that measures the presence of non-stationarity in a time series. Using ADF, a p-value is computed. If the p-value is below a set threshold (such as p < 0.01), then the null hypothesis of non-stationarity can be rejected. Next, we will show how to run the ADF test on a time series and its differences Δd(xn) to find out the best value for d in ARIMA.

    Then the AR and MA models can be produced on the resulting stationary time series. Integrating (which is the opposite of differencing) builds the model back up so that it can ultimately be used to predict values of the original time series (xn)(xn).
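    A quick sketch of why differencing works: a series with a linear trend is not stationary, but its first-order difference Δxn = xn − xn−1 scatters around a constant (the slope), which is stationary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Nonstationary series: linear trend (slope 3) plus noise
n = np.arange(500)
x = 3 * n + rng.normal(scale=1.0, size=n.size)

# First-order difference removes the trend
dx = np.diff(x)

# The differenced series hovers around the slope with stable variance
print(round(dx.mean(), 1))  # → 3.0
```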

    Moving Average Models

    An MA model of order q, or MA(q), takes the form

    MA(q)n = L + εn + b1εn−1 + b2εn−2 + b3εn−3 + ⋯ + bqεn−q

    Here, the εn terms are residuals (forecast errors), and L is the level of the time series (mean value of all the terms). Again, there are a multitude of parameters that need to be chosen in order for the model to make sense. We will rely on software to handle this in general.
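    The sketch below (not from the text) simulates an MA(1) process with level L = 10 and weight b1 = 0.6. The series hovers around the level, and for an MA(1) process the lag-1 autocorrelation is b1/(1 + b1²) ≈ 0.44 in theory, with all higher lags near zero:

```python
import numpy as np

rng = np.random.default_rng(2)

# MA(1): x_n = L + eps_n + b1*eps_(n-1), with white-noise residuals eps
L, b1, N = 10.0, 0.6, 5000
eps = rng.normal(size=N + 1)
x = L + eps[1:] + b1 * eps[:-1]

xc = x - x.mean()
acf1 = np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc)   # sample lag-1 autocorrelation
print(round(x.mean(), 1), round(acf1, 2))         # mean near 10, acf1 near 0.44
```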

    ARIMA Model in Python

    The full ARIMA model, ARIMA(p, d, q), incorporates all three components: autoregressive of order p, integrative with d-order differencing, and moving average of order q. For the time series MonthlyCoalConsumption.xlsx, it turns out that the second-order difference is stationary, as shown by a series of ADF tests.

    Python Code

        # Import libraries
        import pandas as pd ## for dataset management
        # Import adfuller from the statsmodels library
        from statsmodels.tsa.stattools import adfuller
        
        # Read data
        df = pd.read_excel('MonthlyCoalConsumption.xlsx')
        
        print('d = 0')
        result = adfuller(df['Value'])
        print('ADF Statistic:', result[0])
        print('p-value:', result[1])
        
        ## Get the first difference
        print('\n d = 1')
        df1 = df['Value'].diff().dropna()
        result = adfuller(df1)
        print('ADF Statistic:', result[0])
        print('p-value:', result[1])
        
        ## Get the second difference
        print('\n d = 2')
        df2 = df1.diff().dropna()
        result = adfuller(df2)
        print('ADF Statistic:', result[0])
        print('p-value:', result[1])
        

    The resulting output will look like this:

    d = 0
    ADF Statistic: -1.174289795062662
    p-value: 0.6845566772896323
    
    d = 1
    ADF Statistic: -1.7838215415905252
    p-value: 0.38852291349101137
    
    d = 2
    ADF Statistic: -8.578323839498081
    p-value: 7.852009937900099e-14

    Here, a p-value of 7.852 × 10^(−14) indicates that the second-order difference is stationary, so we will set d = 2 in the ARIMA model. An ACF plot of the second difference can be used to determine p.

    Python Code

        # Import libraries
        from statsmodels.graphics.tsaplots import plot_acf
        
        # Plot the ACF of the second difference
        plot_acf(df2, lags=20, title='ACF for d = 2 Difference');
         
    ACF plot labeled ACF for d = 2 Difference with 21 lags showing autocorrelation values for differenced data. The y-axis ranges from -1 to 1 and the x-axis from 0 to 20. Confidence intervals are indicated by shaded areas.

    We see a peak at lag 12, corresponding to the expected yearly seasonal variation, so we will set p = 12 in the ARIMA model. For this time series, positive values of q did not change the accuracy of the model at all, so we set q = 0 (MA model component unnecessary). The model was used to forecast 24 time steps (2 years) into the future. We are using the Python library statsmodels.tsa.arima.model to create the ARIMA model.

    Python Code

        # Import ARIMA model from statsmodels.tsa library
        from statsmodels.tsa.arima.model import ARIMA
        import matplotlib.pyplot as plt ## for plotting graphs
        from matplotlib.ticker import MaxNLocator ## for graph formatting
        from matplotlib.ticker import FuncFormatter ## for formatting y-axis
        
        # Fit an ARIMA model
        p = 12; d = 2; q = 0 # ARIMA component orders
        model = ARIMA(df['Value'], order=(p, d, q))
        results = model.fit()
        
        # Make forecasts
        fc = results.forecast(steps=24)
        
        # Plot the results
        plt.figure(figsize=(10, 6))
        
        # Plot original time series
        plt.plot(df['Value'], label='Original Time Series')
        
        # Plot fitted and forecasted values
        plt.plot(results.fittedvalues, color='red', label='Fitted Values')
        plt.plot(fc, color='red', linestyle='dashed', label='Forecasted Values') ## Forecasted values
        
        # Set labels and legend
        plt.xlabel('Months')
        plt.title('Monthly Consumption of Coal for Electricity Generation in the United States from 2016 to 2022')
        plt.legend()
        
        # Function to format the Y-axis values
        def y_format(value, tick_number):
          return f'{value:,.0f}'
        
        # Apply the formatter to the Y-axis
        plt.gca().yaxis.set_major_formatter(FuncFormatter(y_format))
        plt.show() 
        

    The resulting output will look like this:

    Time series plot titled Monthly consumption of coal for electricity generation in the United States from 2016 to 2022. Y-axis ranges from 0 to 100,000, x-axis from 0 to 100. Original data, fitted values, and forecasted values are plotted. The blue line represents the actual coal consumption, which fluctuates seasonally. The red line represents the fitted values, which smooth out the seasonal fluctuations, and the dashed red line represents the forecasted values. Coal consumption shows a general downward trend from around 100,000 tons per month in 2016 to around 20,000 tons per month in 2022.

    Here, the blue curve is the original time series, and the red curve represents the ARIMA model, with forecasted values indicated by the dashed part of the curve. Although there is considerable error near the beginning of the time period, the model seems to fit the known values of the data fairly well, and so we expect that the future forecasts are reasonable.


    This page titled 5.3: Time Series Forecasting Methods is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by OpenStax via source content that was edited to the style and standards of the LibreTexts platform.