Time-Series Models for Financial Data: AR, MA & ARIMA

Michael Brenndoerfer · December 9, 2025 · 53 min read

Master autoregressive and moving average models for financial time-series. Learn stationarity, ACF/PACF diagnostics, ARIMA estimation, and forecasting.


Time-Series Models for Financial Data

Financial markets generate vast streams of sequential observations: prices, returns, interest rates, volatility measures, and economic indicators. Understanding the temporal structure of these data is essential for forecasting, risk management, and trading strategy development. Time-series models provide a principled framework for capturing the statistical dependencies that exist between observations across time.

Unlike the stochastic differential equations we explored in earlier chapters on Brownian motion and Itô calculus, the models in this chapter operate in discrete time and focus on the autocorrelation structure of observed data. While Black-Scholes assumes returns are independent and identically distributed, empirical evidence from our study of stylized facts reveals that financial returns exhibit serial correlation in their magnitudes (volatility clustering) and occasional predictable patterns in their levels. Time-series models give us the tools to detect, quantify, and exploit these patterns.

This chapter introduces the foundational autoregressive (AR) and moving average (MA) models, then combines them into the ARMA framework for stationary series. We extend to ARIMA models for handling non-stationary data like trending economic indicators or asset prices. Throughout, we develop practical skills for model identification using autocorrelation analysis, parameter estimation, and forecasting with proper uncertainty quantification.

Stationarity: The Foundation of Time-Series Analysis

Before building any time-series model, we must understand stationarity, a property that determines which modeling techniques apply and whether our statistical estimates are meaningful. The concept of stationarity addresses a fundamental challenge in time-series analysis: how can we learn about a process when we observe only a single realization of it? If the statistical properties of a series change arbitrarily over time, then observations from the past tell us nothing reliable about the future. Stationarity provides the stability assumption that makes statistical inference possible.

Think of stationarity as a form of temporal homogeneity. Just as we assume in cross-sectional statistics that observations come from the same underlying distribution, stationarity assumes that the time series comes from a process whose statistical character remains consistent across time. This does not mean the series is constant or predictable. Rather, it means the rules governing its behavior do not change.

Strict and Weak Stationarity

A time series $\{Y_t\}$ is strictly stationary if its joint probability distribution is invariant to time shifts. For any set of time indices $t_1, t_2, \ldots, t_k$ and any shift $h$, the joint distribution of $(Y_{t_1}, Y_{t_2}, \ldots, Y_{t_k})$ equals that of $(Y_{t_1+h}, Y_{t_2+h}, \ldots, Y_{t_k+h})$. This is a strong requirement rarely verified in practice. Strict stationarity essentially demands that if you were to examine any collection of observations from the series, you could not tell what time period they came from based solely on their statistical properties. The distribution of Monday's return and the following Wednesday's return must be identical to the distribution of any other Monday-Wednesday pair, regardless of when those observations occur.

More useful is weak stationarity (or covariance stationarity), which requires only:

1. Constant mean: $E[Y_t] = \mu$ for all $t$
2. Constant variance: $\text{Var}(Y_t) = \sigma^2$ for all $t$
3. Autocovariance depends only on lag: $\text{Cov}(Y_t, Y_{t-k}) = \gamma_k$ for all $t$

Weak stationarity relaxes the strict requirement by focusing only on the first two moments of the distribution. We do not require the entire distribution to be time-invariant, only the mean, variance, and the covariance structure. This makes weak stationarity much easier to test and work with in practice. The third condition is particularly important: it states that the covariance between observations depends only on how far apart they are in time (the lag $k$), not on their absolute position in the series. The covariance between observations 5 periods apart must be the same whether we are looking at periods 1 and 6 or periods 100 and 105.

The autocovariance function $\gamma_k$ measures how observations separated by $k$ periods covary. This function captures the memory of the process, telling us how much information about the current value is contained in values from $k$ periods ago. The autocorrelation function (ACF) normalizes this to a scale that is easier to interpret:

$$\rho_k = \frac{\gamma_k}{\gamma_0} = \frac{\text{Cov}(Y_t, Y_{t-k})}{\text{Var}(Y_t)}$$

where:

  • $Y_t$: value of the series at time $t$
  • $\rho_k$: autocorrelation at lag $k$
  • $\gamma_k$: autocovariance at lag $k$
  • $\gamma_0$: variance of the series ($\text{Var}(Y_t)$)
  • $\text{Cov}(Y_t, Y_{t-k})$: covariance between observations $k$ periods apart

The normalization by $\gamma_0$ (the variance) ensures that autocorrelations are bounded between -1 and 1, just like ordinary correlation coefficients. For a stationary series, $\rho_0 = 1$ always holds because any observation is perfectly correlated with itself. The constraint $|\rho_k| \leq 1$ for all lags follows from the Cauchy-Schwarz inequality. The ACF provides a complete summary of the linear dependence structure of a stationary time series and serves as our primary diagnostic tool for model identification.

Stationarity in Financial Data

Asset prices are almost never stationary: they exhibit trends, drift, and variance that changes over time. The S&P 500 today is systematically higher than it was 50 years ago, violating the constant mean requirement. This non-stationarity is not a modeling failure but reflects genuine economic reality. Over long horizons, stock prices tend to rise because companies generate earnings and reinvest profits, creating compound growth in value. A statistical model that assumed stock prices fluctuate around a fixed mean would fundamentally mischaracterize how equity markets work.

However, returns are often approximately stationary:

$$\begin{aligned} r_t &= \frac{P_t - P_{t-1}}{P_{t-1}} && \text{(simple return)} \\ r_t &= \log(P_t) - \log(P_{t-1}) && \text{(log return)} \end{aligned}$$

where:

  • $r_t$: return at time $t$
  • $P_t$: price at time $t$
  • $P_{t-1}$: price at time $t-1$

The transformation from prices to returns removes the trending behavior. Log returns fluctuate around a relatively stable mean (often close to zero for daily data) with variance that, while exhibiting clustering, maintains a long-run average. Intuitively, while the price level can wander arbitrarily far from any fixed reference point, the percentage changes in price tend to stay within a bounded range. This is why most time-series modeling in finance focuses on returns rather than prices. The transformation to returns is not just a mathematical convenience; it reflects the economic intuition that investors care about percentage gains and losses, not absolute price levels.

Unit Root and Integration

A series is integrated of order $d$, written $I(d)$, if it must be differenced $d$ times to achieve stationarity. Asset prices are typically $I(1)$: one difference (the return) yields a stationary series. Economic series like GDP may be $I(1)$ or $I(2)$.
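
To make this concrete, here is a minimal sketch (an illustrative addition, not from the original text) that uses the Augmented Dickey-Fuller test from statsmodels to check how many differences a simulated price series needs; the simulated data and the 5% significance level are assumptions for illustration.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

np.random.seed(0)

# Simulate a random-walk log-price series (I(1)) and its first difference (returns)
returns = 0.0005 + 0.01 * np.random.randn(1000)
log_prices = np.cumsum(returns)


def integration_order(series, alpha=0.05, max_d=2):
    """Difference the series until the ADF test rejects a unit root."""
    for d in range(max_d + 1):
        p_value = adfuller(series)[1]
        if p_value < alpha:
            return d  # stationary after d differences
        series = np.diff(series)
    return max_d  # still non-stationary; more differencing may be needed


print("Order of integration for log prices:", integration_order(log_prices))
print("Order of integration for returns:   ", integration_order(returns))
```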

Autoregressive (AR) Models

Autoregressive models capture the idea that today's value depends linearly on past values. This concept reflects a fundamental intuition about how many processes evolve: the current state of a system depends on where it was recently. An AR(p) model expresses the current observation as a weighted sum of the previous $p$ observations plus a noise term. The term "autoregressive" means the series is regressed on its own past values, unlike ordinary regression which uses external variables.

The AR(1) Model

To capture the linear dependence of the current observation on its immediate predecessor, the simplest autoregressive model is AR(1). This model embodies the idea that today's value is pulled toward its previous value, with some random perturbation added. Mathematically:

$$Y_t = c + \phi Y_{t-1} + \varepsilon_t$$

where:

  • $Y_t$: value of the series at time $t$
  • $Y_{t-1}$: value of the series at time $t-1$
  • $c$: constant (intercept)
  • $\phi$: autoregressive coefficient
  • $\varepsilon_t$: white noise term with mean 0 and variance $\sigma^2$

The white noise terms are assumed independent: $\text{Cov}(\varepsilon_t, \varepsilon_s) = 0$ for $t \neq s$. This independence assumption is crucial because it means all the predictable structure in the series comes from the autoregressive relationship, not from patterns in the shocks themselves. The parameter $\phi$ controls the strength and direction of persistence. When $\phi$ is positive, high values tend to be followed by high values and low values by low values. When $\phi$ is negative, the series tends to oscillate, with high values followed by low values and vice versa. The magnitude of $\phi$ determines how strong this tendency is.

Stationarity Conditions

For AR(1) to be stationary, we require $|\phi| < 1$. This condition ensures that the influence of past shocks decays over time rather than persisting indefinitely or exploding. To see why this condition is necessary, assume stationarity holds and take expectations of both sides of the AR(1) equation:

$$E[Y_t] = c + \phi E[Y_{t-1}] + E[\varepsilon_t]$$

where:

  • $E[\cdot]$: expectation operator
  • $Y_t$: value of the series at time $t$
  • $c$: constant term
  • $\phi$: autoregressive coefficient
  • $\varepsilon_t$: white noise term

Since $E[Y_t] = E[Y_{t-1}] = \mu$ under stationarity (the mean must be the same at all time periods), and $E[\varepsilon_t] = 0$ by the white noise assumption, we can solve for the unconditional mean:

$$\begin{aligned} \mu &= c + \phi \mu && \text{(stationarity condition)} \\ \mu(1 - \phi) &= c && \text{(rearrange terms)} \\ \mu &= \frac{c}{1-\phi} && \text{(solve for } \mu \text{)} \end{aligned}$$

where:

  • $\mu$: mean of the process
  • $c$: constant term
  • $\phi$: autoregressive coefficient

This solution exists only if $\phi \neq 1$. If $\phi = 1$, the denominator becomes zero and no finite mean exists, indicating a non-stationary process known as a random walk. Moreover, computing the variance requires using the independence of $Y_{t-1}$ and $\varepsilon_t$. This independence follows from the assumption that $\varepsilon_t$ is unpredictable given past information:

$$\begin{aligned} \text{Var}(Y_t) &= \text{Var}(c + \phi Y_{t-1} + \varepsilon_t) \\ &= \phi^2 \text{Var}(Y_{t-1}) + \text{Var}(\varepsilon_t) && \text{(independent terms)} \\ &= \phi^2 \text{Var}(Y_{t-1}) + \sigma^2 && \text{(simplify)} \end{aligned}$$

where:

  • $\text{Var}(Y_t)$: variance at time $t$
  • $c$: constant term
  • $\phi$: autoregressive coefficient
  • $\varepsilon_t$: white noise term
  • $\sigma^2$: variance of the white noise term

Under stationarity with $\text{Var}(Y_t) = \text{Var}(Y_{t-1}) = \gamma_0$, we can substitute and solve for the unconditional variance:

$$\begin{aligned} \gamma_0 &= \phi^2 \gamma_0 + \sigma^2 && \text{(stationarity condition)} \\ \gamma_0(1 - \phi^2) &= \sigma^2 && \text{(rearrange terms)} \\ \gamma_0 &= \frac{\sigma^2}{1-\phi^2} && \text{(solve for } \gamma_0 \text{)} \end{aligned}$$

where:

  • $\gamma_0$: variance of the process ($\text{Var}(Y_t)$)
  • $\sigma^2$: variance of the white noise term
  • $\phi$: autoregressive coefficient

This expression is positive and finite only if $\phi^2 < 1$, meaning $|\phi| < 1$. When $|\phi| \geq 1$, the variance formula would yield either a negative number (impossible) or infinity (the series explodes). The stationarity condition thus emerges naturally from requiring that both the mean and variance be well-defined finite constants.
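
As a quick sanity check, the following sketch (an illustrative addition; all parameter values are assumptions) simulates a long AR(1) path and compares the sample mean and variance against the formulas $\mu = c/(1-\phi)$ and $\gamma_0 = \sigma^2/(1-\phi^2)$.

```python
import numpy as np

np.random.seed(1)

# Assumed AR(1) parameters for illustration
c, phi, sigma = 0.5, 0.8, 1.0
n = 200_000

y = np.zeros(n)
for t in range(1, n):
    y[t] = c + phi * y[t - 1] + sigma * np.random.randn()

# Discard an initial burn-in so the path forgets its starting value
y = y[1000:]

print("sample mean:", y.mean(), " theoretical:", c / (1 - phi))
print("sample var: ", y.var(), " theoretical:", sigma**2 / (1 - phi**2))
```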

ACF of AR(1)

The autocorrelation function of AR(1) decays geometrically, a signature pattern that helps us identify AR processes in empirical data. To derive the autocovariance, we work with the mean-centered series to simplify calculations. The deviation from the mean satisfies $Y_t - \mu = \phi(Y_{t-1} - \mu) + \varepsilon_t$. Multiplying both sides by $(Y_{t-k} - \mu)$ for $k \geq 1$ and taking expectations yields the autocovariance at lag $k$:

$$\begin{aligned} \gamma_k &= E[(Y_t - \mu)(Y_{t-k} - \mu)] && \text{(definition of autocovariance)} \\ &= E[(\phi(Y_{t-1} - \mu) + \varepsilon_t)(Y_{t-k} - \mu)] && \text{(substitute deviation form)} \\ &= \phi E[(Y_{t-1} - \mu)(Y_{t-k} - \mu)] + E[\varepsilon_t(Y_{t-k} - \mu)] && \text{(linearity of expectation)} \\ &= \phi \gamma_{k-1} + 0 && \text{(uncorrelated error)} \end{aligned}$$

where:

  • $\gamma_k$: autocovariance at lag $k$
  • $E[\cdot]$: expectation operator
  • $Y_t$: value of the series at time $t$
  • $\mu$: mean of the process
  • $\phi$: autoregressive coefficient
  • $\varepsilon_t$: white noise term
  • $\gamma_{k-1}$: autocovariance at lag $k-1$

The key insight is that $E[\varepsilon_t(Y_{t-k} - \mu)] = 0$ for $k \geq 1$ because the current shock $\varepsilon_t$ is uncorrelated with past values of the series. This recursion $\gamma_k = \phi \gamma_{k-1}$ yields $\gamma_k = \phi^k \gamma_0$ by repeated substitution, so the autocorrelation function has an elegant closed form:

$$\rho_k = \phi^k$$

where:

  • $\rho_k$: autocorrelation at lag $k$
  • $\phi$: autoregressive coefficient
  • $k$: lag number

When $\phi > 0$, correlations decay exponentially but remain positive at all lags. This positive persistence means high values tend to be followed by somewhat high values, with the memory gradually fading. When $\phi < 0$, correlations alternate in sign while decaying in magnitude. This creates an oscillating pattern where high values tend to be followed by low values, then moderately high values, and so on. The geometric decay pattern is the hallmark of AR(1) and provides a clear diagnostic signature in sample ACF plots.

Out[2]:
Figure: Autocorrelation functions for AR(1) processes with positive and negative coefficients. The left panel displays monotonic geometric decay for positive phi values, indicating persistence, while the right panel shows alternating signs for negative phi values, reflecting oscillatory behavior.
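
To reproduce this pattern numerically, the sketch below (an illustrative addition; the coefficient values are assumptions) simulates AR(1) paths with positive and negative $\phi$ and compares the sample autocorrelations against the theoretical $\rho_k = \phi^k$.

```python
import numpy as np

np.random.seed(2)


def simulate_ar1(phi, n=50_000):
    """Simulate a zero-mean AR(1) process."""
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = phi * y[t - 1] + np.random.randn()
    return y


def sample_acf(y, max_lag):
    """Biased sample autocorrelations up to max_lag."""
    y = y - y.mean()
    denom = np.sum(y**2)
    return np.array(
        [np.sum(y[k:] * y[: len(y) - k]) / denom for k in range(max_lag + 1)]
    )


for phi in (0.7, -0.7):
    acf_hat = sample_acf(simulate_ar1(phi), max_lag=5)
    acf_theory = phi ** np.arange(6)
    print(f"phi = {phi:+.1f}")
    print("  sample:     ", np.round(acf_hat, 3))
    print("  theoretical:", np.round(acf_theory, 3))
```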

General AR(p) Models

The AR(p) model extends the first-order case to include $p$ lags, allowing for more complex dependence structures where observations depend on multiple past values:

$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \varepsilon_t$$

where:

  • $Y_t$: value of the series at time $t$
  • $c$: constant term
  • $\phi_i$: autoregressive coefficient at lag $i$
  • $\varepsilon_t$: white noise error term

The multiple lags can capture more nuanced dynamics. For instance, an AR(2) model might exhibit cyclical behavior if the coefficients are chosen appropriately, which a simple AR(1) cannot produce. Using the lag operator $L$ where $L Y_t = Y_{t-1}$, we can express the AR(p) model compactly:

$$(1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p) Y_t = c + \varepsilon_t$$

where:

  • $Y_t$: value of the series at time $t$
  • $L$: lag operator such that $L^k Y_t = Y_{t-k}$
  • $\phi_i$: autoregressive coefficient at lag $i$
  • $c$: constant term
  • $\varepsilon_t$: white noise error term

We write this as $\Phi(L) Y_t = c + \varepsilon_t$ where $\Phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p$ is called the autoregressive polynomial. The lag operator notation provides not just compactness but also mathematical power, allowing us to manipulate time-series equations using algebraic techniques.

Stationarity requires all roots of the characteristic polynomial $\Phi(z) = 0$ to lie outside the unit circle in the complex plane. This generalizes the AR(1) condition $|\phi| < 1$ to higher-order models. For AR(2) with $Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \varepsilon_t$, the stationarity conditions can be expressed as three inequalities:

$$\begin{aligned} \phi_1 + \phi_2 &< 1 \\ \phi_2 - \phi_1 &< 1 \\ |\phi_2| &< 1 \end{aligned}$$

where:

  • $\phi_1, \phi_2$: autoregressive coefficients of the AR(2) model

These conditions define a triangular region in the $(\phi_1, \phi_2)$ parameter space where stationary AR(2) processes exist. Outside this region, the process either explodes or exhibits non-stationary wandering behavior.

Out[3]:
Figure: Stationarity region for AR(2) models defined by the parameters φ₁ and φ₂. The shaded triangular area represents the set of coefficients yielding stationary processes, bounded by φ₁ + φ₂ < 1, φ₂ - φ₁ < 1, and |φ₂| < 1.
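
The root condition is easy to check numerically. The sketch below (an illustrative addition; the coefficient pairs are assumptions) evaluates whether candidate AR(2) coefficients yield a stationary process by finding the roots of the characteristic polynomial $\Phi(z) = 1 - \phi_1 z - \phi_2 z^2$ with numpy.

```python
import numpy as np


def is_stationary_ar2(phi1, phi2):
    """Check whether all roots of 1 - phi1*z - phi2*z^2 lie outside the unit circle."""
    # np.roots expects coefficients from the highest power down: -phi2*z^2 - phi1*z + 1
    roots = np.roots([-phi2, -phi1, 1.0])
    return np.all(np.abs(roots) > 1.0)


# A few assumed coefficient pairs, inside and outside the stationarity triangle
for phi1, phi2 in [(0.5, 0.3), (1.2, -0.3), (0.6, 0.5), (0.5, -1.1)]:
    print(f"phi1={phi1:+.1f}, phi2={phi2:+.1f} -> stationary: {is_stationary_ar2(phi1, phi2)}")
```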

Partial Autocorrelation Function (PACF)

The partial autocorrelation at lag $k$, denoted $\alpha_k$, measures the correlation between $Y_t$ and $Y_{t-k}$ after removing the linear effects of intermediate lags $Y_{t-1}, \ldots, Y_{t-k+1}$. This "partial" correlation isolates the direct relationship between observations $k$ periods apart, controlling for everything in between. The intuition is similar to partial correlation in regression analysis: we want to know how much additional information $Y_{t-k}$ provides about $Y_t$ beyond what is already captured by the intervening observations.

For an AR(p) process, a key identification property holds:

$$\alpha_k \begin{cases} \neq 0 & \text{for } k \leq p \\ = 0 & \text{for } k > p \end{cases}$$

where:

  • $\alpha_k$: partial autocorrelation at lag $k$
  • $p$: order of the autoregressive process

This cutoff property follows from the structure of the AR(p) model. Since $Y_t$ depends directly only on $Y_{t-1}$ through $Y_{t-p}$, once we account for these intermediate values, $Y_{t-k}$ for $k > p$ provides no additional predictive information. The PACF cuts off sharply after lag $p$, while the ACF decays gradually. This stark contrast makes the PACF the primary tool for identifying AR order: we look for the lag after which all partial autocorrelations become insignificant.

Moving Average (MA) Models

While AR models express current values as functions of past values, moving average models take a fundamentally different approach by expressing current values as functions of past shocks. The terminology can be confusing because we are not computing averages in the usual sense. Instead, the current observation is modeled as a weighted combination of recent random innovations. This perspective views the time series as arising from the accumulation of shocks, with each shock having both an immediate impact and lingering effects over subsequent periods.

The MA(1) Model

The MA(1) model expresses the current value as the mean plus the current shock and a fraction of the previous shock:

$$Y_t = \mu + \varepsilon_t + \theta \varepsilon_{t-1}$$

where:

  • $Y_t$: value of the series at time $t$
  • $\mu$: mean of the process
  • $\varepsilon_t$: white noise shock at time $t$
  • $\theta$: moving average coefficient
  • $\varepsilon_{t-1}$: white noise shock at time $t-1$

Unlike AR models, MA models are always stationary regardless of parameter values. This happens because the model is defined as a finite linear combination of independent random variables, which always has well-defined finite moments. However, we typically impose invertibility conditions for meaningful interpretation and to ensure unique parameter identification, as we will discuss later.

Properties of MA(1)

The mean of the MA(1) process is straightforward to compute by taking expectations:

$$\begin{aligned} E[Y_t] &= E[\mu + \varepsilon_t + \theta \varepsilon_{t-1}] \\ &= \mu + E[\varepsilon_t] + \theta E[\varepsilon_{t-1}] && \text{(linearity of expectation)} \\ &= \mu + 0 + 0 && \text{(zero mean noise)} \\ &= \mu \end{aligned}$$

where:

  • $E[Y_t]$: expected value of the series
  • $\mu$: mean parameter
  • $\varepsilon_t$: white noise shock at time $t$
  • $\theta$: moving average coefficient

The variance calculation uses the independence of the white noise terms at different times. Since $\varepsilon_t$ and $\varepsilon_{t-1}$ are uncorrelated, their contributions to the variance simply add:

$$\begin{aligned} \text{Var}(Y_t) &= \text{Var}(\mu + \varepsilon_t + \theta \varepsilon_{t-1}) \\ &= \text{Var}(\varepsilon_t) + \theta^2 \text{Var}(\varepsilon_{t-1}) && \text{(independence of shocks)} \\ &= \sigma^2 + \theta^2 \sigma^2 && \text{(constant variance)} \\ &= \sigma^2(1 + \theta^2) && \text{(factor out } \sigma^2 \text{)} \end{aligned}$$

where:

  • $\text{Var}(Y_t)$: variance of the series
  • $\sigma^2$: variance of the white noise shock $\varepsilon_t$
  • $\theta$: moving average coefficient

For autocovariances, the MA(1) model exhibits a striking feature: only lag 1 is nonzero. To see why, we expand the covariance between adjacent observations:

$$\begin{aligned} \gamma_1 &= E[(Y_t - \mu)(Y_{t-1} - \mu)] \\ &= E[(\varepsilon_t + \theta \varepsilon_{t-1})(\varepsilon_{t-1} + \theta \varepsilon_{t-2})] && \text{(substitute definition)} \\ &= E[\varepsilon_t \varepsilon_{t-1} + \theta \varepsilon_{t-1}^2 + \theta \varepsilon_t \varepsilon_{t-2} + \theta^2 \varepsilon_{t-1} \varepsilon_{t-2}] && \text{(expand product)} \\ &= 0 + \theta E[\varepsilon_{t-1}^2] + 0 + 0 && \text{(cross-terms are zero)} \\ &= \theta \sigma^2 \end{aligned}$$

where:

  • $\gamma_1$: autocovariance at lag 1
  • $\theta$: moving average coefficient
  • $\sigma^2$: variance of the white noise shock
  • $Y_t$: value of the series at time $t$
  • $\mu$: mean of the process
  • $\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}$: white noise shocks

The cross-terms vanish because white noise shocks at different times are uncorrelated by assumption. The only surviving term is $\theta E[\varepsilon_{t-1}^2] = \theta \sigma^2$, which arises from the single shock $\varepsilon_{t-1}$ that appears in both $Y_t$ and $Y_{t-1}$. For $k \geq 2$, there is no overlap in the $\varepsilon$ terms entering $Y_t$ and $Y_{t-k}$, so $\gamma_k = 0$.

The ACF is therefore:

$$\rho_k = \begin{cases} \frac{\theta}{1+\theta^2} & \text{if } k = 1 \\ 0 & \text{if } k > 1 \end{cases}$$

where:

  • $\rho_k$: autocorrelation at lag $k$
  • $\theta$: moving average coefficient

This cutoff property of the ACF is the diagnostic signature of MA models. An MA(q) process has $\rho_k = 0$ for all $k > q$, meaning the autocorrelation function drops abruptly to zero after lag $q$. This sharp cutoff contrasts with the gradual decay of AR models and provides the basis for identifying MA order from sample ACF plots.

Out[4]:
Figure: Autocorrelation functions for MA(1) and MA(2) processes. The MA(1) process (left) exhibits significant autocorrelation only at lag 1, while the MA(2) process (right) shows significant values up to lag 2, demonstrating the characteristic cutoff property used for model identification.
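
The cutoff is straightforward to verify by simulation. The sketch below (an illustrative addition; $\theta$ and the sample size are assumptions) simulates an MA(1) series and compares its sample autocorrelations to the theoretical values $\rho_1 = \theta/(1+\theta^2)$ and $\rho_k = 0$ for $k > 1$.

```python
import numpy as np

np.random.seed(3)

theta = 0.6
n = 100_000

eps = np.random.randn(n + 1)
y = eps[1:] + theta * eps[:-1]  # zero-mean MA(1)

y_c = y - y.mean()
denom = np.sum(y_c**2)
sample_acf = [np.sum(y_c[k:] * y_c[: n - k]) / denom for k in range(1, 5)]

print("theoretical rho_1:", theta / (1 + theta**2))
print("sample ACF, lags 1-4:", np.round(sample_acf, 3))
```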

General MA(q) Models

The MA(q) model extends to include $q$ lagged shock terms:

$$Y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}$$

where:

  • $Y_t$: value of the series at time $t$
  • $\mu$: constant mean
  • $\varepsilon_{t-k}$: white noise shock at lag $k$
  • $\theta_k$: moving average coefficient at lag $k$

Using the lag operator:

$$Y_t = \mu + \Theta(L) \varepsilon_t$$

where:

  • $Y_t$: value of the series at time $t$
  • $\Theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q$: moving average polynomial
  • $\mu$: constant mean
  • $\varepsilon_t$: white noise series

The moving average polynomial $\Theta(L)$ captures the distributed impact of shocks over time. The ACF cuts off after lag $q$, while the PACF decays gradually. This is the opposite pattern from AR models. When we observe sample ACF with a sharp cutoff and PACF with gradual decay, this strongly suggests an MA process.

Invertibility

An MA model is invertible if it can be expressed as an AR($\infty$) process, meaning we can write current shocks in terms of past observations. For MA(1), invertibility requires $|\theta| < 1$. The importance of invertibility goes beyond mathematical elegance. Non-invertible MA models create identification problems: an MA(1) with parameter $\theta$ and another with parameter $1/\theta$ produce identical autocorrelation structures, making the models observationally equivalent. By convention, we restrict attention to invertible representations to ensure unique parameter estimates.
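
A quick numeric check of this identification problem (an illustrative addition; $\theta = 0.5$ is an assumption) shows that $\theta$ and $1/\theta$ imply exactly the same lag-1 autocorrelation, so the two parameterizations cannot be distinguished from the ACF alone.

```python
theta = 0.5

rho1_theta = theta / (1 + theta**2)
rho1_inverse = (1 / theta) / (1 + (1 / theta) ** 2)

print(f"rho_1 with theta = {theta}:     {rho1_theta:.4f}")
print(f"rho_1 with theta = {1 / theta}: {rho1_inverse:.4f}")
# Both values are 0.4000; only the invertible representation (|theta| < 1) is kept by convention.
```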

ARMA Models: Combining AR and MA

Real financial time series often exhibit both autoregressive and moving average characteristics. Neither pure AR nor pure MA models may adequately capture the dependence structure. The ARMA(p,q) model combines both components:

$$Y_t = c + \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$$

where:

  • $Y_t$: value of the series at time $t$
  • $c$: constant term
  • $\phi_i$: autoregressive coefficients
  • $\theta_j$: moving average coefficients
  • $\varepsilon_t$: white noise terms

In lag operator notation:

$$\Phi(L) Y_t = c + \Theta(L) \varepsilon_t$$

where:

  • $\Phi(L)$: autoregressive polynomial
  • $c$: constant term
  • $\Theta(L)$: moving average polynomial

Why Combine AR and MA?

The principle of parsimony motivates ARMA. Sometimes a pure AR model requires many lags to capture the autocorrelation structure, but a mixed ARMA model achieves the same fit with fewer parameters. This parsimony matters for both statistical and practical reasons. Models with fewer parameters are easier to interpret, more stable to estimate, and less prone to overfitting.

Consider an AR(1) process. Its ACF decays geometrically. An MA($\infty$) could replicate this pattern, but that would require infinitely many parameters. Conversely, an MA(1) has an ACF that cuts off after lag 1, but an AR($\infty$) could approximate it. ARMA provides flexibility to match complex autocorrelation patterns economically, using a small number of AR and MA terms to capture structures that would require many terms in either pure representation.

ARMA(1,1) Example

The ARMA(1,1) specification combines the first-order properties of both models, providing a rich but still tractable framework:

$$Y_t = c + \phi Y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}$$

where:

  • $Y_t$: value of the series at time $t$
  • $Y_{t-1}$: value of the series at time $t-1$
  • $c$: constant term
  • $\phi$: autoregressive coefficient
  • $\theta$: moving average coefficient
  • $\varepsilon_t$: white noise error term
  • $\varepsilon_{t-1}$: white noise error term at time $t-1$

Its ACF starts at:

$$\rho_1 = \frac{(1+\phi\theta)(\phi+\theta)}{1+\theta^2+2\phi\theta}$$

where:

  • $\rho_1$: autocorrelation at lag 1
  • $\phi$: AR coefficient
  • $\theta$: MA coefficient

and then decays geometrically following the AR structure:

$$\rho_k = \phi \rho_{k-1} \quad \text{for } k \geq 2$$

where:

  • $\rho_k$: autocorrelation at lag $k$
  • $\phi$: autoregressive coefficient
  • $\rho_{k-1}$: autocorrelation at lag $k-1$

The MA component affects only the starting value of the autocorrelation sequence, while the AR component determines the rate of decay. The PACF also decays rather than cutting off, which is characteristic of processes with MA components. This makes ARMA models harder to identify than pure AR or MA processes, since neither the ACF nor PACF shows a clean cutoff pattern.
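
The sketch below (an illustrative addition; the parameter values are assumptions) simulates an ARMA(1,1) process and checks that the sample ACF matches the closed-form $\rho_1$ and the geometric decay $\rho_k = \phi \rho_{k-1}$ at higher lags.

```python
import numpy as np

np.random.seed(4)

phi, theta = 0.7, 0.4
n = 200_000

eps = np.random.randn(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t] + theta * eps[t - 1]

y_c = y - y.mean()
denom = np.sum(y_c**2)
acf = np.array([np.sum(y_c[k:] * y_c[: n - k]) / denom for k in range(6)])

rho1_theory = (1 + phi * theta) * (phi + theta) / (1 + theta**2 + 2 * phi * theta)
print("sample rho_1:", round(acf[1], 3), " theoretical:", round(rho1_theory, 3))
print("sample rho_k / rho_{k-1} for k=2..5:", np.round(acf[2:6] / acf[1:5], 3))
# The ratios should cluster near phi = 0.7, reflecting geometric decay after lag 1.
```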

Stationarity and Invertibility

An ARMA(p,q) model is:

  • Stationary if all roots of $\Phi(z) = 0$ lie outside the unit circle
  • Invertible if all roots of $\Theta(z) = 0$ lie outside the unit circle

We typically work only with stationary and invertible representations. Stationarity ensures the process has stable statistical properties, while invertibility ensures unique parameter identification and allows us to recover the underlying shocks from observed data.

ARIMA Models: Handling Non-Stationarity

When the original series is non-stationary, such as asset prices or GDP, we can often achieve stationarity through differencing. Rather than modeling the original series directly, we model the changes. The ARIMA(p,d,q) model applies ARMA(p,q) to a $d$-times differenced series, combining differencing to achieve stationarity with ARMA to model the resulting stationary dynamics.

The Differencing Operator

Define the difference operator $\nabla = 1 - L$, which computes the change from one period to the next:

$$\nabla Y_t = Y_t - Y_{t-1}$$

where:

  • $\nabla$: difference operator
  • $Y_t$: value at time $t$
  • $Y_{t-1}$: value at time $t-1$

For financial prices, first differencing yields returns. If the original price series has a unit root (is I(1)), then returns will be stationary. Second differencing applies the operator twice:

$$\nabla^2 Y_t = \nabla(\nabla Y_t) = Y_t - 2Y_{t-1} + Y_{t-2}$$

where:

  • $\nabla^2 Y_t$: second difference of the series
  • $Y_t$: value at time $t$
  • $Y_{t-1}$: value at time $t-1$
  • $Y_{t-2}$: value at time $t-2$

Second differencing is occasionally needed for economic series with both a unit root and a time-varying trend, though over-differencing can introduce artificial patterns and should be avoided.

ARIMA(p,d,q) Specification

An ARIMA(p,d,q) model states that $\nabla^d Y_t$ follows an ARMA(p,q). The integrated component handles the transformation to stationarity, while the ARMA component models the resulting stationary dynamics. We can express this using the difference operator or the lag operator explicitly:

$$\begin{aligned} \Phi(L) \nabla^d Y_t &= c + \Theta(L) \varepsilon_t \\ \Phi(L)(1-L)^d Y_t &= c + \Theta(L) \varepsilon_t && \text{(substituting } \nabla = 1-L \text{)} \end{aligned}$$

where:

  • $Y_t$: value of the series at time $t$
  • $\nabla$: difference operator
  • $\Phi(L)$: AR polynomial
  • $(1-L)^d$: differencing operator of order $d$
  • $c$: constant term
  • $\Theta(L)$: MA polynomial
  • $\varepsilon_t$: white noise error term

Common specifications include:

  • ARIMA(0,1,0): Random walk $Y_t = Y_{t-1} + \varepsilon_t$
  • ARIMA(0,1,1): Random walk plus MA(1) on returns
  • ARIMA(1,1,0): AR(1) on returns (first differences)
  • ARIMA(1,1,1): ARMA(1,1) on returns

Drift in ARIMA Models

When $d \geq 1$, the constant $c$ induces a deterministic trend in the original series. For ARIMA(0,1,0) with constant:

$$Y_t - Y_{t-1} = c + \varepsilon_t$$

where:

  • $Y_t$: value of the series at time $t$
  • $Y_{t-1}$: value at time $t-1$
  • $c$: constant term (drift)
  • $\varepsilon_t$: white noise error term

This says that each period, the series increases by $c$ on average plus a random shock. Summing the increments from $t=1$ to $T$ reveals the cumulative effect of the drift and shocks:

$$\begin{aligned} \sum_{t=1}^T (Y_t - Y_{t-1}) &= \sum_{t=1}^T (c + \varepsilon_t) \\ Y_T - Y_0 &= cT + \sum_{t=1}^T \varepsilon_t && \text{(telescoping sum)} \\ Y_T &= Y_0 + cT + \sum_{t=1}^T \varepsilon_t && \text{(rearrange)} \end{aligned}$$

where:

  • $Y_T$: value at time $T$
  • $Y_0$: initial value
  • $cT$: deterministic trend component
  • $\sum \varepsilon_t$: stochastic trend component

The term $cT$ is a linear trend that grows steadily with time. The accumulated shocks $\sum \varepsilon_t$ form a stochastic trend that wanders randomly. This is the random walk with drift model, commonly used for log stock prices. It captures the empirical observation that stock prices tend to rise over time (positive drift) while exhibiting substantial random variation around this trend.
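
The decomposition into a deterministic and a stochastic trend is easy to see in simulation. The sketch below (an illustrative addition; the drift and volatility values are assumptions loosely resembling daily log returns) builds a random walk with drift and confirms that the final value equals $Y_0 + cT$ plus the accumulated shocks.

```python
import numpy as np

np.random.seed(5)

c, sigma, T = 0.0003, 0.01, 2500  # assumed daily drift, volatility, and horizon
y0 = np.log(100.0)                # starting log price

eps = sigma * np.random.randn(T)
log_price = y0 + np.cumsum(c + eps)  # Y_t = Y_0 + c*t + sum of shocks up to t

print("final log price:        ", round(log_price[-1], 4))
print("Y_0 + cT + sum(shocks): ", round(y0 + c * T + eps.sum(), 4))
print("deterministic component:", round(c * T, 4))
print("stochastic component:   ", round(eps.sum(), 4))
```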

Model Identification

Identifying the appropriate $(p,d,q)$ specification is both art and science. The Box-Jenkins methodology provides a systematic approach.

Step 1: Assess Stationarity

First, examine the series visually and statistically:

  • Visual inspection: Does the series trend or wander? Does variance appear stable?
  • ACF analysis: For non-stationary series, the ACF decays very slowly
  • Unit root tests: The Augmented Dickey-Fuller (ADF) test formally tests the null hypothesis of a unit root against stationarity

If the series is non-stationary, difference it and repeat. Most financial price series require $d=1$ (returns), while some economic series may need $d=2$.

Step 2: Examine ACF and PACF

For the stationary (possibly differenced) series:

ACF and PACF patterns for model identification:

| Model     | ACF Pattern            | PACF Pattern           |
|-----------|------------------------|------------------------|
| AR(p)     | Decays gradually       | Cuts off after lag $p$ |
| MA(q)     | Cuts off after lag $q$ | Decays gradually       |
| ARMA(p,q) | Decays gradually       | Decays gradually       |

The ACF and PACF of financial returns often suggest low-order models. Daily stock returns typically show little autocorrelation in levels but significant autocorrelation in squared returns (addressed in the next chapter on volatility modeling).

Out[5]:
Figure: Diagnostic ACF and PACF patterns for AR(2), MA(2), and ARMA(1,1) processes. AR models show gradual ACF decay with sharp PACF cutoff, MA models show sharp ACF cutoff with gradual PACF decay, and ARMA models exhibit gradual decay in both functions.

Step 3: Estimate Candidate Models

Fit several plausible specifications and compare using information criteria:

$$\begin{aligned} \text{AIC} &= -2 \ln(\hat{L}) + 2k && \text{(Akaike Information Criterion)} \\ \text{BIC} &= -2 \ln(\hat{L}) + k \ln(n) && \text{(Bayesian Information Criterion)} \end{aligned}$$

where:

  • $\text{AIC}$: Akaike Information Criterion
  • $\text{BIC}$: Bayesian Information Criterion
  • $\hat{L}$: maximized likelihood
  • $k$: number of parameters
  • $n$: sample size

Lower values indicate better models. BIC penalizes complexity more heavily and tends to select simpler models.
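
As a small illustration of these formulas (an assumed helper, separate from the statsmodels workflow used later), the functions below compute AIC and BIC directly from a maximized log-likelihood, the parameter count, and the sample size; the example values are made up for demonstration.

```python
import numpy as np


def aic(log_likelihood, k):
    """Akaike Information Criterion: -2 ln(L) + 2k."""
    return -2.0 * log_likelihood + 2.0 * k


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: -2 ln(L) + k ln(n)."""
    return -2.0 * log_likelihood + k * np.log(n)


# Assumed example: a fit with log-likelihood 3025, 4 parameters, 1000 observations
ll, k, n = 3025.0, 4, 1000
print("AIC:", aic(ll, k))
print("BIC:", bic(ll, k, n))
```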

Step 4: Diagnostic Checking

After estimation, verify that residuals behave like white noise:

  • Ljung-Box test: Tests whether residual autocorrelations are jointly zero
  • Residual ACF: Should show no significant autocorrelations
  • Normality tests: Check if residuals are approximately normal (though financial residuals often have fat tails)

If diagnostics fail, revisit the specification.

Forecasting with ARIMA Models

Time-series models are widely used for forecasting. Using observations up to time $T$, we predict $Y_{T+h}$ for horizons $h = 1, 2, \ldots$ Forecast quality determines a model's practical value.

Point Forecasts

The optimal forecast (minimizing mean squared error) is the conditional expectation:

$$\hat{Y}_{T+h|T} = E[Y_{T+h} \mid Y_T, Y_{T-1}, \ldots]$$

where:

  • $\hat{Y}_{T+h|T}$: forecast for time $T+h$ made at time $T$
  • $E[\cdot]$: conditional expectation operator
  • $Y_{T+h}$: future value at horizon $h$
  • $Y_T, \ldots$: information set available at time $T$

This conditional expectation uses all available information up to time $T$ to form the best prediction of the future value. For an AR(1) model $Y_t = c + \phi Y_{t-1} + \varepsilon_t$, the one-step-ahead forecast is simply:

$$\hat{Y}_{T+1|T} = c + \phi Y_T$$

where:

  • $\hat{Y}_{T+1|T}$: one-step-ahead forecast
  • $c$: constant term
  • $\phi$: autoregressive coefficient
  • $Y_T$: last observed value

For longer horizons, we iterate the forecasting equation, replacing unknown future values with their forecasts:

$$\begin{aligned} \hat{Y}_{T+2|T} &= c + \phi \hat{Y}_{T+1|T} && \text{(iterate AR relation)} \\ &= c + \phi(c + \phi Y_T) && \text{(substitute 1-step forecast)} \\ &= c(1+\phi) + \phi^2 Y_T && \text{(simplify)} \end{aligned}$$

where:

  • $\hat{Y}_{T+2|T}$: two-step-ahead forecast
  • $\hat{Y}_{T+1|T}$: one-step-ahead forecast
  • $c$: constant term
  • $\phi$: autoregressive coefficient
  • $Y_T$: last observed value

More generally, the forecast formula reveals how the current observation influences predictions at any horizon:

$$\hat{Y}_{T+h|T} = \mu + \phi^h (Y_T - \mu)$$

where:

  • $\hat{Y}_{T+h|T}$: forecast for $T+h$ given information at $T$
  • $\mu = c/(1-\phi)$: unconditional mean
  • $\phi$: AR coefficient
  • $h$: forecast horizon

The forecast is a weighted average of the unconditional mean $\mu$ and the current deviation from that mean. The weight on the current deviation is $\phi^h$, which shrinks toward zero as the horizon increases. As $h \to \infty$, forecasts converge to the unconditional mean: $\hat{Y}_{T+h|T} \to \mu$. This reflects the fundamental property that for stationary series, distant future values are increasingly disconnected from current conditions.

Forecast Intervals

The forecast error at horizon $h$ is the difference between the realized value and our prediction:

$$e_{T+h|T} = Y_{T+h} - \hat{Y}_{T+h|T}$$

where:

  • $e_{T+h|T}$: forecast error at horizon $h$
  • $Y_{T+h}$: realized value at time $T+h$
  • $\hat{Y}_{T+h|T}$: forecast made at time $T$

For AR(1), the forecast error variance grows with horizon as uncertainty accumulates:

$$\text{Var}(e_{T+h|T}) = \sigma^2 \frac{1 - \phi^{2h}}{1 - \phi^2}$$

where:

  • $\text{Var}(e_{T+h|T})$: variance of the forecast error at horizon $h$
  • $\sigma^2$: innovation variance
  • $\phi$: AR coefficient

As $h \to \infty$, this converges to the unconditional variance $\sigma^2/(1-\phi^2)$. The uncertainty approaches the unconditional variance because at very long horizons, our conditional forecast converges to the unconditional mean, and the forecast error becomes essentially the deviation of the future observation from that mean.

A 95% forecast interval is:

$$\hat{Y}_{T+h|T} \pm 1.96 \sqrt{\text{Var}(e_{T+h|T})}$$

where:

  • $\hat{Y}_{T+h|T}$: forecast for time $T+h$ given information at $T$
  • $1.96$: z-score for 95% confidence
  • $\text{Var}(e_{T+h|T})$: forecast error variance

This widening of forecast intervals reflects increasing uncertainty at longer horizons, a fundamental feature of time-series prediction. The honest acknowledgment of forecast uncertainty is crucial for risk management and decision-making under uncertainty.

Out[6]:
Figure: Forecast error variance as a function of horizon for AR(1) models with varying persistence ($\phi$). Higher persistence levels delay the convergence of forecast uncertainty to the unconditional variance limit (dashed lines).

Implementation: Analyzing S&P 500 Returns

Let's apply these concepts to real financial data, analyzing daily S&P 500 returns.

In[7]:
Code
import numpy as np

np.random.seed(42)

Simulating Financial Data

We'll work with a simulated dataset that mimics S&P 500 characteristics, then show how to identify and fit ARIMA models.

In[8]:
Code
import numpy as np
import pandas as pd

# Simulate a price series with realistic characteristics
n_days = 1000
mu_daily = 0.0003  # ~7.5% annual return
sigma_daily = 0.012  # ~19% annual volatility

# Generate returns with slight AR(1) structure (common in short-term returns)
returns = np.zeros(n_days)
phi = 0.05  # Small positive autocorrelation
for t in range(1, n_days):
    returns[t] = (
        mu_daily
        + phi * (returns[t - 1] - mu_daily)
        + sigma_daily * np.random.randn()
    )

# Generate price series
prices = 100 * np.exp(np.cumsum(returns))

# Create date index
dates = pd.date_range(start="2020-01-01", periods=n_days, freq="B")
price_series = pd.Series(prices, index=dates, name="Price")
return_series = pd.Series(returns, index=dates, name="Returns")

Visualizing Prices vs. Returns

Out[9]:
Figure: Simulated asset price and return trajectories over 1000 days. The price series (top) exhibits non-stationary random walk behavior with drift, while the corresponding returns (bottom) show stationary fluctuations around a constant mean.

The price series shows a clear upward trend, violating the constant mean requirement for stationarity. Returns, however, fluctuate around a stable mean near zero, making them suitable for ARMA modeling.

Computing ACF and PACF

In[10]:
Code
def compute_acf(series, max_lag=20):
    """Compute sample autocorrelation function."""
    n = len(series)
    mean = series.mean()
    var = series.var()
    acf = np.zeros(max_lag + 1)

    for k in range(max_lag + 1):
        cov = (
            np.sum((series[k:] - mean) * (series[:-k] - mean)) / n
            if k > 0
            else var
        )
        acf[k] = cov / var

    return acf


def compute_pacf(series, max_lag=20):
    """Compute sample partial autocorrelation using Durbin-Levinson."""
    acf = compute_acf(series, max_lag)
    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    pacf[1] = acf[1]

    phi = np.zeros((max_lag + 1, max_lag + 1))
    phi[1, 1] = acf[1]

    for k in range(2, max_lag + 1):
        num = acf[k] - sum(phi[k - 1, j] * acf[k - j] for j in range(1, k))
        den = 1 - sum(phi[k - 1, j] * acf[j] for j in range(1, k))
        phi[k, k] = num / den
        for j in range(1, k):
            phi[k, j] = phi[k - 1, j] - phi[k, k] * phi[k - 1, k - j]
        pacf[k] = phi[k, k]

    return pacf
In[11]:
Code
max_lag = 20
acf_returns = compute_acf(returns, max_lag)
pacf_returns = compute_pacf(returns, max_lag)

# Confidence bounds (approximate 95% under null of white noise)
conf_bound = 1.96 / np.sqrt(n_days)
Out[12]:
Figure: Sample autocorrelation (ACF) and partial autocorrelation (PACF) functions for simulated returns. The significant spike at lag 1 in the PACF, combined with the decay in the ACF, identifies the underlying process as an AR(1) model.

The ACF shows the lag-1 autocorrelation is significant, with subsequent lags decaying. The PACF shows a significant spike at lag 1 followed by values within the confidence bands. This pattern, with PACF cutting off after lag 1 and ACF decaying, strongly suggests an AR(1) model.

Estimating an AR(1) Model

We'll estimate the AR(1) parameters using ordinary least squares, which is consistent for stationary AR models.

In[13]:
Code
# Prepare data for AR(1) estimation
Y = returns[1:]  # Y_t
Y_lag = returns[:-1]  # Y_{t-1}

# OLS estimation: Y_t = c + phi * Y_{t-1} + epsilon_t
X = np.column_stack([np.ones(len(Y_lag)), Y_lag])
beta = np.linalg.lstsq(X, Y, rcond=None)[0]
c_hat, phi_hat = beta

# Compute residuals
Y_hat = X @ beta
residuals = Y - Y_hat
sigma_hat = np.std(residuals)

# Compute standard errors
XtX_inv = np.linalg.inv(X.T @ X)
var_beta = sigma_hat**2 * XtX_inv
se_c = np.sqrt(var_beta[0, 0])
se_phi = np.sqrt(var_beta[1, 1])
Out[14]:
Console
AR(1) Model Estimation Results
========================================
Constant (c):     0.000514  (SE: 0.000372)
AR(1) coef (φ):   0.0429  (SE: 0.0316)
Residual std:     0.011748

Implied mean:     0.000537
Stationarity:     Yes (|φ| = 0.0429)

The estimated AR(1) coefficient is close to our simulated value of 0.05. The model is stationary since $|\hat{\phi}| < 1$.

Diagnostic Checking

In[15]:
Code
from scipy import stats


# Ljung-Box test for residual autocorrelation
def ljung_box_test(residuals, max_lag=10):
    """Compute Ljung-Box Q statistic."""
    n = len(residuals)
    acf_resid = compute_acf(residuals, max_lag)

    Q = (
        n
        * (n + 2)
        * sum(acf_resid[k] ** 2 / (n - k) for k in range(1, max_lag + 1))
    )
    p_value = 1 - stats.chi2.cdf(Q, df=max_lag)

    return Q, p_value


Q_stat, p_val = ljung_box_test(residuals)
acf_residuals = compute_acf(residuals, max_lag)
Out[16]:
Console
Residual Diagnostics
========================================
Ljung-Box Q(10):  7.44
p-value:          0.6832

Residual mean:    -0.000000
Residual std:     0.011748
Skewness:         0.1188
Kurtosis:         0.0647

The Ljung-Box test checks whether residual autocorrelations are collectively significant. A high p-value suggests the residuals behave like white noise, indicating an adequate model.

Out[17]:
Figure: Diagnostic plots for AR(1) model residuals. The residual time series (top) shows no obvious patterns or trends, and the residual ACF (bottom) confirms the absence of significant serial correlation, validating the model fit.

Forecasting

With the estimated model, we can generate forecasts with confidence intervals.

In[18]:
Code
# Generate forecasts
forecast_horizon = 20
Y_T = returns[-1]  # Last observed return
mu_unconditional = c_hat / (1 - phi_hat)

forecasts = np.zeros(forecast_horizon)
forecast_var = np.zeros(forecast_horizon)

for h in range(forecast_horizon):
    # Point forecast
    forecasts[h] = mu_unconditional + (phi_hat ** (h + 1)) * (
        Y_T - mu_unconditional
    )

    # Forecast variance
    forecast_var[h] = (
        sigma_hat**2 * (1 - phi_hat ** (2 * (h + 1))) / (1 - phi_hat**2)
    )

# 95% confidence intervals
lower_95 = forecasts - 1.96 * np.sqrt(forecast_var)
upper_95 = forecasts + 1.96 * np.sqrt(forecast_var)

# 80% confidence intervals
lower_80 = forecasts - 1.28 * np.sqrt(forecast_var)
upper_80 = forecasts + 1.28 * np.sqrt(forecast_var)
Out[19]:
Figure: Twenty-day return forecasts from the AR(1) model with 80% and 95% confidence intervals. The point forecasts (red line) rapidly converge to the unconditional mean (green dashed line), while the confidence intervals widen to the unconditional variance limit.

The forecasts converge quickly to the unconditional mean because the AR(1) coefficient is small. The confidence intervals widen initially but stabilize as they approach the unconditional variance.

Using statsmodels for ARIMA

In practice, we use established libraries for ARIMA estimation. The statsmodels package provides comprehensive time-series functionality.

In[20]:
Code
from statsmodels.tsa.stattools import adfuller

# Augmented Dickey-Fuller test for stationarity
adf_result = adfuller(returns)
Out[21]:
Console
Augmented Dickey-Fuller Test for Returns
=============================================
ADF Statistic:    -30.2452
p-value:          0.000000
Critical values:
  1%:         -3.4369
  5%:         -2.8644
  10%:         -2.5683

Conclusion: Stationary (reject unit root)

The ADF test strongly rejects the null hypothesis of a unit root, confirming that returns are stationary and suitable for ARMA modeling without differencing.

In[22]:
Code
# Fit ARIMA models with different specifications
from statsmodels.tsa.arima.model import ARIMA

models = {}
for p in range(3):
    for q in range(3):
        if p == 0 and q == 0:
            continue  # Skip trivial model
        try:
            model = ARIMA(returns, order=(p, 0, q))
            fitted = model.fit()
            models[(p, 0, q)] = {
                "aic": fitted.aic,
                "bic": fitted.bic,
                "model": fitted,
            }
        except Exception:
            pass  # Skip specifications that fail to converge

# Sort models by BIC for comparison
sorted_models = sorted(models.items(), key=lambda x: x[1]["bic"])
Out[23]:
Console
Model Comparison (ARIMA)
==================================================
Model           AIC             BIC            
--------------------------------------------------
ARIMA(1,0,0)      -6045.19        -6030.47       
ARIMA(0,0,1)      -6045.19        -6030.47       
ARIMA(1,0,1)      -6043.19        -6023.56       
ARIMA(2,0,0)      -6043.19        -6023.56       
ARIMA(0,0,2)      -6043.19        -6023.56       
ARIMA(1,0,2)      -6042.83        -6018.29       
ARIMA(2,0,1)      -6041.19        -6016.65       
ARIMA(2,0,2)      -6040.26        -6010.81       

Information criteria help select the best model balancing fit and parsimony. The BIC, with its heavier penalty for complexity, often selects simpler models appropriate for financial data.

In[24]:
Code
# Fit best model (AR(1) based on BIC)
if sorted_models:
    best_order = sorted_models[0][0]
    best_model = models[best_order]["model"]
else:
    # Fallback to AR(1) if no models were successfully fit
    from statsmodels.tsa.arima.model import ARIMA

    model = ARIMA(returns, order=(1, 0, 0))
    best_model = model.fit()
    best_order = (1, 0, 0)
Out[25]:
Console

Best Model: ARIMA(1, 0, 0)
==================================================
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0005      0.000      1.374      0.169      -0.000       0.001
ar.L1          0.0429      0.032      1.326      0.185      -0.020       0.106
sigma2         0.0001   6.08e-06     22.672      0.000       0.000       0.000
==============================================================================

The model summary shows an AR(1) coefficient close to the true value of 0.05, but with a p-value of about 0.19 it is not statistically significant at conventional levels: with such weak true autocorrelation and only 1000 observations, the effect is hard to distinguish from noise. Only the innovation variance is estimated precisely, which is typical for daily return data.

ARIMA for Price Series

When working with prices directly, we need ARIMA with $d=1$. This models returns (first differences) using ARMA.

In[26]:
Code
# Fit ARIMA(1,1,0) to prices, which is equivalent to AR(1) on returns
price_model = ARIMA(prices, order=(1, 1, 0))
price_fitted = price_model.fit()

# Generate price forecasts
price_forecast = price_fitted.get_forecast(steps=20)
price_pred = price_forecast.predicted_mean
price_ci = price_forecast.conf_int(alpha=0.05)
Out[27]:
Figure: Price forecasts from an ARIMA(1,1,0) model over a 20-day horizon. The confidence intervals (red shaded region) widen significantly faster than for returns, illustrating the accumulation of uncertainty in integrated I(1) processes.

The confidence interval for price forecasts widens dramatically compared to return forecasts. This reflects the accumulation of return forecast errors: uncertainty compounds over time when predicting integrated series.

Financial Applications of ARIMA Models

ARIMA models find practical application across several areas of quantitative finance. This section examines three key applications: exploiting mean-reverting behavior in spreads, quantifying risk through conditional forecasts, and projecting macroeconomic variables.

Mean Reversion Trading

Interest rate spreads and basis trades often exhibit mean-reverting behavior well-captured by AR models. If a spread $S_t$ follows AR(1) with $\phi < 1$:

$$S_t = \mu + \phi(S_{t-1} - \mu) + \varepsilon_t$$

where:

  • $S_t$: spread at time $t$
  • $\mu$: long-run mean spread
  • $\phi$: autoregressive coefficient (determines speed of reversion; must be $< 1$)
  • $\varepsilon_t$: random shock

When the spread deviates from $\mu$, the expected deviation decays geometrically. The half-life measures the time $\tau$ required for this deviation to shrink by 50%, that is, $E[S_{t+\tau}-\mu] = \frac{1}{2}(S_t-\mu)$. Since the expected deviation at $\tau$ is $\phi^\tau(S_t - \mu)$, we solve:

$$\begin{aligned} \phi^\tau &= 0.5 && \text{(decay condition)} \\ \tau \ln(\phi) &= \ln(0.5) && \text{(take logs)} \\ \tau &= \frac{-\ln(2)}{\ln(\phi)} && \text{(solve for } \tau \text{)} \end{aligned}$$

where:

  • $\tau$: half-life of the mean reversion process
  • $\phi$: autoregressive coefficient

For $\phi = 0.95$, the half-life is approximately 13.5 periods.

Out[28]:
Figure: Relationship between AR(1) persistence ($\phi$) and mean reversion half-life. As $\phi$ approaches 1, the half-life increases exponentially, indicating that highly persistent processes take much longer to return to their long-run mean after a shock.
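
The half-life formula is a one-liner in practice. The sketch below (an illustrative addition; the $\phi$ values are assumptions) tabulates half-lives for a few persistence levels, reproducing the roughly 13.5-period figure for $\phi = 0.95$.

```python
import numpy as np


def half_life(phi):
    """Half-life of mean reversion for an AR(1) with coefficient 0 < phi < 1."""
    return -np.log(2) / np.log(phi)


for phi in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"phi = {phi:.2f} -> half-life of about {half_life(phi):6.1f} periods")
```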

Risk Management

ARIMA forecasts provide inputs for Value-at-Risk calculations. The conditional distribution of future returns:

$$r_{T+h|T} \sim N(\hat{r}_{T+h|T}, \sigma^2_{h|T})$$

where:

  • $r_{T+h|T}$: distribution of returns at $T+h$ conditional on time $T$
  • $\hat{r}_{T+h|T}$: conditional expected return
  • $\sigma^2_{h|T}$: conditional forecast variance

allows computation of forward-looking VaR that accounts for current conditions rather than using unconditional distributions.
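
As a sketch of this idea (an illustrative addition that reuses the AR(1) quantities `c_hat`, `phi_hat`, `sigma_hat`, and `returns` estimated earlier; the 99% confidence level and one-day horizon are assumptions), the code below converts the one-step conditional forecast into a parametric VaR figure under the normality assumption.

```python
from scipy import stats

# One-step-ahead conditional mean and standard deviation from the fitted AR(1)
r_hat = c_hat + phi_hat * returns[-1]
sigma_1 = sigma_hat  # one-step forecast error std equals the innovation std

alpha = 0.99  # assumed confidence level
z = stats.norm.ppf(1 - alpha)

# Parametric VaR: the loss threshold exceeded with probability 1 - alpha
var_99 = -(r_hat + z * sigma_1)
print(f"1-day 99% VaR (as a fraction of portfolio value): {var_99:.4f}")
```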

Economic Forecasting

Central bank policy analysis uses ARIMA models for inflation, GDP growth, and employment forecasts. These series often require $d=1$ or $d=2$ and seasonal components (SARIMA) that extend the basic ARIMA framework.

Limitations and Practical Considerations

Time-series models help analyze serial dependence, but they have limitations with financial data.

Linear structure assumption: ARMA models capture only linear dependencies. As we observed in the chapter on stylized facts, financial returns exhibit significant nonlinear features. While squared returns may be highly autocorrelated (indicating predictable volatility patterns), raw returns often show minimal linear autocorrelation. This means ARMA models may suggest returns are nearly unpredictable when substantial predictability exists in higher moments. The next chapter on GARCH models addresses this limitation by modeling the conditional variance process.

Stationarity requirements: The assumption that the data-generating process remains stable over time is often violated in financial markets. Structural breaks occur due to regulatory changes, market crises, and regime shifts. An ARIMA model estimated on pre-2008 data would perform poorly during the financial crisis. Adaptive estimation windows and regime-switching models partially address this, but no mechanical solution exists for genuine structural change.

Gaussian innovations: The standard ARIMA framework assumes normally distributed innovations, yet financial returns consistently exhibit fat tails. This matters most for risk applications: forecast intervals based on normal distributions systematically underestimate tail risk. Using Student-t or other fat-tailed distributions for the innovation process improves this, though at the cost of added complexity.

Limited forecasting horizon: For typical equity returns with autocorrelation coefficients near zero, ARIMA models provide little forecasting value. The optimal forecast quickly converges to the unconditional mean. Where ARIMA models excel is in series with genuine persistent dynamics: interest rate spreads, volatility measures (the square of returns), and macroeconomic indicators.

Model selection uncertainty: The Box-Jenkins methodology requires judgment calls about differencing order, lag selection, and handling of outliers. Different practitioners may arrive at different specifications from the same data. Information criteria help but don't eliminate subjectivity. In high-frequency trading applications, this model uncertainty itself becomes a risk factor.

Despite these limitations, ARIMA models established the foundation for modern financial time-series analysis. The concepts of stationarity, autocorrelation structure, and systematic model identification remain central even as we extend to more sophisticated approaches.

Summary

This chapter developed the theory and practice of time-series modeling for financial data. We established stationarity as the foundational property enabling meaningful statistical analysis and showed that while asset prices are non-stationary, returns typically achieve approximate stationarity.

Autoregressive models express current values as linear functions of past values, with the AR coefficient controlling the persistence of shocks. The stationarity condition $|\phi| < 1$ ensures the series remains bounded, while the PACF's cutoff property identifies AR order. Moving average models instead express observations as functions of past innovations, with the ACF cutting off after lag $q$. ARMA combines both structures for parsimonious modeling of complex autocorrelation patterns.

For non-stationary series, ARIMA extends the framework through differencing. The integrated component handles trends and unit roots common in economic and price data. The Box-Jenkins methodology provides a systematic approach: assess stationarity, examine ACF/PACF patterns, estimate candidate models, compare using information criteria, and verify through residual diagnostics.

Forecasting from ARIMA models produces point predictions and uncertainty intervals that widen with horizon. This quantification of forecast uncertainty is essential for risk management applications. The tools developed here, particularly ACF analysis and model diagnostics, will prove valuable throughout the remainder of this book as we explore volatility modeling, regression methods, and machine learning approaches to financial prediction.
