Time-Series Models for Financial Data: AR, MA & ARIMA

Michael Brenndoerfer · December 9, 2025 · 53 min read

Master autoregressive and moving average models for financial time-series. Learn stationarity, ACF/PACF diagnostics, ARIMA estimation, and forecasting.


Time-Series Models for Financial Data

Financial markets generate vast streams of sequential observations: prices, returns, interest rates, volatility measures, and economic indicators. Understanding the temporal structure of these data is essential for forecasting, risk management, and trading strategy development. Time-series models provide a principled framework for capturing the statistical dependencies that exist between observations across time.

Unlike the stochastic differential equations we explored in earlier chapters on Brownian motion and Itô calculus, the models in this chapter operate in discrete time and focus on the autocorrelation structure of observed data. While Black-Scholes assumes returns are independent and identically distributed, empirical evidence from our study of stylized facts reveals that financial returns exhibit serial correlation in their magnitudes (volatility clustering) and occasional predictable patterns in their levels. Time-series models give us the tools to detect, quantify, and exploit these patterns.

This chapter introduces the foundational autoregressive (AR) and moving average (MA) models, then combines them into the ARMA framework for stationary series. We extend to ARIMA models for handling non-stationary data like trending economic indicators or asset prices. Throughout, we develop practical skills for model identification using autocorrelation analysis, parameter estimation, and forecasting with proper uncertainty quantification.

Stationarity: The Foundation of Time-Series Analysis

Before building any time-series model, we must understand stationarity, a property that determines which modeling techniques apply and whether our statistical estimates are meaningful. The concept of stationarity addresses a fundamental challenge in time-series analysis: how can we learn about a process when we observe only a single realization of it? If the statistical properties of a series change arbitrarily over time, then observations from the past tell us nothing reliable about the future. Stationarity provides the stability assumption that makes statistical inference possible.

Think of stationarity as a form of temporal homogeneity. Just as we assume in cross-sectional statistics that observations come from the same underlying distribution, stationarity assumes that the time series comes from a process whose statistical character remains consistent across time. This does not mean the series is constant or predictable. Rather, it means the rules governing its behavior do not change.

Strict and Weak Stationarity

A time series $\{Y_t\}$ is strictly stationary if its joint probability distribution is invariant to time shifts. For any set of time indices $t_1, t_2, \ldots, t_k$ and any shift $h$, the joint distribution of $(Y_{t_1}, Y_{t_2}, \ldots, Y_{t_k})$ equals that of $(Y_{t_1+h}, Y_{t_2+h}, \ldots, Y_{t_k+h})$. This is a strong requirement rarely verified in practice. Strict stationarity essentially demands that if you were to examine any collection of observations from the series, you could not tell what time period they came from based solely on their statistical properties. The distribution of Monday's return and the following Wednesday's return must be identical to the distribution of any other Monday-Wednesday pair, regardless of when those observations occur.

More useful is weak stationarity (or covariance stationarity), which requires only:

1. Constant mean: $E[Y_t] = \mu$ for all $t$
2. Constant variance: $\text{Var}(Y_t) = \sigma^2$ for all $t$
3. Autocovariance depends only on lag: $\text{Cov}(Y_t, Y_{t-k}) = \gamma_k$ for all $t$

Weak stationarity relaxes the strict requirement by focusing only on the first two moments of the distribution. We do not require the entire distribution to be time-invariant, only the mean, variance, and the covariance structure. This makes weak stationarity much easier to test and work with in practice. The third condition is particularly important: it states that the covariance between observations depends only on how far apart they are in time (the lag $k$), not on their absolute position in the series. The covariance between observations 5 periods apart must be the same whether we are looking at periods 1 and 6 or periods 100 and 105.

The autocovariance function $\gamma_k$ measures how observations separated by $k$ periods covary. This function captures the memory of the process, telling us how much information about the current value is contained in values from $k$ periods ago. The autocorrelation function (ACF) normalizes this to a scale that is easier to interpret:

$$\rho_k = \frac{\gamma_k}{\gamma_0} = \frac{\text{Cov}(Y_t, Y_{t-k})}{\text{Var}(Y_t)}$$

where:

  • $Y_t$: value of the series at time $t$
  • $\rho_k$: autocorrelation at lag $k$
  • $\gamma_k$: autocovariance at lag $k$
  • $\gamma_0$: variance of the series ($\text{Var}(Y_t)$)
  • $\text{Cov}(Y_t, Y_{t-k})$: covariance between observations $k$ periods apart

The normalization by $\gamma_0$ (the variance) ensures that autocorrelations are bounded between -1 and 1, just like ordinary correlation coefficients. For a stationary series, $\rho_0 = 1$ always holds because any observation is perfectly correlated with itself. The constraint $|\rho_k| \leq 1$ for all lags follows from the Cauchy-Schwarz inequality. The ACF provides a complete summary of the linear dependence structure of a stationary time series and serves as our primary diagnostic tool for model identification.

Stationarity in Financial Data

Asset prices are almost never stationary: they exhibit trends, drift, and variance that changes over time. The S&P 500 today is systematically higher than it was 50 years ago, violating the constant mean requirement. This non-stationarity is not a modeling failure but reflects genuine economic reality. Over long horizons, stock prices tend to rise because companies generate earnings and reinvest profits, creating compound growth in value. A statistical model that assumed stock prices fluctuate around a fixed mean would fundamentally mischaracterize how equity markets work.

However, returns are often approximately stationary:

$$\begin{aligned} r_t &= \frac{P_t - P_{t-1}}{P_{t-1}} && \text{(simple return)} \\ r_t &= \log(P_t) - \log(P_{t-1}) && \text{(log return)} \end{aligned}$$

where:

  • $r_t$: return at time $t$
  • $P_t$: price at time $t$
  • $P_{t-1}$: price at time $t-1$

The transformation from prices to returns removes the trending behavior. Log returns fluctuate around a relatively stable mean (often close to zero for daily data) with variance that, while exhibiting clustering, maintains a long-run average. Intuitively, while the price level can wander arbitrarily far from any fixed reference point, the percentage changes in price tend to stay within a bounded range. This is why most time-series modeling in finance focuses on returns rather than prices. The transformation to returns is not just a mathematical convenience; it reflects the economic intuition that investors care about percentage gains and losses, not absolute price levels.

Unit Root and Integration

A series is integrated of order $d$, written $I(d)$, if it must be differenced $d$ times to achieve stationarity. Asset prices are typically $I(1)$: one difference (the return) yields a stationary series. Economic series like GDP may be $I(1)$ or $I(2)$.
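
To make this concrete, here is a minimal sketch (an illustrative addition, not from the original text) that uses the Augmented Dickey-Fuller test from statsmodels to check how many differences a simulated price series needs; the simulated data and the 5% significance level are assumptions for illustration.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

np.random.seed(0)

# Simulate a random-walk log-price series (I(1)) and its first difference (returns)
returns = 0.0005 + 0.01 * np.random.randn(1000)
log_prices = np.cumsum(returns)


def integration_order(series, alpha=0.05, max_d=2):
    """Difference the series until the ADF test rejects a unit root."""
    for d in range(max_d + 1):
        p_value = adfuller(series)[1]
        if p_value < alpha:
            return d  # stationary after d differences
        series = np.diff(series)
    return max_d  # still non-stationary; more differencing may be needed


print("Order of integration for log prices:", integration_order(log_prices))
print("Order of integration for returns:   ", integration_order(returns))
```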

Autoregressive (AR) Models

Autoregressive models capture the idea that today's value depends linearly on past values. This concept reflects a fundamental intuition about how many processes evolve: the current state of a system depends on where it was recently. An AR(p) model expresses the current observation as a weighted sum of the previous $p$ observations plus a noise term. The term "autoregressive" means the series is regressed on its own past values, unlike ordinary regression which uses external variables.

The AR(1) Model

To capture the linear dependence of the current observation on its immediate predecessor, the simplest autoregressive model is AR(1). This model embodies the idea that today's value is pulled toward its previous value, with some random perturbation added. Mathematically:

$$Y_t = c + \phi Y_{t-1} + \varepsilon_t$$

where:

  • $Y_t$: value of the series at time $t$
  • $Y_{t-1}$: value of the series at time $t-1$
  • $c$: constant (intercept)
  • $\phi$: autoregressive coefficient
  • $\varepsilon_t$: white noise term with mean 0 and variance $\sigma^2$

The white noise terms are assumed independent: $\text{Cov}(\varepsilon_t, \varepsilon_s) = 0$ for $t \neq s$. This independence assumption is crucial because it means all the predictable structure in the series comes from the autoregressive relationship, not from patterns in the shocks themselves. The parameter $\phi$ controls the strength and direction of persistence. When $\phi$ is positive, high values tend to be followed by high values and low values by low values. When $\phi$ is negative, the series tends to oscillate, with high values followed by low values and vice versa. The magnitude of $\phi$ determines how strong this tendency is.

Stationarity Conditions

For AR(1) to be stationary, we require $|\phi| < 1$. This condition ensures that the influence of past shocks decays over time rather than persisting indefinitely or exploding. To see why this condition is necessary, assume stationarity holds and take expectations of both sides of the AR(1) equation:

$$E[Y_t] = c + \phi E[Y_{t-1}] + E[\varepsilon_t]$$

where:

  • $E[\cdot]$: expectation operator
  • $Y_t$: value of the series at time $t$
  • $c$: constant term
  • $\phi$: autoregressive coefficient
  • $\varepsilon_t$: white noise term

Since $E[Y_t] = E[Y_{t-1}] = \mu$ under stationarity (the mean must be the same at all time periods), and $E[\varepsilon_t] = 0$ by the white noise assumption, we can solve for the unconditional mean:

$$\begin{aligned} \mu &= c + \phi \mu && \text{(stationarity condition)} \\ \mu(1 - \phi) &= c && \text{(rearrange terms)} \\ \mu &= \frac{c}{1-\phi} && \text{(solve for } \mu \text{)} \end{aligned}$$

where:

  • $\mu$: mean of the process
  • $c$: constant term
  • $\phi$: autoregressive coefficient

This solution exists only if $\phi \neq 1$. If $\phi = 1$, the denominator becomes zero and no finite mean exists, indicating a non-stationary process known as a random walk. Moreover, computing the variance requires using the independence of $Y_{t-1}$ and $\varepsilon_t$. This independence follows from the assumption that $\varepsilon_t$ is unpredictable given past information:

$$\begin{aligned} \text{Var}(Y_t) &= \text{Var}(c + \phi Y_{t-1} + \varepsilon_t) \\ &= \phi^2 \text{Var}(Y_{t-1}) + \text{Var}(\varepsilon_t) && \text{(independent terms)} \\ &= \phi^2 \text{Var}(Y_{t-1}) + \sigma^2 && \text{(simplify)} \end{aligned}$$

where:

  • $\text{Var}(Y_t)$: variance at time $t$
  • $c$: constant term
  • $\phi$: autoregressive coefficient
  • $\varepsilon_t$: white noise term
  • $\sigma^2$: variance of the white noise term

Under stationarity with $\text{Var}(Y_t) = \text{Var}(Y_{t-1}) = \gamma_0$, we can substitute and solve for the unconditional variance:

$$\begin{aligned} \gamma_0 &= \phi^2 \gamma_0 + \sigma^2 && \text{(stationarity condition)} \\ \gamma_0(1 - \phi^2) &= \sigma^2 && \text{(rearrange terms)} \\ \gamma_0 &= \frac{\sigma^2}{1-\phi^2} && \text{(solve for } \gamma_0 \text{)} \end{aligned}$$

where:

  • $\gamma_0$: variance of the process ($\text{Var}(Y_t)$)
  • $\sigma^2$: variance of the white noise term
  • $\phi$: autoregressive coefficient

This expression is positive and finite only if $\phi^2 < 1$, meaning $|\phi| < 1$. When $|\phi| \geq 1$, the variance formula would yield either a negative number (impossible) or infinity (the series explodes). The stationarity condition thus emerges naturally from requiring that both the mean and variance be well-defined finite constants.
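
As a quick sanity check, the following sketch (an illustrative addition; all parameter values are assumptions) simulates a long AR(1) path and compares the sample mean and variance against the formulas $\mu = c/(1-\phi)$ and $\gamma_0 = \sigma^2/(1-\phi^2)$.

```python
import numpy as np

np.random.seed(1)

# Assumed AR(1) parameters for illustration
c, phi, sigma = 0.5, 0.8, 1.0
n = 200_000

y = np.zeros(n)
for t in range(1, n):
    y[t] = c + phi * y[t - 1] + sigma * np.random.randn()

# Discard an initial burn-in so the path forgets its starting value
y = y[1000:]

print("sample mean:", y.mean(), " theoretical:", c / (1 - phi))
print("sample var: ", y.var(), " theoretical:", sigma**2 / (1 - phi**2))
```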

ACF of AR(1)

The autocorrelation function of AR(1) decays geometrically, a signature pattern that helps us identify AR processes in empirical data. To derive the autocovariance, we work with the mean-centered series to simplify calculations. The deviation from the mean satisfies $Y_t - \mu = \phi(Y_{t-1} - \mu) + \varepsilon_t$. Multiplying both sides by $(Y_{t-k} - \mu)$ for $k \geq 1$ and taking expectations yields the autocovariance at lag $k$:

$$\begin{aligned} \gamma_k &= E[(Y_t - \mu)(Y_{t-k} - \mu)] && \text{(definition of autocovariance)} \\ &= E[(\phi(Y_{t-1} - \mu) + \varepsilon_t)(Y_{t-k} - \mu)] && \text{(substitute deviation form)} \\ &= \phi E[(Y_{t-1} - \mu)(Y_{t-k} - \mu)] + E[\varepsilon_t(Y_{t-k} - \mu)] && \text{(linearity of expectation)} \\ &= \phi \gamma_{k-1} + 0 && \text{(uncorrelated error)} \end{aligned}$$

where:

  • $\gamma_k$: autocovariance at lag $k$
  • $E[\cdot]$: expectation operator
  • $Y_t$: value of the series at time $t$
  • $\mu$: mean of the process
  • $\phi$: autoregressive coefficient
  • $\varepsilon_t$: white noise term
  • $\gamma_{k-1}$: autocovariance at lag $k-1$

The key insight is that $E[\varepsilon_t(Y_{t-k} - \mu)] = 0$ for $k \geq 1$ because the current shock $\varepsilon_t$ is uncorrelated with past values of the series. This recursion $\gamma_k = \phi \gamma_{k-1}$ yields $\gamma_k = \phi^k \gamma_0$ by repeated substitution, so the autocorrelation function has an elegant closed form:

$$\rho_k = \phi^k$$

where:

  • $\rho_k$: autocorrelation at lag $k$
  • $\phi$: autoregressive coefficient
  • $k$: lag number

When $\phi > 0$, correlations decay exponentially but remain positive at all lags. This positive persistence means high values tend to be followed by somewhat high values, with the memory gradually fading. When $\phi < 0$, correlations alternate in sign while decaying in magnitude. This creates an oscillating pattern where high values tend to be followed by low values, then moderately high values, and so on. The geometric decay pattern is the hallmark of AR(1) and provides a clear diagnostic signature in sample ACF plots.

Out[2]:
Figure: Autocorrelation functions for AR(1) processes with positive and negative coefficients. The left panel displays monotonic geometric decay for positive phi values, indicating persistence, while the right panel shows alternating signs for negative phi values, reflecting oscillatory behavior.
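
To reproduce this pattern numerically, the sketch below (an illustrative addition; the coefficient values are assumptions) simulates AR(1) paths with positive and negative $\phi$ and compares the sample autocorrelations against the theoretical $\rho_k = \phi^k$.

```python
import numpy as np

np.random.seed(2)


def simulate_ar1(phi, n=50_000):
    """Simulate a zero-mean AR(1) process."""
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = phi * y[t - 1] + np.random.randn()
    return y


def sample_acf(y, max_lag):
    """Biased sample autocorrelations up to max_lag."""
    y = y - y.mean()
    denom = np.sum(y**2)
    return np.array(
        [np.sum(y[k:] * y[: len(y) - k]) / denom for k in range(max_lag + 1)]
    )


for phi in (0.7, -0.7):
    acf_hat = sample_acf(simulate_ar1(phi), max_lag=5)
    acf_theory = phi ** np.arange(6)
    print(f"phi = {phi:+.1f}")
    print("  sample:     ", np.round(acf_hat, 3))
    print("  theoretical:", np.round(acf_theory, 3))
```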

General AR(p) Models

The AR(p) model extends the first-order case to include $p$ lags, allowing for more complex dependence structures where observations depend on multiple past values:

$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \varepsilon_t$$

where:

  • $Y_t$: value of the series at time $t$
  • $c$: constant term
  • $\phi_i$: autoregressive coefficient at lag $i$
  • $\varepsilon_t$: white noise error term

The multiple lags can capture more nuanced dynamics. For instance, an AR(2) model might exhibit cyclical behavior if the coefficients are chosen appropriately, which a simple AR(1) cannot produce. Using the lag operator $L$ where $L Y_t = Y_{t-1}$, we can express the AR(p) model compactly:

$$(1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p) Y_t = c + \varepsilon_t$$

where:

  • $Y_t$: value of the series at time $t$
  • $L$: lag operator such that $L^k Y_t = Y_{t-k}$
  • $\phi_i$: autoregressive coefficient at lag $i$
  • $c$: constant term
  • $\varepsilon_t$: white noise error term

We write this as $\Phi(L) Y_t = c + \varepsilon_t$ where $\Phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p$ is called the autoregressive polynomial. The lag operator notation provides not just compactness but also mathematical power, allowing us to manipulate time-series equations using algebraic techniques.

Stationarity requires all roots of the characteristic polynomial $\Phi(z) = 0$ to lie outside the unit circle in the complex plane. This generalizes the AR(1) condition $|\phi| < 1$ to higher-order models. For AR(2) with $Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \varepsilon_t$, the stationarity conditions can be expressed as three inequalities:

$$\begin{aligned} \phi_1 + \phi_2 &< 1 \\ \phi_2 - \phi_1 &< 1 \\ |\phi_2| &< 1 \end{aligned}$$

where:

  • $\phi_1, \phi_2$: autoregressive coefficients of the AR(2) model

These conditions define a triangular region in the $(\phi_1, \phi_2)$ parameter space where stationary AR(2) processes exist. Outside this region, the process either explodes or exhibits non-stationary wandering behavior.

Out[3]:
Figure: Stationarity region for AR(2) models defined by the parameters φ₁ and φ₂. The shaded triangular area represents the set of coefficients yielding stationary processes, bounded by φ₁ + φ₂ < 1, φ₂ - φ₁ < 1, and |φ₂| < 1.
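
The root condition is easy to check numerically. The sketch below (an illustrative addition; the coefficient pairs are assumptions) evaluates whether candidate AR(2) coefficients yield a stationary process by finding the roots of the characteristic polynomial $\Phi(z) = 1 - \phi_1 z - \phi_2 z^2$ with numpy.

```python
import numpy as np


def is_stationary_ar2(phi1, phi2):
    """Check whether all roots of 1 - phi1*z - phi2*z^2 lie outside the unit circle."""
    # np.roots expects coefficients from the highest power down: -phi2*z^2 - phi1*z + 1
    roots = np.roots([-phi2, -phi1, 1.0])
    return np.all(np.abs(roots) > 1.0)


# A few assumed coefficient pairs, inside and outside the stationarity triangle
for phi1, phi2 in [(0.5, 0.3), (1.2, -0.3), (0.6, 0.5), (0.5, -1.1)]:
    print(f"phi1={phi1:+.1f}, phi2={phi2:+.1f} -> stationary: {is_stationary_ar2(phi1, phi2)}")
```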

Partial Autocorrelation Function (PACF)

The partial autocorrelation at lag $k$, denoted $\alpha_k$, measures the correlation between $Y_t$ and $Y_{t-k}$ after removing the linear effects of intermediate lags $Y_{t-1}, \ldots, Y_{t-k+1}$. This "partial" correlation isolates the direct relationship between observations $k$ periods apart, controlling for everything in between. The intuition is similar to partial correlation in regression analysis: we want to know how much additional information $Y_{t-k}$ provides about $Y_t$ beyond what is already captured by the intervening observations.

For an AR(p) process, a key identification property holds:

$$\alpha_k \begin{cases} \neq 0 & \text{for } k \leq p \\ = 0 & \text{for } k > p \end{cases}$$

where:

  • $\alpha_k$: partial autocorrelation at lag $k$
  • $p$: order of the autoregressive process

This cutoff property follows from the structure of the AR(p) model. Since $Y_t$ depends directly only on $Y_{t-1}$ through $Y_{t-p}$, once we account for these intermediate values, $Y_{t-k}$ for $k > p$ provides no additional predictive information. The PACF cuts off sharply after lag $p$, while the ACF decays gradually. This stark contrast makes the PACF the primary tool for identifying AR order: we look for the lag after which all partial autocorrelations become insignificant.

Moving Average (MA) Models

While AR models express current values as functions of past values, moving average models take a fundamentally different approach by expressing current values as functions of past shocks. The terminology can be confusing because we are not computing averages in the usual sense. Instead, the current observation is modeled as a weighted combination of recent random innovations. This perspective views the time series as arising from the accumulation of shocks, with each shock having both an immediate impact and lingering effects over subsequent periods.

The MA(1) Model

The MA(1) model expresses the current value as the mean plus the current shock and a fraction of the previous shock:

$$Y_t = \mu + \varepsilon_t + \theta \varepsilon_{t-1}$$

where:

  • $Y_t$: value of the series at time $t$
  • $\mu$: mean of the process
  • $\varepsilon_t$: white noise shock at time $t$
  • $\theta$: moving average coefficient
  • $\varepsilon_{t-1}$: white noise shock at time $t-1$

Unlike AR models, MA models are always stationary regardless of parameter values. This happens because the model is defined as a finite linear combination of independent random variables, which always has well-defined finite moments. However, we typically impose invertibility conditions for meaningful interpretation and to ensure unique parameter identification, as we will discuss later.

Properties of MA(1)

The mean of the MA(1) process is straightforward to compute by taking expectations:

$$\begin{aligned} E[Y_t] &= E[\mu + \varepsilon_t + \theta \varepsilon_{t-1}] \\ &= \mu + E[\varepsilon_t] + \theta E[\varepsilon_{t-1}] && \text{(linearity of expectation)} \\ &= \mu + 0 + 0 && \text{(zero mean noise)} \\ &= \mu \end{aligned}$$

where:

  • $E[Y_t]$: expected value of the series
  • $\mu$: mean parameter
  • $\varepsilon_t$: white noise shock at time $t$
  • $\theta$: moving average coefficient

The variance calculation uses the independence of the white noise terms at different times. Since $\varepsilon_t$ and $\varepsilon_{t-1}$ are uncorrelated, their contributions to the variance simply add:

$$\begin{aligned} \text{Var}(Y_t) &= \text{Var}(\mu + \varepsilon_t + \theta \varepsilon_{t-1}) \\ &= \text{Var}(\varepsilon_t) + \theta^2 \text{Var}(\varepsilon_{t-1}) && \text{(independence of shocks)} \\ &= \sigma^2 + \theta^2 \sigma^2 && \text{(constant variance)} \\ &= \sigma^2(1 + \theta^2) && \text{(factor out } \sigma^2 \text{)} \end{aligned}$$

where:

  • $\text{Var}(Y_t)$: variance of the series
  • $\sigma^2$: variance of the white noise shock $\varepsilon_t$
  • $\theta$: moving average coefficient

For autocovariances, the MA(1) model exhibits a striking feature: only lag 1 is nonzero. To see why, we expand the covariance between adjacent observations:

$$\begin{aligned} \gamma_1 &= E[(Y_t - \mu)(Y_{t-1} - \mu)] \\ &= E[(\varepsilon_t + \theta \varepsilon_{t-1})(\varepsilon_{t-1} + \theta \varepsilon_{t-2})] && \text{(substitute definition)} \\ &= E[\varepsilon_t \varepsilon_{t-1} + \theta \varepsilon_{t-1}^2 + \theta \varepsilon_t \varepsilon_{t-2} + \theta^2 \varepsilon_{t-1} \varepsilon_{t-2}] && \text{(expand product)} \\ &= 0 + \theta E[\varepsilon_{t-1}^2] + 0 + 0 && \text{(cross-terms are zero)} \\ &= \theta \sigma^2 \end{aligned}$$

where:

  • $\gamma_1$: autocovariance at lag 1
  • $\theta$: moving average coefficient
  • $\sigma^2$: variance of the white noise shock
  • $Y_t$: value of the series at time $t$
  • $\mu$: mean of the process
  • $\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}$: white noise shocks

The cross-terms vanish because white noise shocks at different times are uncorrelated by assumption. The only surviving term is $\theta E[\varepsilon_{t-1}^2] = \theta \sigma^2$, which arises from the single shock $\varepsilon_{t-1}$ that appears in both $Y_t$ and $Y_{t-1}$. For $k \geq 2$, there is no overlap in the $\varepsilon$ terms entering $Y_t$ and $Y_{t-k}$, so $\gamma_k = 0$.

The ACF is therefore:

$$\rho_k = \begin{cases} \frac{\theta}{1+\theta^2} & \text{if } k = 1 \\ 0 & \text{if } k > 1 \end{cases}$$

where:

  • $\rho_k$: autocorrelation at lag $k$
  • $\theta$: moving average coefficient

This cutoff property of the ACF is the diagnostic signature of MA models. An MA(q) process has $\rho_k = 0$ for all $k > q$, meaning the autocorrelation function drops abruptly to zero after lag $q$. This sharp cutoff contrasts with the gradual decay of AR models and provides the basis for identifying MA order from sample ACF plots.

Out[4]:
Figure: Autocorrelation functions for MA(1) and MA(2) processes. The MA(1) process (left) exhibits significant autocorrelation only at lag 1, while the MA(2) process (right) shows significant values up to lag 2, demonstrating the characteristic cutoff property used for model identification.
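
The cutoff is straightforward to verify by simulation. The sketch below (an illustrative addition; $\theta$ and the sample size are assumptions) simulates an MA(1) series and compares its sample autocorrelations to the theoretical values $\rho_1 = \theta/(1+\theta^2)$ and $\rho_k = 0$ for $k > 1$.

```python
import numpy as np

np.random.seed(3)

theta = 0.6
n = 100_000

eps = np.random.randn(n + 1)
y = eps[1:] + theta * eps[:-1]  # zero-mean MA(1)

y_c = y - y.mean()
denom = np.sum(y_c**2)
sample_acf = [np.sum(y_c[k:] * y_c[: n - k]) / denom for k in range(1, 5)]

print("theoretical rho_1:", theta / (1 + theta**2))
print("sample ACF, lags 1-4:", np.round(sample_acf, 3))
```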

General MA(q) Models

The MA(q) model extends to include $q$ lagged shock terms:

$$Y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}$$

where:

  • $Y_t$: value of the series at time $t$
  • $\mu$: constant mean
  • $\varepsilon_{t-k}$: white noise shock at lag $k$
  • $\theta_k$: moving average coefficient at lag $k$

Using the lag operator:

$$Y_t = \mu + \Theta(L) \varepsilon_t$$

where:

  • $Y_t$: value of the series at time $t$
  • $\Theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q$: moving average polynomial
  • $\mu$: constant mean
  • $\varepsilon_t$: white noise series

The moving average polynomial $\Theta(L)$ captures the distributed impact of shocks over time. The ACF cuts off after lag $q$, while the PACF decays gradually. This is the opposite pattern from AR models. When we observe sample ACF with a sharp cutoff and PACF with gradual decay, this strongly suggests an MA process.

Invertibility

An MA model is invertible if it can be expressed as an AR($\infty$) process, meaning we can write current shocks in terms of past observations. For MA(1), invertibility requires $|\theta| < 1$. The importance of invertibility goes beyond mathematical elegance. Non-invertible MA models create identification problems: an MA(1) with parameter $\theta$ and another with parameter $1/\theta$ produce identical autocorrelation structures, making the models observationally equivalent. By convention, we restrict attention to invertible representations to ensure unique parameter estimates.
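
A quick numeric check of this identification problem (an illustrative addition; $\theta = 0.5$ is an assumption) shows that $\theta$ and $1/\theta$ imply exactly the same lag-1 autocorrelation, so the two parameterizations cannot be distinguished from the ACF alone.

```python
theta = 0.5

rho1_theta = theta / (1 + theta**2)
rho1_inverse = (1 / theta) / (1 + (1 / theta) ** 2)

print(f"rho_1 with theta = {theta}:     {rho1_theta:.4f}")
print(f"rho_1 with theta = {1 / theta}: {rho1_inverse:.4f}")
# Both values are 0.4000; only the invertible representation (|theta| < 1) is kept by convention.
```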

ARMA Models: Combining AR and MA

Real financial time series often exhibit both autoregressive and moving average characteristics. Neither pure AR nor pure MA models may adequately capture the dependence structure. The ARMA(p,q) model combines both components:

$$Y_t = c + \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$$

where:

  • $Y_t$: value of the series at time $t$
  • $c$: constant term
  • $\phi_i$: autoregressive coefficients
  • $\theta_j$: moving average coefficients
  • $\varepsilon_t$: white noise terms

In lag operator notation:

$$\Phi(L) Y_t = c + \Theta(L) \varepsilon_t$$

where:

  • $\Phi(L)$: autoregressive polynomial
  • $c$: constant term
  • $\Theta(L)$: moving average polynomial

Why Combine AR and MA?

The principle of parsimony motivates ARMA. Sometimes a pure AR model requires many lags to capture the autocorrelation structure, but a mixed ARMA model achieves the same fit with fewer parameters. This parsimony matters for both statistical and practical reasons. Models with fewer parameters are easier to interpret, more stable to estimate, and less prone to overfitting.

Consider an AR(1) process. Its ACF decays geometrically. An MA($\infty$) could replicate this pattern, but that would require infinitely many parameters. Conversely, an MA(1) has an ACF that cuts off after lag 1, but an AR($\infty$) could approximate it. ARMA provides flexibility to match complex autocorrelation patterns economically, using a small number of AR and MA terms to capture structures that would require many terms in either pure representation.

ARMA(1,1) Example

The ARMA(1,1) specification combines the first-order properties of both models, providing a rich but still tractable framework:

$$Y_t = c + \phi Y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}$$

where:

  • $Y_t$: value of the series at time $t$
  • $Y_{t-1}$: value of the series at time $t-1$
  • $c$: constant term
  • $\phi$: autoregressive coefficient
  • $\theta$: moving average coefficient
  • $\varepsilon_t$: white noise error term
  • $\varepsilon_{t-1}$: white noise error term at time $t-1$

Its ACF starts at:

$$\rho_1 = \frac{(1+\phi\theta)(\phi+\theta)}{1+\theta^2+2\phi\theta}$$

where:

  • $\rho_1$: autocorrelation at lag 1
  • $\phi$: AR coefficient
  • $\theta$: MA coefficient

and then decays geometrically following the AR structure:

$$\rho_k = \phi \rho_{k-1} \quad \text{for } k \geq 2$$

where:

  • $\rho_k$: autocorrelation at lag $k$
  • $\phi$: autoregressive coefficient
  • $\rho_{k-1}$: autocorrelation at lag $k-1$

The MA component affects only the starting value of the autocorrelation sequence, while the AR component determines the rate of decay. The PACF also decays rather than cutting off, which is characteristic of processes with MA components. This makes ARMA models harder to identify than pure AR or MA processes, since neither the ACF nor PACF shows a clean cutoff pattern.
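
The sketch below (an illustrative addition; the parameter values are assumptions) simulates an ARMA(1,1) process and checks that the sample ACF matches the closed-form $\rho_1$ and the geometric decay $\rho_k = \phi \rho_{k-1}$ at higher lags.

```python
import numpy as np

np.random.seed(4)

phi, theta = 0.7, 0.4
n = 200_000

eps = np.random.randn(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t] + theta * eps[t - 1]

y_c = y - y.mean()
denom = np.sum(y_c**2)
acf = np.array([np.sum(y_c[k:] * y_c[: n - k]) / denom for k in range(6)])

rho1_theory = (1 + phi * theta) * (phi + theta) / (1 + theta**2 + 2 * phi * theta)
print("sample rho_1:", round(acf[1], 3), " theoretical:", round(rho1_theory, 3))
print("sample rho_k / rho_{k-1} for k=2..5:", np.round(acf[2:6] / acf[1:5], 3))
# The ratios should cluster near phi = 0.7, reflecting geometric decay after lag 1.
```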

Stationarity and Invertibility

An ARMA(p,q) model is:

  • Stationary if all roots of $\Phi(z) = 0$ lie outside the unit circle
  • Invertible if all roots of $\Theta(z) = 0$ lie outside the unit circle

We typically work only with stationary and invertible representations. Stationarity ensures the process has stable statistical properties, while invertibility ensures unique parameter identification and allows us to recover the underlying shocks from observed data.

ARIMA Models: Handling Non-Stationarity

When the original series is non-stationary, such as asset prices or GDP, we can often achieve stationarity through differencing. Rather than modeling the original series directly, we model the changes. The ARIMA(p,d,q) model applies ARMA(p,q) to a $d$-times differenced series, combining differencing to achieve stationarity with ARMA to model the resulting stationary dynamics.

The Differencing Operator

Define the difference operator $\nabla = 1 - L$, which computes the change from one period to the next:

$$\nabla Y_t = Y_t - Y_{t-1}$$

where:

  • $\nabla$: difference operator
  • $Y_t$: value at time $t$
  • $Y_{t-1}$: value at time $t-1$

For financial prices, first differencing yields returns. If the original price series has a unit root (is I(1)), then returns will be stationary. Second differencing applies the operator twice:

$$\nabla^2 Y_t = \nabla(\nabla Y_t) = Y_t - 2Y_{t-1} + Y_{t-2}$$

where:

  • $\nabla^2 Y_t$: second difference of the series
  • $Y_t$: value at time $t$
  • $Y_{t-1}$: value at time $t-1$
  • $Y_{t-2}$: value at time $t-2$

Second differencing is occasionally needed for economic series with both a unit root and a time-varying trend, though over-differencing can introduce artificial patterns and should be avoided.

ARIMA(p,d,q) Specification

An ARIMA(p,d,q) model states that $\nabla^d Y_t$ follows an ARMA(p,q). The integrated component handles the transformation to stationarity, while the ARMA component models the resulting stationary dynamics. We can express this using the difference operator or the lag operator explicitly:

$$\begin{aligned} \Phi(L) \nabla^d Y_t &= c + \Theta(L) \varepsilon_t \\ \Phi(L)(1-L)^d Y_t &= c + \Theta(L) \varepsilon_t && \text{(substituting } \nabla = 1-L \text{)} \end{aligned}$$

where:

  • $Y_t$: value of the series at time $t$
  • $\nabla$: difference operator
  • $\Phi(L)$: AR polynomial
  • $(1-L)^d$: differencing operator of order $d$
  • $c$: constant term
  • $\Theta(L)$: MA polynomial
  • $\varepsilon_t$: white noise error term

Common specifications include:

  • ARIMA(0,1,0): Random walk $Y_t = Y_{t-1} + \varepsilon_t$
  • ARIMA(0,1,1): Random walk plus MA(1) on returns
  • ARIMA(1,1,0): AR(1) on returns (first differences)
  • ARIMA(1,1,1): ARMA(1,1) on returns

Drift in ARIMA Models

When $d \geq 1$, the constant $c$ induces a deterministic trend in the original series. For ARIMA(0,1,0) with constant:

$$Y_t - Y_{t-1} = c + \varepsilon_t$$

where:

  • $Y_t$: value of the series at time $t$
  • $Y_{t-1}$: value at time $t-1$
  • $c$: constant term (drift)
  • $\varepsilon_t$: white noise error term

This says that each period, the series increases by $c$ on average plus a random shock. Summing the increments from $t=1$ to $T$ reveals the cumulative effect of the drift and shocks:

$$\begin{aligned} \sum_{t=1}^T (Y_t - Y_{t-1}) &= \sum_{t=1}^T (c + \varepsilon_t) \\ Y_T - Y_0 &= cT + \sum_{t=1}^T \varepsilon_t && \text{(telescoping sum)} \\ Y_T &= Y_0 + cT + \sum_{t=1}^T \varepsilon_t && \text{(rearrange)} \end{aligned}$$

where:

  • $Y_T$: value at time $T$
  • $Y_0$: initial value
  • $cT$: deterministic trend component
  • $\sum \varepsilon_t$: stochastic trend component

The term $cT$ is a linear trend that grows steadily with time. The accumulated shocks $\sum \varepsilon_t$ form a stochastic trend that wanders randomly. This is the random walk with drift model, commonly used for log stock prices. It captures the empirical observation that stock prices tend to rise over time (positive drift) while exhibiting substantial random variation around this trend.
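
The decomposition into a deterministic and a stochastic trend is easy to see in simulation. The sketch below (an illustrative addition; the drift and volatility values are assumptions loosely resembling daily log returns) builds a random walk with drift and confirms that the final value equals $Y_0 + cT$ plus the accumulated shocks.

```python
import numpy as np

np.random.seed(5)

c, sigma, T = 0.0003, 0.01, 2500  # assumed daily drift, volatility, and horizon
y0 = np.log(100.0)                # starting log price

eps = sigma * np.random.randn(T)
log_price = y0 + np.cumsum(c + eps)  # Y_t = Y_0 + c*t + sum of shocks up to t

print("final log price:        ", round(log_price[-1], 4))
print("Y_0 + cT + sum(shocks): ", round(y0 + c * T + eps.sum(), 4))
print("deterministic component:", round(c * T, 4))
print("stochastic component:   ", round(eps.sum(), 4))
```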

Model Identification

Identifying the appropriate $(p,d,q)$ specification is both art and science. The Box-Jenkins methodology provides a systematic approach.

Step 1: Assess Stationarity

First, examine the series visually and statistically:

  • Visual inspection: Does the series trend or wander? Does variance appear stable?
  • ACF analysis: For non-stationary series, the ACF decays very slowly
  • Unit root tests: The Augmented Dickey-Fuller (ADF) test formally tests the null hypothesis of a unit root against stationarity

If the series is non-stationary, difference it and repeat. Most financial price series require $d=1$ (returns), while some economic series may need $d=2$.

Step 2: Examine ACF and PACF

For the stationary (possibly differenced) series:

ACF and PACF patterns for model identification:

| Model     | ACF Pattern            | PACF Pattern           |
|-----------|------------------------|------------------------|
| AR(p)     | Decays gradually       | Cuts off after lag $p$ |
| MA(q)     | Cuts off after lag $q$ | Decays gradually       |
| ARMA(p,q) | Decays gradually       | Decays gradually       |

The ACF and PACF of financial returns often suggest low-order models. Daily stock returns typically show little autocorrelation in levels but significant autocorrelation in squared returns (addressed in the next chapter on volatility modeling).

Out[5]:
Figure: Diagnostic ACF and PACF patterns for AR(2), MA(2), and ARMA(1,1) processes. AR models show gradual ACF decay with sharp PACF cutoff, MA models show sharp ACF cutoff with gradual PACF decay, and ARMA models exhibit gradual decay in both functions.

Step 3: Estimate Candidate Models

Fit several plausible specifications and compare using information criteria:

$$\begin{aligned} \text{AIC} &= -2 \ln(\hat{L}) + 2k && \text{(Akaike Information Criterion)} \\ \text{BIC} &= -2 \ln(\hat{L}) + k \ln(n) && \text{(Bayesian Information Criterion)} \end{aligned}$$

where:

  • $\text{AIC}$: Akaike Information Criterion
  • $\text{BIC}$: Bayesian Information Criterion
  • $\hat{L}$: maximized likelihood
  • $k$: number of parameters
  • $n$: sample size

Lower values indicate better models. BIC penalizes complexity more heavily and tends to select simpler models.
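
As a small illustration of these formulas (an assumed helper, separate from the statsmodels workflow used later), the functions below compute AIC and BIC directly from a maximized log-likelihood, the parameter count, and the sample size; the example values are made up for demonstration.

```python
import numpy as np


def aic(log_likelihood, k):
    """Akaike Information Criterion: -2 ln(L) + 2k."""
    return -2.0 * log_likelihood + 2.0 * k


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: -2 ln(L) + k ln(n)."""
    return -2.0 * log_likelihood + k * np.log(n)


# Assumed example: a fit with log-likelihood 3025, 4 parameters, 1000 observations
ll, k, n = 3025.0, 4, 1000
print("AIC:", aic(ll, k))
print("BIC:", bic(ll, k, n))
```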

Step 4: Diagnostic Checking

After estimation, verify that residuals behave like white noise:

  • Ljung-Box test: Tests whether residual autocorrelations are jointly zero
  • Residual ACF: Should show no significant autocorrelations
  • Normality tests: Check if residuals are approximately normal (though financial residuals often have fat tails)

If diagnostics fail, revisit the specification.

Forecasting with ARIMA Models

Time-series models are widely used for forecasting. Using observations up to time $T$, we predict $Y_{T+h}$ for horizons $h = 1, 2, \ldots$ Forecast quality determines a model's practical value.

Point Forecasts

The optimal forecast (minimizing mean squared error) is the conditional expectation:

$$\hat{Y}_{T+h|T} = E[Y_{T+h} \mid Y_T, Y_{T-1}, \ldots]$$

where:

  • $\hat{Y}_{T+h|T}$: forecast for time $T+h$ made at time $T$
  • $E[\cdot]$: conditional expectation operator
  • $Y_{T+h}$: future value at horizon $h$
  • $Y_T, \ldots$: information set available at time $T$

This conditional expectation uses all available information up to time $T$ to form the best prediction of the future value. For an AR(1) model $Y_t = c + \phi Y_{t-1} + \varepsilon_t$, the one-step-ahead forecast is simply:

$$\hat{Y}_{T+1|T} = c + \phi Y_T$$

where:

  • $\hat{Y}_{T+1|T}$: one-step-ahead forecast
  • $c$: constant term
  • $\phi$: autoregressive coefficient
  • $Y_T$: last observed value

For longer horizons, we iterate the forecasting equation, replacing unknown future values with their forecasts:

$$\begin{aligned} \hat{Y}_{T+2|T} &= c + \phi \hat{Y}_{T+1|T} && \text{(iterate AR relation)} \\ &= c + \phi(c + \phi Y_T) && \text{(substitute 1-step forecast)} \\ &= c(1+\phi) + \phi^2 Y_T && \text{(simplify)} \end{aligned}$$

where:

  • $\hat{Y}_{T+2|T}$: two-step-ahead forecast
  • $\hat{Y}_{T+1|T}$: one-step-ahead forecast
  • $c$: constant term
  • $\phi$: autoregressive coefficient
  • $Y_T$: last observed value

More generally, the forecast formula reveals how the current observation influences predictions at any horizon:

$$\hat{Y}_{T+h|T} = \mu + \phi^h (Y_T - \mu)$$

where:

  • $\hat{Y}_{T+h|T}$: forecast for $T+h$ given information at $T$
  • $\mu = c/(1-\phi)$: unconditional mean
  • $\phi$: AR coefficient
  • $h$: forecast horizon

The forecast is a weighted average of the unconditional mean $\mu$ and the current deviation from that mean. The weight on the current deviation is $\phi^h$, which shrinks toward zero as the horizon increases. As $h \to \infty$, forecasts converge to the unconditional mean: $\hat{Y}_{T+h|T} \to \mu$. This reflects the fundamental property that for stationary series, distant future values are increasingly disconnected from current conditions.

Forecast Intervals

The forecast error at horizon $h$ is the difference between the realized value and our prediction:

$$e_{T+h|T} = Y_{T+h} - \hat{Y}_{T+h|T}$$

where:

  • $e_{T+h|T}$: forecast error at horizon $h$
  • $Y_{T+h}$: realized value at time $T+h$
  • $\hat{Y}_{T+h|T}$: forecast made at time $T$

For AR(1), the forecast error variance grows with horizon as uncertainty accumulates:

$$\text{Var}(e_{T+h|T}) = \sigma^2 \frac{1 - \phi^{2h}}{1 - \phi^2}$$

where:

  • $\text{Var}(e_{T+h|T})$: variance of the forecast error at horizon $h$
  • $\sigma^2$: innovation variance
  • $\phi$: AR coefficient

As $h \to \infty$, this converges to the unconditional variance $\sigma^2/(1-\phi^2)$. The uncertainty approaches the unconditional variance because at very long horizons, our conditional forecast converges to the unconditional mean, and the forecast error becomes essentially the deviation of the future observation from that mean.

A 95% forecast interval is:

$$\hat{Y}_{T+h|T} \pm 1.96 \sqrt{\text{Var}(e_{T+h|T})}$$

where:

  • $\hat{Y}_{T+h|T}$: forecast for time $T+h$ given information at $T$
  • $1.96$: z-score for 95% confidence
  • $\text{Var}(e_{T+h|T})$: forecast error variance

This widening of forecast intervals reflects increasing uncertainty at longer horizons, a fundamental feature of time-series prediction. The honest acknowledgment of forecast uncertainty is crucial for risk management and decision-making under uncertainty.

Out[6]:
Figure: Forecast error variance as a function of horizon for AR(1) models with varying persistence ($\phi$). Higher persistence levels delay the convergence of forecast uncertainty to the unconditional variance limit (dashed lines).

Implementation: Analyzing S&P 500 Returns

Let's apply these concepts to real financial data, analyzing daily S&P 500 returns.

In[7]:
Code
import numpy as np

np.random.seed(42)

Simulating Financial Data

We'll work with a simulated dataset that mimics S&P 500 characteristics, then show how to identify and fit ARIMA models.

In[8]:
Code
import numpy as np
import pandas as pd

# Simulate a price series with realistic characteristics
n_days = 1000
mu_daily = 0.0003  # ~7.5% annual return
sigma_daily = 0.012  # ~19% annual volatility

# Generate returns with slight AR(1) structure (common in short-term returns)
returns = np.zeros(n_days)
phi = 0.05  # Small positive autocorrelation
for t in range(1, n_days):
    returns[t] = (
        mu_daily
        + phi * (returns[t - 1] - mu_daily)
        + sigma_daily * np.random.randn()
    )

# Generate price series
prices = 100 * np.exp(np.cumsum(returns))

# Create date index
dates = pd.date_range(start="2020-01-01", periods=n_days, freq="B")
price_series = pd.Series(prices, index=dates, name="Price")
return_series = pd.Series(returns, index=dates, name="Returns")

Visualizing Prices vs. Returns

Out[9]:
Figure: Simulated asset price and return trajectories over 1000 days. The price series (top) exhibits non-stationary random walk behavior with drift, while the corresponding returns (bottom) show stationary fluctuations around a constant mean.

The price series shows a clear upward trend, violating the constant mean requirement for stationarity. Returns, however, fluctuate around a stable mean near zero, making them suitable for ARMA modeling.

Computing ACF and PACF

In[10]:
Code
def compute_acf(series, max_lag=20):
    """Compute sample autocorrelation function."""
    n = len(series)
    mean = series.mean()
    var = series.var()
    acf = np.zeros(max_lag + 1)

    for k in range(max_lag + 1):
        cov = (
            np.sum((series[k:] - mean) * (series[:-k] - mean)) / n
            if k > 0
            else var
        )
        acf[k] = cov / var

    return acf


def compute_pacf(series, max_lag=20):
    """Compute sample partial autocorrelation using Durbin-Levinson."""
    acf = compute_acf(series, max_lag)
    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    pacf[1] = acf[1]

    phi = np.zeros((max_lag + 1, max_lag + 1))
    phi[1, 1] = acf[1]

    for k in range(2, max_lag + 1):
        num = acf[k] - sum(phi[k - 1, j] * acf[k - j] for j in range(1, k))
        den = 1 - sum(phi[k - 1, j] * acf[j] for j in range(1, k))
        phi[k, k] = num / den
        for j in range(1, k):
            phi[k, j] = phi[k - 1, j] - phi[k, k] * phi[k - 1, k - j]
        pacf[k] = phi[k, k]

    return pacf
In[11]:
Code
max_lag = 20
acf_returns = compute_acf(returns, max_lag)
pacf_returns = compute_pacf(returns, max_lag)

# Confidence bounds (approximate 95% under null of white noise)
conf_bound = 1.96 / np.sqrt(n_days)
Out[12]:
Figure: Sample autocorrelation (ACF) and partial autocorrelation (PACF) functions for simulated returns. The significant spike at lag 1 in the PACF, combined with the decay in the ACF, identifies the underlying process as an AR(1) model.

The ACF shows the lag-1 autocorrelation is significant, with subsequent lags decaying. The PACF shows a significant spike at lag 1 followed by values within the confidence bands. This pattern, with PACF cutting off after lag 1 and ACF decaying, strongly suggests an AR(1) model.

Estimating an AR(1) Model

We'll estimate the AR(1) parameters using ordinary least squares, which is consistent for stationary AR models.

In[13]:
Code
# Prepare data for AR(1) estimation
Y = returns[1:]  # Y_t
Y_lag = returns[:-1]  # Y_{t-1}

# OLS estimation: Y_t = c + phi * Y_{t-1} + epsilon_t
X = np.column_stack([np.ones(len(Y_lag)), Y_lag])
beta = np.linalg.lstsq(X, Y, rcond=None)[0]
c_hat, phi_hat = beta

# Compute residuals
Y_hat = X @ beta
residuals = Y - Y_hat
sigma_hat = np.std(residuals)

# Compute standard errors
XtX_inv = np.linalg.inv(X.T @ X)
var_beta = sigma_hat**2 * XtX_inv
se_c = np.sqrt(var_beta[0, 0])
se_phi = np.sqrt(var_beta[1, 1])
Out[14]:
Console
AR(1) Model Estimation Results
========================================
Constant (c):     0.000514  (SE: 0.000372)
AR(1) coef (φ):   0.0429  (SE: 0.0316)
Residual std:     0.011748

Implied mean:     0.000537
Stationarity:     Yes (|φ| = 0.0429)

The estimated AR(1) coefficient is close to our simulated value of 0.05. The model is stationary since $|\hat{\phi}| < 1$.

Diagnostic Checking

In[15]:
Code
from scipy import stats


# Ljung-Box test for residual autocorrelation
def ljung_box_test(residuals, max_lag=10):
    """Compute Ljung-Box Q statistic."""
    n = len(residuals)
    acf_resid = compute_acf(residuals, max_lag)

    Q = (
        n
        * (n + 2)
        * sum(acf_resid[k] ** 2 / (n - k) for k in range(1, max_lag + 1))
    )
    p_value = 1 - stats.chi2.cdf(Q, df=max_lag)

    return Q, p_value


Q_stat, p_val = ljung_box_test(residuals)
acf_residuals = compute_acf(residuals, max_lag)
Out[16]:
Console
Residual Diagnostics
========================================
Ljung-Box Q(10):  7.44
p-value:          0.6832

Residual mean:    -0.000000
Residual std:     0.011748
Skewness:         0.1188
Kurtosis:         0.0647

The Ljung-Box test checks whether residual autocorrelations are collectively significant. A high p-value suggests the residuals behave like white noise, indicating an adequate model.

Out[17]:
Figure: Diagnostic plots for AR(1) model residuals. The residual time series (top) shows no obvious patterns or trends, and the residual ACF (bottom) confirms the absence of significant serial correlation, validating the model fit.

Forecasting

With the estimated model, we can generate forecasts with confidence intervals.

In[18]:
Code
# Generate forecasts
forecast_horizon = 20
Y_T = returns[-1]  # Last observed return
mu_unconditional = c_hat / (1 - phi_hat)

forecasts = np.zeros(forecast_horizon)
forecast_var = np.zeros(forecast_horizon)

for h in range(forecast_horizon):
    # Point forecast
    forecasts[h] = mu_unconditional + (phi_hat ** (h + 1)) * (
        Y_T - mu_unconditional
    )

    # Forecast variance
    forecast_var[h] = (
        sigma_hat**2 * (1 - phi_hat ** (2 * (h + 1))) / (1 - phi_hat**2)
    )

# 95% confidence intervals
lower_95 = forecasts - 1.96 * np.sqrt(forecast_var)
upper_95 = forecasts + 1.96 * np.sqrt(forecast_var)

# 80% confidence intervals
lower_80 = forecasts - 1.28 * np.sqrt(forecast_var)
upper_80 = forecasts + 1.28 * np.sqrt(forecast_var)
Out[19]:
Figure: Twenty-day return forecasts from the AR(1) model with 80% and 95% confidence intervals. The point forecasts (red line) rapidly converge to the unconditional mean (green dashed line), while the confidence intervals widen to the unconditional variance limit.

The forecasts converge quickly to the unconditional mean because the AR(1) coefficient is small. The confidence intervals widen initially but stabilize as they approach the unconditional variance.

Using statsmodels for ARIMA

In practice, we use established libraries for ARIMA estimation. The statsmodels package provides comprehensive time-series functionality.

In[20]:
Code
from statsmodels.tsa.stattools import adfuller

# Augmented Dickey-Fuller test for stationarity
adf_result = adfuller(returns)
Out[21]:
Console
Augmented Dickey-Fuller Test for Returns
=============================================
ADF Statistic:    -30.2452
p-value:          0.000000
Critical values:
  1%:         -3.4369
  5%:         -2.8644
  10%:         -2.5683

Conclusion: Stationary (reject unit root)

The ADF test strongly rejects the null hypothesis of a unit root, confirming that returns are stationary and suitable for ARMA modeling without differencing.

In[22]:
Code
# Fit ARIMA models with different specifications
from statsmodels.tsa.arima.model import ARIMA

models = {}
for p in range(3):
    for q in range(3):
        if p == 0 and q == 0:
            continue  # Skip trivial model
        try:
            model = ARIMA(returns, order=(p, 0, q))
            fitted = model.fit()
            models[(p, 0, q)] = {
                "aic": fitted.aic,
                "bic": fitted.bic,
                "model": fitted,
            }
        except Exception:
            pass  # Skip specifications that fail to converge

# Sort models by BIC for comparison
sorted_models = sorted(models.items(), key=lambda x: x[1]["bic"])
Out[23]:
Console
Model Comparison (ARIMA)
==================================================
Model           AIC             BIC            
--------------------------------------------------
ARIMA(1,0,0)      -6045.19        -6030.47       
ARIMA(0,0,1)      -6045.19        -6030.47       
ARIMA(1,0,1)      -6043.19        -6023.56       
ARIMA(2,0,0)      -6043.19        -6023.56       
ARIMA(0,0,2)      -6043.19        -6023.56       
ARIMA(1,0,2)      -6042.83        -6018.29       
ARIMA(2,0,1)      -6041.19        -6016.65       
ARIMA(2,0,2)      -6040.26        -6010.81       

Information criteria help select the best model balancing fit and parsimony. The BIC, with its heavier penalty for complexity, often selects simpler models appropriate for financial data.

In[24]:
Code
# Fit best model (AR(1) based on BIC)
if sorted_models:
    best_order = sorted_models[0][0]
    best_model = models[best_order]["model"]
else:
    # Fallback to AR(1) if no models were successfully fit
    from statsmodels.tsa.arima.model import ARIMA

    model = ARIMA(returns, order=(1, 0, 0))
    best_model = model.fit()
    best_order = (1, 0, 0)
Out[25]:
Console

Best Model: ARIMA(1, 0, 0)
==================================================
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0005      0.000      1.374      0.169      -0.000       0.001
ar.L1          0.0429      0.032      1.326      0.185      -0.020       0.106
sigma2         0.0001   6.08e-06     22.672      0.000       0.000       0.000
==============================================================================

The model summary shows an AR(1) coefficient close to the true value of 0.05, but with a p-value of about 0.19 it is not statistically significant at conventional levels: with such weak true autocorrelation and only 1000 observations, the effect is hard to distinguish from noise. Only the innovation variance is estimated precisely, which is typical for daily return data.

ARIMA for Price Series

When working with prices directly, we need ARIMA with $d=1$. This models returns (first differences) using ARMA.

In[26]:
Code
# Fit ARIMA(1,1,0) to prices, which is equivalent to AR(1) on returns
price_model = ARIMA(prices, order=(1, 1, 0))
price_fitted = price_model.fit()

# Generate price forecasts
price_forecast = price_fitted.get_forecast(steps=20)
price_pred = price_forecast.predicted_mean
price_ci = price_forecast.conf_int(alpha=0.05)
Out[27]:
Figure: Price forecasts from an ARIMA(1,1,0) model over a 20-day horizon. The confidence intervals (red shaded region) widen significantly faster than for returns, illustrating the accumulation of uncertainty in integrated I(1) processes.

The confidence interval for price forecasts widens dramatically compared to return forecasts. This reflects the accumulation of return forecast errors: uncertainty compounds over time when predicting integrated series.

Financial Applications of ARIMA Models

ARIMA models find practical application across several areas of quantitative finance. This section examines three key applications: exploiting mean-reverting behavior in spreads, quantifying risk through conditional forecasts, and projecting macroeconomic variables.

Mean Reversion Trading

Interest rate spreads and basis trades often exhibit mean-reverting behavior well-captured by AR models. If a spread $S_t$ follows AR(1) with $\phi < 1$:

$$S_t = \mu + \phi(S_{t-1} - \mu) + \varepsilon_t$$

where:

  • $S_t$: spread at time $t$
  • $\mu$: long-run mean spread
  • $\phi$: autoregressive coefficient (determines speed of reversion; must be $< 1$)
  • $\varepsilon_t$: random shock

When the spread deviates from $\mu$, the expected deviation decays geometrically. The half-life measures the time $\tau$ required for this deviation to shrink by 50%, that is, $E[S_{t+\tau}-\mu] = \frac{1}{2}(S_t-\mu)$. Since the expected deviation at $\tau$ is $\phi^\tau(S_t - \mu)$, we solve:

$$\begin{aligned} \phi^\tau &= 0.5 && \text{(decay condition)} \\ \tau \ln(\phi) &= \ln(0.5) && \text{(take logs)} \\ \tau &= \frac{-\ln(2)}{\ln(\phi)} && \text{(solve for } \tau \text{)} \end{aligned}$$

where:

  • $\tau$: half-life of the mean reversion process
  • $\phi$: autoregressive coefficient

For $\phi = 0.95$, the half-life is approximately 13.5 periods.

Out[28]:
Figure: Relationship between AR(1) persistence ($\phi$) and mean reversion half-life. As $\phi$ approaches 1, the half-life increases exponentially, indicating that highly persistent processes take much longer to return to their long-run mean after a shock.
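
The half-life formula is a one-liner in practice. The sketch below (an illustrative addition; the $\phi$ values are assumptions) tabulates half-lives for a few persistence levels, reproducing the roughly 13.5-period figure for $\phi = 0.95$.

```python
import numpy as np


def half_life(phi):
    """Half-life of mean reversion for an AR(1) with coefficient 0 < phi < 1."""
    return -np.log(2) / np.log(phi)


for phi in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"phi = {phi:.2f} -> half-life of about {half_life(phi):6.1f} periods")
```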

Risk Management

ARIMA forecasts provide inputs for Value-at-Risk calculations. The conditional distribution of future returns:

$$r_{T+h|T} \sim N(\hat{r}_{T+h|T}, \sigma^2_{h|T})$$

where:

  • $r_{T+h|T}$: distribution of returns at $T+h$ conditional on time $T$
  • $\hat{r}_{T+h|T}$: conditional expected return
  • $\sigma^2_{h|T}$: conditional forecast variance

allows computation of forward-looking VaR that accounts for current conditions rather than using unconditional distributions.
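
As a sketch of this idea (an illustrative addition that reuses the AR(1) quantities `c_hat`, `phi_hat`, `sigma_hat`, and `returns` estimated earlier; the 99% confidence level and one-day horizon are assumptions), the code below converts the one-step conditional forecast into a parametric VaR figure under the normality assumption.

```python
from scipy import stats

# One-step-ahead conditional mean and standard deviation from the fitted AR(1)
r_hat = c_hat + phi_hat * returns[-1]
sigma_1 = sigma_hat  # one-step forecast error std equals the innovation std

alpha = 0.99  # assumed confidence level
z = stats.norm.ppf(1 - alpha)

# Parametric VaR: the loss threshold exceeded with probability 1 - alpha
var_99 = -(r_hat + z * sigma_1)
print(f"1-day 99% VaR (as a fraction of portfolio value): {var_99:.4f}")
```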

Economic Forecasting

Central bank policy analysis uses ARIMA models for inflation, GDP growth, and employment forecasts. These series often require $d=1$ or $d=2$ and seasonal components (SARIMA) that extend the basic ARIMA framework.

Limitations and Practical Considerations

Time-series models help analyze serial dependence, but they have limitations with financial data.

Linear structure assumption: ARMA models capture only linear dependencies. As we observed in the chapter on stylized facts, financial returns exhibit significant nonlinear features. While squared returns may be highly autocorrelated (indicating predictable volatility patterns), raw returns often show minimal linear autocorrelation. This means ARMA models may suggest returns are nearly unpredictable when substantial predictability exists in higher moments. The next chapter on GARCH models addresses this limitation by modeling the conditional variance process.

Stationarity requirements: The assumption that the data-generating process remains stable over time is often violated in financial markets. Structural breaks occur due to regulatory changes, market crises, and regime shifts. An ARIMA model estimated on pre-2008 data would perform poorly during the financial crisis. Adaptive estimation windows and regime-switching models partially address this, but no mechanical solution exists for genuine structural change.

Gaussian innovations: The standard ARIMA framework assumes normally distributed innovations, yet financial returns consistently exhibit fat tails. This matters most for risk applications: forecast intervals based on normal distributions systematically underestimate tail risk. Using Student-t or other fat-tailed distributions for the innovation process improves this, though at the cost of added complexity.

Limited forecasting horizon: For typical equity returns with autocorrelation coefficients near zero, ARIMA models provide little forecasting value. The optimal forecast quickly converges to the unconditional mean. Where ARIMA models excel is in series with genuine persistent dynamics: interest rate spreads, volatility measures (the square of returns), and macroeconomic indicators.

Model selection uncertainty: The Box-Jenkins methodology requires judgment calls about differencing order, lag selection, and handling of outliers. Different practitioners may arrive at different specifications from the same data. Information criteria help but don't eliminate subjectivity. In high-frequency trading applications, this model uncertainty itself becomes a risk factor.

Despite these limitations, ARIMA models established the foundation for modern financial time-series analysis. The concepts of stationarity, autocorrelation structure, and systematic model identification remain central even as we extend to more sophisticated approaches.

Summary

This chapter developed the theory and practice of time-series modeling for financial data. We established stationarity as the foundational property enabling meaningful statistical analysis and showed that while asset prices are non-stationary, returns typically achieve approximate stationarity.

Autoregressive models express current values as linear functions of past values, with the AR coefficient controlling the persistence of shocks. The stationarity condition $|\phi| < 1$ ensures the series remains bounded, while the PACF's cutoff property identifies AR order. Moving average models instead express observations as functions of past innovations, with the ACF cutting off after lag $q$. ARMA combines both structures for parsimonious modeling of complex autocorrelation patterns.

For non-stationary series, ARIMA extends the framework through differencing. The integrated component handles trends and unit roots common in economic and price data. The Box-Jenkins methodology provides a systematic approach: assess stationarity, examine ACF/PACF patterns, estimate candidate models, compare using information criteria, and verify through residual diagnostics.

Forecasting from ARIMA models produces point predictions and uncertainty intervals that widen with horizon. This quantification of forecast uncertainty is essential for risk management applications. The tools developed here, particularly ACF analysis and model diagnostics, will prove valuable throughout the remainder of this book as we explore volatility modeling, regression methods, and machine learning approaches to financial prediction.
