Probability Distributions in Finance: Normal, Lognormal & Fat Tails

Michael Brenndoerfer · October 21, 2025 · 65 min read

Master probability distributions essential for quantitative finance: normal, lognormal, binomial, Poisson, and fat-tailed distributions with Python examples.


Common Probability Distributions in Finance

Probability distributions form the mathematical backbone of quantitative finance. Every time you estimate the risk of a portfolio, price an option, or model the likelihood of default, you are making assumptions about how random outcomes are distributed. These assumptions determine whether your models produce reasonable predictions or catastrophic failures.

The choice of distribution matters enormously. The 2008 financial crisis revealed that many risk models had dramatically underestimated the probability of extreme market moves. These models often assumed that returns followed a normal distribution, which assigns vanishingly small probabilities to the kind of large daily moves that occurred repeatedly during the crisis. Understanding the properties and limitations of different distributions has direct consequences for trading strategies, risk limits, and regulatory capital requirements.

This chapter covers the distributions you will encounter most frequently in quantitative finance: the normal distribution that underlies classical portfolio theory, the lognormal distribution that models asset prices, discrete distributions like the binomial and Poisson that appear in derivatives pricing and event modeling, and fat-tailed distributions that better capture the extreme events observed in real markets. For each distribution, we examine both the mathematical properties and the financial intuition behind its use.

The Normal Distribution

The normal distribution, also called the Gaussian distribution, is the starting point for most financial models. Its prominence stems from the Central Limit Theorem, which states that the sum of many independent random variables tends toward a normal distribution regardless of the underlying distributions. Since asset returns can be viewed as the aggregate effect of many independent pieces of information hitting the market, the normal distribution provides a natural first approximation. This explains why the normal distribution became the default assumption in finance before computers made alternative distributions practical.

Normal Distribution

A continuous probability distribution characterized by its symmetric, bell-shaped curve. It is fully specified by two parameters: the mean $\mu$ (center) and the variance $\sigma^2$ (spread). Also known as the Gaussian distribution.

Mathematical Properties

To understand what it means for a random variable to follow a normal distribution, we must first grasp the concept of a probability density function. Unlike discrete random variables that take on specific values with certain probabilities, a continuous random variable like one following a normal distribution can take any value along the real line. The probability density function tells us how likely the variable is to fall in any particular region—the higher the density at a point, the more probable values near that point become.

A random variable $X$ follows a normal distribution with mean $\mu$ and variance $\sigma^2$, written $X \sim N(\mu, \sigma^2)$, if its probability density function is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$$

Each component serves a specific purpose:

  • $x$: the value at which we evaluate the density
  • $\mu$: the mean (center of the distribution)
  • $\sigma$: the standard deviation (controls the spread)
  • $\frac{1}{\sigma\sqrt{2\pi}}$: a normalization constant ensuring the density integrates to 1
  • $(x-\mu)^2/2\sigma^2$: measures how many standard deviations $x$ is from the mean, squared and scaled

The heart of the normal distribution lies in the exponential term. The quantity $(x-\mu)^2$ measures the squared distance between the point $x$ and the center of the distribution $\mu$. Dividing by $2\sigma^2$ scales this distance relative to the spread of the distribution. A value two units away from the mean is much more unusual when $\sigma = 0.5$ than when $\sigma = 10$, because the standardized distance is larger. The negative sign in the exponent ensures that the function decreases as we move away from the mean, and the exponential function transforms this squared distance into a probability density that decays smoothly and symmetrically.

The exponential of a negative quadratic creates the characteristic bell shape: values near the mean have high probability density, while values far from the mean have exponentially decreasing density. This rapid decay is why the normal distribution assigns negligible probability to extreme events. The probability of being more than a few standard deviations from the mean is vanishingly small, a property that proves problematic when modeling financial markets where extreme moves occur more frequently than this decay suggests. The quadratic term in the exponent is the key: because squaring grows faster than linear growth, and the exponential function amplifies this, the tails of the normal distribution thin out extraordinarily quickly.

The normalization constant $\frac{1}{\sigma\sqrt{2\pi}}$ deserves special attention. This factor ensures that when we integrate the density function over all possible values from negative infinity to positive infinity, we obtain exactly 1: the total probability. The $\sqrt{2\pi}$ arises from the Gaussian integral, one of the most famous results in calculus, while the $\sigma$ in the denominator accounts for how wider distributions (larger $\sigma$) must have lower peak heights to maintain the same total area under the curve.

Key parameters:

  • Mean ($\mu$): The center of the distribution and the expected value of $X$. In finance, this represents the expected return.
  • Variance ($\sigma^2$): Measures the spread of the distribution. The square root, $\sigma$, is the standard deviation, which in finance we call volatility.
  • Skewness: Zero for the normal distribution, meaning it is perfectly symmetric around the mean.
  • Kurtosis: Equal to 3 for the normal distribution (or excess kurtosis of 0), which serves as the baseline for measuring tail thickness. Distributions with higher kurtosis have more probability mass in the tails and center, less in the shoulders.

The symmetry of the normal distribution is both its elegance and its limitation. A symmetric distribution treats upward and downward deviations identically, which aligns with our intuition that the laws of probability should not favor one direction over another for fundamental random processes. However, financial returns often exhibit asymmetry: crashes tend to be more severe and sudden than rallies of similar magnitude. This asymmetry, measured by skewness, is one reason the normal distribution serves as only a first approximation to market behavior.

The standard normal distribution is the special case where $\mu = 0$ and $\sigma^2 = 1$. Any normal variable can be standardized:

$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$

This standardization process transforms any normal random variable into the canonical standard normal form. The operation $\frac{X - \mu}{\sigma}$ first centers the distribution by subtracting the mean (so the new center is at zero), then scales it by dividing by the standard deviation (so the new spread equals one). This transformation preserves the essential shape of the distribution while placing it in a standard coordinate system.

This standardization is critical because it allows us to use a single table of probabilities for any normal distribution. The cumulative distribution function of the standard normal, denoted $\Phi(z)$, gives the probability that a standard normal variable is less than $z$. Before computers, statisticians relied on printed tables of $\Phi(z)$ values, and the standardization formula allowed them to answer questions about any normal distribution using this single reference. Even today, the standard normal serves as the universal reference point: when we speak of a "3-sigma event," we mean an outcome more than three standard deviations from the mean, regardless of what the actual mean and standard deviation are.
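
As a quick illustration, here is a minimal sketch using scipy (the return figures are hypothetical, chosen only for illustration) that computes a tail probability both by standardizing and by querying the original distribution directly:

```python
from scipy import stats

# Hypothetical daily return distribution: mean 0.05%, standard deviation 1.2%
mu, sigma = 0.0005, 0.012

# Probability of a daily loss worse than -2%, via standardization
x = -0.02
z = (x - mu) / sigma                 # about -1.7 standard deviations
p_via_z = stats.norm.cdf(z)          # Phi(z) from the standard normal

# The same probability directly from N(mu, sigma^2)
p_direct = stats.norm.cdf(x, loc=mu, scale=sigma)

print(f"z-score: {z:.2f}")
print(f"P(return < -2%) via Phi(z): {p_via_z:.4f}")
print(f"P(return < -2%) directly:   {p_direct:.4f}")
```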

The 68-95-99.7 Rule

One of the most useful properties of the normal distribution is how probability mass is concentrated around the mean:

  • Approximately 68% of observations fall within one standard deviation of the mean
  • Approximately 95% fall within two standard deviations
  • Approximately 99.7% fall within three standard deviations

These percentages emerge directly from integrating the normal density function, but their approximate values (68, 95, and 99.7) are worth committing to memory because they provide instant intuition for assessing how unusual any particular observation is. If someone tells you that an event was a "two-sigma move," you immediately know that such events occur roughly 5% of the time, or about one trading day in twenty.

This rule provides quick mental estimates for risk. If daily stock returns have a standard deviation of 1%, then under normality assumptions, you would expect a move larger than 3% (three standard deviations) to occur only about 0.3% of the time, or roughly once per year of trading days. This calculation reveals both the power and the danger of the normal assumption: it gives us precise quantitative predictions, but those predictions can be spectacularly wrong if the true distribution differs from normal.
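
These numbers are easy to verify. A minimal check with scipy, assuming the usual convention of 252 trading days per year:

```python
from scipy import stats

# Probability mass within k standard deviations of the mean
for k in (1, 2, 3):
    p_within = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"Within {k} sigma: {p_within:.4f}")

# Expected frequency of daily moves beyond 3 sigma under normality
p_beyond_3 = 2 * stats.norm.sf(3)
print(f"P(|move| > 3 sigma) = {p_beyond_3:.4%}, "
      f"about {p_beyond_3 * 252:.2f} occurrences per 252-day year")
```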

Out[2]:
Visualization
Bell curve showing standard normal distribution with shaded confidence intervals.
The standard normal distribution with shaded regions showing the 68-95-99.7 rule. Most probability mass is concentrated near the mean, with tail probabilities declining exponentially.
Out[3]:
Visualization
Multiple normal distribution curves showing effect of different means and standard deviations.
Normal distributions with different parameters. Changing the mean shifts the distribution horizontally, while changing the standard deviation affects the spread and peak height.

Applications in Finance

The normal distribution appears throughout quantitative finance:

Portfolio theory assumes returns are normally distributed, which allows risk to be fully characterized by variance. Harry Markowitz's mean-variance optimization framework rests on this assumption. The profound insight behind this choice is that if returns are normal, then an investor who cares about expected utility needs to consider only the mean and variance of portfolio returns, since all higher moments are determined by these two parameters. This reduces the complex problem of choosing among thousands of possible portfolios to a tractable optimization problem with a clear geometric interpretation.

Value at Risk (VaR) calculations often assume normal returns. The VaR at tail level $\alpha$ (confidence level $1-\alpha$) is:

$$\text{VaR}_{\alpha} = -(\mu - z_{\alpha} \sigma) = z_{\alpha}\sigma - \mu$$

This formula translates the abstract concept of a tail quantile into a concrete loss estimate. Under the assumption that returns are normally distributed with mean $\mu$ and standard deviation $\sigma$, we seek the threshold such that returns fall below this level only $\alpha$ percent of the time. The standard normal quantile $z_\alpha$ tells us how many standard deviations below the mean this threshold lies.

Each component of this formula plays a specific role:

  • $\text{VaR}_\alpha$: the Value at Risk at tail level $\alpha$, the loss threshold exceeded with probability $\alpha$
  • $\mu$: the expected return over the time horizon
  • $\sigma$: the volatility (standard deviation of returns) over the time horizon
  • $z_\alpha$: the magnitude of the $\alpha$-quantile of the standard normal distribution (e.g., $z_{0.05} \approx 1.645$ for 95% confidence)

The formula works because under normality, the $\alpha$-quantile of returns is exactly $\mu - z_\alpha \sigma$, which is typically a negative number; taking its negative expresses VaR as a positive loss amount. For example, the 95% VaR uses $z_{0.05} \approx 1.645$, meaning we expect losses to exceed this threshold only 5% of the time.

Out[4]:
Visualization
Normal distribution with VaR threshold and shaded tail region showing Expected Shortfall.
Value at Risk (VaR) and Expected Shortfall (ES) visualized on a return distribution. VaR marks the threshold below which losses occur with probability alpha, while ES is the average of losses beyond this threshold.

Option pricing via the Black-Scholes model assumes that log-returns are normally distributed, as we will explore in a later chapter.

In[5]:
Code
import numpy as np
from scipy import stats


def calculate_var_normal(mu, sigma, confidence_level=0.95, horizon=1):
    """
    Calculate Value at Risk assuming normal returns.

    Parameters:
        mu: Expected return (daily)
        sigma: Volatility (daily standard deviation)
        confidence_level: Confidence level (e.g., 0.95 for 95% VaR)
        horizon: Time horizon in days

    Returns:
        VaR as a positive number (potential loss)
    """
    # Scale parameters to the horizon
    mu_scaled = mu * horizon
    sigma_scaled = sigma * np.sqrt(horizon)

    # Find the quantile
    z = stats.norm.ppf(1 - confidence_level)

    # VaR is the negative of the quantile (expressed as positive loss)
    var = -(mu_scaled + z * sigma_scaled)
    return var


# Example: Daily return of 0.05% with 1.5% volatility
daily_return = 0.0005
daily_vol = 0.015

var_95_1day = calculate_var_normal(daily_return, daily_vol, 0.95, 1)
var_99_1day = calculate_var_normal(daily_return, daily_vol, 0.99, 1)
var_95_10day = calculate_var_normal(daily_return, daily_vol, 0.95, 10)
Out[6]:
Console
Portfolio with daily return 0.05% and volatility 1.50%
1-day 95% VaR: 2.42%
1-day 99% VaR: 3.44%
10-day 95% VaR: 7.30%

These VaR figures tell us the maximum expected loss at each confidence level. The 1-day 95% VaR of approximately 2.4% means that under normal market conditions, we would expect to lose more than 2.4% only about one trading day in twenty. The 10-day VaR is higher due to the square-root-of-time scaling that arises from the assumption that daily returns are independent and identically distributed. This square-root relationship is a direct consequence of the variance addition property: if daily variances are $\sigma^2$, then the variance over $n$ days is $n\sigma^2$, so the standard deviation scales as $\sqrt{n}\,\sigma$.
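
The variance-addition argument is easy to check by simulation. The sketch below (with illustrative parameters, independent of the VaR example above) sums independent daily returns and compares the resulting volatility to the square-root-of-time prediction:

```python
import numpy as np

# Aggregate i.i.d. daily returns over 10 days and compare volatilities
rng = np.random.default_rng(0)
daily_sigma = 0.015
n_days = 10

daily_returns = rng.normal(0.0, daily_sigma, size=(100_000, n_days))
ten_day_returns = daily_returns.sum(axis=1)

print(f"Empirical 10-day volatility:  {ten_day_returns.std():.4f}")
print(f"sqrt(10) x daily volatility:  {np.sqrt(n_days) * daily_sigma:.4f}")
```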

The Lognormal Distribution

While returns may be approximately normal, prices cannot be—a stock price can rise 50% but cannot fall more than 100%. This asymmetry is captured by the lognormal distribution, which constrains values to be positive. The lognormal distribution emerges naturally when we consider the multiplicative nature of returns: a stock that gains 10% and then loses 10% does not return to its starting price. This multiplicative structure, when iterated over many periods, gives rise to the lognormal distribution through the magic of the exponential function.

Lognormal Distribution

A probability distribution of a random variable whose logarithm is normally distributed. If $\ln(X) \sim N(\mu, \sigma^2)$, then $X$ follows a lognormal distribution. Asset prices are often modeled as lognormal.

From Returns to Prices

The connection between normal returns and lognormal prices is one of the most important relationships in quantitative finance. If log-returns are normally distributed, then prices are lognormally distributed. Consider a stock with initial price $S_0$ and continuously compounded return $r$ over some period:

$$S_T = S_0 e^r$$

This equation captures a key relationship in how prices evolve. The exponential function $e^r$ converts the additive world of log-returns into the multiplicative world of price changes. When $r = 0.10$, the stock has grown by a factor of $e^{0.10} \approx 1.105$, or about 10.5%. The slight difference between the log-return and the percentage return becomes more pronounced for larger moves.

The components of this price equation are:

  • $S_T$: stock price at time $T$
  • $S_0$: initial stock price
  • $r$: continuously compounded return over the period
  • $e^r$: the growth factor, converting log-return to price ratio

If $r \sim N(\mu, \sigma^2)$, then $S_T$ follows a lognormal distribution. This relationship is fundamental: the exponential function transforms the symmetric, unbounded normal distribution into a right-skewed distribution bounded below by zero, which matches the behavior of asset prices that can rise without limit but cannot fall below zero. The transformation preserves the elegant properties of the normal distribution while mapping them onto the economically sensible domain of positive prices.

The probability density function of the lognormal distribution takes a form that reveals its connection to the normal:

$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0$$

Compare this to the normal density to see the key differences. The argument of the exponential now involves $\ln x$ rather than $x$, reflecting that we are measuring distance in log-space. The presence of $1/x$ in front of the exponential arises from the chain rule when transforming from the normal to the lognormal through the change of variables.

Each element of this formula serves a distinct purpose:

  • $x$: the value at which we evaluate the density (must be positive)
  • $\mu$: the mean of $\ln(X)$, not the mean of $X$ itself
  • $\sigma$: the standard deviation of $\ln(X)$, not of $X$ itself
  • $1/x$: this factor (compared to the normal PDF) accounts for the change of variables from $\ln(x)$ to $x$

The key insight is that $\mu$ and $\sigma$ are parameters of the underlying normal distribution of log-values, not the lognormal variable itself. This distinction often causes confusion: when we say a stock has lognormal volatility of 20%, we mean the standard deviation of its log-returns, not of the returns themselves. The actual mean and variance of a lognormal variable are:

$$E[X] = e^{\mu + \sigma^2/2}$$

$$\text{Var}(X) = (e^{\sigma^2} - 1)e^{2\mu + \sigma^2}$$

These formulas reveal how the parameters of the underlying normal distribution translate into the moments of the lognormal:

  • $E[X]$: the expected value (mean) of the lognormal variable $X$
  • $\text{Var}(X)$: the variance of $X$
  • $\mu$: the mean of $\ln(X)$, the underlying normal distribution
  • $\sigma^2$: the variance of $\ln(X)$
  • $e^{\mu + \sigma^2/2}$: the mean, which is always greater than $e^\mu$ due to the convexity of the exponential function

The $\sigma^2/2$ term in the mean formula is a convexity adjustment: because the exponential function is convex, Jensen's inequality tells us that $E[e^Y] > e^{E[Y]}$ for any non-constant random variable $Y$. Specifically, the curvature of the exponential means that high values of $Y$ contribute disproportionately to the mean of $e^Y$. This is why the expected value of a lognormal variable is not simply $e^\mu$ but includes the variance correction, since higher variance increases this convexity effect. This seemingly technical point has profound practical implications: it means that an asset with higher volatility will have a higher expected price level even if the expected log-return remains unchanged.
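
A small simulation makes the convexity adjustment concrete. This sketch (with arbitrary illustrative parameters) compares the simulated mean and variance of a lognormal variable against the formulas above, and against the naive guess $e^\mu$:

```python
import numpy as np

# Simulate X = exp(Y) with Y ~ N(mu, sigma^2) and check the moment formulas
rng = np.random.default_rng(42)
mu, sigma = 0.05, 0.20

y = rng.normal(mu, sigma, size=1_000_000)
x = np.exp(y)

mean_theory = np.exp(mu + sigma**2 / 2)
var_theory = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)

print(f"Mean:     simulated {x.mean():.4f}  vs formula {mean_theory:.4f}")
print(f"Variance: simulated {x.var():.5f}  vs formula {var_theory:.5f}")
print(f"Naive exp(mu) = {np.exp(mu):.4f} understates the true mean")
```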

Geometric Brownian Motion

The standard model for stock price dynamics is geometric Brownian motion (GBM), which produces lognormally distributed prices. This model has been central to quantitative finance since Samuelson's work in the 1960s and forms the foundation for the Black-Scholes option pricing formula. Understanding GBM requires familiarity with stochastic calculus, but the core idea is intuitive. Prices evolve continuously with both a deterministic trend and random fluctuations proportional to the current price.

Under GBM, the stock price evolves according to:

$$dS = \mu S \, dt + \sigma S \, dW$$

This stochastic differential equation describes infinitesimal changes in the stock price. Each term has a clear interpretation:

  • $dS$: infinitesimal change in stock price
  • $S$: current stock price
  • $\mu$: drift rate (expected return per unit time)
  • $\sigma$: volatility (standard deviation of returns per unit time)
  • $dt$: infinitesimal time increment
  • $dW$: increment of a Wiener process (Brownian motion), with $dW \sim N(0, dt)$

The first term $\mu S \, dt$ represents the deterministic trend, while the second term $\sigma S \, dW$ captures random fluctuations proportional to the current price. The key feature is that both terms are proportional to $S$: a \$100 stock and a \$10 stock with the same parameters will have the same percentage return distribution, but the absolute dollar changes will be ten times larger for the more expensive stock. This proportionality is what makes GBM appropriate for modeling asset prices, which tend to grow (or shrink) in percentage terms.

Solving this stochastic differential equation yields:

$$S_T = S_0 \exp\left[\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma W_T\right]$$

This solution tells us exactly where the stock price will be at any future time, given the realized path of the Brownian motion. The formula has a clear structure:

  • $S_T$: stock price at time $T$
  • $S_0$: initial stock price at time 0
  • $\mu$: drift rate (annualized expected return)
  • $\sigma$: volatility (annualized standard deviation)
  • $T$: time horizon in years
  • $W_T$: value of the Wiener process at time $T$, with $W_T \sim N(0, T)$
  • $\sigma^2/2$: Itô correction term arising from the quadratic variation of Brownian motion

The Itô correction $\sigma^2/2$ appears because we apply Itô's lemma to $\ln(S)$: the second-derivative term $\frac{1}{2}\frac{\partial^2 \ln S}{\partial S^2}(\sigma S)^2 = -\frac{\sigma^2}{2}$ contributes to the drift. This correction is one of the key differences between ordinary calculus and stochastic calculus. In ordinary calculus, if we know $dS/S$, we would conclude that $d(\ln S) = dS/S$. However, when $S$ follows a diffusion process, the additional $-\sigma^2/2$ term arises from the non-zero quadratic variation of Brownian motion. This mathematical detail has important financial implications: it means the expected log-return is less than the expected arithmetic return, a difference that matters for long-horizon investing.
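
The effect of the Itô correction can be checked directly from the exact solution. The sketch below (parameters are illustrative) simulates the terminal price and confirms that the average log-return is $(\mu - \sigma^2/2)T$ while the average gross return is $e^{\mu T}$:

```python
import numpy as np

# Terminal prices from the exact GBM solution
rng = np.random.default_rng(1)
S0, mu, sigma, T = 100.0, 0.10, 0.25, 1.0

W_T = rng.normal(0.0, np.sqrt(T), size=1_000_000)
S_T = S0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * W_T)

print(f"Mean log-return:   {np.mean(np.log(S_T / S0)):.4f}"
      f"   (mu - sigma^2/2 = {(mu - 0.5 * sigma**2) * T:.4f})")
print(f"Mean gross return: {np.mean(S_T / S0):.4f}"
      f"   (exp(mu * T)    = {np.exp(mu * T):.4f})")
```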

This model forms the foundation of the Black-Scholes option pricing formula.

Out[7]:
Visualization
Two probability distributions showing normal versus lognormal with right skew.
Comparison of normal and lognormal distributions. The lognormal distribution is bounded below by zero and exhibits positive skewness, matching the behavior of asset prices.

Simulating Price Paths

Monte Carlo simulation of stock prices under GBM is simple. We can either simulate the continuous process or use the exact solution. The approach below uses the exact solution, which is more efficient and avoids discretization errors that accumulate in step-by-step simulation methods. By using the closed-form solution for $S_T$ directly, we ensure that our simulated paths have the correct distribution regardless of the time step size.

In[8]:
Code
import numpy as np


def simulate_gbm_paths(S0, mu, sigma, T, n_steps, n_paths):
    """
    Simulate geometric Brownian motion paths.

    Parameters:
        S0: Initial stock price
        mu: Annualized drift (expected return)
        sigma: Annualized volatility
        T: Time horizon in years
        n_steps: Number of time steps
        n_paths: Number of simulation paths

    Returns:
        Array of shape (n_paths, n_steps + 1) containing price paths
    """
    dt = T / n_steps

    # Generate random increments
    np.random.seed(42)
    Z = np.random.standard_normal((n_paths, n_steps))

    # Calculate log returns for each step
    log_returns = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z

    # Cumulative sum to get log prices relative to initial
    log_prices = np.zeros((n_paths, n_steps + 1))
    log_prices[:, 0] = np.log(S0)
    log_prices[:, 1:] = np.log(S0) + np.cumsum(log_returns, axis=1)

    # Convert to prices
    prices = np.exp(log_prices)
    return prices


# Simulate 1 year of daily prices
S0 = 100
mu = 0.10  # 10% expected annual return
sigma = 0.25  # 25% annual volatility
T = 1.0
n_steps = 252  # Trading days
n_paths = 1000

paths = simulate_gbm_paths(S0, mu, sigma, T, n_steps, n_paths)
final_prices = paths[:, -1]
Out[9]:
Console
Initial price: $100.00
Expected final price (theoretical): $110.52
Mean simulated final price: $110.44
Median simulated final price: $108.19
5th percentile: $71.15
95th percentile: $160.84

Notice that the mean simulated price is close to the theoretical expected value of $S_0 e^{\mu T}$, but the median is lower. This is typical of lognormal distributions. Positive skewness means that a few very high outcomes pull the mean above the median. In practical terms, this means that while the average outcome is favorable, the most likely outcome (the mode) and the middle outcome (the median) are both lower than the average. An investor thinking about typical outcomes should focus on the median, while an investor thinking about long-run average performance should focus on the mean.
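
Both quantities can also be computed in closed form. A short sketch using the same parameters as the simulation above: the mean of the terminal price is $S_0 e^{\mu T}$, while the median is $S_0 e^{(\mu - \sigma^2/2)T}$.

```python
import numpy as np

# Theoretical mean and median of the terminal price under GBM
S0, mu, sigma, T = 100, 0.10, 0.25, 1.0

mean_theory = S0 * np.exp(mu * T)                        # pulled up by the right tail
median_theory = S0 * np.exp((mu - 0.5 * sigma**2) * T)   # the "typical" outcome

print(f"Theoretical mean:   ${mean_theory:.2f}")
print(f"Theoretical median: ${median_theory:.2f}")
```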

Out[10]:
Visualization
Spaghetti plot showing many simulated price paths diverging from starting point.
Monte Carlo simulation of 1000 stock price paths under geometric Brownian motion. The fan-like shape illustrates how uncertainty increases with time, while the positive drift pushes prices upward on average.

Discrete Distributions: Binomial and Poisson

Not all financial phenomena are continuous. Events like defaults, dividend announcements, or large jumps in prices are discrete occurrences. The binomial and Poisson distributions model these situations, providing a framework for counting how many times something happens rather than measuring continuous quantities. These distributions are particularly important in credit risk, where we count defaults in a portfolio, and in derivatives pricing, where discrete lattice models approximate continuous price processes.

The Binomial Distribution

The binomial distribution models the number of successes in a fixed number of independent trials, each with the same probability of success. The classic example is flipping a coin multiple times and counting heads, but the same mathematics applies to questions such as: How many stocks in a portfolio will outperform? How many corporate bonds will default? How many days this month will have positive returns?

Binomial Distribution

The probability distribution of the number of successes $k$ in $n$ independent Bernoulli trials, where each trial has success probability $p$. Used in discrete option pricing models and credit risk modeling.

If $X \sim \text{Binomial}(n, p)$, the probability mass function is:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n$$

This formula answers the question: if we have $n$ independent trials, each with success probability $p$, what is the probability of getting exactly $k$ successes? The answer decomposes into two multiplicative components: the number of ways the successes can be arranged, and the probability of any particular arrangement.

Each component of this formula serves a specific purpose:

  • $k$: the number of successes we're computing the probability for
  • $n$: total number of independent trials
  • $p$: probability of success on each trial
  • $\binom{n}{k} = \frac{n!}{k!(n-k)!}$: the binomial coefficient, the number of ways to choose which $k$ of the $n$ trials are successes
  • $p^k$: the probability that a particular set of $k$ trials all succeed
  • $(1-p)^{n-k}$: the probability that the remaining $n-k$ trials all fail

The formula multiplies the number of arrangements by the probability of each specific arrangement. Intuitively, if you flip a coin $n$ times with probability $p$ of heads, the PMF tells you the probability of getting exactly $k$ heads: you need $k$ successes (probability $p^k$), $n-k$ failures (probability $(1-p)^{n-k}$), and the binomial coefficient counts how many orderings produce that outcome. The key insight is that because the trials are independent, the probability of any specific sequence of outcomes is simply the product of the individual probabilities, and the binomial coefficient tells us how many such sequences lead to exactly $k$ successes.

The mean and variance have elegant formulas that reveal the distribution's structure:

$$E[X] = np, \quad \text{Var}(X) = np(1-p)$$

These formulas encode intuitive relationships:

  • $E[X]$: the expected number of successes
  • $\text{Var}(X)$: the variance in the number of successes
  • $n$: total number of trials
  • $p$: probability of success on each trial
  • $(1-p)$: probability of failure, which appears in the variance because maximum variance occurs when $p = 0.5$ (maximum uncertainty)

The expected number of successes, $np$, is simply the number of trials times the success probability, exactly what intuition suggests. The variance formula $np(1-p)$ is more subtle: it shows that variance is maximized when $p = 0.5$ (complete uncertainty about each trial's outcome) and decreases toward zero as $p$ approaches 0 or 1 (near certainty). This makes sense because if $p$ is very close to 1, almost all trials succeed, leaving little room for variability in the total count.
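
As a concrete financial example, the sketch below uses scipy's binomial distribution for a hypothetical portfolio of 50 bonds, each assumed to default independently with 2% probability over one year (the numbers are made up for illustration, and the independence assumption is a strong one in practice):

```python
from scipy import stats

# Hypothetical bond portfolio: 50 issuers, each with a 2% annual default probability
n, p = 50, 0.02
defaults = stats.binom(n, p)

print(f"Expected defaults: {defaults.mean():.2f}, std dev: {defaults.std():.2f}")
print(f"P(no defaults)   = {defaults.pmf(0):.4f}")
print(f"P(exactly 2)     = {defaults.pmf(2):.4f}")
print(f"P(3 or more)     = {defaults.sf(2):.4f}")
```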

Out[11]:
Visualization
Bar charts showing binomial distributions for different parameters p and n.
Binomial probability mass functions for different parameters. Left panel shows how varying the success probability p affects the distribution shape for fixed n=20. Right panel shows how increasing n makes the distribution more concentrated around its mean.

Binomial Option Pricing Model

The Cox-Ross-Rubinstein (CRR) binomial model prices options by assuming that at each time step, the stock price either moves up by a factor $u$ or down by a factor $d$. This discretization transforms the continuous option pricing problem into a tree of possible outcomes that can be analyzed one step at a time. While this may seem like a crude approximation to the continuous world of actual markets, the model provides deep insights into option pricing and converges to the continuous Black-Scholes formula as the number of time steps increases.

After $n$ steps, the stock price has experienced $k$ up moves and $n-k$ down moves, giving:

$$S_T = S_0 u^k d^{n-k}$$

This formula shows how the final price depends entirely on the number of up moves, not their specific timing. The components are:

  • $S_T$: stock price at maturity after $n$ steps
  • $S_0$: initial stock price
  • $u$: up factor (price multiplier for an up move)
  • $d$: down factor (price multiplier for a down move)
  • $k$: number of up moves
  • $n-k$: number of down moves

Whether the stock rises first and then falls, or falls first and then rises, the final price is the same as long as the total number of up and down moves matches. This path-independence is crucial for European options, which depend only on the final price, though it breaks down for path-dependent derivatives like Asian options.

The number of up moves follows a binomial distribution under the risk-neutral measure, with probability $q$ of an up move:

$$q = \frac{e^{r\Delta t} - d}{u - d}$$

This risk-neutral probability is one of the most important concepts in derivatives pricing. Notice that $q$ is not the actual probability that the stock will go up. It is the artificial probability that makes the expected return on the stock equal to the risk-free rate. The components are:

  • $q$: risk-neutral probability of an up move
  • $r$: risk-free interest rate (annualized)
  • $\Delta t$: length of each time step in years
  • $u$: up factor (stock price multiplier for an up move)
  • $d$: down factor (stock price multiplier for a down move)
  • $e^{r\Delta t}$: the growth factor for a risk-free investment over one time step

This formula is derived from the no-arbitrage condition: the expected stock return under the risk-neutral measure must equal the risk-free rate. Setting $q \cdot u + (1-q) \cdot d = e^{r\Delta t}$ and solving for $q$ yields this result. The key insight is that option prices depend only on these risk-neutral probabilities, not on the actual probabilities of up and down moves. This separation of pricing from probability estimation is what makes derivatives pricing possible without knowing investors' risk preferences.

Out[12]:
Visualization
Recombining binomial tree diagram showing stock prices at each node.
A 4-step binomial tree showing possible stock price paths. The tree recombines because an up move followed by a down move gives the same price as a down move followed by an up move. Node labels show the stock price at each node.
In[13]:
Code
import numpy as np
from scipy.special import comb


def binomial_option_price(S0, K, T, r, sigma, n_steps, option_type="call"):
    """
    Price a European option using the CRR binomial model.

    Parameters:
        S0: Initial stock price
        K: Strike price
        T: Time to expiration (years)
        r: Risk-free rate (annualized)
        sigma: Volatility (annualized)
        n_steps: Number of time steps
        option_type: 'call' or 'put'

    Returns:
        Option price
    """
    dt = T / n_steps

    # CRR parameters
    u = np.exp(sigma * np.sqrt(dt))
    d = 1 / u
    q = (np.exp(r * dt) - d) / (u - d)

    # Calculate all possible final stock prices
    k = np.arange(n_steps + 1)
    final_prices = S0 * (u**k) * (d ** (n_steps - k))

    # Calculate payoffs
    if option_type == "call":
        payoffs = np.maximum(final_prices - K, 0)
    else:
        payoffs = np.maximum(K - final_prices, 0)

    # Risk-neutral probabilities
    probs = comb(n_steps, k) * (q**k) * ((1 - q) ** (n_steps - k))

    # Expected payoff under risk-neutral measure, discounted
    option_price = np.exp(-r * T) * np.sum(probs * payoffs)

    return option_price


# Price a call option
S0 = 100
K = 100
T = 0.5  # 6 months
r = 0.05
sigma = 0.20
Out[14]:
Console
Binomial Model Call Option Prices
-----------------------------------
Steps:   10  |  Price: $6.7494
Steps:   50  |  Price: $6.8605
Steps:  100  |  Price: $6.8746
Steps:  200  |  Price: $6.8817
Steps:  500  |  Price: $6.8859
-----------------------------------
Black-Scholes:  |  Price: $6.8887

As the number of steps increases, the binomial model converges to the Black-Scholes price. This convergence illustrates how the discrete binomial model approximates the continuous-time lognormal model. The rate of convergence is roughly $O(1/n)$, meaning that doubling the number of steps roughly halves the error. This convergence is not merely a numerical curiosity; it reflects a deep mathematical connection between the binomial random walk and Brownian motion, formalized in Donsker's theorem.

Out[15]:
Visualization
Line plot showing binomial price converging to Black-Scholes price with increasing steps.
Convergence of the binomial option pricing model to the Black-Scholes price as the number of time steps increases. The oscillation around the true price decreases as n grows, demonstrating the model's consistency.

The Poisson Distribution

The Poisson distribution models the number of events occurring in a fixed interval when events happen at a constant average rate and independently of each other. Unlike the binomial distribution, which assumes a fixed number of trials, the Poisson distribution allows for any number of events, making it useful for modeling rare occurrences over a continuous interval.

Poisson Distribution

A discrete probability distribution expressing the probability of a given number of events occurring in a fixed interval, when events occur at a constant average rate $\lambda$ independently of each other. Used for modeling defaults, jumps, and arrival processes.

If $X \sim \text{Poisson}(\lambda)$, the probability mass function is:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$$

This formula has a clear structure when we trace its derivation from first principles. Imagine dividing the interval into many tiny subintervals, each with a small probability of containing an event. As the number of subintervals goes to infinity while keeping the expected total number of events fixed at $\lambda$, we obtain the Poisson distribution.

The components of this formula are:

  • $k$: the number of events we're computing the probability for
  • $\lambda$: the average rate of events (expected number of events in the interval)
  • $e^{-\lambda}$: the probability of zero events, which serves as a normalization factor
  • $\lambda^k$: captures how the probability grows with more opportunities for events
  • $k!$: accounts for the indistinguishability of event orderings

The Poisson distribution arises as the limit of a binomial distribution when $n \to \infty$ and $p \to 0$ while $np = \lambda$ remains constant, which corresponds to many trials with small individual probabilities. This limiting relationship explains why the Poisson distribution is appropriate for rare events: if we have many opportunities for something unlikely to happen, and the total expected count is finite, the number of occurrences follows a Poisson distribution. This is precisely the situation in credit risk (many bonds, each with a small default probability) and in jump modeling (many small time intervals, each with a small probability of containing a jump).
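
The limit is easy to see numerically. The sketch below compares a binomial distribution with many trials and a small success probability to the Poisson distribution with the same expected count (parameters chosen purely for illustration):

```python
from scipy import stats

# Binomial(n, p) with large n and small p versus its Poisson(lambda = n*p) limit
n, p = 1000, 0.002
lam = n * p  # expected count = 2

print(f"{'k':>3} {'Binomial':>10} {'Poisson':>10}")
for k in range(6):
    print(f"{k:>3} {stats.binom.pmf(k, n, p):>10.4f} {stats.poisson.pmf(k, lam):>10.4f}")
```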

Both the mean and variance equal $\lambda$:

$$E[X] = \text{Var}(X) = \lambda$$

This property deserves emphasis:

  • $E[X]$: the expected number of events
  • $\text{Var}(X)$: the variance in the number of events
  • $\lambda$: the average rate parameter

The equality of mean and variance is a distinctive property of the Poisson distribution. When analyzing count data, if the observed variance significantly exceeds the mean (overdispersion), this suggests the Poisson model may be inadequate. Overdispersion often indicates that events are not independent: defaults may cluster in time due to common economic factors, or price jumps may trigger subsequent jumps through market feedback effects.

Out[16]:
Visualization
Bar charts showing Poisson distributions for different lambda values.
Poisson probability mass functions for different rate parameters. As lambda increases, the distribution becomes more symmetric and approaches a normal distribution, illustrating the central limit effect for sums of independent rare events.

Applications of the Poisson Distribution

The Poisson distribution is particularly useful for modeling:

Jump processes in asset prices: The Merton jump-diffusion model adds Poisson-distributed jumps to geometric Brownian motion, capturing sudden large price movements that GBM alone cannot produce. In this model, the stock price follows GBM between jumps, but occasionally experiences discrete jumps whose arrival is governed by a Poisson process. This combination captures both the continuous fluctuations of normal trading and the occasional large movements caused by major news events.

Credit defaults in a portfolio: If defaults occur independently with a small probability, the number of defaults in a large portfolio is approximately Poisson distributed. This approximation underlies many credit risk models, although the assumption of independence often breaks down during systemic crises, when many borrowers default simultaneously.

Order arrivals: In market microstructure, the arrival of orders is often modeled as a Poisson process. This assumption implies that the time between consecutive orders follows an exponential distribution, without clustering or periodicity in order flow. While real markets show more complex arrival patterns, the Poisson model provides a useful baseline.

In[17]:
Code
import numpy as np


def simulate_jump_diffusion(
    S0, mu, sigma, lambda_jump, mu_jump, sigma_jump, T, n_steps, n_paths
):
    """
    Simulate Merton jump-diffusion price paths.

    Parameters:
        S0: Initial stock price
        mu: Drift of the diffusion component
        sigma: Volatility of the diffusion component
        lambda_jump: Average number of jumps per year (Poisson intensity)
        mu_jump: Mean of log-jump size
        sigma_jump: Std dev of log-jump size
        T: Time horizon in years
        n_steps: Number of time steps
        n_paths: Number of paths to simulate

    Returns:
        Array of simulated price paths
    """
    dt = T / n_steps

    np.random.seed(42)

    # Diffusion component
    Z_diffusion = np.random.standard_normal((n_paths, n_steps))
    diffusion = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z_diffusion

    # Jump component
    # Number of jumps in each interval (Poisson distributed)
    n_jumps = np.random.poisson(lambda_jump * dt, (n_paths, n_steps))

    # Jump sizes (log-normal)
    jump_sizes = np.zeros((n_paths, n_steps))
    for i in range(n_paths):
        for j in range(n_steps):
            if n_jumps[i, j] > 0:
                jumps = np.random.normal(mu_jump, sigma_jump, n_jumps[i, j])
                jump_sizes[i, j] = np.sum(jumps)

    # Combine diffusion and jumps
    log_returns = diffusion + jump_sizes

    # Calculate prices
    log_prices = np.zeros((n_paths, n_steps + 1))
    log_prices[:, 0] = np.log(S0)
    log_prices[:, 1:] = np.log(S0) + np.cumsum(log_returns, axis=1)

    return np.exp(log_prices)


# Compare GBM vs Jump-Diffusion
S0 = 100
mu = 0.10
sigma = 0.20
T = 1.0
n_steps = 252
n_paths = 10000

# Pure GBM paths
gbm_paths = simulate_gbm_paths(S0, mu, sigma, T, n_steps, n_paths)

# Jump-diffusion paths
jd_paths = simulate_jump_diffusion(
    S0,
    mu,
    sigma,
    lambda_jump=2,  # 2 jumps per year on average
    mu_jump=-0.05,  # Jumps tend to be negative
    sigma_jump=0.10,
    T=T,
    n_steps=n_steps,
    n_paths=n_paths,
)

gbm_returns = np.diff(np.log(gbm_paths), axis=1).flatten()
jd_returns = np.diff(np.log(jd_paths), axis=1).flatten()
Out[18]:
Console
Return Distribution Comparison
==================================================
Statistic                     GBM  Jump-Diffusion
--------------------------------------------------
Mean                     0.000312       -0.000083
Std Dev                  0.012609        0.016096
Skewness                  -0.0006         -3.1474
Kurtosis                  -0.0002         55.5761
Min                       -0.0605         -0.4391
Max                        0.0611          0.3572

The jump-diffusion model produces returns with higher kurtosis (fatter tails) and more negative skewness than pure GBM. The minimum return is more extreme, reflecting occasional large negative jumps. These properties better match empirical observations of asset returns. The higher kurtosis indicates that the jump-diffusion model places more probability mass both in the center (small moves) and in the tails (large moves), with less probability in the intermediate range, exactly the pattern observed in real financial data.

Out[19]:
Visualization
Two histograms comparing GBM and jump-diffusion return distributions showing fatter tails.
Return distributions from geometric Brownian motion versus Merton jump-diffusion. The jump-diffusion model produces heavier tails and more extreme values, better matching observed market behavior.

Fat-Tailed Distributions

Empirical studies consistently show that asset returns have fatter tails than the normal distribution predicts. Events that should occur once in centuries under normality happen multiple times per decade. This observation has major implications for risk management. The discrepancy between theoretical and observed extreme event frequencies is not a minor technical issue. It represents a fundamental mismatch between the mathematical assumptions underlying most financial models and the actual behavior of markets.

Evidence of Fat Tails

The term "fat tails" refers to distributions where extreme events occur more frequently than a normal distribution would suggest. A distribution with fat tails places more probability mass in the regions far from the mean, meaning that large positive or negative values are more common than the normal distribution's rapid exponential decay would predict. Several ways to measure this phenomenon:

Kurtosis: The normal distribution has a kurtosis of 3 (or excess kurtosis of 0). Distributions with higher kurtosis have fatter tails. Equity returns typically exhibit excess kurtosis of 3 to 10, so extreme returns are far more common than the normal distribution predicts. The kurtosis measures the fourth moment of the distribution relative to its variance squared, essentially capturing how much probability mass resides in the tails versus the shoulders of the distribution.

Tail exponents: For very fat-tailed distributions, the probability of extreme events decays as a power law rather than exponentially. The tail exponent $\alpha$ characterizes how fast this decay occurs. A smaller $\alpha$ means slower decay and fatter tails, while a larger $\alpha$ means faster decay approaching the thin tails of the normal distribution.

Out[21]:
Console
S&P 500 (SPY) Daily Returns: 2000-2024
==================================================
Number of observations: 5,999
Mean daily return: 0.0175%
Daily volatility: 1.3961%
Skewness: 0.250
Excess Kurtosis: 6.072

Extreme Event Frequency
--------------------------------------------------
Event             Observed  Expected (Normal)      Ratio
--------------------------------------------------
|r| > 3σ                77               16.2        4.8x
|r| > 4σ                29                0.4       76.3x
|r| > 5σ                10                0.0     2907.6x

The data reveals a stark departure from normality. Events beyond 4 standard deviations occur more than 70 times as frequently as a normal distribution predicts. Under the normal assumption, a 5-sigma event should occur once every 7,000 years of trading, yet we observe multiple such events in just two decades. This massive discrepancy is not a statistical fluke. It reflects a fundamental difference between the mathematical properties of the normal distribution and the actual mechanics of how prices move in financial markets.

The Student's t Distribution

The Student's t distribution provides a simple way to model fat tails while retaining many convenient properties of the normal distribution. William Sealy Gosset originally developed the t distribution (writing under the pseudonym "Student") for quality control at Guinness brewery. It is now widely used in finance for modeling the heavy tails observed in return data.

Student's t Distribution

A probability distribution with heavier tails than the normal distribution, characterized by its degrees of freedom parameter $\nu$. As $\nu \to \infty$, it converges to the normal distribution. For small $\nu$, it assigns higher probability to extreme events.

The probability density function of the standard t distribution with $\nu$ degrees of freedom is:

$$f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$

This formula may look complex, but its structure shows the key difference between the t distribution and the normal distribution. While the normal distribution uses the exponential function $e^{-x^2/2}$, which decays very rapidly, the t distribution uses a power function $(1 + x^2/\nu)^{-(\nu+1)/2}$, which decays much more slowly. This slower decay is precisely what produces the heavier tails.

Each component serves a specific purpose:

  • $x$: the value at which we evaluate the density
  • $\nu$: degrees of freedom parameter (controls tail thickness; lower $\nu$ means fatter tails)
  • $\Gamma(\cdot)$: the gamma function, generalizing the factorial to non-integer values ($\Gamma(n) = (n-1)!$ for positive integers)
  • $\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}$: a normalization constant ensuring the density integrates to 1
  • $(1 + x^2/\nu)^{-(\nu+1)/2}$: the core term that produces polynomial rather than exponential tail decay

The polynomial decay of $(1 + x^2/\nu)^{-(\nu+1)/2}$ versus the normal's exponential decay $e^{-x^2/2}$ is why the t distribution has heavier tails. To understand this intuitively, consider what happens as $x$ becomes large. For the normal distribution, doubling $x$ quadruples the exponent $x^2/2$, so the density collapses toward zero extraordinarily fast. For the t distribution, the density behaves like $x^{-(\nu+1)}$ for large $x$, so doubling $x$ only reduces the density by a factor of roughly $2^{\nu+1}$, a far slower decay. As $\nu \to \infty$, the polynomial term converges to the exponential, and the t distribution approaches the normal.

Key properties of the t distribution:

  • Degrees of freedom ($\nu$): Controls tail thickness. Lower values mean fatter tails.
  • Convergence: As $\nu \to \infty$, the t distribution approaches the standard normal.
  • Variance: For $\nu > 2$, the variance is $\nu/(\nu-2)$, which is always greater than 1. For $\nu \leq 2$, the variance is infinite. This is an extreme case where the tails are so heavy that second moments do not exist.
  • Kurtosis: For $\nu > 4$, excess kurtosis is $6/(\nu-4)$, always positive and decreasing in $\nu$. For $\nu \leq 4$, the kurtosis is undefined because the fourth moment does not exist.

For financial applications, a t distribution with 4 to 6 degrees of freedom often fits return data well, capturing the observed excess kurtosis while staying manageable. The fact that empirically fitted degrees of freedom are typically in this range has important implications. With $\nu$ around 4, we are near the boundary where the fourth moment (and hence kurtosis) becomes undefined, indicating that we are dealing with genuinely heavy-tailed phenomena that push the limits of standard statistical tools.

Out[22]:
Visualization
Line plot comparing normal and t-distributions showing heavier tails for lower degrees of freedom.
Comparison of normal and Student's t distributions with varying degrees of freedom. Lower degrees of freedom produce heavier tails, assigning higher probability to extreme events.

The log-scale plot on the right reveals how dramatically the distributions differ in their tails. At a threshold of 5, the standard normal assigns a one-sided tail probability of roughly 1 in 3.5 million, while a t distribution with 3 degrees of freedom assigns a probability on the order of 1 in a hundred, a difference of several orders of magnitude. This gap, invisible on a linear scale where both probabilities appear to be essentially zero, has enormous practical significance for risk management: moves that a normal-based model treats as essentially impossible become merely uncommon under a heavy-tailed model.
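
The tail comparison is easy to reproduce with scipy. The sketch below evaluates one-sided tail probabilities at the same thresholds for the standard normal and a (non-rescaled) t distribution with 3 degrees of freedom:

```python
from scipy import stats

# One-sided tail probabilities at the same threshold, standard normal vs. t(3)
for x in (3, 4, 5):
    p_norm = stats.norm.sf(x)
    p_t3 = stats.t.sf(x, df=3)
    print(f"x = {x}:  normal {p_norm:.2e}   t(3) {p_t3:.2e}   ratio {p_t3 / p_norm:,.0f}x")
```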

Fitting the t Distribution to Market Data

We can fit a scaled t distribution to observed returns and compare the fit to a normal distribution. This exercise demonstrates not just that the t distribution fits better, but quantifies how much better, providing a concrete measure of the inadequacy of the normal assumption.

In[23]:
Code
from scipy import stats
import numpy as np

# Fit a scaled t distribution to SPY returns
# The t distribution has location (mu), scale (sigma), and degrees of freedom (nu)
# Regenerate the synthetic return series from the fat-tails example so this cell is self-contained
np.random.seed(42)
n_days = 6000
daily_returns = np.random.standard_t(df=4, size=n_days) * 0.01 + 0.0003
returns_array = daily_returns

# Maximum likelihood estimation
t_params = stats.t.fit(returns_array)
t_nu, t_loc, t_scale = t_params

# Fit normal for comparison
norm_loc, norm_scale = stats.norm.fit(returns_array)

# Calculate log-likelihoods
t_loglik = np.sum(stats.t.logpdf(returns_array, t_nu, t_loc, t_scale))
norm_loglik = np.sum(stats.norm.logpdf(returns_array, norm_loc, norm_scale))
Out[24]:
Console
Distribution Fitting Results
==================================================

Normal Distribution:
  Location (μ): 0.000079
  Scale (σ): 0.013944
  Log-likelihood: 17,123

Student's t Distribution:
  Degrees of freedom (ν): 4.23
  Location: 0.000055
  Scale: 0.010172
  Log-likelihood: 17,526

Log-likelihood improvement: 403
The t distribution provides a substantially better fit to the data.

The fitted t distribution has around 4 to 5 degrees of freedom, far from the normal limit. The large improvement in log-likelihood confirms that the t distribution better captures the observed return distribution. A log-likelihood difference of several hundred translates to overwhelming statistical evidence against the normal distribution: the likelihood ratio in favor of the t distribution is astronomically large.

Out[25]:
Visualization
Histogram of returns with overlaid normal and t-distribution fits showing better t-distribution fit.
Fitted distributions compared to empirical S&P 500 returns. The Student's t distribution better captures the peaked center and fat tails of observed returns.

Power-Law Tails

Some financial time series have tails that are even fatter than what the t distribution can capture. In extreme cases, the tails follow a power law:

$$P(X > x) \sim x^{-\alpha} \quad \text{as } x \to \infty$$

This asymptotic relationship describes the behavior of the survival function (the probability of exceeding a threshold) for large values. The normal distribution's tails decay exponentially, and the t distribution's polynomial tails decay at a rate set by its degrees of freedom; power-law tails with a small exponent can be heavier still, with profound implications for the existence of moments and the behavior of aggregates.

The components of this relationship are:

  • $P(X > x)$: the probability that the random variable exceeds value $x$ (the survival function)
  • $x$: a large threshold value
  • $\alpha$: the tail exponent (also called the tail index), which controls how fast probabilities decay
  • $\sim$: denotes asymptotic equivalence, meaning the ratio of the two sides approaches a constant as $x \to \infty$

A larger $\alpha$ means faster decay and thinner tails; a smaller $\alpha$ means slower decay and fatter tails. The probability of extreme events decays polynomially rather than exponentially. For equity returns, empirical studies suggest $\alpha \approx 3$ for the tails, known as the "cubic law." Under this law, a return twice as large as a given threshold is approximately $2^{-3} = 1/8$ as likely. This is a much slower decay than the exponential falloff of normal distributions.

To appreciate the difference, consider that under a normal distribution, doubling the threshold from 4σ to 8σ reduces the tail probability by a factor of more than $10^{10}$, while under a cubic power law the reduction is only a factor of 8. This staggering difference explains why normal-based risk models fail so spectacularly during market crises. They treat large moves as highly improbable when in fact they are merely uncommon.

Power Law Distribution

A distribution where the probability of observing a value greater than $x$ decays as a power of $x$: $P(X > x) \propto x^{-\alpha}$. The tail exponent $\alpha$ determines how fat the tails are. Lower $\alpha$ means fatter tails.
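
One standard way to estimate the tail exponent from data is the Hill estimator, which averages log-exceedances over the largest observations. The sketch below applies it to synthetic heavy-tailed returns (generated here so the example is self-contained; the choice of the top 1% of losses as the tail cutoff is arbitrary, and in practice the estimate is sensitive to this choice):

```python
import numpy as np

# Hill estimator for the tail exponent of the loss tail
rng = np.random.default_rng(42)
returns = rng.standard_t(df=4, size=6000) * 0.01   # synthetic fat-tailed returns
losses = -returns[returns < 0]                     # magnitudes of negative returns

k = int(0.01 * len(returns))                       # use roughly the largest 1% of losses
sorted_losses = np.sort(losses)
tail = sorted_losses[-k:]                          # the k largest losses
threshold = sorted_losses[-(k + 1)]                # next-largest loss, used as threshold

alpha_hill = 1.0 / np.mean(np.log(tail / threshold))
print(f"Estimated tail exponent alpha: {alpha_hill:.2f}")
```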

Out[26]:
Visualization
Log-log plot comparing survival functions of normal, t, and power-law distributions.
Comparison of tail decay rates for different distributions. The log-log plot reveals the fundamental difference between exponential decay (normal) and power-law decay. Power-law tails appear as straight lines on this scale, while exponential tails curve downward.

The implications of power-law tails are serious for risk management:

  • Moments may not exist: If α ≤ 2, the variance is infinite. If α ≤ 1, even the mean is undefined. For the cubic law with α ≈ 3, the variance exists but the fourth moment (kurtosis) does not, which explains why sample kurtosis estimates are so unstable.
  • Slow convergence: The Central Limit Theorem converges very slowly for heavy-tailed distributions. While the theorem still applies (for α > 2), the sample size needed for the normal approximation to become accurate can be impractically large.
  • Tail risk dominates: With power-law tails, a small number of extreme events can dominate the total risk. In a portfolio context, this means that diversification benefits may be less than standard theory suggests.
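In practice, the tail exponent is not known and must be estimated from data. A common diagnostic is the Hill estimator, which infers α from the k largest absolute observations. The sketch below is a minimal implementation applied to simulated t-distributed returns; the function name and the grid of k values are our choices, and in real applications the estimate should be inspected across a range of k.

# Hill estimator for the tail exponent, using the k largest absolute returns
import numpy as np


def hill_estimator(data, k):
    """Estimate the tail exponent alpha from the k largest values of |data|."""
    x = np.sort(np.abs(data))      # order statistics, ascending
    tail = x[-k:]                  # k largest observations
    threshold = x[-k - 1]          # observation just below the tail sample
    # Hill estimate: reciprocal of the mean log exceedance ratio
    return 1.0 / np.mean(np.log(tail / threshold))


# Simulated daily returns with t(4) shocks, so the true tail exponent is 4
np.random.seed(42)
simulated = np.random.standard_t(df=4, size=6000) * 0.01

for k in (100, 250, 500):
    print(f"k = {k:3d}: alpha_hat = {hill_estimator(simulated, k):.2f}")

Because the Hill estimator is noisy and sensitive to the choice of k in samples of this size, the three estimates will not agree exactly; that instability is itself a useful reminder of how hard tail estimation is.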

Implications for Risk Management

Fat tails change how we should think about risk:

VaR underestimation: Using normal-based VaR systematically underestimates tail risk. The true 99% VaR can be substantially larger than the normal approximation suggests, and the bias grows worse at higher confidence levels such as 99.9%, precisely where accuracy matters most.

Expected shortfall: Expected Shortfall (ES), also called Conditional VaR, measures the expected loss given that losses exceed VaR. For fat-tailed distributions, ES can be substantially larger than VaR. While VaR tells you the loss threshold that is exceeded only a small fraction of the time, ES tells you the average severity of those exceedances, and for heavy-tailed distributions this average can be surprisingly large.
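For reference, the closed-form Expected Shortfall of normally distributed returns, which the code below uses as the normal benchmark, can be written as:

\mathrm{ES}_\alpha = -\mu + \sigma \, \frac{\varphi(z_\alpha)}{\alpha}, \qquad z_\alpha = \Phi^{-1}(\alpha)

where φ is the standard normal density, Φ⁻¹ is the standard normal quantile function, and α is the tail probability (0.01 for the 99% level).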

In[27]:
Code
import numpy as np
from scipy import stats


def calculate_var_es(returns, confidence_levels=[0.95, 0.99]):
    """
    Calculate VaR and Expected Shortfall from historical returns.

    Returns:
        Dictionary with VaR and ES for each confidence level
    """
    results = {}
    for cl in confidence_levels:
        # VaR is the negative of the quantile at (1 - confidence level)
        var = -np.percentile(returns, (1 - cl) * 100)

        # ES is the mean of returns worse than VaR
        losses = -returns
        es = np.mean(losses[losses >= var])

        results[cl] = {"VaR": var, "ES": es, "ES/VaR": es / var}
    return results


# Regenerate data to ensure availability
np.random.seed(42)
n_days = 6000
returns_array = np.random.standard_t(df=4, size=n_days) * 0.01 + 0.0003
norm_loc, norm_scale = stats.norm.fit(returns_array)

# Compare empirical vs normal assumption
empirical_risk = calculate_var_es(returns_array)

# Normal assumption
norm_var_95 = -stats.norm.ppf(0.05, norm_loc, norm_scale)
norm_var_99 = -stats.norm.ppf(0.01, norm_loc, norm_scale)


# For a normal return distribution, ES at tail probability alpha is
# -mu + sigma * phi(z_alpha) / alpha, where phi is the standard normal pdf
def normal_es(alpha, mu, sigma):
    z = stats.norm.ppf(alpha)
    return -mu + sigma * stats.norm.pdf(z) / alpha


norm_es_95 = normal_es(0.05, norm_loc, norm_scale)
norm_es_99 = normal_es(0.01, norm_loc, norm_scale)

# Print a comparison of the empirical and normal-based risk measures
print("Risk Measure Comparison: Empirical vs Normal Assumption")
print("=" * 65)
print(f"{'Measure':<20}{'Empirical':>15}{'Normal':>13}{'Ratio':>13}")
print("-" * 65)
rows = [
    ("95% VaR", empirical_risk[0.95]["VaR"], norm_var_95),
    ("95% ES", empirical_risk[0.95]["ES"], norm_es_95),
    ("99% VaR", empirical_risk[0.99]["VaR"], norm_var_99),
    ("99% ES", empirical_risk[0.99]["ES"], norm_es_99),
]
for name, emp, normal_val in rows:
    print(f"{name:<20}{emp:>14.4%}{normal_val:>12.4%}{emp / normal_val:>12.2f}x")
print("-" * 65)
print()
print("The empirical tail measures exceed the normal predictions,")
print("with the gap widening at the more extreme 99% confidence level.")
Out[28]:
Console
Risk Measure Comparison: Empirical vs Normal Assumption
=================================================================
Measure                 Empirical       Normal        Ratio
-----------------------------------------------------------------
95% VaR                   2.1177%      2.2856%         0.93x
95% ES                    3.1541%      2.8682%         1.10x
99% VaR                   3.6798%      3.2358%         1.14x
99% ES                    5.0384%      3.7084%         1.36x
-----------------------------------------------------------------

The empirical tail measures exceed the normal predictions,
with the gap widening at the more extreme 99% confidence level.

The table shows the characteristic fat-tail pattern. At the 95% level the normal fit actually produces a slightly higher VaR than the empirical quantile, because a t-distributed sample concentrates more mass near the center, but every tail-sensitive measure runs the other way: the empirical 99% VaR is about 14% above the normal estimate and the empirical 99% ES about 36% above. The underestimation worsens as the confidence level rises, precisely where accuracy matters most for risk management. With real market data, which adds volatility clustering and even heavier tails than this simulation, the gap at the 99% level can be considerably larger, so a bank that sizes its capital on normal assumptions may be dangerously undercapitalized.
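A natural next step is to replace the normal formula with a parametric calculation based on the fitted Student's t distribution. The sketch below does this numerically, averaging the quantile function over the tail rather than relying on the closed-form t expression; the parameter values are the fitted ones reported earlier and are treated here as given inputs.

# Parametric 99% VaR and ES from a fitted Student's t distribution
import numpy as np
from scipy import stats

# Fitted parameters reported earlier, taken as given for this sketch
t_df, t_loc, t_scale = 4.23, 0.000055, 0.010172
alpha = 0.01  # tail probability for the 99% confidence level

# VaR: the loss at the alpha-quantile of the return distribution
t_var = -stats.t.ppf(alpha, df=t_df, loc=t_loc, scale=t_scale)

# ES: average loss beyond VaR, i.e. -(1/alpha) times the integral of the
# quantile function over (0, alpha), approximated by a midpoint rule
n = 100_000
u = (np.arange(n) + 0.5) * (alpha / n)
t_es = -stats.t.ppf(u, df=t_df, loc=t_loc, scale=t_scale).mean()

print(f"t-based 99% VaR: {t_var:.4%}")
print(f"t-based 99% ES:  {t_es:.4%}")

With these parameters, the t-based 99% figures should land much closer to the empirical values in the table than the normal-based estimates do, which is the practical payoff of modeling the tails realistically.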

Out[29]:
Visualization
Grouped bar chart comparing empirical and normal VaR/ES at 95% and 99% confidence levels.
Comparison of VaR and Expected Shortfall under normal versus empirical distributions. The bars show how normal assumptions underestimate risk measures, with the gap widening at higher confidence levels.

