Probability Distributions in Finance: Normal, Lognormal & Fat Tails

Michael Brenndoerfer · October 21, 2025 · 65 min read

Master probability distributions essential for quantitative finance: normal, lognormal, binomial, Poisson, and fat-tailed distributions with Python examples.


Common Probability Distributions in Finance

Probability distributions form the mathematical backbone of quantitative finance. Every time you estimate the risk of a portfolio, price an option, or model the likelihood of default, you are making assumptions about how random outcomes are distributed. These assumptions determine whether your models produce reasonable predictions or catastrophic failures.

The choice of distribution matters enormously. The 2008 financial crisis revealed that many risk models had dramatically underestimated the probability of extreme market moves. These models often assumed that returns followed a normal distribution, which assigns vanishingly small probabilities to the kind of large daily moves that occurred repeatedly during the crisis. Understanding the properties and limitations of different distributions has direct consequences for trading strategies, risk limits, and regulatory capital requirements.

This chapter covers the distributions you will encounter most frequently in quantitative finance: the normal distribution that underlies classical portfolio theory, the lognormal distribution that models asset prices, discrete distributions like the binomial and Poisson that appear in derivatives pricing and event modeling, and fat-tailed distributions that better capture the extreme events observed in real markets. For each distribution, we examine both the mathematical properties and the financial intuition behind its use.

The Normal Distribution

The normal distribution, also called the Gaussian distribution, is the starting point for most financial models. Its prominence stems from the Central Limit Theorem, which states that the sum of many independent random variables tends toward a normal distribution regardless of the underlying distributions. Since asset returns can be viewed as the aggregate effect of many independent pieces of information hitting the market, the normal distribution provides a natural first approximation. This explains why the normal distribution became the default assumption in finance before computers made alternative distributions practical.

Normal Distribution

A continuous probability distribution characterized by its symmetric, bell-shaped curve. It is fully specified by two parameters: the mean $\mu$ (center) and the variance $\sigma^2$ (spread). Also known as the Gaussian distribution.

Mathematical Properties

To understand what it means for a random variable to follow a normal distribution, we must first grasp the concept of a probability density function. Unlike discrete random variables that take on specific values with certain probabilities, a continuous random variable like one following a normal distribution can take any value along the real line. The probability density function tells us how likely the variable is to fall in any particular region—the higher the density at a point, the more probable values near that point become.

A random variable $X$ follows a normal distribution with mean $\mu$ and variance $\sigma^2$, written $X \sim N(\mu, \sigma^2)$, if its probability density function is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$$

Each component serves a specific purpose:

  • $x$: the value at which we evaluate the density
  • $\mu$: the mean (center of the distribution)
  • $\sigma$: the standard deviation (controls the spread)
  • $\frac{1}{\sigma\sqrt{2\pi}}$: a normalization constant ensuring the density integrates to 1
  • $(x-\mu)^2/2\sigma^2$: measures how many standard deviations $x$ is from the mean, squared and scaled

The heart of the normal distribution lies in the exponential term. The quantity $(x-\mu)^2$ measures the squared distance between the point $x$ and the center of the distribution $\mu$. Dividing by $2\sigma^2$ scales this distance relative to the spread of the distribution. A value two units away from the mean is much more unusual when $\sigma = 0.5$ than when $\sigma = 10$, because the standardized distance is larger. The negative sign in the exponent ensures that the function decreases as we move away from the mean, and the exponential function transforms this squared distance into a probability density that decays smoothly and symmetrically.

The exponential of a negative quadratic creates the characteristic bell shape: values near the mean have high probability density, while values far from the mean have exponentially decreasing density. This rapid decay is why the normal distribution assigns negligible probability to extreme events. The probability of being more than a few standard deviations from the mean is vanishingly small, a property that proves problematic when modeling financial markets where extreme moves occur more frequently than this decay suggests. The quadratic term in the exponent is the key: because squaring grows faster than linear growth, and the exponential function amplifies this, the tails of the normal distribution thin out extraordinarily quickly.

The normalization constant $\frac{1}{\sigma\sqrt{2\pi}}$ deserves special attention. This factor ensures that when we integrate the density function over all possible values from negative infinity to positive infinity, we obtain exactly 1: the total probability. The $\sqrt{2\pi}$ arises from the Gaussian integral, one of the most famous results in calculus, while the $\sigma$ in the denominator accounts for how wider distributions (larger $\sigma$) must have lower peak heights to maintain the same total area under the curve.

Key parameters:

  • Mean ($\mu$): The center of the distribution and the expected value of $X$. In finance, this represents the expected return.
  • Variance ($\sigma^2$): Measures the spread of the distribution. The square root, $\sigma$, is the standard deviation, which in finance we call volatility.
  • Skewness: Zero for the normal distribution, meaning it is perfectly symmetric around the mean.
  • Kurtosis: Equal to 3 for the normal distribution (or excess kurtosis of 0), which serves as the baseline for measuring tail thickness. Distributions with higher kurtosis have more probability mass in the tails and center, less in the shoulders.

The symmetry of the normal distribution is both its elegance and its limitation. A symmetric distribution treats upward and downward deviations identically, which aligns with our intuition that the laws of probability should not favor one direction over another for fundamental random processes. However, financial returns often exhibit asymmetry: crashes tend to be more severe and sudden than rallies of similar magnitude. This asymmetry, measured by skewness, is one reason the normal distribution serves as only a first approximation to market behavior.

The standard normal distribution is the special case where $\mu = 0$ and $\sigma^2 = 1$. Any normal variable can be standardized:

$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$

This standardization process transforms any normal random variable into the canonical standard normal form. The operation $\frac{X - \mu}{\sigma}$ first centers the distribution by subtracting the mean (so the new center is at zero), then scales it by dividing by the standard deviation (so the new spread equals one). This transformation preserves the essential shape of the distribution while placing it in a standard coordinate system.

This standardization is critical because it allows us to use a single table of probabilities for any normal distribution. The cumulative distribution function of the standard normal, denoted $\Phi(z)$, gives the probability that a standard normal variable is less than $z$. Before computers, statisticians relied on printed tables of $\Phi(z)$ values, and the standardization formula allowed them to answer questions about any normal distribution using this single reference. Even today, the standard normal serves as the universal reference point: when we speak of a "3-sigma event," we mean an outcome more than three standard deviations from the mean, regardless of what the actual mean and standard deviation are.
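
As a quick illustration, here is a minimal sketch using scipy (the return figures are hypothetical, chosen only for illustration) that computes a tail probability both by standardizing and by querying the original distribution directly:

```python
from scipy import stats

# Hypothetical daily return distribution: mean 0.05%, standard deviation 1.2%
mu, sigma = 0.0005, 0.012

# Probability of a daily loss worse than -2%, via standardization
x = -0.02
z = (x - mu) / sigma                 # about -1.7 standard deviations
p_via_z = stats.norm.cdf(z)          # Phi(z) from the standard normal

# The same probability directly from N(mu, sigma^2)
p_direct = stats.norm.cdf(x, loc=mu, scale=sigma)

print(f"z-score: {z:.2f}")
print(f"P(return < -2%) via Phi(z): {p_via_z:.4f}")
print(f"P(return < -2%) directly:   {p_direct:.4f}")
```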

The 68-95-99.7 Rule

One of the most useful properties of the normal distribution is how probability mass is concentrated around the mean:

  • Approximately 68% of observations fall within one standard deviation of the mean
  • Approximately 95% fall within two standard deviations
  • Approximately 99.7% fall within three standard deviations

These percentages emerge directly from integrating the normal density function, but their approximate values (68, 95, and 99.7) are worth committing to memory because they provide instant intuition for assessing how unusual any particular observation is. If someone tells you that an event was a "two-sigma move," you immediately know that such events occur roughly 5% of the time, or about one trading day in twenty.

This rule provides quick mental estimates for risk. If daily stock returns have a standard deviation of 1%, then under normality assumptions, you would expect a move larger than 3% (three standard deviations) to occur only about 0.3% of the time, or roughly once per year of trading days. This calculation reveals both the power and the danger of the normal assumption: it gives us precise quantitative predictions, but those predictions can be spectacularly wrong if the true distribution differs from normal.
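
These numbers are easy to verify. A minimal check with scipy, assuming the usual convention of 252 trading days per year:

```python
from scipy import stats

# Probability mass within k standard deviations of the mean
for k in (1, 2, 3):
    p_within = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"Within {k} sigma: {p_within:.4f}")

# Expected frequency of daily moves beyond 3 sigma under normality
p_beyond_3 = 2 * stats.norm.sf(3)
print(f"P(|move| > 3 sigma) = {p_beyond_3:.4%}, "
      f"about {p_beyond_3 * 252:.2f} occurrences per 252-day year")
```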

Out[2]:
Visualization
Bell curve showing standard normal distribution with shaded confidence intervals.
The standard normal distribution with shaded regions showing the 68-95-99.7 rule. Most probability mass is concentrated near the mean, with tail probabilities declining exponentially.
Out[3]:
Visualization
Multiple normal distribution curves showing effect of different means and standard deviations.
Normal distributions with different parameters. Changing the mean shifts the distribution horizontally, while changing the standard deviation affects the spread and peak height.

Applications in Finance

The normal distribution appears throughout quantitative finance:

Portfolio theory assumes returns are normally distributed, which allows risk to be fully characterized by variance. Harry Markowitz's mean-variance optimization framework rests on this assumption. The profound insight behind this choice is that if returns are normal, then an investor who cares about expected utility needs to consider only the mean and variance of portfolio returns, since all higher moments are determined by these two parameters. This reduces the complex problem of choosing among thousands of possible portfolios to a tractable optimization problem with a clear geometric interpretation.

Value at Risk (VaR) calculations often assume normal returns. The VaR at tail level $\alpha$ (confidence level $1-\alpha$) is:

$$\text{VaR}_{\alpha} = -(\mu - z_{\alpha} \sigma) = z_{\alpha}\sigma - \mu$$

This formula translates the abstract concept of a tail quantile into a concrete loss estimate. Under the assumption that returns are normally distributed with mean $\mu$ and standard deviation $\sigma$, we seek the threshold such that returns fall below this level only $\alpha$ percent of the time. The standard normal quantile $z_\alpha$ tells us how many standard deviations below the mean this threshold lies.

Each component of this formula plays a specific role:

  • $\text{VaR}_\alpha$: the Value at Risk at tail level $\alpha$, the loss threshold exceeded with probability $\alpha$
  • $\mu$: the expected return over the time horizon
  • $\sigma$: the volatility (standard deviation of returns) over the time horizon
  • $z_\alpha$: the magnitude of the $\alpha$-quantile of the standard normal distribution (e.g., $z_{0.05} \approx 1.645$ for 95% confidence)

The formula works because under normality, the $\alpha$-quantile of returns is exactly $\mu - z_\alpha \sigma$, which is typically a negative number; taking its negative expresses VaR as a positive loss amount. For example, the 95% VaR uses $z_{0.05} \approx 1.645$, meaning we expect losses to exceed this threshold only 5% of the time.

Out[4]:
Visualization
Normal distribution with VaR threshold and shaded tail region showing Expected Shortfall.
Value at Risk (VaR) and Expected Shortfall (ES) visualized on a return distribution. VaR marks the threshold below which losses occur with probability alpha, while ES is the average of losses beyond this threshold.

Option pricing via the Black-Scholes model assumes that log-returns are normally distributed, as we will explore in a later chapter.

In[5]:
Code
import numpy as np
from scipy import stats


def calculate_var_normal(mu, sigma, confidence_level=0.95, horizon=1):
    """
    Calculate Value at Risk assuming normal returns.

    Parameters:
        mu: Expected return (daily)
        sigma: Volatility (daily standard deviation)
        confidence_level: Confidence level (e.g., 0.95 for 95% VaR)
        horizon: Time horizon in days

    Returns:
        VaR as a positive number (potential loss)
    """
    # Scale parameters to the horizon
    mu_scaled = mu * horizon
    sigma_scaled = sigma * np.sqrt(horizon)

    # Find the quantile
    z = stats.norm.ppf(1 - confidence_level)

    # VaR is the negative of the quantile (expressed as positive loss)
    var = -(mu_scaled + z * sigma_scaled)
    return var


# Example: Daily return of 0.05% with 1.5% volatility
daily_return = 0.0005
daily_vol = 0.015

var_95_1day = calculate_var_normal(daily_return, daily_vol, 0.95, 1)
var_99_1day = calculate_var_normal(daily_return, daily_vol, 0.99, 1)
var_95_10day = calculate_var_normal(daily_return, daily_vol, 0.95, 10)
Out[6]:
Console
Portfolio with daily return 0.05% and volatility 1.50%
1-day 95% VaR: 2.42%
1-day 99% VaR: 3.44%
10-day 95% VaR: 7.30%

These VaR figures tell us the maximum expected loss at each confidence level. The 1-day 95% VaR of approximately 2.4% means that under normal market conditions, we would expect to lose more than 2.4% only about one trading day in twenty. The 10-day VaR is higher due to the square-root-of-time scaling that arises from the assumption that daily returns are independent and identically distributed. This square-root relationship is a direct consequence of the variance addition property: if daily variances are $\sigma^2$, then the variance over $n$ days is $n\sigma^2$, so the standard deviation scales as $\sqrt{n}\,\sigma$.
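
The variance-addition argument is easy to check by simulation. The sketch below (with illustrative parameters, independent of the VaR example above) sums independent daily returns and compares the resulting volatility to the square-root-of-time prediction:

```python
import numpy as np

# Aggregate i.i.d. daily returns over 10 days and compare volatilities
rng = np.random.default_rng(0)
daily_sigma = 0.015
n_days = 10

daily_returns = rng.normal(0.0, daily_sigma, size=(100_000, n_days))
ten_day_returns = daily_returns.sum(axis=1)

print(f"Empirical 10-day volatility:  {ten_day_returns.std():.4f}")
print(f"sqrt(10) x daily volatility:  {np.sqrt(n_days) * daily_sigma:.4f}")
```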

The Lognormal Distribution

While returns may be approximately normal, prices cannot be—a stock price can rise 50% but cannot fall more than 100%. This asymmetry is captured by the lognormal distribution, which constrains values to be positive. The lognormal distribution emerges naturally when we consider the multiplicative nature of returns: a stock that gains 10% and then loses 10% does not return to its starting price. This multiplicative structure, when iterated over many periods, gives rise to the lognormal distribution through the magic of the exponential function.

Lognormal Distribution

A probability distribution of a random variable whose logarithm is normally distributed. If $\ln(X) \sim N(\mu, \sigma^2)$, then $X$ follows a lognormal distribution. Asset prices are often modeled as lognormal.

From Returns to Prices

The connection between normal returns and lognormal prices is one of the most important relationships in quantitative finance. If log-returns are normally distributed, then prices are lognormally distributed. Consider a stock with initial price $S_0$ and continuously compounded return $r$ over some period:

$$S_T = S_0 e^r$$

This equation captures a key relationship in how prices evolve. The exponential function $e^r$ converts the additive world of log-returns into the multiplicative world of price changes. When $r = 0.10$, the stock has grown by a factor of $e^{0.10} \approx 1.105$, or about 10.5%. The slight difference between the log-return and the percentage return becomes more pronounced for larger moves.

The components of this price equation are:

  • $S_T$: stock price at time $T$
  • $S_0$: initial stock price
  • $r$: continuously compounded return over the period
  • $e^r$: the growth factor, converting log-return to price ratio

If $r \sim N(\mu, \sigma^2)$, then $S_T$ follows a lognormal distribution. This relationship is fundamental: the exponential function transforms the symmetric, unbounded normal distribution into a right-skewed distribution bounded below by zero, which matches the behavior of asset prices that can rise without limit but cannot fall below zero. The transformation preserves the elegant properties of the normal distribution while mapping them onto the economically sensible domain of positive prices.

The probability density function of the lognormal distribution takes a form that reveals its connection to the normal:

$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0$$

Compare this to the normal density to see the key differences. The argument of the exponential now involves $\ln x$ rather than $x$, reflecting that we are measuring distance in log-space. The presence of $1/x$ in front of the exponential arises from the chain rule when transforming from the normal to the lognormal through the change of variables.

Each element of this formula serves a distinct purpose:

  • $x$: the value at which we evaluate the density (must be positive)
  • $\mu$: the mean of $\ln(X)$, not the mean of $X$ itself
  • $\sigma$: the standard deviation of $\ln(X)$, not of $X$ itself
  • $1/x$: this factor (compared to the normal PDF) accounts for the change of variables from $\ln(x)$ to $x$

The key insight is that $\mu$ and $\sigma$ are parameters of the underlying normal distribution of log-values, not the lognormal variable itself. This distinction often causes confusion: when we say a stock has lognormal volatility of 20%, we mean the standard deviation of its log-returns, not of the returns themselves. The actual mean and variance of a lognormal variable are:

$$E[X] = e^{\mu + \sigma^2/2}$$

$$\text{Var}(X) = (e^{\sigma^2} - 1)e^{2\mu + \sigma^2}$$

These formulas reveal how the parameters of the underlying normal distribution translate into the moments of the lognormal:

  • $E[X]$: the expected value (mean) of the lognormal variable $X$
  • $\text{Var}(X)$: the variance of $X$
  • $\mu$: the mean of $\ln(X)$, the underlying normal distribution
  • $\sigma^2$: the variance of $\ln(X)$
  • $e^{\mu + \sigma^2/2}$: the mean, which is always greater than $e^\mu$ due to the convexity of the exponential function

The $\sigma^2/2$ term in the mean formula is a convexity adjustment: because the exponential function is convex, Jensen's inequality tells us that $E[e^Y] > e^{E[Y]}$ for any non-constant random variable $Y$. Specifically, the curvature of the exponential means that high values of $Y$ contribute disproportionately to the mean of $e^Y$. This is why the expected value of a lognormal variable is not simply $e^\mu$ but includes the variance correction, since higher variance increases this convexity effect. This seemingly technical point has profound practical implications: it means that an asset with higher volatility will have a higher expected price level even if the expected log-return remains unchanged.
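
A small simulation makes the convexity adjustment concrete. This sketch (with arbitrary illustrative parameters) compares the simulated mean and variance of a lognormal variable against the formulas above, and against the naive guess $e^\mu$:

```python
import numpy as np

# Simulate X = exp(Y) with Y ~ N(mu, sigma^2) and check the moment formulas
rng = np.random.default_rng(42)
mu, sigma = 0.05, 0.20

y = rng.normal(mu, sigma, size=1_000_000)
x = np.exp(y)

mean_theory = np.exp(mu + sigma**2 / 2)
var_theory = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)

print(f"Mean:     simulated {x.mean():.4f}  vs formula {mean_theory:.4f}")
print(f"Variance: simulated {x.var():.5f}  vs formula {var_theory:.5f}")
print(f"Naive exp(mu) = {np.exp(mu):.4f} understates the true mean")
```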

Geometric Brownian Motion

The standard model for stock price dynamics is geometric Brownian motion (GBM), which produces lognormally distributed prices. This model has been central to quantitative finance since Samuelson's work in the 1960s and forms the foundation for the Black-Scholes option pricing formula. Understanding GBM requires familiarity with stochastic calculus, but the core idea is intuitive. Prices evolve continuously with both a deterministic trend and random fluctuations proportional to the current price.

Under GBM, the stock price evolves according to:

$$dS = \mu S \, dt + \sigma S \, dW$$

This stochastic differential equation describes infinitesimal changes in the stock price. Each term has a clear interpretation:

  • $dS$: infinitesimal change in stock price
  • $S$: current stock price
  • $\mu$: drift rate (expected return per unit time)
  • $\sigma$: volatility (standard deviation of returns per unit time)
  • $dt$: infinitesimal time increment
  • $dW$: increment of a Wiener process (Brownian motion), with $dW \sim N(0, dt)$

The first term $\mu S \, dt$ represents the deterministic trend, while the second term $\sigma S \, dW$ captures random fluctuations proportional to the current price. The key feature is that both terms are proportional to $S$: a \$100 stock and a \$10 stock with the same parameters will have the same percentage return distribution, but the absolute dollar changes will be ten times larger for the more expensive stock. This proportionality is what makes GBM appropriate for modeling asset prices, which tend to grow (or shrink) in percentage terms.

Solving this stochastic differential equation yields:

$$S_T = S_0 \exp\left[\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma W_T\right]$$

This solution tells us exactly where the stock price will be at any future time, given the realized path of the Brownian motion. The formula has a clear structure:

  • $S_T$: stock price at time $T$
  • $S_0$: initial stock price at time 0
  • $\mu$: drift rate (annualized expected return)
  • $\sigma$: volatility (annualized standard deviation)
  • $T$: time horizon in years
  • $W_T$: value of the Wiener process at time $T$, with $W_T \sim N(0, T)$
  • $\sigma^2/2$: Itô correction term arising from the quadratic variation of Brownian motion

The Itô correction $\sigma^2/2$ appears because we apply Itô's lemma to $\ln(S)$: the second-derivative term $\frac{1}{2}\frac{\partial^2 \ln S}{\partial S^2}(\sigma S)^2 = -\frac{\sigma^2}{2}$ contributes to the drift. This correction is one of the key differences between ordinary calculus and stochastic calculus. In ordinary calculus, if we know $dS/S$, we would conclude that $d(\ln S) = dS/S$. However, when $S$ follows a diffusion process, the additional $-\sigma^2/2$ term arises from the non-zero quadratic variation of Brownian motion. This mathematical detail has important financial implications: it means the expected log-return is less than the expected arithmetic return, a difference that matters for long-horizon investing.
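
The effect of the Itô correction can be checked directly from the exact solution. The sketch below (parameters are illustrative) simulates the terminal price and confirms that the average log-return is $(\mu - \sigma^2/2)T$ while the average gross return is $e^{\mu T}$:

```python
import numpy as np

# Terminal prices from the exact GBM solution
rng = np.random.default_rng(1)
S0, mu, sigma, T = 100.0, 0.10, 0.25, 1.0

W_T = rng.normal(0.0, np.sqrt(T), size=1_000_000)
S_T = S0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * W_T)

print(f"Mean log-return:   {np.mean(np.log(S_T / S0)):.4f}"
      f"   (mu - sigma^2/2 = {(mu - 0.5 * sigma**2) * T:.4f})")
print(f"Mean gross return: {np.mean(S_T / S0):.4f}"
      f"   (exp(mu * T)    = {np.exp(mu * T):.4f})")
```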

This model forms the foundation of the Black-Scholes option pricing formula.

Out[7]:
Visualization
Two probability distributions showing normal versus lognormal with right skew.
Comparison of normal and lognormal distributions. The lognormal distribution is bounded below by zero and exhibits positive skewness, matching the behavior of asset prices.

Simulating Price Paths

Monte Carlo simulation of stock prices under GBM is simple. We can either simulate the continuous process or use the exact solution. The approach below uses the exact solution, which is more efficient and avoids discretization errors that accumulate in step-by-step simulation methods. By using the closed-form solution for $S_T$ directly, we ensure that our simulated paths have the correct distribution regardless of the time step size.

In[8]:
Code
import numpy as np


def simulate_gbm_paths(S0, mu, sigma, T, n_steps, n_paths):
    """
    Simulate geometric Brownian motion paths.

    Parameters:
        S0: Initial stock price
        mu: Annualized drift (expected return)
        sigma: Annualized volatility
        T: Time horizon in years
        n_steps: Number of time steps
        n_paths: Number of simulation paths

    Returns:
        Array of shape (n_paths, n_steps + 1) containing price paths
    """
    dt = T / n_steps

    # Generate random increments
    np.random.seed(42)
    Z = np.random.standard_normal((n_paths, n_steps))

    # Calculate log returns for each step
    log_returns = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z

    # Cumulative sum to get log prices relative to initial
    log_prices = np.zeros((n_paths, n_steps + 1))
    log_prices[:, 0] = np.log(S0)
    log_prices[:, 1:] = np.log(S0) + np.cumsum(log_returns, axis=1)

    # Convert to prices
    prices = np.exp(log_prices)
    return prices


# Simulate 1 year of daily prices
S0 = 100
mu = 0.10  # 10% expected annual return
sigma = 0.25  # 25% annual volatility
T = 1.0
n_steps = 252  # Trading days
n_paths = 1000

paths = simulate_gbm_paths(S0, mu, sigma, T, n_steps, n_paths)
final_prices = paths[:, -1]
Out[9]:
Console
Initial price: $100.00
Expected final price (theoretical): $110.52
Mean simulated final price: $110.44
Median simulated final price: $108.19
5th percentile: $71.15
95th percentile: $160.84

Notice that the mean simulated price is close to the theoretical expected value of $S_0 e^{\mu T}$, but the median is lower. This is typical of lognormal distributions. Positive skewness means that a few very high outcomes pull the mean above the median. In practical terms, this means that while the average outcome is favorable, the most likely outcome (the mode) and the middle outcome (the median) are both lower than the average. An investor thinking about typical outcomes should focus on the median, while an investor thinking about long-run average performance should focus on the mean.
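
Both quantities can also be computed in closed form. A short sketch using the same parameters as the simulation above: the mean of the terminal price is $S_0 e^{\mu T}$, while the median is $S_0 e^{(\mu - \sigma^2/2)T}$.

```python
import numpy as np

# Theoretical mean and median of the terminal price under GBM
S0, mu, sigma, T = 100, 0.10, 0.25, 1.0

mean_theory = S0 * np.exp(mu * T)                        # pulled up by the right tail
median_theory = S0 * np.exp((mu - 0.5 * sigma**2) * T)   # the "typical" outcome

print(f"Theoretical mean:   ${mean_theory:.2f}")
print(f"Theoretical median: ${median_theory:.2f}")
```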

Out[10]:
Visualization
Spaghetti plot showing many simulated price paths diverging from starting point.
Monte Carlo simulation of 1000 stock price paths under geometric Brownian motion. The fan-like shape illustrates how uncertainty increases with time, while the positive drift pushes prices upward on average.

Discrete Distributions: Binomial and Poisson

Not all financial phenomena are continuous. Events like defaults, dividend announcements, or large jumps in prices are discrete occurrences. The binomial and Poisson distributions model these situations, providing a framework for counting how many times something happens rather than measuring continuous quantities. These distributions are particularly important in credit risk, where we count defaults in a portfolio, and in derivatives pricing, where discrete lattice models approximate continuous price processes.

The Binomial Distribution

The binomial distribution models the number of successes in a fixed number of independent trials, each with the same probability of success. The classic example is flipping a coin multiple times and counting heads, but the same mathematics applies to questions such as: How many stocks in a portfolio will outperform? How many corporate bonds will default? How many days this month will have positive returns?

Binomial Distribution

The probability distribution of the number of successes $k$ in $n$ independent Bernoulli trials, where each trial has success probability $p$. Used in discrete option pricing models and credit risk modeling.

If $X \sim \text{Binomial}(n, p)$, the probability mass function is:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n$$

This formula answers the question: if we have $n$ independent trials, each with success probability $p$, what is the probability of getting exactly $k$ successes? The answer decomposes into two multiplicative components: the number of ways the successes can be arranged, and the probability of any particular arrangement.

Each component of this formula serves a specific purpose:

  • $k$: the number of successes we're computing the probability for
  • $n$: total number of independent trials
  • $p$: probability of success on each trial
  • $\binom{n}{k} = \frac{n!}{k!(n-k)!}$: the binomial coefficient, the number of ways to choose which $k$ of the $n$ trials are successes
  • $p^k$: the probability that a particular set of $k$ trials all succeed
  • $(1-p)^{n-k}$: the probability that the remaining $n-k$ trials all fail

The formula multiplies the number of arrangements by the probability of each specific arrangement. Intuitively, if you flip a coin $n$ times with probability $p$ of heads, the PMF tells you the probability of getting exactly $k$ heads: you need $k$ successes (probability $p^k$), $n-k$ failures (probability $(1-p)^{n-k}$), and the binomial coefficient counts how many orderings produce that outcome. The key insight is that because the trials are independent, the probability of any specific sequence of outcomes is simply the product of the individual probabilities, and the binomial coefficient tells us how many such sequences lead to exactly $k$ successes.

The mean and variance have elegant formulas that reveal the distribution's structure:

$$E[X] = np, \quad \text{Var}(X) = np(1-p)$$

These formulas encode intuitive relationships:

  • $E[X]$: the expected number of successes
  • $\text{Var}(X)$: the variance in the number of successes
  • $n$: total number of trials
  • $p$: probability of success on each trial
  • $(1-p)$: probability of failure, which appears in the variance because maximum variance occurs when $p = 0.5$ (maximum uncertainty)

The expected number of successes, $np$, is simply the number of trials times the success probability, exactly what intuition suggests. The variance formula $np(1-p)$ is more subtle: it shows that variance is maximized when $p = 0.5$ (complete uncertainty about each trial's outcome) and decreases toward zero as $p$ approaches 0 or 1 (near certainty). This makes sense because if $p$ is very close to 1, almost all trials succeed, leaving little room for variability in the total count.
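
As a concrete financial example, the sketch below uses scipy's binomial distribution for a hypothetical portfolio of 50 bonds, each assumed to default independently with 2% probability over one year (the numbers are made up for illustration, and the independence assumption is a strong one in practice):

```python
from scipy import stats

# Hypothetical bond portfolio: 50 issuers, each with a 2% annual default probability
n, p = 50, 0.02
defaults = stats.binom(n, p)

print(f"Expected defaults: {defaults.mean():.2f}, std dev: {defaults.std():.2f}")
print(f"P(no defaults)   = {defaults.pmf(0):.4f}")
print(f"P(exactly 2)     = {defaults.pmf(2):.4f}")
print(f"P(3 or more)     = {defaults.sf(2):.4f}")
```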

Out[11]:
Visualization
Bar charts showing binomial distributions for different parameters p and n.
Binomial probability mass functions for different parameters. Left panel shows how varying the success probability p affects the distribution shape for fixed n=20. Right panel shows how increasing n makes the distribution more concentrated around its mean.

Binomial Option Pricing Model

The Cox-Ross-Rubinstein (CRR) binomial model prices options by assuming that at each time step, the stock price either moves up by a factor $u$ or down by a factor $d$. This discretization transforms the continuous option pricing problem into a tree of possible outcomes that can be analyzed one step at a time. While this may seem like a crude approximation to the continuous world of actual markets, the model provides deep insights into option pricing and converges to the continuous Black-Scholes formula as the number of time steps increases.

After $n$ steps, the stock price has experienced $k$ up moves and $n-k$ down moves, giving:

$$S_T = S_0 u^k d^{n-k}$$

This formula shows how the final price depends entirely on the number of up moves, not their specific timing. The components are:

  • $S_T$: stock price at maturity after $n$ steps
  • $S_0$: initial stock price
  • $u$: up factor (price multiplier for an up move)
  • $d$: down factor (price multiplier for a down move)
  • $k$: number of up moves
  • $n-k$: number of down moves

Whether the stock rises first and then falls, or falls first and then rises, the final price is the same as long as the total number of up and down moves matches. This path-independence is crucial for European options, which depend only on the final price, though it breaks down for path-dependent derivatives like Asian options.

The number of up moves follows a binomial distribution under the risk-neutral measure, with probability $q$ of an up move:

$$q = \frac{e^{r\Delta t} - d}{u - d}$$

This risk-neutral probability is one of the most important concepts in derivatives pricing. Notice that $q$ is not the actual probability that the stock will go up. It is the artificial probability that makes the expected return on the stock equal to the risk-free rate. The components are:

  • $q$: risk-neutral probability of an up move
  • $r$: risk-free interest rate (annualized)
  • $\Delta t$: length of each time step in years
  • $u$: up factor (stock price multiplier for an up move)
  • $d$: down factor (stock price multiplier for a down move)
  • $e^{r\Delta t}$: the growth factor for a risk-free investment over one time step

This formula is derived from the no-arbitrage condition: the expected stock return under the risk-neutral measure must equal the risk-free rate. Setting $q \cdot u + (1-q) \cdot d = e^{r\Delta t}$ and solving for $q$ yields this result. The key insight is that option prices depend only on these risk-neutral probabilities, not on the actual probabilities of up and down moves. This separation of pricing from probability estimation is what makes derivatives pricing possible without knowing investors' risk preferences.

Out[12]:
Visualization
Recombining binomial tree diagram showing stock prices at each node.
A 4-step binomial tree showing possible stock price paths. The tree recombines because an up move followed by a down move gives the same price as a down move followed by an up move. Node labels show the stock price at each node.
In[13]:
Code
import numpy as np
from scipy.special import comb


def binomial_option_price(S0, K, T, r, sigma, n_steps, option_type="call"):
    """
    Price a European option using the CRR binomial model.

    Parameters:
        S0: Initial stock price
        K: Strike price
        T: Time to expiration (years)
        r: Risk-free rate (annualized)
        sigma: Volatility (annualized)
        n_steps: Number of time steps
        option_type: 'call' or 'put'

    Returns:
        Option price
    """
    dt = T / n_steps

    # CRR parameters
    u = np.exp(sigma * np.sqrt(dt))
    d = 1 / u
    q = (np.exp(r * dt) - d) / (u - d)

    # Calculate all possible final stock prices
    k = np.arange(n_steps + 1)
    final_prices = S0 * (u**k) * (d ** (n_steps - k))

    # Calculate payoffs
    if option_type == "call":
        payoffs = np.maximum(final_prices - K, 0)
    else:
        payoffs = np.maximum(K - final_prices, 0)

    # Risk-neutral probabilities
    probs = comb(n_steps, k) * (q**k) * ((1 - q) ** (n_steps - k))

    # Expected payoff under risk-neutral measure, discounted
    option_price = np.exp(-r * T) * np.sum(probs * payoffs)

    return option_price


# Price a call option
S0 = 100
K = 100
T = 0.5  # 6 months
r = 0.05
sigma = 0.20
Out[14]:
Console
Binomial Model Call Option Prices
-----------------------------------
Steps:   10  |  Price: $6.7494
Steps:   50  |  Price: $6.8605
Steps:  100  |  Price: $6.8746
Steps:  200  |  Price: $6.8817
Steps:  500  |  Price: $6.8859
-----------------------------------
Black-Scholes:  |  Price: $6.8887

As the number of steps increases, the binomial model converges to the Black-Scholes price. This convergence illustrates how the discrete binomial model approximates the continuous-time lognormal model. The rate of convergence is roughly $O(1/n)$, meaning that doubling the number of steps roughly halves the error. This convergence is not merely a numerical curiosity; it reflects a deep mathematical connection between the binomial random walk and Brownian motion, formalized in Donsker's theorem.

Out[15]:
Visualization
Line plot showing binomial price converging to Black-Scholes price with increasing steps.
Convergence of the binomial option pricing model to the Black-Scholes price as the number of time steps increases. The oscillation around the true price decreases as n grows, demonstrating the model's consistency.

The Poisson Distribution

The Poisson distribution models the number of events occurring in a fixed interval when events happen at a constant average rate and independently of each other. Unlike the binomial distribution, which assumes a fixed number of trials, the Poisson distribution allows for any number of events, making it useful for modeling rare occurrences over a continuous interval.

Poisson Distribution

A discrete probability distribution expressing the probability of a given number of events occurring in a fixed interval, when events occur at a constant average rate $\lambda$ independently of each other. Used for modeling defaults, jumps, and arrival processes.

If $X \sim \text{Poisson}(\lambda)$, the probability mass function is:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$$

This formula has a clear structure when we trace its derivation from first principles. Imagine dividing the interval into many tiny subintervals, each with a small probability of containing an event. As the number of subintervals goes to infinity while keeping the expected total number of events fixed at $\lambda$, we obtain the Poisson distribution.

The components of this formula are:

  • $k$: the number of events we're computing the probability for
  • $\lambda$: the average rate of events (expected number of events in the interval)
  • $e^{-\lambda}$: the probability of zero events, which serves as a normalization factor
  • $\lambda^k$: captures how the probability grows with more opportunities for events
  • $k!$: accounts for the indistinguishability of event orderings

The Poisson distribution arises as the limit of a binomial distribution when $n \to \infty$ and $p \to 0$ while $np = \lambda$ remains constant, which corresponds to many trials with small individual probabilities. This limiting relationship explains why the Poisson distribution is appropriate for rare events: if we have many opportunities for something unlikely to happen, and the total expected count is finite, the number of occurrences follows a Poisson distribution. This is precisely the situation in credit risk (many bonds, each with a small default probability) and in jump modeling (many small time intervals, each with a small probability of containing a jump).
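
The limit is easy to see numerically. The sketch below compares a binomial distribution with many trials and a small success probability to the Poisson distribution with the same expected count (parameters chosen purely for illustration):

```python
from scipy import stats

# Binomial(n, p) with large n and small p versus its Poisson(lambda = n*p) limit
n, p = 1000, 0.002
lam = n * p  # expected count = 2

print(f"{'k':>3} {'Binomial':>10} {'Poisson':>10}")
for k in range(6):
    print(f"{k:>3} {stats.binom.pmf(k, n, p):>10.4f} {stats.poisson.pmf(k, lam):>10.4f}")
```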

Both the mean and variance equal $\lambda$:

$$E[X] = \text{Var}(X) = \lambda$$

This property deserves emphasis:

  • $E[X]$: the expected number of events
  • $\text{Var}(X)$: the variance in the number of events
  • $\lambda$: the average rate parameter

The equality of mean and variance is a distinctive property of the Poisson distribution. When analyzing count data, if the observed variance significantly exceeds the mean (overdispersion), this suggests the Poisson model may be inadequate. Overdispersion often indicates that events are not independent: defaults may cluster in time due to common economic factors, or price jumps may trigger subsequent jumps through market feedback effects.

Out[16]:
Visualization
Bar charts showing Poisson distributions for different lambda values.
Poisson probability mass functions for different rate parameters. As lambda increases, the distribution becomes more symmetric and approaches a normal distribution, illustrating the central limit effect for sums of independent rare events.

Applications of the Poisson Distribution

The Poisson distribution is particularly useful for modeling:

Jump processes in asset prices: The Merton jump-diffusion model adds Poisson-distributed jumps to geometric Brownian motion, capturing sudden large price movements that GBM alone cannot produce. In this model, the stock price follows GBM between jumps, but occasionally experiences discrete jumps whose arrival is governed by a Poisson process. This combination captures both the continuous fluctuations of normal trading and the occasional large movements caused by major news events.

Credit defaults in a portfolio: If defaults occur independently with a small probability, the number of defaults in a large portfolio is approximately Poisson distributed. This approximation underlies many credit risk models, although the assumption of independence often breaks down during systemic crises, when many borrowers default simultaneously.

Order arrivals: In market microstructure, the arrival of orders is often modeled as a Poisson process. This assumption implies that the time between consecutive orders follows an exponential distribution, without clustering or periodicity in order flow. While real markets show more complex arrival patterns, the Poisson model provides a useful baseline.

In[17]:
Code
import numpy as np


def simulate_jump_diffusion(
    S0, mu, sigma, lambda_jump, mu_jump, sigma_jump, T, n_steps, n_paths
):
    """
    Simulate Merton jump-diffusion price paths.

    Parameters:
        S0: Initial stock price
        mu: Drift of the diffusion component
        sigma: Volatility of the diffusion component
        lambda_jump: Average number of jumps per year (Poisson intensity)
        mu_jump: Mean of log-jump size
        sigma_jump: Std dev of log-jump size
        T: Time horizon in years
        n_steps: Number of time steps
        n_paths: Number of paths to simulate

    Returns:
        Array of simulated price paths
    """
    dt = T / n_steps

    np.random.seed(42)

    # Diffusion component
    Z_diffusion = np.random.standard_normal((n_paths, n_steps))
    diffusion = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z_diffusion

    # Jump component
    # Number of jumps in each interval (Poisson distributed)
    n_jumps = np.random.poisson(lambda_jump * dt, (n_paths, n_steps))

    # Jump sizes (log-normal)
    jump_sizes = np.zeros((n_paths, n_steps))
    for i in range(n_paths):
        for j in range(n_steps):
            if n_jumps[i, j] > 0:
                jumps = np.random.normal(mu_jump, sigma_jump, n_jumps[i, j])
                jump_sizes[i, j] = np.sum(jumps)

    # Combine diffusion and jumps
    log_returns = diffusion + jump_sizes

    # Calculate prices
    log_prices = np.zeros((n_paths, n_steps + 1))
    log_prices[:, 0] = np.log(S0)
    log_prices[:, 1:] = np.log(S0) + np.cumsum(log_returns, axis=1)

    return np.exp(log_prices)


# Compare GBM vs Jump-Diffusion
S0 = 100
mu = 0.10
sigma = 0.20
T = 1.0
n_steps = 252
n_paths = 10000

# Pure GBM paths
gbm_paths = simulate_gbm_paths(S0, mu, sigma, T, n_steps, n_paths)

# Jump-diffusion paths
jd_paths = simulate_jump_diffusion(
    S0,
    mu,
    sigma,
    lambda_jump=2,  # 2 jumps per year on average
    mu_jump=-0.05,  # Jumps tend to be negative
    sigma_jump=0.10,
    T=T,
    n_steps=n_steps,
    n_paths=n_paths,
)

gbm_returns = np.diff(np.log(gbm_paths), axis=1).flatten()
jd_returns = np.diff(np.log(jd_paths), axis=1).flatten()
Out[18]:
Console
Return Distribution Comparison
==================================================
Statistic                     GBM  Jump-Diffusion
--------------------------------------------------
Mean                     0.000312       -0.000083
Std Dev                  0.012609        0.016096
Skewness                  -0.0006         -3.1474
Kurtosis                  -0.0002         55.5761
Min                       -0.0605         -0.4391
Max                        0.0611          0.3572

The jump-diffusion model produces returns with higher kurtosis (fatter tails) and more negative skewness than pure GBM. The minimum return is more extreme, reflecting occasional large negative jumps. These properties better match empirical observations of asset returns. The higher kurtosis indicates that the jump-diffusion model places more probability mass both in the center (small moves) and in the tails (large moves), with less probability in the intermediate range, exactly the pattern observed in real financial data.

Out[19]:
Visualization
Two histograms comparing GBM and jump-diffusion return distributions showing fatter tails.
Return distributions from geometric Brownian motion versus Merton jump-diffusion. The jump-diffusion model produces heavier tails and more extreme values, better matching observed market behavior.

Fat-Tailed Distributions

Empirical studies consistently show that asset returns have fatter tails than the normal distribution predicts. Events that should occur once in centuries under normality happen multiple times per decade. This observation has major implications for risk management. The discrepancy between theoretical and observed extreme event frequencies is not a minor technical issue. It represents a fundamental mismatch between the mathematical assumptions underlying most financial models and the actual behavior of markets.

Evidence of Fat Tails

The term "fat tails" refers to distributions where extreme events occur more frequently than a normal distribution would suggest. A distribution with fat tails places more probability mass in the regions far from the mean, meaning that large positive or negative values are more common than the normal distribution's rapid exponential decay would predict. Several ways to measure this phenomenon:

Kurtosis: The normal distribution has a kurtosis of 3 (or excess kurtosis of 0). Distributions with higher kurtosis have fatter tails. Equity returns typically exhibit excess kurtosis of 3 to 10, so extreme returns are far more common than the normal distribution predicts. The kurtosis measures the fourth moment of the distribution relative to its variance squared, essentially capturing how much probability mass resides in the tails versus the shoulders of the distribution.

Tail exponents: For very fat-tailed distributions, the probability of extreme events decays as a power law rather than exponentially. The tail exponent $\alpha$ characterizes how fast this decay occurs. A smaller $\alpha$ means slower decay and fatter tails, while a larger $\alpha$ means faster decay approaching the thin tails of the normal distribution.

Out[21]:
Console
S&P 500 (SPY) Daily Returns: 2000-2024
==================================================
Number of observations: 5,999
Mean daily return: 0.0175%
Daily volatility: 1.3961%
Skewness: 0.250
Excess Kurtosis: 6.072

Extreme Event Frequency
--------------------------------------------------
Event             Observed  Expected (Normal)      Ratio
--------------------------------------------------
|r| > 3σ                77               16.2        4.8x
|r| > 4σ                29                0.4       76.3x
|r| > 5σ                10                0.0     2907.6x

The data reveals a stark departure from normality. Events beyond 4 standard deviations occur more than 70 times as frequently as a normal distribution predicts. Under the normal assumption, a 5-sigma event should occur once every 7,000 years of trading, yet we observe multiple such events in just two decades. This massive discrepancy is not a statistical fluke. It reflects a fundamental difference between the mathematical properties of the normal distribution and the actual mechanics of how prices move in financial markets.

The Student's t Distribution

The Student's t distribution provides a simple way to model fat tails while retaining many convenient properties of the normal distribution. William Sealy Gosset originally developed the t distribution (writing under the pseudonym "Student") for quality control at Guinness brewery. It is now widely used in finance for modeling the heavy tails observed in return data.

Student's t Distribution

A probability distribution with heavier tails than the normal distribution, characterized by its degrees of freedom parameter $\nu$. As $\nu \to \infty$, it converges to the normal distribution. For small $\nu$, it assigns higher probability to extreme events.

The probability density function of the standard t distribution with $\nu$ degrees of freedom is:

$$f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$

This formula may look complex, but its structure shows the key difference between the t distribution and the normal distribution. While the normal distribution uses the exponential function $e^{-x^2/2}$, which decays very rapidly, the t distribution uses a power function $(1 + x^2/\nu)^{-(\nu+1)/2}$, which decays much more slowly. This slower decay is precisely what produces the heavier tails.

Each component serves a specific purpose:

  • $x$: the value at which we evaluate the density
  • $\nu$: degrees of freedom parameter (controls tail thickness; lower $\nu$ means fatter tails)
  • $\Gamma(\cdot)$: the gamma function, generalizing the factorial to non-integer values ($\Gamma(n) = (n-1)!$ for positive integers)
  • $\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}$: a normalization constant ensuring the density integrates to 1
  • $(1 + x^2/\nu)^{-(\nu+1)/2}$: the core term that produces polynomial rather than exponential tail decay

The polynomial decay of $(1 + x^2/\nu)^{-(\nu+1)/2}$ versus the normal's exponential decay $e^{-x^2/2}$ is why the t distribution has heavier tails. To understand this intuitively, consider what happens as $x$ becomes large. For the normal distribution, doubling $x$ quadruples the exponent $x^2/2$, so the density collapses toward zero extraordinarily fast. For the t distribution, the density behaves like $x^{-(\nu+1)}$ for large $x$, so doubling $x$ only reduces the density by a factor of roughly $2^{\nu+1}$, a far slower decay. As $\nu \to \infty$, the polynomial term converges to the exponential, and the t distribution approaches the normal.

Key properties of the t distribution:

  • Degrees of freedom ($\nu$): Controls tail thickness. Lower values mean fatter tails.
  • Convergence: As $\nu \to \infty$, the t distribution approaches the standard normal.
  • Variance: For $\nu > 2$, the variance is $\nu/(\nu-2)$, which is always greater than 1. For $\nu \leq 2$, the variance is infinite. This is an extreme case where the tails are so heavy that second moments do not exist.
  • Kurtosis: For $\nu > 4$, excess kurtosis is $6/(\nu-4)$, always positive and decreasing in $\nu$. For $\nu \leq 4$, the kurtosis is undefined because the fourth moment does not exist.

For financial applications, a t distribution with 4 to 6 degrees of freedom often fits return data well, capturing the observed excess kurtosis while staying manageable. The fact that empirically fitted degrees of freedom are typically in this range has important implications. With $\nu$ around 4, we are near the boundary where the fourth moment (and hence kurtosis) becomes undefined, indicating that we are dealing with genuinely heavy-tailed phenomena that push the limits of standard statistical tools.

Out[22]:
Visualization
Line plot comparing normal and t-distributions showing heavier tails for lower degrees of freedom.
Comparison of normal and Student's t distributions with varying degrees of freedom. Lower degrees of freedom produce heavier tails, assigning higher probability to extreme events.

The log-scale plot on the right reveals how dramatically the distributions differ in their tails. At a threshold of 5, the standard normal assigns a one-sided tail probability of roughly 1 in 3.5 million, while a t distribution with 3 degrees of freedom assigns a probability on the order of 1 in a hundred, a difference of several orders of magnitude. This gap, invisible on a linear scale where both probabilities appear to be essentially zero, has enormous practical significance for risk management: moves that a normal-based model treats as essentially impossible become merely uncommon under a heavy-tailed model.
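
The tail comparison is easy to reproduce with scipy. The sketch below evaluates one-sided tail probabilities at the same thresholds for the standard normal and a (non-rescaled) t distribution with 3 degrees of freedom:

```python
from scipy import stats

# One-sided tail probabilities at the same threshold, standard normal vs. t(3)
for x in (3, 4, 5):
    p_norm = stats.norm.sf(x)
    p_t3 = stats.t.sf(x, df=3)
    print(f"x = {x}:  normal {p_norm:.2e}   t(3) {p_t3:.2e}   ratio {p_t3 / p_norm:,.0f}x")
```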

Fitting the t Distribution to Market Data

We can fit a scaled t distribution to observed returns and compare the fit to a normal distribution. This exercise demonstrates not just that the t distribution fits better, but quantifies how much better, providing a concrete measure of the inadequacy of the normal assumption.

In[23]:
Code
from scipy import stats
import numpy as np

# Fit a scaled t distribution to SPY returns
# The t distribution has location (mu), scale (sigma), and degrees of freedom (nu)
# Regenerate the synthetic return series from the fat-tails example so this cell is self-contained
np.random.seed(42)
n_days = 6000
daily_returns = np.random.standard_t(df=4, size=n_days) * 0.01 + 0.0003
returns_array = daily_returns

# Maximum likelihood estimation
t_params = stats.t.fit(returns_array)
t_nu, t_loc, t_scale = t_params

# Fit normal for comparison
norm_loc, norm_scale = stats.norm.fit(returns_array)

# Calculate log-likelihoods
t_loglik = np.sum(stats.t.logpdf(returns_array, t_nu, t_loc, t_scale))
norm_loglik = np.sum(stats.norm.logpdf(returns_array, norm_loc, norm_scale))
Out[24]:
Console
Distribution Fitting Results
==================================================

Normal Distribution:
  Location (μ): 0.000079
  Scale (σ): 0.013944
  Log-likelihood: 17,123

Student's t Distribution:
  Degrees of freedom (ν): 4.23
  Location: 0.000055
  Scale: 0.010172
  Log-likelihood: 17,526

Log-likelihood improvement: 403
The t distribution provides a substantially better fit to the data.

The fitted t distribution has around 4 to 5 degrees of freedom, far from the normal limit. The large improvement in log-likelihood confirms that the t distribution better captures the observed return distribution. A log-likelihood difference of several hundred translates to overwhelming statistical evidence against the normal distribution: the likelihood ratio in favor of the t distribution is astronomically large.

Out[25]:
Visualization
Histogram of returns with overlaid normal and t-distribution fits showing better t-distribution fit.
Fitted distributions compared to empirical S&P 500 returns. The Student's t distribution better captures the peaked center and fat tails of observed returns.

Power-Law Tails

Some financial time series have tails that are even fatter than what the t distribution can capture. In extreme cases, the tails follow a power law:

$$P(X > x) \sim x^{-\alpha} \quad \text{as } x \to \infty$$

This asymptotic relationship describes the behavior of the survival function (the probability of exceeding a threshold) for large values. The normal distribution's tails decay exponentially, and the t distribution's polynomial tails decay at a rate set by its degrees of freedom; power-law tails with a small exponent can be heavier still, with profound implications for the existence of moments and the behavior of aggregates.

The components of this relationship are:

  • $P(X > x)$: the probability that the random variable exceeds value $x$ (the survival function)
  • $x$: a large threshold value
  • $\alpha$: the tail exponent (also called the tail index), which controls how fast probabilities decay
  • $\sim$: denotes asymptotic equivalence, meaning the ratio of the two sides approaches a constant as $x \to \infty$

A larger $\alpha$ means faster decay and thinner tails; a smaller $\alpha$ means slower decay and fatter tails. The probability of extreme events decays polynomially rather than exponentially. For equity returns, empirical studies suggest $\alpha \approx 3$ for the tails, known as the "cubic law." Under this law, a return twice as large as a given threshold is approximately $2^{-3} = 1/8$ as likely. This is a much slower decay than the exponential falloff of normal distributions.

To appreciate the difference, consider that under a normal distribution, doubling the threshold from 4σ to 8σ reduces the tail probability by a factor of more than $10^{10}$, while under a cubic power law the reduction is only a factor of 8. This staggering difference explains why normal-based risk models fail so spectacularly during market crises. They treat large moves as highly improbable when in fact they are merely uncommon.

Power Law Distribution

A distribution where the probability of observing a value greater than $x$ decays as a power of $x$: $P(X > x) \propto x^{-\alpha}$. The tail exponent $\alpha$ determines how fat the tails are. Lower $\alpha$ means fatter tails.
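
One standard way to estimate the tail exponent from data is the Hill estimator, which averages log-exceedances over the largest observations. The sketch below applies it to synthetic heavy-tailed returns (generated here so the example is self-contained; the choice of the top 1% of losses as the tail cutoff is arbitrary, and in practice the estimate is sensitive to this choice):

```python
import numpy as np

# Hill estimator for the tail exponent of the loss tail
rng = np.random.default_rng(42)
returns = rng.standard_t(df=4, size=6000) * 0.01   # synthetic fat-tailed returns
losses = -returns[returns < 0]                     # magnitudes of negative returns

k = int(0.01 * len(returns))                       # use roughly the largest 1% of losses
sorted_losses = np.sort(losses)
tail = sorted_losses[-k:]                          # the k largest losses
threshold = sorted_losses[-(k + 1)]                # next-largest loss, used as threshold

alpha_hill = 1.0 / np.mean(np.log(tail / threshold))
print(f"Estimated tail exponent alpha: {alpha_hill:.2f}")
```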

Out[26]:
Visualization
Log-log plot comparing survival functions of normal, t, and power-law distributions.
Comparison of tail decay rates for different distributions. The log-log plot reveals the fundamental difference between exponential decay (normal) and power-law decay. Power-law tails appear as straight lines on this scale, while exponential tails curve downward.

The implications of power-law tails are serious for risk management:

  • Moments may not exist: If α ≤ 2, the variance is infinite. If α ≤ 1, even the mean is undefined. For the cubic law with α ≈ 3, the variance exists but the fourth moment (kurtosis) does not, which explains why sample kurtosis estimates are so unstable.
  • Slow convergence: The Central Limit Theorem converges very slowly for heavy-tailed distributions. While the theorem still applies (for α > 2), the sample size needed for the normal approximation to become accurate can be impractically large.
  • Tail risk dominates: With power-law tails, a small number of extreme events can dominate the total risk. In a portfolio context, this means that diversification benefits may be less than standard theory suggests.
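In practice, the tail exponent is not known and must be estimated from data. A common diagnostic is the Hill estimator, which infers α from the k largest absolute observations. The sketch below is a minimal implementation applied to simulated t-distributed returns; the function name and the grid of k values are our choices, and in real applications the estimate should be inspected across a range of k.

# Hill estimator for the tail exponent, using the k largest absolute returns
import numpy as np


def hill_estimator(data, k):
    """Estimate the tail exponent alpha from the k largest values of |data|."""
    x = np.sort(np.abs(data))      # order statistics, ascending
    tail = x[-k:]                  # k largest observations
    threshold = x[-k - 1]          # observation just below the tail sample
    # Hill estimate: reciprocal of the mean log exceedance ratio
    return 1.0 / np.mean(np.log(tail / threshold))


# Simulated daily returns with t(4) shocks, so the true tail exponent is 4
np.random.seed(42)
simulated = np.random.standard_t(df=4, size=6000) * 0.01

for k in (100, 250, 500):
    print(f"k = {k:3d}: alpha_hat = {hill_estimator(simulated, k):.2f}")

Because the Hill estimator is noisy and sensitive to the choice of k in samples of this size, the three estimates will not agree exactly; that instability is itself a useful reminder of how hard tail estimation is.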

Implications for Risk Management

Fat tails change how we should think about risk:

VaR underestimation: Using normal-based VaR systematically underestimates tail risk. The true 99% VaR can be substantially larger than the normal approximation suggests, and the bias grows worse at higher confidence levels such as 99.9%, precisely where accuracy matters most.

Expected shortfall: Expected Shortfall (ES), also called Conditional VaR, measures the expected loss given that losses exceed VaR. For fat-tailed distributions, ES can be substantially larger than VaR. While VaR tells you the loss threshold that is exceeded only a small fraction of the time, ES tells you the average severity of those exceedances, and for heavy-tailed distributions this average can be surprisingly large.
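For reference, the closed-form Expected Shortfall of normally distributed returns, which the code below uses as the normal benchmark, can be written as:

\mathrm{ES}_\alpha = -\mu + \sigma \, \frac{\varphi(z_\alpha)}{\alpha}, \qquad z_\alpha = \Phi^{-1}(\alpha)

where φ is the standard normal density, Φ⁻¹ is the standard normal quantile function, and α is the tail probability (0.01 for the 99% level).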

In[27]:
Code
import numpy as np
from scipy import stats


def calculate_var_es(returns, confidence_levels=[0.95, 0.99]):
    """
    Calculate VaR and Expected Shortfall from historical returns.

    Returns:
        Dictionary with VaR and ES for each confidence level
    """
    results = {}
    for cl in confidence_levels:
        # VaR is the negative of the quantile at (1 - confidence level)
        var = -np.percentile(returns, (1 - cl) * 100)

        # ES is the mean of returns worse than VaR
        losses = -returns
        es = np.mean(losses[losses >= var])

        results[cl] = {"VaR": var, "ES": es, "ES/VaR": es / var}
    return results


# Regenerate data to ensure availability
np.random.seed(42)
n_days = 6000
returns_array = np.random.standard_t(df=4, size=n_days) * 0.01 + 0.0003
norm_loc, norm_scale = stats.norm.fit(returns_array)

# Compare empirical vs normal assumption
empirical_risk = calculate_var_es(returns_array)

# Normal assumption
norm_var_95 = -stats.norm.ppf(0.05, norm_loc, norm_scale)
norm_var_99 = -stats.norm.ppf(0.01, norm_loc, norm_scale)


# For a normal return distribution, ES at tail probability alpha is
# -mu + sigma * phi(z_alpha) / alpha, where phi is the standard normal pdf
def normal_es(alpha, mu, sigma):
    z = stats.norm.ppf(alpha)
    return -mu + sigma * stats.norm.pdf(z) / alpha


norm_es_95 = normal_es(0.05, norm_loc, norm_scale)
norm_es_99 = normal_es(0.01, norm_loc, norm_scale)

# Print a comparison of the empirical and normal-based risk measures
print("Risk Measure Comparison: Empirical vs Normal Assumption")
print("=" * 65)
print(f"{'Measure':<20}{'Empirical':>15}{'Normal':>13}{'Ratio':>13}")
print("-" * 65)
rows = [
    ("95% VaR", empirical_risk[0.95]["VaR"], norm_var_95),
    ("95% ES", empirical_risk[0.95]["ES"], norm_es_95),
    ("99% VaR", empirical_risk[0.99]["VaR"], norm_var_99),
    ("99% ES", empirical_risk[0.99]["ES"], norm_es_99),
]
for name, emp, normal_val in rows:
    print(f"{name:<20}{emp:>14.4%}{normal_val:>12.4%}{emp / normal_val:>12.2f}x")
print("-" * 65)
print()
print("The empirical tail measures exceed the normal predictions,")
print("with the gap widening at the more extreme 99% confidence level.")
Out[28]:
Console
Risk Measure Comparison: Empirical vs Normal Assumption
=================================================================
Measure                 Empirical       Normal        Ratio
-----------------------------------------------------------------
95% VaR                   2.1177%      2.2856%         0.93x
95% ES                    3.1541%      2.8682%         1.10x
99% VaR                   3.6798%      3.2358%         1.14x
99% ES                    5.0384%      3.7084%         1.36x
-----------------------------------------------------------------

The empirical tail measures exceed the normal predictions,
with the gap widening at the more extreme 99% confidence level.

The table shows the characteristic fat-tail pattern. At the 95% level the normal fit actually produces a slightly higher VaR than the empirical quantile, because a t-distributed sample concentrates more mass near the center, but every tail-sensitive measure runs the other way: the empirical 99% VaR is about 14% above the normal estimate and the empirical 99% ES about 36% above. The underestimation worsens as the confidence level rises, precisely where accuracy matters most for risk management. With real market data, which adds volatility clustering and even heavier tails than this simulation, the gap at the 99% level can be considerably larger, so a bank that sizes its capital on normal assumptions may be dangerously undercapitalized.
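A natural next step is to replace the normal formula with a parametric calculation based on the fitted Student's t distribution. The sketch below does this numerically, averaging the quantile function over the tail rather than relying on the closed-form t expression; the parameter values are the fitted ones reported earlier and are treated here as given inputs.

# Parametric 99% VaR and ES from a fitted Student's t distribution
import numpy as np
from scipy import stats

# Fitted parameters reported earlier, taken as given for this sketch
t_df, t_loc, t_scale = 4.23, 0.000055, 0.010172
alpha = 0.01  # tail probability for the 99% confidence level

# VaR: the loss at the alpha-quantile of the return distribution
t_var = -stats.t.ppf(alpha, df=t_df, loc=t_loc, scale=t_scale)

# ES: average loss beyond VaR, i.e. -(1/alpha) times the integral of the
# quantile function over (0, alpha), approximated by a midpoint rule
n = 100_000
u = (np.arange(n) + 0.5) * (alpha / n)
t_es = -stats.t.ppf(u, df=t_df, loc=t_loc, scale=t_scale).mean()

print(f"t-based 99% VaR: {t_var:.4%}")
print(f"t-based 99% ES:  {t_es:.4%}")

With these parameters, the t-based 99% figures should land much closer to the empirical values in the table than the normal-based estimates do, which is the practical payoff of modeling the tails realistically.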

Out[29]:
Visualization
Grouped bar chart comparing empirical and normal VaR/ES at 95% and 99% confidence levels.
Comparison of VaR and Expected Shortfall under normal versus empirical distributions. The bars show how normal assumptions underestimate risk measures, with the gap widening at higher confidence levels.

