APT and Multi-Factor Models: Fama-French Factors Explained

Michael Brenndoerfer

Quantitative Finance Data, Analytics & AI

Learn Arbitrage Pricing Theory and multi-factor models. Master Fama-French factors, estimate factor loadings via regression, and decompose portfolio risk.

Arbitrage Pricing Theory and Multi-Factor Models

In the previous chapter, we explored the Capital Asset Pricing Model, which elegantly explains expected returns through a single factor: the market portfolio. While CAPM provides powerful insights, it relies on strong assumptions and reduces all systematic risk to one dimension. Real markets, however, are influenced by multiple sources of risk: interest rate changes, inflation surprises, oil price shocks, and shifts in investor sentiment toward different types of stocks.

Arbitrage Pricing Theory (APT), developed by Stephen Ross in 1976, offers a more flexible framework. Rather than prescribing a single source of systematic risk, APT allows for multiple factors to drive returns. The theory rests on a simple but powerful idea: in well-functioning markets, arbitrage opportunities cannot persist. This no-arbitrage condition, combined with a factor structure for returns, yields pricing relationships similar to CAPM but with greater generality.

This chapter develops the APT framework from first principles, then explores its practical implementation through multi-factor models. We'll examine the famous Fama-French factors that have transformed empirical finance, learn to estimate factor exposures through regression, and build working factor models. By the end, you'll understand both the theoretical foundations and practical applications of factor-based investing.

The Limitations of Single-Factor Models

Before diving into APT, let's understand why we need something beyond CAPM. The single-factor model assumes that the market portfolio captures all systematic risk. Under this view, two stocks with the same beta should have identical expected returns, regardless of their other characteristics.

Empirical evidence tells a different story. Decades of research have documented persistent patterns that CAPM cannot explain:

Size effect: Small-capitalization stocks have historically outperformed large-cap stocks, even after adjusting for their higher market betas
Value premium: Stocks with high book-to-market ratios (value stocks) have earned higher returns than growth stocks with similar betas
Momentum: Stocks that performed well over the past year tend to continue outperforming in the near term
Profitability: More profitable firms earn higher returns than predicted by their market exposure alone

These anomalies suggest multiple sources of risk affect returns. A multi-factor framework better describes reality and provides more accurate risk assessment.

The APT Framework

This section develops the theoretical foundations of APT, starting with its factor structure assumptions and building toward the pricing equation that emerges from no-arbitrage conditions.

Assumptions and Factor Structure

APT begins with a fundamental assumption about how asset returns are generated. The core insight is that returns do not arise in isolation. Instead, they emerge from a combination of economy-wide forces that affect many assets simultaneously, plus company-specific events that affect only individual securities. This leads naturally to a linear factor model structure.

Each asset's return follows a linear factor model:

R_i = E[R_i] + \beta_{i1}F_1 + \beta_{i2}F_2 + \cdots + \beta_{ik}F_k + \epsilon_i

where:

$R_i$ : The realized return on asset $i$
$E[R_i]$ : The expected return on asset $i$
$F_j$ : The $j$ -th factor, representing a common source of risk. These are zero-mean surprise terms: $E[F_j] = 0$
$\beta_{ij}$ : The sensitivity of asset $i$ to factor $j$ , often called the factor loading or factor beta
$\epsilon_i$ : The idiosyncratic return component, specific to asset $i$ . By assumption, $E[\epsilon_i] = 0$ and $\text{Cov}(\epsilon_i, \epsilon_j) = 0$ for $i \neq j$

To understand this equation intuitively, think of it as a decomposition of returns into three distinct components. First, there is the expected return, which represents the baseline compensation investors anticipate for holding the asset. Second, there are the factor-related components, which capture how the asset responds to various systematic shocks in the economy. Third, there is the idiosyncratic term, which reflects news and events specific to that particular company.

The factors $F_j$ capture systematic risks that affect many assets simultaneously. When the Federal Reserve unexpectedly raises interest rates, this represents a realization of an interest rate factor that simultaneously affects bank stocks, utility stocks, and bond prices. When oil prices spike unexpectedly, energy companies and airlines experience common shocks through an oil price factor. The key insight is that these systematic factors create correlations across assets, linking the fortunes of securities that might otherwise seem unrelated.

The idiosyncratic term $\epsilon_i$ represents asset-specific news: earnings surprises, management changes, or product announcements that affect only that particular asset. A pharmaceutical company receiving FDA approval for a new drug experiences an idiosyncratic shock. This news affects that company's stock but has no direct impact on an unrelated technology firm. The assumption that idiosyncratic terms are uncorrelated across assets is crucial because it means that holding many assets allows investors to diversify away this company-specific risk.
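To make the factor structure concrete, the following minimal sketch simulates returns from a two-factor version of the model. All numbers (expected returns, loadings, volatilities) are hypothetical values chosen for illustration, and the factors are assumed to be uncorrelated with each other and with the idiosyncratic terms. Under those assumptions the model implies a return covariance matrix of $B \Sigma_F B^\top + D$, where $D$ is the diagonal matrix of idiosyncratic variances, and the simulation can be checked against it.

```python
import numpy as np

rng = np.random.default_rng(42)
n_periods, n_assets, n_factors = 5000, 4, 2

# Hypothetical inputs, chosen only for illustration
expected = np.array([0.008, 0.006, 0.010, 0.007])  # E[R_i]
betas = np.array([[1.0, 0.5],
                  [0.8, -0.3],
                  [1.2, 0.0],
                  [0.5, 0.9]])                      # loadings, shape (assets, factors)
factor_vol = np.array([0.04, 0.02])                 # volatility of each factor surprise
idio_vol = np.array([0.05, 0.03, 0.06, 0.04])       # idiosyncratic volatilities

# Zero-mean factor surprises and uncorrelated idiosyncratic shocks
F = rng.normal(0.0, factor_vol, size=(n_periods, n_factors))
eps = rng.normal(0.0, idio_vol, size=(n_periods, n_assets))

# R_i = E[R_i] + sum_j beta_ij * F_j + eps_i, for every period at once
R = expected + F @ betas.T + eps

# Covariance implied by the factor structure vs. the sample covariance
model_cov = betas @ np.diag(factor_vol**2) @ betas.T + np.diag(idio_vol**2)
sample_cov = np.cov(R, rowvar=False)
print(np.round(model_cov, 5))
print(np.round(sample_cov, 5))
```

In this setup the off-diagonal entries of the covariance matrix come entirely from shared factor exposure, which is the sense in which factors link otherwise unrelated securities.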

Factor vs. Factor Realization

A subtle but important distinction: the factors $F_j$ in the return equation are surprise components (deviations from expected values), not the factor values themselves. If inflation was expected to be 3% but realized at 4%, the inflation factor equals 1%, not 4%.

APT requires relatively mild assumptions compared to CAPM:

Returns follow the factor structure described above
There are enough assets to diversify away idiosyncratic risk
Markets are competitive, and investors prefer more wealth to less
No arbitrage opportunities exist

Notice what's absent: APT does not require investors to have identical expectations, does not assume all investors hold the market portfolio, and does not require returns to be normally distributed. This generality comes at a cost: APT does not tell us what the factors are or how many exist.

The No-Arbitrage Argument

The pricing relationship in APT emerges from the absence of arbitrage. This is a powerful approach because it requires only that markets function well enough to prevent riskless profit opportunities, rather than requiring that all investors behave optimally or hold identical beliefs.

Consider a well-diversified portfolio where idiosyncratic risk has been eliminated. When a portfolio contains many securities with independent idiosyncratic components, the law of large numbers ensures that these random shocks largely cancel out. The positive surprises from some holdings offset the negative surprises from others. In the limit, as the number of holdings grows large, idiosyncratic risk effectively vanishes.

Such a portfolio's return depends only on its factor exposures:

R_p = E[R_p] + \beta_{p1}F_1 + \beta_{p2}F_2 + \cdots + \beta_{pk}F_k

where:

$R_p$ : return on the portfolio
$E[R_p]$ : expected return on the portfolio
$\beta_{pj}$ : sensitivity of the portfolio to factor $j$
$F_j$ : factor $j$

This equation reveals something profound. Once idiosyncratic risk is diversified away, the portfolio's realized return differs from its expected return only because of factor surprises. If you could somehow construct a portfolio with zero exposure to all factors, you would know its return with certainty before it occurred.
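The diversification argument can be checked numerically. The sketch below assumes every asset has the same idiosyncratic volatility (a hypothetical 8% per period), that idiosyncratic terms are independent, and that the portfolio is equally weighted. Under those assumptions the idiosyncratic volatility of the portfolio should shrink like $\sigma_\epsilon / \sqrt{N}$.

```python
import numpy as np

IDIO_VOL = 0.08  # hypothetical per-asset idiosyncratic volatility


def idio_portfolio_vol(n_assets, n_periods=4000, seed=0):
    """Simulated idiosyncratic volatility of an equally weighted portfolio."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, IDIO_VOL, size=(n_periods, n_assets))
    weights = np.full(n_assets, 1.0 / n_assets)
    return (eps @ weights).std(ddof=1)


sizes = [1, 10, 100, 500]
simulated = {n: idio_portfolio_vol(n) for n in sizes}
theoretical = {n: IDIO_VOL / np.sqrt(n) for n in sizes}

for n in sizes:
    print(f"N={n:4d}  simulated={simulated[n]:.4f}  sigma/sqrt(N)={theoretical[n]:.4f}")
```

Factor risk does not behave this way: adding more assets averages the loadings but does not drive them to zero, which is why only factor exposure remains in the well-diversified portfolio's return.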

Now consider constructing a portfolio with zero exposure to all factors, achieved by appropriate weighting of assets. This zero-factor portfolio bears no systematic risk. In the absence of arbitrage, it must earn the risk-free rate:

E[R_p] = r_f \quad \text{if } \beta_{pj} = 0 \text{ for all } j

where:

$E[R_p]$ : expected return on the portfolio
$r_f$ : risk-free rate
$\beta_{pj}$ : sensitivity of the portfolio to factor $j$

The logic here is compelling. If such a portfolio earned more than the risk-free rate, investors could borrow at the risk-free rate, invest in the portfolio, and earn a guaranteed profit with no risk. This would be a pure arbitrage opportunity. Conversely, if the portfolio earned less than the risk-free rate, investors could short the portfolio, invest the proceeds at the risk-free rate, and again earn a guaranteed profit. Competition among arbitrageurs ensures that neither situation can persist, forcing the zero-factor portfolio to earn exactly the risk-free rate.

Similarly, two portfolios with identical factor exposures must have identical expected returns. Otherwise, you could go long the higher-return portfolio, short the lower-return portfolio, and earn a risk-free profit.
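One way to see how a zero-factor portfolio might be built is to treat it as a small linear-algebra problem: find weights that sum to one and have zero loading on every factor. The sketch below uses hypothetical loadings for six assets and two factors and picks the minimum-norm solution via the pseudoinverse, which is only one of infinitely many valid choices. Expected returns are generated as the risk-free rate plus loadings times assumed factor premia (the linear relationship developed in the next section), so the portfolio's expected return should collapse to the risk-free rate.

```python
import numpy as np

# Hypothetical loadings: 6 assets (rows) on 2 factors (columns)
B = np.array([
    [1.2, 0.3],
    [0.8, -0.5],
    [1.0, 0.9],
    [0.4, 0.1],
    [1.5, -0.2],
    [0.7, 0.6],
])
n_assets, n_factors = B.shape

# Constraints: B' w = 0 (no factor exposure) and 1' w = 1 (fully invested)
A = np.vstack([B.T, np.ones(n_assets)])
b = np.append(np.zeros(n_factors), 1.0)

# Minimum-norm weights satisfying the constraints exactly
w = np.linalg.pinv(A) @ b

# Assumed risk-free rate and factor premia, used only to generate expected returns
rf = 0.02
lam = np.array([0.05, 0.03])
expected = rf + B @ lam

portfolio_expected = w @ expected
print("weights:", np.round(w, 3))
print("factor exposures:", np.round(B.T @ w, 12))
print("expected return:", round(portfolio_expected, 6))
```

Because every asset here has a positive loading on the first factor, some weights must be negative, so the construction relies on short selling.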

The APT Pricing Equation

These no-arbitrage conditions imply a linear relationship between expected returns and factor exposures:

E[R_i] = r_f + \beta_{i1}\lambda_1 + \beta_{i2}\lambda_2 + \cdots + \beta_{ik}\lambda_k

where:

$E[R_i]$ : expected return on asset $i$
$r_f$ : risk-free rate
$\beta_{ij}$ : sensitivity of asset $i$ to factor $j$
$\lambda_j$ : risk premium for factor $j$ , representing the additional expected return earned per unit of exposure to that factor

This is the central result of APT. The equation states that expected returns are determined entirely by factor exposures. An asset's expected return equals the risk-free rate plus compensation for each unit of systematic risk borne. The lambda terms represent the market price of each type of risk. If factor $j$ carries a risk premium of 4%, then an asset with a factor loading of 1.5 on that factor earns an additional 6% (1.5 times 4%) in expected return.

To derive this more formally, consider $k+1$ portfolios: one with zero exposure to all factors, and $k$ portfolios each with unit exposure to exactly one factor. The zero-exposure portfolio earns $r_f$ . A portfolio with unit exposure to factor $j$ and zero exposure to all other factors earns $r_f + \lambda_j$ .

For any asset or portfolio with arbitrary factor exposures $(\beta_1, \beta_2, \ldots, \beta_k)$ , you can replicate its factor risk using combinations of these basis portfolios. Think of it as building a synthetic version of the asset using building blocks of pure factor exposure. The no-arbitrage condition requires the asset's expected return to equal the replicating portfolio's return, giving us the APT pricing equation.

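The derivation is simple and requires no assumptions about risk preferences, wealth distributions, or market equilibrium beyond the mild condition that investors prefer more wealth to less. The absence of arbitrage, combined with the factor structure of returns, is enough to deliver the pricing relationship. Strictly speaking, the relationship holds exactly for well-diversified portfolios and approximately for individual assets, whose idiosyncratic risk has not been diversified away.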
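The replication argument can be written out as a short calculation. In the sketch below, the risk-free rate, the two factor premia, and the asset's loadings are hypothetical numbers; the first factor reuses the 4% premium and 1.5 loading from the example above. The asset's factor risk is matched by holding each pure-factor portfolio with weight $\beta_j$ and putting the remainder, $1 - \sum_j \beta_j$, in the zero-exposure portfolio.

```python
import numpy as np

rf = 0.02                      # hypothetical risk-free rate
lam = np.array([0.04, 0.01])   # hypothetical factor risk premia
beta = np.array([1.5, -0.4])   # hypothetical loadings of the asset being priced

# Basis portfolios: the zero-exposure portfolio earns rf,
# and the pure factor-j portfolio earns rf + lam_j
basis_expected = np.concatenate(([rf], rf + lam))

# Replicating weights: beta_j in each pure-factor portfolio, the rest in zero-exposure
weights = np.concatenate(([1.0 - beta.sum()], beta))

replicated = weights @ basis_expected   # expected return of the replicating portfolio
apt = rf + beta @ lam                   # APT pricing equation

print(f"premium from factor 1: {beta[0] * lam[0]:.3f}")   # 1.5 * 4% = 6%
print(f"replicating portfolio: {replicated:.3f}")
print(f"APT equation:          {apt:.3f}")
```

Both routes give 7.6% here: 2% risk-free, plus 6% for the first factor, minus 0.4% for the negative loading on the second.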

Comparing APT and CAPM

The APT pricing equation resembles CAPM but with crucial differences:

Comparison of CAPM and APT frameworks.

Aspect	CAPM	APT
Number of factors	Single (market)	Multiple (unspecified)
Factor identification	Prescribed (market portfolio)	Not specified
Theoretical basis	Equilibrium with utility maximization	No-arbitrage
Assumptions	Strong (normal returns, homogeneous expectations)	Weak (factor structure, no arbitrage)
Testability	Requires identifying the true market portfolio	Requires identifying the relevant factors

CAPM is actually a special case of APT when there is only one factor and that factor is the market return. In this case, $\lambda_1 = E[R_m] - r_f$ (the market risk premium), and we recover the familiar CAPM equation.

These models illustrate a broader principle in financial economics: APT gains generality through weaker assumptions but offers less specific predictions. CAPM tells us exactly which factor matters, namely the market portfolio, but requires strong assumptions that may not hold in practice. APT allows for multiple factors and requires only no-arbitrage, but it does not tell us which factors are relevant or how many to include. This trade-off between generality and specificity recurs throughout finance theory.

Figure: Return decomposition comparison for CAPM and APT models. CAPM (left) attributes systematic return to a single market factor, whereas the APT framework (right) identifies multiple sources of systematic risk to provide a more nuanced view of return drivers.

Multi-Factor Models in Practice

APT provides a theoretical foundation but leaves the factors unspecified. In practice, researchers and practitioners have developed two main approaches to identifying factors: macroeconomic factor models and fundamental factor models.

Macroeconomic Factor Models

Macroeconomic models use observable economic variables as factors. This approach is intuitive: since factors represent systematic risks that affect many assets, they should correspond to economy-wide variables that influence corporate profits, discount rates, and investor behavior.

Chen, Roll, and Ross (1986) proposed a five-factor model using:

Industrial production growth: Captures the state of the real economy
Changes in expected inflation: Affects discount rates and corporate profits differently
Unexpected inflation: Transfers wealth between borrowers and lenders
Credit spread changes: The difference between corporate and government bond yields, capturing default risk perceptions
Term structure changes: Shifts in the yield curve slope, affecting the relative pricing of different maturities

Each of these variables has clear economic content. Industrial production growth measures the real output of the economy, and stocks of companies with greater exposure to economic cycles should be more sensitive to this factor. Unexpected inflation redistributes wealth between debtors and creditors, benefiting firms with fixed-rate debt while harming those with fixed-rate assets. Credit spread changes signal shifts in the perceived riskiness of corporate debt, which naturally affects equity values as well.

Macroeconomic factors offer clear economic interpretations, but data lags and revisions make real-time implementation challenging.

Fundamental Factor Models

Fundamental factor models use characteristics of securities themselves to explain returns. Rather than specifying macroeconomic variables, these models identify factors based on firm attributes that have historically explained return differences.

This approach differs from macroeconomic models by asking which characteristics have predicted returns historically rather than specifying which economic variables should affect returns. This empirical focus reveals patterns not obvious from economic reasoning.

The most influential fundamental factor model is the Fama-French framework, which we examine in detail next.

The Fama-French Factor Models

Eugene Fama and Kenneth French revolutionized empirical asset pricing with their 1993 three-factor model, later extended to five factors in 2015. These models have become the standard benchmark for evaluating investment performance and understanding return patterns.

The Three-Factor Model

The Fama-French three-factor model augments CAPM with two additional factors:

R_i - r_f = \alpha_i + \beta_i^{MKT} \cdot MKT + \beta_i^{SMB} \cdot SMB + \beta_i^{HML} \cdot HML + \epsilon_i

where:

$R_i$ : return on asset $i$
$r_f$ : risk-free rate
$\alpha_i$ : intercept (abnormal return)
$\beta_i^{MKT}, \beta_i^{SMB}, \beta_i^{HML}$ : sensitivities to the respective factors
$MKT$ : market factor ( $R_m - r_f$ )
$SMB$ : size factor (Small Minus Big)
$HML$ : value factor (High Minus Low)
$\epsilon_i$ : idiosyncratic error term

The model's structure reveals its purpose. The left-hand side measures the asset's excess return over the risk-free rate, which represents the premium investors earn for bearing risk. The right-hand side decomposes this premium into components: compensation for market risk, for size-related risk, for value-related risk, and any residual alpha that the factors cannot explain.

The factors are defined as follows:

MKT (Market): The excess return on a broad market portfolio, identical to the CAPM market factor. $MKT = R_m - r_f$
SMB (Small Minus Big): The return on a portfolio of small-cap stocks minus the return on a portfolio of large-cap stocks. SMB captures the size premium: small stocks' tendency to outperform large stocks
HML (High Minus Low): The return on a portfolio of high book-to-market (value) stocks minus the return on a portfolio of low book-to-market (growth) stocks. HML captures the value premium

The construction of SMB and HML as long-short portfolios is deliberate. By going long small stocks and short large stocks, SMB isolates the effect of size while averaging out other characteristics. Similarly, by going long value stocks and short growth stocks, HML isolates the effect of valuation. Because the long and short legs both carry market exposure, the long-short construction removes much of it, so the factors tend to be only weakly correlated with the market factor. It does not guarantee zero correlation, but it makes the factors useful for explaining return variation beyond what CAPM captures.

Factor Portfolio Construction

SMB and HML are not tradeable assets but portfolios constructed specifically to isolate size and value exposures. Fama and French construct these by sorting stocks into groups based on size and book-to-market, then taking appropriate long-short combinations.

Constructing the SMB and HML Factors

The construction methodology matters for understanding what these factors capture. Each year at the end of June, Fama and French sort stocks as follows:

Size sort: Stocks are ranked by market capitalization and divided into Small and Big groups, using the median NYSE market capitalization as the breakpoint.

Book-to-market sort: Stocks are independently ranked by book-to-market ratio and divided into three groups using NYSE breakpoints: Low (bottom 30%), Medium (middle 40%), and High (top 30%).

This creates six portfolios from the intersection of two size groups and three book-to-market groups. The factors are then computed as:

\begin{aligned} SMB &= \frac{1}{3}(\text{Small/Low} + \text{Small/Medium} + \text{Small/High}) - \frac{1}{3}(\text{Big/Low} + \text{Big/Medium} + \text{Big/High}) \\ HML &= \frac{1}{2}(\text{Small/High} + \text{Big/High}) - \frac{1}{2}(\text{Small/Low} + \text{Big/Low}) \end{aligned}

where:

$\text{Small/High}$ : portfolio of small-cap, high book-to-market stocks
$\text{Big/High}$ : portfolio of large-cap, high book-to-market stocks
$\text{Small/Medium}$ : portfolio of small-cap, medium book-to-market stocks
$\text{Big/Medium}$ : portfolio of large-cap, medium book-to-market stocks
$\text{Small/Low}$ : portfolio of small-cap, low book-to-market stocks
$\text{Big/Low}$ : portfolio of large-cap, low book-to-market stocks

The averaging process in these formulas serves an important purpose. The SMB factor averages across book-to-market groups, isolating the size effect. By including small value, small medium, and small growth stocks on the long side, and big value, big medium, and big growth stocks on the short side, the factor captures the pure size effect without being contaminated by any particular book-to-market exposure.

The HML factor averages across size groups, isolating the value effect. By including both small and big value stocks on the long side, and both small and big growth stocks on the short side, the factor captures the pure value effect without being contaminated by size effects.
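The two formulas translate directly into code. The sketch below assumes the six portfolio returns for a period are already available; the one-month numbers are made up for illustration, and the function name `smb_hml` is a hypothetical helper rather than part of any library.

```python
import numpy as np


def smb_hml(p):
    """Compute SMB and HML from the six size/book-to-market portfolio returns.

    `p` maps labels such as "Small/High" to returns (scalars or aligned arrays).
    """
    small = (p["Small/Low"] + p["Small/Medium"] + p["Small/High"]) / 3
    big = (p["Big/Low"] + p["Big/Medium"] + p["Big/High"]) / 3
    smb = small - big

    high = (p["Small/High"] + p["Big/High"]) / 2
    low = (p["Small/Low"] + p["Big/Low"]) / 2
    hml = high - low
    return smb, hml


# Made-up returns for a single month
one_month = {
    "Small/Low": 0.010, "Small/Medium": 0.014, "Small/High": 0.018,
    "Big/Low": 0.008, "Big/Medium": 0.009, "Big/High": 0.013,
}
smb, hml = smb_hml(one_month)
print(f"SMB = {smb:.4f}, HML = {hml:.4f}")
```

With these inputs the small portfolios average 1.4% and the big ones 1.0%, so SMB is 0.40%; the high book-to-market portfolios average 1.55% and the low ones 0.90%, so HML is 0.65%. Note that the Medium portfolios enter SMB but not HML.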

Figure: Fama-French 2x3 portfolio sorting methodology for factor construction. Stocks are sorted independently by size and valuation to create six portfolios, which are then used to calculate the SMB (size) and HML (value) factors.

The Five-Factor Model

In 2015, Fama and French extended their model with two additional factors based on profitability and investment:

R_i - r_f = \alpha_i + \beta_i^{MKT} \cdot MKT + \beta_i^{SMB} \cdot SMB + \beta_i^{HML} \cdot HML + \beta_i^{RMW} \cdot RMW + \beta_i^{CMA} \cdot CMA + \epsilon_i

where:

$R_i, r_f, \alpha_i, \epsilon_i$ : defined as in the three-factor model
$\beta_i^{j}$ : sensitivity to factor $j$
$MKT, SMB, HML$ : market, size, and value factors
$RMW$ : profitability factor (Robust Minus Weak)
$CMA$ : investment factor (Conservative Minus Aggressive)

Empirical observation and theoretical reasoning motivated adding these factors. Empirically, researchers found that profitability and investment patterns explained return variation not captured by the original three factors. Theoretically, these factors connect to fundamental valuation principles.

The new factors are:

RMW (Robust Minus Weak): The return on stocks with robust (high) operating profitability minus stocks with weak (low) operating profitability. Companies with higher operating profitability tend to earn higher returns
CMA (Conservative Minus Aggressive): The return on stocks of companies with conservative (low) investment minus aggressive (high) investment. Firms that invest less tend to earn higher returns

The profitability factor has intuitive appeal. All else equal, a more profitable company should be worth more. If two companies have similar market prices but different profitabilities, the more profitable one offers better value and should earn higher subsequent returns. The investment factor relates to the rate at which companies are expanding their asset base. Firms investing heavily may be pursuing growth at the expense of current profitability, or they may be making poor capital allocation decisions. Either interpretation suggests that the stocks of conservative firms, those that invest less aggressively, may earn higher returns.

The Momentum Factor

While not part of the original Fama-French framework, momentum has become a standard factor in many models. The momentum factor (often called UMD for "Up Minus Down" or WML for "Winners Minus Losers") captures the tendency of recent winners to continue outperforming:

MOM = R_{winners} - R_{losers}

where:

$R_{winners}$ : return on the portfolio of recent top-performing stocks
$R_{losers}$ : return on the portfolio of recent bottom-performing stocks

Stocks are sorted based on their past 12-month returns (excluding the most recent month to avoid microstructure effects). In one common construction, the factor is the return difference between the top and bottom deciles; other versions use wider groups, such as the top and bottom 30% of stocks.

The exclusion of the most recent month is a subtle but important detail. Stock prices exhibit short-term reversal at very short horizons due to microstructure effects like bid-ask bounce. By skipping the most recent month, the momentum factor captures medium-term continuation rather than short-term noise.
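The skip-month signal is easy to get wrong by one period, so a small sketch may help. It uses simulated monthly returns for 50 hypothetical stocks, defines the signal for holding month $t$ as the compounded return over months $t-12$ through $t-2$, and forms an equal-weighted top-decile minus bottom-decile spread. The equal weighting, the decile cutoff, and the helper name `momentum_return` are illustrative assumptions, not a reproduction of any published factor.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_months, n_stocks = 36, 50

# Simulated monthly returns: rows are months, columns are stocks
returns = pd.DataFrame(rng.normal(0.01, 0.06, size=(n_months, n_stocks)))

# Signal for holding month t: compounded return over months t-12 .. t-2.
# shift(2) skips month t-1; the 11-month rolling sum of log returns does the compounding.
log_r = np.log1p(returns)
signal = np.expm1(log_r.shift(2).rolling(11).sum())


def momentum_return(t, quantile=0.10):
    """Winners-minus-losers return in month t, sorted on the skip-month signal."""
    s = signal.iloc[t].dropna().sort_values()
    n_side = max(1, int(round(quantile * len(s))))
    losers, winners = s.index[:n_side], s.index[-n_side:]
    month = returns.iloc[t]
    return month.loc[winners].mean() - month.loc[losers].mean()


# The first holding month with a complete signal is month 12
mom = pd.Series({t: momentum_return(t) for t in range(12, n_months)})
print(mom.head())
```

Since the simulated returns are independent across months, the resulting spread is pure noise; the point of the sketch is the alignment of the formation window, not the size of the premium.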

Mark Carhart's 1997 four-factor model combined the Fama-French three factors with momentum:

R_i - r_f = \alpha_i + \beta_i^{MKT} \cdot MKT + \beta_i^{SMB} \cdot SMB + \beta_i^{HML} \cdot HML + \beta_i^{MOM} \cdot MOM + \epsilon_i

where:

$R_i$ : return on asset $i$
$r_f$ : risk-free rate
$\alpha_i$ : intercept (abnormal return)
$\epsilon_i$ : idiosyncratic error term
$MKT, SMB, HML$ : Fama-French three factors
$MOM$ : Momentum factor
$\beta_i^{j}$ : sensitivity to factor $j$

This model has become particularly popular for evaluating mutual fund performance. By including momentum, the model can distinguish between managers who generate alpha through stock selection and those who simply ride momentum trends.

Estimating Factor Exposures

With the factor model framework established, we now turn to the practical task of estimating an asset's factor exposures. As we discussed in Part III's chapter on Regression Analysis, time-series regression provides a natural estimation approach.

Time-Series Regression

For an individual stock or portfolio, we regress excess returns on the factor returns:

R_{i,t} - r_{f,t} = \alpha_i + \beta_i^{MKT} \cdot MKT_t + \beta_i^{SMB} \cdot SMB_t + \beta_i^{HML} \cdot HML_t + \epsilon_{i,t}

where:

$R_{i,t} - r_{f,t}$ : excess return on asset $i$ at time $t$
$\alpha_i$ : estimated alpha
$\beta_i^{FACTOR}$ : estimated factor loadings
$FACTOR_t$ : factor return at time $t$
$\epsilon_{i,t}$ : residual at time $t$

This regression has a natural interpretation. The dependent variable, excess return, is what we seek to explain. The independent variables, the factor returns, represent the systematic risk sources. The regression coefficients tell us how much the asset's return moves, on average, in response to each factor.

The regression coefficients $\beta_i^{MKT}$ , $\beta_i^{SMB}$ , and $\beta_i^{HML}$ are the estimated factor loadings. The intercept $\alpha_i$ represents the average return not explained by the factors: positive alpha suggests outperformance, negative alpha suggests underperformance.

The interpretation of factor loadings is straightforward:

$\beta^{SMB} > 0$ : The asset behaves like a small-cap stock
$\beta^{SMB} < 0$ : The asset behaves like a large-cap stock
$\beta^{HML} > 0$ : The asset behaves like a value stock
$\beta^{HML} < 0$ : The asset behaves like a growth stock

These interpretations connect statistical estimates to economic meaning. A stock with a large positive SMB loading tends to rise when small stocks outperform large stocks, regardless of the company's actual market capitalization. The factor loading captures behavioral similarity rather than category membership.
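As a minimal sketch of the time-series regression, the example below generates a hypothetical asset with known alpha and loadings, then recovers them by ordinary least squares. The factor returns are simulated draws, the "true" parameters are invented for the illustration, and plain NumPy least squares stands in for a full regression package, so no standard errors are reported.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs = 600

# Simulated monthly factor returns; columns are MKT, SMB, HML
factors = rng.normal(0.005, 0.03, size=(n_obs, 3))

# Hypothetical asset: small-cap tilt (SMB > 0) and growth tilt (HML < 0)
true_alpha = 0.001
true_betas = np.array([1.1, 0.4, -0.3])
excess = true_alpha + factors @ true_betas + rng.normal(0.0, 0.01, n_obs)

# OLS of excess returns on a constant and the three factors
X = np.column_stack([np.ones(n_obs), factors])
coef, *_ = np.linalg.lstsq(X, excess, rcond=None)
alpha_hat, betas_hat = coef[0], coef[1:]

# Share of return variance explained by the factors
resid = excess - X @ coef
r_squared = 1.0 - resid.var() / excess.var()

print(f"alpha: {alpha_hat:.4f}")
print("betas (MKT, SMB, HML):", np.round(betas_hat, 3))
print(f"R-squared: {r_squared:.3f}")
```

The signs of the estimated SMB and HML loadings map onto the interpretation above: this asset behaves like a small-cap growth stock, whatever its actual market capitalization.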

Cross-Sectional Regression

An alternative approach estimates factor risk premia rather than individual factor loadings. In cross-sectional regression, we use known or estimated betas as explanatory variables and current returns as the dependent variable:

R_{i,t} = \lambda_{0,t} + \lambda_{MKT,t} \hat{\beta}_i^{MKT} + \lambda_{SMB,t} \hat{\beta}_i^{SMB} + \lambda_{HML,t} \hat{\beta}_i^{HML} + \nu_{i,t}

where:

$R_{i,t}$ : return on asset $i$ at time $t$
$\lambda_{0,t}$ : intercept (zero-beta rate) at time $t$
$\lambda_{FACTOR,t}$ : estimated risk premium for the factor at time $t$
$\hat{\beta}_i^{FACTOR}$ : estimated factor loading for asset $i$ (from first pass)
$\nu_{i,t}$ : pricing error

This regression is run across all assets at each point in time, yielding time series of estimated risk premia $\lambda_t$ . The average of these estimates gives the historical factor risk premium.

The logic of cross-sectional regression differs fundamentally from time-series regression. In time-series regression, we ask: "Given that we know what the factors did, how sensitive was this particular asset?" In cross-sectional regression, we ask: "Given that we know each asset's sensitivities, what premium did the market pay for each type of risk?"

The Fama-MacBeth procedure (1973) formalizes this two-step approach:

First pass: Time-series regressions to estimate each asset's factor betas
Second pass: Cross-sectional regressions at each date to estimate factor premia

This methodology remains the standard for testing factor pricing models. It provides not only point estimates of factor premia but also standard errors that account for the time-series variation in estimated premia.
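The two passes can be sketched on simulated data. Everything below is an assumption made for illustration: 25 hypothetical test assets, three factors whose mean returns equal the assumed premia, betas estimated once over the full sample, and excess returns as the dependent variable, so the cross-sectional intercept should be close to zero rather than a zero-beta rate. The standard errors are the simple time-series ones, without any correction for estimated betas.

```python
import numpy as np

rng = np.random.default_rng(7)
n_periods, n_assets, n_factors = 240, 25, 3

# Simulated factors whose means are the assumed risk premia
lam_true = np.array([0.006, 0.002, 0.003])
factors = lam_true + rng.normal(0.0, 0.03, size=(n_periods, n_factors))

# Hypothetical true loadings and excess returns with zero alpha
betas_true = rng.uniform(0.2, 1.5, size=(n_assets, n_factors))
excess = factors @ betas_true.T + rng.normal(0.0, 0.02, size=(n_periods, n_assets))

# First pass: one time-series regression per asset (all solved at once)
X = np.column_stack([np.ones(n_periods), factors])
ts_coef = np.linalg.lstsq(X, excess, rcond=None)[0]   # shape (1 + factors, assets)
betas_hat = ts_coef[1:].T                             # shape (assets, factors)

# Second pass: one cross-sectional regression per date on the estimated betas
Z = np.column_stack([np.ones(n_assets), betas_hat])
lam_t = np.linalg.lstsq(Z, excess.T, rcond=None)[0].T  # shape (periods, 1 + factors)

# Premia estimates are time-series averages; standard errors use their variation over time
lam_hat = lam_t.mean(axis=0)
lam_se = lam_t.std(axis=0, ddof=1) / np.sqrt(n_periods)
t_stats = lam_hat / lam_se

for name, est, se in zip(["intercept", "factor 1", "factor 2", "factor 3"], lam_hat, lam_se):
    print(f"{name:9s} estimate={est: .4f}  s.e.={se:.4f}")
```

Each date's slope estimates track that date's factor realizations, so the averaged premia land near the sample means of the factors rather than exactly on the assumed values.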

Working with Factor Data

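Let's implement a factor model. The Fama-French factors are publicly available from Kenneth French's data library, and the pandas-datareader library can access them directly. The code below includes that download call in commented-out form and substitutes synthetic data with the same column layout, so the example runs without a network connection.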

In[4]:

Code

!uv pip install pandas_datareader statsmodels

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas_datareader import data as pdr
import statsmodels.api as sm
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Set style for all plots
plt.style.use('seaborn-v0_8-whitegrid')

In[5]:

Code

import numpy as np
import pandas as pd

# Download Fama-French 3 factor data
# The factors are in percentage terms
# ff_factors = pdr.DataReader('F-F_Research_Data_Factors', 'famafrench',
#                             start='2010-01-01', end='2023-12-31')[0]

# Synthetic data for demonstration (uncomment lines above for real data)
np.random.seed(42)
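# Month-end dates, built from periods because the "M" alias in date_range is deprecated in newer pandas
dates = pd.period_range(start="2010-01", end="2023-12", freq="M").to_timestamp(how="end").normalize()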
ff_factors = pd.DataFrame(
    np.random.normal(0.005, 0.03, (len(dates), 4)), index=dates
)
ff_factors.columns = ["MKT", "SMB", "HML", "RF"]

# Convert from percentage to decimal (if using real data)
# ff_factors = ff_factors / 100

In[6]:

Code

# Examine the data structure
start_date = ff_factors.index[0]
end_date = ff_factors.index[-1]
n_obs = len(ff_factors)
head_data = ff_factors.head()

Out[7]:

Console

Fama-French Factor Data (Monthly)
==================================================
Date range: 2010-01-31 to 2023-12-31
Number of observations: 168

First few rows:
                 MKT       SMB       HML        RF
2010-01-31  0.019901  0.000852  0.024431  0.050691
2010-02-28 -0.002025 -0.002024  0.052376  0.028023
2010-03-31 -0.009084  0.021277 -0.008903 -0.008972
2010-04-30  0.012259 -0.052398 -0.046748 -0.011869
2010-05-31 -0.025385  0.014427 -0.022241 -0.037369

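The data contains monthly returns for each factor plus the risk-free rate. Because the values printed above come from the synthetic placeholder, the RF column takes implausible values such as 5% or negative monthly rates; the series in the real data library does not. Let's examine the factor premia.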

In[8]:

Code

# Calculate annualized statistics
factor_means = ff_factors[["MKT", "SMB", "HML"]].mean() * 12
factor_stds = ff_factors[["MKT", "SMB", "HML"]].std() * np.sqrt(12)
sharpe_ratios = factor_means / factor_stds

factor_stats = pd.DataFrame(
    {
        "Mean (Annual)": factor_means,
        "Std (Annual)": factor_stds,
        "Sharpe Ratio": sharpe_ratios,
    }
)

Out[9]:

Console

Factor Statistics (2010-2023)
==================================================
     Mean (Annual)  Std (Annual)  Sharpe Ratio
MKT         0.0490        0.0957        0.5124
SMB         0.0605        0.0989        0.6116
HML         0.0630        0.1132        0.5561

The Sharpe ratios indicate risk-adjusted performance. A higher Sharpe ratio suggests better compensation for each unit of risk taken.

Visualizing Factor Returns

Let's visualize the cumulative performance of each factor.

In[10]:

Code

# Calculate cumulative returns for visualization
cumulative_returns = (1 + ff_factors[["MKT", "SMB", "HML"]]).cumprod()

Out[11]:

Visualization

Line chart showing cumulative returns for market, SMB, and HML factors over time. — Cumulative returns (growth of \$1) for the Fama-French three factors (2010–2023). The Market factor (MKT) demonstrates sustained growth, significantly outperforming the Size (SMB) and Value (HML) factors, which show relatively flat performance over the decade.

The chart reveals important patterns in factor performance. The market factor (MKT) showed strong positive returns over this period, reflecting the bull market in U.S. equities. The SMB factor exhibited weaker performance, as large-cap stocks dominated returns in many years. The HML factor actually produced negative returns over much of this period, as growth stocks significantly outperformed value stocks.

These patterns highlight an important caveat: factor premia are not guaranteed. While historical data shows positive average returns to size and value over long periods, individual decades can show very different patterns.

Factor Correlations

Understanding factor correlations helps with portfolio construction and risk management.

In[12]:

Code

# Calculate correlation matrix
corr_matrix = ff_factors[["MKT", "SMB", "HML"]].corr()

Out[13]:

Visualization

Heatmap showing correlation coefficients between MKT, SMB, and HML factors. — Pearson correlation matrix of Fama-French factors. The low correlation coefficients between factors (e.g., MKT and SMB) confirm they capture distinct risk dimensions, providing diversification benefits in a multi-factor portfolio.

The relatively low correlations between factors confirm that they capture distinct sources of systematic risk. This makes multi-factor models valuable for both explaining returns and constructing diversified portfolios.

Estimating Factor Loadings for a Stock

Now let's estimate factor loadings for a specific stock. We'll create synthetic return data that mimics a real stock's behavior, then estimate its factor exposures using regression.

In[14]:

Code

# Generate synthetic stock returns with known factor exposures
np.random.seed(42)

# True factor loadings for our synthetic stock
true_betas = {
    "MKT": 1.2,  # Higher market exposure than average
    "SMB": 0.3,  # Slight small-cap tilt
    "HML": -0.4,  # Growth stock characteristics
}
true_alpha = 0.001  # Small positive alpha (0.1% monthly)

# Generate returns using the factor model
n_obs = len(ff_factors)
idiosyncratic_vol = 0.03  # 3% monthly idiosyncratic volatility
epsilon = np.random.normal(0, idiosyncratic_vol, n_obs)

stock_excess_returns = (
    true_alpha
    + true_betas["MKT"] * ff_factors["MKT"].values
    + true_betas["SMB"] * ff_factors["SMB"].values
    + true_betas["HML"] * ff_factors["HML"].values
    + epsilon
)

# Create a DataFrame
stock_data = pd.DataFrame(
    {"excess_return": stock_excess_returns}, index=ff_factors.index
)

In[15]:

Code

import statsmodels.api as sm

# Estimate factor loadings via OLS regression
X = ff_factors[["MKT", "SMB", "HML"]]
X = sm.add_constant(X)
y = stock_data["excess_return"]

model = sm.OLS(y, X).fit()

# Extract results for display
r_squared = model.rsquared
adj_r_squared = model.rsquared_adj
params = model.params
std_errs = model.bse
t_stats = model.tvalues
p_values = model.pvalues

Out[16]:

Console

Factor Model Regression Results
==================================================

R-squared: 0.6820
Adjusted R-squared: 0.6762

Estimated Factor Loadings:
--------------------------------------------------
Parameter      Estimate  Std Error     t-stat    p-value
--------------------------------------------------
const           -0.0005     0.0023      -0.20     0.8432
MKT              1.3249     0.0795      16.66     0.0000
SMB              0.3070     0.0768       4.00     0.0001
HML             -0.4512     0.0672      -6.71     0.0000

The regression recovers the true factor loadings to within sampling error: each estimate lies within two standard errors of its true value. The R-squared of 0.68 means the three factors explain about 68% of the return variance. The remaining 32% is idiosyncratic risk that could be diversified away in a portfolio.

Interpreting the Results

Let's compare our estimates to the true values and interpret the stock's factor profile.

In[17]:

Code

# Compare estimated vs true parameters
comparison_data = []

# Alpha comparison
alpha_est = model.params["const"]
comparison_data.append(
    {
        "Factor": "Alpha",
        "True": true_alpha,
        "Estimated": alpha_est,
        "Difference": alpha_est - true_alpha,
    }
)

# Factor betas comparison
for factor in ["MKT", "SMB", "HML"]:
    beta_est = model.params[factor]
    beta_true = true_betas[factor]
    comparison_data.append(
        {
            "Factor": factor,
            "True": beta_true,
            "Estimated": beta_est,
            "Difference": beta_est - beta_true,
        }
    )

comparison_df = pd.DataFrame(comparison_data).set_index("Factor")

Out[18]:

Console

Comparison: Estimated vs True Factor Loadings
==================================================
         True  Estimated  Difference
Factor                              
Alpha   0.001    -0.0005     -0.0015
MKT     1.200     1.3249      0.1249
SMB     0.300     0.3070      0.0070
HML    -0.400    -0.4512     -0.0512

Out[19]:

Visualization

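The estimated loadings are close to the true values. The largest gap is in the market beta (1.32 estimated versus 1.20 true), which is still within two standard errors. These differences arise from estimation error due to finite sample size and idiosyncratic noise. This stock has: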

High market beta (1.2): More volatile than the market, amplifying gains in bull markets and losses in bear markets
Positive SMB loading (0.3): Behaves somewhat like a small-cap stock, gaining when small stocks outperform
Negative HML loading (-0.4): Behaves like a growth stock, gaining when growth outperforms value

This profile is typical of a technology growth stock: high market sensitivity, modest small-cap characteristics, and strong growth orientation.
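One way to judge whether the estimation errors are plausible is to express each difference in units of its standard error. The sketch below reuses the estimates and standard errors printed in the regression output above; the dictionary names are illustrative, and the two-standard-error threshold is the usual rough 95% rule of thumb rather than an exact test.

```python
# Estimates and standard errors copied from the regression output above
estimates = {"const": -0.0005, "MKT": 1.3249, "SMB": 0.3070, "HML": -0.4512}
std_errors = {"const": 0.0023, "MKT": 0.0795, "SMB": 0.0768, "HML": 0.0672}
true_values = {"const": 0.001, "MKT": 1.2, "SMB": 0.3, "HML": -0.4}

# Distance between estimate and truth, measured in standard errors
z_scores = {
    name: (estimates[name] - true_values[name]) / std_errors[name]
    for name in estimates
}

for name, z in z_scores.items():
    flag = "within" if abs(z) < 2 else "outside"
    print(f"{name:>5}: {z:+.2f} standard errors ({flag} the rough 95% band)")
```

All four parameters fall inside the band, with the market beta the furthest from its true value at about 1.6 standard errors.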

Building a Multi-Factor Risk Model

Factor models serve two purposes: explaining expected returns and decomposing risk. Let's build a complete factor risk model for a portfolio.

Portfolio Factor Exposures

For a portfolio with weights $w_i$ across $n$ assets, the portfolio's factor exposure is the weighted average of individual exposures:

$$\beta_p^{j} = \sum_{i=1}^{n} w_i \beta_i^{j}$$

where:

$\beta_p^{j}$ : portfolio sensitivity to factor $j$
$w_i$ : weight of asset $i$ in the portfolio
$\beta_i^{j}$ : sensitivity of asset $i$ to factor $j$
$n$ : number of assets

This linearity makes factor models computationally tractable for large portfolios. Rather than tracking the correlations among thousands of individual securities, we need only track each security's exposure to a handful of factors. The portfolio's risk characteristics are then determined by its aggregate factor exposures, a dramatic simplification that enables practical risk management for institutional portfolios.
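To make the size of this simplification concrete, the sketch below counts the parameters needed to describe risk with and without a factor structure. The universe of 3,000 securities is a hypothetical figure, and the function names are illustrative; the factor-model count assumes idiosyncratic returns are uncorrelated across securities.

```python
def full_covariance_params(n_assets):
    """Distinct variances and covariances in an n x n covariance matrix."""
    return n_assets * (n_assets + 1) // 2


def factor_model_params(n_assets, n_factors):
    """Betas, factor covariance entries, and idiosyncratic variances."""
    betas = n_assets * n_factors
    factor_cov = n_factors * (n_factors + 1) // 2
    idio = n_assets
    return betas + factor_cov + idio


n, k = 3000, 3  # hypothetical universe size, three Fama-French factors
print(full_covariance_params(n))  # 4501500
print(factor_model_params(n, k))  # 12006
```

Under these assumptions the three-factor model needs roughly 12 thousand inputs instead of 4.5 million.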

In[20]:

Code

# Create a portfolio of 5 synthetic stocks with different factor exposures
np.random.seed(123)

stock_betas = pd.DataFrame(
    {
        "MKT": [0.8, 1.0, 1.3, 0.9, 1.5],
        "SMB": [0.4, -0.2, 0.5, -0.3, 0.1],
        "HML": [0.3, -0.5, 0.1, 0.6, -0.8],
    },
    index=["Stock_A", "Stock_B", "Stock_C", "Stock_D", "Stock_E"],
)

# Idiosyncratic volatilities
idio_vol = np.array([0.04, 0.03, 0.05, 0.03, 0.06])

# Portfolio weights (equal-weighted)
weights = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

In[21]:

Code

# Calculate portfolio factor exposures
portfolio_betas = stock_betas.T.dot(weights)

Out[22]:

Console

Individual Stock Factor Exposures
==================================================
         MKT  SMB  HML
Stock_A  0.8  0.4  0.3
Stock_B  1.0 -0.2 -0.5
Stock_C  1.3  0.5  0.1
Stock_D  0.9 -0.3  0.6
Stock_E  1.5  0.1 -0.8

Portfolio Weights:
  Stock_A: 20.0%
  Stock_B: 20.0%
  Stock_C: 20.0%
  Stock_D: 20.0%
  Stock_E: 20.0%

Portfolio Factor Exposures:
  MKT: 1.100
  SMB: 0.100
  HML: -0.060

Out[23]:

Visualization

Factor exposures for the five component stocks. While all stocks have significant market beta (MKT), their exposures to size (SMB) and value (HML) vary widely, creating the portfolio's net aggregate exposure (dashed lines).

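The equal-weighted portfolio has a market beta of 1.10, a small positive size exposure (0.10), and a slight growth tilt (HML loading of -0.06). The net size and value exposures are small because the individual stocks' loadings largely offset one another.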

Factor Risk Decomposition

The total variance of a portfolio in a factor model decomposes into factor risk and idiosyncratic risk. This decomposition is one of the most powerful applications of factor models because it separates diversifiable risk from non-diversifiable risk.

Using the factor covariance matrix $\boldsymbol{\Sigma}_F$ and idiosyncratic variances $\sigma_{\epsilon,i}^2$ :

$$\sigma_p^2 = \boldsymbol{\beta}_p' \boldsymbol{\Sigma}_F \boldsymbol{\beta}_p + \sum_{i=1}^{n} w_i^2 \sigma_{\epsilon,i}^2$$

where:

$\sigma_p^2$ : portfolio variance
$\boldsymbol{\beta}_p$ : vector of portfolio factor exposures
$\boldsymbol{\Sigma}_F$ : covariance matrix of factor returns
$w_i$ : weight of asset $i$
$\sigma_{\epsilon,i}^2$ : idiosyncratic variance of asset $i$

The first term is systematic risk from factor exposures. This term depends on how the factors co-move with each other and how much exposure the portfolio has to each factor. Even with perfect diversification across many securities, this systematic risk cannot be eliminated because it arises from economy-wide forces.

The second term is idiosyncratic risk, which decreases as the portfolio becomes more diversified. Notice that individual idiosyncratic variances are multiplied by squared weights. When weights are small (as in a well-diversified portfolio), squared weights become very small, causing the idiosyncratic component to shrink rapidly.
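The shrinking effect of squared weights can be illustrated with equal-weighted portfolios of increasing size. This is a sketch under a simplifying assumption that every stock has the same 3% monthly idiosyncratic volatility; the function name `equal_weight_idio_var` is illustrative.

```python
import numpy as np


def equal_weight_idio_var(n_assets, idio_var):
    """Idiosyncratic variance of an equal-weighted portfolio of n assets."""
    w = np.full(n_assets, 1.0 / n_assets)
    # Second term of the decomposition: sum of w_i^2 * sigma_eps_i^2
    return float(np.sum(w**2 * idio_var))


# Assumption: identical 3% monthly idiosyncratic volatility for every stock
idio_var_annual = 0.03**2 * 12

for n in (5, 50, 500):
    var_n = equal_weight_idio_var(n, idio_var_annual)
    print(f"n={n:>3}: idiosyncratic vol = {np.sqrt(var_n):.2%}")
```

With equal weights the idiosyncratic variance is the common variance divided by the number of holdings, so idiosyncratic volatility falls from roughly 4.6% with five stocks to about 0.5% with five hundred, while the factor term is unaffected.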

In[24]:

Code

# Factor covariance matrix (annualized)
factor_cov = ff_factors[["MKT", "SMB", "HML"]].cov() * 12
# Scale by 10,000 for display: the result is in percent squared
# (the "bps" in the variable name is a misnomer)
factor_cov_bps = factor_cov * 10000

Out[25]:

Console

Factor Covariance Matrix (Annualized)
==================================================
       MKT    SMB     HML
MKT  91.54  -2.73   -7.71
SMB  -2.73  97.80    1.90
HML  -7.71   1.90  128.17

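The matrix is displayed in percent squared, so the market entry of 91.54 corresponds to an annualized volatility of about 9.6%. The diagonal elements are the variances of each factor, while the off-diagonal elements are covariances. The off-diagonal values are small relative to the variances, which confirms that the factors provide diversification benefits.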
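Covariances are easier to judge once rescaled to correlations. The sketch below does this for the matrix printed above, with the values copied by hand and treated as percent squared; the variable names are illustrative.

```python
import numpy as np

# Annualized factor covariances copied from the table above (percent squared)
labels = ["MKT", "SMB", "HML"]
cov = np.array(
    [
        [91.54, -2.73, -7.71],
        [-2.73, 97.80, 1.90],
        [-7.71, 1.90, 128.17],
    ]
)

vols = np.sqrt(np.diag(cov))  # annualized volatilities in percent
corr = cov / np.outer(vols, vols)  # rescale covariances to correlations

for i in range(3):
    for j in range(i + 1, 3):
        print(f"corr({labels[i]}, {labels[j]}) = {corr[i, j]:+.3f}")
```

Every pairwise correlation is below 0.1 in absolute value, and the implied volatilities match the annualized standard deviations reported in the factor statistics table.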

In[26]:

Code

# Calculate portfolio risk decomposition
portfolio_beta_vec = portfolio_betas.values

# Systematic risk (factor risk)
systematic_var = portfolio_beta_vec @ factor_cov.values @ portfolio_beta_vec

# Idiosyncratic risk
idio_var_annual = (idio_vol**2) * 12  # Annualize
portfolio_idio_var = np.sum(weights**2 * idio_var_annual)

# Total risk
total_var = systematic_var + portfolio_idio_var
total_vol = np.sqrt(total_var)

# Calculate percentages
sys_pct = (systematic_var / total_var) * 100
idio_pct = (portfolio_idio_var / total_var) * 100

Out[27]:

Console

Portfolio Risk Decomposition (Annualized)
==================================================
Systematic Variance: 0.011260 (71.2%)
Idiosyncratic Variance: 0.004560 (28.8%)
Total Variance: 0.015820
Total Volatility: 12.58%

Out[28]:

Visualization

Decomposition of total portfolio variance into systematic and idiosyncratic components. Systematic factor risk (blue) dominates the risk profile, accounting for about 71% of variance, while diversifiable idiosyncratic risk (purple) makes up the remaining 29%.

The decomposition shows that systematic factor risk dominates, accounting for about 71% of portfolio variance. The idiosyncratic component is still sizable at roughly 29% because five stocks provide only limited diversification. With more holdings, idiosyncratic risk would decrease further.

Marginal Contribution to Risk

Understanding which positions contribute most to portfolio risk helps with risk management. The marginal contribution to risk (MCTR) measures how much total volatility would change with a small increase in each position's weight:

$$\text{MCTR}_i = \frac{\partial \sigma_p}{\partial w_i} = \frac{\text{Cov}(R_i, R_p)}{\sigma_p}$$

where:

$\text{MCTR}_i$ : marginal contribution to risk of asset $i$
$\sigma_p$ : portfolio volatility
$w_i$ : weight of asset $i$
$R_i$ : return on asset $i$
$R_p$ : return on the portfolio

Intuitively, the MCTR shows that an asset's contribution to risk depends not on its standalone volatility, but on its covariance with the portfolio. Assets that move in sync with the portfolio increase risk, while those that move inversely can reduce it.

This insight has profound implications for portfolio construction. A highly volatile stock that is negatively correlated with the rest of the portfolio might actually reduce total portfolio risk when added. Conversely, a low-volatility stock that is highly correlated with existing holdings might substantially increase risk. The MCTR captures these portfolio-level effects that are invisible when examining securities in isolation.
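The sketch below illustrates both points with a hypothetical three-asset covariance matrix (the numbers are assumptions chosen for illustration). Asset C is the most volatile holding but is negatively correlated with the other two, so its marginal contribution to risk is negative. The example also checks that the weighted MCTRs add up to total portfolio volatility.

```python
import numpy as np

# Hypothetical annualized covariance matrix for assets A, B, C.
# Asset C is volatile but negatively correlated with A and B.
cov = np.array(
    [
        [0.040, 0.018, -0.020],
        [0.018, 0.030, -0.015],
        [-0.020, -0.015, 0.090],
    ]
)
w = np.array([0.45, 0.45, 0.10])

port_vol = np.sqrt(w @ cov @ w)

cov_with_port = cov @ w  # Cov(R_i, R_p) for each asset
mctr = cov_with_port / port_vol  # marginal contribution to risk
cctr = w * mctr  # component contribution to risk

print("MCTR:", np.round(mctr, 4))
print(f"Sum of CCTR: {cctr.sum():.6f}")
print(f"Portfolio volatility: {port_vol:.6f}")
```

Because volatility is homogeneous of degree one in the weights, the component contributions sum exactly to portfolio volatility, which is what allows risk to be attributed position by position.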

In[29]:

Code

# Calculate marginal contribution to risk for each stock
# MCTR = (Cov(R_i, R_p)) / sigma_p

# Stock covariance with portfolio factors
stock_factor_cov = stock_betas.values @ factor_cov.values @ portfolio_beta_vec

# Add idiosyncratic component for each stock
stock_portfolio_cov = stock_factor_cov + weights * idio_var_annual

# Marginal contribution to risk
mctr = stock_portfolio_cov / total_vol

# Component contribution to risk (weight * MCTR)
cctr = weights * mctr

# Percent contribution
pct_risk_contribution = (cctr / total_vol) * 100

# Create attribution table
attribution_df = pd.DataFrame(
    {
        "Weight": weights,
        "MCTR": mctr,
        "CCTR": cctr,
        "% of Risk": pct_risk_contribution,
    },
    index=stock_betas.index,
)

Out[30]:

Console

Risk Attribution by Stock
==================================================
Stock            Weight         MCTR         CCTR    % of Risk
--------------------------------------------------
Stock_A           20.0%       0.0930       0.0186        14.8%
Stock_B           20.0%       0.1027       0.0205        16.3%
Stock_C           20.0%       0.1534       0.0307        24.4%
Stock_D           20.0%       0.0801       0.0160        12.7%
Stock_E           20.0%       0.1997       0.0399        31.8%
--------------------------------------------------
Total            100.0%                    0.1258       100.0%

Stock E, with its high market beta and growth tilt, contributes disproportionately to risk: 31.8% of total volatility from a 20% weight. Stock D, by contrast, contributes only 12.7%. This analysis helps identify risk concentrations and informs rebalancing decisions.

Factor Risk Premia: Evidence and Interpretation

A central question in factor investing is whether factor exposures are compensated with higher expected returns. Let's examine the historical evidence for factor risk premia.

In[31]:

Code

# Download longer history for factor premium analysis
# ff_long = pdr.DataReader('F-F_Research_Data_Factors', 'famafrench',
#                          start='1963-07-01', end='2023-12-31')[0] / 100

# Add momentum factor
# ff_mom = pdr.DataReader('F-F_Momentum_Factor', 'famafrench',
#                         start='1963-07-01', end='2023-12-31')[0] / 100
# ff_mom.columns = ['MOM']

# Combine
# factors_full = ff_long.join(ff_mom, how='inner')
# factors_full.columns = ['MKT', 'SMB', 'HML', 'RF', 'MOM']

# Synthetic long-term data for demonstration (uncomment above for real data)
np.random.seed(42)
# Month-end dates; pandas 2.2+ prefers freq="ME" and warns on "M"
long_dates = pd.date_range(start="1963-07-01", end="2023-12-31", freq="M")
factors_full = pd.DataFrame(
    np.random.normal(0.005, 0.04, (len(long_dates), 5)), index=long_dates
)
factors_full.columns = ["MKT", "SMB", "HML", "RF", "MOM"]

In[32]:

Code

# Calculate statistics for the full sample
factor_list = ["MKT", "SMB", "HML", "MOM"]
full_stats = pd.DataFrame(index=factor_list)

full_stats["Mean (Annual %)"] = factors_full[factor_list].mean() * 1200
full_stats["Std (Annual %)"] = (
    factors_full[factor_list].std() * np.sqrt(12) * 100
)
full_stats["Sharpe Ratio"] = (
    full_stats["Mean (Annual %)"] / full_stats["Std (Annual %)"]
)
full_stats["t-statistic"] = (
    factors_full[factor_list].mean()
    / factors_full[factor_list].std()
    * np.sqrt(len(factors_full))
)

Out[33]:

Console

Factor Risk Premia (1963-2023)
============================================================
     Mean (Annual %)  Std (Annual %)  Sharpe Ratio  t-statistic
MKT             7.48           13.91          0.54         4.18
SMB             5.24           14.38          0.36         2.83
HML             5.91           13.64          0.43         3.37
MOM             9.91           14.04          0.71         5.49

All four factors show positive average returns over the full sample. The t-statistics help assess statistical significance: values above roughly 2.0 indicate a premium that is unlikely to be due to chance. Every factor clears that threshold here. Momentum (t = 5.49) and the market (t = 4.18) are the most significant, while SMB (2.83) and HML (3.37) show meaningful premia with lower significance. Keep in mind that the code above substitutes synthetic data for the real download, so these figures illustrate the calculation rather than the historical record.
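The t-statistic and the Sharpe ratio in this table are two scalings of the same monthly mean-to-volatility ratio, so the t-statistic equals the annualized Sharpe ratio times the square root of the sample length in years. The sketch below checks this against the rounded table values; the sample length of 726 months follows from the July 1963 to December 2023 date range, and the dictionary names are illustrative.

```python
import math

n_months = 726  # monthly observations, July 1963 through December 2023
years = n_months / 12

# Annualized Sharpe ratios and t-statistics copied from the table above
sharpe = {"MKT": 0.54, "SMB": 0.36, "HML": 0.43, "MOM": 0.71}
reported_t = {"MKT": 4.18, "SMB": 2.83, "HML": 3.37, "MOM": 5.49}

# t = (mean / std) * sqrt(N) and Sharpe = (mean / std) * sqrt(12),
# so t = Sharpe * sqrt(N / 12)
implied_t = {name: sr * math.sqrt(years) for name, sr in sharpe.items()}

for name in sharpe:
    print(
        f"{name}: implied t = {implied_t[name]:.2f}, "
        f"reported t = {reported_t[name]:.2f}"
    )
```

The small gaps come from rounding the Sharpe ratios to two decimals. The relationship also shows why a modest Sharpe ratio needs decades of data to become statistically significant.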

Out[34]:

Visualization

Annualized risk premia (1963–2023) for the four major factors with 95% confidence intervals. Momentum (MOM) shows the largest premium at 9.9%, followed by the Market factor (MKT) at 7.5%, while Size (SMB) and Value (HML) also show statistically significant positive returns over the long run.

Rolling Factor Performance

Factor premia are not constant over time. Let's examine how factor performance has varied across different periods.

In[35]:

Code

# Calculate rolling 5-year Sharpe ratios
window = 60  # 5 years of monthly data

rolling_mean = factors_full[factor_list].rolling(window=window).mean() * 12
rolling_std = factors_full[factor_list].rolling(window=window).std() * np.sqrt(
    12
)
rolling_sharpe = rolling_mean / rolling_std

Out[36]:

Visualization

Line chart showing rolling 5-year Sharpe ratios for MKT, SMB, HML, and MOM factors. — Rolling 5-year annualized Sharpe ratios for Fama-French and Momentum factors. The variation in Sharpe ratios highlights the cyclicality of factor premiums, with Momentum (green) and Market (blue) factors showing periods of significant outperformance followed by reversals.

The rolling analysis reveals substantial time variation in factor performance. Some observations:

The market factor shows persistent positive Sharpe ratios but with significant variation
SMB performance was strong in the 1970s-1980s but weakened substantially after 2000
HML showed strong performance through the 1990s but turned negative in the 2010s
Momentum displays high volatility, with occasional sharp drawdowns (notably 2009)

This time variation raises important questions: Do factor premia persist because they compensate for risk, or were historical patterns data-mined anomalies that have since been arbitraged away?

Economic Interpretations of Factor Premia

Several theories attempt to explain why factor premia exist. Understanding these theories helps practitioners form views about whether premia will persist in the future.

Risk-based explanations argue that factors proxy for systematic risks. Small stocks may earn higher returns because they are more vulnerable to economic downturns. When the economy contracts, small firms often lack the financial resources, customer diversification, and market power to weather the storm. Investors who hold small stocks bear this recession risk and demand higher expected returns as compensation. Value stocks may be riskier because they often represent distressed firms or those facing structural challenges. A company with a high book-to-market ratio may be cheap because investors doubt its future prospects. Holding such companies exposes investors to the risk that these doubts prove justified.

Behavioral explanations suggest factors arise from investor biases. Investors may overpay for glamorous growth stocks, creating the value premium. The allure of companies with exciting products and rapid growth may cause investors to extrapolate past success too far into the future, bidding prices above fundamental value. Momentum may reflect slow information diffusion or herding behavior. When good news emerges, it may take time for all investors to learn and process the information, causing prices to adjust gradually rather than instantaneously.

Limits to arbitrage explanations note that even if mispricings exist, arbitraging them is costly and risky. Short-selling constraints, career risk for fund managers, and factor crash risk may prevent full correction of factor-related mispricings. A fund manager who shorts overvalued growth stocks might be correct in the long run but could face client redemptions if the strategy underperforms in the short run. This career risk limits the capital that flows to correct mispricings.

The debate continues, but the practical implication is clear: factor exposures matter for understanding portfolio risk, regardless of whether the premia persist.

Multi-Factor Model Applications

Factor models serve multiple purposes in quantitative finance. Let's explore some key applications.

Performance Attribution

When evaluating a portfolio manager's performance, factor models separate skill (alpha) from systematic risk exposures (factor returns). A manager who outperformed the market may have done so simply by taking more factor risk rather than through superior stock selection.

This distinction matters enormously for investors deciding whether to pay active management fees. If a manager's outperformance comes entirely from factor tilts, investors could achieve similar results at lower cost by using passive factor-based strategies. True alpha, the return that cannot be explained by factor exposures, represents genuine skill or information advantage that may justify higher fees.

In[37]:

Code

# Simulate a portfolio manager's returns
np.random.seed(456)

# Manager's factor exposures
mgr_betas = {"MKT": 1.1, "SMB": 0.2, "HML": -0.3, "MOM": 0.15}
mgr_alpha = 0.002  # 0.2% monthly alpha

# Use a subset of data for this example
analysis_period = factors_full.loc["2020-01":"2023-12"]
n_periods = len(analysis_period)

# Generate manager returns
mgr_idio = np.random.normal(0, 0.02, n_periods)
mgr_returns = (
    mgr_alpha
    + mgr_betas["MKT"] * analysis_period["MKT"].values
    + mgr_betas["SMB"] * analysis_period["SMB"].values
    + mgr_betas["HML"] * analysis_period["HML"].values
    + mgr_betas["MOM"] * analysis_period["MOM"].values
    + mgr_idio
)

mgr_df = pd.DataFrame(
    {"manager_return": mgr_returns}, index=analysis_period.index
)

In[38]:

Code

# Decompose returns into factor contributions
X_attr = analysis_period[["MKT", "SMB", "HML", "MOM"]]
X_attr = sm.add_constant(X_attr)
model_attr = sm.OLS(mgr_df["manager_return"], X_attr).fit()

# Calculate return attribution
avg_factor_returns = analysis_period[["MKT", "SMB", "HML", "MOM"]].mean()
factor_contributions = (
    model_attr.params[["MKT", "SMB", "HML", "MOM"]] * avg_factor_returns
)

# Prepare attribution stats
total_avg_return = mgr_df["manager_return"].mean()
estimated_alpha = model_attr.params["const"]
total_factor_return = factor_contributions.sum()
explained_return = estimated_alpha + total_factor_return

Out[39]:

Console

Performance Attribution (Monthly)
==================================================
Total Average Return: 0.0075
Estimated Alpha: 0.0044

Factor Contributions:
  MKT: 0.0058 (beta=1.073)
  SMB: 0.0012 (beta=0.214)
  HML: -0.0044 (beta=-0.291)
  MOM: 0.0005 (beta=0.130)

Total Factor Return: 0.0031
Alpha + Factor Returns: 0.0075

Out[40]:

Visualization

Performance attribution waterfall decomposing manager returns into alpha and factor components. While the manager generates positive alpha (green), the majority of the total return (blue) comes from systematic market exposure (MKT), demonstrating the importance of separating skill from beta.

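This decomposition shows how much of the manager's return came from factor tilts versus stock selection skill. In this example, market exposure contributes the most to returns (0.58% per month), followed by alpha (0.44%), while the growth tilt from the negative HML loading subtracts 0.44% per month.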
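The attribution adds up exactly because an OLS regression with an intercept has residuals that average to zero, so the mean return equals alpha plus the sum of each beta times its average factor return. The sketch below demonstrates this with NumPy's least-squares solver on hypothetical two-factor data; the loadings, seed, and sample size are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monthly factor returns (two factors) and a fund driven by them
n_obs = 48
factors = rng.normal(0.005, 0.04, size=(n_obs, 2))
fund = 0.002 + factors @ np.array([1.1, -0.3]) + rng.normal(0, 0.02, n_obs)

# OLS with an intercept: the first column of X is the constant
X = np.column_stack([np.ones(n_obs), factors])
coef, *_ = np.linalg.lstsq(X, fund, rcond=None)
alpha_hat, betas_hat = coef[0], coef[1:]

# Contribution of each factor: beta_j times the average factor return
factor_contrib = betas_hat * factors.mean(axis=0)
explained = alpha_hat + factor_contrib.sum()

print(f"Average fund return:    {fund.mean():.6f}")
print(f"Alpha + factor returns: {explained:.6f}")
```

The two printed numbers agree up to floating-point error. The identity holds in sample by construction, so it says nothing about whether the estimated alpha is statistically distinguishable from zero.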

Risk Budgeting

Factor models enable risk budgeting: allocating a portfolio's total risk across factors according to a target risk profile. This approach is increasingly used in institutional portfolio management.

The concept of risk budgeting treats risk as a scarce resource to be allocated deliberately. Just as a household budgets its income across different spending categories, an institutional investor budgets its risk tolerance across different risk sources. Factor models provide the framework for measuring and managing these risk allocations.

In[41]:

Code

# Calculate risk attribution for the manager's portfolio
factor_cov_full = analysis_period[["MKT", "SMB", "HML", "MOM"]].cov() * 12
mgr_beta_vec = np.array(
    [mgr_betas["MKT"], mgr_betas["SMB"], mgr_betas["HML"], mgr_betas["MOM"]]
)

# Marginal contribution to variance
mcv = factor_cov_full.values @ mgr_beta_vec
ccv = mgr_beta_vec * mcv  # Component contribution to variance

# Idiosyncratic variance
idio_var = 0.02**2 * 12

# Total variance
total_var_mgr = mgr_beta_vec @ factor_cov_full.values @ mgr_beta_vec + idio_var

# Risk contributions
risk_contributions = {
    "MKT": ccv[0] / total_var_mgr,
    "SMB": ccv[1] / total_var_mgr,
    "HML": ccv[2] / total_var_mgr,
    "MOM": ccv[3] / total_var_mgr,
    "Idiosyncratic": idio_var / total_var_mgr,
}

Out[42]:

Visualization

Percentage contribution of each factor to total portfolio variance. Market risk (MKT) accounts for the majority of portfolio volatility, followed by Size (SMB) and Momentum (MOM), illustrating that diversification across factors does not eliminate market dependence.

The chart illustrates the dominance of market risk in the portfolio. Despite holding five stocks with various factor tilts, the general market movement still accounts for the majority of the portfolio's volatility.

Factor-Based Portfolio Construction

Factor models also guide portfolio construction. A manager might target specific factor exposures while minimizing idiosyncratic risk:

In[43]:

Code

# Target factor exposures
target_betas = {"MKT": 1.0, "SMB": 0.0, "HML": 0.3, "MOM": 0.0}

Out[44]:

Console

Factor-Targeted Portfolio Construction
==================================================

Target Factor Exposures:
  MKT: 1.00
  SMB: 0.00
  HML: 0.30
  MOM: 0.00

This portfolio would match market risk (beta = 1.0), be size-neutral, tilt toward value stocks (positive HML), and be momentum-neutral.

Building such a portfolio requires optimizing stock weights subject to factor exposure constraints. We'll explore this further in the upcoming chapter on Advanced Portfolio Construction Techniques.
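As a rough illustration of what that optimization looks like, the sketch below minimizes idiosyncratic variance subject to the four factor targets and a full-investment constraint. With only equality constraints the problem has a closed-form solution. The stock loadings and idiosyncratic variances are randomly generated placeholders, and short positions are allowed, so this is a simplified stand-in rather than a production construction method.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n_stocks = 30
factor_names = ["MKT", "SMB", "HML", "MOM"]
target_betas = {"MKT": 1.0, "SMB": 0.0, "HML": 0.3, "MOM": 0.0}

# Hypothetical stock-level loadings (n_stocks x 4)
B = np.column_stack([
    rng.normal(1.0, 0.3, n_stocks),  # MKT betas
    rng.normal(0.2, 0.5, n_stocks),  # SMB betas
    rng.normal(0.0, 0.5, n_stocks),  # HML betas
    rng.normal(0.0, 0.4, n_stocks),  # MOM betas
])
# Hypothetical monthly idiosyncratic variances
idio_var = rng.uniform(0.02, 0.10, n_stocks) ** 2

# Constraints A.T @ w = b: four factor targets plus weights summing to 1
A = np.column_stack([B, np.ones(n_stocks)])
b = np.array([target_betas[f] for f in factor_names] + [1.0])

# Minimize w' D w subject to A' w = b, with D = diag(idio_var).
# Closed form: w = D^-1 A (A' D^-1 A)^-1 b
Dinv_A = A / idio_var[:, None]
weights = Dinv_A @ np.linalg.solve(A.T @ Dinv_A, b)

for name, value in zip(factor_names, B.T @ weights):
    print(f"{name}: target {target_betas[name]:.2f}, achieved {value:.2f}")
print(f"Sum of weights: {weights.sum():.2f}")
```

Adding long-only bounds or position limits removes the closed form, and the problem then needs a numerical quadratic-programming solver.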

Key Parameters

The key parameters for the multi-factor models discussed in this chapter are:

MKT: Market factor return ($R_m - r_f$). A proxy for broad equity market risk.
SMB: Size factor return (Small Minus Big). The return difference between small-cap and large-cap stocks.
HML: Value factor return (High Minus Low). The return difference between value (high B/M) and growth (low B/M) stocks.
MOM: Momentum factor return (Up Minus Down). The return difference between recent winners and losers.
$\beta_i^j$: Factor loading. The sensitivity of asset $i$ to factor $j$, estimated via time-series regression.
$\lambda_j$: Factor risk premium. The expected excess return earned per unit of exposure to factor $j$.
$\alpha_i$: Alpha. The abnormal return of asset $i$ not explained by the factor exposures.

Limitations and Practical Considerations

The Factor Zoo Problem

Academic research has documented hundreds of factors that allegedly predict returns. Harvey, Liu, and Zhu (2016) catalogued more than 300 factors proposed in the literature and argued that many are likely false positives. This "factor zoo" creates challenges:

Data mining: With enough testing, random patterns appear significant
Publication bias: Journals prefer significant results, so null findings go unreported
Overfitting: Models with many factors may fit historical data but fail out-of-sample
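The data-mining point is easy to reproduce. The sketch below simulates candidate "factors" whose true premium is zero and counts how many nevertheless clear the usual significance hurdle. The number of candidates, sample length, and volatility are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
n_factors = 300     # candidate factors, all with zero true premium
n_months = 600      # 50 years of monthly returns
monthly_vol = 0.03  # assumed monthly volatility

# Pure-noise factor returns: no candidate has a real premium
returns = rng.normal(0.0, monthly_vol, size=(n_months, n_factors))

# t-statistic of the mean return for each candidate
std_err = returns.std(axis=0, ddof=1) / np.sqrt(n_months)
t_stats = returns.mean(axis=0) / std_err

n_sig_2 = int((np.abs(t_stats) > 1.96).sum())
n_sig_3 = int((np.abs(t_stats) > 3.0).sum())
print(f"'Significant' at |t| > 1.96: {n_sig_2} of {n_factors}")
print(f"'Significant' at |t| > 3.00: {n_sig_3} of {n_factors}")
```

Under the null, about 5% of candidates (roughly 15 of 300) are expected to pass the 1.96 threshold by chance alone. A stricter hurdle such as |t| > 3, in the spirit of the higher thresholds Harvey, Liu, and Zhu argue for, filters out most of them.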

Practitioners typically focus on factors with strong theoretical justification, robust evidence across time periods and markets, and economic magnitude sufficient to survive transaction costs.

Estimation Challenges

Factor loadings are estimated with error, and these errors can compound in portfolio optimization. Several issues arise:

Time-varying betas: Factor exposures change as firms evolve, making historical estimates unreliable
Multicollinearity: When factors are correlated, individual beta estimates become unstable
Survivorship bias: Databases often exclude failed companies, biasing factor premium estimates

Robust estimation techniques, Bayesian shrinkage methods, and ensemble approaches help address these challenges.
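To make the time-varying beta problem concrete, the sketch below simulates a stock whose market beta shifts halfway through the sample and compares a full-sample estimate with rolling 60-month estimates. The beta levels, volatilities, and window length are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
n_months, window = 240, 60

market = rng.normal(0.006, 0.045, n_months)
# Assumed true beta: 0.6 in the first half, 1.6 in the second
true_beta = np.where(np.arange(n_months) < n_months // 2, 0.6, 1.6)
stock = true_beta * market + rng.normal(0.0, 0.02, n_months)


def ols_beta(y, x):
    """Slope of a univariate OLS regression of y on x (with intercept)."""
    return np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)


full_sample_beta = ols_beta(stock, market)
rolling_betas = np.array([
    ols_beta(stock[t - window:t], market[t - window:t])
    for t in range(window, n_months + 1)
])

print(f"Full-sample beta:    {full_sample_beta:.2f}")
print(f"First rolling beta:  {rolling_betas[0]:.2f}")
print(f"Latest rolling beta: {rolling_betas[-1]:.2f}")
```

The full-sample estimate lands between the two regimes and describes neither. The rolling estimate tracks the shift, but it is noisier because each window uses fewer observations.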

Transaction Costs and Implementation

Factor strategies require periodic rebalancing as firm characteristics change. This creates turnover and transaction costs that can erode gross returns. The momentum factor is particularly affected because it requires frequent trading to maintain exposure to recent winners.
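A back-of-the-envelope calculation shows the mechanics. The helper `net_premium` and all inputs below are hypothetical, chosen only to contrast a low-turnover strategy with a high-turnover one. They are not empirical estimates of any factor's premium or trading cost.

```python
def net_premium(gross_premium, annual_turnover, cost_per_trade):
    """Gross annual premium minus trading costs.

    annual_turnover: value traded per year (buys plus sells) as a
        fraction of portfolio value.
    cost_per_trade: cost per unit of value traded (e.g. 0.002 = 20 bps).
    """
    return gross_premium - annual_turnover * cost_per_trade


# Hypothetical inputs, for illustration only
low_turnover_net = net_premium(0.04, annual_turnover=0.5, cost_per_trade=0.002)
high_turnover_net = net_premium(0.06, annual_turnover=4.0, cost_per_trade=0.002)

print(f"Low-turnover strategy net premium:  {low_turnover_net:.4f}")
print(f"High-turnover strategy net premium: {high_turnover_net:.4f}")
```

With these inputs the high-turnover strategy gives up 0.8 percentage points per year to trading costs, against 0.1 for the low-turnover one. The gap widens quickly as per-trade costs rise.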

Factor timing, attempting to increase exposure to factors expected to perform well, adds another layer of complexity. While factor premia are somewhat predictable using valuation spreads and other signals, reliable timing remains elusive.

Model Risk

Factor models, like all models, are simplifications of reality. Using them requires acknowledging their limitations:

Factors may not capture all sources of systematic risk
The relationship between factors and returns may change over time
Extreme events may not be well-described by normal factor distributions
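The last point can be illustrated by comparing tail frequencies under a normal distribution and a fat-tailed alternative with the same variance. The Student-t distribution with 3 degrees of freedom used here is an assumption for illustration, not a claim about the true distribution of factor returns.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
n_obs, df, threshold = 1_000_000, 3, 4.0

normal_draws = rng.standard_normal(n_obs)
# Student-t draws rescaled to unit variance
# (the variance of a t distribution is df / (df - 2) for df > 2)
t_draws = rng.standard_t(df, n_obs) / np.sqrt(df / (df - 2))

# Count observations more than `threshold` standard deviations from zero
normal_tail = int((np.abs(normal_draws) > threshold).sum())
t_tail = int((np.abs(t_draws) > threshold).sum())

print(f"Moves beyond {threshold:.0f} sigma, normal:    {normal_tail}")
print(f"Moves beyond {threshold:.0f} sigma, Student-t: {t_tail}")
```

Both samples have the same standard deviation, yet the fat-tailed one produces far more extreme observations. A risk model calibrated only to variances and covariances would miss this difference.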

As we'll explore in Part V on Risk Management, model risk is a form of operational risk that must be managed through model validation, stress testing, and skepticism about model outputs.

Summary

This chapter developed the Arbitrage Pricing Theory framework and its practical implementation through multi-factor models. The key concepts covered include:

APT provides a framework for asset pricing that does not rely on a full market equilibrium. By assuming a factor structure for returns and imposing no-arbitrage conditions, it derives a linear relationship between expected returns and factor exposures without CAPM's restrictive assumptions.

Multi-factor models explain returns through multiple systematic risk sources. The Fama-French factors (market, size, value, profitability, investment) and momentum have become standard tools for understanding return variation and evaluating performance.

Factor loadings are estimated through time-series regression. Regressing asset excess returns on factor returns yields estimates of systematic risk exposures and alpha. Cross-sectional regression provides estimates of factor risk premia.

Factor models enable risk decomposition and attribution. Total portfolio risk splits into systematic (factor) and idiosyncratic components. This decomposition supports risk budgeting, performance attribution, and portfolio construction.

Factor premia vary over time and may not persist. Historical evidence shows meaningful factor premia, but individual periods can differ dramatically from long-term averages. Whether premia reflect risk compensation, behavioral biases, or data mining remains debated.

The next chapter on Portfolio Performance Measurement will build on these concepts, showing how to evaluate investment returns after accounting for factor exposures and risk taken.


Reference

BibTeX

@misc{aptandmultifactormodelsfamafrenchfactorsexplained, author = {Michael Brenndoerfer}, title = {APT and Multi-Factor Models: Fama-French Factors Explained}, year = {2025}, url = {https://mbrenndoerfer.com/writing/arbitrage-pricing-theory-multi-factor-models}, organization = {mbrenndoerfer.com}, note = {Accessed: 2025-01-01} }


About the author: Michael Brenndoerfer

All opinions expressed here are my own and do not reflect the views of my employer.

Michael currently works as an Associate Director of Data Science at EQT Partners in Singapore, leading AI and data initiatives across private capital investments.

With over a decade of experience spanning private equity, management consulting, and software engineering, he specializes in building and scaling analytics capabilities from the ground up. He has published research in leading AI conferences and holds expertise in machine learning, natural language processing, and value creation through data.
