
N-BEATS: Neural Basis Expansion Analysis for Time Series Forecasting

Michael Brenndoerfer · November 16, 2025 · 44 min read · 10,668 words

Complete guide to N-BEATS, an interpretable deep learning architecture for time series forecasting. Learn how N-BEATS decomposes time series into trend and seasonal components, understand the mathematical foundation, and implement it in PyTorch.


N-BEATS: Neural Basis Expansion Analysis for Time Series

N-BEATS (Neural Basis Expansion Analysis for Time Series) is a deep learning architecture specifically designed for time series forecasting that combines the interpretability of traditional statistical methods with the flexibility of neural networks. Unlike many black-box deep learning approaches, N-BEATS provides clear insights into how it makes predictions by decomposing forecasts into interpretable components like trend and seasonality.

The architecture was introduced to address a fundamental challenge in time series forecasting: creating models that are both highly accurate and interpretable. Traditional statistical methods like ARIMA or exponential smoothing are interpretable but may lack the flexibility to capture complex patterns, while deep learning models can capture intricate relationships but often operate as black boxes. N-BEATS bridges this gap by using a specialized neural network architecture that explicitly models trend and seasonal components.

N-BEATS operates by learning to decompose time series into interpretable basis functions, similar to how Fourier analysis breaks down signals into sine and cosine components. The model learns to identify and extrapolate trends and seasonal patterns separately, then combines them to produce forecasts. This decomposition approach makes the model's predictions transparent and allows practitioners to understand which components are driving the forecast.

Advantages

N-BEATS offers several key advantages that make it particularly valuable for time series forecasting. First, its interpretable architecture allows you to understand exactly how the model makes predictions by examining the trend and seasonal components separately. This transparency is important for business applications where stakeholders need to understand and trust the forecasting process. The model can identify whether a forecast is driven primarily by trend, seasonality, or both, providing valuable insights for decision-making.

Second, N-BEATS demonstrates strong performance across diverse time series without requiring extensive hyperparameter tuning or domain-specific knowledge. The architecture is designed to be robust and general-purpose, often outperforming both traditional statistical methods and other deep learning approaches on standard benchmarks. This makes it particularly valuable when working with multiple time series that may have different characteristics.

Finally, the model's modular design allows for easy extension and customization. You can modify the basis functions, add new components, or adjust the architecture to better suit specific forecasting problems. This flexibility makes N-BEATS adaptable to various domains, from financial forecasting to demand planning and beyond.

Disadvantages

Despite its strengths, N-BEATS has several limitations that practitioners should consider. The model's computational requirements can be substantial, especially for long time series or when training on large datasets. The deep architecture with multiple fully connected layers requires significant computational resources, which may limit its applicability in resource-constrained environments or when real-time forecasting is needed.

Another limitation is that N-BEATS assumes the time series can be decomposed into trend and seasonal components, which may not always be appropriate. Some time series have complex, non-stationary patterns that don't fit this decomposition framework well. In such cases, the model may struggle to capture the underlying dynamics effectively, leading to poor forecasts.

The model also requires sufficient historical data to learn the underlying patterns effectively. For very short time series or those with limited historical context, N-BEATS may not have enough information to identify meaningful trends and seasonality, potentially leading to overfitting or poor generalization. Additionally, the interpretability comes at the cost of some flexibility compared to more general deep learning architectures that can capture arbitrary non-linear relationships.

Formula

Imagine you're analyzing monthly sales data for a retail store. The numbers tell a story: steady growth over the years, but also predictable spikes every December. The challenge is that these two patterns, the long-term growth and the seasonal cycles, are woven together in a single stream of numbers. How do we separate them? More importantly, how do we predict future values when both patterns might evolve in complex ways?

This is the fundamental problem N-BEATS addresses: forecasting time series that contain multiple overlapping patterns. The solution lies in a mathematical idea called basis expansion, the same principle that allows us to decompose a musical chord into individual notes, or break down a complex signal into simpler frequencies.

The Core Insight: Decomposition as the Foundation

Before diving into formulas, let's build intuition. When you look at a time series, what you're seeing is the sum of different underlying forces:

  1. Long-term trends: The overall direction. Is the business growing, declining, or stable?
  2. Seasonal patterns: Repeating cycles. Do sales peak in certain months, days, or seasons?
  3. Random noise: Unpredictable fluctuations that we can't model

The key insight is that if we can mathematically separate these components, we can:

  • Understand what's driving our forecasts (interpretability)
  • Extrapolate each component independently (flexibility)
  • Learn the right combination from data (adaptability)

This separation is exactly what N-BEATS does. The model doesn't just predict numbers. It decomposes the time series into interpretable building blocks, then combines them to produce forecasts.

Formalizing the Decomposition

To make this intuition precise, N-BEATS expresses any time series value $y_t$ as the sum of trend components, seasonal components, and a small error term. This is the fundamental decomposition structure:

$$y_t = \sum_{k=1}^{K} g_k^t(x_t) + \sum_{k=1}^{K} g_k^s(x_t) + \epsilon_t$$

where:

  • $y_t$: observed value at time $t$
  • $g_k^t(x_t)$: contribution of the $k$-th trend basis function at time $t$ (captures long-term patterns)
  • $g_k^s(x_t)$: contribution of the $k$-th seasonal basis function at time $t$ (captures periodic patterns)
  • $K$: number of basis functions for each component type
  • $\epsilon_t$: error term (residual noise we can't predict)
  • $x_t$: input features at time $t$ (typically a window of historical values)

Why this structure matters: This equation tells us that every observation is built from two types of interpretable components plus noise. The trend components capture long-term direction and growth patterns, while the seasonal components capture repeating cycles. By separating these, we can understand not just what the forecast is, but why it takes that value.

Note: When making forecasts, this decomposition is applied at each forecast horizon $h$. The basis functions $g_k^t$ and $g_k^s$ are evaluated using coefficients learned by the neural network, which we'll see in detail below.

But here's the key question: How do we choose the right basis functions $g_k^t$ and $g_k^s$? We need functions that:

  1. Can represent the types of patterns we see in real time series
  2. Allow us to learn how much each function contributes to the forecast
  3. Maintain interpretability so we can understand what's driving predictions

This is where N-BEATS' mathematical design addresses each component systematically.

Trend Basis Functions: Capturing Growth Patterns

When we examine a time series, trends reveal the underlying direction of change. But trends come in many shapes:

  • A startup's revenue might grow steadily month after month (linear growth)
  • A company gaining market share might start slow and accelerate as it scales (quadratic growth)
  • A product lifecycle might show growth, then plateau, then decline (cubic with inflection points)

The challenge: We don't know in advance which pattern our time series will exhibit. We need a flexible way to represent all these possibilities.

Why polynomials? Polynomials are mathematical functions that can represent different types of growth patterns. The key insight is that we don't have to choose just one. We can include multiple polynomial degrees and let the model learn how much each one contributes. This is the foundation of N-BEATS' trend modeling.

For a forecast horizon $H$ (the number of future periods we want to predict), we compute the trend component at each forecast step $h$. The trend component at horizon $h$ is:

$$\text{trend}_h = \sum_{k=1}^{K} \alpha_{k,h} \cdot \left(\frac{h}{H}\right)^k$$

where:

  • $\alpha_{k,h}$: learned coefficient for the $k$-th trend basis function at forecast horizon $h$
  • $H$: total forecast horizon (e.g., 6 months, 12 months)
  • $h$: specific forecast step within the horizon ($h \in \{1, 2, \ldots, H\}$)
  • $k$: polynomial degree (typically $k \in \{1, 2, 3\}$)
  • $K$: number of polynomial basis functions used

Understanding the normalization: The term $\left(\frac{h}{H}\right)^k$ is carefully designed. By normalizing the horizon $h$ by the total horizon $H$, we create a scale that ranges from 0 to 1. This normalization is important because it makes the basis functions scale-invariant. The same model architecture works whether we're forecasting 6 months or 12 months ahead. Without this normalization, we'd need different models for different forecast horizons.

How the exponent $k$ shapes the pattern:

The exponent $k$ determines the mathematical shape of the trend pattern:

  1. $k=1$ (linear): The function $\frac{h}{H}$ creates a straight line. This works well for steady, consistent growth or decline. If the learned coefficient $\alpha_{1,h}$ is positive, the trend increases linearly; if negative, it decreases linearly.

  2. $k=2$ (quadratic): The function $\left(\frac{h}{H}\right)^2$ creates a curve that accelerates or decelerates. This captures scenarios where growth starts slow and speeds up (positive acceleration), or starts fast and slows down (negative acceleration).

  3. $k=3$ (cubic): The function $\left(\frac{h}{H}\right)^3$ creates more complex curves with inflection points, places where the curve changes direction. This allows the model to capture trends that grow, plateau, and then decline, or other complex directional changes.

The power of learned coefficients: The coefficients $\alpha_{k,h}$ make this approach flexible. Rather than forcing a single polynomial degree, N-BEATS learns coefficients for multiple degrees simultaneously. The neural network examines the historical data and automatically determines: "This time series needs 70% linear trend, 20% quadratic trend, and 10% cubic trend."

This automatic selection makes N-BEATS both flexible (it adapts to different patterns) and interpretable (we can see which trend components are driving the forecast). The model fits a curve by decomposing the trend into understandable components.
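
To see how a weighted sum of polynomial basis functions produces a trend curve, here is a minimal numpy sketch. The coefficient values are made up for illustration; in the real model they come from the neural network and also vary with the forecast step $h$.

import numpy as np

H = 6                     # forecast horizon
h = np.arange(1, H + 1)   # forecast steps 1..H

# Hypothetical coefficients for k = 1, 2, 3 (the network would output these)
alpha = {1: 0.7, 2: 0.2, 3: 0.1}

# Evaluate each normalized polynomial basis function (h/H)^k
basis = {k: (h / H) ** k for k in alpha}

# Weighted sum gives the trend component at each forecast step
trend = sum(alpha[k] * basis[k] for k in alpha)
print(trend)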

Seasonal Basis Functions: Modeling Periodic Patterns

While trends capture long-term direction, seasonal patterns capture repeating cycles. Think of retail sales: they might peak every December (yearly seasonality), or show weekly patterns with higher sales on weekends. These patterns repeat at regular intervals, creating waves in the time series.

The challenge: Seasonal patterns can have different shapes. Some are smooth sine waves, others have multiple peaks and valleys within a single cycle. How do we represent all these possibilities?

Why trigonometric functions? Trigonometric functions (sine and cosine) are the natural mathematical language for periodic patterns. This isn't arbitrary. It's grounded in Fourier analysis, which tells us that any periodic signal can be decomposed into sine and cosine waves. This theoretical foundation gives us confidence that trigonometric basis functions can represent the seasonal patterns we see in real time series.

The seasonal component at forecast horizon $h$ is:

$$\text{seasonal}_h = \sum_{k=1}^{K} \left[ \beta_{k,h} \cdot \sin\left(\frac{2\pi k h}{P}\right) + \gamma_{k,h} \cdot \cos\left(\frac{2\pi k h}{P}\right) \right]$$

where:

  • $\beta_{k,h}, \gamma_{k,h}$: learned coefficients for sine and cosine components of the $k$-th harmonic at forecast horizon $h$
  • $P$: seasonal period (e.g., 12 for monthly data with yearly seasonality)
  • $k$: harmonic frequency (typically $k \in \{1, 2, 3, 4\}$)
  • $h$: specific forecast step within the horizon ($h \in \{1, 2, \ldots, H\}$)
  • $K$: number of harmonic basis functions used

Why both sine and cosine? This is an important question. A pure sine wave starts at zero and oscillates, but real seasonal patterns can start at any point in the cycle. For example, retail sales might peak in December (month 12), but a sine wave that starts at zero would peak at a different time.

The solution lies in phase shifts. Mathematically, any sinusoidal pattern can be written as $A \sin(\omega t + \phi)$, where $\phi$ is the phase shift. Using trigonometric identities, this can be expanded to $A_1 \sin(\omega t) + A_2 \cos(\omega t)$. By combining sine and cosine with learned coefficients, we can represent any phase-shifted pattern. The coefficients $\beta_{k,h}$ and $\gamma_{k,h}$ automatically learn the right combination to match the phase of the actual seasonal pattern.
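
To make the phase-shift identity concrete, the following quick numerical check (with an arbitrary amplitude and phase) confirms that a shifted sine wave equals a weighted sum of an unshifted sine and cosine.

import numpy as np

t = np.linspace(0, 24, 200)
omega, A, phi = 2 * np.pi / 12, 20.0, 1.3   # arbitrary frequency, amplitude, phase

shifted = A * np.sin(omega * t + phi)        # phase-shifted seasonal wave
combo = A * np.cos(phi) * np.sin(omega * t) + A * np.sin(phi) * np.cos(omega * t)

print(np.allclose(shifted, combo))           # True: sine + cosine capture any phase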

Understanding harmonics with parameter $k$: The parameter $k$ controls the frequency of oscillations. When $k=1$, we capture the fundamental seasonal pattern, the main yearly cycle, for example. But real seasonal patterns are often more complex.

Consider a retail time series: it might have a yearly cycle (higher in December) but also sub-cycles within the year (higher in spring and fall). Higher values of $k$ capture these harmonics:

  • $k=1$: The fundamental pattern (one complete cycle per period)
  • $k=2$: Patterns that repeat twice per cycle (e.g., spring and fall peaks)
  • $k=3$: Patterns that repeat three times per cycle
  • $k=4$: Even more complex patterns with multiple peaks and valleys

By including multiple harmonics, N-BEATS can represent complex seasonal patterns that have multiple peaks and valleys within a single cycle. The model learns which harmonics are needed for each specific time series.

The role of period $P$: The period $P$ determines the length of one complete cycle. For monthly data with yearly seasonality, $P=12$ means the pattern repeats every 12 months. The model learns the coefficients $\beta_{k,h}$ and $\gamma_{k,h}$ to determine both:

  1. Amplitude: How strong the seasonal effect is (the height of the peaks)
  2. Phase: When the peaks and valleys occur (the timing within the cycle)

This automatic learning allows the model to adapt to different seasonal patterns without manual tuning. The same architecture can handle yearly seasonality ($P=12$), quarterly patterns ($P=4$), or weekly patterns ($P=7$), simply by adjusting the period parameter.
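
The sketch below evaluates the seasonal sum for a monthly series with yearly seasonality ($P=12$), using two harmonics and made-up coefficient values standing in for what the network would learn.

import numpy as np

P, H = 12, 6              # seasonal period and forecast horizon
h = np.arange(1, H + 1)   # forecast steps 1..H

# Hypothetical learned weights for harmonics k = 1 and k = 2
beta = {1: 15.0, 2: 4.0}    # sine coefficients
gamma = {1: 8.0, 2: -2.0}   # cosine coefficients

seasonal = sum(
    beta[k] * np.sin(2 * np.pi * k * h / P) + gamma[k] * np.cos(2 * np.pi * k * h / P)
    for k in beta
)
print(seasonal)  # seasonal contribution at each of the next 6 steps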

Neural Network Architecture: Learning Coefficients from Data

Here's where N-BEATS becomes particularly effective: the coefficients $\alpha_{k,h}$, $\beta_{k,h}$, and $\gamma_{k,h}$ are not fixed values chosen by hand, but are learned automatically from data by a neural network. This automatic learning makes N-BEATS adaptive and general-purpose.

The network examines historical patterns in your specific time series and automatically determines: "For this particular dataset, I need 60% linear trend, 30% quadratic trend, and 10% cubic trend, with a seasonal pattern that peaks in month 6 and has a secondary peak in month 12." Feed it different data, and it learns a different decomposition.

Why a neural network? The relationship between historical patterns and future coefficients is complex and non-linear. Consider this: if recent values are increasing rapidly, we might need strong linear and quadratic trend components. But if the increase is accelerating, we might need more quadratic than linear. If there's a seasonal peak approaching, we need specific sine/cosine coefficients to capture that phase.

A simple linear model couldn't capture these nuanced relationships. The neural network learns these patterns through training, discovering relationships that would be difficult to specify manually.

The learning mechanism: The network takes a window of historical values as input and outputs all the coefficients needed for forecasting:

$$\boldsymbol{\theta} = \text{NN}(\mathbf{x}_t)$$

where:

  • $\boldsymbol{\theta} = [\alpha_{1,1}, \alpha_{1,2}, \ldots, \beta_{1,1}, \beta_{1,2}, \ldots, \gamma_{1,1}, \gamma_{1,2}, \ldots]^T$: vector of all coefficients
  • $\mathbf{x}_t$: input features (typically a window of historical values)
  • $\text{NN}(\cdot)$: neural network function

How the network processes information: The neural network consists of multiple fully connected layers that transform the input historical data into the coefficient vector. Think of each layer as learning increasingly abstract patterns:

  1. First layer: Detects simple features like "recent values are increasing" or "there's a peak in the last 12 months"
  2. Second layer: Combines these simple features into more complex patterns like "accelerating growth with seasonal peaks" or "steady decline with quarterly cycles"
  3. Final layer: Translates these abstract patterns into the specific coefficients needed for forecasting

The mathematical structure: Each layer applies a linear transformation followed by a ReLU activation:

$$\mathbf{h}_1 = \text{ReLU}(\mathbf{W}_1 \mathbf{x}_t + \mathbf{b}_1)$$
$$\mathbf{h}_2 = \text{ReLU}(\mathbf{W}_2 \mathbf{h}_1 + \mathbf{b}_2)$$
$$\boldsymbol{\theta} = \mathbf{W}_3 \mathbf{h}_2 + \mathbf{b}_3$$

where:

  • $\mathbf{x}_t$: input vector (window of historical time series values)
  • $\mathbf{h}_1, \mathbf{h}_2$: hidden layer activations (intermediate representations)
  • $\mathbf{W}_i$: weight matrix for layer $i$ (learned parameters that transform the input)
  • $\mathbf{b}_i$: bias vector for layer $i$ (learned parameters that shift the transformation)
  • $\boldsymbol{\theta}$: output vector containing all learned coefficients $[\alpha_{1,1}, \alpha_{1,2}, \ldots, \beta_{1,1}, \beta_{1,2}, \ldots, \gamma_{1,1}, \gamma_{1,2}, \ldots]^T$
  • $\text{ReLU}(z) = \max(0, z)$: Rectified Linear Unit activation function (applied element-wise)

Why ReLU activation? The ReLU (Rectified Linear Unit) activation function introduces non-linearity by setting negative values to zero. This is important because it allows the network to learn complex, non-linear relationships between historical patterns and future coefficients. Without this non-linearity, the network would only be able to learn linear relationships, severely limiting its ability to capture the complex patterns in time series data.

Interpreting coefficients: The final layer outputs the coefficient vector $\boldsymbol{\theta}$ without activation, as coefficients can be positive or negative. This design choice is important:

  • A positive coefficient means that basis function contributes positively to the forecast (e.g., increasing trend, seasonal peak)
  • A negative coefficient means it contributes negatively (e.g., decreasing trend, seasonal trough)

This allows the model to represent both upward and downward patterns, and to cancel out components when needed to match the actual time series behavior.
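
As a minimal sketch of this coefficient network, the PyTorch module below mirrors the three equations above: two ReLU layers followed by a linear output layer with no activation. The layer sizes and the number of coefficients per forecast step are illustrative choices, not prescribed by the architecture.

import torch
import torch.nn as nn

input_size = 12            # length of the historical window x_t
horizon, n_coeffs = 6, 4   # e.g., 2 trend + 2 seasonal coefficients per step (illustrative)

# h1 = ReLU(W1 x + b1), h2 = ReLU(W2 h1 + b2), theta = W3 h2 + b3
coeff_net = nn.Sequential(
    nn.Linear(input_size, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, horizon * n_coeffs),  # no activation: coefficients may be negative
)

x_t = torch.randn(1, input_size)   # a dummy historical window
theta = coeff_net(x_t)             # coefficient vector, shape (1, horizon * n_coeffs)
print(theta.shape)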

Out[2]:

N-BEATS architecture flow diagram showing how historical time series data flows through the neural network to produce interpretable forecasts. The input window of historical values is processed by separate neural network blocks for trend and seasonality, which learn coefficients (α, β, γ) that are then applied to basis functions. The trend and seasonal components are combined additively to produce the final forecast, maintaining interpretability throughout the process.

Loss Function: Guiding the Learning Process

We've seen how N-BEATS represents time series using basis functions and how the neural network learns coefficients. But here's an important question: How does the network know if it's learning the right coefficients?

This is where the loss function comes in. It measures how well the model's predictions match the actual values, providing feedback that guides the learning process. Without this feedback mechanism, the network would have no way to know whether its learned coefficients are producing good forecasts.

The optimization objective: The model learns the optimal coefficients by minimizing prediction error on training data. The loss function measures how far the model's predictions are from the actual values using mean squared error:

$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

where:

  • $N$: total number of predictions (across all training samples and forecast horizons)
  • $\hat{y}_i$: predicted value for the $i$-th prediction
  • $y_i$: actual value for the $i$-th prediction
  • $\mathcal{L}$: mean squared error loss

In practice, for multi-step forecasting with $T$ training samples and forecast horizon $H$, we have $N = T \cdot H$ total predictions, and the loss averages the squared errors across all of them.
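
For instance, with $T$ training windows each forecast $H$ steps ahead, the predictions form a $T \times H$ array and the loss is the mean of the squared errors over all $T \cdot H$ entries, as in this small illustration with random placeholder values:

import numpy as np

T, H = 7, 6                              # training samples and forecast horizon
y_true = np.random.rand(T, H)            # placeholder actuals
y_pred = np.random.rand(T, H)            # placeholder forecasts

mse = np.mean((y_true - y_pred) ** 2)    # averages over all T * H predictions
print(mse)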

Why mean squared error? This is the mean squared error (MSE), which penalizes large prediction errors more than small ones. Squaring the errors means that being off by 10 units is penalized 100 times more than being off by 1 unit. This encourages the model to avoid large mistakes, which is particularly important in forecasting where a few large errors can be more costly than many small ones.

How predictions are computed: For each forecast horizon $h$, the predicted value $\hat{y}_{t+h}$ is computed by combining the trend and seasonal components:

$$\hat{y}_{t+h} = \text{trend}_h + \text{seasonal}_h = \sum_{k=1}^{K} \alpha_{k,h} \cdot \left(\frac{h}{H}\right)^k + \sum_{k=1}^{K} \left[ \beta_{k,h} \cdot \sin\left(\frac{2\pi k h}{P}\right) + \gamma_{k,h} \cdot \cos\left(\frac{2\pi k h}{P}\right) \right]$$

where:

  • $\hat{y}_{t+h}$: predicted value at time $t+h$ (forecast $h$ steps ahead)
  • $\text{trend}_h$: trend component at forecast horizon $h$
  • $\text{seasonal}_h$: seasonal component at forecast horizon $h$
  • All other symbols are as defined previously

This is where everything comes together: the basis functions are evaluated at each forecast step $h$ using the learned coefficients, then combined additively to produce the forecast.

The learning process: During training, the neural network adjusts its weights through backpropagation to minimize this loss. The process works iteratively:

  1. The network makes a prediction using current coefficients
  2. It compares the prediction to the actual value
  3. It calculates the loss (how far off it was)
  4. It adjusts its weights to reduce the loss
  5. Repeat over many iterations

Over many iterations, the network learns to extract coefficients that produce accurate forecasts.

The deeper insight: The loss function measures accuracy and also guides the decomposition. When the model minimizes the loss, it's simultaneously learning:

  1. How to separate trend from seasonality: The model needs to figure out which patterns are long-term trends versus repeating cycles
  2. Which polynomial degrees are needed: Does this time series need linear, quadratic, or cubic trend components?
  3. Which harmonics are needed: What seasonal frequencies are present in the data?
  4. The appropriate coefficients: How much should each basis function contribute?

All of this happens automatically through the optimization process. The model doesn't need us to tell it what patterns to look for. It discovers them by minimizing prediction error. This automatic discovery makes N-BEATS both effective (it adapts to different patterns) and general-purpose (the same architecture works across diverse forecasting problems).

Why This Mathematical Framework Works: The Complete Picture

We've now built up the complete mathematical framework, from decomposition to basis functions to neural network learning. Let's step back and understand why this particular combination of ideas is so effective for time series forecasting. Each design choice serves a specific purpose, and together they create a system that is both theoretically sound and practically powerful.

1. Theoretical foundation: The basis expansion approach is grounded in fundamental mathematical theory. Two key theorems guarantee that our approach can represent the patterns we need:

  • Weierstrass approximation theorem: Polynomials can approximate any smooth function to arbitrary precision. This means our polynomial trend basis functions can represent any smooth growth pattern we encounter.
  • Fourier's theorem: Trigonometric functions can represent any periodic function. This means our trigonometric seasonal basis functions can capture any repeating pattern.

By combining both, we get a framework that's theoretically sound: we know mathematically that our basis functions are capable of representing the types of patterns we see in real time series. This theoretical guarantee gives us confidence that the model architecture isn't fundamentally limited in what it can learn.

2. Additive decomposition for interpretability: The choice to add trend and seasonal components together (rather than multiply them) is important for interpretability. When components are additive, they don't interfere with each other. This means:

  • We can examine the trend component and understand long-term growth patterns without seasonal fluctuations obscuring our view
  • We can examine the seasonal component and understand periodic patterns without the trend distorting the seasonal shape
  • We can see exactly how much each component contributes to the final forecast

This separation makes N-BEATS interpretable: we can see what's driving our forecasts. The additive nature also makes the model robust, as it doesn't impose strong assumptions about how trend and seasonality interact.

3. Automatic adaptation through learning: The neural network learning mechanism transforms this mathematical framework from a theoretical possibility into a practical tool. The network learns coefficients and the right decomposition for each specific time series.

When you feed it historical sales data, it might learn: "This time series needs a strong linear trend with a moderate seasonal pattern that peaks in December." Feed it different data, and it learns a different decomposition. This automatic adaptation makes N-BEATS general-purpose: the same architecture works across diverse forecasting problems without requiring domain-specific knowledge or extensive hyperparameter tuning.

4. Scale invariance: The normalization in the trend basis functions ($\frac{h}{H}$) and the period-based scaling in the seasonal functions ($\frac{2\pi k h}{P}$) make the model scale-invariant. This means:

  • The same model architecture can work with different forecast horizons (6 months, 12 months, etc.) without modification
  • The same architecture can handle different seasonal periods (yearly, quarterly, weekly) by adjusting the period parameter
  • The normalization automatically adjusts the basis functions to the appropriate scale

This design choice means we don't need separate models for different forecasting scenarios. One architecture adapts to them all.

The synthesis: Together, these mathematical properties create a forecasting system that is simultaneously interpretable, flexible, and adaptive, a rare combination in the world of deep learning models. The theoretical foundation ensures we're not missing important patterns, the additive decomposition ensures we can understand what we're learning, the neural network ensures we can adapt to any specific time series automatically, and the scale invariance ensures the same architecture works across different forecasting scenarios.

This is the power of N-BEATS: it combines mathematical rigor with practical flexibility, creating a model that is both theoretically sound and practically useful.

Visualizing N-BEATS

Let's create comprehensive visualizations that demonstrate how N-BEATS decomposes time series into interpretable components. These plots will help you understand the model's internal workings and how it separates trend from seasonal patterns.

The first plot shows a synthetic time series with both trend and seasonal components, along with N-BEATS' decomposition. You'll see how the model identifies the underlying trend (red line) and seasonal pattern (blue line) separately, then combines them to produce the final forecast (green line). Here you can see the model's ability to separate different types of patterns in the data.

The second and third plots focus on the basis functions themselves, showing how the polynomial trend functions and trigonometric seasonal functions work. These plots help you understand the mathematical foundation of the decomposition and how different basis functions contribute to the overall forecast.

Out[3]:

N-BEATS decomposition of a synthetic time series with trend and seasonal components. The model successfully separates the underlying linear trend (red) from the seasonal pattern (blue), then combines them to produce accurate forecasts (green). Here's how N-BEATS can identify and extrapolate different types of patterns in time series data.

Out[4]:

Polynomial trend basis functions used in N-BEATS decomposition. The linear (k=1), quadratic (k=2), and cubic (k=3) functions capture different types of growth patterns, from steady linear growth to accelerating or decelerating trends with inflection points. These functions form the mathematical foundation for N-BEATS' interpretable trend decomposition.

Out[5]:

Trigonometric seasonal basis functions used in N-BEATS decomposition. The sine and cosine functions at different harmonic frequencies (k=1, 2, 3, 4) capture periodic patterns with varying complexity. These functions allow N-BEATS to represent seasonal patterns with different phases and multiple peaks and valleys within a single cycle, forming the mathematical foundation for interpretable seasonal decomposition.

Example

Now that we understand the mathematical framework, let's see N-BEATS in action. We'll work through a concrete example that demonstrates how the formulas we've discussed translate into a working forecasting model.

The scenario: Suppose we have monthly sales data for a retail store over 24 months, and we want to forecast the next 6 months. This is a realistic forecasting problem where we need to separate the underlying growth trend from seasonal patterns.

What we'll learn: Through this example, you'll see:

  • How the decomposition formula $y_t = \sum_{k=1}^{K} g_k^t(x_t) + \sum_{k=1}^{K} g_k^s(x_t) + \epsilon_t$ is applied in practice
  • How the neural network learns the coefficients $\alpha_{k,h}$, $\beta_{k,h}$, and $\gamma_{k,h}$ from data
  • How the basis functions combine to produce interpretable forecasts

Let's begin by creating synthetic data that mimics a real-world scenario with both trend and seasonality.

Step 1: Prepare the Input Data

First, we need to format our time series data for N-BEATS. The model typically takes a window of historical values as input. This is the $\mathbf{x}_t$ in our formula. We'll create data that has both a linear growth trend and yearly seasonality, which will allow us to see how N-BEATS separates these components:

In[6]:
import numpy as np
import pandas as pd

# Generate sample monthly sales data
np.random.seed(42)
months = 24
t = np.arange(months)

# Create trend + seasonality + noise
trend = 100 + 2 * t  # Linear growth
seasonal = 20 * np.sin(2 * np.pi * t / 12) + 10 * np.cos(2 * np.pi * t / 12)
noise = np.random.normal(0, 5, months)
sales = trend + seasonal + noise

# Create DataFrame
df = pd.DataFrame({
    'month': t + 1,
    'sales': sales
})
Out[7]:
Sample sales data:
   month       sales
0      1  112.483571
1      2  119.968933
2      3  129.558951
3      4  133.615149
4      5  119.149741
5      6  110.169061
6      7  109.896064
7      8   99.176920
8      9   91.332120
9     10  100.712800

The sales data shows monthly values with both trend and seasonal components. Notice how the values combine a steady upward trend (the linear growth) with periodic fluctuations (the seasonal pattern). The first 10 months demonstrate the underlying pattern that N-BEATS will learn to decompose and forecast.

What's happening mathematically: This data follows the decomposition formula we discussed: $y_t = \text{trend} + \text{seasonal} + \epsilon_t$. The trend component is $100 + 2t$ (linear growth), the seasonal component is $20\sin(2\pi t/12) + 10\cos(2\pi t/12)$ (yearly cycle), and we've added noise $\epsilon_t$ to make it realistic. N-BEATS will learn to separate these components automatically.

Step 2: Define the N-BEATS Architecture

Now let's implement a simplified version of N-BEATS to understand how it works. This implementation will demonstrate the key components we discussed in the Formula section:

  • The neural network that learns coefficients ($\boldsymbol{\theta} = \text{NN}(\mathbf{x}_t)$)
  • The trend basis functions ($g_k^t$ with polynomial terms)
  • The seasonal basis functions ($g_k^s$ with trigonometric terms)
  • The additive combination to produce forecasts
In[8]:
import torch
import torch.nn as nn

class SimpleNBEATS(nn.Module):
    def __init__(self, input_size=12, hidden_size=64, output_size=6):
        super(SimpleNBEATS, self).__init__()

        # Neural network to learn coefficients
        self.network = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size * 4)  # 4 coefficients per horizon
        )

        self.output_size = output_size

    def forward(self, x):
        # Get coefficients from neural network
        coeffs = self.network(x)

        # Reshape coefficients
        coeffs = coeffs.view(-1, self.output_size, 4)

        # Extract trend and seasonal coefficients
        trend_coeffs = coeffs[:, :, :2]  # Linear and quadratic
        seasonal_coeffs = coeffs[:, :, 2:]  # Sine and cosine

        # Generate forecasts
        forecasts = []
        for h in range(self.output_size):
            # Trend component
            trend_h = trend_coeffs[:, h, 0] * (h + 1) + trend_coeffs[:, h, 1] * ((h + 1) ** 2)

            # Seasonal component
            seasonal_h = seasonal_coeffs[:, h, 0] * np.sin(2 * np.pi * (h + 1) / 12) + \
                        seasonal_coeffs[:, h, 1] * np.cos(2 * np.pi * (h + 1) / 12)

            forecast_h = trend_h + seasonal_h
            forecasts.append(forecast_h)

        return torch.stack(forecasts, dim=1)

# Initialize model
model = SimpleNBEATS(input_size=12, hidden_size=64, output_size=6)
total_params = sum(p.numel() for p in model.parameters())
Out[9]:
Model parameters: 6552

The model architecture has a moderate number of parameters, making it suitable for learning the trend and seasonal patterns in the sales data.

Step 3: Prepare Training Data

We need to create input-output pairs for training:

In[10]:
def create_sequences(data, input_size=12, output_size=6):
    """Create input-output sequences for training"""
    X, y = [], []

    for i in range(len(data) - input_size - output_size + 1):
        X.append(data[i:i + input_size])
        y.append(data[i + input_size:i + input_size + output_size])

    return np.array(X), np.array(y)

# Create training sequences
X, y = create_sequences(sales, input_size=12, output_size=6)
Out[11]:
Input shape: (7, 12)
Output shape: (7, 6)
Number of training samples: 7

The sliding window approach creates multiple training samples from the 24 months of data. Each sample uses 12 months as input (this is our $\mathbf{x}_t$) to predict the next 6 months (this is our forecast horizon $H$). This approach allows the model to learn from different segments of the time series, seeing how historical patterns relate to future values.

Why this matters: By creating multiple training samples, we give the neural network many examples of how historical patterns (the input window) should translate into coefficient values (the learned $\alpha_{k,h}$, $\beta_{k,h}$, $\gamma_{k,h}$) that produce accurate forecasts.

Step 4: Train the Model

Now let's train our N-BEATS model. This is where the loss function we discussed comes into play. The model will learn the optimal coefficients by minimizing the mean squared error:

$$\mathcal{L} = \frac{1}{T} \sum_{t=1}^{T} (y_t - \hat{y}_t)^2$$

During training, the network will:

  1. Make predictions using current coefficients
  2. Calculate the loss (how far predictions are from actual values)
  3. Adjust its weights to reduce the loss
  4. Repeat until the coefficients produce accurate forecasts
In[12]:
import torch.optim as optim

# Convert to PyTorch tensors
X_tensor = torch.FloatTensor(X)
y_tensor = torch.FloatTensor(y)

# Training setup
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()

# Training loop
epochs = 1000
model.train()
training_losses = []

for epoch in range(epochs):
    optimizer.zero_grad()

    # Forward pass
    predictions = model(X_tensor)

    # Calculate loss
    loss = criterion(predictions, y_tensor)

    # Backward pass
    loss.backward()
    optimizer.step()

    training_losses.append(loss.item())
Out[13]:
Epoch 0, Loss: 18131.4375
Epoch 200, Loss: 28.0093
Epoch 400, Loss: 14.2194
Epoch 600, Loss: 7.9382
Epoch 800, Loss: 4.9016
Final training loss: 3.1185

The training loss decreases over epochs, indicating that the model is learning to capture the underlying patterns in the sales data. The final loss value shows how well the model fits the training data, with lower values indicating better fit.

What's happening mathematically: As the loss decreases, the neural network is learning the optimal coefficients $\alpha_{k,h}$, $\beta_{k,h}$, and $\gamma_{k,h}$ that minimize prediction error. The network is simultaneously learning:

  • How to separate trend from seasonality
  • Which polynomial degrees are needed for the trend
  • Which harmonics are needed for the seasonality
  • The appropriate coefficients for each basis function

Step 5: Make Predictions

Finally, let's use our trained model to make forecasts. This is where all the pieces come together:

  1. The neural network takes the last 12 months as input ($\mathbf{x}_t$)
  2. It outputs the learned coefficients ($\boldsymbol{\theta} = \text{NN}(\mathbf{x}_t)$)
  3. These coefficients are applied to the basis functions ($g_k^t$ and $g_k^s$)
  4. The trend and seasonal components are combined additively to produce the forecast

This is the decomposition formula in action. For each forecast horizon $h$, we compute:

$$\begin{aligned} \hat{y}_{t+h} &= \text{trend}_h + \text{seasonal}_h \\ &= \sum_{k=1}^{K} \alpha_{k,h} \cdot \left(\frac{h}{H}\right)^k \\ &\quad + \sum_{k=1}^{K} \left[ \beta_{k,h} \cdot \sin\left(\frac{2\pi k h}{P}\right) + \gamma_{k,h} \cdot \cos\left(\frac{2\pi k h}{P}\right) \right] \end{aligned}$$
In[14]:
# Prepare last 12 months for prediction
last_window = sales[-12:].reshape(1, -1)
last_window_tensor = torch.FloatTensor(last_window)

# Make prediction
model.eval()
with torch.no_grad():
    forecast = model(last_window_tensor)
    forecast_np = forecast.numpy().flatten()
Out[15]:
N-BEATS Forecast for next 6 months:
Month 1: 157.84
Month 2: 158.58
Month 3: 161.23
Month 4: 172.19
Month 5: 174.55
Month 6: 145.49
Out[16]:

N-BEATS predictions for the example sales data. The plot shows the training data (blue), the model's forecast for the next 6 months (green), and the actual future values (black, if available). The forecast combines learned trend and seasonal components, demonstrating how N-BEATS extrapolates patterns from historical data to make interpretable predictions.

The forecast shows the predicted sales values for the next 6 months. These values combine the learned trend and seasonal patterns from the historical data. The model extrapolates the linear growth trend and applies the seasonal pattern learned from the 24 months of training data.

What we've accomplished: This example demonstrates the complete N-BEATS workflow:

  1. Decomposition: The model learned to separate the time series into trend and seasonal components, following the formula $y_t = \sum_{k=1}^{K} g_k^t(x_t) + \sum_{k=1}^{K} g_k^s(x_t) + \epsilon_t$

  2. Coefficient learning: The neural network automatically learned the coefficients $\alpha_{k,h}$, $\beta_{k,h}$, and $\gamma_{k,h}$ that best fit the data, without us having to specify them manually

  3. Basis function application: The learned coefficients were applied to polynomial trend functions and trigonometric seasonal functions, creating interpretable components

  4. Additive combination: The trend and seasonal components were combined additively to produce forecasts that we can understand and trust

This is the power of N-BEATS: it combines mathematical rigor (the basis expansion framework) with practical learning (the neural network) to create forecasts that are both accurate and interpretable.

Implementation in PyTorch

Now that we understand the mathematical framework and have seen a working example, let's build a complete, production-ready implementation. This implementation will translate the formulas we've discussed into actual PyTorch code that you can use for your own time series forecasting projects.

What we'll implement: We'll create a full N-BEATS model that includes:

  • The neural network architecture that learns coefficients ($\boldsymbol{\theta} = \text{NN}(\mathbf{x}_t)$)
  • The trend basis functions ($g_k^t$ with polynomial terms)
  • The seasonal basis functions ($g_k^s$ with trigonometric terms)
  • The training loop that minimizes the loss function ($\mathcal{L} = \frac{1}{T} \sum_{t=1}^{T} (y_t - \hat{y}_t)^2$)

This implementation follows the mathematical framework we've established, making it easy to see how the formulas translate into code.

Step 1: Import Required Libraries

In[17]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

Step 2: Define the N-BEATS Architecture

In[18]:
class NBEATSBlock(nn.Module):
    """Single N-BEATS block for trend or seasonal decomposition"""

    def __init__(self, input_size, theta_size, hidden_size=64):
        super(NBEATSBlock, self).__init__()

        # Neural network to learn coefficients
        self.network = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, theta_size)
        )

        self.theta_size = theta_size

    def forward(self, x):
        return self.network(x)

class NBEATS(nn.Module):
    """Complete N-BEATS model"""

    def __init__(self, input_size=12, forecast_horizon=6, hidden_size=64):
        super(NBEATS, self).__init__()

        # Trend block
        self.trend_block = NBEATSBlock(
            input_size=input_size,
            theta_size=forecast_horizon * 2,  # Linear and quadratic coefficients
            hidden_size=hidden_size
        )

        # Seasonal block
        self.seasonal_block = NBEATSBlock(
            input_size=input_size,
            theta_size=forecast_horizon * 2,  # Sine and cosine coefficients
            hidden_size=hidden_size
        )

        self.forecast_horizon = forecast_horizon

    def forward(self, x):
        # Get trend coefficients
        trend_coeffs = self.trend_block(x)
        trend_coeffs = trend_coeffs.view(-1, self.forecast_horizon, 2)

        # Get seasonal coefficients
        seasonal_coeffs = self.seasonal_block(x)
        seasonal_coeffs = seasonal_coeffs.view(-1, self.forecast_horizon, 2)

        # Generate forecasts
        forecasts = []
        for h in range(self.forecast_horizon):
            # Trend component
            trend_h = trend_coeffs[:, h, 0] * (h + 1) + trend_coeffs[:, h, 1] * ((h + 1) ** 2)

            # Seasonal component
            seasonal_h = seasonal_coeffs[:, h, 0] * torch.sin(2 * torch.pi * torch.tensor(h + 1) / 12) + \
                        seasonal_coeffs[:, h, 1] * torch.cos(2 * torch.pi * torch.tensor(h + 1) / 12)

            forecast_h = trend_h + seasonal_h
            forecasts.append(forecast_h)

        return torch.stack(forecasts, dim=1)

# Initialize model
model = NBEATS(input_size=12, forecast_horizon=6, hidden_size=64)
total_params = sum(p.numel() for p in model.parameters())
Out[19]:
Model parameters: 11544

The model contains a moderate number of parameters, which balances model capacity with computational efficiency. For this configuration with input_size=12, forecast_horizon=6, and hidden_size=64, the parameter count is reasonable for most time series forecasting tasks. This size allows the model to learn complex patterns without requiring excessive computational resources or risking overfitting on smaller datasets.

Step 3: Data Preparation and Training

In[20]:
def prepare_data(data, input_size=12, output_size=6):
    """Prepare data for N-BEATS training"""
    X, y = [], []

    for i in range(len(data) - input_size - output_size + 1):
        X.append(data[i:i + input_size])
        y.append(data[i + input_size:i + input_size + output_size])

    return np.array(X), np.array(y)

# Generate sample data
np.random.seed(42)
months = 100
t = np.arange(months)

# Create time series with trend and seasonality
trend = 100 + 0.5 * t
seasonal = 10 * np.sin(2 * np.pi * t / 12) + 5 * np.cos(2 * np.pi * t / 12)
noise = np.random.normal(0, 2, months)
data = trend + seasonal + noise

# Prepare training data
X, y = prepare_data(data, input_size=12, output_size=6)

# Convert to PyTorch tensors
X_tensor = torch.FloatTensor(X)
y_tensor = torch.FloatTensor(y)
Out[21]:
Training data shape: torch.Size([83, 12])
Target data shape: torch.Size([83, 6])
Number of training samples: 83

The training data consists of input sequences of length 12 (one year of monthly data) and target sequences of length 6 (6-month forecasts). The number of training samples is determined by the sliding window approach, which creates overlapping sequences from the historical data. With 100 months of data, the sliding window creates multiple training examples, allowing the model to learn from different segments of the time series. This approach maximizes the use of available data and helps the model generalize better.

Step 4: Training Loop

In[22]:
# Training setup
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()

# Training loop
epochs = 1000
model.train()
training_losses = []

for epoch in range(epochs):
    optimizer.zero_grad()

    # Forward pass
    predictions = model(X_tensor)

    # Calculate loss
    loss = criterion(predictions, y_tensor)

    # Backward pass
    loss.backward()
    optimizer.step()

    training_losses.append(loss.item())
Out[23]:
Epoch 0, Loss: 18451.0273
Epoch 200, Loss: 22.1700
Epoch 400, Loss: 10.0080
Epoch 600, Loss: 8.1038
Epoch 800, Loss: 7.4347
Final training loss: 6.8766
Out[24]:

Training curve showing the loss decreasing over epochs during N-BEATS training. The steady decrease in loss indicates that the model is successfully learning to capture the trend and seasonal patterns in the time series data. The convergence behavior helps diagnose training effectiveness and potential overfitting.

The training loss decreases steadily over epochs, indicating that the model is successfully learning to capture the trend and seasonal patterns in the data. The steady decline suggests the model is converging rather than overfitting, as the loss continues to decrease without signs of instability. The final loss value provides a baseline for comparison. Lower values indicate better fit to the training data, though you should also evaluate on validation data to assess generalization performance.

Step 5: Making Predictions

In[25]:
# Make prediction on last window
model.eval()
with torch.no_grad():
    last_window = data[-12:].reshape(1, -1)
    last_window_tensor = torch.FloatTensor(last_window)

    forecast = model(last_window_tensor)
    forecast_np = forecast.numpy().flatten()

# Generate "actual" future values for comparison (using the same generating process)
np.random.seed(43)  # Different seed for future data
t_future = np.arange(months, months + 6)
trend_future = 100 + 0.5 * t_future
seasonal_future = 10 * np.sin(2 * np.pi * t_future / 12) + 5 * np.cos(2 * np.pi * t_future / 12)
noise_future = np.random.normal(0, 2, 6)
actual_future = trend_future + seasonal_future + noise_future

# Calculate performance metrics
mae = np.mean(np.abs(forecast_np - actual_future))
rmse = np.sqrt(np.mean((forecast_np - actual_future) ** 2))
Out[26]:
N-BEATS Forecast:
Month 1: 157.56
Month 2: 153.53
Month 3: 147.26
Month 4: 142.23
Month 5: 141.72
Month 6: 142.19

Performance Metrics:
MAE: 1.60
RMSE: 2.02
Out[27]:

N-BEATS PyTorch implementation predictions compared to actual values. The plot shows the training data (blue), the model's 6-month forecast (green), and the actual future values (black). The vertical line separates the training period from the forecast period, demonstrating the model's ability to extrapolate learned trend and seasonal patterns.

The forecast shows the predicted values for the next 6 months, extrapolating the learned trend and seasonal patterns from the training data. The MAE (Mean Absolute Error) measures the average absolute difference between predictions and actual values, while RMSE (Root Mean Squared Error) penalizes larger errors more heavily. Lower values for both metrics indicate better forecast accuracy. For this synthetic dataset, the relatively low MAE and RMSE values suggest the model has successfully learned the underlying trend and seasonal patterns. In practice, compare these metrics to baseline methods (such as naive forecasts or seasonal averages) to assess whether N-BEATS provides meaningful improvements.

Key Parameters

Below are some of the main parameters that affect how N-BEATS works and performs; a short configuration sketch follows the list.

  • input_size: Number of historical time series values used as input (default: 12). Should match or exceed the seasonal period. Use 12 for monthly data with yearly seasonality, 7 or 14 for daily data with weekly patterns. Larger windows capture more historical context but may include irrelevant distant patterns.

  • forecast_horizon: Number of future periods to forecast (default: 6). Shorter horizons (1-6 periods) are typically more accurate but may not provide sufficient lead time. Longer horizons (12-24 periods) are useful for strategic planning but require more historical data. Keep the forecast horizon to at most half the length of your training data.

  • hidden_size: Size of hidden layers in the neural network (default: 64). Controls model complexity and capacity. Larger values (128-256) can capture more complex patterns but increase computation time and risk overfitting with limited data. Smaller values (32-64) work well for simpler patterns and smaller datasets.

  • learning_rate: Learning rate for the Adam optimizer (default: 0.001). Controls how quickly the model learns. Lower values (0.0001-0.001) provide more stable training but may require more epochs. Higher values (0.01-0.1) train faster but may be unstable. Start with 0.001 and adjust based on training behavior.

  • epochs: Number of training iterations (default: 1000). More epochs allow the model to learn better but increase training time. Monitor training and validation loss to determine when to stop. Use early stopping if validation loss stops improving.

  • random_state: Seed for reproducibility (default: None). Set to an integer to ensure consistent results across runs. Important for debugging and comparing different model configurations.
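
As a quick reference, the sketch below wires these parameters together using the NBEATS class from the implementation section; the values shown are illustrative starting points rather than tuned settings.

import numpy as np
import torch
import torch.optim as optim

# Set seeds first (the random_state analogue) so weight initialization is reproducible
torch.manual_seed(42)
np.random.seed(42)

model = NBEATS(
    input_size=12,        # one full yearly cycle of monthly data
    forecast_horizon=6,   # predict the next 6 months
    hidden_size=64,       # moderate capacity for simple trend + seasonal patterns
)
optimizer = optim.Adam(model.parameters(), lr=0.001)   # learning_rate
epochs = 1000   # monitor training/validation loss and stop early if it plateaus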

Key Methods

The following are the most commonly used methods for interacting with N-BEATS models.

  • forward(x): Generates forecasts for input sequences. Takes a tensor of shape (batch_size, input_size) containing historical time series values and returns predictions of shape (batch_size, forecast_horizon). This is called automatically when using model(x).

  • train(): Sets the model to training mode, enabling dropout and batch normalization updates. Call this before training to ensure the model learns from the data.

  • eval(): Sets the model to evaluation mode, disabling dropout and freezing batch normalization statistics. Call this before making predictions to ensure consistent, deterministic outputs.

  • parameters(): Returns an iterator over all model parameters (weights and biases). Used by optimizers to update parameters during training. Can also be used to inspect or modify specific parameters.

Practical Implications

N-BEATS is well-suited for forecasting problems where both accuracy and interpretability are important. The model's ability to decompose time series into separate trend and seasonal components makes it particularly effective in business contexts where stakeholders need to understand what drives predictions. In retail and demand forecasting, for example, distinguishing between underlying growth trends and seasonal patterns helps with inventory management and capacity planning decisions.

The model's interpretable architecture also makes it valuable in regulated industries where explainable models are required. In financial forecasting, N-BEATS can provide clear insights into whether market movements reflect long-term trends or cyclical patterns, supporting portfolio allocation and risk management decisions. The separate trend and seasonal components facilitate regulatory reporting and help stakeholders understand the basis for forecasts.

N-BEATS works best with time series that exhibit clear trend and seasonal patterns. The model is less effective for highly irregular or chaotic time series, or for data with very short histories that don't contain complete seasonal cycles. When working with multiple related time series (such as product-level forecasts in retail), N-BEATS can be applied to each series independently, making it suitable for scenarios requiring many parallel forecasts.

Best Practices

To achieve optimal results with N-BEATS, set the input window size to match your seasonal period. For monthly data with yearly seasonality, use an input window of 12 months to capture one complete cycle. For daily data with weekly patterns, use 7 or 14 days. The input window should be at least as large as the seasonal period, but avoid making it much larger than necessary, as this can introduce noise from distant historical patterns.
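
A minimal way to build training windows that respect the seasonal period, assuming monthly data with yearly seasonality; make_windows is an illustrative helper, not part of any library.

```python
import numpy as np

def make_windows(series, input_size, horizon):
    """Slice a 1-D series into (input window, target) training pairs."""
    X, Y = [], []
    for start in range(len(series) - input_size - horizon + 1):
        X.append(series[start : start + input_size])
        Y.append(series[start + input_size : start + input_size + horizon])
    return np.array(X), np.array(Y)

# Synthetic monthly series with yearly seasonality: 12-month window, 6-month horizon.
series = np.sin(2 * np.pi * np.arange(60) / 12) + 0.02 * np.arange(60)
X, Y = make_windows(series, input_size=12, horizon=6)
print(X.shape, Y.shape)  # (43, 12) (43, 6)
```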

Set the forecast horizon based on your planning needs and data availability. Shorter horizons (1-6 periods) typically provide more accurate forecasts but may not offer sufficient lead time. Longer horizons (12-24 periods) are useful for strategic planning but require more historical data and may have higher uncertainty. As a guideline, keep the forecast horizon to at most half the length of your training data to maintain reliability.

Normalize your time series before training by subtracting the mean and dividing by the standard deviation. This improves training stability and convergence, especially when working with time series of different scales or when training on multiple series simultaneously. Use the same normalization parameters during inference that were used during training to maintain consistency.
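
A sketch of this normalization step follows. The key point is fitting the mean and standard deviation on the training portion only and reusing the same values at inference; the series here is synthetic.

```python
import numpy as np

series = np.random.default_rng(0).normal(100.0, 15.0, size=60)  # illustrative series

# Fit the scaler on the training portion only, then reuse it everywhere.
train = series[:48]
mean, std = train.mean(), train.std()

def normalize(x):
    return (x - mean) / std

def denormalize(x):
    return x * std + mean

train_norm = normalize(train)               # feed this to the model during training
last_window_norm = normalize(series[-12:])  # same parameters at inference time
# A model forecast produced in normalized units is mapped back with denormalize(...).
```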

Monitor the learned trend and seasonal components during training to verify the model is capturing meaningful patterns. If components appear random or unstable, reduce model complexity (fewer basis functions or hidden units), increase training data, or adjust the learning rate. The interpretable nature of N-BEATS makes it straightforward to diagnose issues by examining the decomposition components.

Data Requirements and Preprocessing

N-BEATS requires sufficient historical data to learn meaningful patterns. Plan for at least 2-3 times the forecast horizon in historical data. For a 12-month forecast, provide 24-36 months of history. This ensures the model can identify both trend and seasonal patterns. Additionally, the time series should contain at least one complete seasonal cycle in the training data for the model to learn seasonal patterns effectively.
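
These rules of thumb can be encoded as simple sanity checks before training. The helper below is illustrative; the thresholds come directly from this section.

```python
def check_history(n_obs, horizon, seasonal_period):
    """Encode the data-volume rules of thumb above as pre-training sanity checks."""
    if n_obs < 2 * horizon:
        raise ValueError("history should be at least 2-3x the forecast horizon")
    if n_obs < seasonal_period:
        raise ValueError("history should contain at least one full seasonal cycle")

check_history(n_obs=36, horizon=12, seasonal_period=12)  # 36 months for a 12-month forecast
```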

Handle missing values before training using forward-filling, backward-filling, or interpolation methods. For time series with extensive missing data (more than 10-15% missing), consider more sophisticated imputation techniques such as seasonal decomposition-based imputation or collecting additional data. The model works best with continuous, regularly sampled time series without large gaps.
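
With pandas, the simple filling strategies look like this; the series below is synthetic, with two gaps introduced for illustration.

```python
import pandas as pd
import numpy as np

# Monthly series with a couple of missing observations.
idx = pd.date_range("2022-01-01", periods=24, freq="MS")
y = pd.Series(np.linspace(100, 150, 24), index=idx)
y.iloc[[5, 11]] = np.nan

filled_ffill = y.ffill()                      # carry the last observation forward
filled_bfill = y.bfill()                      # fill from the next observation backward
filled_interp = y.interpolate(method="time")  # linear interpolation on the time index
print(filled_interp.isna().sum())             # 0 remaining gaps
```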

The model handles trend-stationary time series well, where trends are present but the variance remains relatively stable. For data with changing variance or structural breaks, apply log transformations to stabilize variance, but avoid differencing as it removes trend information that N-BEATS is designed to capture. If your time series exhibits multiple seasonal patterns (e.g., both daily and weekly seasonality), you'll need to extend the standard architecture or use variants that support multiple seasonal components, as the base implementation focuses on a single seasonal period.
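
A log transform and its inverse, for series whose variance grows with the level; the model is trained on the transformed scale and forecasts are mapped back afterwards. The values here are illustrative.

```python
import numpy as np

# Level-dependent variance: a log transform makes fluctuations roughly comparable in size.
y = np.array([120.0, 135.0, 160.0, 210.0, 260.0, 340.0])
y_log = np.log1p(y)                 # train the model on this scale

forecast_log = y_log[-1] + 0.1      # placeholder standing in for a model forecast
forecast = np.expm1(forecast_log)   # invert the transform to return to original units
```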

Common Pitfalls

Using an input window that doesn't match the seasonal period is a frequent mistake. If the window is smaller than the seasonal period, the model cannot learn complete seasonal patterns. If it's much larger than necessary, distant historical patterns may introduce noise. Set the input window to match the seasonal period (e.g., 12 for monthly data with yearly seasonality) or a small multiple of it.

Setting the forecast horizon too long relative to available data leads to unreliable predictions. N-BEATS extrapolates patterns from historical data, and forecasts extending far beyond the training history become increasingly uncertain. Keep the forecast horizon to at most half the training data length. For example, with 24 months of data, limit forecasts to 12 months or less.

Ignoring structural breaks or regime changes can degrade forecast quality. If your time series exhibits sudden changes in trend or seasonality (such as after major events or policy changes), the model may not adapt well. In these cases, train separate models for different regimes, use change point detection to segment the data, or apply techniques designed to handle structural breaks.

Overfitting can also occur when the model complexity exceeds what the data supports. With limited historical data (less than 2-3 times the forecast horizon), use simpler architectures with fewer basis functions or hidden units. Monitor validation loss during training and implement early stopping if validation loss increases while training loss decreases.
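
A sketch of early stopping on a held-out validation split is shown below. It assumes that model, optimizer, and the window tensors (x_train, y_train, x_val, y_val) have been prepared as in the earlier sketches; the patience value is illustrative.

```python
import copy
import torch

# Assumes model, optimizer, x_train, y_train, x_val, y_val exist from earlier steps.
loss_fn = torch.nn.MSELoss()
best_val, best_state, patience, bad_epochs = float("inf"), None, 20, 0

for epoch in range(1000):
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(x_val), y_val).item()

    if val_loss < best_val:
        best_val, best_state, bad_epochs = val_loss, copy.deepcopy(model.state_dict()), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # validation loss stopped improving

model.load_state_dict(best_state)  # restore the best checkpoint
```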

Computational Considerations

N-BEATS's computational complexity is determined by the neural network architecture and basis function count. For an implementation with input window size W, forecast horizon H, and hidden layer size D, the forward pass complexity is approximately O(W \cdot D + D^2 + D \cdot H \cdot K), where K is the number of basis functions per component. Training time scales with training samples and epochs, typically requiring minutes to hours depending on dataset size and hardware. For datasets with more than 10,000 training samples, consider using GPU acceleration or reducing model complexity.

Memory requirements are moderate, driven primarily by neural network weights and intermediate activations. A model with hidden size 64 and forecast horizon 12 typically uses a few megabytes per instance. When training on multiple time series, memory scales linearly with batch size. For large-scale deployments forecasting hundreds or thousands of time series, consider batch processing or parallel training across series to manage memory efficiently.

Inference time is fast (typically milliseconds per forecast) since it requires only a single forward pass through the neural network. This makes N-BEATS suitable for real-time applications requiring frequent forecast updates. For batch forecasting of many time series, the model can process multiple input windows simultaneously, making it efficient for scenarios like product-level demand forecasting in retail.

Performance and Deployment Considerations

Evaluate N-BEATS performance using metrics appropriate for your forecasting task. For point forecasts, use mean absolute error (MAE), root mean squared error (RMSE), or mean absolute percentage error (MAPE). For interval forecasts, consider coverage probability and interval width. The model's interpretable architecture enables component-level evaluation, allowing you to assess trend and seasonal decompositions separately to identify which components contribute most to forecast errors.
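
The point-forecast metrics are straightforward to compute with NumPy; the arrays below are illustrative.

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    # Undefined when y_true contains zeros; mask those points or use sMAPE instead.
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

y_true = np.array([120.0, 130.0, 145.0, 150.0])
y_pred = np.array([118.0, 133.0, 140.0, 155.0])
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```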

Implement a retraining schedule based on how quickly your time series patterns change. For stable time series with consistent patterns, monthly or quarterly retraining may suffice. For rapidly changing patterns or time series affected by external factors, consider weekly or even daily retraining. Monitor the learned trend and seasonal components over time; unexpected changes in these components or declining forecast accuracy indicate the model needs updating.

When deploying in production, ensure preprocessing steps (normalization, missing value handling) are applied consistently between training and inference. Any changes to the preprocessing pipeline require model retraining to maintain consistency. The interpretable nature of N-BEATS facilitates validation and debugging. Examine the learned components to understand model behavior, verify that trend and seasonal patterns match domain expectations, and identify potential issues before they affect forecast quality.

Summary

N-BEATS combines the interpretability of traditional statistical methods with the flexibility of deep learning. The model's basis expansion approach allows it to decompose time series into interpretable trend and seasonal components, providing transparency that is important for business applications and regulatory compliance.

The architecture's strength lies in its ability to automatically learn appropriate basis functions for different types of time series patterns, making it robust and general-purpose across diverse forecasting scenarios. Unlike black-box deep learning models, N-BEATS provides clear insights into how forecasts are generated, enabling practitioners to understand and trust the model's predictions.

While N-BEATS may require more computational resources than traditional statistical methods, its interpretability and strong performance make it an excellent choice for applications where understanding the forecasting process is as important as accuracy. The model's modular design also allows for easy customization and extension to specific forecasting problems, making it a valuable tool for practitioners working with diverse time series data.

