A comprehensive guide to standardization in machine learning, covering mathematical foundations, practical implementation, and Python examples. Learn how to properly standardize features for fair comparison across different scales and units.
Standardization: Normalizing Features for Fair Comparison
Standardization is a crucial preprocessing technique that rescales features to have mean 0 and variance 1, ensuring that machine learning algorithms treat all features fairly regardless of their original units or scales. This process is essential for many algorithms, particularly those that rely on distance calculations or regularization penalties.
Introduction
In real-world datasets, features often have vastly different scales and units. For example, a dataset might contain house prices (in thousands of dollars), square footage (in hundreds of square feet), and number of bedrooms (single digits). Without standardization, algorithms like LASSO regression or k-means clustering would be dominated by features with larger numeric values, leading to biased results and poor model performance.
Standardization transforms each feature to have a mean of 0 and standard deviation of 1, putting all features on the same scale. This ensures that:
- Regularization penalties treat all features equally
- Distance-based algorithms work correctly
- Gradient-based optimization converges more efficiently
- Model coefficients become comparable across features
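To see the distance issue concretely, here is a minimal sketch with made-up numbers for two houses; the scale factors are purely illustrative, not fitted statistics:

```python
import numpy as np

# Two hypothetical houses: (square footage, bedrooms)
a = np.array([1000.0, 2.0])
b = np.array([2000.0, 4.0])

# Raw Euclidean distance: the bedroom difference is numerically invisible
print(np.linalg.norm(a - b))            # ~1000.002, driven almost entirely by square footage

# The same comparison after dividing each feature by an illustrative scale factor
scale = np.array([500.0, 1.0])
print(np.linalg.norm((a - b) / scale))  # ~2.83, both features now contribute
```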
Mathematical Foundation
The Standardization Formula
For a feature $x_j$ with $n$ observations, standardization transforms each value $x_{ij}$ to $z_{ij}$ using:

$$z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}$$

where:
- $z_{ij}$ is the standardized value of feature $j$ for observation $i$
- $\mu_j$ is the mean of feature $j$
- $\sigma_j$ is the standard deviation of feature $j$
Key Properties
After standardization, each feature in your dataset is transformed so that it has a mean of zero and a standard deviation of one. This transformation ensures that all features, regardless of their original scale or units, are directly comparable and contribute equally to the analysis. In practical terms, this means that no single feature will dominate the learning process simply because it has larger numeric values. Instead, every feature is centered and scaled, allowing algorithms—especially those sensitive to feature magnitude, such as regularized regression or clustering—to perform optimally and fairly.
- Mean: $\mu_{z_j} = 0$ for all features
- Standard deviation: $\sigma_{z_j} = 1$ for all features
- Variance: $\sigma_{z_j}^2 = 1$ for all features
This ensures that all features contribute equally to distance calculations and regularization penalties.
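A quick way to confirm these properties is to standardize a small synthetic feature matrix by hand; the values below are illustrative, not from the article:

```python
import numpy as np

rng = np.random.default_rng(42)

# Two hypothetical features on very different scales
X = np.column_stack([
    rng.normal(1500, 350, size=100),   # e.g. square footage
    rng.normal(3, 1, size=100),        # e.g. number of bedrooms
])

# z_ij = (x_ij - mu_j) / sigma_j, applied column by column
mu = X.mean(axis=0)
sigma = X.std(axis=0)
Z = (X - mu) / sigma

print("Means after standardization:", Z.mean(axis=0).round(10))  # ~[0, 0]
print("Stds after standardization: ", Z.std(axis=0).round(10))   # [1, 1]
```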
Visual Example
This example demonstrates how standardization transforms features with different scales into a common scale:
Original features with vastly different scales. House size ranges from 1000-2000 square feet, while number of bedrooms ranges from 2-4. Without standardization, algorithms would be dominated by the house size feature due to its larger numeric values.
Standardized features on the same scale. Both features now have mean 0 and standard deviation 1, ensuring fair treatment by machine learning algorithms. The relative relationships within each feature are preserved while making them comparable.
```
Original data:
  House Size - Mean: 1516.67, Std: 338.71
  Bedrooms   - Mean: 3.0,     Std: 0.82
Standardized data:
  House Size - Mean: -0.0, Std: 1.0
  Bedrooms   - Mean: 0.0,  Std: 1.0
```
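If you want to reproduce a comparison like this yourself, a sketch along the following lines should work. The house sizes and bedroom counts here are hypothetical stand-ins, since the exact values behind the figures are not listed:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

# Hypothetical houses (size in sq ft, bedrooms)
house_size = np.array([1000, 1200, 1400, 1600, 1900, 2000], dtype=float)
bedrooms = np.array([2, 2, 3, 3, 4, 4], dtype=float)
X = np.column_stack([house_size, bedrooms])

# Standardize both features
Z = StandardScaler().fit_transform(X)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(X[:, 0], X[:, 1])
axes[0].set(title="Original scales", xlabel="House size (sq ft)", ylabel="Bedrooms")
axes[1].scatter(Z[:, 0], Z[:, 1])
axes[1].set(title="Standardized (mean 0, std 1)", xlabel="House size (z-score)", ylabel="Bedrooms (z-score)")
plt.tight_layout()
plt.show()

print("Standardized means:", Z.mean(axis=0).round(2))  # ~[0, 0]
print("Standardized stds: ", Z.std(axis=0).round(2))   # [1, 1]
```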
Example: Step-by-Step Calculation
Let's work through a detailed example with two features:
- $x_1$: house size in square feet → [1000, 1500, 2000]
- $x_2$: number of bedrooms → [2, 3, 4]
Step 1: Calculate means

$$\mu_1 = \frac{1000 + 1500 + 2000}{3} = 1500, \qquad \mu_2 = \frac{2 + 3 + 4}{3} = 3$$

Step 2: Calculate standard deviations

$$\sigma_1 = \sqrt{\frac{(1000-1500)^2 + (1500-1500)^2 + (2000-1500)^2}{3}} \approx 408.25$$

$$\sigma_2 = \sqrt{\frac{(2-3)^2 + (3-3)^2 + (4-3)^2}{3}} \approx 0.816$$

Step 3: Apply standardization formula

For the first feature ($x_1$):

$$z_{11} = \frac{1000 - 1500}{408.25} \approx -1.225, \quad z_{21} = \frac{1500 - 1500}{408.25} = 0, \quad z_{31} = \frac{2000 - 1500}{408.25} \approx 1.225$$

For the second feature ($x_2$):

$$z_{12} = \frac{2 - 3}{0.816} \approx -1.225, \quad z_{22} = \frac{3 - 3}{0.816} = 0, \quad z_{32} = \frac{4 - 3}{0.816} \approx 1.225$$
Result: Both features are now on the same scale:
- $x_1$: [1000, 1500, 2000] → [-1.225, 0.000, 1.225]
- $x_2$: [2, 3, 4] → [-1.225, 0.000, 1.225]
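The same result can be checked in a few lines with scikit-learn. Note that `StandardScaler` uses the population standard deviation (dividing by $n$), which matches the hand calculation above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1000, 2],
              [1500, 3],
              [2000, 4]], dtype=float)

# StandardScaler divides by the population standard deviation,
# matching the hand calculation above
Z = StandardScaler().fit_transform(X)
print(Z.round(3))
# [[-1.225 -1.225]
#  [ 0.     0.   ]
#  [ 1.225  1.225]]
```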
Practical Implementation
Proper Train-Test Split with Standardization
This example demonstrates the correct way to apply standardization in a machine learning pipeline:
```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Set random seed for reproducibility
np.random.seed(42)

# Create sample dataset
X = np.random.randn(100, 3)
X[:, 0] *= 1000  # Scale first feature to be much larger
X[:, 1] *= 10    # Scale second feature moderately
# Third feature remains small scale

# Create target variable
y = 2 * X[:, 0] + 3 * X[:, 1] + 0.5 * X[:, 2] + np.random.randn(100) * 0.1

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# CORRECT: Fit scaler only on training data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train model
model = Lasso(alpha=0.1)
model.fit(X_train_scaled, y_train)

# Make predictions on the test data, transformed with the scaler fitted on the training data
y_pred = model.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)

print("Model coefficients:", model.coef_)
print("Test MSE:", mse)
print("Training data mean:", np.mean(X_train_scaled, axis=0))
print("Training data std:", np.std(X_train_scaled, axis=0))
```
```
Model coefficients: [1.68083062e+03 2.67552877e+01 4.19099894e-01]
Test MSE: 0.04795636619638803
Training data mean: [-2.67147415e-17 3.46944695e-17 -5.68989300e-17]
Training data std: [1. 1. 1.]
```
Key Implementation Guidelines
Standardization is a simple but crucial step in the machine learning workflow. Here are the most important guidelines to follow:
- Fit the scaler only on the training data. This ensures that information from the test set does not leak into the model during training. Fitting on the entire dataset (including the test set) can lead to overly optimistic performance estimates and poor generalization.
- Transform both training and test data using the same scaler. After fitting the scaler on the training data, use it to transform both the training and test sets. This guarantees that the scaling parameters (mean and standard deviation) are consistent and based solely on the training data.
- Never fit the scaler on the entire dataset before splitting. Doing so introduces data leakage, as the test set statistics influence the scaling of the training data.
- Use pipelines to automate and safeguard the process. Scikit-learn pipelines help ensure that standardization and modeling steps are applied correctly and in the right order, reducing the risk of data leakage and making your workflow more reproducible.
By following these guidelines, you ensure that your model evaluation is fair and that your results will generalize well to new, unseen data.
```python
from sklearn.pipeline import Pipeline

# Create pipeline with standardization and model
pipeline = Pipeline([("scaler", StandardScaler()), ("model", Lasso(alpha=0.1))])

# Fit pipeline on training data
pipeline.fit(X_train, y_train)

# Make predictions on test data
y_pred_pipeline = pipeline.predict(X_test)
mse_pipeline = mean_squared_error(y_test, y_pred_pipeline)

print("Pipeline MSE:", mse_pipeline)
```
Pipeline MSE: 0.04795636619638803
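A further benefit of the pipeline approach is that it composes safely with cross-validation: the scaler is re-fit on each training fold, so no fold's validation data influences its own scaling. Here is a minimal sketch, reusing the `X` and `y` created earlier in this example:

```python
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso

pipeline = Pipeline([("scaler", StandardScaler()), ("model", Lasso(alpha=0.1))])

# Each fold re-fits the scaler on its own training portion only,
# so the cross-validation estimate stays free of scaling leakage
scores = cross_val_score(pipeline, X, y, cv=5, scoring="neg_mean_squared_error")
print("Cross-validated MSE:", -scores.mean())
```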
When to Use Standardization
Standardization is especially important for certain types of machine learning algorithms, while for others it is less critical.
Algorithms that require standardization include:
- LASSO and Ridge regression: These methods use regularization penalties that assume all features are on the same scale. Without standardization, features with larger numeric ranges can dominate the penalty and skew the model.
- k-means clustering: Since this algorithm relies on distance calculations, features must be on comparable scales to ensure that no single variable dominates the clustering process (see the sketch after this list).
- Support Vector Machines: The kernel functions used in SVMs are distance-based, so standardized features are essential for fair and effective separation.
- Neural networks: Gradient-based optimization in neural networks converges more efficiently when inputs are normalized.
- Principal Component Analysis (PCA): As PCA is based on variance, standardizing features ensures that each variable contributes appropriately to the dimensionality reduction.
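To make the scale sensitivity concrete, the following sketch clusters synthetic house-style data (invented for illustration) with k-means, once on the raw features and once after standardization:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Two true groups that differ only in the small-scale feature (bedrooms),
# while the large-scale feature (square footage) is uninformative noise
labels = np.repeat([0, 1], 100)
sqft = rng.normal(1500, 400, size=200)                       # large scale, no cluster signal
bedrooms = np.where(labels == 0, 2.0, 4.0) + rng.normal(0, 0.3, size=200)
X = np.column_stack([sqft, bedrooms])

raw_clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
scaled_clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X)
)

# Agreement with the true grouping (1.0 = perfect, ~0.0 = no better than chance)
print("Without scaling:", adjusted_rand_score(labels, raw_clusters))     # typically near 0
print("With scaling:   ", adjusted_rand_score(labels, scaled_clusters))  # typically near 1
```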
When standardization is less critical:
- Decision trees: These algorithms are scale-invariant, meaning their splitting criteria are unaffected by the magnitude of input variables.
- Random Forest: As an ensemble of decision trees, Random Forests inherit this scale invariance.
- Logistic regression: While standardization is not strictly required, it becomes important when regularization is used, as the penalty terms are sensitive to feature scale.
Limitations and Considerations
When applying standardization, keep in mind several important limitations:
- Sparse data: For sparse matrices, standardizing by subtracting the mean can convert the data into a dense format, which is inefficient and memory-intensive.
- Tip: Use `StandardScaler(with_mean=False)` to avoid densifying the matrix (see the sketch after this list).
- Outliers: Standardization is sensitive to outliers, as extreme values can disproportionately affect the mean and standard deviation, potentially distorting the transformation.
- Categorical variables: Do not standardize one-hot encoded or ordinal categorical variables, as this can destroy their intended meaning.
- Target variable: In most cases, avoid standardizing the target variable unless there is a specific reason to do so.
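For the sparse-data case above, a minimal sketch of the `with_mean=False` workaround might look like this; the matrix here is random and purely illustrative:

```python
from scipy import sparse
from sklearn.preprocessing import StandardScaler

# A mostly-zero feature matrix in CSR format
X_sparse = sparse.random(1000, 50, density=0.01, format="csr", random_state=0)

# with_mean=False skips centering and only divides by each column's standard
# deviation, so zeros stay zero and the matrix stays sparse
scaler = StandardScaler(with_mean=False)
X_scaled = scaler.fit_transform(X_sparse)

print(type(X_scaled))                # still a scipy sparse matrix
print(X_scaled.nnz == X_sparse.nnz)  # sparsity pattern is preserved
```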
Be aware of common pitfalls:
- Data leakage: Fitting the scaler on the entire dataset (including the test set) introduces information from the test data into the training process, leading to overly optimistic performance estimates.
- Best practice: Always fit the scaler only on the training data, then use it to transform both the training and test sets.
- Inconsistent scaling: Using different scalers for training and test data can result in mismatched feature distributions.
- Over-standardization: Standardizing features that are already normalized can be unnecessary or even harmful.
- Categorical confusion: Standardizing categorical variables that should remain discrete can undermine their interpretability and utility.
Practical Applications
Standardization is essential in many real-world scenarios, such as:
- Financial modeling: Combining features like stock prices, trading volumes, and economic indicators, which may be on vastly different scales.
- Image processing: Normalizing pixel values across different image formats.
- Natural language processing: Integrating word counts, document lengths, and TF-IDF scores.
- Healthcare analytics: Combining lab values, vital signs, and demographic data, all of which may have different units and ranges.
- Recommendation systems: Merging user ratings, item features, and interaction counts.
Summary
Standardization is a fundamental preprocessing step that ensures features are treated fairly in machine learning algorithms. By transforming features to have a mean of zero and a standard deviation of one, standardization:
- Enables fair comparison across variables with different scales
- Improves the performance of distance-based and regularization methods
- Prevents bias toward features with larger numeric values
- Stabilizes optimization by normalizing gradients
The key to successful standardization is proper implementation: always fit the scaler on training data only, then transform both training and test sets using the fitted scaler. This prevents data leakage and ensures realistic model evaluation.
About the author: Michael Brenndoerfer
All opinions expressed here are my own and do not reflect the views of my employer.
Michael currently works as an Associate Director of Data Science at EQT Partners in Singapore, where he drives AI and data initiatives across private capital investments.
With over a decade of experience spanning private equity, management consulting, and software engineering, he specializes in building and scaling analytics capabilities from the ground up. He has published research in leading AI conferences and holds expertise in machine learning, natural language processing, and value creation through data.