Standardization: Normalizing Features for Fair Comparison - Complete Guide with Math Formulas & Python Implementation

Michael Brenndoerfer · April 25, 2025 · 11 min read

A comprehensive guide to standardization in machine learning, covering mathematical foundations, practical implementation, and Python examples. Learn how to properly standardize features for fair comparison across different scales and units.


Standardization: Normalizing Features for Fair Comparison

Standardization is a crucial preprocessing technique that rescales features to have mean 0 and variance 1, ensuring that machine learning algorithms treat all features fairly regardless of their original units or scales. This process is essential for many algorithms, particularly those that rely on distance calculations or regularization penalties.

Introduction

In real-world datasets, features often have vastly different scales and units. For example, a dataset might contain house prices (in thousands of dollars), square footage (in hundreds of square feet), and number of bedrooms (single digits). Without standardization, algorithms like LASSO regression or k-means clustering would be dominated by features with larger numeric values, leading to biased results and poor model performance.

Standardization transforms each feature to have a mean of 0 and standard deviation of 1, putting all features on the same scale. This ensures that:

  • Regularization penalties treat all features equally
  • Distance-based algorithms work correctly
  • Gradient-based optimization converges more efficiently
  • Model coefficients become comparable across features

Mathematical Foundation

The Standardization Formula

For a feature $j$ with $n$ observations, standardization transforms each value $x_{ij}$ to $z_{ij}$ using:

$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}$$

where:

  • $z_{ij}$ is the standardized value of feature $j$ for observation $i$
  • $\bar{x}_j = \frac{1}{n}\sum_{i=1}^n x_{ij}$ is the mean of feature $j$
  • $s_j = \sqrt{\frac{1}{n}\sum_{i=1}^n (x_{ij} - \bar{x}_j)^2}$ is the standard deviation of feature $j$
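
A minimal NumPy sketch of this formula (the toy data below is an arbitrary assumption, not from the article's dataset):

import numpy as np

# Toy data: 5 observations of 2 features on very different scales
rng = np.random.default_rng(0)
X = rng.normal(loc=[100.0, 5.0], scale=[20.0, 1.0], size=(5, 2))

# Per-feature mean and population standard deviation (the 1/n formula above)
means = X.mean(axis=0)
stds = X.std(axis=0)  # ddof=0 by default, i.e. divide by n

Z = (X - means) / stds

print(Z.mean(axis=0))  # ~0 for every feature (up to floating-point error)
print(Z.std(axis=0))   # 1 for every feature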

Key Properties

After standardization, every feature has a mean of zero and a standard deviation of one. Regardless of their original scales or units, features become directly comparable and contribute equally to the analysis: no single feature dominates the learning process simply because it has larger numeric values. Algorithms that are sensitive to feature magnitude, such as regularized regression or clustering, can then perform fairly and effectively.

  • Mean: $\bar{z}_j = 0$ for all features
  • Standard deviation: $s_{z_j} = 1$ for all features
  • Variance: $\text{Var}(z_j) = 1$ for all features

This ensures that all features contribute equally to distance calculations and regularization penalties.

Visual Example

This example demonstrates how standardization transforms features with different scales into a common scale:

Out[2]:
[Figure: Original features with vastly different scales. House size ranges from 1000-2000 square feet, while number of bedrooms ranges from 2-4. Without standardization, algorithms would be dominated by the house size feature due to its larger numeric values.]
[Figure: Standardized features on the same scale. Both features now have mean 0 and standard deviation 1, ensuring fair treatment by machine learning algorithms. The relative relationships within each feature are preserved while making them comparable.]
Original data:
House Size - Mean: 1516.67 Std: 338.71
Bedrooms - Mean: 3.0 Std: 0.82

Standardized data:
House Size - Mean: -0.0 Std: 1.0
Bedrooms - Mean: 0.0 Std: 1.0
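
The code cell that produced this figure is not included in the page text. The following is a minimal sketch of how such a plot could be generated; the six data points are assumptions chosen to match the described ranges, so the printed statistics above will not be reproduced exactly:

import numpy as np
import matplotlib.pyplot as plt

# Assumed illustrative data: house sizes (sq ft) and bedroom counts
size = np.array([1000.0, 1200.0, 1500.0, 1600.0, 1800.0, 2000.0])
beds = np.array([2.0, 2.0, 3.0, 3.0, 4.0, 4.0])

# Standardize each feature with its own mean and population std
size_z = (size - size.mean()) / size.std()
beds_z = (beds - beds.mean()) / beds.std()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(size, beds)
ax1.set(title="Original scales", xlabel="House size (sq ft)", ylabel="Bedrooms")
ax2.scatter(size_z, beds_z)
ax2.set(title="Standardized (mean 0, std 1)", xlabel="House size (z)", ylabel="Bedrooms (z)")
plt.tight_layout()
plt.show()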

Example: Step-by-Step Calculation

Let's work through a detailed example with two features:

  • $x_1$: house size in square feet → [1000, 1500, 2000]
  • $x_2$: number of bedrooms → [2, 3, 4]

Step 1: Calculate means

  • $\bar{x}_1 = (1000 + 1500 + 2000)/3 = 1500$
  • $\bar{x}_2 = (2 + 3 + 4)/3 = 3$

Step 2: Calculate standard deviations

  • $s_1 = \sqrt{((1000-1500)^2 + (1500-1500)^2 + (2000-1500)^2)/3} = \sqrt{(250000 + 0 + 250000)/3} \approx 408.25$
  • $s_2 = \sqrt{((2-3)^2 + (3-3)^2 + (4-3)^2)/3} = \sqrt{(1 + 0 + 1)/3} \approx 0.82$

Step 3: Apply standardization formula

For the first feature ($j=1$):

  • $z_{11} = \frac{1000 - 1500}{408.25} \approx -1.225$
  • $z_{21} = \frac{1500 - 1500}{408.25} = 0.000$
  • $z_{31} = \frac{2000 - 1500}{408.25} \approx 1.225$

For the second feature ($j=2$):

  • $z_{12} = \frac{2 - 3}{0.82} \approx -1.225$
  • $z_{22} = \frac{3 - 3}{0.82} = 0.000$
  • $z_{32} = \frac{4 - 3}{0.82} \approx 1.225$

Result: Both features are now on the same scale:

  • $x_1$: [1000, 1500, 2000] → [-1.225, 0.000, 1.225]
  • $x_2$: [2, 3, 4] → [-1.225, 0.000, 1.225]
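
These numbers are easy to verify with NumPy. Note that np.std uses the population formula (ddof=0) by default, which matches the 1/n definition above and is also the convention scikit-learn's StandardScaler follows:

import numpy as np

x1 = np.array([1000.0, 1500.0, 2000.0])  # house size (sq ft)
x2 = np.array([2.0, 3.0, 4.0])           # bedrooms

print(x1.mean(), x1.std())  # 1500.0 408.248...
print(x2.mean(), x2.std())  # 3.0 0.8164...

print((x1 - x1.mean()) / x1.std())  # [-1.2247  0.      1.2247]
print((x2 - x2.mean()) / x2.std())  # [-1.2247  0.      1.2247]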

Practical Implementation

Proper Train-Test Split with Standardization

This example demonstrates the correct way to apply standardization in a machine learning pipeline:

In[3]:
Code
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Set random seed for reproducibility
np.random.seed(42)

# Create sample dataset
X = np.random.randn(100, 3)
X[:, 0] *= 1000  # Scale first feature to be much larger
X[:, 1] *= 10  # Scale second feature moderately
# Third feature remains small scale

# Create target variable
y = 2 * X[:, 0] + 3 * X[:, 1] + 0.5 * X[:, 2] + np.random.randn(100) * 0.1

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# CORRECT: Fit scaler only on training data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train model
model = Lasso(alpha=0.1)
model.fit(X_train_scaled, y_train)

# Evaluate on the test data, which was already scaled with the training-fitted scaler
y_pred = model.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)

print("Model coefficients:", model.coef_)
print("Test MSE:", mse)
print("Training data mean:", np.mean(X_train_scaled, axis=0))
print("Training data std:", np.std(X_train_scaled, axis=0))
Out[3]:
Console
Model coefficients: [1.68083062e+03 2.67552877e+01 4.19099894e-01]
Test MSE: 0.04795636619638803
Training data mean: [-2.67147415e-17  3.46944695e-17 -5.68989300e-17]
Training data std: [1. 1. 1.]

Key Implementation Guidelines

Standardization is a simple but crucial step in the machine learning workflow. Here are the most important guidelines to follow:

  • Fit the scaler only on the training data.
    This ensures that information from the test set does not leak into the model during training. Fitting on the entire dataset (including the test set) can lead to overly optimistic performance estimates and poor generalization.

  • Transform both training and test data using the same scaler.
    After fitting the scaler on the training data, use it to transform both the training and test sets. This guarantees that the scaling parameters (mean and standard deviation) are consistent and based solely on the training data.

  • Never fit the scaler on the entire dataset before splitting.
    Doing so introduces data leakage, as the test set statistics influence the scaling of the training data.

  • Use pipelines to automate and safeguard the process.
    Scikit-learn pipelines help ensure that standardization and modeling steps are applied correctly and in the right order, reducing the risk of data leakage and making your workflow more reproducible.

By following these guidelines, you ensure that your model evaluation is fair and that your results will generalize well to new, unseen data.

In[4]:
Code
from sklearn.pipeline import Pipeline

# Create pipeline with standardization and model
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', Lasso(alpha=0.1))
])

# Fit pipeline on training data
pipeline.fit(X_train, y_train)

# Make predictions on test data
y_pred_pipeline = pipeline.predict(X_test)
mse_pipeline = mean_squared_error(y_test, y_pred_pipeline)

print("Pipeline MSE:", mse_pipeline)
Out[4]:
Console
Pipeline MSE: 0.04795636619638803

When to Use Standardization

Standardization is especially important for certain types of machine learning algorithms, while for others it is less critical.

Algorithms that require standardization include:

  • LASSO and Ridge regression: These methods use regularization penalties that assume all features are on the same scale. Without standardization, features with larger numeric ranges can dominate the penalty and skew the model.
  • k-means clustering: Since this algorithm relies on distance calculations, features must be on comparable scales to ensure that no single variable dominates the clustering process (see the sketch after these lists).
  • Support Vector Machines: The kernel functions used in SVMs are distance-based, so standardized features are essential for fair and effective separation.
  • Neural networks: Gradient-based optimization in neural networks converges more efficiently when inputs are normalized.
  • Principal Component Analysis (PCA): As PCA is based on variance, standardizing features ensures that each variable contributes appropriately to the dimensionality reduction.

When standardization is less critical:

  • Decision trees: These algorithms are scale-invariant, meaning their splitting criteria are unaffected by the magnitude of input variables.
  • Random Forest: As an ensemble of decision trees, Random Forests inherit this scale invariance.
  • Logistic regression: While standardization is not strictly required, it becomes important when regularization is used, as the penalty terms are sensitive to feature scale.
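
To make the distance-sensitivity point concrete, here is a small sketch comparing k-means with and without scaling. The data is synthetic and the cluster structure is an assumption for illustration:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Two groups that differ only in the small-scale feature,
# plus a large-scale feature that is pure noise
n = 100
small = np.concatenate([rng.normal(0, 1, n), rng.normal(5, 1, n)])  # informative
large = rng.normal(0, 1000, 2 * n)                                  # uninformative
X = np.column_stack([large, small])
true = np.array([0] * n + [1] * n)

def agreement(labels):
    # Fraction matching the true grouping, allowing for arbitrary label order
    return max(np.mean(labels == true), np.mean(labels != true))

# Without scaling, distances are dominated by the noisy large-scale feature
labels_raw = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# With standardization, both features contribute equally
X_scaled = StandardScaler().fit_transform(X)
labels_scaled = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)

print("Raw accuracy:   ", agreement(labels_raw))     # typically near 0.5 (chance)
print("Scaled accuracy:", agreement(labels_scaled))  # typically near 1.0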

Limitations and Considerations

When applying standardization, keep in mind several important limitations:

  • Sparse data: For sparse matrices, standardizing by subtracting the mean can convert the data into a dense format, which is inefficient and memory-intensive.
    • Tip: Use StandardScaler(with_mean=False) to avoid densifying the matrix.
  • Outliers: Standardization is sensitive to outliers, as extreme values can disproportionately affect the mean and standard deviation, potentially distorting the transformation (see the sketch after this list).
  • Categorical variables: Do not standardize one-hot encoded or ordinal categorical variables, as this can destroy their intended meaning.
  • Target variable: In most cases, avoid standardizing the target variable unless there is a specific reason to do so.
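
To see the outlier sensitivity concretely, here is a tiny sketch with made-up values:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x_outlier = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one extreme value

z = (x - x.mean()) / x.std()
z_out = (x_outlier - x_outlier.mean()) / x_outlier.std()

print(z)      # roughly [-1.41, -0.71, 0.0, 0.71, 1.41]
print(z_out)  # the four ordinary points are squashed near -0.5; the outlier sits at 2.0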

Be aware of common pitfalls:

  • Data leakage: Fitting the scaler on the entire dataset (including the test set) introduces information from the test data into the training process, leading to overly optimistic performance estimates.
    • Best practice: Always fit the scaler only on the training data, then use it to transform both the training and test sets.
  • Inconsistent scaling: Using different scalers for training and test data can result in mismatched feature distributions.
  • Over-standardization: Standardizing features that are already normalized can be unnecessary or even harmful.
  • Categorical confusion: Standardizing categorical variables that should remain discrete can undermine their interpretability and utility.

When to use with_mean=False

For sparse inputs, set StandardScaler(with_mean=False) to avoid densifying the matrix. You typically do not standardize the target y for LASSO unless you have a specific reason to do so.
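
A minimal sketch of the sparse case (assuming SciPy is installed alongside scikit-learn):

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.preprocessing import StandardScaler

# A small sparse matrix; most entries are zero
X_sparse = csr_matrix(np.array([[0.0, 1000.0],
                                [0.0, 0.0],
                                [3.0, 0.0],
                                [0.0, 2000.0]]))

# with_mean=False divides by the standard deviation only,
# so zeros stay zero and the matrix stays sparse
scaler = StandardScaler(with_mean=False)
X_scaled = scaler.fit_transform(X_sparse)

print(type(X_scaled))    # still a sparse matrix
print(X_scaled.toarray())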

Practical Applications

Standardization is essential in many real-world scenarios, such as:

  • Financial modeling: Combining features like stock prices, trading volumes, and economic indicators, which may be on vastly different scales.
  • Image processing: Normalizing pixel values across different image formats.
  • Natural language processing: Integrating word counts, document lengths, and TF-IDF scores.
  • Healthcare analytics: Combining lab values, vital signs, and demographic data, all of which may have different units and ranges.
  • Recommendation systems: Merging user ratings, item features, and interaction counts.

Summary

Standardization is a fundamental preprocessing step that ensures features are treated fairly in machine learning algorithms. By transforming features to have a mean of zero and a standard deviation of one, standardization:

  • Enables fair comparison across variables with different scales
  • Improves the performance of distance-based and regularization methods
  • Prevents bias toward features with larger numeric values
  • Stabilizes optimization by normalizing gradients

The key to successful standardization is proper implementation: always fit the scaler on training data only, then transform both training and test sets using the fitted scaler. This prevents data leakage and ensures realistic model evaluation.

