
Machine Learning from Scratch
A Complete Guide to Machine Learning, Optimization and AI: Mathematical Foundations and Practical Implementations

For data scientists, ML engineers, AI engineers, researchers, students, quants, and anyone serious about understanding machine learning at a fundamental level.
About This Book
What separates a data scientist who truly understands their craft from one who merely applies black-box tools? The answer lies in mastering the mathematics and intuition behind every algorithm. This comprehensive handbook bridges the gap between theoretical foundations and production-ready implementations, giving you the deep understanding that transforms good practitioners into exceptional ones.
From the elegant simplicity of linear regression to the sophisticated power of gradient boosting and neural networks, every concept is built from first principles. You won't just learn how to use scikit-learn. You'll understand exactly what happens under the hood when you call fit() and predict(). Each algorithm is derived mathematically, explained intuitively, and implemented in clean, production-quality Python code.
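To make that concrete, here is a minimal sketch, illustrative rather than excerpted from the book, of what a from-scratch estimator can look like: an ordinary least squares linear regressor exposing the familiar fit() and predict() methods, solved via the normal equations. The class name and the synthetic data are assumptions for this example.

import numpy as np

class LinearRegressionScratch:
    """Ordinary least squares: solve X^T X w = X^T y for the weights w."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Prepend a column of ones so the first weight acts as the intercept.
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])
        # Least-squares solve; more numerically stable than an explicit inverse.
        self.weights_, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])
        return Xb @ self.weights_

# Usage: recover y = 1 + 2x from noisy synthetic data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(0, 0.5, size=100)
model = LinearRegressionScratch().fit(X, y)
print(model.weights_)  # approximately [1.0, 2.0]

Solving with np.linalg.lstsq rather than explicitly inverting X^T X is the numerically stable choice, and it still makes every step of the estimation visible.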
Table of Contents
Part I: Statistics 101
11 chapters
Types of Data
Complete guide to data classification: quantitative, qualitative, discrete, and continuous
Descriptive Statistics
Complete guide to summarizing and understanding data with measures of central tendency, variability, and distribution shape
Probability Basics
Foundation of statistical reasoning covering random variables, probability distributions, expected value, variance, and conditional probability
Central Limit Theorem
Foundation of statistical inference covering convergence behavior, sample size requirements, and practical applications in data science
Data Sampling
Complete guide to sampling theory and methods covering simple random sampling, stratified sampling, cluster sampling, sampling error, and uncertainty quantification
Variable Relationships
Complete guide to covariance, correlation, and regression analysis covering how to measure, model, and interpret variable associations
Probability Distributions
Complete guide to the normal, t, binomial, Poisson, exponential, and log-normal distributions with practical applications
Data Visualization
Complete guide to histograms, box plots, and scatter plots for exploratory data analysis
Data Quality
Complete guide to data quality and outliers covering measurement error, bias, missing data, and imputation
Statistical Inference
Complete guide to drawing conclusions from data covering point and interval estimation, confidence intervals, hypothesis testing, and p-values
Statistical Modeling
Complete guide to building and evaluating predictive models covering model fit metrics, bias-variance tradeoff, and cross-validation
Part II: Foundations
6 chapters
Sum of Squared Errors (SSE)
The fundamental metric for measuring regression model performance and prediction accuracy
R-squared
Understanding the coefficient of determination and model fit metrics
Standardization
Scaling features to zero mean and unit variance for fair comparison in machine learning algorithms
Normalization
Min-max scaling to transform features to a common [0, 1] range for neural networks and distance-based algorithms
Gauss-Markov Assumptions
Foundation of linear regression and OLS estimation covering linearity, independence, homoscedasticity, normality, and practical testing methods
Multicollinearity
Understanding the impact of multicollinearity on regression models
Part III: Regression Models
12 chapters
Simple Linear Regression
Mathematical foundations, formulas, and step-by-step implementation
Ordinary Least Squares (OLS)
Vector notation and matrix operations for regression
Multiple Linear Regression
Extending linear regression to multiple predictors
Lasso Regularization (L1 Regularization)
L1 penalty for feature selection and overfitting prevention
Ridge Regularization (L2 Regularization)
L2 penalty for handling multicollinearity and overfitting
Elastic Net Regularization
Combining L1 and L2 penalties for optimal regularization
Polynomial Regression
Modeling non-linear relationships with polynomial features
Generalized Linear Models
Extending linear regression to non-normal response distributions
Logistic Regression
Binary classification using the logistic function
Spline Regression
Flexible non-parametric regression using piecewise polynomials
Poisson Regression
Modeling count data with the Poisson distribution
Multinomial Logistic Regression
Multi-class classification extension of logistic regression
Part IV: Tree-Based Models
7 chapters
CART (Classification and Regression Trees)
Decision trees with greedy splitting algorithms
Random Forest
Ensemble method combining multiple decision trees with bagging
Boosted Trees
Gradient boosting for improved predictive performance
XGBoost
Optimized gradient boosting with advanced regularization techniques
LightGBM
Fast gradient boosting with leaf-wise tree growth
CatBoost
Gradient boosting with categorical feature handling
Isolation Forest
Unsupervised anomaly detection using random trees
Part V: Explainability
5 chapters
SHAP (SHapley Additive exPlanations)
Unified framework for model interpretability
LIME (Local Interpretable Model-agnostic Explanations)
Local model explanations for individual predictions
PCA (Principal Component Analysis)
Dimensionality reduction and feature extraction
UMAP (Uniform Manifold Approximation and Projection)
Non-linear dimensionality reduction preserving local and global structure
t-SNE (t-Distributed Stochastic Neighbor Embedding)
Non-linear visualization technique
Part VI: Unsupervised Learning
4 chapters
K-means Clustering
Partitioning data into k clusters using a centroid-based approach
DBSCAN (Density-Based Spatial Clustering)
Density-based clustering for arbitrarily shaped clusters
HDBSCAN (Hierarchical DBSCAN)
Hierarchical density-based clustering that handles clusters of varying density
Hierarchical Clustering
Tree-based clustering with agglomerative or divisive methods
Part VII: Time Series
5 chapters
ETS (Exponential Smoothing)
Classical time series forecasting with trend and seasonality
SARIMA (Seasonal ARIMA)
Autoregressive integrated moving average with seasonal components
Prophet
Facebook's forecasting tool for business time series with holidays
N-BEATS
Neural basis expansion analysis for interpretable time series forecasting
N-HiTS
Neural hierarchical interpolation for time series forecasting
Part VIII: Optimization
5 chapters
CP-SAT Rostering
Constraint programming for employee scheduling and rostering
MILP Factory
Mixed integer linear programming for production planning
Min Cost Flow Slotting
Network flow optimization for resource allocation
VRPTW Routing
Vehicle routing problem with time windows for logistics
QP Portfolio
Quadratic programming for portfolio optimization and risk management
Stay Updated
Get notified when new chapters are published.