
For data scientists, ML engineers, AI engineers, researchers, students, quants, and anyone serious about understanding machine learning at a fundamental level.
Machine Learning from Scratch
A Complete Guide to Machine Learning, Optimization and AI: Mathematical Foundations and Practical Implementations
About This Book
What separates a data scientist who truly understands their craft from one who merely applies black-box tools? The answer lies in mastering the mathematics and intuition behind every algorithm. This comprehensive handbook bridges the gap between theoretical foundations and black-box function calling, giving you the deep understanding that transforms good practitioners into exceptional ones.
From the elegant simplicity of linear regression to the sophisticated power of gradient boosting and neural networks, every concept is built from first principles. You won't just learn how to use scikit-learn. You'll understand exactly what happens under the hood when you call fit() and predict(). Each algorithm is derived mathematically, explained intuitively, and implemented in clean Python code.
What's Inside
Statistics 101
Hypothesis Testing
Foundations
Regression Models
Tree-Based Models
Explainability
Table of Contents
Statistics 101
11 chapters
Types of Data
Complete guide to data classification - quantitative, qualitative, discrete & continuous
Descriptive Statistics
Complete guide to summarizing and understanding data with measures of central tendency, variability, and distribution shape
Probability Basics
Foundation of statistical reasoning covering random variables, probability distributions, expected value, variance, and conditional probability
Central Limit Theorem
Foundation of statistical inference covering convergence behavior, sample size requirements, and practical applications in data science
Data Sampling
Complete guide to sampling theory and methods covering simple random sampling, stratified sampling, cluster sampling, sampling error, and uncertainty quantification
Variable Relationships
Complete guide to covariance, correlation, and regression analysis covering how to measure, model, and interpret variable associations
Probability Distributions
Complete guide to normal, t-distribution, binomial, Poisson, exponential, and log-normal distributions with practical applications
Data Visualization
Complete guide to histograms, box plots, and scatter plots for exploratory data analysis
Data Quality
Complete guide to data quality and outliers covering measurement error, bias, missing data, and imputation
Statistical Inference
Complete guide to drawing conclusions from data covering point and interval estimation, confidence intervals, hypothesis testing, and p-values
Statistical Modelling
Complete guide to building and evaluating predictive models covering model fit metrics, bias-variance tradeoff, and cross-validation
Hypothesis Testing
11 chapters
P-values and Hypothesis Test Setup
Foundation of hypothesis testing covering p-values, null and alternative hypotheses, one-sided vs two-sided tests, and test statistics
Confidence Intervals and Test Assumptions
Mathematical equivalence between confidence intervals and hypothesis tests, test assumptions, and choosing between z and t tests
The Z-Test
Complete guide to z-tests including one-sample, two-sample, and proportion tests
The T-Test
Complete guide to t-tests including one-sample, two-sample (pooled and Welch), paired tests, assumptions, and decision framework
The F-Test and F-Distribution
F-distribution, F-test for comparing variances, F-test in regression, and nested model comparison
ANOVA (Analysis of Variance)
One-way ANOVA, post-hoc tests, assumptions, and when to use ANOVA
Type I and Type II Errors
Understanding false positives, false negatives, statistical power, and the tradeoff between error types
Sample Size, Minimum Detectable Effect, and Power
Power analysis, sample size determination, MDE calculation, and avoiding underpowered studies
Effect Sizes and Statistical Significance
Cohen's d, practical significance, interpreting effect sizes, and why tiny p-values can mean tiny effects
Multiple Comparisons
Family-wise error rate, false discovery rate, Bonferroni correction, Holm's method, and Benjamini-Hochberg procedure
Summary and Practical Guide to Hypothesis Testing
Practical reporting guidelines, summary of key concepts, test selection parameters table, multiple comparison corrections table, and scipy.stats functions reference
Foundations
6 chapters
Sum of Squared Errors (SSE)
The fundamental metric for measuring regression model performance and prediction accuracy
R-squared
Understanding the coefficient of determination and model fit metrics
Standardization
Normalizing features for fair comparison in machine learning algorithms
Normalization
Min-max scaling to transform features to a common [0, 1] range for neural networks and distance-based algorithms
Gauss-Markov Assumptions
Foundation of linear regression and OLS estimation covering linearity, independence, homoscedasticity, normality, and practical testing methods
Multicollinearity
Understanding the impact of multicollinearity on regression models
Regression Models
12 chapters
Simple Linear Regression
Mathematical foundations, formulas, and step-by-step implementation
Ordinary Least Squares (OLS)
Vector notation and matrix operations for regression
Multiple Linear Regression
Extending linear regression to multiple predictors
Lasso Regularization (L1 Regularization)
L1 penalty for feature selection and overfitting prevention
Ridge Regularization (L2 Regularization)
L2 penalty for handling multicollinearity and overfitting
Elastic Net Regularization
Combining L1 and L2 penalties for optimal regularization
Polynomial Regression
Modeling non-linear relationships with polynomial features
Generalized Linear Models
Extending linear regression to non-normal distributions
Logistic Regression
Binary classification using the logistic function
Spline Regression
Flexible non-parametric regression using piecewise polynomials
Poisson Regression
Modeling count data with Poisson distribution
Multinomial Logistic Regression
Multi-class classification extension of logistic regression
Tree-Based Models
6 chapters
CART (Classification and Regression Trees)
Decision trees with greedy splitting algorithms
Random Forest
Ensemble method combining multiple decision trees with bagging
Boosted Trees
Gradient boosting for improved predictive performance
XGBoost
Optimized gradient boosting with advanced regularization techniques
LightGBM
Fast gradient boosting with leaf-wise tree growth
CatBoost
Gradient boosting with categorical feature handling
Explainability
5 chapters
SHAP (SHapley Additive exPlanations)
Unified framework for model interpretability
LIME (Local Interpretable Model-agnostic Explanations)
Local model explanations for individual predictions
PCA (Principal Component Analysis)
Dimensionality reduction and feature extraction
UMAP (Uniform Manifold Approximation and Projection)
Non-linear dimensionality reduction preserving local and global structure
t-SNE (t-Distributed Stochastic Neighbor Embedding)
Non-linear visualization technique
Unsupervised Learning
5 chapters
K-means Clustering
Partitioning data into k clusters using centroid-based approach
DBSCAN (Density-Based Spatial Clustering)
Density-based clustering for arbitrary shaped clusters and noise detection
HDBSCAN (Hierarchical DBSCAN)
Hierarchical density-based clustering with varying density
Hierarchical Clustering
Tree-based clustering with agglomerative or divisive methods
Isolation Forest
Unsupervised anomaly detection using random trees
Time Series
5 chapters
ETS (Exponential Smoothing)
Classical time series forecasting with trend and seasonality
SARIMA (Seasonal ARIMA)
Autoregressive integrated moving average with seasonal components
Prophet
Facebook's forecasting tool for business time series with holidays
N-BEATS
Neural basis expansion analysis for interpretable time series forecasting
N-HiTS
Neural hierarchical interpolation for time series forecasting
Optimization
5 chapters
CP-SAT Rostering
Constraint programming for employee scheduling and rostering
MILP Factory
Mixed integer linear programming for production planning
Min Cost Flow Slotting
Network flow optimization for resource allocation
VRPTW Routing
Vehicle routing problem with time windows for logistics
QP Portfolio
Quadratic programming for portfolio optimization and risk management
Appendix
1 chapter
Applications of CLT
Reference (coming soon)