
Polynomial Regression: Complete Guide with Math, Implementation & Best Practices

Michael Brenndoerfer · October 20, 2025 · 29 min read · 7,072 words

A comprehensive guide covering polynomial regression, including mathematical foundations, implementation in Python, bias-variance trade-offs, and practical applications. Learn how to model non-linear relationships using polynomial features.

This article is part of the free-to-read Data Science Handbook.

Out[2]: [Figure]

Exponential growth of polynomial features as the number of variables and polynomial degree increase. The heatmap shows how quickly the feature space explodes - with just 5 variables and degree 4, you have 126 features to estimate. This visualization demonstrates why polynomial regression is typically limited to low degrees and few variables in practice.

Summary Table of Terms

Degree   Example Terms (for x₁, x₂)         Description
--------------------------------------------------------------
1        x₁, x₂                             Linear
2        x₁², x₂², x₁x₂                     Quadratic & Interactions
3        x₁³, x₂³, x₁²x₂, x₁x₂²             Cubic & Higher Interactions

In summary, with multiple variables, polynomial regression models include all combinations of variables multiplied together up to the specified degree. This flexibility allows you to model complex, non-linear relationships, but it also means the number of terms grows very quickly as you add more variables or increase the degree.
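
To make this concrete, here is a minimal sketch using scikit-learn's PolynomialFeatures on two toy predictors; the generated term names mirror the rows of the table above (variable values are arbitrary, only the column count matters):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two toy predictors; only the number of columns matters for the feature names
X = np.array([[1.0, 2.0], [3.0, 4.0]])

# Degree-3 expansion without the constant term, mirroring the table above
poly = PolynomialFeatures(degree=3, include_bias=False).fit(X)

print(poly.get_feature_names_out())   # x0, x1, x0^2, x0 x1, x1^2, x0^3, x0^2 x1, x0 x1^2, x1^3
print(poly.n_output_features_)        # 9 terms for 2 variables at degree 3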

Out[3]: [Figure]

3D visualization of a quadratic polynomial regression surface with two variables. The surface shows how the model captures both individual variable effects (x₁² and x₂²) and interaction effects (x₁x₂). The curvature in multiple directions demonstrates the model's ability to capture complex non-linear relationships between multiple predictors and the response variable.

Out[4]: [Figure]

Linear regression (degree 1) showing the best-fitting straight line through the data. This model captures only the general trend but misses the underlying cubic relationship, demonstrating underfitting.

[Figure]

Quadratic regression (degree 2) showing improved fit with some curvature. While better than linear, it still oversimplifies the true cubic relationship in the data.

[Figure]

Cubic regression (degree 3) closely matching the true underlying function. This demonstrates the optimal degree for this dataset, capturing the true pattern without overfitting.

[Figure]

Fifth-degree polynomial showing overfitting with unnecessary wiggles and complexity. Higher R² doesn't always mean better generalization, as this model fits noise rather than the true pattern.

In[5]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Generate synthetic data with cubic relationship
np.random.seed(42)
n_samples = 100
x = np.linspace(-2, 2, n_samples)
y_true = 2 * x**3 - 3 * x**2 + x + 1
noise = np.random.normal(0, 0.5, n_samples)
y = y_true + noise

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    x.reshape(-1, 1), y, test_size=0.3, random_state=42
)

The data follows a cubic polynomial with added Gaussian noise. By splitting into training (70%) and test (30%) sets, we can evaluate how well the model generalizes to unseen data.

Building a Polynomial Regression Pipeline

The most efficient way to implement polynomial regression in scikit-learn is using a Pipeline that combines the PolynomialFeatures transformer with LinearRegression. This ensures that the same polynomial transformation is consistently applied to both training and test data.

In[6]:
# Create pipeline with degree-3 polynomial features
poly_reg = Pipeline(
    [
        ("poly", PolynomialFeatures(degree=3, include_bias=True)),
        ("linear", LinearRegression()),
    ]
)

# Train the model
poly_reg.fit(X_train, y_train)

# Make predictions
y_pred = poly_reg.predict(X_test)

# Calculate performance metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
Out[7]:
Model Performance:
Mean Squared Error: 0.1395
R² Score: 0.9980

The low MSE and high R² score (close to 1.0) indicate excellent model performance. Since our data was generated from a cubic function and we used a degree-3 polynomial, the model successfully captures the true underlying relationship. The R² score tells us that the model explains nearly all the variance in the test data.

Out[8]:

Model Coefficients:
Intercept: 0.8867
Coefficients: [ 0.          1.2487505  -2.96269465  1.90648781]

The coefficients show the contribution of each polynomial term. Comparing these to our true function (2x³ − 3x² + x + 1), we can see the model has successfully recovered values close to the true parameters, despite the added noise.
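
To make the mapping between coefficients and terms explicit, the fitted pipeline can report the name of each generated feature next to its coefficient. This is a small sketch that assumes the poly_reg pipeline from the previous cell is still in scope:

# Pair each polynomial term with its learned coefficient
feature_names = poly_reg.named_steps["poly"].get_feature_names_out(["x"])
coefficients = poly_reg.named_steps["linear"].coef_

for name, coef in zip(feature_names, coefficients):
    print(f"{name:>6}: {coef: .4f}")

print(f"Intercept: {poly_reg.named_steps['linear'].intercept_:.4f}")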

Comparing Different Polynomial Degrees

A critical question in polynomial regression is choosing the right degree. Let's compare models with different degrees to understand the bias-variance trade-off.

In[9]:
# Compare polynomial degrees from 1 to 5
degrees = [1, 2, 3, 4, 5]
results = []

for degree in degrees:
    # Create and fit model
    model = Pipeline(
        [
            ("poly", PolynomialFeatures(degree=degree, include_bias=True)),
            ("linear", LinearRegression()),
        ]
    )

    model.fit(X_train, y_train)
    y_pred_deg = model.predict(X_test)

    # Calculate metrics
    mse_deg = mean_squared_error(y_test, y_pred_deg)
    r2_deg = r2_score(y_test, y_pred_deg)
    n_features = model.named_steps["poly"].n_output_features_

    results.append(
        {"degree": degree, "mse": mse_deg, "r2": r2_deg, "n_features": n_features}
    )
Out[10]:
Comparison of Polynomial Degrees:

Degree   MSE          R²           Features  
---------------------------------------------
1        21.7172      0.6895       2         
2        6.3839       0.9087       3         
3        0.1395       0.9980       4         
4        0.1366       0.9980       5         
5        0.1388       0.9980       6         

Best degree: 4 (MSE = 0.1366)

The comparison reveals important insights about model selection:

  • Degree 1 (Linear): High MSE and low R² indicate underfitting. A straight line cannot capture the cubic relationship.
  • Degree 2 (Quadratic): Improved performance but still underfits the true cubic function.
  • Degree 3 (Cubic): Excellent performance that matches the true data-generating process.
  • Degrees 4-5: Test MSE essentially identical to degree 3 (degree 4 is marginally lower on this particular split), which indicates the extra terms add complexity without capturing real structure and mostly fit noise in the training data.

The degree-3 model achieves the best balance between bias and variance, as expected since the true relationship is cubic. Higher degrees don't meaningfully improve performance and risk overfitting, especially with limited training data.
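
Because a single train/test split can make the higher degrees look slightly better or worse by chance, a more robust check is cross-validation. The sketch below reuses the x and y arrays from the earlier cells and scores each candidate degree with shuffled 5-fold cross-validation:

from sklearn.model_selection import KFold, cross_val_score

cv = KFold(n_splits=5, shuffle=True, random_state=42)

# Cross-validated MSE for each candidate degree (more robust than one split)
for degree in [1, 2, 3, 4, 5]:
    model = Pipeline(
        [
            ("poly", PolynomialFeatures(degree=degree, include_bias=True)),
            ("linear", LinearRegression()),
        ]
    )
    scores = cross_val_score(
        model, x.reshape(-1, 1), y, cv=cv, scoring="neg_mean_squared_error"
    )
    print(f"Degree {degree}: CV MSE = {-scores.mean():.4f} (+/- {scores.std():.4f})")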

Visualizing Model Predictions

Let's visualize how different polynomial degrees fit the data to better understand their behavior.

Out[11]: [Figure]

Linear polynomial (degree 1) showing underfitting with a straight line that cannot capture the cubic relationship in the data. The low R² score indicates poor model performance.

[Figure]

Quadratic polynomial (degree 2) showing improved fit with some curvature, but still underfitting the true cubic relationship. Performance is better than linear but not optimal.

[Figure]

Cubic polynomial (degree 3) achieving optimal fit by matching the true data-generating process. The high R² score indicates excellent model performance without overfitting.

[Figure]

Fifth-degree polynomial showing potential overfitting with a more complex curve that may fit training noise. Performance is similar to degree 3, suggesting unnecessary complexity.

The visualizations clearly show the progression from underfitting (degree 1) to optimal fit (degree 3) to potential overfitting (degree 5). The degree-3 model smoothly captures the underlying cubic trend, while the degree-5 model shows slight oscillations that fit noise rather than the true pattern.

Key Parameters

Below are the main parameters that control polynomial regression behavior and performance.

PolynomialFeatures Parameters:

  • degree: The maximum degree of polynomial features to generate (default: 2). This is the most critical parameter. Start with low values (2-3) and increase only if needed. Higher degrees capture more complex patterns but risk overfitting.
  • include_bias: Whether to include the constant column of ones (default: True). Because LinearRegression fits its own intercept by default, this column is redundant; you can set it to False, and in practice the fit is unchanged either way because the redundant column simply receives a zero coefficient (as in the output above).
  • interaction_only: If True, only interaction features are produced, not powers (default: False). Use this when you want to model interactions between variables without including squared or higher-power terms.
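
To see how these options change the generated terms, the short sketch below compares the default degree-2 expansion with interaction_only=True for two toy predictors:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1.0, 2.0], [3.0, 4.0]])  # two toy predictors

full = PolynomialFeatures(degree=2, include_bias=False).fit(X)
inter = PolynomialFeatures(degree=2, include_bias=False, interaction_only=True).fit(X)

print(full.get_feature_names_out())   # x0, x1, x0^2, x0 x1, x1^2
print(inter.get_feature_names_out())  # x0, x1, x0 x1 (no squared terms)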

LinearRegression Parameters:

  • fit_intercept: Whether to calculate the intercept (default: True). When PolynomialFeatures is used with include_bias=True, one of the two can be disabled to avoid a redundant constant column, though in practice LinearRegression tolerates the redundancy and assigns the extra column a coefficient of zero.
  • copy_X: Whether to copy the input data (default: True). Set to False to save memory if you don't need the original data preserved.

Pipeline Benefits:

  • Automatically applies the same polynomial transformation to training and test data
  • Simplifies cross-validation and hyperparameter tuning
  • Ensures reproducible preprocessing steps
  • Makes code more maintainable and less error-prone

Key Methods

The following methods are used to train and apply polynomial regression models.

  • fit(X, y): Trains the polynomial regression model on input features X and target values y. This applies the polynomial transformation and fits the linear regression coefficients.
  • predict(X): Returns predicted values for input data X. The polynomial transformation is automatically applied before making predictions.
  • score(X, y): Returns the R² score (coefficient of determination) on the given test data. Values closer to 1.0 indicate better fit.
  • get_params(): Returns the parameters of the pipeline components. Useful for inspecting or modifying the model configuration.
  • set_params(**params): Sets the parameters of the pipeline. Commonly used with grid search for hyperparameter tuning.
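
A short usage sketch tying these methods together, reusing the poly_reg pipeline and train/test split defined earlier (the degree values are only an example):

# Inspect and adjust the pipeline, then retrain and score it
print(poly_reg.get_params()["poly__degree"])   # current polynomial degree

poly_reg.set_params(poly__degree=2)            # switch to a quadratic expansion
poly_reg.fit(X_train, y_train)                 # refit with the new setting
print("Test R^2:", poly_reg.score(X_test, y_test))

poly_reg.set_params(poly__degree=3)            # restore the original degree
poly_reg.fit(X_train, y_train)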

Practical Applications

When to Use Polynomial Regression

Polynomial regression is particularly effective when you have continuous target variables and suspect non-linear relationships that can be approximated with smooth polynomial curves. The method excels in engineering applications where physical relationships often follow polynomial patterns, such as modeling stress-strain curves in materials science, temperature effects on material properties, or fluid dynamics problems. These domains benefit from polynomial regression because the underlying physics often produces smooth, continuous relationships that can be well-represented by polynomial functions of moderate degree.

In economics and finance, polynomial regression proves valuable for modeling relationships with natural curvature, such as the risk-return trade-off in portfolio optimization or the relationship between advertising spend and sales revenue. Marketing analysts frequently use quadratic or cubic polynomials to capture diminishing returns, where initial investments yield strong results but additional spending produces progressively smaller gains. The method is also useful in dose-response studies in pharmacology, where the effect of a drug often follows a non-linear but smooth relationship with dosage.

The approach works best when you have moderate amounts of data (hundreds to thousands of observations) and when the underlying relationship is genuinely smooth rather than highly irregular or discontinuous. Polynomial regression is less suitable for problems with sharp transitions, categorical relationships, or highly complex patterns that would require very high-degree polynomials. In such cases, methods like decision trees, splines, or neural networks may be more appropriate. The key advantage of polynomial regression is its balance between flexibility and interpretability—it can model non-linear relationships while maintaining the familiar statistical framework and diagnostic tools of linear regression.

Best Practices

To achieve optimal results with polynomial regression, begin by selecting an appropriate polynomial degree through systematic cross-validation rather than visual inspection alone. Start with low degrees (typically 2 or 3) and incrementally increase while monitoring both training and validation performance. Use k-fold cross-validation with at least 5 folds to assess generalization performance, and watch for signs of overfitting where training error continues to decrease but validation error increases. The optimal degree typically occurs where validation error reaches its minimum, and you should favor simpler models when performance differences are marginal.
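
One way to make this selection systematic is a grid search over the degree inside a pipeline. The sketch below is one possible setup, reusing the x and y arrays from the earlier cells; the candidate degrees and fold count are illustrative choices:

from sklearn.model_selection import GridSearchCV, KFold

pipeline = Pipeline(
    [
        ("poly", PolynomialFeatures(include_bias=True)),
        ("linear", LinearRegression()),
    ]
)

# 5-fold cross-validation over candidate degrees, scored by (negative) MSE
search = GridSearchCV(
    pipeline,
    param_grid={"poly__degree": [1, 2, 3, 4, 5]},
    cv=KFold(n_splits=5, shuffle=True, random_state=42),
    scoring="neg_mean_squared_error",
)
search.fit(x.reshape(-1, 1), y)

print("Best degree:", search.best_params_["poly__degree"])
print("Cross-validated MSE:", -search.best_score_)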

Feature scaling is essential before applying polynomial transformations, particularly for degrees higher than 2. Use StandardScaler to center and scale your features to unit variance, or MinMaxScaler if you need features in a specific range. Apply scaling before generating polynomial features to prevent numerical instability that arises when features like x⁵ have vastly different magnitudes than x. When working with multiple input variables, be mindful of the combinatorial explosion in feature count: with 5 variables and degree 3, you generate 56 features. In such cases, consider using interaction_only=True in PolynomialFeatures to include only interaction terms without higher powers, or apply regularization techniques like Ridge or Lasso to manage the increased model complexity.
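
One possible arrangement is sketched below: scaling, then polynomial expansion, then a regularized linear model. It reuses X_train and y_train from the earlier cells; the degree and alpha values are placeholders to be tuned, not recommendations:

from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Scale first, then expand, then fit a regularized linear model
scaled_poly_ridge = Pipeline(
    [
        ("scale", StandardScaler()),
        ("poly", PolynomialFeatures(degree=3, include_bias=False)),
        ("ridge", Ridge(alpha=1.0)),
    ]
)

scaled_poly_ridge.fit(X_train, y_train)
print("Test R^2:", scaled_poly_ridge.score(X_test, y_test))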

Always use pipelines to ensure consistent preprocessing between training and prediction. A pipeline that combines StandardScaler, PolynomialFeatures, and LinearRegression guarantees that test data receives identical transformations as training data, preventing subtle bugs and ensuring reproducibility. Set random_state parameters for reproducibility, and evaluate your model using multiple metrics—R² score for overall fit, mean squared error for prediction accuracy, and residual plots to check for patterns that might indicate model misspecification. When comparing models of different degrees, consider both statistical measures and domain knowledge to ensure your chosen model makes practical sense for your application.

Data Requirements and Preprocessing

Polynomial regression requires continuous numerical features and a continuous target variable. The method assumes that relationships between variables are smooth and differentiable, making it unsuitable for categorical predictors without proper encoding or for target variables with discrete jumps. Your data should have sufficient observations relative to the number of polynomial features you plan to generate—as a general guideline, aim for at least 10-20 observations per feature to avoid overfitting. With degree 3 polynomials on a single variable (4 features including intercept), this means at least 40-80 observations, though more is always preferable.

Missing values must be handled before applying polynomial regression, as the method cannot work with incomplete data. Imputation strategies depend on your domain—mean or median imputation works for data missing at random, while more sophisticated approaches like K-nearest neighbors imputation may be appropriate for structured missingness patterns. Outliers can have disproportionate influence on polynomial regression, especially with higher degrees, since polynomial terms amplify extreme values. Examine your data for outliers using box plots or z-scores, and consider whether they represent genuine extreme observations or data quality issues. If outliers are legitimate, robust regression techniques or data transformations may help reduce their influence.
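
If missing values are present, an imputation step can be added at the front of the same pipeline. The sketch below uses mean imputation purely as an illustration; the right strategy depends on your data, and the X_train used here happens to have no missing values, so the step simply shows where imputation belongs:

from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Impute, scale, expand, and fit in one reproducible pipeline
impute_poly = Pipeline(
    [
        ("impute", SimpleImputer(strategy="mean")),
        ("scale", StandardScaler()),
        ("poly", PolynomialFeatures(degree=2, include_bias=False)),
        ("linear", LinearRegression()),
    ]
)
impute_poly.fit(X_train, y_train)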

The distribution of your predictor variables affects polynomial regression performance. Ideally, predictors should have reasonable coverage across their range without large gaps, as polynomials can behave erratically when extrapolating beyond the training data range. If your data is heavily skewed, consider log or square root transformations before generating polynomial features, as these can stabilize variance and improve model fit. However, be cautious with transformations, as they change the interpretation of coefficients and may complicate communication of results to non-technical stakeholders.

Common Pitfalls

One of the most frequent mistakes is choosing polynomial degree based solely on training performance or visual fit without proper validation. This often leads to overfitting, where the model captures noise rather than true underlying patterns. High-degree polynomials can fit training data nearly perfectly while performing poorly on new data, especially near the boundaries of the data range where polynomial curves can exhibit wild oscillations. Always use cross-validation to select the degree, and be skeptical of models that require degrees higher than 4 or 5—such complexity often indicates that polynomial regression may not be the right approach for your problem.

Failing to scale features before polynomial transformation is another common error that can cause numerical instability and poor convergence. When you raise unscaled features to high powers, the resulting values can become extremely large or small, leading to overflow errors or ill-conditioned matrices that are difficult to invert accurately. This problem becomes more severe with higher degrees and can produce unreliable coefficient estimates even when the algorithm appears to converge. The solution is straightforward: always apply feature scaling before polynomial feature generation, and use pipelines to ensure this preprocessing happens consistently.

A subtler pitfall involves using polynomial regression with multiple correlated predictors, which can lead to severe multicollinearity when interaction terms are included. For example, if x₁ and x₂ are highly correlated, their polynomial terms like x₁², x₁x₂, and x₂² will be even more correlated, making coefficient estimates unstable and difficult to interpret. In such cases, consider regularization methods like Ridge regression, which can handle multicollinearity more gracefully, or use principal component analysis to decorrelate your features before applying polynomial transformations. Finally, avoid the temptation to extrapolate far beyond your training data range: polynomial models often behave unpredictably outside the region where they were fitted, and predictions in these regions should be treated with considerable skepticism.

Computational Considerations

Polynomial regression has computational complexity that depends primarily on the number of polynomial features rather than the polynomial degree itself. For a dataset with n observations and p input features at degree d, the number of polynomial features grows as the binomial coefficient C(p + d, d), and the computational cost of fitting scales as O(n·m² + m³), where m = C(p + d, d) is the number of features. For small to moderate feature counts (up to a few dozen features), this is quite manageable on modern hardware. However, with 10 input features at degree 4, you generate 1,001 polynomial features, making the problem substantially more expensive.
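
The feature counts quoted above come directly from that binomial formula, and a few lines of Python reproduce them (a small sketch using math.comb; the counts include the bias term):

from math import comb

def n_poly_features(p: int, d: int) -> int:
    """Number of polynomial features for p inputs at degree d, including the bias term."""
    return comb(p + d, d)

print(n_poly_features(5, 4))    # 126
print(n_poly_features(10, 4))   # 1001
print(n_poly_features(20, 3))   # 1771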

For datasets with fewer than 10,000 observations and moderate feature counts (under 50 polynomial features), polynomial regression typically runs in seconds on standard hardware. Memory requirements are generally modest, as you only need to store the design matrix and intermediate calculations. However, with very large datasets (millions of observations) or high-dimensional feature spaces (hundreds of polynomial features), memory can become a constraint. In such cases, consider using mini-batch or online learning approaches, though these are less common for polynomial regression than for other methods. Alternatively, reduce dimensionality by selecting only the most important input features or using lower polynomial degrees.

When working with multiple input variables, the combinatorial explosion in feature count can make polynomial regression impractical beyond degree 3 or 4. For 20 input features at degree 3, you would generate 1,771 polynomial features, requiring substantial memory and computation time. In these scenarios, consider alternative approaches such as generalized additive models (GAMs) that model non-linear effects for each variable separately, or use feature selection to identify the most important variables before applying polynomial transformations. Sparse polynomial regression, which selects only relevant polynomial terms, can also help manage complexity in high-dimensional settings.

Performance Evaluation and Deployment

Evaluating polynomial regression performance requires attention to both statistical metrics and practical considerations. The R² score provides a measure of overall fit, with values above 0.7 generally indicating good predictive power, though acceptable thresholds vary by domain. However, R² alone can be misleading with polynomial regression, as it always increases with polynomial degree on training data. Instead, focus on cross-validated R² or adjusted R², which penalizes model complexity. Mean squared error (MSE) and root mean squared error (RMSE) provide interpretable measures of prediction accuracy in the original units of your target variable, making them valuable for communicating model performance to stakeholders.
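
As a small sketch of these metrics in code, reusing y_test and y_pred from the earlier cells: RMSE is the square root of MSE, and adjusted R² penalizes the number of model terms k (here 3 for the degree-3 model: x, x², x³, excluding the intercept):

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

rmse = np.sqrt(mean_squared_error(y_test, y_pred))

n = len(y_test)   # number of observations
k = 3             # non-intercept terms in the degree-3 model
r2_plain = r2_score(y_test, y_pred)
adjusted_r2 = 1 - (1 - r2_plain) * (n - 1) / (n - k - 1)

print(f"RMSE: {rmse:.4f}")
print(f"Adjusted R^2: {adjusted_r2:.4f}")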

Residual analysis is particularly important for polynomial regression. Plot residuals against predicted values to check for patterns—residuals should appear randomly scattered around zero without systematic trends. Patterns in residual plots often indicate that your chosen polynomial degree is too low (underfitting) or that important variables are missing from the model. Also examine residuals across the range of predictor variables to ensure the model fits well throughout the data range, not just in the center. Polynomial models sometimes fit poorly near the boundaries of the data, where polynomial curves can exhibit unexpected behavior.
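
A minimal residual plot, again assuming the fitted poly_reg pipeline and test split from the earlier cells:

import matplotlib.pyplot as plt

preds = poly_reg.predict(X_test)
residuals = y_test - preds

plt.scatter(preds, residuals, alpha=0.7)
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Predicted value")
plt.ylabel("Residual")
plt.title("Residuals vs. predictions (should show no systematic pattern)")
plt.show()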

When deploying polynomial regression models, ensure that your production pipeline includes all preprocessing steps in the correct order: feature scaling, polynomial transformation, and prediction. Using scikit-learn pipelines simplifies deployment by encapsulating all transformations in a single object that can be serialized and loaded in production environments. Monitor model performance over time, as polynomial models can degrade if the data distribution shifts or if predictions are needed outside the original training range. For real-time applications, polynomial regression offers fast prediction times since it only requires matrix multiplication, making it suitable for latency-sensitive deployments. However, be cautious about extrapolation—if production data extends beyond training ranges, consider retraining the model or implementing safeguards that flag out-of-range predictions for manual review.
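
A sketch of the deployment side: serializing the fitted pipeline with joblib and flagging out-of-range inputs before predicting. The file name, helper function, and range check are illustrative assumptions, not a prescribed setup:

import joblib
import numpy as np

# Persist the fitted pipeline so the exact preprocessing travels with the model
joblib.dump(poly_reg, "poly_reg.joblib")
model = joblib.load("poly_reg.joblib")

# Guard against extrapolation: flag inputs outside the training range
x_min, x_max = X_train.min(), X_train.max()

def predict_with_guard(X_new):
    X_new = np.asarray(X_new, dtype=float).reshape(-1, 1)
    out_of_range = (X_new < x_min) | (X_new > x_max)
    if out_of_range.any():
        print("Warning: some inputs fall outside the training range; treat these predictions with caution.")
    return model.predict(X_new)

print(predict_with_guard([0.5, 3.0]))  # 3.0 lies outside the roughly [-2, 2] training range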

Summary

Polynomial regression serves as a powerful bridge between simple linear regression and more complex non-linear modeling techniques. By extending the linear regression framework to include polynomial terms, it allows you to capture curved relationships while maintaining the interpretability and statistical rigor of linear models. The method is particularly valuable when you're dealing with smooth, continuous relationships that can be approximated by polynomial curves.

Successful polynomial regression requires careful model selection and validation. While the technique can capture complex non-linear patterns, it's susceptible to overfitting, especially with higher degrees. Balancing model complexity with generalization performance through cross-validation helps select appropriate polynomial degrees while monitoring for signs of overfitting. The computational efficiency and interpretability of polynomial regression make it an excellent choice for many practical applications, particularly in domains where understanding the relationship between variables is as important as prediction accuracy.

When you implement it thoughtfully with proper preprocessing, feature scaling, and model validation, polynomial regression provides a robust and interpretable approach to modeling non-linear relationships. It serves as an essential tool in your data science toolkit, offering a stepping stone to more advanced techniques while maintaining the analytical transparency that makes linear regression so valuable in practice.


About the author: Michael Brenndoerfer

All opinions expressed here are my own and do not reflect the views of my employer.

Michael currently works as an Associate Director of Data Science at EQT Partners in Singapore, where he drives AI and data initiatives across private capital investments.

With over a decade of experience spanning private equity, management consulting, and software engineering, he specializes in building and scaling analytics capabilities from the ground up. He has published research in leading AI conferences and holds expertise in machine learning, natural language processing, and value creation through data.
