Chinchilla Scaling Laws: Compute-Optimal Training and Resource Allocation for Large Language Models

Michael Brenndoerfer · July 15, 2025 · 18 min read

A comprehensive guide to the Chinchilla scaling laws introduced in 2022. Learn how compute-optimal training balances model size and training data, the 20:1 token-to-parameter ratio, and how these scaling laws transformed language model development by revealing the undertraining problem in previous models.

2022: Chinchilla Scaling Laws

In 2022, a team of researchers at DeepMind led by Jordan Hoffmann published findings that fundamentally challenged the prevailing assumptions about how large language models should be trained. The conventional wisdom, reinforced by the success of GPT-3 with its 175 billion parameters, suggested that bigger models trained on more data would inevitably perform better. Hoffmann and colleagues discovered that this assumption was flawed: most large language models were actually undertrained, meaning they had far too many parameters relative to the amount of training data they received. This finding reshaped how researchers and organizations approached model development, revealing that optimal performance could be achieved not by simply scaling up model size, but by carefully balancing model parameters with training data within a fixed computational budget.

The Chinchilla scaling laws emerged from a systematic investigation into compute-optimal training, where researchers sought to understand how to allocate a fixed computational budget between model size and training data to maximize performance. The name "Chinchilla" referred to a 70 billion parameter model that the researchers trained as part of their investigation, deliberately smaller than GPT-3's 175 billion parameters but trained on substantially more data. The key insight was that for a given computational budget, performance could be improved by reducing model size and increasing training data, rather than doing the opposite. This counterintuitive finding overturned the "bigger is better" philosophy that had dominated large language model development.

The significance of the Chinchilla scaling laws extended beyond immediate performance improvements to fundamental questions about resource allocation in machine learning. Previous scaling laws, most notably those proposed by Kaplan and colleagues in 2020, had focused on how model performance improved with increased scale but had not explicitly addressed the trade-off between model size and training data. The Chinchilla research showed that the optimal balance was approximately 20 tokens per parameter for large-scale transformer models, meaning a model with 70 billion parameters should be trained on roughly 1.4 trillion tokens to be compute-optimal. This ratio revealed that models like GPT-3, despite their impressive capabilities, had been trained on far less data than they could effectively utilize.
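
As a quick illustration of what this ratio implies, the sketch below (illustrative only, treating the roughly 20:1 figure as a rule of thumb rather than an exact law) computes approximate compute-optimal token budgets for a few hypothetical model sizes.

```python
# Illustrative sketch: approximate compute-optimal token budgets using the
# ~20 tokens-per-parameter rule of thumb reported by Hoffmann et al. (2022).

TOKENS_PER_PARAM = 20  # approximate ratio; the paper's fit varies slightly with scale

def optimal_tokens(num_params: float) -> float:
    """Approximate compute-optimal number of training tokens for a given model size."""
    return TOKENS_PER_PARAM * num_params

for params in [1e9, 7e9, 70e9, 175e9]:
    print(f"{params / 1e9:>6.0f}B params -> ~{optimal_tokens(params) / 1e12:.2f}T tokens")
```

At 70 billion parameters this gives roughly 1.4 trillion tokens, matching the configuration used for the Chinchilla model itself.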

The practical implications of the Chinchilla findings were immediate and profound. Organizations training new models could achieve better performance with smaller, more data-efficient models, reducing both training costs and inference costs. The Chinchilla model itself demonstrated this: despite being smaller than GPT-3, it matched or exceeded GPT-3's performance across a range of evaluation tasks, while requiring less computation for inference. This efficiency advantage made the Chinchilla scaling laws particularly valuable for organizations with limited computational resources, showing that strategic allocation of training budget could compensate for absolute scale limitations.

The Chinchilla research also revealed the importance of systematic empirical investigation in machine learning. By training multiple models with different parameter counts and training data sizes, then measuring their performance, the researchers were able to derive scaling laws that predicted optimal configurations. This empirical approach contrasted with the more heuristic approaches that had previously guided model scaling decisions. The methodology established in the Chinchilla paper became a template for subsequent scaling law research, showing how rigorous experimentation could reveal counterintuitive but powerful insights about how to train effective language models.

The Problem

The field of large language model development faced a fundamental question that had not been adequately addressed: given a fixed computational budget, how should researchers allocate resources between model size and training data? Prior to the Chinchilla research, the dominant approach had been to increase model size, often at the expense of training data. GPT-3 exemplified this philosophy: with 175 billion parameters, it was trained on approximately 300 billion tokens, representing a ratio of roughly 1.7 tokens per parameter. This approach reflected the assumption that larger models would inevitably perform better, and that the benefits of scale would outweigh any potential losses from reduced training data.

The undertraining problem became apparent when researchers began systematically investigating the relationship between model size, training data, and performance. Models like GPT-3, while impressive, were not reaching their full potential because they had far more parameters than they could effectively utilize given their training data. Each parameter in a neural network needs to be trained, and effective training requires sufficient data for the model to learn meaningful patterns. When models had too many parameters relative to their training data, some parameters remained underutilized or learned spurious patterns that didn't generalize well. This undertraining meant that computational resources were being wasted on model capacity that could never be fully realized.

The scaling laws proposed by Kaplan and colleagues in 2020 had provided valuable insights into how performance improved with scale, but they had focused primarily on how increasing model size or training data individually would improve performance. These laws showed that doubling model size or training data would lead to predictable improvements, but they did not address the critical question of how to balance these two factors. Researchers following these laws would naturally tend toward larger models, as the relationship seemed straightforward: bigger models could learn more patterns. However, this intuition overlooked the fact that learning those patterns required sufficient training data, and the optimal balance might not simply favor larger models.

The computational budget constraint created a zero-sum trade-off that previous research had not fully explored. Training compute, measured in floating-point operations (FLOPs), depends on both model size and the amount of training data. A model with more parameters requires more computation per forward pass, while training on more data requires more forward passes. For a fixed computational budget, increasing model size necessarily meant decreasing training data, and vice versa. The question was whether the benefits of larger models outweighed the costs of less training data, or whether the opposite was true. Without systematic investigation, this trade-off remained poorly understood, leading to suboptimal resource allocation.
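
This trade-off can be made concrete with the common approximation that training a dense transformer costs about C ≈ 6·N·D FLOPs, where N is the parameter count and D the number of training tokens. The sketch below holds a budget C fixed and shows how choosing a larger model forces fewer training tokens; the 3.14e23 FLOPs figure, roughly the training compute often cited for GPT-3, is used purely as an example budget.

```python
# Illustrative sketch: for a fixed training budget C (in FLOPs), the
# approximation C ~ 6 * N * D means the token count D falls as N grows.

C = 3.14e23  # example budget, roughly the figure often cited for GPT-3's training run

for n_params in [10e9, 70e9, 175e9]:
    d_tokens = C / (6 * n_params)
    print(f"N = {n_params / 1e9:>5.0f}B  ->  D ~ {d_tokens / 1e9:,.0f}B tokens "
          f"({d_tokens / n_params:.1f} tokens/param)")
```

At 175 billion parameters the budget only allows roughly 300 billion tokens, or about 1.7 tokens per parameter, which is precisely the regime GPT-3 was trained in.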

The evaluation methodology at the time also contributed to the problem. When comparing models, researchers typically compared models of different sizes trained on the same dataset, which naturally favored larger models. This evaluation approach reinforced the "bigger is better" assumption because larger models would indeed perform better when trained on the same data. However, this comparison failed to consider the alternative: what if smaller models trained on more data could match or exceed the performance of larger models? The evaluation framework itself biased the field toward larger models, making it difficult to recognize the undertraining problem.

The practical consequences of undertraining extended beyond suboptimal performance to inefficient resource usage. Organizations investing enormous computational resources in training large models were not achieving the best possible performance per unit of compute. This inefficiency had real costs: longer training times, higher computational expenses, and models that required more resources for inference. For organizations with limited computational budgets, particularly academic institutions and smaller companies, this inefficiency created barriers to training competitive models. The field needed a systematic approach to understanding how to allocate computational resources optimally.

The Solution

The Chinchilla researchers addressed the undertraining problem through a systematic empirical investigation that trained many models with varying parameter counts and training data sizes to identify compute-optimal configurations. The methodology involved training over 400 language models with sizes ranging from 70 million to 16.5 billion parameters, systematically varying both the number of parameters and the amount of training data across a range of computational budgets, including sweeps in which model size and data were traded off against each other at a fixed budget. By measuring performance across these different configurations, the researchers could identify the optimal balance between model size and training data.

Compute-Optimal Training Formula

The key finding from this systematic investigation was a precise relationship between optimal model size, training data, and computational budget. The Chinchilla scaling laws established that for compute-optimal training, the number of model parameters and the amount of training data should grow in proportion to each other as the computational budget increases. Specifically, the research found that in the optimal configuration, both parameters and training tokens scale approximately as the square root of the training compute. This meant that doubling the computational budget should increase both model size and training data by a factor of roughly 1.4, maintaining a roughly constant ratio between them.

The optimal ratio discovered was approximately 20 tokens per parameter. This meant that for every parameter in a model, the model should be trained on about 20 tokens to achieve compute-optimal performance. This ratio represented a dramatic shift from previous practice: GPT-3 had roughly 1.7 tokens per parameter, while a compute-optimal model of GPT-3's size would require approximately 3.5 trillion tokens, more than ten times what GPT-3 actually used. This finding revealed just how undertrained previous models had been, and how significant the potential improvements could be with optimal resource allocation.

Understanding the 20:1 Token-to-Parameter Ratio

The 20 tokens per parameter ratio represents a balance between model capacity and training data sufficiency. Each parameter in a neural network needs to learn meaningful patterns from the training data. Too few tokens per parameter means the model has excess capacity that cannot be effectively utilized, leading to undertraining. Too many tokens per parameter, while not necessarily harmful, may represent inefficient allocation if the same performance could be achieved with a larger model. The 20:1 ratio identifies the sweet spot where model capacity is fully utilized without being overwhelmed by excessive data that a larger model could better process.

The Chinchilla Model

To demonstrate the practical value of these scaling laws, the researchers trained a model they named Chinchilla, which had 70 billion parameters and was trained on 1.4 trillion tokens, achieving the optimal 20:1 ratio. Despite being significantly smaller than GPT-3's 175 billion parameters, Chinchilla matched or exceeded GPT-3's performance across a wide range of evaluation tasks. This demonstration proved that the scaling laws were not merely theoretical: they could be used to train more efficient models that achieved competitive performance while using computational resources more effectively.

The Chinchilla model's success validated the scaling laws and demonstrated their practical utility. By following the compute-optimal configuration, the researchers achieved GPT-3-level performance with a model that was less than half the size, trained on significantly more data. This efficiency had multiple benefits: the smaller model required less memory for storage and inference, delivered faster inference, and incurred lower computational costs. The Chinchilla model showed that optimal resource allocation could achieve better performance per unit of compute, making advanced language models more accessible to organizations with limited computational resources.

Scaling Law Methodology

The methodology used to derive the Chinchilla scaling laws involved training models across a wide range of sizes and data amounts, then fitting mathematical functions to predict performance. The researchers trained models on the MassiveText dataset, systematically varying model size from 70 million to 16.5 billion parameters and training data from 5 billion to 500 billion tokens. By measuring loss on held-out evaluation sets, they could determine which configurations achieved the best performance for a given computational budget.
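
One way to illustrate this kind of fit is with a parametric loss of the form L(N, D) = E + A/N^α + B/D^β, the functional form fit in the Chinchilla paper. The sketch below fits that form to synthetic data: the generated runs and the constants used to produce them are placeholders for demonstration, not the paper's measurements or fitted values, and the simple least-squares fit stands in for the paper's more robust fitting procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

# Parametric loss form from the Chinchilla analysis: L(N, D) = E + A / N**alpha + B / D**beta.
def parametric_loss(X, E, A, B, alpha, beta):
    N, D = X
    return E + A / N**alpha + B / D**beta

# Synthetic stand-in for measured (parameters, tokens, loss) triples from training runs.
# The constants below are placeholders for the demo, not the paper's fitted values.
rng = np.random.default_rng(0)
N = rng.uniform(70e6, 16e9, size=200)    # model sizes in parameters
D = rng.uniform(5e9, 500e9, size=200)    # training tokens
L = parametric_loss((N, D), 1.7, 400.0, 400.0, 0.34, 0.28) + rng.normal(0.0, 0.01, size=200)

# Simple least-squares fit of the five constants.
popt, _ = curve_fit(parametric_loss, (N, D), L, p0=(2.0, 300.0, 300.0, 0.3, 0.3), maxfev=20000)
print("fitted E, A, B, alpha, beta:", np.round(popt, 3))
```

Once the constants are fit, minimizing the predicted loss subject to a FLOP constraint yields the compute-optimal pairing of model size and data for any budget.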

The mathematical formulation of the scaling laws captured the relationship between model size N, training data D, training compute C, and the resulting loss L. The key insight was that for compute-optimal training, where C ≈ 6ND, the optimal values of N and D both scale approximately as C^{1/2}, the square root of the compute budget. This square-root relationship meant that additional computational budget should be split roughly evenly between model size and training data, maintaining the roughly constant ratio that defines compute-optimal training. The formulation allowed researchers to predict optimal configurations for any computational budget, providing a practical guide for model development.
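
Combining the budget approximation C ≈ 6·N·D with a fixed ratio of about 20 tokens per parameter gives a simple back-of-the-envelope recipe for allocating a budget. The sketch below is an illustrative calculation under those two assumptions, not the paper's exact fitting procedure.

```python
import math

# Back-of-the-envelope compute-optimal allocation, assuming C ~ 6 * N * D
# and a roughly fixed ratio of ~20 training tokens per parameter.
def compute_optimal(C: float, tokens_per_param: float = 20.0):
    N = math.sqrt(C / (6 * tokens_per_param))  # parameters
    D = tokens_per_param * N                   # training tokens
    return N, D

for budget in [1e21, 1e22, 1e23, 1e24]:
    N, D = compute_optimal(budget)
    print(f"C = {budget:.0e} FLOPs -> N ~ {N / 1e9:5.1f}B params, D ~ {D / 1e12:5.2f}T tokens")
```

Note how a tenfold increase in budget raises both the parameter count and the token count by roughly a factor of three (the square root of ten), reflecting the even split between the two.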

Training Efficiency Insights

Beyond the optimal ratio, the Chinchilla research revealed important insights about training efficiency. The researchers found that models trained with the compute-optimal configuration not only achieved better final performance but also reached that performance more efficiently during training. Models following the optimal ratio showed smoother learning curves and more consistent improvements, suggesting that the balance between capacity and data enabled more effective learning. This efficiency benefit extended the value of the scaling laws beyond just final performance to the entire training process.

The research also showed that the optimal ratio was relatively stable across different model sizes and evaluation tasks. While there was some variation, the 20:1 ratio provided a reliable guideline for a wide range of configurations. This stability made the scaling laws broadly applicable, enabling researchers and organizations to use them as a practical guide for model development without extensive experimentation. The robustness of the findings increased their practical value and ensured they would be widely adopted.

Applications and Impact

The immediate impact of the Chinchilla scaling laws was felt across the machine learning community as researchers and organizations began training new models according to compute-optimal configurations. Several major language model releases after 2022 explicitly followed the Chinchilla scaling laws, training smaller models on more data to achieve competitive or superior performance. These models demonstrated that the scaling laws were not merely theoretical guidelines but practical tools that could improve real-world model performance and efficiency.

The Chinchilla scaling laws influenced the development of models like LLaMA, which Meta released in 2023. The LLaMA models, with sizes ranging from 7 billion to 65 billion parameters, were designed with the Chinchilla analysis in mind, and each was trained on substantially more data than previous models of similar size; Meta in fact trained them beyond the compute-optimal point, trading extra training compute for cheaper inference. LLaMA-7B, despite being much smaller than GPT-3, achieved competitive performance on many benchmarks, demonstrating the practical value of the data-heavy training regime that the Chinchilla results motivated. The success of LLaMA and similar models validated the Chinchilla findings and encouraged broader adoption of compute-aware training strategies.

Organizations with limited computational budgets found particular value in the Chinchilla scaling laws because they provided a way to achieve competitive performance without the enormous computational resources required for the largest models. Academic institutions, startups, and smaller companies could train effective models by following the optimal ratio, maximizing performance per unit of compute. This democratizing effect made advanced language modeling more accessible, enabling a broader range of organizations to participate in language model development and research.

The inference efficiency benefits of following Chinchilla scaling laws became particularly important as language models moved into production applications. Smaller models that achieved competitive performance required less memory for deployment, enabled faster inference, and reduced operational costs. For applications requiring real-time responses or deployment on edge devices, these efficiency benefits were critical. The Chinchilla scaling laws showed that model developers could optimize not just for training efficiency but also for inference efficiency by choosing the right balance between model size and training data.
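
A common rule of thumb is that a dense transformer spends roughly 2·N FLOPs per generated token at inference, so a compute-optimal 70 billion parameter model does about 40 percent of the per-token arithmetic of a 175 billion parameter model. The comparison below is a rough illustration under that approximation; it ignores attention overhead, memory bandwidth, batching, and other deployment factors.

```python
# Rough per-token inference cost comparison under the ~2 * N FLOPs/token approximation.
def inference_flops_per_token(num_params: float) -> float:
    return 2 * num_params

chinchilla = inference_flops_per_token(70e9)
gpt3 = inference_flops_per_token(175e9)
print(f"Chinchilla (70B): ~{chinchilla:.1e} FLOPs/token")
print(f"GPT-3 (175B):     ~{gpt3:.1e} FLOPs/token")
print(f"ratio:            {chinchilla / gpt3:.0%} of the larger model's per-token compute")
```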

The research methodology established in the Chinchilla paper influenced subsequent scaling law research, creating a template for systematic investigation of optimal training configurations. Researchers began applying similar methodologies to other aspects of language model training, investigating optimal learning rates, batch sizes, and other hyperparameters. The empirical, data-driven approach exemplified by the Chinchilla research became a standard for investigating training efficiency questions, leading to a more systematic understanding of how to train language models effectively.

The Chinchilla scaling laws also influenced how organizations planned computational resource allocation. Rather than simply acquiring more compute to train larger models, organizations could optimize their existing resources by following the compute-optimal ratio. This strategic resource allocation became particularly valuable as computational costs remained high and environmental concerns about large-scale training grew. The scaling laws provided a framework for responsible and efficient resource use, balancing performance goals with practical constraints.

Limitations

Despite their significant contributions, the Chinchilla scaling laws faced important limitations that affected their practical applicability. One of the primary challenges was the requirement for large amounts of high-quality training data. Following the 20:1 ratio meant that training a 70 billion parameter model required 1.4 trillion tokens of training data, which was substantially more than what many organizations could access or curate. Acquiring, cleaning, and preparing such large datasets required significant resources and infrastructure, creating barriers to applying the scaling laws in practice.

The quality and diversity of training data emerged as critical factors that the scaling laws, as originally formulated, did not fully address. The Chinchilla research assumed that training data was relatively homogeneous in quality, but in practice, the quality and relevance of training data varied significantly. Low-quality or irrelevant data could undermine the benefits of following the optimal ratio, while high-quality, carefully curated data might enable effective training even with slightly different ratios. The scaling laws provided guidance on quantity but not quality, leaving organizations to determine data quality requirements independently.

The computational requirements for training models following the Chinchilla scaling laws, while more efficient than alternatives, remained substantial. Training a compute-optimal model still required enormous computational resources, particularly for large models. Organizations without access to large-scale computing infrastructure found it challenging to follow the scaling laws fully, even if they understood the benefits. The efficiency gains did not eliminate the need for significant computational resources; they only made better use of those resources.

The scaling laws were derived primarily from transformer-based language models trained on text data, and their applicability to other architectures or data modalities remained uncertain. Models with different architectures, such as those using different attention mechanisms or training objectives, might follow different optimal ratios. Similarly, models trained on multimodal data or specialized domains might require different configurations. The generalizability of the Chinchilla scaling laws beyond the specific context in which they were derived required further investigation.

The evaluation methodology used to validate the scaling laws focused on language modeling loss and downstream task performance, but did not fully capture all aspects of model quality that might matter in practice. Factors such as reasoning capability, factual accuracy, bias, and safety were not explicitly considered in the scaling law formulation. A model that achieved optimal performance according to the scaling laws might still have limitations in these other dimensions, and optimizing purely for compute efficiency might not align with all desired model characteristics.

The assumption of a fixed computational budget, while useful for theoretical analysis, did not always match practical constraints. Organizations might face constraints on training time, available hardware, data storage, or other factors that made the compute-optimal configuration impractical. The scaling laws provided guidance for one type of optimization but could not address the full range of practical constraints that organizations faced when training models.

Legacy and Looking Forward

The Chinchilla scaling laws established compute-optimal training as a fundamental principle in large language model development, shifting the field's focus from simply scaling up model size to strategically balancing model capacity with training data. This shift in perspective influenced subsequent model development, with many major language models released after 2022 explicitly or implicitly following Chinchilla principles. The scaling laws became a standard reference point for understanding training efficiency and resource allocation, fundamentally changing how the field approached model development.

The methodology established in the Chinchilla research created a template for empirical investigation of training efficiency that influenced subsequent research. Researchers began applying similar systematic approaches to investigate other aspects of training optimization, such as optimal learning rate schedules, batch sizes, and architectural choices. The data-driven, empirical approach exemplified by Chinchilla became a standard for investigating efficiency questions, leading to more systematic and rigorous understanding of how to train effective models.

The Chinchilla scaling laws also highlighted the importance of understanding trade-offs in machine learning system design. Rather than assuming that bigger models were always better, the research showed that optimal performance required careful consideration of multiple factors and their interactions. This systems thinking approach influenced how researchers and practitioners approached model development, encouraging more holistic consideration of performance, efficiency, cost, and practical constraints.

Looking forward, the Chinchilla scaling laws continue to influence language model development, but the field has also recognized their limitations and begun exploring extensions and alternatives. Subsequent research has investigated how optimal ratios might vary with different architectures, training objectives, or evaluation criteria. The ongoing evolution of scaling law research builds on the foundation established by Chinchilla while addressing its limitations and expanding its applicability.

The principles underlying the Chinchilla scaling laws, emphasizing efficient resource allocation and systematic optimization, extend beyond language modeling to other domains of machine learning. Understanding how to balance model capacity with training data applies to computer vision, reinforcement learning, and other areas where similar trade-offs exist. The general approach of systematic empirical investigation to identify optimal configurations has broad applicability across machine learning.

The Chinchilla research also contributed to broader discussions about responsible and efficient AI development. As concerns about the computational and environmental costs of large-scale model training grew, the efficiency principles established by Chinchilla became part of the conversation about sustainable AI. The scaling laws showed that better performance did not always require more absolute resources, but rather better resource allocation, providing a framework for responsible development.

The legacy of the Chinchilla scaling laws extends to how modern language models are developed, evaluated, and deployed. The emphasis on efficiency and optimal resource allocation continues to influence model design decisions, training strategies, and deployment considerations. As the field continues to evolve, the fundamental insights from Chinchilla about balancing capacity and data remain relevant, even as new architectures, training methods, and evaluation approaches emerge. The scaling laws established a principled foundation for understanding training efficiency that continues to guide language model development.

About the author: Michael Brenndoerfer

All opinions expressed here are my own and do not reflect the views of my employer.

Michael currently works as an Associate Director of Data Science at EQT Partners in Singapore, leading AI and data initiatives across private capital investments.

With over a decade of experience spanning private equity, management consulting, and software engineering, he specializes in building and scaling analytics capabilities from the ground up. He has published research in leading AI conferences and holds expertise in machine learning, natural language processing, and value creation through data.
