Descriptive Statistics: Complete Guide to Summarizing and Understanding Data with Python

Michael Brenndoerfer · Updated December 25, 2025 · 26 min read

A comprehensive guide covering descriptive statistics fundamentals, including measures of central tendency (mean, median, mode), variability (variance, standard deviation, IQR), and distribution shape (skewness, kurtosis). Learn how to choose appropriate statistics for different data types and apply them effectively in data science.


Descriptive Statistics: Summarizing and Understanding Data

Descriptive statistics help us summarize and understand data characteristics. These methods transform raw data into useful summaries that show patterns, typical values, and variability. They provide the basis for all further statistical analysis and machine learning work.

Introduction

Before diving into complex modeling or inference, you must first understand the basic properties of your data. Descriptive statistics give us this first way to understand data, showing us the shape, center, and spread of distributions. These summaries let us quickly assess data quality, spot potential issues like outliers or skewness, and communicate findings to stakeholders who may not be familiar with technical details.

Descriptive statistics provide different types of summary information about data, organized into several key categories:

  • Measures of central tendency (e.g., mean, median, mode): Identify typical or representative values in a dataset.
  • Measures of variability (e.g., variance, standard deviation, range, interquartile range): Quantify how spread out the data points are.
  • Measures of distribution shape (e.g., skewness, kurtosis): Describe the asymmetry and tail behavior of the data, highlighting departures from the classic bell-shaped normal distribution.

Understanding these descriptive measures is important not only for exploratory data analysis but also for diagnosing potential problems in modeling. For instance, high skewness might suggest the need for data transformation, while extreme kurtosis could indicate the presence of outliers that warrant investigation. This chapter explores each of these fundamental concepts, giving the statistical basics needed for effective data analysis and modeling.

Measures of Central Tendency

Central tendency measures answer a key question: what is a typical value in this dataset? Three main measures (the mean, median, and mode) each capture this concept differently, and the choice among them depends on the data's properties and the analysis goals.

The Mean

The arithmetic mean, commonly called the average, is calculated by summing all values and dividing by the count of observations:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i$$

Where:

  • $\bar{x}$: Sample mean
  • $x_i$: Individual observation
  • $n$: Number of observations

The mean represents the balance point of a distribution and incorporates information from every data point. This sensitivity to all values makes the mean an efficient estimator when data follow a symmetric distribution, but it also means the mean is heavily influenced by outliers. A single extremely large or small value can pull the mean far from where most data points cluster, potentially misrepresenting the typical value.

For example, consider household income in a neighborhood where most families earn between $40,000 and $80,000 annually, but one household earns $5 million. The mean income might be $150,000, far above what most residents actually earn, while the median would remain around $60,000, better representing the typical household.
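To see this sensitivity concretely, here is a small sketch using NumPy. The income figures are hypothetical, chosen to mirror the scenario above, not drawn from a real dataset:

```python
import numpy as np

# Hypothetical neighborhood incomes: most households earn $40k-$80k,
# plus one extreme outlier earning $5 million (illustrative numbers)
incomes = np.array([42_000, 55_000, 61_000, 48_000, 72_000, 65_000,
                    58_000, 44_000, 79_000, 51_000, 5_000_000])

mean_income = incomes.mean()       # pulled far upward by the outlier
median_income = np.median(incomes) # stays near the typical household

print(f"Mean:   ${mean_income:,.0f}")
print(f"Median: ${median_income:,.0f}")
```

A single extreme value drags the mean to roughly half a million dollars, while the median stays at $58,000, close to what most households actually earn.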

The Median

The median is the middle value when observations are arranged in order. For datasets with an odd number of observations, it is simply the center value; for even counts, it is the average of the two middle values:

$$\text{Median} = \begin{cases} x_{((n+1)/2)} & \text{if } n \text{ is odd} \\ \dfrac{x_{(n/2)} + x_{(n/2+1)}}{2} & \text{if } n \text{ is even} \end{cases}$$

Where $x_{(i)}$ denotes the $i$-th value in the sorted dataset.

For example, consider the sorted dataset: 12, 15, 18, 22, 25, 28, 31. With seven observations (odd count), the median is the fourth value: 22. If we add an eighth observation to get 12, 15, 18, 22, 25, 28, 31, 45, the median becomes the average of the fourth and fifth values: (22 + 25) / 2 = 23.5.
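The two cases above can be checked directly with NumPy, which handles the odd and even counts automatically:

```python
import numpy as np

odd_data = [12, 15, 18, 22, 25, 28, 31]        # 7 values: middle one is the median
even_data = [12, 15, 18, 22, 25, 28, 31, 45]   # 8 values: average the two middle ones

print(np.median(odd_data))   # 22.0
print(np.median(even_data))  # 23.5, the average of 22 and 25
```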

The median's main advantage is that it handles outliers well. Because it depends only on the position of values rather than their magnitude, extreme values do not distort it. This makes the median particularly valuable for skewed distributions or data with outliers. In fields like real estate, income analysis, and cost estimation, the median often provides a more meaningful measure of central tendency than the mean.

The Mode

The mode is the most frequently occurring value in a dataset. Unlike mean and median, which always produce a single value for continuous data, a distribution can have multiple modes. A dataset is called bimodal when two distinct values share the highest frequency, such as exam scores of 75 and 90 each appearing 8 times while no other score appears more than 5 times. Multimodal distributions have three or more peaks, which often indicate the presence of distinct subgroups within the data, like heights in a mixed population of children and adults. A distribution has no mode if all values occur with equal frequency.

The mode is most useful for categorical data where mean and median are undefined, such as the most common color preference in a survey or the most frequent diagnosis in a medical dataset. For continuous numerical data, the mode is less commonly used, though it can help identify peaks in a distribution's density.
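For categorical data like the survey example, the mode can be computed with the standard library's `collections.Counter`. The survey responses below are made up for illustration:

```python
from collections import Counter

# Hypothetical survey of color preferences (categorical data)
survey = ["blue", "green", "blue", "red", "blue", "green", "red", "blue"]

counts = Counter(survey)
mode_value, mode_count = counts.most_common(1)[0]
print(mode_value, mode_count)  # blue 4

# A dataset can have several modes: collect every value tied at the max count
max_count = max(counts.values())
modes = [value for value, count in counts.items() if count == max_count]
print(modes)  # ['blue']
```

Collecting all values tied at the maximum count handles bimodal and multimodal cases, where reporting a single mode would hide the other peaks.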

In practice, for continuous data, we often examine the mode through histograms or kernel density estimates rather than computing the exact most frequent value, especially when data have many unique values.

Kernel Density Estimation

Kernel density estimation (KDE) is a non-parametric method for estimating the probability density function of a continuous random variable. Rather than assuming a specific distribution shape, KDE places a smooth "kernel" (typically a Gaussian curve) at each data point and sums them to create a continuous density curve. This technique is particularly useful for visualizing the underlying distribution of data and identifying modes in continuous datasets where exact value repetition is rare.

Don't worry if this concept seems unclear at this point. It's not essential to understand right now. Feel free to continue and revisit it later if needed.
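As a rough sketch of the idea, assuming SciPy is available, `scipy.stats.gaussian_kde` can estimate the density of synthetic continuous data and locate its highest peak, something counting exact repeats could not do:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
# Synthetic continuous data with essentially no exact repeats:
# a bimodal mixture centered at 0 and 5
data = np.concatenate([rng.normal(0, 1, 1000), rng.normal(5, 1, 1000)])

# Fit a Gaussian KDE, evaluate it on a grid, and take the highest point
kde = gaussian_kde(data)
grid = np.linspace(data.min(), data.max(), 500)
density = kde(grid)
estimated_mode = grid[np.argmax(density)]
print(round(float(estimated_mode), 2))  # near one of the component centers
```

With two equally sized components, the global peak lands near one of the two centers; plotting `density` against `grid` would show both modes.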

Measures of Spread

While central tendency measures locate the center of a distribution, measures of spread quantify how much data points vary around that center. Understanding variability is crucial because two datasets can have identical means but vastly different spreads, leading to different implications for analysis and decision-making.

Variance and Standard Deviation

Variance measures the average squared deviation from the mean, quantifying how far data points typically fall from the center:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2$$

Where:

  • $s^2$: Sample variance
  • $x_i$: Individual observation
  • $\bar{x}$: Sample mean
  • $n$: Number of observations

The division by $n-1$ rather than $n$ is known as Bessel's correction, which provides an unbiased estimate of the population variance from sample data. Variance has a useful mathematical property: the variance of a sum of independent variables equals the sum of their variances. However, its units are squared, such as dollars squared, which makes direct interpretation challenging.

Standard deviation addresses this interpretation issue by taking the square root of variance, returning to the original units:

$$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2}$$

Standard deviation gives an easy-to-understand measure of typical spread. For approximately normal distributions, about 68% of data fall within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations. This 68-95-99.7 rule makes standard deviation useful for understanding data spread and spotting unusual values.

Population vs. Sample Standard Deviation: σ vs. s

You may notice that standard deviation is sometimes denoted as $\sigma$ (sigma) and sometimes as $s$. This distinction reflects whether we are working with a population or a sample:

  • Population standard deviation ($\sigma$): Used when you have data for the entire population. The formula divides by $N$ (the population size).
  • Sample standard deviation ($s$): Used when you have a sample drawn from a larger population. The formula divides by $n-1$ (Bessel's correction) to provide an unbiased estimate of the population standard deviation.

In practice, we almost always work with samples rather than complete populations, so $s$ with the $n-1$ denominator is the standard choice. Use $\sigma$ only when you genuinely have access to every member of the population, such as all students in a specific class or all transactions in a closed system. When in doubt, use the sample formula with $n-1$.
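NumPy exposes exactly this distinction through its `ddof` ("delta degrees of freedom") parameter, which is worth knowing because the default is the population formula. The data here are arbitrary illustrative values:

```python
import numpy as np

data = np.array([4.0, 8.0, 6.0, 5.0, 3.0, 7.0])  # illustrative sample

# NumPy's default is the population formula (ddof=0, divides by N)
pop_std = np.std(data)

# Pass ddof=1 for the sample formula with Bessel's correction (divides by n-1)
sample_std = np.std(data, ddof=1)

print(round(float(pop_std), 4), round(float(sample_std), 4))
```

The sample standard deviation is always slightly larger than the population version on the same data, since dividing by $n-1$ instead of $n$ inflates the estimate to correct its bias.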

Range and Interquartile Range

The range is the simplest measure of spread, calculated as the difference between the maximum and minimum values:

$$\text{Range} = x_{\text{max}} - x_{\text{min}}$$

While easy to compute and interpret, the range uses only two data points and is extremely sensitive to outliers. A single erroneous or extreme value can inflate the range dramatically, making it an unreliable measure of typical spread.

The interquartile range (IQR) is more robust, measuring the spread of the middle 50% of the data:

$$\text{IQR} = Q_3 - Q_1$$

Where:

  • $Q_1$: First quartile (25th percentile)
  • $Q_3$: Third quartile (75th percentile)

Because the IQR focuses on the central portion of the distribution and ignores the tails, it remains stable even when outliers are present. This robustness makes the IQR particularly valuable for exploratory data analysis and outlier detection. The common outlier detection rule identifies values below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$ as potential outliers, a criterion widely used in box plots and data cleaning procedures.
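The 1.5 × IQR rule translates directly into a few lines of NumPy. The data below are made up, with one deliberately extreme value; note that `np.percentile` uses linear interpolation between order statistics by default, one of several common quartile conventions:

```python
import numpy as np

# Illustrative data with one deliberate outlier (480)
data = np.array([102, 110, 115, 121, 125, 130, 134, 140, 151, 480])

q1, q3 = np.percentile(data, [25, 75])  # default: linear interpolation
iqr = q3 - q1

# Tukey's fences: values outside these bounds are flagged as potential outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]
print(outliers)  # only the 480 observation is flagged
```

Notice that the fences stay tight around the bulk of the data even though 480 is present, which is exactly the robustness property described above.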

For example, consider a dataset of daily website visits over a week: 120, 135, 142, 128, 155, 130, 138. We can calculate each measure of spread:

  • Mean: (120 + 135 + 142 + 128 + 155 + 130 + 138) / 7 = 135.43
  • Median: 135
  • Mode: None (all values are unique)
  • Range: 155 - 120 = 35
  • IQR: Q3 - Q1 = 142 - 128 = 14
  • Standard Deviation: 11.22

This dataset is relatively symmetric, with a mean and median close to each other. The range is 35, indicating moderate variability, and the IQR is 14, showing a consistent spread around the median. The standard deviation of 11.22 is also moderate, reflecting the typical deviation from the mean.
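These hand calculations can be reproduced with NumPy. One caveat: quartile conventions differ, and the IQR of 14 above comes from Tukey's hinges (medians of the lower and upper halves), while `np.percentile`'s default linear interpolation would give a slightly different value for this small dataset:

```python
import numpy as np

visits = np.array([120, 135, 142, 128, 155, 130, 138])

print(round(float(visits.mean()), 2))       # 135.43
print(float(np.median(visits)))             # 135.0
print(int(visits.max() - visits.min()))     # 35
print(round(float(visits.std(ddof=1)), 2))  # 11.22

# Tukey's hinges: medians of the lower and upper halves
# (excluding the overall median for an odd-length dataset)
s = np.sort(visits)
q1 = np.median(s[:3])  # lower half: 120, 128, 130
q3 = np.median(s[4:])  # upper half: 138, 142, 155
print(float(q3 - q1))  # 14.0
```

For larger datasets the convention matters much less, but it explains why different software can report slightly different quartiles for the same numbers.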

Measures of Distribution Shape

Beyond center and spread, distribution shape gives useful information about data behavior. Two key measures, skewness and kurtosis, quantify asymmetry and tail heaviness, respectively, helping identify departures from the symmetric, moderate-tailed normal distribution.

Skewness

Skewness measures the asymmetry of a distribution around its mean:

$$\text{Skewness} = \frac{\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^3}{s^3}$$

Where $s$ is the standard deviation. This formula computes the standardized third moment, giving a unitless measure that can be compared across different datasets.

A skewness of zero indicates a perfectly symmetric distribution like the normal distribution. Positive skewness, or right skew, occurs when the distribution has a long tail extending to the right, with most data concentrated on the left side and the mean pulled higher than the median. Income distributions often exhibit positive skewness, with many people earning moderate amounts and a few earning extremely high incomes.

Negative skewness, or left skew, describes distributions with long left tails, where most values cluster at the high end and a few low values pull the mean below the median. This pattern appears less frequently but can occur in data like test scores on easy exams, where most students score high but a few struggle.

As a general guide, skewness values between -0.5 and 0.5 suggest approximate symmetry, values between 0.5 and 1.0 (or -1.0 and -0.5) indicate moderate skewness, and values beyond 1.0 (or below -1.0) suggest high skewness. However, these thresholds are not strict rules and should be interpreted in context with the specific application.
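Assuming SciPy is available, `scipy.stats.skew` computes this standardized third moment. The sketch below draws synthetic samples from a symmetric and a right-skewed distribution to show the sign convention:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
symmetric = rng.normal(50, 10, 10_000)                   # bell-shaped, skewness near 0
right_skewed = rng.exponential(scale=2.0, size=10_000)   # long right tail

# Standardized third moment, matching the formula above
print(round(float(skew(symmetric)), 2))     # close to 0
print(round(float(skew(right_skewed)), 2))  # strongly positive
```

The exponential distribution's theoretical skewness is 2, so the sample estimate lands well past the high-skewness threshold of 1.0 mentioned above.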

Kurtosis

Kurtosis measures the "tailedness" of a distribution, specifically how much probability mass lies in the tails versus the center compared to a normal distribution:

$$\text{Kurtosis} = \frac{\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^4}{s^4}$$

This fourth moment captures tail behavior that variance and skewness miss. The normal distribution has a kurtosis of 3, so many software packages report excess kurtosis, which is kurtosis minus 3, to make interpretation more intuitive. In this case, zero indicates normal-like tails.

High kurtosis, found in leptokurtic distributions, indicates heavy tails and a sharp peak, meaning the distribution produces outliers more frequently than a normal distribution would. Financial returns often exhibit high kurtosis, with extreme market movements occurring more often than normal theory predicts. This phenomenon is important for risk management.

Low kurtosis, found in platykurtic distributions, indicates light tails and a flatter peak, with fewer extreme values than the normal distribution. Uniform distributions have low kurtosis, as values are spread evenly across the range with no concentration in the center or tails.

Understanding kurtosis helps when assessing risk, as high-kurtosis distributions produce more extreme events. Many statistical methods assume normality and may perform poorly when applied to high-kurtosis data without appropriate adjustments or transformations.
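Assuming SciPy is available, `scipy.stats.kurtosis` estimates tail heaviness from synthetic samples; note that it returns excess kurtosis by default (`fisher=True`), so normal-like tails come out near 0 rather than 3:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(1)
normal = rng.normal(0, 1, 10_000)                 # baseline: excess kurtosis near 0
heavy_tailed = rng.standard_t(df=6, size=10_000)  # Student's t: heavy tails (leptokurtic)
uniform = rng.uniform(-1, 1, 10_000)              # light tails (platykurtic)

# Excess kurtosis = kurtosis - 3 (scipy's fisher=True default)
print(round(float(kurtosis(normal)), 2))        # near 0
print(round(float(kurtosis(heavy_tailed)), 2))  # clearly positive
print(round(float(kurtosis(uniform)), 2))       # near -1.2
```

The uniform distribution's excess kurtosis is exactly -1.2 in theory, a handy reference point for a maximally "flat" distribution on a bounded range.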

Why Do Higher Moments Capture Shape?

You might wonder why the third and fourth powers in the skewness and kurtosis formulas capture asymmetry and tail behavior, respectively. The key lies in how exponentiation treats deviations from the mean:

  • Odd powers (3rd moment for skewness): Preserve the sign of deviations. Negative deviations remain negative, positive remain positive. When a distribution has a long right tail, the large positive deviations (cubed) dominate the sum, producing positive skewness. The opposite happens with left-skewed distributions.

  • Even powers (4th moment for kurtosis): Make all deviations positive, but extreme values are amplified dramatically. Raising a deviation to the fourth power makes large outliers contribute disproportionately to the sum. Heavy-tailed distributions have more extreme values, so their fourth moment is larger relative to their variance squared.

The standardization by $s^3$ or $s^4$ makes these measures unitless and comparable across datasets with different scales. Essentially, moments are a mathematical tool for systematically capturing different aspects of a distribution's shape.

Visual Examples

Visualizing descriptive statistics makes these concepts clearer, helping us see patterns more easily. The following examples demonstrate how different distributions exhibit varying characteristics in terms of central tendency, spread, and shape.

Example: Comparing Distributions with Different Properties

This visualization compares three distributions: a symmetric normal distribution, a right-skewed distribution, and a high-kurtosis distribution with heavy tails. Each plot includes reference lines showing the mean with a solid line and the median with a dashed line to illustrate how these measures respond to distributional shape.

Figure: Symmetric normal distribution (skewness ≈ 0) showing mean and median aligned at the center. The standard deviation captures typical spread, with most data falling within ±1 SD of the mean. This represents the ideal case where all measures of central tendency coincide.

Figure: Right-skewed distribution (positive skewness) with mean pulled above the median by the long right tail. The IQR captures the central data spread more reliably than the range, which is inflated by extreme values. This pattern is common in income, real estate prices, and response times.

Figure: High-kurtosis distribution with heavy tails and a sharp peak. Notice more extreme values beyond ±2 SD than the normal distribution would predict. The larger standard deviation reflects increased variability, while the mean and median remain close due to symmetry. This pattern appears in financial returns and error distributions.

Example: The 68-95-99.7 Rule

The empirical rule gives an easy way to understand standard deviation. For normally distributed data, approximately 68% of observations fall within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations.
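The rule is easy to verify numerically by simulating normal data and counting how many points fall within each band. The mean of 100 and SD of 15 below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)
# Simulated scores: normal with mean 100 and SD 15 (illustrative parameters)
x = rng.normal(loc=100, scale=15, size=100_000)
mean, sd = x.mean(), x.std()

# Fraction of observations within k standard deviations of the mean
fracs = {k: float(np.mean(np.abs(x - mean) < k * sd)) for k in (1, 2, 3)}
for k, frac in fracs.items():
    print(f"within ±{k} SD: {frac:.3f}")
# approximately 0.683, 0.954, 0.997
```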

Figure: Visual demonstration of the 68-95-99.7 rule (empirical rule) for normal distributions. The shaded regions show that approximately 68% of data fall within ±1 standard deviation (blue), 95% within ±2 standard deviations (green overlay), and 99.7% within ±3 standard deviations (yellow overlay) from the mean. This rule gives an easy way to understand standard deviation and identify unusual observations in approximately normal data.

Example: Box Plot Comparison

Box plots provide a compact visual summary of distribution characteristics, highlighting median, quartiles, range, and outliers simultaneously. This example compares the same three distributions using box plots to emphasize how the IQR captures central spread and identifies extreme values.

Figure: Box plot comparison of three distributions showing median (orange line), interquartile range (box boundaries), whiskers extending to 1.5×IQR, and outliers (circles beyond whiskers). The symmetric normal distribution shows balanced whiskers and few outliers. The skewed distribution displays asymmetric whiskers with outliers primarily on the high end. The heavy-tailed distribution exhibits numerous outliers on both sides, reflecting its propensity for extreme values. This visualization demonstrates how box plots efficiently summarize multiple descriptive statistics in a single compact display.

Example: Measures of Spread Visualization

This example shows how different measures of spread, including range, IQR, and standard deviation, capture variability in data. It also shows how these measures behave when outliers are present.

Figure: Data without outliers showing standard deviation bands (±1 SD and ±2 SD from mean) encompassing most observations. The IQR and range provide similar information about spread, and all measures align well with visual intuition.

Figure: Data with outliers demonstrating how range becomes inflated by extreme values while IQR remains stable, capturing the spread of the central data. Standard deviation increases due to squared deviations from extreme points. This highlights the IQR's robustness for datasets with outliers.

Choosing Appropriate Descriptive Statistics

Which descriptive statistics to report depends on the data's characteristics and your analysis goals. Consider these guidelines:

  • For symmetric distributions without outliers: Use mean and standard deviation. These measures provide efficient summaries that leverage all available data and work well for many naturally occurring phenomena.
  • For skewed distributions or data with outliers: Use median and IQR. The median better captures typical values when extreme observations would distort the mean, while the IQR describes spread more robustly than standard deviation or range.
  • For categorical data or discrete counts with few unique values: Use the mode. When dealing with multimodal distributions that have multiple peaks, reporting all modes may be more informative than a single central tendency measure that falls between clusters.

Reporting multiple statistics together gives a more complete picture than relying on a single measure. Presenting mean alongside median reveals skewness at a glance; reporting both standard deviation and IQR shows whether outliers are present; including skewness and kurtosis values alerts readers to non-normal behavior. This comprehensive approach provides a richer picture of data characteristics and prevents misleading simplifications.

Practical Applications

Descriptive statistics are useful in every area of data science and analytics:

  • Quality Control Manufacturing: Engineers monitor process means and standard deviations to ensure products meet specifications, using control charts that flag when measurements drift beyond expected ranges. A sudden increase in variance might indicate equipment malfunction, while a shift in the mean could signal a raw material change.
  • Healthcare Research: Researchers use median survival times and interquartile ranges to report treatment outcomes, recognizing that survival data are typically right-skewed with some patients living much longer than typical. Reporting median survival provides a more interpretable and robust summary than mean survival, which could be inflated by a few exceptional cases.
  • Financial Analysis: Analysts examine skewness and kurtosis of investment returns to assess risk. High positive skewness in a portfolio's return distribution suggests occasional large gains, while negative skewness indicates potential for severe losses. High kurtosis warns of fat tails and indicates more extreme events than normal distributions predict.
  • Marketing Analytics: Teams analyze customer behavior metrics like purchase amounts, time on site, and conversion rates. These distributions often exhibit strong right skew, with most customers making small purchases and a few high-value customers contributing disproportionate revenue. Understanding this through median and IQR helps set realistic targets and identify the most valuable customer segments.
  • Climate Science: Scientists describe temperature distributions using multiple descriptive measures to characterize typical conditions, variability, and extremes. Changes in mean temperature indicate warming or cooling trends, while changes in variance or tail behavior reveal whether extreme weather events are becoming more common.

Beyond these domain applications, the following practical guidelines will help you work with descriptive statistics effectively.

Best Practices

Always visualize your data before computing summary statistics. Histograms, box plots, and scatter plots reveal patterns that numbers alone may hide, including multimodality, outliers, and unexpected gaps. The famous Anscombe's quartet demonstrates that datasets with identical means, variances, and correlations can have fundamentally different structures visible only through visualization.

Report uncertainty alongside point estimates when possible. A mean of 50 is more informative when accompanied by the standard deviation or confidence interval. For small samples, acknowledge that summary statistics may be unstable and could change substantially with additional data.

Document your choices about handling missing values and outliers. Different approaches, such as excluding missing data, imputing values, or trimming extremes, can produce substantially different summaries. Transparency about these decisions allows others to evaluate and reproduce your analysis.

Common Pitfalls

Averaging percentages or ratios directly often produces misleading results. If one store has a 10% return rate on 1000 items and another has 20% on 100 items, the overall return rate is not 15% but rather 120/1100 ≈ 10.9%. Always compute weighted averages when combining rates from groups of different sizes.
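The two-store example works out as follows; the weighted calculation pools the raw counts rather than averaging the rates:

```python
# Two stores: 10% returns on 1000 items, 20% returns on 100 items
returns = [100, 20]   # returned items per store
items = [1000, 100]   # total items sold per store

naive_average = (0.10 + 0.20) / 2      # misleading: ignores store sizes
true_rate = sum(returns) / sum(items)  # 120 / 1100, weighted by volume

print(round(naive_average, 2), round(true_rate, 4))  # 0.15 0.1091
```

The correct pooled rate of about 10.9% sits much closer to the larger store's 10%, because that store contributes over 90% of the items.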

Standard deviation loses its intuitive interpretation for highly skewed data. Stating that income has a standard deviation of $30,000 when the distribution is heavily right-skewed may suggest that typical variation extends into negative values, which is impossible for income. For such distributions, the IQR or percentile ranges communicate spread more meaningfully.

Beware of Simpson's paradox, where trends apparent in aggregated data reverse or disappear when data are stratified by a confounding variable. A treatment might appear effective overall but harmful within every subgroup, or vice versa. Computing descriptive statistics at multiple levels of aggregation helps identify such reversals.
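A small numeric sketch makes the reversal concrete. The success counts below are invented purely to exhibit the paradox: treatment A wins within both subgroups, yet loses in the aggregate because the subgroup sizes differ so sharply:

```python
# Hypothetical (successes, trials) per subgroup for two treatments
a = {"easy": (95, 100), "hard": (50, 1000)}
b = {"easy": (850, 1000), "hard": (4, 100)}

def rate(successes, trials):
    return successes / trials

# A is better WITHIN each subgroup...
for group in ("easy", "hard"):
    assert rate(*a[group]) > rate(*b[group])

# ...yet worse OVERALL, because A's trials are concentrated in the hard group
a_overall = rate(sum(s for s, _ in a.values()), sum(t for _, t in a.values()))
b_overall = rate(sum(s for s, _ in b.values()), sum(t for _, t in b.values()))
print(round(a_overall, 3), round(b_overall, 3))
```

Stratifying by subgroup, as the text recommends, is exactly what exposes this reversal.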

Summary

Descriptive statistics give us the tools to understand and communicate data characteristics. Measures of central tendency, including mean, median, and mode, identify typical values, with each measure offering different strengths depending on whether data are symmetric, skewed, or categorical. Measures of spread, including variance, standard deviation, range, and interquartile range, quantify variability, with standard deviation and IQR emerging as the most widely used due to their interpretability and mathematical properties. Measures of distribution shape such as skewness and kurtosis reveal asymmetry and tail behavior, alerting practitioners to departures from normality that may affect subsequent analyses.

No single descriptive statistic tells the whole story. Effective data summarization requires choosing statistics appropriate to the data's properties and reporting multiple measures that together give a complete picture. Symmetric data without outliers are well described by mean and standard deviation. In contrast, skewed or outlier-prone data call for median and IQR. Examining skewness and kurtosis helps determine whether standard parametric methods are appropriate or whether data transformation or robust methods are needed.

Descriptive statistics is not just about calculating numbers. It is about developing good judgment to select appropriate summaries, recognizing patterns they reveal, and communicating findings clearly. These skills form the foundation for all subsequent statistical analysis, machine learning, and data-driven decision making.

