Descriptive Statistics: Complete Guide to Summarizing and Understanding Data with Python

Michael Brenndoerfer · Updated December 25, 2025 · 26 min read

A comprehensive guide covering descriptive statistics fundamentals, including measures of central tendency (mean, median, mode), variability (variance, standard deviation, IQR), and distribution shape (skewness, kurtosis). Learn how to choose appropriate statistics for different data types and apply them effectively in data science.


Descriptive Statistics: Summarizing and Understanding Data

Descriptive statistics help us summarize and understand data characteristics. These methods transform raw data into useful summaries that show patterns, typical values, and variability. They provide the basis for all further statistical analysis and machine learning work.

Introduction

Before diving into complex modeling or inference, you must first understand the basic properties of your data. Descriptive statistics give us this first way to understand data, showing us the shape, center, and spread of distributions. These summaries let us quickly assess data quality, spot potential issues like outliers or skewness, and communicate findings to stakeholders who may not be familiar with technical details.

Descriptive statistics provide different types of summary information about data, organized into several key categories:

  • Measures of central tendency (e.g., mean, median, mode): Identify typical or representative values in a dataset.
  • Measures of variability (e.g., variance, standard deviation, range, interquartile range): Quantify how spread out the data points are.
  • Measures of distribution shape (e.g., skewness, kurtosis): Describe the asymmetry and tail behavior of the data, highlighting departures from the classic bell-shaped normal distribution.

Understanding these descriptive measures is important not only for exploratory data analysis but also for diagnosing potential problems in modeling. For instance, high skewness might suggest the need for data transformation, while extreme kurtosis could indicate the presence of outliers that warrant investigation. This chapter explores each of these fundamental concepts, giving the statistical basics needed for effective data analysis and modeling.

Measures of Central Tendency

Central tendency measures answer a key question: what is a typical value in this dataset? Three main measures (the mean, median, and mode) each capture this concept differently, and the choice among them depends on the data's properties and the analysis goals.

The Mean

The arithmetic mean, commonly called the average, is calculated by summing all values and dividing by the count of observations:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i$$

Where:

  • $\bar{x}$: Sample mean
  • $x_i$: Individual observation
  • $n$: Number of observations

The mean represents the balance point of a distribution and incorporates information from every data point. This sensitivity to all values makes the mean an efficient estimator when data follow a symmetric distribution, but it also means the mean is heavily influenced by outliers. A single extremely large or small value can pull the mean far from where most data points cluster, potentially misrepresenting the typical value.

For example, consider household income in a neighborhood where most families earn between $40,000 and $80,000 annually, but one household earns $5 million. The mean income might be $150,000, far above what most residents actually earn, while the median would remain around $60,000, better representing the typical household.
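To see this sensitivity concretely, here is a small sketch using NumPy. The income figures are hypothetical, chosen to mirror the scenario above, not drawn from a real dataset:

```python
import numpy as np

# Hypothetical neighborhood incomes: most households earn $40k-$80k,
# plus one extreme outlier earning $5 million (illustrative numbers)
incomes = np.array([42_000, 55_000, 61_000, 48_000, 72_000, 65_000,
                    58_000, 44_000, 79_000, 51_000, 5_000_000])

mean_income = incomes.mean()       # pulled far upward by the outlier
median_income = np.median(incomes) # stays near the typical household

print(f"Mean:   ${mean_income:,.0f}")
print(f"Median: ${median_income:,.0f}")
```

A single extreme value drags the mean to roughly half a million dollars, while the median stays at $58,000, close to what most households actually earn.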

The Median

The median is the middle value when observations are arranged in order. For datasets with an odd number of observations, it is simply the center value; for even counts, it is the average of the two middle values:

$$\text{Median} = \begin{cases} x_{((n+1)/2)} & \text{if } n \text{ is odd} \\ \dfrac{x_{(n/2)} + x_{(n/2+1)}}{2} & \text{if } n \text{ is even} \end{cases}$$

Where $x_{(i)}$ denotes the $i$-th value in the sorted dataset.

For example, consider the sorted dataset: 12, 15, 18, 22, 25, 28, 31. With seven observations (odd count), the median is the fourth value: 22. If we add an eighth observation to get 12, 15, 18, 22, 25, 28, 31, 45, the median becomes the average of the fourth and fifth values: (22 + 25) / 2 = 23.5.
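The two cases above can be checked directly with NumPy, which handles the odd and even counts automatically:

```python
import numpy as np

odd_data = [12, 15, 18, 22, 25, 28, 31]        # 7 values: middle one is the median
even_data = [12, 15, 18, 22, 25, 28, 31, 45]   # 8 values: average the two middle ones

print(np.median(odd_data))   # 22.0
print(np.median(even_data))  # 23.5, the average of 22 and 25
```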

The median's main advantage is that it handles outliers well. Because it depends only on the position of values rather than their magnitude, extreme values do not distort it. This makes the median particularly valuable for skewed distributions or data with outliers. In fields like real estate, income analysis, and cost estimation, the median often provides a more meaningful measure of central tendency than the mean.

The Mode

The mode is the most frequently occurring value in a dataset. Unlike mean and median, which always produce a single value for continuous data, a distribution can have multiple modes. A dataset is called bimodal when two distinct values share the highest frequency, such as exam scores of 75 and 90 each appearing 8 times while no other score appears more than 5 times. Multimodal distributions have three or more peaks, which often indicate the presence of distinct subgroups within the data, like heights in a mixed population of children and adults. A distribution has no mode if all values occur with equal frequency.

The mode is most useful for categorical data where mean and median are undefined, such as the most common color preference in a survey or the most frequent diagnosis in a medical dataset. For continuous numerical data, the mode is less commonly used, though it can help identify peaks in a distribution's density.
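For categorical data like the survey example, the mode can be computed with the standard library's `collections.Counter`. The survey responses below are made up for illustration:

```python
from collections import Counter

# Hypothetical survey of color preferences (categorical data)
survey = ["blue", "green", "blue", "red", "blue", "green", "red", "blue"]

counts = Counter(survey)
mode_value, mode_count = counts.most_common(1)[0]
print(mode_value, mode_count)  # blue 4

# A dataset can have several modes: collect every value tied at the max count
max_count = max(counts.values())
modes = [value for value, count in counts.items() if count == max_count]
print(modes)  # ['blue']
```

Collecting all values tied at the maximum count handles bimodal and multimodal cases, where reporting a single mode would hide the other peaks.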

In practice, for continuous data, we often examine the mode through histograms or kernel density estimates rather than computing the exact most frequent value, especially when data have many unique values.

Kernel Density Estimation

Kernel density estimation (KDE) is a non-parametric method for estimating the probability density function of a continuous random variable. Rather than assuming a specific distribution shape, KDE places a smooth "kernel" (typically a Gaussian curve) at each data point and sums them to create a continuous density curve. This technique is particularly useful for visualizing the underlying distribution of data and identifying modes in continuous datasets where exact value repetition is rare.

Don't worry if this concept seems unclear at this point. It's not essential to understand right now. Feel free to continue and revisit it later if needed.
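As a rough sketch of the idea, assuming SciPy is available, `scipy.stats.gaussian_kde` can estimate the density of synthetic continuous data and locate its highest peak, something counting exact repeats could not do:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
# Synthetic continuous data with essentially no exact repeats:
# a bimodal mixture centered at 0 and 5
data = np.concatenate([rng.normal(0, 1, 1000), rng.normal(5, 1, 1000)])

# Fit a Gaussian KDE, evaluate it on a grid, and take the highest point
kde = gaussian_kde(data)
grid = np.linspace(data.min(), data.max(), 500)
density = kde(grid)
estimated_mode = grid[np.argmax(density)]
print(round(float(estimated_mode), 2))  # near one of the component centers
```

With two equally sized components, the global peak lands near one of the two centers; plotting `density` against `grid` would show both modes.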

Measures of Spread

While central tendency measures locate the center of a distribution, measures of spread quantify how much data points vary around that center. Understanding variability is crucial because two datasets can have identical means but vastly different spreads, leading to different implications for analysis and decision-making.

Variance and Standard Deviation

Variance measures the average squared deviation from the mean, quantifying how far data points typically fall from the center:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2$$

Where:

  • $s^2$: Sample variance
  • $x_i$: Individual observation
  • $\bar{x}$: Sample mean
  • $n$: Number of observations

The division by $n-1$ rather than $n$ is known as Bessel's correction, which provides an unbiased estimate of the population variance from sample data. Variance has a useful mathematical property: the variance of a sum of independent variables equals the sum of their variances. However, its units are squared, such as dollars squared, which makes direct interpretation challenging.

Standard deviation addresses this interpretation issue by taking the square root of variance, returning to the original units:

$$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2}$$

Standard deviation gives an easy-to-understand measure of typical spread. For approximately normal distributions, about 68% of data fall within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations. This 68-95-99.7 rule makes standard deviation useful for understanding data spread and spotting unusual values.

Population vs. Sample Standard Deviation: σ vs. s

You may notice that standard deviation is sometimes denoted as $\sigma$ (sigma) and sometimes as $s$. This distinction reflects whether we are working with a population or a sample:

  • Population standard deviation ($\sigma$): Used when you have data for the entire population. The formula divides by $N$ (the population size).
  • Sample standard deviation ($s$): Used when you have a sample drawn from a larger population. The formula divides by $n-1$ (Bessel's correction) to provide an unbiased estimate of the population standard deviation.

In practice, we almost always work with samples rather than complete populations, so $s$ with the $n-1$ denominator is the standard choice. Use $\sigma$ only when you genuinely have access to every member of the population, such as all students in a specific class or all transactions in a closed system. When in doubt, use the sample formula with $n-1$.
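NumPy exposes exactly this distinction through its `ddof` ("delta degrees of freedom") parameter, which is worth knowing because the default is the population formula. The data here are arbitrary illustrative values:

```python
import numpy as np

data = np.array([4.0, 8.0, 6.0, 5.0, 3.0, 7.0])  # illustrative sample

# NumPy's default is the population formula (ddof=0, divides by N)
pop_std = np.std(data)

# Pass ddof=1 for the sample formula with Bessel's correction (divides by n-1)
sample_std = np.std(data, ddof=1)

print(round(float(pop_std), 4), round(float(sample_std), 4))
```

The sample standard deviation is always slightly larger than the population version on the same data, since dividing by $n-1$ instead of $n$ inflates the estimate to correct its bias.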

Range and Interquartile Range

The range is the simplest measure of spread, calculated as the difference between the maximum and minimum values:

$$\text{Range} = x_{\text{max}} - x_{\text{min}}$$

While easy to compute and interpret, the range uses only two data points and is extremely sensitive to outliers. A single erroneous or extreme value can inflate the range dramatically, making it an unreliable measure of typical spread.

The interquartile range (IQR) is more robust, measuring the spread of the middle 50% of the data:

$$\text{IQR} = Q_3 - Q_1$$

Where:

  • $Q_1$: First quartile (25th percentile)
  • $Q_3$: Third quartile (75th percentile)

Because the IQR focuses on the central portion of the distribution and ignores the tails, it remains stable even when outliers are present. This robustness makes the IQR particularly valuable for exploratory data analysis and outlier detection. The common outlier detection rule identifies values below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$ as potential outliers, a criterion widely used in box plots and data cleaning procedures.
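The 1.5 × IQR rule translates directly into a few lines of NumPy. The data below are made up, with one deliberately extreme value; note that `np.percentile` uses linear interpolation between order statistics by default, one of several common quartile conventions:

```python
import numpy as np

# Illustrative data with one deliberate outlier (480)
data = np.array([102, 110, 115, 121, 125, 130, 134, 140, 151, 480])

q1, q3 = np.percentile(data, [25, 75])  # default: linear interpolation
iqr = q3 - q1

# Tukey's fences: values outside these bounds are flagged as potential outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]
print(outliers)  # only the 480 observation is flagged
```

Notice that the fences stay tight around the bulk of the data even though 480 is present, which is exactly the robustness property described above.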

For example, consider a dataset of daily website visits over a week: 120, 135, 142, 128, 155, 130, 138. We can calculate each measure of spread:

  • Mean: (120 + 135 + 142 + 128 + 155 + 130 + 138) / 7 = 135.43
  • Median: 135
  • Mode: None (all values are unique)
  • Range: 155 - 120 = 35
  • IQR: Q3 - Q1 = 142 - 128 = 14
  • Standard Deviation: 11.22

This dataset is relatively symmetric, with a mean and median close to each other. The range is 35, indicating moderate variability, and the IQR is 14, showing a consistent spread around the median. The standard deviation of 11.22 is also moderate, reflecting the typical deviation from the mean.
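These hand calculations can be reproduced with NumPy. One caveat: quartile conventions differ, and the IQR of 14 above comes from Tukey's hinges (medians of the lower and upper halves), while `np.percentile`'s default linear interpolation would give a slightly different value for this small dataset:

```python
import numpy as np

visits = np.array([120, 135, 142, 128, 155, 130, 138])

print(round(float(visits.mean()), 2))       # 135.43
print(float(np.median(visits)))             # 135.0
print(int(visits.max() - visits.min()))     # 35
print(round(float(visits.std(ddof=1)), 2))  # 11.22

# Tukey's hinges: medians of the lower and upper halves
# (excluding the overall median for an odd-length dataset)
s = np.sort(visits)
q1 = np.median(s[:3])  # lower half: 120, 128, 130
q3 = np.median(s[4:])  # upper half: 138, 142, 155
print(float(q3 - q1))  # 14.0
```

For larger datasets the convention matters much less, but it explains why different software can report slightly different quartiles for the same numbers.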

Measures of Distribution Shape

Beyond center and spread, distribution shape gives useful information about data behavior. Two key measures, skewness and kurtosis, quantify asymmetry and tail heaviness, respectively, helping identify departures from the symmetric, moderate-tailed normal distribution.

Skewness

Skewness measures the asymmetry of a distribution around its mean:

$$\text{Skewness} = \frac{\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^3}{s^3}$$

Where $s$ is the standard deviation. This formula computes the standardized third moment, giving a unitless measure that can be compared across different datasets.

A skewness of zero indicates a perfectly symmetric distribution like the normal distribution. Positive skewness, or right skew, occurs when the distribution has a long tail extending to the right, with most data concentrated on the left side and the mean pulled higher than the median. Income distributions often exhibit positive skewness, with many people earning moderate amounts and a few earning extremely high incomes.

Negative skewness, or left skew, describes distributions with long left tails, where most values cluster at the high end and a few low values pull the mean below the median. This pattern appears less frequently but can occur in data like test scores on easy exams, where most students score high but a few struggle.

As a general guide, skewness values between -0.5 and 0.5 suggest approximate symmetry, values between 0.5 and 1.0 (or -1.0 and -0.5) indicate moderate skewness, and values beyond 1.0 (or below -1.0) suggest high skewness. However, these thresholds are not strict rules and should be interpreted in context with the specific application.
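Assuming SciPy is available, `scipy.stats.skew` computes this standardized third moment. The sketch below draws synthetic samples from a symmetric and a right-skewed distribution to show the sign convention:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
symmetric = rng.normal(50, 10, 10_000)                   # bell-shaped, skewness near 0
right_skewed = rng.exponential(scale=2.0, size=10_000)   # long right tail

# Standardized third moment, matching the formula above
print(round(float(skew(symmetric)), 2))     # close to 0
print(round(float(skew(right_skewed)), 2))  # strongly positive
```

The exponential distribution's theoretical skewness is 2, so the sample estimate lands well past the high-skewness threshold of 1.0 mentioned above.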

Kurtosis

Kurtosis measures the "tailedness" of a distribution, specifically how much probability mass lies in the tails versus the center compared to a normal distribution:

$$\text{Kurtosis} = \frac{\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^4}{s^4}$$

This fourth moment captures tail behavior that variance and skewness miss. The normal distribution has a kurtosis of 3, so many software packages report excess kurtosis, which is kurtosis minus 3, to make interpretation more intuitive. In this case, zero indicates normal-like tails.

High kurtosis, found in leptokurtic distributions, indicates heavy tails and a sharp peak, meaning the distribution produces outliers more frequently than a normal distribution would. Financial returns often exhibit high kurtosis, with extreme market movements occurring more often than normal theory predicts. This phenomenon is important for risk management.

Low kurtosis, found in platykurtic distributions, indicates light tails and a flatter peak, with fewer extreme values than the normal distribution. Uniform distributions have low kurtosis, as values are spread evenly across the range with no concentration in the center or tails.

Understanding kurtosis helps when assessing risk, as high-kurtosis distributions produce more extreme events. Many statistical methods assume normality and may perform poorly when applied to high-kurtosis data without appropriate adjustments or transformations.
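Assuming SciPy is available, `scipy.stats.kurtosis` estimates tail heaviness from synthetic samples; note that it returns excess kurtosis by default (`fisher=True`), so normal-like tails come out near 0 rather than 3:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(1)
normal = rng.normal(0, 1, 10_000)                 # baseline: excess kurtosis near 0
heavy_tailed = rng.standard_t(df=6, size=10_000)  # Student's t: heavy tails (leptokurtic)
uniform = rng.uniform(-1, 1, 10_000)              # light tails (platykurtic)

# Excess kurtosis = kurtosis - 3 (scipy's fisher=True default)
print(round(float(kurtosis(normal)), 2))        # near 0
print(round(float(kurtosis(heavy_tailed)), 2))  # clearly positive
print(round(float(kurtosis(uniform)), 2))       # near -1.2
```

The uniform distribution's excess kurtosis is exactly -1.2 in theory, a handy reference point for a maximally "flat" distribution on a bounded range.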

Why Do Higher Moments Capture Shape?

You might wonder why the third and fourth powers in the skewness and kurtosis formulas capture asymmetry and tail behavior, respectively. The key lies in how exponentiation treats deviations from the mean:

  • Odd powers (3rd moment for skewness): Preserve the sign of deviations. Negative deviations remain negative, positive remain positive. When a distribution has a long right tail, the large positive deviations (cubed) dominate the sum, producing positive skewness. The opposite happens with left-skewed distributions.

  • Even powers (4th moment for kurtosis): Make all deviations positive, but extreme values are amplified dramatically. Raising a deviation to the fourth power makes large outliers contribute disproportionately to the sum. Heavy-tailed distributions have more extreme values, so their fourth moment is larger relative to their variance squared.

The standardization by $s^3$ or $s^4$ makes these measures unitless and comparable across datasets with different scales. Essentially, moments are a mathematical tool for systematically capturing different aspects of a distribution's shape.

Visual Examples

Visualizing descriptive statistics makes these concepts clearer, helping us see patterns more easily. The following examples demonstrate how different distributions exhibit varying characteristics in terms of central tendency, spread, and shape.

Example: Comparing Distributions with Different Properties

This visualization compares three distributions: a symmetric normal distribution, a right-skewed distribution, and a high-kurtosis distribution with heavy tails. Each plot includes reference lines showing the mean with a solid line and the median with a dashed line to illustrate how these measures respond to distributional shape.

Figure: Symmetric normal distribution (skewness ≈ 0) showing mean and median aligned at the center. The standard deviation captures typical spread, with most data falling within ±1 SD of the mean. This represents the ideal case where all measures of central tendency coincide.

Figure: Right-skewed distribution (positive skewness) with mean pulled above the median by the long right tail. The IQR captures the central data spread more reliably than the range, which is inflated by extreme values. This pattern is common in income, real estate prices, and response times.

Figure: High-kurtosis distribution with heavy tails and a sharp peak. Notice more extreme values beyond ±2 SD than the normal distribution would predict. The larger standard deviation reflects increased variability, while the mean and median remain close due to symmetry. This pattern appears in financial returns and error distributions.

Example: The 68-95-99.7 Rule

The empirical rule gives an easy way to understand standard deviation. For normally distributed data, approximately 68% of observations fall within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations.
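The rule is easy to verify numerically by simulating normal data and counting how many points fall within each band. The mean of 100 and SD of 15 below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)
# Simulated scores: normal with mean 100 and SD 15 (illustrative parameters)
x = rng.normal(loc=100, scale=15, size=100_000)
mean, sd = x.mean(), x.std()

# Fraction of observations within k standard deviations of the mean
fracs = {k: float(np.mean(np.abs(x - mean) < k * sd)) for k in (1, 2, 3)}
for k, frac in fracs.items():
    print(f"within ±{k} SD: {frac:.3f}")
# approximately 0.683, 0.954, 0.997
```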

Figure: Visual demonstration of the 68-95-99.7 rule (empirical rule) for normal distributions. The shaded regions show that approximately 68% of data fall within ±1 standard deviation (blue), 95% within ±2 standard deviations (green overlay), and 99.7% within ±3 standard deviations (yellow overlay) from the mean. This rule gives an easy way to understand standard deviation and identify unusual observations in approximately normal data.

Example: Box Plot Comparison

Box plots provide a compact visual summary of distribution characteristics, highlighting median, quartiles, range, and outliers simultaneously. This example compares the same three distributions using box plots to emphasize how the IQR captures central spread and identifies extreme values.

Figure: Box plot comparison of three distributions showing median (orange line), interquartile range (box boundaries), whiskers extending to 1.5×IQR, and outliers (circles beyond whiskers). The symmetric normal distribution shows balanced whiskers and few outliers. The skewed distribution displays asymmetric whiskers with outliers primarily on the high end. The heavy-tailed distribution exhibits numerous outliers on both sides, reflecting its propensity for extreme values. This visualization demonstrates how box plots efficiently summarize multiple descriptive statistics in a single compact display.

Example: Measures of Spread Visualization

This example shows how different measures of spread, including range, IQR, and standard deviation, capture variability in data. It also shows how these measures behave when outliers are present.

Figure: Data without outliers showing standard deviation bands (±1 SD and ±2 SD from mean) encompassing most observations. The IQR and range provide similar information about spread, and all measures align well with visual intuition.

Figure: Data with outliers demonstrating how range becomes inflated by extreme values while IQR remains stable, capturing the spread of the central data. Standard deviation increases due to squared deviations from extreme points. This highlights the IQR's robustness for datasets with outliers.

Choosing Appropriate Descriptive Statistics

Which descriptive statistics to report depends on the data's characteristics and your analysis goals. Consider these guidelines:

  • For symmetric distributions without outliers: Use mean and standard deviation. These measures provide efficient summaries that leverage all available data and work well for many naturally occurring phenomena.
  • For skewed distributions or data with outliers: Use median and IQR. The median better captures typical values when extreme observations would distort the mean, while the IQR describes spread more robustly than standard deviation or range.
  • For categorical data or discrete counts with few unique values: Use the mode. When dealing with multimodal distributions that have multiple peaks, reporting all modes may be more informative than a single central tendency measure that falls between clusters.

Reporting multiple statistics together gives a more complete picture than relying on a single measure. Presenting mean alongside median reveals skewness at a glance; reporting both standard deviation and IQR shows whether outliers are present; including skewness and kurtosis values alerts readers to non-normal behavior. This comprehensive approach provides a richer picture of data characteristics and prevents misleading simplifications.

Practical Applications

Descriptive statistics are useful in every area of data science and analytics:

  • Quality Control Manufacturing: Engineers monitor process means and standard deviations to ensure products meet specifications, using control charts that flag when measurements drift beyond expected ranges. A sudden increase in variance might indicate equipment malfunction, while a shift in the mean could signal a raw material change.
  • Healthcare Research: Researchers use median survival times and interquartile ranges to report treatment outcomes, recognizing that survival data are typically right-skewed with some patients living much longer than typical. Reporting median survival provides a more interpretable and robust summary than mean survival, which could be inflated by a few exceptional cases.
  • Financial Analysis: Analysts examine skewness and kurtosis of investment returns to assess risk. High positive skewness in a portfolio's return distribution suggests occasional large gains, while negative skewness indicates potential for severe losses. High kurtosis warns of fat tails and indicates more extreme events than normal distributions predict.
  • Marketing Analytics: Teams analyze customer behavior metrics like purchase amounts, time on site, and conversion rates. These distributions often exhibit strong right skew, with most customers making small purchases and a few high-value customers contributing disproportionate revenue. Understanding this through median and IQR helps set realistic targets and identify the most valuable customer segments.
  • Climate Science: Scientists describe temperature distributions using multiple descriptive measures to characterize typical conditions, variability, and extremes. Changes in mean temperature indicate warming or cooling trends, while changes in variance or tail behavior reveal whether extreme weather events are becoming more common.

Beyond these domain applications, the following practical guidelines will help you work with descriptive statistics effectively.

Best Practices

Always visualize your data before computing summary statistics. Histograms, box plots, and scatter plots reveal patterns that numbers alone may hide, including multimodality, outliers, and unexpected gaps. The famous Anscombe's quartet demonstrates that datasets with identical means, variances, and correlations can have fundamentally different structures visible only through visualization.

Report uncertainty alongside point estimates when possible. A mean of 50 is more informative when accompanied by the standard deviation or confidence interval. For small samples, acknowledge that summary statistics may be unstable and could change substantially with additional data.

Document your choices about handling missing values and outliers. Different approaches, such as excluding missing data, imputing values, or trimming extremes, can produce substantially different summaries. Transparency about these decisions allows others to evaluate and reproduce your analysis.

Common Pitfalls

Averaging percentages or ratios directly often produces misleading results. If one store has a 10% return rate on 1000 items and another has 20% on 100 items, the overall return rate is not 15% but rather 120/1100 ≈ 10.9%. Always compute weighted averages when combining rates from groups of different sizes.
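The two-store example works out as follows; the weighted calculation pools the raw counts rather than averaging the rates:

```python
# Two stores: 10% returns on 1000 items, 20% returns on 100 items
returns = [100, 20]   # returned items per store
items = [1000, 100]   # total items sold per store

naive_average = (0.10 + 0.20) / 2      # misleading: ignores store sizes
true_rate = sum(returns) / sum(items)  # 120 / 1100, weighted by volume

print(round(naive_average, 2), round(true_rate, 4))  # 0.15 0.1091
```

The correct pooled rate of about 10.9% sits much closer to the larger store's 10%, because that store contributes over 90% of the items.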

Standard deviation loses its intuitive interpretation for highly skewed data. Stating that income has a standard deviation of $30,000 when the distribution is heavily right-skewed may suggest that typical variation extends into negative values, which is impossible for income. For such distributions, the IQR or percentile ranges communicate spread more meaningfully.

Beware of Simpson's paradox, where trends apparent in aggregated data reverse or disappear when data are stratified by a confounding variable. A treatment might appear effective overall but harmful within every subgroup, or vice versa. Computing descriptive statistics at multiple levels of aggregation helps identify such reversals.
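A small numeric sketch makes the reversal concrete. The success counts below are invented purely to exhibit the paradox: treatment A wins within both subgroups, yet loses in the aggregate because the subgroup sizes differ so sharply:

```python
# Hypothetical (successes, trials) per subgroup for two treatments
a = {"easy": (95, 100), "hard": (50, 1000)}
b = {"easy": (850, 1000), "hard": (4, 100)}

def rate(successes, trials):
    return successes / trials

# A is better WITHIN each subgroup...
for group in ("easy", "hard"):
    assert rate(*a[group]) > rate(*b[group])

# ...yet worse OVERALL, because A's trials are concentrated in the hard group
a_overall = rate(sum(s for s, _ in a.values()), sum(t for _, t in a.values()))
b_overall = rate(sum(s for s, _ in b.values()), sum(t for _, t in b.values()))
print(round(a_overall, 3), round(b_overall, 3))
```

Stratifying by subgroup, as the text recommends, is exactly what exposes this reversal.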

Summary

Descriptive statistics give us the tools to understand and communicate data characteristics. Measures of central tendency, including mean, median, and mode, identify typical values, with each measure offering different strengths depending on whether data are symmetric, skewed, or categorical. Measures of spread, including variance, standard deviation, range, and interquartile range, quantify variability, with standard deviation and IQR emerging as the most widely used due to their interpretability and mathematical properties. Measures of distribution shape such as skewness and kurtosis reveal asymmetry and tail behavior, alerting practitioners to departures from normality that may affect subsequent analyses.

No single descriptive statistic tells the whole story. Effective data summarization requires choosing statistics appropriate to the data's properties and reporting multiple measures that together give a complete picture. Symmetric data without outliers are well described by mean and standard deviation. In contrast, skewed or outlier-prone data call for median and IQR. Examining skewness and kurtosis helps determine whether standard parametric methods are appropriate or whether data transformation or robust methods are needed.

Descriptive statistics is not just about calculating numbers. It is about developing good judgment to select appropriate summaries, recognizing patterns they reveal, and communicating findings clearly. These skills form the foundation for all subsequent statistical analysis, machine learning, and data-driven decision making.

