You Have the Whole Population — Why Are You Still Computing Standard Errors?

Causal Inference
Econometrics
Statistics
Paper Review
A deep dive into the Abadie-Athey-Imbens-Wooldridge paper that asks a question most data scientists never think to ask: where does your uncertainty actually come from? The answer changes everything about how you report standard errors.
Author

Sean Lewis

Published

February 15, 2026

The Hook

You run a regression on data from all 50 U.S. states. Or all visits to your website last month. Or every transaction in your company’s database. You fit the model, get coefficients, and report standard errors with p-values.

Stop. Why are you computing standard errors at all?

Standard errors quantify sampling uncertainty — the idea that your sample is a random draw from some larger population, and a different draw would give different estimates. But you don’t have a sample. You have the entire population. There’s no “different draw.” The regression coefficient isn’t an estimate — it’s a fact about the data you have.

And yet, every textbook, every stats package, and every empirical paper defaults to reporting standard errors as though sampling variability is the only game in town. We do it on autopilot.

A landmark paper in Econometrica by Alberto Abadie, Susan Athey, Guido Imbens, and Jeffrey Wooldridge — four of the biggest names in modern econometrics — finally formalizes what’s going on. Their answer: sampling isn’t the only source of uncertainty. Design is another. And which one matters depends on what question you’re actually trying to answer.

The implications are concrete: the standard errors you’re reporting may be too large (overstating uncertainty) or of the wrong kind (capturing the wrong source of variation). And the fix changes how you should think about inference in nearly every applied setting.

The Argument

The paper’s logic unfolds in three clean steps.

Step 1: There are two distinct sources of uncertainty.

Sampling-based uncertainty arises when your data is a random subset of a larger population. You observe 1,000 of 10,000 customers. Different samples give different estimates — standard errors capture that variation.

Design-based uncertainty arises when some variable in your data (typically a treatment) was randomly assigned. Even if you observe the entire population, randomness in who got treated and who didn’t means the estimated treatment effect has uncertainty — not because of sampling, but because a different random assignment would give a different estimate.

Step 2: Most applied settings mix both — and the mixture matters.

Define \(\rho\) as the sampling fraction (what share of the population you observe). When \(\rho\) is small (a thin sample), sampling uncertainty dominates and conventional standard errors are approximately correct. When \(\rho = 1\) (the full population), there’s zero sampling uncertainty — but if you’re estimating a causal effect with random treatment assignment, design-based uncertainty remains.

The key formula decomposes total variance into:

\[V = (1 - \rho) \cdot V_{\text{sampling}} + V_{\text{design}}\]

When \(\rho \to 1\), the sampling term vanishes. What’s left depends entirely on whether your estimand is descriptive (a population summary) or causal (a treatment effect).
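The \((1 - \rho)\) scaling of the sampling term can be checked by brute force. The sketch below is an illustration, not something taken from the paper: it fixes a synthetic population, repeatedly draws samples without replacement, and compares the simulated variance of the sample mean with the conventional \(S^2/n\) and the corrected \((1 - \rho) S^2/n\). The population size, seed, and the helper name `simulated_variance_of_mean` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed, finite population (synthetic; values chosen only for illustration).
N = 2_000
population = rng.normal(loc=10.0, scale=2.0, size=N)
S2 = population.var(ddof=1)  # population variance with the N - 1 divisor


def simulated_variance_of_mean(rho, n_draws=4_000):
    """Variance of the sample mean across repeated samples drawn
    without replacement at sampling fraction rho."""
    n = int(round(rho * N))
    means = np.array([
        rng.choice(population, size=n, replace=False).mean()
        for _ in range(n_draws)
    ])
    return means.var(), n


results = {}
for rho in (0.1, 0.5, 0.9, 1.0):
    simulated, n = simulated_variance_of_mean(rho)
    conventional = S2 / n            # treats the population as infinite
    corrected = (1 - rho) * S2 / n   # finite population correction
    results[rho] = (simulated, conventional, corrected)
    print(f"rho={rho:.1f}  simulated={simulated:.6f}  "
          f"conventional={conventional:.6f}  corrected={corrected:.6f}")
```

The simulated column should track the corrected column and fall to numerically zero at \(\rho = 1\), while the conventional column stays positive.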

Step 3: The punchline.

For descriptive estimands (population regression coefficients, correlations, average outcomes), if you have the full population, there’s no uncertainty. Period. Your standard errors should be zero. The conventional EHW (Eicker-Huber-White) robust standard errors are overstating your uncertainty — sometimes dramatically.

For causal estimands (treatment effects under random assignment), uncertainty persists even with the full population, but the form of the standard errors changes. Conventional standard errors may still be conservative (too large), and the correct design-based variance has a specific structure that depends on treatment effect heterogeneity.

The practical takeaway: you’ve probably been reporting inflated standard errors whenever your data covers most or all of the population of interest.

The Lineage

This paper resolves a tension that’s been simmering in statistics for decades.

The Neyman tradition (1923) — Jerzy Neyman’s original framework for randomized experiments derived standard errors from the randomization distribution — the variation that arises from different possible treatment assignments, not from sampling. This is “design-based” inference in its purest form. It dominated agricultural experiments and survey sampling for most of the 20th century.

The superpopulation tradition (1960s-onward) — As regression became the workhorse of empirical social science, the dominant framework shifted to assuming the data is an i.i.d. sample from an infinite “superpopulation.” This is convenient mathematically — it justifies OLS standard errors, heteroskedasticity-robust (EHW) standard errors, and cluster-robust standard errors through central limit theorems. But it requires believing in a population that, in many applications, is hard to articulate. What is the “superpopulation” that the 50 U.S. states are drawn from?

The finite population gap — Researchers noticed the tension: reporting sampling-based standard errors on census-level data feels wrong, but there was no clean formal framework for what to do instead. Some practitioners quietly set standard errors to zero for population-level descriptive analyses, but without theoretical backing. Others just kept reporting EHW standard errors because that’s what referees expected.

Abadie et al. (2020) fills this gap by building a unified framework where both sampling and design uncertainty coexist, deriving the correct variance formulas for each case, and showing how standard estimators relate to the correct ones.

The paper was enormously influential (earlier versions circulated as “Finite Population Causal Standard Errors” since 2014), and it changed how a generation of applied economists think about reporting inference.

Where the Seminal Meets the Transducer

The seminal contribution is the conceptual framework itself — the decomposition of uncertainty into sampling and design components, and the formal proof that EHW standard errors are conservative. This is the part that changes how you think.

The transducer sections (less central for practitioners) are the detailed proofs for the vector-valued case and some of the more technical asymptotic refinements. These are important for the econometric theory audience but not needed to apply the ideas.

The Deep Dive

The Setup

Consider a finite population of \(N\) units indexed \(i = 1, \ldots, N\). Each unit has potential outcomes \(Y_i(0)\) and \(Y_i(1)\) (what would happen without and with treatment), covariates \(X_i\), and a treatment indicator \(W_i \in \{0, 1\}\).

Two layers of randomness:

  1. Sampling: We observe a random subset \(S\) of size \(n\) from the population (\(\rho = n/N\))
  2. Assignment: Treatment \(W_i\) is randomly assigned within the population

The observed outcome is \(Y_i = Y_i(W_i)\). We want to estimate the slope coefficient \(\beta\) in a regression of \(Y\) on \(W\) (and possibly \(X\)).
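The two layers of randomness can be made concrete in a few lines. This is a minimal sketch under assumed numbers (the population size, the effect distribution, and the helper name `draw_once` are hypothetical): potential outcomes are fixed once, and only the assignment and the sample are redrawn.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed finite population: potential outcomes never change between draws.
N, n = 1_000, 400                      # rho = n / N = 0.4
Y0 = rng.normal(0.0, 1.0, size=N)      # outcome without treatment
tau = rng.normal(2.0, 1.0, size=N)     # unit-level effects (heterogeneous)
Y1 = Y0 + tau                          # outcome with treatment


def draw_once(rng):
    """One realization of both random layers; returns the OLS slope of Y on W."""
    # Assignment layer: half of the population is treated at random.
    W_pop = np.zeros(N, dtype=int)
    W_pop[rng.choice(N, size=N // 2, replace=False)] = 1
    # Sampling layer: we observe a random subset of n units.
    S = rng.choice(N, size=n, replace=False)
    W = W_pop[S]
    Y = np.where(W == 1, Y1[S], Y0[S])   # observed outcome Y_i = Y_i(W_i)
    slope = np.polyfit(W, Y, deg=1)[0]   # slope from regressing Y on W
    return slope, Y, W


slope, Y, W = draw_once(rng)
diff_in_means = Y[W == 1].mean() - Y[W == 0].mean()
print(f"slope = {slope:.4f}, difference in means = {diff_in_means:.4f}")
print(f"population ATE = {tau.mean():.4f}")
```

With a single binary regressor, the OLS slope equals the difference in means between treated and control units, which is why the discussion can move between "regression coefficient" and "difference in means" freely in this simple case.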

The Key Result: Variance Decomposition

┌─────────────────────────────────────────────────────┐
│              TOTAL VARIANCE OF β̂                     │
│                                                      │
│   V(β̂) = V_sampling + V_design                      │
│                                                      │
│   ┌─────────────────────┐  ┌──────────────────────┐ │
│   │   V_sampling         │  │   V_design            │ │
│   │                      │  │                       │ │
│   │ Scales as (1 - ρ)   │  │ Present even when     │ │
│   │                      │  │ ρ = 1 (full pop.)    │ │
│   │ → 0 as ρ → 1        │  │                       │ │
│   │                      │  │ Depends on treatment  │ │
│   │ Captured by EHW SEs  │  │ effect heterogeneity │ │
│   │ (but overstated)     │  │ τᵢ = Y(1) - Y(0)    │ │
│   └─────────────────────┘  └──────────────────────┘ │
└─────────────────────────────────────────────────────┘

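The design-based variance has a clean structure. Define \(\tau_i = Y_i(1) - Y_i(0)\) as unit \(i\)’s individual treatment effect. For the difference-in-means estimator with the full population and \(N_1\) treated and \(N_0\) control units, Neyman’s result is

\[V_{\text{design}} = \frac{S_1^2}{N_1} + \frac{S_0^2}{N_0} - \frac{S_\tau^2}{N}\]

where \(S_1^2\), \(S_0^2\), and \(S_\tau^2\) are the population variances of \(Y_i(1)\), \(Y_i(0)\), and \(\tau_i\). The first two terms are what the conventional formula estimates; the last term is what it misses, because \(\tau_i\) is never observed for any unit. If the treatment effect is the same for everyone (\(\tau_i = \tau\) for all \(i\)), then \(S_\tau^2 = 0\) and the conventional standard errors are exactly right rather than conservative.

Note that the design-based variance itself is still positive in that case: units differ in \(Y_i(0)\), so a different random assignment would put different units in the control group and give a different estimate. What heterogeneity controls is the gap between the conventional variance and the true one. The more treatment effects vary across units, the more the conventional formula overstates the uncertainty.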
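A randomization simulation makes the role of heterogeneity concrete. The sketch below holds a synthetic full population fixed (\(\rho = 1\)), re-randomizes treatment many times, and compares the simulated variance of the difference in means with Neyman's expression \(S_1^2/N_1 + S_0^2/N_0 - S_\tau^2/N\). All numbers and the helper names `randomization_variance` and `neyman_terms` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

N, n1 = 200, 100          # full population observed (rho = 1), half treated
n0 = N - n1
Y0 = rng.normal(0.0, 1.0, size=N)


def randomization_variance(Y0, Y1, n_draws=20_000):
    """Variance of the difference in means across re-randomizations,
    holding the potential outcomes fixed."""
    estimates = np.empty(n_draws)
    for d in range(n_draws):
        treated = np.zeros(N, dtype=bool)
        treated[rng.choice(N, size=n1, replace=False)] = True
        estimates[d] = Y1[treated].mean() - Y0[~treated].mean()
    return estimates.var()


def neyman_terms(Y0, Y1):
    """Return (conventional part, heterogeneity part) of Neyman's variance."""
    conventional = Y1.var(ddof=1) / n1 + Y0.var(ddof=1) / n0
    heterogeneity = (Y1 - Y0).var(ddof=1) / N
    return conventional, heterogeneity


scenarios = {
    "constant effect": Y0 + 2.0,
    "heterogeneous effect": Y0 + rng.normal(2.0, 1.0, size=N),
}
summary = {}
for name, Y1 in scenarios.items():
    simulated = randomization_variance(Y0, Y1)
    conventional, heterogeneity = neyman_terms(Y0, Y1)
    summary[name] = (simulated, conventional, heterogeneity)
    print(f"{name:>22}: simulated={simulated:.5f}  "
          f"conventional={conventional:.5f}  "
          f"conventional - heterogeneity={conventional - heterogeneity:.5f}")
```

Under a constant effect the simulated variance should match the conventional part; under heterogeneous effects it should fall below it by roughly the heterogeneity term. In a real dataset that last term cannot be computed, since only one potential outcome per unit is observed, which is why the conventional formula is used as an upper bound.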

When Does This Matter in Practice?

The paper works through the implications for common scenarios:

| Scenario | \(\rho\) | Estimand | Conventional SEs | Correct SEs |
|---|---|---|---|---|
| Survey sample, descriptive | Small | Population mean | ≈ Correct | Standard |
| Survey sample, causal (RCT) | Small | ATE | ≈ Correct | Standard |
| Full population, descriptive | 1 | Population coefficient | Too large | Zero |
| Full population, causal (RCT) | 1 | ATE | Too large | Design-based |
| Large fraction, descriptive | ~0.8 | Population coefficient | Too large | Shrunk by \((1-\rho)\) |
| Large fraction, causal | ~0.8 | ATE | Mildly too large | Adjusted |

The “full population, descriptive” case is the extreme eye-opener. If you regress state-level GDP on state-level policy variables for all 50 states, and your research question is descriptive (“what is the association between X and Y in these 50 states?”), then your coefficient has zero uncertainty. There’s no population to generalize to — you computed an exact property of the data.

Cluster-Robust Standard Errors

The paper extends the framework to clustered settings — where units are grouped (students within schools, employees within firms) and treatment is assigned at the cluster level. The same logic applies: cluster-robust standard errors, which are already conservative in many settings, become even more conservative when the sampling fraction of clusters is large.

The authors derive the adjustment: replace the standard cluster-robust variance with a version scaled by \((1 - \rho_c)\) where \(\rho_c\) is the fraction of clusters in the sample. With all clusters observed, the sampling component again vanishes.

Rubber-Ducking the Jargon

EHW (Eicker-Huber-White) standard errors — The “robust” standard errors that every stats package computes. They handle heteroskedasticity (non-constant error variance) but are derived under the assumption of random sampling from an infinite population. This paper shows they remain valid (conservative) but can be inefficient (too wide) when \(\rho\) is large.

Superpopulation — An imaginary infinite population your sample is supposedly drawn from. For many datasets (all countries, all states, all employees of a company), defining this superpopulation requires philosophical contortions. The design-based framework avoids this problem.

Finite population correction (FPC) — The \((1 - \rho)\) factor. Well-known in survey statistics since the 1950s but largely ignored in regression analysis. This paper formally integrates FPC into the regression framework.

Neyman variance — The variance of a treatment effect estimator that arises purely from random assignment, not sampling. Named for Jerzy Neyman, who derived it in 1923 for agricultural experiments.

So What?

This paper should change your default behavior as a data scientist in three concrete ways:

First, ask yourself what your estimand is before reporting standard errors. If it’s descriptive and you have the full population, you might not need standard errors at all. The coefficient is just a fact.

Second, ask yourself where your uncertainty comes from. If you’re estimating a treatment effect from a randomized experiment with full population data, your uncertainty is design-based. The standard EHW formula is still valid, but it is conservative whenever treatment effects differ across units, so your intervals are wider than the design warrants.

Third, consider the sampling fraction. Even when \(\rho < 1\), if it’s large (say, you have 80% of the relevant population) and your estimand is descriptive, the conventional variance is too large by a factor of \(1/(1 - \rho)\), so the standard errors are too large by \(1/\sqrt{1 - \rho}\): about 2.2 at \(\rho = 0.8\). A simple finite population correction can meaningfully tighten your confidence intervals.
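As a small worked example of that correction, the helper below rescales a conventional standard error by \(\sqrt{1 - \rho}\) and builds a normal-approximation interval. It is a sketch for descriptive estimands only; the function name `fpc_interval` and the input numbers are hypothetical.

```python
import math


def fpc_interval(estimate, conventional_se, rho, z=1.96):
    """Confidence interval for a DESCRIPTIVE estimand after applying the
    finite population correction to a conventional standard error."""
    if not 0 < rho <= 1:
        raise ValueError("rho must satisfy 0 < rho <= 1")
    corrected_se = math.sqrt(1 - rho) * conventional_se
    half_width = z * corrected_se
    return estimate - half_width, estimate + half_width, corrected_se


# Hypothetical inputs: coefficient 3.0, conventional SE 0.10,
# and 80% of the population observed.
low, high, se = fpc_interval(3.0, 0.10, rho=0.8)
print(f"conventional 95% CI: [{3.0 - 1.96 * 0.10:.3f}, {3.0 + 1.96 * 0.10:.3f}]")
print(f"corrected    95% CI: [{low:.3f}, {high:.3f}]  (SE = {se:.4f})")
```

For a causal estimand this rescaling should not be applied to the whole standard error, because the design-based component does not shrink with \(\rho\).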

The deeper lesson is philosophical: uncertainty isn’t a property of your estimator. It’s a property of the question you’re asking and the process that generated your data. Getting that right isn’t a technicality — it determines whether your confidence intervals mean anything at all.


Paper: “Sampling-Based versus Design-Based Uncertainty in Regression Analysis” by Alberto Abadie, Susan Athey, Guido W. Imbens, and Jeffrey M. Wooldridge. Econometrica, Vol. 88, No. 1, January 2020, pp. 265-296.


Reproduction & Implementation

Environment Setup

# Core dependencies
# (version specifiers are quoted so the shell does not treat ">" as a redirect)
pip install "numpy>=1.24.0"
pip install "pandas>=2.0.0"
pip install "statsmodels>=0.14.0"     # OLS, EHW standard errors
pip install "scipy>=1.11.0"           # Statistical distributions
pip install "linearmodels>=5.0"       # Panel data / IV estimation
pip install "matplotlib>=3.7.0"       # Visualization

Pseudo-Code: Finite Population Corrected Standard Errors

import numpy as np
import statsmodels.api as sm

def ehw_standard_errors(X, y):
    """
    Standard Eicker-Huber-White heteroskedasticity-robust SEs.
    (What statsmodels gives you with cov_type='HC1')
    """
    model = sm.OLS(y, sm.add_constant(X)).fit(cov_type='HC1')
    return model.bse, model.cov_params()


def finite_population_corrected_se(X, y, rho, estimand='descriptive'):
    """
    Adjust standard errors for finite population sampling.

    Parameters:
        X:        (n, k) regressor matrix
        y:        (n,)   outcome vector
        rho:      float, sampling fraction n/N (0 < rho <= 1)
        estimand: 'descriptive' or 'causal'

    Returns:
        Corrected standard errors
    """
    model = sm.OLS(y, sm.add_constant(X)).fit(cov_type='HC1')
    V_ehw = model.cov_params()     # EHW variance-covariance matrix

    if estimand == 'descriptive':
        # For descriptive estimands, sampling variance scales by (1 - rho)
        # When rho = 1, variance is exactly 0
        V_corrected = (1 - rho) * V_ehw
        corrected_se = np.sqrt(np.diag(V_corrected))
        return corrected_se

    elif estimand == 'causal':
        # Design-based uncertainty does not shrink as rho -> 1, so the
        # (1 - rho) factor must not be applied to the whole EHW matrix:
        # doing so would report zero uncertainty at rho = 1.
        #
        # In the difference-in-means case the variance is the EHW
        # variance minus rho * Var(tau_i) / n. Var(tau_i) is not
        # identified without further assumptions, so the conservative
        # choice is to report the EHW standard errors unchanged
        # (see the Neyman variance below).
        V_corrected = V_ehw
        corrected_se = np.sqrt(np.diag(V_corrected))
        return corrected_se

    else:
        raise ValueError("estimand must be 'descriptive' or 'causal'")


def neyman_variance_ate(Y, W):
    """
    Design-based (Neyman) variance for the difference-in-means
    estimator of the Average Treatment Effect under complete
    random assignment.

    This is the correct variance when rho = 1 (full population).

    Parameters:
        Y: (N,) observed outcomes
        W: (N,) binary treatment indicator (0 or 1)
    """
    Y1 = Y[W == 1]   # treated outcomes
    Y0 = Y[W == 0]   # control outcomes
    n1 = len(Y1)
    n0 = len(Y0)
    N  = n1 + n0

    s1_sq = np.var(Y1, ddof=1)   # sample variance, treated
    s0_sq = np.var(Y0, ddof=1)   # sample variance, control

    # Neyman variance (conservative — ignores covariance of
    # potential outcomes, which is unobservable)
    V_neyman = s1_sq / n1 + s0_sq / n0

    # Note: This is an upper bound. The exact design-based
    # variance also subtracts Var(tau_i)/N, but tau_i is
    # unobservable (fundamental problem of causal inference)

    return V_neyman


# ---- DEMONSTRATION ----

def demo_fpc_impact():
    """
    Show how conventional SEs compare to FPC-adjusted SEs
    across different sampling fractions.
    """
    np.random.seed(42)

    # Generate a "population" of N=1000
    N = 1000
    X_pop = np.random.randn(N)
    Y_pop = 2.0 + 3.0 * X_pop + np.random.randn(N)  # true beta = 3.0

    print(f"{'rho':>6} | {'EHW SE':>10} | {'FPC SE':>10} | {'Ratio':>8}")
    print("-" * 42)

    for target_rho in [0.01, 0.1, 0.3, 0.5, 0.8, 0.95, 1.0]:
        n = max(int(round(target_rho * N)), 20)   # keep at least 20 observations
        rho = n / N                               # realized sampling fraction
        idx = np.random.choice(N, n, replace=False)
        X_sample = X_pop[idx]
        Y_sample = Y_pop[idx]

        ehw_se, _ = ehw_standard_errors(X_sample, Y_sample)
        fpc_se = finite_population_corrected_se(
            X_sample, Y_sample, rho=rho, estimand='descriptive'
        )

        ratio = fpc_se[1] / ehw_se[1] if ehw_se[1] > 0 else 0
        print(f"{rho:>6.2f} | {ehw_se[1]:>10.4f} | {fpc_se[1]:>10.4f} | {ratio:>8.3f}")

    # Expected output: as rho increases, FPC SE shrinks relative to EHW SE
    # At rho=1.0, FPC SE = 0 (descriptive estimand, full population)


# Run demonstration
demo_fpc_impact()

Cluster-Level Finite Population Correction

def cluster_robust_se_fpc(X, y, cluster_ids, rho_clusters):
    """
    Cluster-robust SEs with finite population correction at
    the cluster level.

    Parameters:
        X:             (n, k) regressors
        y:             (n,)   outcomes
        cluster_ids:   (n,)   cluster membership
        rho_clusters:  float, fraction of clusters in sample (G_s / G)
    """
    model = sm.OLS(y, sm.add_constant(X)).fit(
        cov_type='cluster',
        cov_kwds={'groups': cluster_ids}
    )

    V_cluster = model.cov_params()

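    # Apply FPC at the cluster level. As with the unit-level correction,
    # scaling the whole matrix suits descriptive estimands; with
    # cluster-level random assignment, design-based uncertainty remains
    # even when rho_clusters = 1.
    V_corrected = (1 - rho_clusters) * V_cluster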
    corrected_se = np.sqrt(np.diag(V_corrected))

    return corrected_se, model.bse  # corrected, conventional