In statistics, we often gather information from a sample in order to draw conclusions about a larger population. This is the basis of inferential statistics, and it starts with understanding estimation and confidence intervals. In a sense, we’re using measurements from the data we have to make educated guesses about the larger picture.

Estimation
Suppose we want to get an idea of the average birth weight of babies born in the United States in 2020. Obviously, we can’t gather information on every single birth. So how can we do this?
Let’s say we randomly sample 500 newborns and record their weights at birth. We can take those observations and calculate the sample mean (let’s say 7.7 pounds). Assuming this is a truly representative sample, one could argue that this is a decent estimate of the average newborn’s birth weight. This is known as a point estimate. The real question, though, is this: how sure are we that this is an accurate estimate of the true population parameter?
The problem is that we’ve made a big assumption here. We’re asserting that the 500 newborns we sampled are a perfectly representative sample of the millions of births that occur each year. In practice, we could re-sample this data and repeat the calculation and get a sample mean of 7.79, 7.3, 9.2, or even 6.5487 pounds. In fact, it’s impossible to say with 100% certainty that the population mean birth weight is exactly 7.7 pounds. This is where the concepts of margin of error and confidence interval come in.
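To see how much a sample mean can move from one sample to the next, here is a minimal sketch in Python. It assumes a made-up population in which birth weights are roughly normal with a mean of 7.5 pounds and a standard deviation of 1.2 pounds; those numbers, and the names TRUE_MEAN, TRUE_SD, and draw_sample_mean, are illustrative assumptions, not real 2020 figures.

```python
import random
import statistics

# Hypothetical population parameters (assumptions for illustration only).
TRUE_MEAN = 7.5    # pounds
TRUE_SD = 1.2      # pounds
SAMPLE_SIZE = 500  # newborns per sample, as in the example above


def draw_sample_mean(rng, n=SAMPLE_SIZE):
    """Simulate n birth weights and return their sample mean (a point estimate)."""
    weights = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    return statistics.mean(weights)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Repeat the "sample 500 newborns" experiment five times.
sample_means = [draw_sample_mean(rng) for _ in range(5)]
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {m:.3f} pounds")
```

Each run of the experiment produces a slightly different point estimate, even though the underlying population never changes.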

Margin of Error and Confidence Intervals
The variation in sample means discussed previously is what the margin of error accounts for. In a nutshell, the margin of error describes how far our point estimate is likely to be from the true population value at a given level of confidence. This ties into confidence intervals and how we describe them.
When we talk about a confidence interval, we are talking about a range of plausible values for the true population parameter, built by extending a margin of error above and below our point estimate. The interval comes with a confidence level, usually stated as a percentage (say, 95%). The confidence level describes the method rather than any single interval: if we repeated the sampling many times and built an interval each time, about 95% of those intervals would contain the true parameter.
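As a rough illustration of how such an interval can be computed for a mean, the sketch below uses the common large-sample formula: point estimate plus or minus z times the standard error, where the standard error is the sample standard deviation divided by the square root of the sample size. The function name mean_confidence_interval and the simulated data are assumptions made for this example, and the normal approximation is only reasonable for fairly large samples.

```python
import math
import random
import statistics


def mean_confidence_interval(data, confidence=0.95):
    """Return (point_estimate, margin_of_error, (lower, upper)) for the mean.

    Uses the normal (z) approximation, which assumes a reasonably large sample.
    """
    n = len(data)
    point_estimate = statistics.mean(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return point_estimate, margin, (point_estimate - margin, point_estimate + margin)


# Simulated birth weights in pounds (made-up population: mean 7.5, SD 1.2).
rng = random.Random(0)
weights = [rng.gauss(7.5, 1.2) for _ in range(500)]

estimate, margin, (lower, upper) = mean_confidence_interval(weights)
print(f"point estimate: {estimate:.2f} pounds")
print(f"margin of error: {margin:.2f} pounds")
print(f"95% confidence interval: {lower:.2f} to {upper:.2f} pounds")
```

Raising the confidence level widens the interval, while a larger sample narrows it.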

Tying It All Together
Using our birth weight example, we could analyze the data we gathered and report, with 95% confidence, that the population mean birth weight for all newborns in the United States in 2020 was between 7.5 and 7.9 pounds. The margin of error for the confidence interval is half the total width of the interval, in this case, 0.2 pounds. What this means is that if someone re-sampled the population in the same way, again and again, and built an interval from each sample, about 95% of those intervals would contain the true population mean. Our interval of 7.5 to 7.9 pounds is one such interval: it either contains the true mean or it does not, and the 95% describes how reliable the procedure is over many repetitions.
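The repeated-sampling reading of a 95% confidence level can be checked with a small simulation. The sketch below assumes a made-up population (normal, mean 7.5 pounds, standard deviation 1.2 pounds), draws many samples of 500, builds a 95% interval from each using the normal approximation, and counts how often the interval contains the true mean. All names and numbers here are illustrative assumptions.

```python
import math
import random
import statistics

TRUE_MEAN = 7.5    # hypothetical population mean, in pounds
TRUE_SD = 1.2      # hypothetical population standard deviation, in pounds
SAMPLE_SIZE = 500
TRIALS = 2000
Z_95 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96

rng = random.Random(2020)  # fixed seed so the run is repeatable
hits = 0
for _ in range(TRIALS):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    mean = statistics.mean(sample)
    margin = Z_95 * statistics.stdev(sample) / math.sqrt(SAMPLE_SIZE)
    # Does this sample's interval contain the true population mean?
    if mean - margin <= TRUE_MEAN <= mean + margin:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of the {TRIALS} intervals contained the true mean")
```

The printed share should land close to 95%, which is what the confidence level promises about the procedure, not about any single interval.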