
Estimation

Zhaoxia Yu

Parameter estimation

  • For a population, two important parameters of a random variable are its population mean and population variance, denoted \(\mu\) and \(\sigma^{2}\), respectively.

  • These quantities are unknown in general.

  • We refer to these unknown quantities as parameters.

  • Estimation refers to the process of guessing the unknown value of a parameter (e.g., population mean) using the observed data.

Point estimation vs. interval estimation

  • Sometimes we provide only a single value as our estimate: the sample mean for the population mean and the sample variance for the population variance.

  • This is called point estimation.

  • We use \(\hat{\mu}\) and \(\hat{\sigma}^{2}\) to denote the point estimates for \(\mu\) and \(\sigma^{2}\).

  • Point estimates do not reflect our uncertainty.

Point estimation vs. interval estimation

  • To address this issue, we can present our estimates in terms of a range of possible values (as opposed to a single value).

  • This is called interval estimation.

Estimating population mean

  • Given \(n\) observed values, \(X_{1}, X_{2}, \ldots, X_{n}\), from the population, we can estimate the population mean \(\mu\) with the sample mean: \[ \begin{equation*} \bar{X} = \frac{\sum_{i=1}^{n}X_{i}}{n}. \end{equation*} \]

  • In this case, we say that \(\bar{X}\) is an estimator for \(\mu\).

  • The estimator itself is considered a random variable since its value can change from sample to sample.

Estimating population mean

  • We usually have only one sample of size \(n\) from the population, \(x_{1}, x_{2}, \ldots, x_{n}\).

  • Therefore, we only have one value for \(\bar{X}\), which we denote \(\bar{x}\): \[ \begin{equation*} \bar{x} = \frac{\sum_{i=1}^{n}x_{i}}{n} \end{equation*} \]
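Because the estimator varies from sample to sample, drawing two samples from the same population typically gives two different sample means. A minimal R sketch, using a hypothetical \(N(100, 15^2)\) population (these numbers are made up for illustration):

Code
set.seed(1)
# draw two independent samples of size n = 25 from a hypothetical N(100, 15^2) population
sample1 <- rnorm(25, mean = 100, sd = 15)
sample2 <- rnorm(25, mean = 100, sd = 15)

# the two realized sample means differ even though mu is the same
mean(sample1)
mean(sample2)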

Law of Large Numbers (LLN)

  • The Law of Large Numbers (LLN) indicates that (under some general conditions such as independence of observations) the sample mean converges to the population mean, \(\bar{X}_{n} \to\mu\), as the sample size \(n\) increases, \(n \to\infty\).

  • Informally, this means that the difference between the sample mean and the population mean tends to become smaller and smaller as we increase the sample size.

Law of Large Numbers (LLN)

  • The Law of Large Numbers provides a theoretical justification for the use of sample mean as an estimator for the population mean.

  • The Law of Large Numbers holds regardless of the underlying distribution of the random variable (as long as the population mean exists).

Law of Large Numbers (LLN)

Suppose the true population mean for normal body temperature is 98.4F.
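This idea can be checked with a short simulation: track the running sample mean as observations accumulate. A sketch assuming body temperatures follow \(N(98.4, 0.7^2)\); the 0.7F standard deviation is a made-up value for illustration:

Code
set.seed(123)
n_max <- 5000
# simulate body temperatures from a hypothetical N(98.4, 0.7^2) population
temps <- rnorm(n_max, mean = 98.4, sd = 0.7)

# running sample mean after each additional observation
running_mean <- cumsum(temps) / seq_len(n_max)

# by the LLN, the running mean settles near the true mean 98.4 as n grows
plot(running_mean, type = "l", xlab = "sample size n", ylab = "sample mean")
abline(h = 98.4, col = "red", lty = 2)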

Law of Large Numbers (LLN)

  • The Law of Large Numbers is true regardless of the underlying distribution of the random variable.

  • Therefore, it justifies using the sample mean \(\bar{X}\) to estimate the population mean not only for continuous random variables, but also for discrete count variables and for binary variables, whose only possible values are 0 and 1.

Law of Large Numbers (LLN)

  • For count variables, the mean is usually referred to as the rate (e.g., rate of traffic accidents).

  • For binary random variables, the mean is usually referred to as the proportion of the outcome of interest (denoted as 1).

Estimating population variance

  • Given \(n\) randomly sampled values \(X_{1}, X_{2}, \ldots, X_{n}\) from the population and their corresponding sample mean \(\bar{X}\), we estimate the population variance as follows: \[ \begin{equation*} S^{2} = {\frac{\sum_{i=1}^{n}(X_{i} - \bar{X})^{2}}{n-1}}. \end{equation*} \]

  • The sample standard deviation \(S\) (i.e., square root of \(S^2\)) is our estimator of the population standard deviation \({\sigma}\).

Estimating population variance

  • We regard the estimator \(S^{2}\) as a random variable.

  • In practice, we usually have one set of observed values, \(x_{1}, x_{2}, \ldots, x_{n}\), and therefore, only one value for \(S^{2}\):

\[ \begin{equation*} s^{2} = \frac{\sum_{i=1}^{n}(x_{i} - \bar{x})^{2}}{n-1}. \end{equation*} \]
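In R, var() and sd() use the \(n-1\) denominator shown above, so they compute \(s^{2}\) and \(s\) directly. A small sketch with made-up observed values:

Code
# hypothetical observed values
x <- c(4.2, 5.1, 3.8, 6.0, 5.5)
n <- length(x)

# sample variance from the definition (n - 1 in the denominator)
sum((x - mean(x))^2) / (n - 1)

# var() uses the same n - 1 denominator, so these agree
var(x)
sd(x)^2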

Sampling distribution

  • The value of each estimator discussed so far (and of any estimator in general) depends on the specific sample selected from the population.

  • If we repeat our sampling, we are likely to obtain a different value for an estimator.

  • Therefore, we regard the estimators themselves as random variables.

Sampling distribution

  • As a result, similar to any other random variable, we can talk about their probability distributions.

  • Probability distributions for estimators are called sampling distributions.

  • We focus on the sampling distribution of the sample mean \(\bar{X}\).

Sampling distribution

  • We start by assuming that the random variable of interest, \(X\), has a normal \(N(\mu, \sigma^{2})\) distribution.

  • Further, we assume that the population variance \(\sigma^{2}\) is known, so the only parameter we want to estimate is \(\mu\).

  • In this case: \[\begin{equation*} \bar{X} \sim N\bigl(\mu, \sigma^{2}/n\bigr). \end{equation*}\] where \(n\) is the sample size.

Sampling distribution

Left: The (unknown) theoretical distribution of blood pressure: \(X\sim N(125, 15^2)\). Right: The density curve for the sampling distribution \(\bar{X} \sim N(125, 15^{2}/100)\) along with the histogram of 1000 sample means.
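The right panel of this figure can be reproduced with a short simulation; a sketch assuming the stated \(N(125, 15^2)\) population and samples of size \(n=100\):

Code
set.seed(42)
n <- 100
# 1000 sample means, each from a fresh sample of size n
sample_means <- replicate(1000, mean(rnorm(n, mean = 125, sd = 15)))

# histogram of the sample means on the density scale
hist(sample_means, freq = FALSE, main = "Sampling distribution of the sample mean")

# overlay the theoretical N(125, 15^2/100) density curve
curve(dnorm(x, mean = 125, sd = 15 / sqrt(n)), add = TRUE, col = "red")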

Confidence intervals for the population mean

  • It is common to express our point estimate along with its standard deviation to show how much the estimate could vary if different members of the population were selected as our sample.

  • Alternatively, we can use the point estimate and its standard deviation to express our estimate as a range (interval) of possible values for the unknown parameter.

Confidence intervals for the population mean

  • We know that \(\bar{X} \sim N(\mu, \sigma^2/n)\).

  • Suppose that \(\sigma^2 = 15^2\) and the sample size is \(n=100\).

  • Following the 68-95-99.7 rule, with probability 0.95, the value of \(\bar{X}\) is within 2 standard deviations of its mean \(\mu\), where the standard deviation of \(\bar{X}\) is \(\sigma/\sqrt{n} = 15/\sqrt{100} = 1.5\):

\[ \mu- 2\times 1.5 \le\bar{X} \le\mu + 2 \times 1.5. \]

Confidence intervals for the population mean

  • With probability 0.95, \(\mu - 3 \le \bar X \le \mu + 3\)

  • We are, however, interested in estimating the population mean \(\mu\).

  • Rearrange the above inequality \[\bar{X} - 3 \le\mu\le\bar{X} + 3.\]

  • This gives a \(95\%\) confidence interval for \(\mu\): \[[\bar{X} - 3, \bar{X} + 3].\]

Confidence intervals for the population mean

  • For the blood pressure example, suppose that we have a sample of \(n=100\) people and that the sample mean is \(\bar{x}=123\). Therefore, we have one interval as follows: \[[123 - 3, 123 + 3] = [120, 126].\]

  • We refer to this interval as our 95% confidence interval for the population mean \(\mu\).

Interpretation of confidence interval

  • The 95% confidence level refers to the procedure, not to any single interval: if we repeatedly drew samples of size \(n\) from the population and constructed a 95% confidence interval from each sample, about 95% of those intervals would contain the true population mean \(\mu\).

  • Any one computed interval, such as \([120, 126]\), either contains \(\mu\) or it does not; the confidence is in the long-run behavior of the method.
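This repeated-sampling interpretation can be verified by simulation; a sketch using the blood pressure setting, \(X \sim N(125, 15^2)\), with \(n = 100\) and \(\sigma\) treated as known:

Code
set.seed(7)
mu <- 125; sigma <- 15; n <- 100

covered <- replicate(10000, {
  xbar <- mean(rnorm(n, mu, sigma))
  # 95% confidence interval with known sigma
  lower <- xbar - qnorm(0.975) * sigma / sqrt(n)
  upper <- xbar + qnorm(0.975) * sigma / sqrt(n)
  lower <= mu & mu <= upper
})

# proportion of intervals that contain mu; should be close to 0.95
mean(covered)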

\(z\)-critical value

  • In general, when the population variance \(\sigma^{2}\) is known, the 95% confidence interval for \(\mu\) is obtained as follows: \[\bigl[\bar{x} - 2 \times\sigma/ \sqrt{n},\ \bar{x} + 2 \times\sigma/\sqrt{n}\,\bigr]\]

  • We need a wider interval if we want to be more confident, e.g., 99.7%: \[\bigl[\bar{x} - 3 \times\sigma/ \sqrt{n},\ \bar{x} + 3 \times\sigma/\sqrt{n}\,\bigr]\]

\(z\)-critical value

  • In general, the confidence interval for the population mean at \(c\) (e.g., 0.8) confidence level is

\[\bigl[ \bar {x} - z_{crit}\times \sigma / \sqrt{n}, \bar {x} + z_{crit} \times \sigma / \sqrt{n}\, \bigr]\]

\(z\)-critical value

  • We calculate \(z_{crit}\) by finding the \((1+c)/2\) quantile of the standard normal distribution.
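In R, this quantile comes from qnorm(); for example:

Code
conf_levels <- c(0.80, 0.95, 0.997)

# z-critical value = the (1 + c)/2 quantile of the standard normal
qnorm((1 + conf_levels) / 2)   # approximately 1.28, 1.96, and 2.97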

Central limit theorem

  • So far, we have assumed that the random variable has normal distribution, so the sampling distribution of \(\bar{X}\) is normal too.

  • If the random variable is not normally distributed, the central limit theorem (CLT) still lets us treat the sampling distribution of \(\bar{X}\) as approximately normal, provided the observations are independent and the sample size is large enough: if the random variable \(X\) has population mean \(\mu\) and population variance \(\sigma^{2}\), then the sampling distribution of \(\bar{X}\) is approximately normal with mean \(\mu\) and variance \(\sigma^{2}/n\).
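A short simulation illustrates the CLT for a clearly non-normal variable; a sketch using exponential observations with mean 1 and variance 1:

Code
set.seed(11)
n <- 50

# 2000 sample means, each from n exponential(rate = 1) observations
sample_means <- replicate(2000, mean(rexp(n, rate = 1)))

# despite the skewed population, the sample means look approximately N(1, 1/n)
hist(sample_means, freq = FALSE, main = "CLT: means of exponential samples")
curve(dnorm(x, mean = 1, sd = 1 / sqrt(n)), add = TRUE, col = "red")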

Population Proportion

  • Note that the CLT holds regardless of the underlying distribution of \(X\), so we can use it for random variables with Bernoulli and binomial distributions too.

  • For binary random variables, we are interested in estimating the population proportion, \(\mu\).

  • For the corresponding Bernoulli distribution, the mean is \(\mu\) and the variance is \(\sigma^2 = \mu(1-\mu)\).

  • Therefore, once we estimate \(\mu\), we have estimates of both the population mean and the population variance.

Population Proportion

  • Given the sample proportion \(p\), the confidence interval for the population proportion is obtained as follows:

\[[p - z_{crit} \times \sqrt{p(1-p)/n}, \quad p + z_{crit} \times \sqrt{p(1-p)/n}]\]
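As a sketch, suppose 60 of \(n = 200\) sampled individuals have the outcome of interest (these counts are made up for illustration); the 95% confidence interval in R:

Code
x <- 60; n <- 200        # hypothetical counts
p <- x / n               # sample proportion

z_crit <- qnorm(0.975)   # 95% confidence level
se <- sqrt(p * (1 - p) / n)

# confidence interval: p +/- z_crit * sqrt(p(1 - p)/n)
c(p - z_crit * se, p + z_crit * se)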

Confidence Interval When the Population Variance is Unknown

  • So far, we have assumed that the population variance, \(\sigma^{2}\), of the random variable is known or, for binary variables, can be calculated directly from the mean.

  • However, we almost always need to estimate \(\sigma^{2}\) along with the population mean \(\mu\).

  • For this, we use the sample variance \(s^{2}\).

  • As a result, the standard deviation for \(\bar{X}\) is estimated to be \(s/\sqrt{n}\).

Confidence Interval When the Population Variance is Unknown

  • To find confidence intervals for the population mean when the population variance is unknown, we follow similar steps as described above, but
    • replace \(\sigma\) with \(s\),
    • replace \(z_{\mathrm{crit}}\) (based on the standard normal distribution) with \(t_{\mathrm{crit}}\) (\(t\)-distribution with \(n-1\) degrees of freedom).
  • The confidence interval for the population mean at \(c\) confidence level is \[\bigl[\bar{x} - t_{\mathrm{crit}}\times s / \sqrt{n}, \ \bar{x} + t_{\mathrm{crit}} \times s / \sqrt{n}\,\bigr].\]

Margin of error

  • We refer to \(s/\sqrt{n}\) as the standard error of the sample mean \(\bar{X}\).

  • We can write the confidence interval as \[\bar{x} \pm t_{\mathrm{crit}}\times SE\]

  • The term \(t_{\mathrm{crit}}\times SE\) is called the margin of error for the given confidence level.

  • It is common to present interval estimates for a given confidence level as \[\textrm{Point estimate} \pm\textrm{Margin of error.}\]
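With made-up summary values, the point estimate \(\pm\) margin of error form looks like this in R:

Code
xbar <- 123; s <- 15; n <- 100   # hypothetical sample summaries

t_crit <- qt(0.975, df = n - 1)  # 95% confidence level
SE <- s / sqrt(n)                # standard error of the sample mean
ME <- t_crit * SE                # margin of error

c(xbar - ME, xbar + ME)          # point estimate -/+ margin of error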

Example

Estimate the hippocampal volume for women between 40 and 50 years old.

Code
# read the data
alzheimer_data <- read.csv('data/alzheimer_data.csv')
dim(alzheimer_data)
[1] 2700   57
Code
# subset: women between 40 and 50 years old
alzheimer_subset <- alzheimer_data[alzheimer_data$age <= 50 & alzheimer_data$age > 40 & alzheimer_data$female == 1, ]

# alternatively, with dplyr
library(dplyr)
alzheimer_subset <- alzheimer_data %>%
  filter(age <= 50, age > 40, female == 1)

dim(alzheimer_subset)
[1] 70 57
Code
# total hippocampal volume (left + right)
hippo <- alzheimer_subset$lhippo + alzheimer_subset$rhippo

# point estimate of the population mean
mean(hippo)
[1] 6.470666
Code
# standard error of the sample mean
sd(hippo)/sqrt(length(hippo))
[1] 0.1023539
Code
# approximate 95% c.i. using 2 as the z-critical value
mean(hippo) - 2*sd(hippo)/sqrt(length(hippo))
[1] 6.265958
mean(hippo) + 2*sd(hippo)/sqrt(length(hippo))
[1] 6.675374
Code
# 95% c.i. using the exact z-critical value, qnorm(0.975)
mean(hippo) - qnorm(0.975)*sd(hippo)/sqrt(length(hippo))
[1] 6.270056
mean(hippo) + qnorm(0.975)*sd(hippo)/sqrt(length(hippo))
[1] 6.671276
Code
# 95% c.i. using the t-critical value with n - 1 degrees of freedom
mean(hippo) - qt(0.975, df=length(hippo)-1)*sd(hippo)/sqrt(length(hippo))
[1] 6.266475
mean(hippo) + qt(0.975, df=length(hippo)-1)*sd(hippo)/sqrt(length(hippo))
[1] 6.674856