
Hypothesis Testing

Zhaoxia Yu

Load packages

Code
library(tidyverse)
library(ggplot2)

Hypothesis

  • Scientific investigations often start by expressing a hypothesis.

  • A hypothesis is a statement about one or more random variables or their associated parameters.

  • For example, Mackowiak et al. (1992) hypothesized that the average normal (i.e., for healthy people) body temperature is less than the widely accepted \(98.6^\circ F\).

  • If we denote the population mean of normal body temperature as \(\mu\), then we can express this hypothesis as \(\mu < 98.6\).

Types of Hypotheses

Null Hypothesis (\(H_0\))

  • Definition: A statement, often of no effect, no difference, or nothing of interest.

  • Purpose: Serves as the default or starting assumption.

Alternative Hypothesis (\(H_A\) or \(H_1\))

  • Definition: A statement, often of an effect or a difference.

  • Purpose: Represents what we are trying to find evidence for.

Hypothesis Testing

  • We use statistics, known as test statistics, to evaluate our hypotheses.

  • To determine whether to reject the null hypothesis, we measure the empirical support that the observed data provide against the null hypothesis using such statistics.

  • A statistic is considered a test statistic if its sampling distribution under the null hypothesis is completely known (either exactly or approximately).

  • The distribution of a test statistic under the null hypothesis is referred to as its null distribution.

Mean

Hypothesis testing for the population mean

  • Consider hippocampal volume in impaired males, where we want to examine the null hypothesis \(H_{0}: \mu = 6\) against the alternative hypothesis \(H_A: \mu > 6\).

  • The alternative hypothesis \(H_{A}: \mu >6\) is a one-sided alternative.
  • One-sided vs two-sided alternatives:
    • One-sided: \(H_A: \mu > \mu_0\) or \(H_A: \mu < \mu_0\)
    • Two-sided: \(H_A: \mu \neq \mu_0\)

Hypothesis testing for the population mean

  • We have learned that the sample mean is a reasonable estimator of the population mean.
  • The sample mean of hippocampal volume of impaired males is \(\bar x=6.1\).
  • What does the sample mean tell you? Should you reject \(H_0\) based on the sample mean alone?
  • Of course not: the sample mean by itself does not tell us how unusual \(6.1\) is under \(H_0\).

Why Isn’t \(\bar{x} = 6.1\) Enough?

  • Suppose we want to test: \(H_0: \mu = 6.00\) vs. \(H_A: \mu > 6.00\)
  • The observed sample mean \(\bar{x} = 6.1\) is higher. But how much higher is “a lot”?
  • A difference of 0.1 could be:
    • A large difference if there’s little variation
    • A small difference if there’s a lot of variation
  • We need to quantify uncertainty around \(\bar{x}\) by considering how \(\bar{X}\) behaves under the null hypothesis \(H_0\).

p-value

  • We quantify “how extreme” as the probability, under the null distribution, of values as or more extreme than the observed value, in the direction supporting the alternative hypothesis.

  • This probability is called the p-value, denoted \(p_{\mathrm{obs}}\).

  • For the above example, \[p_{\mathrm{obs}} = P(\bar{X} \ge \bar{x} \mid H_{0}),\]

where \(\bar{x}=6.1\) in this example.

Interpretation of \(p\)-value

  • The \(p\)-value is the probability of values of the test statistic as or more extreme than what has been observed, given that the null hypothesis is true.

  • When the \(p\)-value is small, say 0.01, values as extreme as (or more extreme than) what we have observed would be rare if the null hypothesis were true.

  • As the \(p\)-value increases, there is a better chance of finding values of the test statistic more extreme than what has been observed, so we become more reluctant to reject the null hypothesis.

  • A common mistake is to regard the \(p\)-value as the probability of null given the observed test statistic: \(P(H_{0} | \bar{X} = \bar{x})\).

The Null Distribution of \(\bar{X}\) (Known σ)

  • Suppose \(\sigma = 1\) and the sample size is \(n = 25\)
  • Then, under the null hypothesis \(H_0: \mu = 6.00\):

\[ \bar{X} \sim N\left(\mu_0 = 6.00,\ \mathrm{SE} = \frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{25}} = 0.2 \right) \]

  • \(P(\bar X \ge 6.1 | H_0)\) is the area under the curve to the right of \(\bar{x} = 6.1\) in the null distribution.
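This tail area can be computed directly in R; a quick check, using the \(\sigma = 1\) and \(n = 25\) assumed above:

Code
# P(Xbar >= 6.1 | H0): upper-tail area of N(mean = 6, SE = 0.2)
pnorm(6.1, mean = 6, sd = 0.2, lower.tail = FALSE)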

\(P(\bar X \ge 6.1 | H_0)\)

Code
library(ggplot2)

# Define parameters
mu0 <- 6.00
se <- 0.2
x_obs <- 6.10

# Create a sequence of x values centered around mu0
x_vals <- seq(mu0 - 4 * se, mu0 + 4 * se, length.out = 1000)

# Compute the density
density_vals <- dnorm(x_vals, mean = mu0, sd = se)

# Create a data frame
df <- data.frame(x = x_vals, y = density_vals)

# Define the tail area (right of observed x)
df$tail <- ifelse(df$x >= x_obs, "Right Tail", "Main")

# Plot
ggplot(df, aes(x, y)) +
  geom_line(color = "darkblue", linewidth = 1) +
  geom_area(data = subset(df, tail == "Right Tail"), aes(x, y),
            fill = "red", alpha = 0.4) +
  geom_vline(xintercept = x_obs, color = "red", linetype = "dashed") +
  annotate("text", x = x_obs + 0.1, y = 0.05,
           label = "sample mean = 6.1", color = "red", hjust = 0) +
  annotate("text", x = x_obs + 0.1, y = max(df$y) * 0.9,
           label = "p = 0.31", color = "red", hjust = 0) +
  labs(title = "Sampling Distribution of the Sample Mean (under H_0)",
       subtitle = expression(paste("N(", mu[0], " = 6.00, SE = 0.2)")),
       x = expression(bar(X)), y = "Density") +
  theme_minimal(base_size = 14)

Z-score

  • It is easier to use the z-score \[Z=\frac{\bar X - \mu_0}{\sigma/\sqrt{n}}, \qquad \text{with observed value} \quad z=\frac{\bar x - \mu_0}{\sigma/\sqrt{n}}.\]

  • This is because

    • \(P(\bar X \ge 6.1 | H_0) =P(Z\ge 0.5)\approx 0.31\) where \(Z\sim N(0,1)\).
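A one-line check in R, using the numbers from this example:

Code
z <- (6.1 - 6) / (1 / sqrt(25))   # z-score = 0.5
pnorm(z, lower.tail = FALSE)      # P(Z >= 0.5), approximately 0.31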

\(P(Z \ge z)\)

Code
library(ggplot2)

# Define parameters
mu0 <- 0
se <- 1
x_obs <- 0.5

# Create a sequence of x values centered around mu0
x_vals <- seq(mu0 - 4 * se, mu0 + 4 * se, length.out = 1000)

# Compute the density
density_vals <- dnorm(x_vals, mean = mu0, sd = se)

# Create a data frame
df <- data.frame(x = x_vals, y = density_vals)

# Define the tail area (right of observed x)
df$tail <- ifelse(df$x >= x_obs, "Right Tail", "Main")

# Plot
ggplot(df, aes(x, y)) +
  geom_line(color = "darkblue", linewidth = 1) +
  geom_area(data = subset(df, tail == "Right Tail"), aes(x, y),
            fill = "red", alpha = 0.4) +
  geom_vline(xintercept = x_obs, color = "red", linetype = "dashed") +
  annotate("text", x = x_obs + 0.1, y = 0.02,
           label = "z = 0.5", color = "red", hjust = 0) +
  annotate("text", x = x_obs + 0.1, y = max(df$y) * 0.9,
           label = "p = 0.31", color = "red", hjust = 0) +
  labs(title = "Null Distribution of the Z-score (under H_0)",
       subtitle = "N(0, 1)",
       x = "Z", y = "Density") +
  theme_minimal(base_size = 14)

What If σ Is Unknown?

  • In real life, we rarely know the population standard deviation \(\sigma\)
  • Instead, we estimate it with the sample standard deviation \(s\)
  • This changes the distribution of our test statistic:
    • We no longer use the standard normal (\(z\))
    • We use the \(t\) distribution

The \(t\) Distribution

  • If \(\sigma\) is unknown, and we use \(s\) to estimate \(\sigma\), then:

\[ T = \frac{\bar{X} - \mu}{S / \sqrt{n}} \sim t_{n - 1} \]

  • The \(t\) distribution accounts for extra uncertainty from estimating \(\sigma\)
  • It’s wider than the normal, especially with small \(n\)
  • As \(n \to \infty\), \(t\) approaches the standard normal.
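Both properties can be seen by comparing \(t\) critical values with the standard normal one; a minimal sketch (the degrees of freedom and the 0.975 quantile are our choices):

Code
# upper 2.5% critical values: larger for small df, approaching qnorm(0.975) = 1.96
sapply(c(5, 10, 30, 100, 1000), function(df) qt(0.975, df))
qnorm(0.975)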

Why This Matters

  • Whether you use \(z\) or \(t\) affects:
    • Which critical value you compare to
    • How you compute the \(p\)-value
  • This distinction becomes crucial when sample size is small

One-sample t-test

  • So far, we have assumed that the population variance \(\sigma^{2}\) is known.

  • In reality, \(\sigma^{2}\) is almost always unknown, and we need to estimate it from the data.

  • As before, we estimate \(\sigma^{2}\) using the sample variance \(S^{2}\).

  • Similar to our approach for finding confidence intervals, we account for this additional source of uncertainty by using the \(t\)-distribution with \(n-1\) degrees of freedom instead of the standard normal distribution.

  • The hypothesis testing procedure is then called the t-test.

t-test

Hippocampal volume in impaired males

Code
#read data
alzheimer_subset <- read_csv("../data/alzheimer_data.csv") %>%
  select(diagnosis, lhippo, rhippo, age, female) %>%
  mutate(hippo = lhippo + rhippo) %>%
  filter(diagnosis == 1, female == 0)
glimpse(alzheimer_subset)
Rows: 327
Columns: 6
$ diagnosis <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ lhippo    <dbl> 2.9342, 3.5100, 2.5872, 3.2445, 1.8555, 3.5754, 3.2100, 2.56…
$ rhippo    <dbl> 3.2890, 3.7000, 2.3688, 3.1980, 2.6565, 3.7621, 3.6000, 2.46…
$ age       <dbl> 75, 78, 85, 79, 77, 79, 83, 83, 77, 66, 78, 73, 72, 82, 77, …
$ female    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ hippo     <dbl> 6.2232, 7.2100, 4.9560, 6.4425, 4.5120, 7.3375, 6.8100, 5.03…
Code
#calculate mean
mean(alzheimer_subset$hippo)
[1] 6.105257
Code
#compute the t-score by hand
z <- (mean(alzheimer_subset$hippo) - 6) / sd(alzheimer_subset$hippo) * sqrt(nrow(alzheimer_subset))

#one-sided p-value using the normal approximation
#(compare with the exact t-test below)
pnorm(z, lower.tail = FALSE)
[1] 0.02269207
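Because \(\sigma\) is estimated by \(s\), the exact p-value uses the \(t\)-distribution with \(n-1\) degrees of freedom; this one-liner reproduces the p-value reported by t.test() below:

Code
#exact one-sided p-value from the t-distribution (df = n - 1)
pt(z, df = nrow(alzheimer_subset) - 1, lower.tail = FALSE)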
Code
#use t.test (one-sample t-test)
t.test(alzheimer_subset$hippo, mu=6, alternative="greater")

    One Sample t-test

data:  alzheimer_subset$hippo
t = 2.0011, df = 326, p-value = 0.02311
alternative hypothesis: true mean is greater than 6
95 percent confidence interval:
 6.018491      Inf
sample estimates:
mean of x 
 6.105257 

One-sample t-test

  • Using the observed values of \(\bar{X}\) and \(S\), the observed value of the test statistic is obtained as follows: \(t = \frac{\bar{x} - \mu_{0}}{s/\sqrt{n}}\).

  • We refer to \(t\) as the \(t\)-score. Then, \[\begin{array}{l@{\quad}l} \mbox{if}\ H_{A}: \mu < \mu _0, & p_{\mathrm{obs}} = P(T \leq t), \\ \mbox{if}\ H_{A}: \mu > \mu _0, & p_{\mathrm{obs}} = P(T \geq t ), \\ \mbox{if}\ H_{A}: \mu \ne \mu _0, & p_{\mathrm{obs}} = 2 \times P\bigl(T \geq | t | \bigr), \end{array}\]

  • Here, \(T\) has a \(t\)-distribution with \(n-1\) degrees of freedom, and \(t\) is our observed \(t\)-score.
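In R, the three cases map onto pt(); here t_obs and df are taken from the t.test() output above:

Code
t_obs <- 2.0011; df <- 326                   # from the one-sample t-test above
pt(t_obs, df)                                # if H_A: mu < mu_0
pt(t_obs, df, lower.tail = FALSE)            # if H_A: mu > mu_0 (about 0.023, as above)
2 * pt(abs(t_obs), df, lower.tail = FALSE)   # if H_A: mu != mu_0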

Proportion

Hypothesis testing for the population proportion

  • For a binary random variable \(X\) with possible values 0 and 1, we are typically interested in evaluating hypotheses about the population proportion of the outcome of interest, \(X=1\).

  • The population proportion is the same as the population mean for such binary variables.

  • If the sample size is large enough, the sampling distribution of the sample proportion is approximately normal by the CLT.

  • So we follow the same procedure as described above.

Hypothesis testing for population proportion

  • Note that for binary random variables, population variance is \[\sigma^{2}=\mu(1-\mu)\]

  • Therefore, by setting \(\mu=\mu_{0}\) according to the null hypothesis, we also specify the population variance as \[\sigma^{2} = \mu_{0}(1-\mu_{0})\]

  • If we assume that the null hypothesis is true, we have \[\bar{p} \mid H_{0} \mathrel{\dot\sim} N\bigl(\mu_{0},\ \mu_{0}(1-\mu_{0})/n\bigr).\]
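A minimal sketch of the resulting z-test, using hypothetical numbers (\(\bar{p} = 0.56\), \(n = 100\), \(\mu_0 = 0.5\); none of these come from the data above):

Code
# one-sample test for a proportion via the normal approximation
p_bar <- 0.56; n <- 100; mu0 <- 0.5              # hypothetical values
z <- (p_bar - mu0) / sqrt(mu0 * (1 - mu0) / n)   # z-score under H0
pnorm(z, lower.tail = FALSE)                     # one-sided p-value for H_A: mu > 0.5

R's prop.test() implements a closely related test (a chi-squared statistic with an optional continuity correction).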