Zhaoxia Yu
Scientific investigations often start by expressing a hypothesis.
A hypothesis is a statement about one or more random variables or their associated parameters.
For example, Mackowiak et al. (1992) hypothesized that the average normal (i.e., for healthy people) body temperature is less than the widely accepted \(98.6^{\circ}\mathrm{F}\).
If we denote the population mean of normal body temperature as \(\mu\), then we can express this hypothesis as \(\mu < 98.6\).
Null Hypothesis (\(H_0\))
Definition: A statement, often of no effect, no difference, or nothing of interest.
Purpose: Serves as the default or starting assumption.
Alternative Hypothesis (\(H_a\) or \(H_1\))
Definition: A statement, often of an effect or a difference.
Purpose: Represents what we are trying to find evidence for.
We use statistics, known as test statistics, to evaluate our hypotheses.
To determine whether to reject the null hypothesis, we use a statistic to measure the empirical evidence that the observed data provide against it.
A statistic is considered a test statistic if its sampling distribution under the null hypothesis is completely known (either exactly or approximately).
The distribution of a test statistic under the null hypothesis is referred to as its null distribution.
Mean
Example: test \(H_{0}: \mu = 6\) against the alternative hypothesis \(H_{A}: \mu > 6\).
We quantify "how extreme" using the probability of values as or more extreme than the observed value, based on the null distribution, in the direction supporting the alternative hypothesis.
This probability is also called the p-value and denoted \(p_{\mathrm{obs}}\).
For the above example, \[\begin{equation*} p_{\mathrm{obs}} = P(\bar{X} \ge \bar{x} \mid H_{0}), \end{equation*}\]
where \(\bar{x}=6.1\) in this example.
The \(p\)-value is the probability of extreme values (as or more extreme than what has been observed) of the test statistic, given that the null hypothesis is true.
When the \(p\)-value is small, say 0.01 for example, it would be rare under the null hypothesis to find values as extreme as (or more extreme than) what we have observed.
As the \(p\)-value increases, there is a better chance of finding values of the test statistic more extreme than what has been observed, and we become more reluctant to reject the null hypothesis.
A common mistake is to regard the \(p\)-value as the probability of null given the observed test statistic: \(P(H_{0} | \bar{X} = \bar{x})\).
Returning to our example, suppose \(\sigma = 1\) and \(n = 25\); then under \(H_{0}\), \[ \bar{X} \sim N\left(\mu_{0} = 6.00,\ \mathrm{SE} = \frac{\sigma}{\sqrt{n}} = \frac{1}{5} = 0.2 \right). \]
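Before plotting, the one-sided \(p\)-value can be computed directly as the upper-tail area of this null distribution (a minimal check using the numbers above):
# P(Xbar >= 6.1 | H_0), with the null distribution N(6, 0.2^2)
1 - pnorm(6.10, mean = 6.00, sd = 0.2)  # 0.3085, i.e., about 0.31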
library(ggplot2)
# Define parameters
mu0 <- 6.00
se <- 0.2
x_obs <- 6.10
# Create a sequence of x values centered around mu0
x_vals <- seq(mu0 - 4 * se, mu0 + 4 * se, length.out = 1000)
# Compute the density
density_vals <- dnorm(x_vals, mean = mu0, sd = se)
# Create a data frame
df <- data.frame(x = x_vals, y = density_vals)
# Define the tail area (right of observed x)
df$tail <- ifelse(df$x >= x_obs, "Right Tail", "Main")
# Plot
ggplot(df, aes(x, y)) +
geom_line(color = "darkblue", linewidth = 1) +
geom_area(data = subset(df, tail == "Right Tail"), aes(x, y),
fill = "red", alpha = 0.4) +
geom_vline(xintercept = x_obs, color = "red", linetype = "dashed") +
annotate("text", x = x_obs + 0.1, y = 0-0.1,
label = "sample mean = 6.1", color = "red", hjust = 0) +
annotate("text", x = x_obs + 0.1, y = max(df$y) * 0.9,
label = "p=0.31", color = "red", hjust = 0) +
labs(title = "Sampling Distribution of Sample Mean (under H_0)",
subtitle = expression(paste("N(", mu[0], " = 6.00, SE = 0.2)")),
x = expression(bar(X)), y = "Density") +
theme_minimal(base_size = 14)
It is easier to use the z-score \[Z = \frac{\bar X - \mu_0}{\sigma/\sqrt{n}}, \qquad z = \frac{\bar x - \mu_0}{\sigma/\sqrt{n}}.\]
This is because, under \(H_{0}\), \(Z\) follows the standard normal distribution \(N(0, 1)\) regardless of the values of \(\mu_{0}\), \(\sigma\), and \(n\), so the same reference distribution can be used for every test of this form.
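Equivalently, in terms of the z-score (a minimal sketch using the numbers from the running example):
z <- (6.1 - 6.0) / 0.2  # observed z-score = 0.5
1 - pnorm(z)            # one-sided p-value, about 0.31; same as before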
library(ggplot2)
# Define parameters
mu0 <- 0
se <- 1
x_obs <- 0.5
# Create a sequence of x values centered around mu0
x_vals <- seq(mu0 - 4 * se, mu0 + 4 * se, length.out = 1000)
# Compute the density
density_vals <- dnorm(x_vals, mean = mu0, sd = se)
# Create a data frame
df <- data.frame(x = x_vals, y = density_vals)
# Define the tail area (right of observed x)
df$tail <- ifelse(df$x >= x_obs, "Right Tail", "Main")
# Plot
ggplot(df, aes(x, y)) +
geom_line(color = "darkblue", linewidth = 1) +
geom_area(data = subset(df, tail == "Right Tail"), aes(x, y),
fill = "red", alpha = 0.4) +
geom_vline(xintercept = x_obs, color = "red", linetype = "dashed") +
annotate("text", x = x_obs + 0.1, y = 0-0.1,
label = "z = 0.5", color = "red", hjust = 0) +
annotate("text", x = x_obs + 0.1, y = max(df$y) * 0.9,
label = "p=0.31", color = "red", hjust = 0) +
labs(title = "Sampling Distribution of Sample Mean (under H_0)",
subtitle = expression(paste("N(", mu[0], " = 0, SE = 1)")),
x = expression(bar(Z)), y = "Density") +
theme_minimal(base_size = 14)
\[ T = \frac{\bar{X} - \mu_{0}}{S / \sqrt{n}} \sim t_{n - 1} \quad \text{under } H_{0} \]
So far, we have assumed that the population variance \(\sigma^{2}\) is known.
In reality, \(\sigma^{2}\) is almost always unknown, and we need to estimate it from the data.
As before, we estimate \(\sigma^{2}\) using the sample variance \(S^{2}\).
Similar to our approach for finding confidence intervals, we account for this additional source of uncertainty by using the \(t\)-distribution with \(n-1\) degrees of freedom instead of the standard normal distribution.
The hypothesis testing procedure is then called the t-test.
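As a concrete example, we test \(H_{0}: \mu = 6\) against \(H_{A}: \mu > 6\) for total hippocampal volume (hippo, the sum of lhippo and rhippo) in an Alzheimer's data set. The summary below could be produced with a call along these lines (a sketch; the data-frame name alzheimer_subset and the use of dplyr::glimpse() are inferred from the output):
library(dplyr)
# Quick overview of the Alzheimer's data (327 subjects, 6 variables)
glimpse(alzheimer_subset)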
Rows: 327
Columns: 6
$ diagnosis <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ lhippo <dbl> 2.9342, 3.5100, 2.5872, 3.2445, 1.8555, 3.5754, 3.2100, 2.56…
$ rhippo <dbl> 3.2890, 3.7000, 2.3688, 3.1980, 2.6565, 3.7621, 3.6000, 2.46…
$ age <dbl> 75, 78, 85, 79, 77, 79, 83, 83, 77, 66, 78, 73, 72, 82, 77, …
$ female <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ hippo <dbl> 6.2232, 7.2100, 4.9560, 6.4425, 4.5120, 7.3375, 6.8100, 5.03…
[1] 6.105257
[1] 0.02269207
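The two values above are the sample mean of hippo and a z-based (normal-approximation) \(p\)-value; a sketch of how they could have been computed:
xbar <- mean(alzheimer_subset$hippo)  # 6.105257
s <- sd(alzheimer_subset$hippo)
n <- length(alzheimer_subset$hippo)   # 327
z <- (xbar - 6) / (s / sqrt(n))       # about 2.00
1 - pnorm(z)                          # 0.02269207
The \(t\)-test output below can be reproduced with t.test():
t.test(alzheimer_subset$hippo, mu = 6, alternative = "greater")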
One Sample t-test
data: alzheimer_subset$hippo
t = 2.0011, df = 326, p-value = 0.02311
alternative hypothesis: true mean is greater than 6
95 percent confidence interval:
6.018491 Inf
sample estimates:
mean of x
6.105257
Using the observed values of \(\bar{X}\) and \(S\), the observed value of the test statistic is obtained as follows: \(t = \frac{\bar{x} - \mu_{0}}{s/\sqrt{n}}\).
We refer to \(t\) as the \(t\)-score. Then, \[\begin{array}{l@{\quad}l} \mbox{if}\ H_{A}: \mu < \mu _0, & p_{\mathrm{obs}} = P(T \leq t), \\ \mbox{if}\ H_{A}: \mu > \mu _0, & p_{\mathrm{obs}} = P(T \geq t ), \\ \mbox{if}\ H_{A}: \mu \ne \mu _0, & p_{\mathrm{obs}} = 2 \times P\bigl(T \geq | t | \bigr), \end{array}\]
Here, \(T\) has a \(t\)-distribution with \(n-1\) degrees of freedom, and \(t\) is our observed \(t\)-score.
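For the Alzheimer's example, the second case (\(H_{A}: \mu > \mu_{0}\)) applies; the reported \(p\)-value can be checked directly using the values from the output above:
pt(2.0011, df = 326, lower.tail = FALSE)  # about 0.0231, matching the reported p-value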
Proportion
For a binary random variable \(X\) with possible values 0 and 1, we are typically interested in evaluating hypotheses regarding the population proportion of the outcome of interest, denoted as \(X=1\).
The population proportion is the same as the population mean for such binary variables.
If the sample size is large enough, the sampling distribution of the sample proportion is approximately normal according to the CLT.
So we follow the same procedure as described above.
Note that for binary random variables, population variance is \[\sigma^{2}=\mu(1-\mu)\]
Therefore, by setting \(\mu=\mu_{0}\) according to the null hypothesis, we also specify the population variance as \[\sigma^{2} = \mu_{0}(1-\mu_{0})\]
If we assume that the null hypothesis is true, we have \[\begin{equation*} \bar{p} \mid H_{0}\ \dot\sim\ N\bigl(\mu_{0},\ \mu_{0}(1-\mu_{0})/n\bigr). \end{equation*}\]
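A minimal sketch of this large-sample z-test for a proportion (the numbers are hypothetical, chosen only for illustration: \(H_{0}: \mu = 0.5\) versus \(H_{A}: \mu > 0.5\), with 60 successes in \(n = 100\) trials):
p_bar <- 60 / 100                 # sample proportion
mu0 <- 0.5                        # hypothesized proportion under H_0
n <- 100
se0 <- sqrt(mu0 * (1 - mu0) / n)  # null SE, using sigma^2 = mu0 * (1 - mu0)
z <- (p_bar - mu0) / se0          # z-score = 2
1 - pnorm(z)                      # one-sided p-value, about 0.023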