COSMOS

Hypothesis Testing II

Zhaoxia Yu

Load packages, read data

Code
library(tidyverse)   # also loads ggplot2 and dplyr

# Read the data, keep the variables we need, and add total hippocampal volume
alzheimer_data <- read.csv('data/alzheimer_data.csv') %>% 
  select(id, diagnosis, age, educ, female, height, weight, lhippo, rhippo) %>% 
  mutate(diagnosis = as.factor(diagnosis), 
         female = as.factor(female),
         hippo = lhippo + rhippo)

# Healthy subjects only (diagnosis == 0)
alzheimer_healthy <- alzheimer_data %>% 
  filter(diagnosis == 0)

A LIST OF QUESTIONS

  • Ex 1. Is the hippocampal volume of healthy males greater than 6 cm\(^3\)? (Done)

  • Ex 2. Is the proportion of healthy adults with a right hippocampal volume > 3 cm\(^3\) equal to 50%? (This can be done with the method for testing a proportion: a z-test.)

  • Ex 3. Do healthy men and women have the same mean left hippocampal volume?

  • Ex 4. Is the left hippocampal volume equal to the right hippocampal volume in humans?

  • Ex 5. Is having a large hippocampus (> 7 cm\(^3\)) associated with gender?

  • Ex 6. Is hippocampal volume correlated with age in healthy adults?

  • Ex 7. Does left hippocampal volume differ across diagnostic groups?
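Ex 2 is not worked through in this section. As a reminder of how the one-sample z statistic for a proportion is built, \(z = (\hat p - p_0)/\sqrt{p_0(1-p_0)/n}\), here is a minimal sketch. The helper `prop_z_test()` is hypothetical (written only for this illustration), and the counts (540 out of 1000) are made up, not computed from `alzheimer_data`.

```r
# Sketch: one-sample z-test for a proportion, H0: p = p0.
# prop_z_test() is a HYPOTHETICAL helper; x and n are made-up counts.
prop_z_test <- function(x, n, p0 = 0.5) {
  p_hat <- x / n                       # sample proportion
  se0   <- sqrt(p0 * (1 - p0) / n)     # standard error under H0
  z     <- (p_hat - p0) / se0          # test statistic
  p_val <- 2 * pnorm(-abs(z))          # two-sided p-value
  list(p_hat = p_hat, z = z, p_value = p_val)
}

res <- prop_z_test(x = 540, n = 1000, p0 = 0.5)
res

# Built-in counterpart: without continuity correction, X-squared equals z^2
pt_res <- prop.test(x = 540, n = 1000, p = 0.5, correct = FALSE)
pt_res
```

Squaring z gives the X-squared that `prop.test()` reports with `correct = FALSE`, so both routes give the same two-sided p-value.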

EXAMPLE 3: Two-Sample t-test

Ex 3. Do healthy men and women have the same mean left hippocampal volume?

  • Categorical (gender) vs. numerical (left hippocampal volume); suitable plots:

    • Side by Side Boxplot

    • Side by Side Violin

    • 2 Histograms

Example 3: Visualization – boxplot

Code
ggplot(data= alzheimer_healthy,
       mapping= aes(x= female, 
                    y=lhippo,
                    fill=female)) +
  geom_boxplot()+
  labs(
    title="Boxplot of lhippo by Gender",
    x="Gender",
    y="LHippo volume (cm^3)") +  
  theme_minimal()

Example 3: Visualization – violin

Code
ggplot(data= alzheimer_healthy,
       mapping= aes(x= female, 
                    y=lhippo,
                    fill=female)) +
  geom_violin()+
  labs(
    title="Violin of lhippo by Gender",
    x="Gender",
    y="LHippo volume (cm^3)") +  
  theme_minimal()

Example 3: Visualization – histogram

Code
ggplot(data= alzheimer_healthy,
       mapping= aes(x=lhippo,
                    fill=female)) +
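  # position = "identity" overlays the two groups instead of stacking them
  geom_histogram(bins = 35, position = "identity", alpha = 0.5) +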
  labs(
    title="Histogram of Lhippo by Gender",
    x="LHippo volume (cm^3)",
    y="Count") 

Example 3: use two-sample t-test

Code
t.test(alzheimer_healthy$lhippo[alzheimer_healthy$female==0], alzheimer_healthy$lhippo[alzheimer_healthy$female==1])

    Welch Two Sample t-test

data:  alzheimer_healthy$lhippo[alzheimer_healthy$female == 0] and alzheimer_healthy$lhippo[alzheimer_healthy$female == 1]
t = 11.756, df = 971.28, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.2088777 0.2925882
sample estimates:
mean of x mean of y 
 3.345749  3.095016 
Code
#alternative way to do the same test
t.test(alzheimer_healthy$lhippo ~ alzheimer_healthy$female)

    Welch Two Sample t-test

data:  alzheimer_healthy$lhippo by alzheimer_healthy$female
t = 11.756, df = 971.28, p-value < 2.2e-16
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 0.2088777 0.2925882
sample estimates:
mean in group 0 mean in group 1 
       3.345749        3.095016 
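R's `t.test()` performs Welch's test by default (`var.equal = FALSE`), which is why the output above says "Welch Two Sample t-test" and reports a non-integer df. A minimal sketch of the Welch statistic and the Welch-Satterthwaite degrees of freedom, computed by hand on simulated data (not the hippocampus measurements):

```r
# Sketch: Welch two-sample t statistic and df by hand.
# Data are SIMULATED for illustration only.
set.seed(1)
x <- rnorm(40, mean = 3.3, sd = 0.40)   # plays the role of group 0
y <- rnorm(55, mean = 3.1, sd = 0.35)   # plays the role of group 1

v_x <- var(x) / length(x)               # squared standard error of mean(x)
v_y <- var(y) / length(y)               # squared standard error of mean(y)

# t = (mean difference) / sqrt(sum of squared standard errors)
t_welch <- (mean(x) - mean(y)) / sqrt(v_x + v_y)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_x + v_y)^2 /
  (v_x^2 / (length(x) - 1) + v_y^2 / (length(y) - 1))

c(t = t_welch, df = df_welch)

# Compare with the built-in (Welch by default)
tt <- t.test(x, y)
c(t = unname(tt$statistic), df = unname(tt$parameter))
```

Setting `var.equal = TRUE` in `t.test()` switches to the classical pooled-variance test with \(n_x + n_y - 2\) degrees of freedom.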

Example 4

  • Research Question: Is the left hippocampal volume equal to the right hippocampal volume in humans?

  • Null Hypothesis: The mean left hippocampal volume equals the mean right hippocampal volume.

  • Alternative Hypothesis: The mean left hippocampal volume differs from the mean right hippocampal volume.

\[H_0: \mu_L = \mu_R \mbox{ vs } H_1: \mu_L \not= \mu_R\]

  • Method: paired t-test

Example 4

  • It is tempting to perform a two-sample t-test.
  • However, a fundamental assumption of the two-sample problem is that the two samples are independent, such as the female group vs. the male group.
  • Here we are comparing two features (left vs. right hippocampus) measured on the same subjects, so the two samples are not independent. The correct test is the paired t-test, which is equivalent to performing a one-sample t-test on the within-subject differences.
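Why does ignoring the pairing matter? For paired measurements, \(\mathrm{Var}(R - L) = \mathrm{Var}(R) + \mathrm{Var}(L) - 2\,\mathrm{Cov}(R, L)\), and left and right volumes from the same person tend to be positively correlated. A two-sample test drops the covariance term. The sketch below uses simulated data (an assumed shared subject-level effect plus small side-specific noise), not `alzheimer_data`.

```r
# Sketch: why pairing matters. Data are SIMULATED, not alzheimer_data.
set.seed(2)
n <- 50
subject <- rnorm(n, mean = 3.2, sd = 0.4)       # shared per-subject size
left    <- subject + rnorm(n, sd = 0.1)
right   <- subject + 0.08 + rnorm(n, sd = 0.1)
d <- right - left                               # within-subject differences

# Sample identity: var(R - L) = var(R) + var(L) - 2 cov(R, L)
var(d)
var(right) + var(left) - 2 * cov(right, left)

# Paired SE uses var(d); the two-sample SE ignores the covariance term
se_paired <- sqrt(var(d) / n)
se_welch  <- sqrt(var(right) / n + var(left) / n)
c(se_paired = se_paired, se_welch = se_welch)

# Paired t-test and one-sample t-test on d give the same statistic
t.test(right, left, paired = TRUE)$statistic
t.test(d)$statistic
```

With a strong shared subject effect, the paired standard error is much smaller, so the paired test can detect a left/right difference that the two-sample test would blur.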

Example 4

Code
t.test(alzheimer_data$rhippo, alzheimer_data$lhippo, paired = TRUE)

    Paired t-test

data:  alzheimer_data$rhippo and alzheimer_data$lhippo
t = 17, df = 2699, p-value < 2.2e-16
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 0.07008463 0.08836085
sample estimates:
mean difference 
     0.07922274 
Code
#equivalently, 
t.test(alzheimer_data$rhippo- alzheimer_data$lhippo)

    One Sample t-test

data:  alzheimer_data$rhippo - alzheimer_data$lhippo
t = 17, df = 2699, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 0.07008463 0.08836085
sample estimates:
 mean of x 
0.07922274 

Example 5

  • Research Question: Is having a large hippocampus (> 7 cm\(^3\)) associated with gender?

  • This is a categorical vs. categorical variable problem

  • Method: chi-squared test

Code
table(alzheimer_data$hippo>7,
      alzheimer_data$female)
       
           0    1
  FALSE  843 1374
  TRUE   308  175
Code
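chisq.test(table(alzheimer_data$hippo > 7,
                 alzheimer_data$female))

Because this is a 2×2 table, chisq.test() runs Pearson's chi-squared test with Yates' continuity correction (df = 1). From the counts above, X-squared ≈ 106.4 and the p-value is far below 0.05 (R reports p-value < 2.2e-16), so we reject the null hypothesis of no association: about 27% of men (308/1151) but only about 11% of women (175/1549) have a total hippocampal volume above 7 cm\(^3\).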
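The chi-squared statistic can be computed by hand from the 2×2 table of `hippo > 7` by `female`: under independence, each expected count is row total × column total / n. A minimal sketch, with the counts typed in from that table:

```r
# Counts copied from the table (rows: hippo > 7 FALSE/TRUE; columns: female 0/1)
obs <- matrix(c(843, 308, 1374, 175), nrow = 2,
              dimnames = list(large = c("FALSE", "TRUE"),
                              female = c("0", "1")))

# Column proportions: share with a large hippocampus among men and women
prop.table(obs, margin = 2)

# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson statistic with Yates' continuity correction (the 2x2 default)
x2 <- sum((abs(obs - expected) - 0.5)^2 / expected)
p  <- pchisq(x2, df = 1, lower.tail = FALSE)
c(X_squared = x2, p_value = p)

# chisq.test() on the same matrix reports the same statistic
chisq.test(obs)$statistic
```

Passing `correct = FALSE` to `chisq.test()` drops the 0.5 adjustment and gives the uncorrected Pearson statistic.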

Example 6

  • Research Question: Is hippocampal volume correlated with age in healthy adults?

  • Null Hypothesis: hippocampal volume and age are not correlated.

  • Alternative Hypothesis: hippocampal volume and age are correlated.

\[H_0: \rho=0 \mbox{ vs } H_1: \rho\not=0\]

  • Method: correlation test or, more generally, linear regression.

Example 6: Visualization

Code
#plot(alzheimer_healthy$age, alzheimer_healthy$hippo)
ggplot(data= alzheimer_healthy,
       mapping= aes(x= age, 
                    y= hippo)) +
  geom_point() +
  labs(
    title="Scatter plot of hippocampal volume vs age",
    x="Age (years)",
    y="Hippocampal volume (cm^3)") +  
  theme_minimal()

Discussion: Is there a linear trend? Are there regions of concern?

Example 6

Code
cor.test(alzheimer_healthy$age, alzheimer_healthy$hippo)

    Pearson's product-moment correlation

data:  alzheimer_healthy$age and alzheimer_healthy$hippo
t = -15.125, df = 1532, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.4032192 -0.3160971
sample estimates:
      cor 
-0.360444 
Code
#alternatively, we can use linear regression
summary(lm(alzheimer_healthy$hippo ~ alzheimer_healthy$age))

Call:
lm(formula = alzheimer_healthy$hippo ~ alzheimer_healthy$age)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.57311 -0.48368 -0.01818  0.49369  2.80738 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)            8.00629    0.10570   75.75   <2e-16 ***
alzheimer_healthy$age -0.02329    0.00154  -15.12   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7209 on 1532 degrees of freedom
Multiple R-squared:  0.1299,    Adjusted R-squared:  0.1294 
F-statistic: 228.8 on 1 and 1532 DF,  p-value: < 2.2e-16
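The correlation test and the regression slope test give the same t statistic because, in simple linear regression, \(t = r\sqrt{n-2}/\sqrt{1-r^2}\) and \(R^2 = r^2\). A quick check using the numbers printed above:

```r
# Values copied from the cor.test() output above
r        <- -0.360444   # sample correlation
df_resid <- 1532        # n - 2

# t statistic for H0: rho = 0; agrees with t = -15.125 up to rounding of r
t_stat <- r * sqrt(df_resid) / sqrt(1 - r^2)
t_stat

# In simple linear regression, R-squared equals r^2 (compare 0.1299)
r^2

# Two-sided p-value from the t distribution with n - 2 df
2 * pt(-abs(t_stat), df = df_resid)
```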

Example 7

  • Research Question: Does left hippocampal volume differ across diagnostic groups?

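  • Null Hypothesis: The mean left hippocampal volume is the same in all three diagnostic groups.

  • Alternative Hypothesis: At least two diagnostic groups differ in mean left hippocampal volume.

\[H_0: \mu_0=\mu_1=\mu_2 \mbox{ vs } H_1: \mbox{ at least two means are different}\]

  • Method: one-way analysis of variance (ANOVA) or, more generally, linear regression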

Example 7

Code
ggplot(data= alzheimer_data,
       mapping= aes(x= diagnosis, 
                    y=lhippo,
                    fill=diagnosis)) +
  geom_boxplot()+
  labs(
    title="Boxplot of lhippo by Diagnosis",
    x="Diagnosis",
    y="LHippo volume (cm^3)") +  
  theme_minimal()

Code
summary(aov(alzheimer_data$lhippo ~ alzheimer_data$diagnosis))
                           Df Sum Sq Mean Sq F value Pr(>F)    
alzheimer_data$diagnosis    2  100.5   50.27   247.1 <2e-16 ***
Residuals                2697  548.6    0.20                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
#equivalently
summary(aov(lhippo ~ diagnosis, data=alzheimer_data))
              Df Sum Sq Mean Sq F value Pr(>F)    
diagnosis      2  100.5   50.27   247.1 <2e-16 ***
Residuals   2697  548.6    0.20                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
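The F statistic in the ANOVA table is the ratio of mean squares, \(F = (SS_{\text{between}}/df_{\text{between}})/(SS_{\text{within}}/df_{\text{within}})\). A quick check with the rounded values printed above:

```r
# Values copied from the ANOVA table above (rounded as printed)
ss_between <- 100.5; df_between <- 2
ss_within  <- 548.6; df_within  <- 2697

ms_between <- ss_between / df_between   # mean square for diagnosis
ms_within  <- ss_within / df_within     # residual mean square

# About 247; differs slightly from 247.1 because the inputs are rounded
f_stat <- ms_between / ms_within
f_stat

# Upper-tail p-value of the F distribution
pf(f_stat, df_between, df_within, lower.tail = FALSE)
```

Assuming `alzheimer_data` is loaded as above, `anova(lm(lhippo ~ diagnosis, data = alzheimer_data))` should reproduce the same F statistic, which is the sense in which one-way ANOVA is a special case of linear regression.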

Summary

  • Ex 1. Is the hippocampal volume of healthy males greater than 6 cm\(^3\)? (one-sample t-test)

  • Ex 2. Is the proportion of healthy adults with a right hippocampal volume > 3 cm\(^3\) equal to 50%? (one-sample proportion test or z-test)

  • Ex 3. Do healthy men and women have the same mean left hippocampal volume? (two-sample t-test)

  • Ex 4. Is the left hippocampal volume equal to the right hippocampal volume in humans? (paired t-test)

  • Ex 5. Is having a large hippocampus (> 7 cm\(^3\)) associated with gender? (chi-squared test)

  • Ex 6. Is hippocampal volume correlated with age in healthy adults? (Pearson’s correlation or linear regression)

  • Ex 7. Does left hippocampal volume differ across diagnostic groups? (one-way ANOVA or linear regression)