Confidence Interval and Hypothesis Testing

class: title-slide

<br>
<br>
.right-panel[

# Confidence Interval and Hypothesis Testing
## Dr. Mine Dogucu

]

---

## Schedule for today

- Confidence interval for the population mean
- One-sample t-test
- Test for proportion
- Two-sample t-test
- Correlation test
- `$\chi^2$` test

---

## Load the Packages and Data

``` r
library(tidyverse)
df <- read.csv("../data/Alzheimer_data.csv")
```

---

## Simulated Data

``` r
set.seed(0)
norm_size <- 100
norm_random <- rnorm(n = norm_size, mean = 10, sd = 2)
summary(norm_random)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.552   8.861   9.934  10.045  11.251  14.883
```

---

## Confidence Interval

``` r
t_test_result <- t.test(x = norm_random)
t_test_result$conf.int
```

```
## [1]  9.695063 10.395611
## attr(,"conf.level")
## [1] 0.95
```
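
As a sketch of what happens under the hood, the same interval can be rebuilt by hand from the sample mean, the standard error, and a t critical value. The names `x_bar`, `std_error`, `t_crit`, and `manual_ci` are illustrative and not part of the original slides.

```r
# Recreate the simulated sample from the "Simulated Data" slide
set.seed(0)
norm_random <- rnorm(n = 100, mean = 10, sd = 2)

n <- length(norm_random)
x_bar <- mean(norm_random)
std_error <- sd(norm_random) / sqrt(n)

# Critical value of the t distribution with n - 1 degrees of freedom
# (0.975 leaves 2.5% in each tail, giving a 95% interval)
t_crit <- qt(0.975, df = n - 1)

manual_ci <- c(x_bar - t_crit * std_error, x_bar + t_crit * std_error)
manual_ci

# Should agree with the interval reported by t.test()
all.equal(manual_ci, as.numeric(t.test(x = norm_random)$conf.int))
```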

---

## Adjust the confidence level

We can lower our confidence level, which leads to a narrower interval:

``` r
t_test_result <- 
  t.test(x = norm_random, conf.level = 0.9)
t_test_result$conf.int
```

```
## [1]  9.752228 10.338446
## attr(,"conf.level")
## [1] 0.9
```

---
## Adjust the confidence level

To have a higher confidence level, we need a broader interval:

``` r
t_test_result <- 
  t.test(x = norm_random, conf.level = 0.99)
t_test_result$conf.int
```

```
## [1]  9.581697 10.508976
## attr(,"conf.level")
## [1] 0.99
```
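
To see the trade-off in one place, a small sketch can compute the interval width at several confidence levels. The helper names `conf_levels` and `ci_widths` are made up for this illustration.

```r
# Recreate the simulated sample from the "Simulated Data" slide
set.seed(0)
norm_random <- rnorm(n = 100, mean = 10, sd = 2)

conf_levels <- c(0.90, 0.95, 0.99)

# Width (upper limit minus lower limit) of the interval at each level
ci_widths <- sapply(conf_levels, function(level) {
  ci <- t.test(x = norm_random, conf.level = level)$conf.int
  ci[2] - ci[1]
})
names(ci_widths) <- conf_levels

# Widths grow as the confidence level increases
ci_widths
```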

---

## Hypothesis Testing

``` r
summary(norm_random)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.552   8.861   9.934  10.045  11.251  14.883
```

`$H_0:\mu=10$` vs. `$H_A:\mu\neq10$`

``` r
t.test(x = norm_random, mu = 10)$p.value
```

```
## [1] 0.7978487
```
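
For reference, the two-sided p-value can also be computed by hand from the t statistic. This is only a sketch; `t_stat` and `p_two_sided` are illustrative names.

```r
# Recreate the simulated sample from the "Simulated Data" slide
set.seed(0)
norm_random <- rnorm(n = 100, mean = 10, sd = 2)

n <- length(norm_random)
mu_0 <- 10  # value of the mean under the null hypothesis

# t statistic: distance of the sample mean from mu_0 in standard errors
t_stat <- (mean(norm_random) - mu_0) / (sd(norm_random) / sqrt(n))

# Two-sided p-value: probability in both tails beyond |t_stat|
p_two_sided <- 2 * pt(-abs(t_stat), df = n - 1)
p_two_sided

# Should agree with t.test()
all.equal(p_two_sided, t.test(x = norm_random, mu = mu_0)$p.value)
```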

---

## Hypothesis Testing (One-sided)

`$H_0:\mu=10$` vs. `$H_A:\mu>10$`

``` r
t.test(x = norm_random, mu = 10, 
       alternative = "greater")$p.value
```

```
## [1] 0.3989244
```

`$H_0:\mu=9$` vs. `$H_A:\mu>9$`

``` r
t.test(x = norm_random, mu = 9, 
       alternative = "greater")$p.value
```

```
## [1] 0.00000002310052
```
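
A one-sided p-value uses a single tail of the t distribution. The sketch below, with illustrative names `p_greater` and `p_less`, shows both directions for the first test above.

```r
# Recreate the simulated sample from the "Simulated Data" slide
set.seed(0)
norm_random <- rnorm(n = 100, mean = 10, sd = 2)

n <- length(norm_random)
t_stat <- (mean(norm_random) - 10) / (sd(norm_random) / sqrt(n))

# alternative = "greater": upper-tail probability
p_greater <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# alternative = "less": lower-tail probability
p_less <- pt(t_stat, df = n - 1)

c(greater = p_greater, less = p_less)

# The two-sided p-value is twice the smaller of the two tails
2 * min(p_greater, p_less)
```

This is why the one-sided p-value on this slide is half of the two-sided p-value on the previous one: the sample mean lies on the side of the alternative.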

---

## Practice 1: Confidence Interval

Try to get a 90 percent confidence interval for the population mean of age among AD subjects.

``` r
summary(df$age)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   21.00   64.00   72.00   70.05   78.00  100.00
```

---

## Practice 1: Confidence Interval

Try to get a 90 percent confidence interval for the population mean of age among AD subjects.

``` r
t_test_age <- t.test(df$age, conf.level = 0.9)
t_test_age$conf.int
```

```
## [1] 69.68402 70.41524
## attr(,"conf.level")
## [1] 0.9
```

---

## Practice 2: One-Sample t-test

Test whether the population mean of age is equal to 70 against the alternative that it is greater than 70, and get the p-value.

`$H_0:\mu=70$` vs. `$H_A:\mu>70$`

``` r
summary(df$age)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   21.00   64.00   72.00   70.05   78.00  100.00
```

---

## Practice 2: One-Sample t-test

Test whether the population mean of age is equal to 70 against the alternative that it is greater than 70, and get the p-value.

`$H_0:\mu=70$` vs. `$H_A:\mu>70$`

``` r
t_test_age <- t.test(df$age, mu = 70, alternative = "greater")
round(t_test_age$p.value, 4)
```

```
## [1] 0.4116
```

---

## Confidence Interval for Population Proportion

Simulated data

``` r
set.seed(0)
bern_size <- 100
bern_random <- rbinom(bern_size, 1, 0.3)
head(bern_random, 10)
```

```
##  [1] 1 0 0 0 1 0 1 1 0 0
```

``` r
success_counts <- sum(bern_random)
success_counts
```

```
## [1] 33
```

---

## Confidence Interval for Population Proportion

``` r
prop_test_result <-
  prop.test(x = success_counts, 
            n = bern_size)
prop_test_result$conf.int
```

```
## [1] 0.2411558 0.4320901
## attr(,"conf.level")
## [1] 0.95
```
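
A minimal sketch of where this interval comes from, assuming the documented behavior that `prop.test()` reports a Wilson score interval (with a continuity correction by default). The code rebuilds the uncorrected Wilson interval by hand for the 33 successes out of 100 seen above and compares it with `prop.test(..., correct = FALSE)`. `binom.test()` is shown as an exact alternative. The names `p_hat`, `center`, `half_width`, and `wilson_ci` are illustrative.

```r
x <- 33   # number of successes (from the simulated data)
n <- 100  # number of trials
p_hat <- x / n
z <- qnorm(0.975)  # normal critical value for a 95% interval

# Wilson score interval without continuity correction
center <- (p_hat + z^2 / (2 * n)) / (1 + z^2 / n)
half_width <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) /
  (1 + z^2 / n)
wilson_ci <- c(center - half_width, center + half_width)
wilson_ci

# Same interval from prop.test() with the correction switched off
prop.test(x = x, n = n, correct = FALSE)$conf.int

# Exact (Clopper-Pearson) interval for comparison
binom.test(x = x, n = n)$conf.int
```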

---

## Hypothesis Testing for Population Proportion

`$H_0:p=0.3$` vs. `$H_A:p\neq0.3$`

``` r
prop_test_result <-
  prop.test(x = success_counts, 
            n = bern_size, 
            p = 0.3)
prop_test_result$p.value
```

```
## [1] 0.5853789
```
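
The p-value above can be reproduced by hand. This sketch assumes the default behavior of `prop.test()` for a single proportion: a `$\chi^2$` statistic with one degree of freedom and a continuity correction of 0.5. The names `p_0`, `chi_sq`, and `p_value` are illustrative.

```r
x <- 33     # number of successes (from the simulated data)
n <- 100    # number of trials
p_0 <- 0.3  # proportion under the null hypothesis

# Continuity-corrected chi-squared statistic
chi_sq <- (abs(x - n * p_0) - 0.5)^2 / (n * p_0 * (1 - p_0))

# Upper-tail probability of a chi-squared distribution with 1 df
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)
p_value

# Should agree with prop.test()
all.equal(p_value, prop.test(x = x, n = n, p = p_0)$p.value)
```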
---

## Real Data Practice (brain volume: naccicv)

``` r
success_counts_naccicv <- 
  df %>%
  filter(naccicv > 1300 & naccicv < 1600) %>%
  nrow()
bern_size_naccicv <- nrow(df)
success_counts_naccicv
```

```
## [1] 1811
```

``` r
bern_size_naccicv
```

```
## [1] 2700
```

---

## Practice

- Test whether the population proportion is 2/3.

- Get a 95 percent confidence interval for the population proportion.

---

``` r
prop_test_result_naccicv <-
  prop.test(x = success_counts_naccicv, 
            n = bern_size_naccicv, 
            p = 2/3, 
            conf.level = 0.95)
prop_test_result_naccicv$conf.int
```

```
## [1] 0.6525956 0.6883957
## attr(,"conf.level")
## [1] 0.95
```

``` r
prop_test_result_naccicv$p.value
```

```
## [1] 0.6681702
```

---

## Comparing Two Samples

Is blood pressure associated with gender?

---

## Two-Sample t-test

We can test whether the average blood pressure differs between males and females. Note that the boxplots above show the medians, not the means.

`$H_0:\mu_M=\mu_F$` vs. `$H_A:\mu_M \neq \mu_F$`

``` r
t.test(bpsys ~ female, data = df)$p.value
```

```
## [1] 0.5100128
```
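
By default, `t.test()` with two groups runs Welch's t-test, which does not assume equal variances. Setting `var.equal = TRUE` gives the pooled-variance version instead. The sketch below uses simulated data (the data frame `sim_df` and its columns are made up for illustration and are not the Alzheimer data).

```r
# Simulated two-group data with unequal variances (illustration only)
set.seed(1)
sim_df <- data.frame(
  group = rep(c("a", "b"), times = c(40, 60)),
  value = c(rnorm(40, mean = 120, sd = 10),
            rnorm(60, mean = 123, sd = 20))
)

# Default: Welch's t-test (unequal variances allowed)
welch <- t.test(value ~ group, data = sim_df)

# Pooled-variance t-test (assumes equal variances)
pooled <- t.test(value ~ group, data = sim_df, var.equal = TRUE)

# Compare the degrees of freedom and p-values of the two versions
c(welch_df = unname(welch$parameter), pooled_df = unname(pooled$parameter))
c(welch_p = welch$p.value, pooled_p = pooled$p.value)
```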

---

## Two Numerical Variables

Question: Are age and blood pressure correlated?

---

## Correlation Test

`$H_0:$` They are NOT correlated. vs. `$H_A:$` They are correlated.

``` r
round(cor.test(df$age, df$bpsys)$p.value, 4)
```

```
## [1] 0
```

`$H_0:$` They are NOT correlated. vs. `$H_A:$` They are positively correlated.

``` r
round(cor.test(df$age, df$bpsys, 
               alternative = "greater")$p.value, 4)
```

```
## [1] 0
```
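
A printed value of 0 here only means the p-value is below 0.00005 after rounding to 4 digits. The sketch below uses simulated data (the vectors `x` and `y` are made up, not the Alzheimer data) to show how the test statistic is built from the sample correlation and how `format.pval()` can display small p-values more informatively.

```r
# Simulated pair of correlated variables (illustration only)
set.seed(2)
n <- 200
x <- rnorm(n)
y <- 0.2 * x + rnorm(n)

cor_result <- cor.test(x, y)

# t statistic built from the sample correlation r, with n - 2 df
r <- cor(x, y)
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# Both should agree with cor.test()
all.equal(t_stat, unname(cor_result$statistic))
all.equal(p_value, cor_result$p.value)

# Display the p-value without rounding it to zero
format.pval(cor_result$p.value, digits = 3)
```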

---

## Two Categorical Variables

Question: Are gender and disease status associated with each other?

``` r
contingency_table <- table(df$female, df$diagnosis)
contingency_table
```

```
##    
##        0    1    2
##   0  529  327  295
##   1 1005  286  258
```

---

## Pearson's `$\chi^2$` Test of Independence

`$H_0:$` They are independent vs. `$H_A:$` They are NOT independent.

``` r
chisq.test(contingency_table)
```

```
## 
## 	Pearson's Chi-squared test
## 
## data:  contingency_table
## X-squared = 96.346, df = 2, p-value < 0.00000000000000022
```
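
As a sketch of how the statistic is formed, the counts from the contingency table on the previous slide can be typed in directly and compared with their expected values under independence. The names `observed`, `expected`, `chi_sq`, and `dof` are illustrative. The dimension labels are taken from the `table(df$female, df$diagnosis)` call.

```r
# Observed counts copied from the contingency table
observed <- matrix(
  c(529, 327, 295,
    1005, 286, 258),
  nrow = 2, byrow = TRUE,
  dimnames = list(female = c("0", "1"), diagnosis = c("0", "1", "2"))
)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson's chi-squared statistic and its degrees of freedom
chi_sq <- sum((observed - expected)^2 / expected)
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)

p_value <- pchisq(chi_sq, df = dof, lower.tail = FALSE)
c(statistic = chi_sq, df = dof, p_value = p_value)

# Should agree with chisq.test()
all.equal(chi_sq, unname(chisq.test(observed)$statistic))
```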

---

## Summary

- `t.test()` for one/two-sample t-test
- `prop.test()` for proportion
- `cor.test()` for correlation test
- `chisq.test()` for `$\chi^2$` test
- Useful arguments: `mu`, `conf.level`, `alternative`
- `?t.test`, `?prop.test`, `?cor.test` and `?chisq.test` for more information