class: title-slide <br> <br> .right-panel[ # Confidence Interval and Hypothesis Testing ## Dr. Mine Dogucu ] --- ## Schedule for today - Confidence interval for the population mean - One-sample t-test - Test for Proportion - Two-sample t-test - Correlation test - `\(\chi^2\)` test --- ## Load the Packages and Data ``` r library(tidyverse) df <- read.csv("../data/Alzheimer_data.csv") ``` --- ## Simulated Data ``` r set.seed(0) norm_size <- 100 norm_random <- rnorm(n = norm_size, mean = 10, sd = 2) summary(norm_random) ``` ``` ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 5.552 8.861 9.934 10.045 11.251 14.883 ``` --- ## Confidence Interval ``` r t_test_result <- t.test(x = norm_random) t_test_result$conf.int ``` ``` ## [1] 9.695063 10.395611 ## attr(,"conf.level") ## [1] 0.95 ``` --- ## Adjust the confidence level We can lower our confidence level, which leads to a narrower interval: ``` r t_test_result <- t.test(x = norm_random, conf.level = 0.9) t_test_result$conf.int ``` ``` ## [1] 9.752228 10.338446 ## attr(,"conf.level") ## [1] 0.9 ``` --- ## Adjust the confidence level To have a higher confidence level, we need a broader interval: ``` r t_test_result <- t.test(x = norm_random, conf.level = 0.99) t_test_result$conf.int ``` ``` ## [1] 9.581697 10.508976 ## attr(,"conf.level") ## [1] 0.99 ``` --- ## Hypothesis Testing ``` r summary(norm_random) ``` ``` ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 5.552 8.861 9.934 10.045 11.251 14.883 ``` `\(H_0:\mu=10\)` vs. `\(H_A:\mu\neq10\)` ``` r t.test(x = norm_random, mu = 10)$p.value ``` ``` ## [1] 0.7978487 ``` --- ## Hypothesis Testing (One-sided) `\(H_0:\mu=10\)` vs. `\(H_A:\mu>10\)` ``` r t.test(x = norm_random, mu = 10, alternative = "greater")$p.value ``` ``` ## [1] 0.3989244 ``` `\(H_0:\mu=9\)` vs. `\(H_A:\mu>9\)` ``` r t.test(x = norm_random, mu = 9, alternative = "greater")$p.value ``` ``` ## [1] 0.00000002310052 ``` --- ## Practice 1: Confidence Interval Try to get a 90 percent confidence interval for the population mean of age among AD subjects. ``` r summary(df$age) ``` ``` ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 21.00 64.00 72.00 70.05 78.00 100.00 ``` --- ## Practice 1: Confidence Interval Try to get a 90 percent confidence interval for the population mean of age among AD subjects. ``` r t_test_age <- t.test(df$age, conf.level = 0.9) t_test_age$conf.int ``` ``` ## [1] 69.68402 70.41524 ## attr(,"conf.level") ## [1] 0.9 ``` --- ## Practice 2: One Sample t-test Test whether the population mean of age is 70 or greater than 70 and get the p-value. `\(H_0:\mu=70\)` vs. `\(H_A:\mu>70\)` ``` r summary(df$age) ``` ``` ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 21.00 64.00 72.00 70.05 78.00 100.00 ``` --- ## Practice 2: One Sample t-test Test whether the population mean of age is 70 or greater than 70 and get the p-value. `\(H_0:\mu=70\)` vs. `\(H_A:\mu>70\)` ``` r t_test_age <- t.test(df$age, mu = 70, alternative = "greater") round(t_test_age$p.value, 4) ``` ``` ## [1] 0.4116 ``` --- ## Confidence Interval for Population Proportion Simulated data ``` r set.seed(0) bern_size <- 100 bern_random <- rbinom(bern_size, 1, 0.3) head(bern_random, 10) ``` ``` ## [1] 1 0 0 0 1 0 1 1 0 0 ``` ``` r success_counts <- sum(bern_random) success_counts ``` ``` ## [1] 33 ``` --- ## Confidence Interval for Population Proportion ``` r prop_test_result <- prop.test(x = success_counts, n = bern_size) prop_test_result$conf.int ``` ``` ## [1] 0.2411558 0.4320901 ## attr(,"conf.level") ## [1] 0.95 ``` --- ## Hypothesis Testing for Population Proportion `\(H_0:p=0.3\)` vs. `\(H_A:p\neq0.3\)` ``` r prop_test_result <- prop.test(x = success_counts, n = bern_size, p = 0.3) prop_test_result$p.value ``` ``` ## [1] 0.5853789 ``` --- ## Real Data Practice (brain volume: naccicv) ``` r success_counts_naccicv <- df %>% filter(naccicv > 1300 & naccicv < 1600) %>% nrow() bern_size_naccicv <- nrow(df) success_counts_naccicv ``` ``` ## [1] 1811 ``` ``` r bern_size_naccicv ``` ``` ## [1] 2700 ``` --- # Practice - Test whether the population proportion is 2/3. - Get a 95 percent confidence interval for the population proportion. --- ``` r prop_test_result_naccicv <- prop.test(x = success_counts_naccicv, n = bern_size_naccicv, p = 2/3, conf.level = 0.95) prop_test_result_naccicv$conf.int ``` ``` ## [1] 0.6525956 0.6883957 ## attr(,"conf.level") ## [1] 0.95 ``` ``` r prop_test_result_naccicv$p.value ``` ``` ## [1] 0.6681702 ``` --- ## Comparing Two Samples Is blood pressure associated with gender? <img src="Lab-05a-confint_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" /> --- ## Two Sample t-test We can examine whether the average blood pressure is different between male and female? Note that the boxplots above show the medians not the means. `\(H_0:\mu_M=\mu_F\)` vs. `\(H_A:\mu_M \neq \mu_F\)` ``` r t.test(bpsys ~ female, data = df)$p.value ``` ``` ## [1] 0.5100128 ``` --- ## Two Numerical Variables Question: Are age and blood pressure correlated? <img src="Lab-05a-confint_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" /> --- ## Correlation Test `\(H_0:\)` They are NOT correlated. vs. `\(H_A:\)` They are correlated. ``` r round(cor.test(df$age, df$bpsys)$p.value, 4) ``` ``` ## [1] 0 ``` `\(H_0:\)` They are NOT correlated. vs. `\(H_A:\)` They are positively correlated. ``` r round(cor.test(df$age, df$bpsys, alternative = "greater")$p.value, 4) ``` ``` ## [1] 0 ``` --- ## Two Categorical Variables Question: Are gender and disease status associated with each other? ``` r contingency_table <- table(df$female, df$diagnosis) contingency_table ``` ``` ## ## 0 1 2 ## 0 529 327 295 ## 1 1005 286 258 ``` --- ## Pearson's `\(\chi^2\)` Test of Independence `\(H_0:\)` They are independent vs. `\(H_A:\)` They are NOT independent. ``` r chisq.test(contingency_table) ``` ``` ## ## Pearson's Chi-squared test ## ## data: contingency_table ## X-squared = 96.346, df = 2, p-value < 0.00000000000000022 ``` --- ## Summary - "t.test()" for one/two-sample t-test - "prop.test()" for proportion - "cor.test()" for correlation test - "chisq.test()" for `\(\chi^2\)` test - Useful arguments: "mu", "conf.level", "alternative" - ?t.test, ?prop.test, ?cor.test and ?chisq.test for more information