.yellow[hypothesis testing] --- ```{r coin} head <- '' tail <- '' ``` # (Frequentist) hypothesis testing framework * Suppose $X$ is the number of heads out of $n$ independent tosses. * Let $p$ be the probability of getting a `r head` for this coin. | | | |-----|-----| |**Hypotheses** | $H_0: p = 0.5$ vs. $H_a: p > 0.5$. Note $p_0=0.5$.

Alternative $H_a$ is saying we believe that the coin is biased to heads.

.monash-orange2[Alternative needs to be decided before seeing data.] | -- |**Assumptions** | Each toss is independent with equal chance of getting a head. | -- |**Test statistic** | $X \sim B(n, p)$. Recall $E(X\mid H_0) = np_0$.

We observe $n, x, \widehat{p}$. Test statistic is $\widehat{p} - p_0$.| -- |**P-value**

.font_small[(or critical value or confidence interval)] | $P(X ~ \geq ~ x\mid H_0)$ | -- |** Conclusion** | Reject null hypothesis when the $p$-value is less than

some significance level $\alpha$. Usually $\alpha = 0.05$.| --- # Testing coin bias .font_small[Part 1/2] * Suppose I have a coin that I'm going to flip `r tail` `r head` -- * If the coin is unbiased, what is the probability it will show heads? -- * Yup, the probability should be 0.5. -- * So how would I test if a coin is biased or unbiased? -- * We'll collect some data. -- * **Experiment 1**: I flipped the coin 10 times and this is the result:

```{r echo=TRUE} sum(dbinom(7:10, 10, 0.5)) ``` ] .item.w-50[ ## Experiment 1 (n=100) - We observed $x=70$, or $\widehat{p} = 0.7$. - Assuming $H_0$ is true, we expect $np=100\times 0.5=50$. - Calculate the $P(X \geq 70)$

```{r echo=TRUE} sum(dbinom(70:100, 100, 0.5)) ``` ] ] --- # Judicial system .grid[ .item[

] ] --

- Evidence by test statistic
- Judgement by p-value, critical value or confidence interval

Reading data plots require calibration ] --- # Visual inference more formally .flex[ .w-60[ 1. State your null and alternate hypotheses. ]] -- 2. Define a

Recall the linear model for cars shown in week 3. ```{r} #| eval: false #| echo: true lm(dist ~ speed, data = cars) ```

* This is a lineup of the residual plot *

``` > decrypt("clZx bKhK oL 3OHohoOL 0B") [1] "True data in position 11" ```

Recall the linear model for diamonds shown in week 3. ```{r} #| eval: false #| echo: true d_fit <- lm(lprice ~ lcarat, data=diamonds) ```

* This is a lineup of the residual plot for the model where both carat and price are log-transformed *

``` > decrypt("clZx bKhK oL 3OHohoOL 0Q") [1] "True data in position 15" ``` --- # Visual inference p-value (or "see"-value) .flex[ .item.w-45[ Suppose $x=8$ out of $n=12$ people chose plot 15 (previous slide). The probability that this happens by random guessing (p-value) is ```{r} #| echo: true 1 - pbinom(8 - 1, 12, 1/20) nullabor::pvisual(8, 12, 20) ``` ] .item.w-10[ .white[space] ] .item.w-45[ .monash-orange2[This is basically impossible to happen by chance.]

Next, how the residuals are different from "good" residuals has to be determined by the follow-up question: how did you decide your chosen plot was different?

Plot 15 has a different variance pattern, it's not the regular up-down pattern seen in the other plots. This suggests that there is some .monash-orange2[heteroskedasticity] in the data that is not captured by the error distribution in the model. ] ] --- class: transition # Why? --- # Residual plot (1/3) .flex[ .item.w-30[

Is there a problem with the model? ] .item.w-70[ ```{r out.width="60%", fig.width=3, fig.height=3} library(visage) vi_lineup <- readRDS("/Users/cookd/students_PhD/Patrick/lineup_residual_diagnostics/data/vi_lineup.rds") i <- 915 l <- vi_lineup[[i]] VI_MODEL$plot(filter(l$data, k==1), remove_grid_line = TRUE, theme = theme_light()) ``` ] ] --- # Residual plot (2/3) .flex[ .item.w-30[

Is there a problem with the model? ] .item.w-70[ ```{r out.width="60%", fig.width=3, fig.height=3} VI_MODEL$plot(filter(l$data, k==5), remove_grid_line = TRUE, theme = theme_light()) ``` ] ] --- # Residual plot (3/3) .flex[ .item.w-30[

Is there a problem with the model? ] .item.w-70[ ```{r out.width="60%", fig.width=3, fig.height=3} VI_MODEL$plot(filter(l$data, k==14), remove_grid_line = TRUE, theme = theme_light()) ``` ] ] --- class: transition # Residual plots need context It's really hard to decide that there is NO PATTERN!

Which is the worst residual plot? ] .item.w-70[ ```{r fig.height=6, fig.width=7, out.width="80%", fig.align="center"} VI_MODEL$plot_lineup(l$data, remove_grid_line = TRUE, theme = theme_light(), remove_axis = TRUE) ``` ] ] --- .flex[ .item.w-30[

All of the previous residual plots shown were NULL plots ] .item.w-70[ ```{r fig.height=6, fig.width=7, out.width="80%", fig.align="center"} VI_MODEL$plot_lineup(l$data, remove_grid_line = TRUE, theme = theme_light(), remove_axis = TRUE) + geom_rect(data=filter(l$data, k %in% c(1, 5, 14)), aes(xmin=min(l$data$.fitted), xmax=max(l$data$.fitted), ymin=min(l$data$.resid), ymax=max(l$data$.resid)), colour="black", alpha=0.5, fill=NA, linewidth=0.2) ``` ] ] --- .flex[ .item.w-30[

The actual residual plot is ] .item.w-70[ ```{r fig.height=6, fig.width=7, out.width="80%", fig.align="center"} VI_MODEL$plot_lineup(l$data, remove_grid_line = TRUE, theme = theme_light(), remove_axis = TRUE) + geom_rect(data=filter(l$data, k==2), aes(xmin=min(l$data$.fitted), xmax=max(l$data$.fitted), ymin=min(l$data$.resid), ymax=max(l$data$.resid)), colour="yellow", alpha=0.5, fill=NA, linewidth=1) ``` ] ] --- class: transition # It's not only for residual plots --- .flex[ .item.w-30[

Which plot is most different? ] .item.w-70[ ```{r fig.height=6, fig.width=7, out.width="80%", fig.align="center"} threept <- subset(lal, type == "3pt" & !is.na(x) & !is.na(y)) threept <- threept[c(".id", "period", "time", "team", "etype", "player", "points", "result", "x", "y")] threept <- transform(threept, x = x + runif(length(x), -0.5, 0.5), y = y + runif(length(y), -0.5, 0.5)) threept <- transform(threept, r = sqrt((x - 25) ^ 2 + y ^ 2), angle = atan2(y, x - 25)) # Focus in on shots in the typical range threept_sub <- threept %>% filter(between(r, 20, 39)) %>% mutate(angle = angle * 180 / pi) %>% select(angle, r) ggplot(lineup(null_lm(r ~ poly(angle, 2)), true=threept_sub, n = 20, pos = 2), aes(x=angle, y=r)) + geom_point(alpha=0.3) + scale_x_continuous("Angle (degrees)", breaks = c(0, 45, 90, 135, 180), limits = c(0, 180)) + facet_wrap(~ .sample, ncol = 5) + theme_bw() + theme(axis.text=element_blank(), axis.title=element_blank()) ``` ] ] --- .flex[ .item.w-30[

Which plot is most different? ] .item.w-70[ ```{r fig.height=6, fig.width=7, out.width="80%", fig.align="center"} library(forecast) l <- lineup(null_ts("rate", auto.arima), aud, pos=10) ggplot(l, aes(x=date, y=rate)) + geom_line() + facet_wrap(~.sample, scales="free_y") + theme(axis.text = element_blank()) + xlab("") + ylab("") ``` ] ] --- .flex[ .item.w-30[

Which plot is most different? ] .item.w-70[ ```{r fig.height=6, fig.width=7, out.width="80%", fig.align="center"} ggplot(lineup(null_permute('mpg'), mtcars), aes(mpg, wt)) + geom_point() + facet_wrap(~ .sample, ncol=5) ``` ] ] --- class: transition # Reading any plot can benefit from the context of null plots --- # Resources and Acknowledgement .font18[ - Buja, Andreas, Dianne Cook, Heike Hofmann, Michael Lawrence, Eun-Kyung Lee, Deborah F. Swayne, and Hadley Wickham. 2009. “Statistical Inference for Exploratory Data Analysis and Model Diagnostics.” Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 367 (1906): 4361–83. - Wickham, Hadley, Dianne Cook, Heike Hofmann, and Andreas Buja. 2010. “Graphical Inference for Infovis.” IEEE Transactions on Visualization and Computer Graphics 16 (6): 973–79. - Hofmann, H., L. Follett, M. Majumder, and D. Cook. 2012. “Graphical Tests for Power Comparison of Competing Designs.” IEEE Transactions on Visualization and Computer Graphics 18 (12): 2441–48. - Majumder, M., Heiki Hofmann, and Dianne Cook. 2013. “Validation of Visual Statistical Inference, Applied to Linear Models.” Journal of the American Statistical Association 108 (503): 942–56. - Data coding using [`tidyverse` suite of R packages](https://www.tidyverse.org) - Slides originally written by Emi Tanaka and constructed with [`xaringan`](https://github.com/yihui/xaringan), [remark.js](https://remarkjs.com), [`knitr`](http://yihui.name/knitr), and [R Markdown](https://rmarkdown.rstudio.com). ] --- ```{r endslide, child="assets/endslide.Rmd"} ```