All consultations will be held in person and on Zoom. Check Moodle for the links.

Tentative Schedule

| Week | Slides | Tutorial | Topic | Readings | Assessments |
|---|---|---|---|---|---|
| 00 | A | | Course information | | |
| 01 (Jul 24) | A; B | | Overview. Why this course? What is EDA? | The Landscape of R Packages for Automated Exploratory Data Analysis | Tutorial preparation quizzes due each week. |
| 02 (Jul 31) | A; B | | Learning from history | EDA Case Study: Bay area blues | |
| 03 (Aug 7) | A; B | | Initial data analysis and model diagnostics: Model dependent exploration and how it differs from EDA | The initial examination of data | |
| 04 (Aug 14) | A; B | | Using computational tools to determine whether what is seen in the data can be assumed to apply more broadly | Wickham et al. (2010) Graphical inference for Infovis | |
| 05 (Aug 21) | A; B | | Working with a single variable, making transformations, detecting outliers, using robust statistics | Unwin (2015) Graphical Data Analysis Ch 3-4; Wilke (2019) Ch 6 Visualizing Amounts; Ch 7 Visualizing distributions | Assignment 1 (individual) due on Fri Aug 25, 4:30pm |
| 06 (Aug 28) | A; B | | Bivariate dependencies and relationships, transformations to linearise | Unwin (2015) Graphical Data Analysis Ch 5; Wilke (2019) Ch 12 Visualising associations | |
| 07 (Sep 4) | A; B | | Making comparisons between groups and strata | Wilke (2019) Ch 9, 10.2-4, 11.2; Unwin (2015) Graphical Data Analysis Ch 10 | |
| 08 (Sep 11) | A; B | | Going beyond two variables, exploring high dimensions | Unwin (2015) Graphical Data Analysis Ch 6; Cook and Laa (2023) Interactively exploring high-dimensional data and models in R Chapter 1 | Assignment 2 (individual) due on Fri Sep 15, 4:30pm |
| 09 (Sep 18) | A; B | | Exploring data having a space and time context Part I | Reintroducing tsibble: data tools that melt the clock; brolgar: An R package to BRowse Over Longitudinal Data Graphically and Analytically in R; Listen to Nick talking about longitudinal data; Unwin (2015) Graphical Data Analysis Ch 11 | |
| Break | | | Mid-semester break (1 week): no lectures or tutorials | | |
| 10 (Oct 2) | A; B | | Exploring data having a space and time context Part II | Moraga (2019) Spatial data and R packages for mapping; cubble: A Vector Spatio-Temporal Data Structure for Data Analysis; Making maps plot faster Simplify spatial polygons; sf: Simple Features for R | Assignment 3 (part 1) due on Fri Oct 6, 4:30pm |
| 11 (Oct 9) | A; B | | Sculpting data using models, checking assumptions, co-dependency and performing diagnostics | Cook & Weisberg (1994) An Introduction to Regression Graphics Ch 6; Cleveland (1993) Visualising Data Ch 4; How to use a tour to check if your model suffers from multicollinearity | |
| 12 (Oct 16) | | | Extending beyond the data, what can and cannot be inferred more generally, given the data collection | | Assignment 3 (part 2) due on Fri Oct 27, 4:30pm |


Learning outcomes

On successful completion of this unit, you should be able to:

  1. use modern data exploration tools with real data to uncover interesting structure and unusual observations

  2. map out appropriate analyses, and define what you would expect to see in the data

  3. compute null samples in order to test apparent patterns, and interpret the results of visual inference

  4. critically assess the strength and adequacy of a data analysis.