ETC5521 Tutorial 7

Bivariate dependencies and relationships, transformations to linearise


Prof. Di Cook


September 7, 2023

🎯 Objectives

These are exercises in making scatterplots and variations to examine association between two variables, and to make practice using transformations.

🔧 Preparation

The reading for this week is Wilke (2019) Ch 12 Visualizing associations. - Complete the weekly quiz, before the deadline! - Install the following R-packages if you do not have them already:

install.packages(c("VGAMdata", "Sleuth2", "colorspace", "nullabor", "broom", "patchwork"))
  • Open your RStudio Project for this unit, (the one you created in week 1, eda or ETC5521). Create a .Rmd document for this week’s activities.
  • Note for Tutors: Randomly allocate one of the problems to each student. Give them 30-40 minutes to work on the problem. Then in the last 30 minutes discuss each one, and the findings with entire tutorial.

Exercise 1: Olympics

We have seen from the lecture that the Athletics category has too many different types of athletics in it for it to be a useful group for studying height and weight. There is another variable called Event which contains more specific information.

# Read-in data
data(oly12, package = "VGAMdata")
  1. Tabulate Event for just the Sport category Athletics, and decide which new categories to create.
  1. Create the new categories, in steps, creating a new binary variable for each. The function str_detect is useful for searching for text patterns in a string. It also helps to know about regular expressions to work with strings like this. And there are two sites, which are great for learning: Regex puzzles, Information and testing board
  1. Make several plots to explore the association between height and weight for the different athletic categories, eg scatterplots faceted by sex and event type, with/without free scales, linear models for the different subsets, overlaid on the same plot, 2D density plots faceted by sex and event type, with free scales.
  1. List what you learned about body types across the different athletics types and sexes.
  1. If one were use visual inference to check for a different relationship between height and weight across sports how would you generate null data? Do it, and test your lineup with others in the class.

Exercise 2: Fisherman’s Reach crabs

Mud crabs are delicious to eat! Prof Cook’s father started a crab farm at Fisherman’s Reach, NSW, when he retired. He caught small crabs (with a special license) and nurtured and fed the crabs until they were marketable size. They were then sent to market, like Queen Victoria Market in Melbourne, for people to buy to eat. Mud crabs have a strong and nutty flavour, and a good to eat simply after steaming or boiling.

Early in the farming setup, he collected the measurements of 62 crabs of different sizes, because he wanted to learn when was the best time to send the crab to market. Crabs re-shell from time to time. They grow too big for their shell, and need to discard it. Ideally, the crabs should be sent to market just before they re-shell, because the will be crab will be fuller in the shell, less air, less juice and more crab meat.

Note: In NSW it is legal to sell female mud crabs, as long as they are not carrying eggs. In Queensland, it is illegal to keep and sell female mud crabs. Focusing only on males could be worthwhile.