class: middle center hide-slide-number monash-bg-gray80 .info-box.w-50.bg-white[ These slides are viewed best by Chrome or Firefox and occasionally need to be refreshed if elements did not load properly. See <a href=lecture-09B.pdf>here for the PDF <i class="fas fa-file-pdf"></i></a>. ] <br> .white[Press the **right arrow** to progress to the next slide!] --- class: title-slide count: false background-image: url("images/bg-12.png") # .monash-blue[ETC5521: Exploratory Data Analysis] <h1 class="monash-blue" style="font-size: 30pt!important;"></h1> <br> <h2 style="font-weight:900!important;">Exploring data having a space and time context</h2> .bottom_abs.width100[ Lecturer: *Di Cook* <i class="fas fa-envelope"></i> ETC5521.Clayton-x@monash.edu <i class="fas fa-calendar-alt"></i> Week 9 - Session 2 <br> ] <style type="text/css"> .gray80 { color: #505050!important; font-weight: 300; } .bg-gray80 { background-color: #DCDCDC!important; } </style> --- # Outline - temporal missing values: time series models require that there is a value for each time step - longitudinal data: it is different from time series. Typically measurements are taken irregularly in time. --- class: transition ## Working with missings --- # Checking counting and filling missings in time .flex[ .w-45[ .s500.f5[ ```r set.seed(328) harvest <- tsibble( year = c(2010, 2011, 2013, 2011, 2012, 2013), fruit = rep(c("kiwi", "cherry"), each = 3), kilo = sample(1:10, size = 6), key = fruit, index = year ) harvest ``` ``` ## # A tsibble: 6 x 3 [1Y] ## # Key: fruit [2] ## year fruit kilo ## <dbl> <chr> <int> ## 1 2011 cherry 2 ## 2 2012 cherry 7 ## 3 2013 cherry 1 ## 4 2010 kiwi 6 ## 5 2011 kiwi 5 ## 6 2013 kiwi 8 ``` ]] .w-10[ .white[...] ] .w-45[ ```r *has_gaps(harvest, .full = TRUE) ``` ``` ## # A tibble: 2 × 2 ## fruit .gaps ## <chr> <lgl> ## 1 cherry TRUE ## 2 kiwi TRUE ``` <br> Both levels of the key have missings. <br><br> .monash-orange2[Can you see the gaps in time?] ]] --- # Checking counting and filling missings in time .flex[ .w-45[ .s500.f5[ ```r set.seed(328) harvest <- tsibble( year = c(2010, 2011, 2013, 2011, 2012, 2013), fruit = rep(c("kiwi", "cherry"), each = 3), kilo = sample(1:10, size = 6), key = fruit, index = year ) harvest ``` ``` ## # A tsibble: 6 x 3 [1Y] ## # Key: fruit [2] ## year fruit kilo ## <dbl> <chr> <int> ## 1 2011 cherry 2 ## 2 2012 cherry 7 ## 3 2013 cherry 1 ## 4 2010 kiwi 6 ## 5 2011 kiwi 5 ## 6 2013 kiwi 8 ``` ] ] .w-10[ .white[...] ] .w-45[ ```r *count_gaps(harvest, .full=TRUE) ``` ``` ## # A tibble: 2 × 4 ## fruit .from .to .n ## <chr> <dbl> <dbl> <int> ## 1 cherry 2010 2010 1 ## 2 kiwi 2012 2012 1 ``` <br> One missing in each level, although it is a different year. <br> <br> Notice how `tsibble` handles this summary so neatly. ]] --- # Checking counting and filling missings in time .flex[ .w-45[ .s500.f5[ ```r set.seed(328) harvest <- tsibble( year = c(2010, 2011, 2013, 2011, 2012, 2013), fruit = rep(c("kiwi", "cherry"), each = 3), kilo = sample(1:10, size = 6), key = fruit, index = year ) harvest ``` ``` ## # A tsibble: 6 x 3 [1Y] ## # Key: fruit [2] ## year fruit kilo ## <dbl> <chr> <int> ## 1 2011 cherry 2 ## 2 2012 cherry 7 ## 3 2013 cherry 1 ## 4 2010 kiwi 6 ## 5 2011 kiwi 5 ## 6 2013 kiwi 8 ``` ] ] .w-10[ .white[...] ] .w-45[ Make the implicit missing values .monash-orange2[explicit]. <br> .s400.f5[ ```r *harvest <- fill_gaps(harvest, * .full=TRUE) harvest ``` ``` ## # A tsibble: 8 x 3 [1Y] ## # Key: fruit [2] ## year fruit kilo ## <dbl> <chr> <int> *## 1 2010 cherry NA ## 2 2011 cherry 2 ## 3 2012 cherry 7 ## 4 2013 cherry 1 ## 5 2010 kiwi 6 ## 6 2011 kiwi 5 *## 7 2012 kiwi NA ## 8 2013 kiwi 8 ``` ] ]] --- # Checking counting and filling missings in time .flex[ .w-45[ .s500[ ```r set.seed(328) harvest <- tsibble( year = c(2010, 2011, 2013, 2011, 2012, 2013), fruit = rep(c("kiwi", "cherry"), each = 3), kilo = sample(1:10, size = 6), key = fruit, index = year ) harvest ``` ] ] .w-10[ .white[...] ] .w-45[ .s500.f5[ ```r harvest_nomiss <- harvest %>% * group_by(fruit) %>% * mutate(kilo = * na_interpolation(kilo)) %>% ungroup() harvest_nomiss ``` ``` ## # A tsibble: 8 x 3 [1Y] ## # Key: fruit [2] ## year fruit kilo ## <dbl> <chr> <dbl> *## 1 2010 cherry 2 ## 2 2011 cherry 2 ## 3 2012 cherry 7 ## 4 2013 cherry 1 ## 5 2010 kiwi 6 ## 6 2011 kiwi 5 *## 7 2012 kiwi 6.5 ## 8 2013 kiwi 8 ``` ] ]] --- # .orange[Case study] .bg-orange.circle[3] Melbourne pedestrian traffic .font_small[Part 1/5] .flex[ .w-45[ .s500.f5[ ```r data(pedestrian) # in tsibble has_gaps(pedestrian, .full = TRUE) ``` ``` ## # A tibble: 4 × 2 ## Sensor .gaps ## <chr> <lgl> ## 1 Birrarung Marr TRUE ## 2 Bourke Street Mall (North) TRUE ## 3 QV Market-Elizabeth St (West) TRUE ## 4 Southern Cross Station TRUE ``` ```r ped_gaps <- pedestrian %>% count_gaps(.full = TRUE) ``` ```r ggplot(ped_gaps, aes(x = Sensor, colour = Sensor)) + geom_linerange( aes(ymin = .from, ymax = .to), size=2) + geom_point(aes(y = .from), size=4) + geom_point(aes(y = .to), size=4) + xlab("") + coord_flip() + scale_y_datetime("", date_breaks = "4 months", date_labels = "%y-%m", date_minor_breaks = "1 month") + scale_color_brewer("", palette="Dark2") + theme(legend.position = "none") ``` ] ] .w-5[ .white[...] ] .w-50[ <img src="images/lecture-09B/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" /> Every sensor has a missing value each April. .monash-orange2[What happens in April each year]? ]] --- # .orange[Case study] .bg-orange.circle[3] Melbourne pedestrian traffic .font_small[Part 2/5] .panelset[ .panel[.panel-name[🖼️] Missings at the end of the year at QV market. <img src="images/lecture-09B/ped_missing3-1.png" width="80%" style="display: block; margin: auto;" /> ] .panel[.panel-name[R] ```r ped_full %>% filter(month(Date_Time) == 12, year(Date_Time) == 2015, mday(Date_Time) > 24) %>% ggplot(aes(x=Date_Time, y=Count, colour=Sensor, group=Sensor)) + geom_line() + geom_point() + xlab("") + facet_wrap(~Sensor, ncol=2, scales="free_y") + scale_color_brewer("", palette="Dark2") + theme(legend.position = "none") ``` ] ] --- # .orange[Case study] .bg-orange.circle[3] Melbourne pedestrian traffic .font_small[Part 3/5] .panelset[ .panel[.panel-name[🖼️] <img src="images/lecture-09B/ped_missing4-1.png" width="80%" style="display: block; margin: auto;" /> Imputed with seasonal component. ] .panel[.panel-name[R] .s500.f5[ ```r ped_nomiss <- ped_full %>% group_by(Sensor) %>% mutate(Count_int = na_interpolation(Count), Count_mean = na_mean(Count), Count_rand = na_random(Count), Count_ma = na_ma(Count, k=24), Count_loc = na_locf(Count), Count_seas = na_seadec(Count, find_frequency = TRUE)) ped_nomiss %>% filter(month(Date_Time) == 12, year(Date_Time) == 2015, mday(Date_Time) > 24) %>% mutate(is_miss = ifelse(is.na(Count), "missing", "not")) %>% ggplot(aes(x=Date_Time, y=Count_seas, group=Sensor, colour = is_miss)) + geom_line() + geom_point() + facet_wrap(~Sensor, ncol=2, scales="free_y") + scale_color_brewer("", palette="Paired", direction = -1) + ylab("Count") + xlab("") + theme(legend.position = c(0.4, 1.0)) ``` ] ] ] --- # .orange[Case study] .bg-orange.circle[3] Melbourne pedestrian traffic .font_small[Part 4/5] .panelset[ .panel[.panel-name[🖼️] <img src="images/lecture-09B/ped_missing5-1.png" width="80%" style="display: block; margin: auto;" /> Missings in November at Birrarung Marr. ] .panel[.panel-name[R] ```r ped_full %>% filter(month(Date_Time) == 11, year(Date_Time) == 2015, mday(Date_Time) < 8) %>% ggplot(aes(x=Date_Time, y=Count, colour=Sensor, group=Sensor)) + geom_line() + geom_point() + xlab("") + facet_wrap(~Sensor, ncol=2, scales="free_y") + scale_color_brewer("", palette="Dark2") + theme(legend.position = "none") ``` ] ] --- # .orange[Case study] .bg-orange.circle[3] Melbourne pedestrian traffic .font_small[Part 5/5] .panelset[ .panel[.panel-name[🖼️] <img src="images/lecture-09B/ped_missing6-1.png" width="80%" style="display: block; margin: auto;" /> Imputed with seasonal component. Irregular patterns make imputation difficult. ] .panel[.panel-name[R] ```r ped_nomiss %>% filter(month(Date_Time) == 11, year(Date_Time) == 2015, mday(Date_Time) < 8) %>% mutate(is_miss = ifelse(is.na(Count), "missing", "not")) %>% ggplot(aes(x=Date_Time, y=Count_seas, group=Sensor, colour = is_miss)) + geom_line() + geom_point() + facet_wrap(~Sensor, ncol=2, scales="free_y") + scale_color_brewer("", palette="Paired", direction = -1) + ylab("Count") + xlab("") + theme(legend.position = c(0.4, 1.0)) ``` ] ] --- class: middle <p style="padding-left: 20%"> .info-box.w-60[Imputing temporal data is necessary for modeling and forecasting, which typically require complete data. Incorporate seasonal components, if necessary, and temporal dependence. That means you need to .monash-orange2[understand enough about the data to do imputation well].] </p> --- class: transition ## Longitudinal data Information from the same individuals, recorded at multiple points in time. Usually irregular, and not easy to regularise. Lots more short series. Longitudinal data has the .monash-yellow2[same properties] as time series, but generally .monash-yellow2[different objectives] for the analysis. --- In the `brolgar` package methods build on the `tsibble` data object. ``` *## # A tsibble: 6,402 x 9 [!] *## # Key: id [888] ## id ln_wages xp ged xp_since_ged black hispanic high_grade unemploy_rate ## <int> <dbl> <dbl> <int> <dbl> <int> <int> <int> <dbl> ## 1 31 1.49 0.015 1 0.015 0 1 8 3.21 ## 2 31 1.43 0.715 1 0.715 0 1 8 3.21 ## 3 31 1.47 1.73 1 1.73 0 1 8 3.21 ## 4 31 1.75 2.77 1 2.77 0 1 8 3.3 ## 5 31 1.93 3.93 1 3.93 0 1 8 2.89 ## 6 31 1.71 4.95 1 4.95 0 1 8 2.49 ## 7 31 2.09 5.96 1 5.96 0 1 8 2.6 ## 8 31 2.13 6.98 1 6.98 0 1 8 4.8 ## 9 36 1.98 0.315 1 0.315 0 0 9 4.89 ## 10 36 1.80 0.983 1 0.983 0 0 9 7.4 ## # ℹ 6,392 more rows ``` --- # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 1/15] .pull-left[ ```r wages %>% ggplot(aes(x = xp, y = ln_wages, group = id)) + geom_line(alpha=0.3) ``` Log(wages) of 888 individuals, measured at various times in their employment (workforce experience). ] .pull-right[ <img src="images/lecture-09B/unnamed-chunk-11-1.png" width="100%" style="display: block; margin: auto;" /> ] --- class: middle center # from a spaghetti mess <img src="images/week9B/spaghetti_mess.gif" width="640" height="480"> .footnote[Source: giphy] --- class: middle center count: false # to controlled spaghetti handling <img src="images/week9B/spaghetti_clean.gif" width="640" height="480"> .footnote[Source: giphy] --- class: middle center count: false # to perfection <img src="images/week9B/spaghetti_perfect.gif" width="640" height="480"> .footnote[Source: giphy] --- # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 2/15] .pull-left[ Using features, compute the number of measurements for each subject ```r wages %>% * features(ln_wages, n_obs) %>% ggplot(aes(x = n_obs)) + geom_bar() + xlab("Number of observations") ``` Different number of observations per person! It ranges from 1-13. .info-box[Too few observations means there is a lack of support to do temporal analysis.] ] .pull-right[ <img src="images/lecture-09B/unnamed-chunk-12-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 3/15] .pull-left[ You should filter on this, and remove subjects with few observations. <br><br> ```r *wages <- wages %>% add_n_obs() wages %>% * filter(n_obs > 3) %>% select(id, ln_wages, xp, n_obs) ``` ] .pull-right[ ``` ## # A tsibble: 6,145 x 4 [!] *## # Key: id [764] ## id ln_wages xp n_obs ## <int> <dbl> <dbl> <int> ## 1 31 1.49 0.015 8 ## 2 31 1.43 0.715 8 ## 3 31 1.47 1.73 8 ## 4 31 1.75 2.77 8 ## 5 31 1.93 3.93 8 ## 6 31 1.71 4.95 8 ## 7 31 2.09 5.96 8 ## 8 31 2.13 6.98 8 ## 9 36 1.98 0.315 10 ## 10 36 1.80 0.983 10 ## # ℹ 6,135 more rows ``` ] --- # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 4/15] .pull-left[ Using features to extract minimum time <br><br> ```r wages %>% * features(xp, list(min = min)) %>% ggplot(aes(x = min)) + geom_histogram(binwidth=0.5) + xlim(c(0, 13)) + xlab("First time in study") ``` <br><br> Subjects start in the study at different employment experience times, ranging from 0 to more than 10 years. ] .pull-right[ <img src="images/lecture-09B/unnamed-chunk-14-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 5/15] .pull-left[ Using features to extract range of time index <br><br> ```r wages_xp_range <- wages %>% * features(xp, feat_ranges) ggplot(wages_xp_range, aes(x = range_diff)) + geom_histogram() + xlab("Range of experience") ``` There's a range of workforce experience. ] .pull-right[ <img src="images/lecture-09B/unnamed-chunk-15-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 6/15] .pull-left[ .monash-orange2[Small spoonfuls of spaghetti] Sample some individuals <br><br> ```r wages %>% * sample_n_keys(size = 10) %>% ggplot(aes(x = xp, y = ln_wages, group = id)) + geom_line() + xlim(c(0,13)) + ylim(c(0, 4.5)) + xlab("Years of experience") + ylab("Log wages") ``` Wages conversion 0.5 = $1.65; 4.5 = $90 ] .pull-right[ <img src="images/lecture-09B/unnamed-chunk-17-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 6/15] .pull-left[ .monash-orange2[Small spoonfuls of spaghetti] Sample some individuals <br><br> ```r wages %>% * sample_n_keys(size = 10) %>% ggplot(aes(x = xp, y = ln_wages, group = id)) + geom_line() + xlim(c(0,13)) + ylim(c(0, 4.5)) + xlab("Years of experience") + ylab("Log wages") ``` Wages conversion 0.5 = $1.65; 4.5 = $90 ] .pull-right[ <img src="images/lecture-09B/unnamed-chunk-19-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 6/15] .pull-left[ .monash-orange2[Small spoonfuls of spaghetti] Sample some individuals <br><br> ```r wages %>% * sample_n_keys(size = 10) %>% ggplot(aes(x = xp, y = ln_wages, group = id)) + geom_line() + xlim(c(0,13)) + ylim(c(0, 4.5)) + xlab("Years of experience") + ylab("Log wages") ``` Wages conversion 0.5 = $1.65; 4.5 = $90 ] .pull-right[ <img src="images/lecture-09B/unnamed-chunk-21-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 7/15] .pull-left[ .monash-orange2[Take a spoonful of different lengths] Sample experienced individuals <br><br> ```r wages %>% add_n_obs() %>% * filter(n_obs > 7) %>% sample_n_keys(size = 10) %>% ggplot(aes(x = xp, y = ln_wages, group = id)) + geom_line() + xlim(c(0,13)) + ylim(c(0, 4.5)) + xlab("Years of experience") + ylab("Log wages") ``` Wages conversion 0.5 = $1.65; 4.5 = $90 ] .pull-right[ <img src="images/lecture-09B/unnamed-chunk-22-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 7/15] .pull-left[ .monash-orange2[Take a spoonful of different lengths] Sample experienced individuals <br><br> ```r wages %>% add_n_obs() %>% * filter(n_obs > 7) %>% sample_n_keys(size = 10) %>% ggplot(aes(x = xp, y = ln_wages, group = id)) + geom_line() + xlim(c(0,13)) + ylim(c(0, 4.5)) + xlab("Years of experience") + ylab("Log wages") ``` Wages conversion 0.5 = $1.65; 4.5 = $90 ] .pull-right[ <img src="images/lecture-09B/unnamed-chunk-24-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 7/15] .pull-left[ .monash-orange2[Take a spoonful of different lengths] Sample experienced individuals <br><br> ```r wages %>% add_n_obs() %>% * filter(n_obs > 7) %>% sample_n_keys(size = 10) %>% ggplot(aes(x = xp, y = ln_wages, group = id)) + geom_line() + xlim(c(0,13)) + ylim(c(0, 4.5)) + xlab("Years of experience") + ylab("Log wages") ``` Wages conversion 0.5 = $1.65; 4.5 = $90 ] .pull-right[ <img src="images/lecture-09B/unnamed-chunk-26-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 8/15] .panelset[ .panel[.panel-name[🖼️] <img src="images/lecture-09B/wages_facet-1.png" width="70%" style="display: block; margin: auto;" /> ] .panel[.panel-name[info] <br><br><br> Two sampling tools - `facet_strata`: show the whole pot, neatly separated into equally portioned - `facet_sample`: show most of the pot in neatly separated portions ] .panel[.panel-name[R] ```r wages %>% filter(n_obs > 1) %>% ggplot(aes(x = xp, y = ln_wages, group = id)) + geom_line() + * facet_sample(n_per_facet = 3, * n_facets = 20) + xlab("Years of experience") + ylab("Log wages") ``` ] ] --- class: transition ## Special features --- ## Special features Remember scagnostics? Compute longnostics for each subject - Slope, intercept from simple linear model - Variance, standard deviation - Jumps, differences For large collections of time series, take a look at the `feasts` package, which has a long list of time series features (tignostics) to calculate. --- # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 9/15] .pull-left[ .monash-orange2[increasing] ```r wages_slope <- wages %>% add_n_obs() %>% filter(n_obs > 4) %>% * add_key_slope(ln_wages ~ xp) %>% as_tsibble(key = id, index = xp) wages_slope %>% * filter(.slope_xp > 0.4) %>% ggplot(aes(x = xp, y = ln_wages, group = id)) + geom_line() + ylim(c(0, 4.5)) + xlab("Years of experience") + ylab("Log wages") ``` ] .pull-right[ <img src="images/lecture-09B/unnamed-chunk-28-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 10/15] .pull-left[ .monash-orange2[decreasing] ```r wages_slope %>% * filter(.slope_xp < (-0.7)) %>% ggplot(aes(x = xp, y = ln_wages, group = id)) + geom_line() + ylim(c(0, 4.5)) + xlab("Years of experience") + ylab("Log wages") ``` ] .pull-right[ <img src="images/lecture-09B/unnamed-chunk-29-1.png" width="100%" style="display: block; margin: auto;" /> ] --- class: transition ## Longitudinal data needs a special five number summary --- # Summarising individuals A different style of five number summary Who is average? Who is different? <br><br> Find those individuals who are .monash-orange2[representative] of the min, median, maximum, etc of a particular feature, e.g. trend, using `keys_near()`. This reports the individual who is closest to a particular statistic. <br><br> `wages_threenum()` returns the three individuals: min, max and closest to the median value. `wages_fivenum()` returns the five individuals: min, max and closest to the median, Q1 and Q3 values. --- # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 11/15] .pull-left[ ```r wages_threenum <- wages %>% add_n_obs() %>% filter(n_obs > 4) %>% key_slope(ln_wages ~ xp) %>% * keys_near(key = id, * var = .slope_xp, * funs = l_three_num) %>% left_join(wages, by = "id") %>% as_tsibble(key = id, index = xp) ``` .info-box[Minimum/maximum are short series with substantial decline/incline. Median is very flat, no change in real wages.] ] ] .pull-right[ <img src="images/lecture-09B/three_number_plot-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 12/15] .pull-left[ .f5[ ```r wages_fivenum <- wages %>% add_n_obs() %>% filter(n_obs > 6) %>% key_slope(ln_wages ~ xp) %>% keys_near(key = id, var = .slope_xp, * funs = l_five_num) %>% left_join(wages, by = "id") %>% as_tsibble(key = id, index = xp) ``` ] ] .pull-right[ .info-box[Q1 and Q3 are also flat which means that, at least, 50% of the individuals experience no real change in wage.] ] <br><br> <img src="images/lecture-09B/five_number_plot-1.png" width="90%" style="display: block; margin: auto;" /> --- # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 13/15] .pull-left[ .monash-orange2[Sculpting spaghetti] Mixed effects model, education as fixed effect, subject random effect using slope. <br><br> .f5[ ```r wages_fit_int <- * lmer(ln_wages ~ xp + high_grade + * (xp |id), data = wages) wages_aug <- wages %>% add_predictions(wages_fit_int, var = "pred_int") %>% add_residuals(wages_fit_int, var = "res_int") ``` ] ] .pull-right[ <img src="images/lecture-09B/model_plot-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 13/15] .pull-left[ .monash-orange2[Sculpting spaghetti] Mixed effects model, education as fixed effect, subject random effect using slope. <br><br> .f5[ ```r wages_fit_int <- * lmer(ln_wages ~ xp + high_grade + * (xp |id), data = wages) wages_aug <- wages %>% add_predictions(wages_fit_int, var = "pred_int") %>% add_residuals(wages_fit_int, var = "res_int") ``` ] ] .pull-right[ <img src="images/lecture-09B/res_plot-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 14/15] .panelset[ .panel[.panel-name[🖼️] Model diagnostics: Sample individuals and plot model on the data. .monash-orange2[Notice individual 5630.] <img src="images/lecture-09B/model-data-1.png" width="80%" style="display: block; margin: auto;" /> ] .panel[.panel-name[R] ```r set.seed(1) wages_aug %>% add_n_obs() %>% filter(n_obs > 4) %>% sample_n_keys(size = 12) %>% ggplot() + * geom_line(aes(x = xp, y = pred_int, group = id, colour = factor(id))) + * geom_point(aes(x = xp, y = ln_wages, colour = factor(id))) + scale_colour_ochre(palette = "emu_woman_paired") + facet_wrap(~id, ncol=4) + xlab("Years of experience") + ylab("Log wages") + theme(legend.position = "none") ``` ] ] --- # .orange[Case study] .bg-orange.circle[4] Wages .font_small[Part 15/15] <br><br> ## What we learn about wages that we would not have learned without doing EDA - The individual wage experience is extremely varied - Some individuals see a decline in their wages the longer they are in the workforce - Most individuals generally see some (small) increase, on average <br><br> .info-box[Exploratory analysis of individual temporal patterns can be very really interesting!] --- class: middle .info-box[The main difference between .monash-green2[time series] data and .monash-purple2[longitudinal] data, is the former is typically regular, complete, may be only one or a few, and the latter is typically of different lengths, different time measurements and a lot.] <img src="images/lecture-09B/ts-visual-1.png" width="100%" style="display: block; margin: auto;" /> --- # Resources and Acknowledgement - Tidy tools for time series [tidyverts](https://tidyverts.org) - Imputing missings in time using [imputeTS](https://cran.r-project.org/web/packages/imputeTS/vignettes/imputeTS-Time-Series-Missing-Value-Imputation-in-R.pdf) - [Temporal missings](https://tsibble.tidyverts.org/articles/implicit-na.html) in `tsibble` - [Longitudinal data exploration](http://brolgar.njtierney.com/index.html) with `brolgar` - Data coding using [`tidyverse` suite of R packages](https://www.tidyverse.org) - Slides constructed with [`xaringan`](https://github.com/yihui/xaringan), [remark.js](https://remarkjs.com), [`knitr`](http://yihui.name/knitr), and [R Markdown](https://rmarkdown.rstudio.com). --- background-size: cover class: title-slide background-image: url("images/bg-12.png") <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>. .bottom_abs.width100[ Lecturer: *Di Cook* <i class="fas fa-envelope"></i> ETC5521.Clayton-x@monash.edu <i class="fas fa-calendar-alt"></i> Week 9 - Session 2 <br> ]