class: middle center hide-slide-number monash-bg-gray80 .info-box.w-50.bg-white[ These slides are viewed best by Chrome or Firefox and occasionally need to be refreshed if elements did not load properly. See <a href=lecture-09A.pdf>here for the PDF <i class="fas fa-file-pdf"></i></a>. ] <br> .white[Press the **right arrow** to progress to the next slide!] --- class: title-slide count: false background-image: url("images/bg-12.png") # .monash-blue[ETC5521: Exploratory Data Analysis] <h1 class="monash-blue" style="font-size: 30pt!important;"></h1> <br> <h2 style="font-weight:900!important;">Exploring data having a space and time context</h2> .bottom_abs.width100[ Lecturer: *Di Cook* <i class="fas fa-envelope"></i> ETC5521.Clayton-x@monash.edu <i class="fas fa-calendar-alt"></i> Week 9 - Session 1 <br> ] <style type="text/css"> .gray80 { color: #505050!important; font-weight: 300; } .bg-gray80 { background-color: #DCDCDC!important; } </style> --- class: informative middle animated slideInLeft .pull-left[ > Time series analysis is what you do after all the interesting stuff has been done! [Heike Hofmann, 2005](https://en.wikipedia.org/wiki/Heike_Hofmann) ] .pull-right[ <img src="images/week9A/heike-headshot.png" style="width: 400px; border-radius: 50%"> ] --- # What is temporal data? 
🕰 Melbourne pedestrian sensor data <br> <div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:500px; overflow-x: scroll; width:100%; "><table> <thead> <tr> <th style="text-align:left;position: sticky; top:0; background-color: #FFFFFF;"> Sensor </th> <th style="text-align:left;position: sticky; top:0; background-color: #FFFFFF;"> Date_Time </th> <th style="text-align:left;position: sticky; top:0; background-color: #FFFFFF;"> Date </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Time </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Count </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-02-14 22:00:00 </td> <td style="text-align:left;"> 2015-02-14 </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 7081 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-02-21 21:00:00 </td> <td style="text-align:left;"> 2015-02-21 </td> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 8363 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-02-21 22:00:00 </td> <td style="text-align:left;"> 2015-02-21 </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 9658 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-02-21 23:00:00 </td> <td style="text-align:left;"> 2015-02-21 </td> <td style="text-align:right;"> 23 </td> <td style="text-align:right;"> 10121 </td> </tr> <tr> <td 
style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-02-22 00:00:00 </td> <td style="text-align:left;"> 2015-02-22 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 8441 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-03-07 20:00:00 </td> <td style="text-align:left;"> 2015-03-07 </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 7144 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-03-07 21:00:00 </td> <td style="text-align:left;"> 2015-03-07 </td> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 7238 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-03-08 13:00:00 </td> <td style="text-align:left;"> 2015-03-08 </td> <td style="text-align:right;"> 13 </td> <td style="text-align:right;"> 7092 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-03-08 14:00:00 </td> <td style="text-align:left;"> 2015-03-08 </td> <td style="text-align:right;"> 14 </td> <td style="text-align:right;"> 7031 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-03-08 15:00:00 </td> <td style="text-align:left;"> 2015-03-08 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 6951 </td> </tr> <tr> <td style="text-align:left;"> Birrarung 
Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-03-08 16:00:00 </td> <td style="text-align:left;"> 2015-03-08 </td> <td style="text-align:right;"> 16 </td> <td style="text-align:right;"> 7167 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-03-08 17:00:00 </td> <td style="text-align:left;"> 2015-03-08 </td> <td style="text-align:right;"> 17 </td> <td style="text-align:right;"> 7246 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-03-08 18:00:00 </td> <td style="text-align:left;"> 2015-03-08 </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 7122 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-03-08 19:00:00 </td> <td style="text-align:left;"> 2015-03-08 </td> <td style="text-align:right;"> 19 </td> <td style="text-align:right;"> 7565 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-03-08 20:00:00 </td> <td style="text-align:left;"> 2015-03-08 </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 8121 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-03-08 21:00:00 </td> <td style="text-align:left;"> 2015-03-08 </td> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 7330 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td 
style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-03-09 13:00:00 </td> <td style="text-align:left;"> 2015-03-09 </td> <td style="text-align:right;"> 13 </td> <td style="text-align:right;"> 7413 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-03-09 14:00:00 </td> <td style="text-align:left;"> 2015-03-09 </td> <td style="text-align:right;"> 14 </td> <td style="text-align:right;"> 7665 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-03-09 15:00:00 </td> <td style="text-align:left;"> 2015-03-09 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 6954 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-03-29 13:00:00 </td> <td style="text-align:left;"> 2015-03-29 </td> <td style="text-align:right;"> 13 </td> <td style="text-align:right;"> 8919 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-03-29 22:00:00 </td> <td style="text-align:left;"> 2015-03-29 </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 9858 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-04-25 17:00:00 </td> <td style="text-align:left;"> 2015-04-25 </td> <td style="text-align:right;"> 17 </td> <td style="text-align:right;"> 7293 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: 
italic;color: white !important;background-color: #D93F00 !important;"> 2015-06-17 22:00:00 </td> <td style="text-align:left;"> 2015-06-17 </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 7556 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-07-18 21:00:00 </td> <td style="text-align:left;"> 2015-07-18 </td> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 9318 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-07-24 18:00:00 </td> <td style="text-align:left;"> 2015-07-24 </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 7426 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-07-24 22:00:00 </td> <td style="text-align:left;"> 2015-07-24 </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 11224 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-07-26 09:00:00 </td> <td style="text-align:left;"> 2015-07-26 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 6949 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-07-26 10:00:00 </td> <td style="text-align:left;"> 2015-07-26 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 8263 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white 
!important;background-color: #D93F00 !important;"> 2015-07-26 11:00:00 </td> <td style="text-align:left;"> 2015-07-26 </td> <td style="text-align:right;"> 11 </td> <td style="text-align:right;"> 7124 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-09-12 22:00:00 </td> <td style="text-align:left;"> 2015-09-12 </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 7757 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-10-03 12:00:00 </td> <td style="text-align:left;"> 2015-10-03 </td> <td style="text-align:right;"> 12 </td> <td style="text-align:right;"> 6974 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2015-10-03 18:00:00 </td> <td style="text-align:left;"> 2015-10-03 </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 7085 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2016-02-20 21:00:00 </td> <td style="text-align:left;"> 2016-02-20 </td> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 9579 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2016-02-20 22:00:00 </td> <td style="text-align:left;"> 2016-02-20 </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 11121 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 
!important;"> 2016-02-20 23:00:00 </td> <td style="text-align:left;"> 2016-02-20 </td> <td style="text-align:right;"> 23 </td> <td style="text-align:right;"> 11273 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2016-02-21 00:00:00 </td> <td style="text-align:left;"> 2016-02-21 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 9201 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2016-02-21 01:00:00 </td> <td style="text-align:left;"> 2016-02-21 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 7678 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2016-03-12 20:00:00 </td> <td style="text-align:left;"> 2016-03-12 </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 6957 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2016-03-12 21:00:00 </td> <td style="text-align:left;"> 2016-03-12 </td> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 7179 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2016-03-13 13:00:00 </td> <td style="text-align:left;"> 2016-03-13 </td> <td style="text-align:right;"> 13 </td> <td style="text-align:right;"> 7112 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2016-03-13 16:00:00 
</td> <td style="text-align:left;"> 2016-03-13 </td> <td style="text-align:right;"> 16 </td> <td style="text-align:right;"> 7049 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2016-03-13 20:00:00 </td> <td style="text-align:left;"> 2016-03-13 </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 8033 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2016-03-13 21:00:00 </td> <td style="text-align:left;"> 2016-03-13 </td> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 7901 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2016-03-14 14:00:00 </td> <td style="text-align:left;"> 2016-03-14 </td> <td style="text-align:right;"> 14 </td> <td style="text-align:right;"> 7286 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2016-07-24 10:00:00 </td> <td style="text-align:left;"> 2016-07-24 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 7158 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2016-09-16 18:00:00 </td> <td style="text-align:left;"> 2016-09-16 </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 7993 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2016-10-01 11:00:00 </td> <td style="text-align:left;"> 
2016-10-01 </td> <td style="text-align:right;"> 11 </td> <td style="text-align:right;"> 7038 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2016-10-01 12:00:00 </td> <td style="text-align:left;"> 2016-10-01 </td> <td style="text-align:right;"> 12 </td> <td style="text-align:right;"> 8591 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2016-10-01 18:00:00 </td> <td style="text-align:left;"> 2016-10-01 </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 9716 </td> </tr> <tr> <td style="text-align:left;"> Birrarung Marr </td> <td style="text-align:left;font-style: italic;color: white !important;background-color: #D93F00 !important;"> 2016-10-16 08:00:00 </td> <td style="text-align:left;"> 2016-10-16 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 7218 </td> </tr> </tbody> </table></div> --- # What is temporal data? .flex[ .w-50[ <img src="images/lecture-09A/CO2-1.png" width="100%" style="display: block; margin: auto;" /> ] .w-50[ - Temporal data has date/time/ordering index variable, call it .monash-orange2[time]. - A time variable has special structure: - it can have *cyclical* patterns, eg seasonality (summer, winter), an over in cricket - the cyclical patterns can be *nested*, eg postcode within state, over within innings - Measurements are also .monash-orange2[NOT independent] - yesterday may influence today. 
- It still likely has .monash-orange2[non-cyclical patterns], trends and associations with other variables, eg temperature increasing over time, or whether an over is bowled by Ellyse Perry or Sophie Molineux ] ] --- # .orange[Case study] .bg-orange.circle[1] Melbourne pedestrian traffic .panelset[ .panel[.panel-name[🖼️] <br> Pedestrian counts at Southern Cross in Feb 2016 <br> <img src="images/lecture-09A/sc_ts-1.png" width="100%" style="display: block; margin: auto;" /> <br>
This is interesting!
] .panel[.panel-name[learn] <br> <br> <br> - There are similar patterns for 5 days, and then a different pattern. - In the 5 day pattern, there are two big peaks and a smaller peak. - This might be called multi-seasonality because there are two types of cyclical patterns. <br>
This is interesting!
] .panel[.panel-name[R] .s500[ ```r p <- pedestrian %>% filter(Sensor == "Southern Cross Station", year(Date) == 2016, month(Date) == 2) %>% ggplot(aes(x=Date_Time, Count)) + geom_line(size=1.1) + xlab("") + theme_bw() p + annotate("rect", xmin=ymd_hms("2016-02-01 01:00:00"), xmax=ymd_hms("2016-02-01 23:00:00"), ymin=0, ymax=3200, fill="#D93F00", alpha=0.2) + annotate("rect", xmin=ymd_hms("2016-02-02 01:00:00"), xmax=ymd_hms("2016-02-02 23:00:00"), ymin=0, ymax=3200, fill="#D93F00", alpha=0.2) + annotate("rect", xmin=ymd_hms("2016-02-03 01:00:00"), xmax=ymd_hms("2016-02-03 23:00:00"), ymin=0, ymax=3200, fill="#D93F00", alpha=0.2) + annotate("rect", xmin=ymd_hms("2016-02-04 01:00:00"), xmax=ymd_hms("2016-02-04 23:00:00"), ymin=0, ymax=3200, fill="#D93F00", alpha=0.2) + annotate("rect", xmin=ymd_hms("2016-02-05 01:00:00"), xmax=ymd_hms("2016-02-05 23:00:00"), ymin=0, ymax=3200, fill="#D93F00", alpha=0.2) + annotate("rect", xmin=ymd_hms("2016-02-06 01:00:00"), xmax=ymd_hms("2016-02-06 23:00:00"), ymin=0, ymax=3200, fill="#008A25", alpha=0.2) + annotate("rect", xmin=ymd_hms("2016-02-07 01:00:00"), xmax=ymd_hms("2016-02-07 23:00:00"), ymin=0, ymax=3200, fill="#008A25", alpha=0.2) ``` ] ] ] --- # .orange[Case study] .bg-orange.circle[1] Melbourne pedestrian traffic .panelset[ .panel[.panel-name[🖼️] <br> Pedestrian counts at Birrarung Marr in Feb 2016 <br> <img src="images/lecture-09A/bm_ts-1.png" width="100%" style="display: block; margin: auto;" /> <br>
This is interesting!
] .panel[.panel-name[learn] <br> <br> <br> - There are irregular patterns. - There may be some small (almost) regular patterns. <br>
This is interesting!
] .panel[.panel-name[R] .s500[ ```r pedestrian %>% filter(Sensor == "Birrarung Marr", year(Date) == 2016, month(Date) == 2) %>% ggplot(aes(x=Date_Time, Count)) + geom_line(size=1.1) + xlab("") + theme_bw() ``` ] ] ] --- # .orange[Case study] .bg-orange.circle[1] Melbourne pedestrian traffic .panelset[ .panel[.panel-name[🖼️] <br> What does Heike mean? <br> <img src="images/lecture-09A/arima-1.png" width="100%" style="display: block; margin: auto;" /> <br>
This is a little bit boring!
It is important for fitting a model that accounts for dependencies between measurements, though. Exploratory analysis of temporal data is interested in extracting the trend and general patterns. ] .panel[.panel-name[learn] <br> <br> <br> There is no apparent structure in this data. When you read about time series analysis, expect to see a focus on modeling this non-structure, usually called a stochastic process. There is some dependence in the measurements from one to the next, and modeling this process forms the core of most of what is called time series analysis. <br>
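
The dependence between successive measurements can be made visible with the sample autocorrelation. This is a minimal sketch in base R, simulating from the same ARMA coefficients used for the plot on this slide. The seed, the 24 lags and the shuffled comparison are assumptions made for illustration only.

```r
set.seed(2016)  # arbitrary seed, only so the sketch is reproducible

# Simulate 696 values (29 days x 24 hours) from the ARMA(2,2) process on this slide
x <- arima.sim(model = list(ar = c(0.8897, -0.4858),
                            ma = c(-0.2279, 0.2488)),
               n = 696, sd = sqrt(0.1796))

# Sample autocorrelation at lags 0 to 24; element 2 is lag 1
r <- acf(x, lag.max = 24, plot = FALSE)

# Shuffling destroys the time ordering, so the dependence should disappear
r_shuffled <- acf(sample(x), lag.max = 24, plot = FALSE)

round(c(ordered = r$acf[2], shuffled = r_shuffled$acf[2]), 2)
```

The series looks structureless to the eye, yet the lag 1 autocorrelation of the ordered series should be clearly away from zero while the shuffled one sits near zero. That is the dependence a time series model is built to describe.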
This is a little bit boring!
It is important for fitting a model that accounts for dependencies between measurements, though. Exploratory analysis of temporal data is interested in extracting the trend and general patterns. ] .panel[.panel-name[R] .s500[ ```r pedestrian %>% filter(Sensor == "Birrarung Marr", year(Date) == 2016, month(Date) == 2) %>% mutate(arima = arima.sim(n=696, list(ar = c(0.8897, -0.4858), ma = c(-0.2279, 0.2488)), sd = sqrt(0.1796))) %>% ggplot(aes(x=Date_Time, arima)) + geom_line(size=1.1) + xlab("") + theme_bw() ``` ] ] ] --- class: informative # What is exploratory analysis of time series? <br> .info-box[Exploratory analysis of time series investigates trends, cyclical and nested cyclical patterns, temporal outliers, and temporal dependence.] <br> For the pedestrian sensor data this is: - work day vs holiday pattern - daily patterns - weather and season related changes - event related patterns --- background-image: url(https://tsibble.tidyverts.org/reference/figures/logo.png) background-position: 5% 15% # `tsibble`: temporal object in R <br><br><br><br> The tsibble package provides a data infrastructure for tidy temporal data with wrangling tools. Adapting the tidy data principles, tsibble is a data- and model-oriented object. In tsibble: - Index is a variable with inherent ordering from past to present. - Key is a set of variables that define observational units over time. - Each observation should be uniquely identified by index and key. - Each observational unit should be measured at a common interval, if regularly spaced. --- # Regular vs irregular .pull-left[ The .monash-blue2[Melbourne pedestrian sensor] data has a .monash-orange2[regular] period. Counts are provided for every hour, at numerous locations. 
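
The index, key, uniqueness and common-interval properties can be checked directly. This is a minimal sketch, assuming `pedestrian` is the copy of the data bundled with the tsibble package (the slides do not say where it is loaded from). `ped_tbl` and `ped_ts` are names invented for this illustration.

```r
library(tsibble)
library(tibble)

# Assumption: `pedestrian` is the dataset shipped with tsibble
ped_tbl <- as_tibble(pedestrian)  # drop the tsibble structure, keep the columns

# Each observation should be uniquely identified by index and key
is_duplicated(ped_tbl, key = Sensor, index = Date_Time)

# Declare the index (time) and key (observational unit) explicitly
ped_ts <- as_tsibble(ped_tbl, key = Sensor, index = Date_Time)

interval(ped_ts)    # the common interval between observations
is_regular(ped_ts)  # TRUE when that interval is fixed
has_gaps(ped_ts)    # per sensor: are some hours implicitly missing?
```

A regular interval does not mean every hour is present. `has_gaps()` reports, for each sensor, whether some hours are implicitly missing.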
.s400[ ``` *## # A tsibble: 66,037 x 5 [1h] <Australia/Melbourne> ## # Key: Sensor [4] ## Sensor Date_Time Date Time Count ## <chr> <dttm> <date> <int> <int> ## 1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01 0 1630 ## 2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01 1 826 ## 3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01 2 567 ## 4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01 3 264 ## 5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01 4 139 ## 6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01 5 77 ## 7 Birrarung Marr 2015-01-01 06:00:00 2015-01-01 6 44 ## 8 Birrarung Marr 2015-01-01 07:00:00 2015-01-01 7 56 ## 9 Birrarung Marr 2015-01-01 08:00:00 2015-01-01 8 113 ## 10 Birrarung Marr 2015-01-01 09:00:00 2015-01-01 9 166 ## # ℹ 66,027 more rows ``` ] ] .pull-right[ <br> In contrast, the .monash-blue2[US flights] data, below, is .monash-orange2[irregular]. <br><br> .s400[ ``` *## # A tsibble: 336,776 x 20 [!] <UTC> ## # Key: origin, dest, carrier, tailnum [52,807] ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin dest air_time distance hour minute time_hour dt ## <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr> <int> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dttm> <dttm> ## 1 2013 1 30 2224 2000 144 2316 2101 135 EV 4309 N10575 EWR ALB 31 143 20 0 2013-01-30 20:00:00 2013-01-30 20:00:00 ## 2 2013 2 17 2012 2010 2 2120 2114 6 EV 4162 N10575 EWR ALB 33 143 20 10 2013-02-17 20:00:00 2013-02-17 20:10:00 ## 3 2013 2 26 2356 2000 236 41 2104 217 EV 4162 N10575 EWR ALB 24 143 20 0 2013-02-26 20:00:00 2013-02-26 20:00:00 ## 4 2013 3 13 1958 2005 -7 2056 2109 -13 EV 4566 N10575 EWR ALB 32 143 20 5 2013-03-13 20:00:00 2013-03-13 20:05:00 ## 5 2013 5 16 2214 2000 134 2307 2112 115 EV 4117 N10575 EWR ALB 30 143 20 0 2013-05-16 20:00:00 2013-05-16 20:00:00 ## 6 2013 5 30 2045 2000 45 2141 2112 29 EV 4117 N10575 EWR ALB 29 143 20 0 2013-05-30 20:00:00 2013-05-30 20:00:00 ## 7 2013 9 11 2254 2159 55 2336 
2303 33 EV 6043 N10575 EWR ALB 27 143 21 59 2013-09-11 21:00:00 2013-09-11 21:59:00 ## 8 2013 9 12 NA 2159 NA NA 2303 NA EV 6043 N10575 EWR ALB NA 143 21 59 2013-09-12 21:00:00 2013-09-12 21:59:00 ## 9 2013 9 8 2156 2159 -3 2250 2303 -13 EV 4264 N11113 EWR ALB 29 143 21 59 2013-09-08 21:00:00 2013-09-08 21:59:00 ## 10 2013 1 26 1614 1620 -6 1706 1724 -18 EV 4271 N11119 EWR ALB 34 143 16 20 2013-01-26 16:00:00 2013-01-26 16:20:00 ## # ℹ 336,766 more rows ``` ] ] --- class: motivator middle .panelset[ .panel[.panel-name[question] <br> <br> ## Is pedestrian traffic regular, really? ] .panel[.panel-name[discussion] <br> <br> *No*, it's event data: one pedestrian, at an arbitrary time. It's aggregated into a regular time period. Often, the first step to analysing temporal data is to .monash-pink2[aggregate by temporal unit], possibly by multiple quantities, eg number of arrivals, departures, average hourly arrival delay and departure delays. ] ] --- class: transition middle animated slideInLeft ## Let's make some plots --- # Plotting temporal data - .monash-orange2[lines]: connecting sequential time points indicates the temporal dependence is important - .monash-orange2[aspect ratio]: wide or tall? [Cleveland, McGill, McGill (1988) ](https://eagereyes.org/basics/banking-45-degrees) argue the average line slope in a line chart should be 45 degrees, which is called banking to 45 degrees. But this is disputed by Talbot, Gerth, Hanrahan (2012), who argue that the conclusion was based on a flawed study. Nevertheless, choosing an aspect ratio is an inescapable part of designing effective plots. For time series, typically a wide aspect ratio is good. - .monash-orange2[conventions]: - time on the horizontal axis, - ordering of elements like week day, month. --- # Aspect ratio .panelset[ .panel[.panel-name[🖼️] <img src="images/lecture-09A/CO2_ratio-1.png" width="80%" style="display: block; margin: auto;" /> ] .panel[.panel-name[learn] <br> <br> - Is the trend linear or non-linear? - Yes, slightly non-linear. 
We could fit a linear regression model, and examine the residuals to better assess non-linear trend. - Is there a cyclical pattern? - Yes, there is a yearly trend. <br> <br> <br> *This type of data is easy to model, and forecast.* ] .panel[.panel-name[R] ```r load(here::here("data/CO2_ptb.rda")) CO2.ptb <- CO2.ptb %>% filter(year > 1980) %>% filter(co2_ppm > 100) # handle missing values p <- ggplot(CO2.ptb, aes(x=date, y=co2_ppm)) + geom_line(size=1) + xlab("") + ylab("CO2 (ppm)") p1 <- p + theme(aspect.ratio = 1) + ggtitle("1 to 1 (may be useless)") p3 <- p + theme(aspect.ratio = 2) + ggtitle("tall & skinny: trend") p2 <- ggplot(CO2.ptb, aes(x=date, y=co2_ppm)) + annotate("text", x=2000, y=375, label="CO2 at \n Point Barrow,\n Alaska", size=8) + theme_solid() p4 <- p + scale_x_continuous("", breaks = seq(1980, 2020, 5)) + theme(aspect.ratio = 0.2) + ggtitle("short & wide: seasonality") grid.arrange(p1, p2, p3, p4, layout_matrix= matrix(c(1,2,3,4,4,4), nrow=2, byrow=T)) ``` ] ] --- # .orange[Case study] .bg-orange.circle[2] nycflights13 .font_small[Part 1/7] .flex[ .w-50[ ```r library(nycflights13) ``` What is a useful time element to use, in order to study traffic over time? .monash-orange2[Hour, 15 minutes, day, month?] <br> <br> Possibly, all of these. <br> <br> Let's start with .monash-orange2[hourly]. 
] .w-50[ .s500[ ```r flights_hourly <- flights %>% * group_by(time_hour, origin) %>% * summarise(count = n(), * dep_delay = mean(dep_delay, * na.rm = TRUE)) %>% ungroup() %>% as_tsibble(index = time_hour, key = origin) flights_hourly ``` ``` *## # A tsibble: 19,486 x 4 [1h] <America/New_York> ## # Key: origin [3] ## time_hour origin count dep_delay ## <dttm> <chr> <int> <dbl> ## 1 2013-01-01 05:00:00 EWR 2 -1 ## 2 2013-01-01 06:00:00 EWR 18 3.06 ## 3 2013-01-01 07:00:00 EWR 12 14.2 ## 4 2013-01-01 08:00:00 EWR 20 0.75 ## 5 2013-01-01 09:00:00 EWR 19 9.05 ## 6 2013-01-01 10:00:00 EWR 18 2.06 ## 7 2013-01-01 11:00:00 EWR 11 0 ## 8 2013-01-01 12:00:00 EWR 22 6.73 ## 9 2013-01-01 13:00:00 EWR 28 28.7 ## 10 2013-01-01 14:00:00 EWR 18 33.7 ## # ℹ 19,476 more rows ``` ] ] ] --- # .orange[Case study] .bg-orange.circle[2] nycflights13 .font_small[Part 2/7] IDA: Pick one airport, and examine the hourly number of flights. ```r flights_hourly %>% filter(origin == "JFK") %>% ggplot(aes(x=time_hour, y=count)) + geom_line() + xlab("") + ylab("number of flights") ``` <img src="images/lecture-09A/flights_time-1.png" width="100%" style="display: block; margin: auto;" /> No, that's too much information, too much time. There's no overall trend. Not an interesting plot. 
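
One way to cut down the information in the hourly plot is to aggregate to a coarser temporal unit before plotting. This is a minimal sketch, assuming the dplyr, ggplot2 and nycflights13 packages are installed. `flights_daily` is a name invented here, and the time zone is passed explicitly so that evening flights stay on their local date.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Daily number of flights from JFK, one row per local calendar date
flights_daily <- flights %>%
  filter(origin == "JFK") %>%
  mutate(date = as.Date(time_hour, tz = "America/New_York")) %>%
  count(date, name = "count")

# One point per day instead of one per hour
ggplot(flights_daily, aes(x = date, y = count)) +
  geom_line() +
  xlab("") +
  ylab("number of flights per day")
```

Aggregating trades the within-day pattern for a clearer view of any slower changes over the year, so it complements looking at a shorter time window and does not replace it.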
--- # .orange[Case study] .bg-orange.circle[2] nycflights13 .font_small[Part 3/7] IDA: Reduce the time frame to check for periodicities .s300[ ```r flights_hourly %>% filter(origin == "JFK", * time_hour < ymd("2013-01-08")) %>% ggplot(aes(x=time_hour, y=count)) + geom_line(size=1.1) + scale_x_datetime("", date_breaks = "1 day", date_labels = "%y-%m-%d %H", date_minor_breaks = "6 hours") + ylim(c(0, 32)) + xlab("") + ylab("number of flights") ``` ] <img src="images/lecture-09A/unnamed-chunk-9-1.png" width="100%" style="display: block; margin: auto;" /> --- # .orange[Case study] .bg-orange.circle[2] nycflights13 .font_small[Part 4/7] .panelset[ .panel[.panel-name[🖼️] .s500[ <img src="images/lecture-09A/calendar-1.png" width="80%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[learn] **Calendar plot** - Draw the daily data in the layout of a regular calendar - A wonderful way to get a lot of data onto a page - Easy to examine daily, weekly and monthly patterns <br> <br> **Overview summary** - The daily pattern at JFK is **very** regular. - It is similar for every day of the week, and for every month - There is a peak in early flights, a drop around lunchtime and then the number of flights picks up again. - .monash-orange2[Is it too regular?] 
]
.panel[.panel-name[R]
```r
calendar_df <- flights_hourly %>%
  filter(origin == "JFK") %>%
  mutate(hour = hour(time_hour),
         date = as.Date(time_hour)) %>%
  filter(year(date) < 2014) %>%
* frame_calendar(x=hour, y=count, date=date, nrow=3)
p1 <- calendar_df %>%
  ggplot(aes(x = .hour, y = .count, group = date)) +
  geom_line() +
  theme(axis.line.x = element_blank(),
        axis.line.y = element_blank())
prettify(p1, size = 3, label.padding = unit(0.15, "lines"))
```
]
]

---

# .orange[Case study] .bg-orange.circle[2] nycflights13 .font_small[Part 5/7]

.flex[
.w-50[
```r
flights_hourly %>%
  filter(origin == "JFK") %>%
  mutate(month = month(time_hour),
*        hour = hour(time_hour),
*        date = as.Date(time_hour)) %>%
  ggplot(aes(x=hour, y=count)) +
    geom_line(aes(group=date), alpha = 0.1) +
    geom_smooth(se = FALSE) +
    xlab("hour") + ylab("number of flights")
```

<br>
<br>

This data has a very regular pattern. The volume per hour is very similar from one day to the next. .monash-orange2[Why is it so regular?]
]
.w-50[
<img src="images/lecture-09A/unnamed-chunk-11-1.png" width="100%" style="display: block; margin: auto;" />
]
]

---

class: transition middle animated slideInLeft

## Examine departure delays

---

# .orange[Case study] .bg-orange.circle[2] nycflights13 .font_small[Part 6/7]

.panelset[
.panel[.panel-name[🖼️]
.s500[
<img src="images/lecture-09A/calendar_delay-1.png" width="100%" style="display: block; margin: auto;" />
]
]
.panel[.panel-name[learn]

## Delays are much more interesting to examine

- Most days have few delays
- June and July seem to have more delays
- A few days, sporadically in the year, have big delays

.monash-orange2[Can you find a reason for one of the days with a big delay?]

From ChatGPT: *As of my last update in September 2021, a significant late-season snowstorm did affect parts of the United States in April 2013, but it was more focused on the Midwest rather than the Northeast where JFK Airport (John F. Kennedy International Airport) is located.
The storm impacted states like Minnesota, Wisconsin, and South Dakota, among others, and brought heavy snowfall and icy conditions.*

*However, weather conditions can have a cascading effect on flight schedules nationwide, so it's possible that there were some delays at JFK related to this or other weather phenomena.*
]
.panel[.panel-name[R]
```r
calendar_df <- flights_hourly %>%
  filter(origin == "JFK") %>%
  mutate(hour = hour(time_hour),
         date = as.Date(time_hour)) %>%
  filter(year(date) < 2014) %>%
* frame_calendar(x=hour, y=dep_delay, date=date, nrow=3)
p2 <- calendar_df %>%
  ggplot(aes(x = .hour, y = .dep_delay, group = date)) +
  geom_line() +
  theme(axis.line.x = element_blank(),
        axis.line.y = element_blank())
prettify(p2, size = 3, label.padding = unit(0.15, "lines"))
```
]
]

---

# .orange[Case study] .bg-orange.circle[2] nycflights13 .font_small[Part 7/7]

.flex[
.w-60[
Comparing days with each other.

.f5[
```r
flights_hourly %>%
  filter(origin == "JFK") %>%
  mutate(month = month(time_hour),
         hour = hour(time_hour),
         date = as.Date(time_hour)) %>%
  ggplot(aes(x=hour, y=dep_delay)) +
    geom_hline(yintercept=0, colour="#027EB6", size=2) +
    geom_line(aes(group=date), alpha = 0.1) +
    geom_smooth(se=FALSE, colour="#D93F00") +
    xlab("hour") + ylab("Departure delay (mins)")
```
]

- There is a lot of day-to-day variability, so modeling and forecasting delays will need other information, such as weather.
- Delays worsen, .monash-orange2[on average], later in the day.
- Interestingly, a lot of flights depart a few minutes early, especially later in the day.
]
.w-40[
<img src="images/lecture-09A/unnamed-chunk-13-1.png" width="100%" style="display: block; margin: auto;" />
]
]

---

# Summary: Melting time

.f5[
```
##  [1] "year"           "month"          "day"            "dep_time"
##  [5] "sched_dep_time" "dep_delay"      "arr_time"       "sched_arr_time"
##  [9] "arr_delay"      "carrier"        "flight"         "tailnum"
## [13] "origin"         "dest"           "air_time"       "distance"
## [17] "hour"           "minute"         "time_hour"
```
]

- The structure of the `flights` table is very handy.
  Date-time has already been melted into: `year`, `month`, `day`, `hour`, `minute`.
- There are also several possible key variables: `origin`, `carrier`, `tailnum`.

<br>
.monash-orange2[Why isn't `dest` considered a key variable? Why not have `air_time` as a key variable?]

- Aggregate by temporal components in different ways, to explore how variables relate to different elements of time.

---

class: transition middle

## Interactive exploration with tsibbletalk

---

.panelset[
.panel[.panel-name[🖼️]
.pull-left[
<img src="images/lecture-09A/unnamed-chunk-15-1.png" width="100%" style="display: block; margin: auto;" />
]
.pull-right[
Remember scagnostics? These are examples of .monash-orange2[tignostics], time series diagnostics.

<img src="images/lecture-09A/unnamed-chunk-16-1.png" width="80%" style="display: block; margin: auto;" />
]
]
.panel[.panel-name[R]
.f5[
```r
library(tsibble)
# remotes::install_github("earowang/tsibbletalk")
library(tsibbletalk)
library(ggplot2)
tourism_shared <- tourism %>%
  as_shared_tsibble(spec = (State / Region) * Purpose)
p0 <- plotly_key_tree(tourism_shared, height = 700, width = 450)

library(feasts)
tourism_feat <- tourism_shared %>%
  features(Trips, feat_stl)
p1 <- tourism_shared %>%
  ggplot(aes(x = Quarter, y = Trips)) +
  geom_line(aes(group = Region), alpha = 0.5) +
  facet_wrap(~ Purpose, scales = "free_y")
p2 <- tourism_feat %>%
  ggplot(aes(x = trend_strength, y = seasonal_strength_year)) +
  geom_point(aes(group = Region))
```
]
]
]

---

.pull-left[
]
.pull-right[
<br><br><br>
.f5[
```r
library(plotly)
subplot(p0,
  subplot(
    ggplotly(p1, tooltip = "Region", width = 700),
    ggplotly(p2, tooltip = "Region", width = 600),
    nrows = 2),
  widths = c(.4, .6)) %>%
  highlight(dynamic = FALSE)
```
]
]

---

class: transition middle

# Live demos

Interactive wrapping to explore periodicities

---

<center>
<i class="fas fa-wrench faa-wrench animated-hover faa-slow " style=" color:#D93F00;"></i>
Your turn: .monash-blue[cut and paste the code] into your R console. Drag the scroll bar to wrap the series on itself.
</center>

```r
library(tsibble)
library(tsibbletalk)
library(ggplot2)
p <- fill_gaps(pedestrian) %>%
  filter_index(~ "2015") %>%
  ggplot(aes(x = Date_Time, y = Count, colour = Sensor)) +
  geom_line(size = .2) +
  facet_wrap(~ Sensor, scales = "free_y") +
  theme(legend.position = "none")

library(shiny)
ui <- fluidPage(tsibbleWrapUI("tswrap"))
server <- function(input, output, session) {
  tsibbleWrapServer("tswrap", p, period = "1 day")
}
shinyApp(ui, server)
```

---

# A step back in time

Some series that look periodic are not. .monash-orange2[Try to match the peaks.]

.flex[
.w-45[
Annual numbers of lynx trappings for 1821–1934 in Canada. Almost 10-year cycle.

.s400.f5[
```r
lynx_tsb <- as_tsibble(lynx) %>%
  rename(count = value)
pl <- ggplot(lynx_tsb, aes(x = index, y = count)) +
  geom_line(size = .2)

ui <- fluidPage(
  tsibbleWrapUI("tswrap"))
server <- function(input, output, session) {
  tsibbleWrapServer("tswrap", pl, period = "10 year")
}
shinyApp(ui, server)
```
]
]
.w-10[
<center>
.white[...]
</center>
]
.w-45[
Monthly mean relative sunspot numbers from 1749 to 1983. Almost 10-year cycle.
.s400.f5[
```r
sunspots_tsb <- as_tsibble(sunspots) %>%
  rename(count = value)
pl <- ggplot(sunspots_tsb, aes(x = index, y = count)) +
  geom_line(size = .2)

ui <- fluidPage(
  tsibbleWrapUI("tswrap"))
server <- function(input, output, session) {
  tsibbleWrapServer("tswrap", pl, period = "10 year")
}
shinyApp(ui, server)
```
]
]
]

---

# Resources and Acknowledgement

- The temporal data object [tsibble](https://tsibble.tidyverts.org/index.html)
- Wang & Cook, [Conversations in Time: Interactive Visualization to Explore Structured Temporal Data](https://journal.r-project.org/dev/articles/RJ-2021-050), The R Journal, 2021
- Data coding using [`tidyverse` suite of R packages](https://www.tidyverse.org)
- Slides constructed with [`xaringan`](https://github.com/yihui/xaringan), [remark.js](https://remarkjs.com), [`knitr`](http://yihui.name/knitr), and [R Markdown](https://rmarkdown.rstudio.com).
- In Semester 3's ETC5550, expect to learn more about regular time series, which will include some exploration and some modeling

---

background-size: cover
class: title-slide
background-image: url("images/bg-12.png")

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.

.bottom_abs.width100[

Lecturer: *Di Cook*

<i class="fas fa-envelope"></i> ETC5521.Clayton-x@monash.edu

<i class="fas fa-calendar-alt"></i> Week 9 - Session 1

<br>

]