# Schedule for: 19w5198 - Toward a Comprehensive, Integrated Framework for Advanced Statistical Analyses of Observational Studies

Beginning on Sunday, June 2 and ending on Friday, June 7, 2019

All times in Banff, Alberta time, MDT (UTC-6).

Sunday, June 2 | |
---|---|

16:00 - 17:30 | Check-in begins at 16:00 on Sunday and is open 24 hours (Front Desk - Professional Development Centre) |

17:30 - 19:30 |
Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |

20:00 - 22:00 | Informal gathering (Corbett Hall Lounge (CH 2110)) |

Monday, June 3 | |
---|---|

07:00 - 08:30 |
Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |

08:30 - 08:45 | Introduction by Organizers (M. Abrahamowicz, W. Sauerbrei) (TCPL 201) |

08:45 - 09:00 |
Introduction and Welcome by BIRS Staff ↓ A brief introduction to BIRS with important logistical information, technology instruction, and opportunity for participants to ask questions. (TCPL 201) |

09:00 - 09:20 | Introduction of participants (TCPL 201) |

09:20 - 10:00 |
Willi Sauerbrei: Recent developments in the STRATOS initiative ↓ The STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative:
http://www.stratos-initiative.org/ (TCPL 201) |

10:00 - 10:30 | Coffee Break (TCPL Foyer) |

10:30 - 11:30 |
Mitchell Gail: Potential collaborations between Design Topic Group (TG5) and other STRATOS Topic Groups ↓ The aim of Topic Group 5 is to provide accessible and accurate guidance in the design of observational studies, and our efforts have focused on exposition of established designs. Collaboration with other Topic Groups in STRATOS offers potential opportunities for investigation of new designs or design adaptations needed for emerging areas of research. Some examples follow. 1. Many observational study designs, such as case-cohort and nested case-control designs, can be regarded as subsampling a cohort with missingness by design. There may be opportunities for strengthening such designs by augmenting the samples or by obtaining ancillary data on all cohort members to strengthen the analyses of the subsampled data(1) (TG1, TG8). 2. Dose-response modeling is an important goal of some observational studies. What aspects of study design can improve information for dose-response modeling (TG2)? 3. Validation sub-samples can improve inference from error-prone observational data from electronic databases(2). Can one advance this approach by judiciously selecting the validation sub-samples (TG4)? 4. The development and validation of risk models for predicting incident disease, for diagnosis and for prognosis require data for model-building and for validation. Often, the data for independent validation do not include complete covariate information on some or all members of the validation sample. Can the design of the validation sample be improved by including supplemental data (TG6)? 5. There has been an explosion of interest in causal analysis and related topics such as Mendelian randomization and mediation analysis. What work on design has been done in these areas, and what design considerations might improve studies in these areas (TG7)? 6. The design and analysis of observational studies with high-dimensional outcomes or exposure data pose special challenges in data quality and preprocessing, control of false positive discoveries, and replicability of results. Is it possible to improve the design of a sequence of such studies for the discovery and replicability of associations with phenotype (TG9)? 7. Study design can promote data quality and completeness, limit biases in estimates of exposure effects from errors in laboratory measurements, improve control for confounding, and limit or help define selection biases. Such issues are of importance to all TGs. These 7 topics are meant to start a conversation that will lead to more and better suggestions for collaboration on design issues by members of the various TGs. |

11:30 - 13:00 |
Lunch ↓ Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |

13:00 - 14:00 |
Guided Tour of The Banff Centre ↓ Meet in the Corbett Hall Lounge for a guided tour of The Banff Centre campus. (Corbett Hall Lounge (CH 2110)) |

14:00 - 14:15 |
Group Photo ↓ Meet in the foyer of TCPL to participate in the BIRS group photo. The photograph will be taken outdoors, so dress appropriately for the weather. Please don't be late, or you might not be in the official group photo! (TCPL 201) |

14:15 - 15:00 |
Terry Therneau: Recent and future work of Survival analysis Topic Group (TG8) ↓ I'll describe the basic content of our first paper (currently in draft) as background. More time will be spent discussing 3 challenges that we have faced in its creation, along with an opportunity for feedback from others. |

15:00 - 15:30 | Coffee Break (TCPL Foyer) |

15:30 - 16:15 |
Ewout Steyerberg: Recent and future work of Diagnostic tests and prediction models Topic Group (TG6) ↓ (Presentation 30 min. + Discussion 15 min.) (TCPL 201) |

16:15 - 17:00 |
Els Goetghebeur: Recent and future work of Causal inference Topic Group (TG7) ↓ (Presentation 30 min. + Discussion 15 min.) (TCPL 201) |

17:00 - 17:45 |
Jörg Rahnenführer: Recent and future work of High-dimensional data Topic Group (TG9) ↓ (Presentation 30 min. + Discussion 15 min.) (TCPL 201) |

17:45 - 19:30 |
Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |

Tuesday, June 4 | |
---|---|

07:00 - 08:30 |
Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |

08:30 - 08:35 |
Laurence Freedman: Recent and future work of Measurement error and misclassification Topic Group (TG4) ↓ This 5 minute presentation will review the basic philosophy behind the group’s work, the past achievements and the future plans. For details, see the report from TG4. |

08:35 - 09:00 |
Ruth Keogh: Guidance papers on measurement error: Overview and some special topics (TG4) ↓ This talk will discuss two guidance papers for biostatisticians on the topic of measurement error, written by Topic Group 4 (Measurement Error and Misclassification) of the STRATOS Initiative. I will provide some background to this work and give an overview of the material covered in the two papers. The topics in the first paper range from an introduction to error of different types and discussion of their impact, to study design considerations, to the simpler methods of measurement error correction (regression calibration and simulation extrapolation). The second paper covers more advanced topics including more advanced and flexible methods for error correction, such as Bayesian methods and multiple imputation, the design and analysis of studies when the outcome is measured with error, and the use of sensitivity analyses. I will also highlight some of the available software for implementing the methods discussed. |

09:00 - 09:10 |
Laurence Freedman: Berkson error in epidemiology (TG4) ↓ While working on our guidance paper for biostatisticians, we came across some little-known facts related to a type of measurement error known as Berkson error, named after Joseph Berkson who wrote about this type of error in a JASA paper, published in 1951. The main characteristic of such measurement error is that it is independent of the mismeasured observation. In other words, if we denote the true measurement by $X$, the mismeasured one by $X^{*}$, and the error by $e$ (with mean zero and constant variance), then $X=X^{*}+e$, where $e$ is independent of $X^{*}$. This is in contra-distinction to the more commonly occurring classical error, where $X^{*}=X+e$, and $e$ is independent of $X$. |
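
The two error models in the abstract above can be contrasted with a small simulation. The sketch below is illustrative only and is not taken from the talk: the linear outcome model, the normal distributions, the parameter values, and the function name `simulate_slopes` are all assumptions. Under those assumptions it regresses an outcome on the mismeasured exposure $X^{*}$ and compares the fitted slope with the true one.

```python
import numpy as np


def simulate_slopes(n=200_000, beta=2.0, sd_signal=1.0, sd_error=1.0, seed=0):
    """Return the least-squares slopes of Y on X* under classical and Berkson error.

    Assumed (hypothetical) outcome model: Y = beta * X + noise, noise ~ N(0, 1).
    """
    rng = np.random.default_rng(seed)

    def slope(x, y):
        # Simple linear regression slope: cov(x, y) / var(x)
        return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

    # Classical error: X* = X + e, with e independent of the true X
    x_true = rng.normal(0.0, sd_signal, n)
    x_star = x_true + rng.normal(0.0, sd_error, n)
    y = beta * x_true + rng.normal(0.0, 1.0, n)
    classical = slope(x_star, y)

    # Berkson error: X = X* + e, with e independent of the observed X*
    x_star_b = rng.normal(0.0, sd_signal, n)
    x_true_b = x_star_b + rng.normal(0.0, sd_error, n)
    y_b = beta * x_true_b + rng.normal(0.0, 1.0, n)
    berkson = slope(x_star_b, y_b)

    return classical, berkson


if __name__ == "__main__":
    classical, berkson = simulate_slopes()
    print(f"true slope 2.0, classical: {classical:.3f}, Berkson: {berkson:.3f}")
```

In this linear setting the classical-error slope is attenuated by the factor var(X) / (var(X) + var(e)), while the Berkson-error slope stays close to the true value, at the price of a larger residual variance. Other outcome models (for example logistic regression) need not behave this way.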

09:10 - 09:20 |
Pamela Shaw: Use and misuse of predicted values in epidemiologic data analyses (TG4) ↓ Pamela A. Shaw, Paul Gustafson, Daniela Sotres-Alvarez, Victor Kipnis, and Laurence Freedman |

09:20 - 09:30 |
Victor Kipnis: New insights into the effects of error-prone exposure in the analysis of longitudinal studies with mixed models (TG4) ↓ Mixed effects models have become one of the major approaches to the analysis of longitudinal studies. Random effects in those models play a twofold role. First, they address heterogeneity among individual temporal trajectories, and, second, they induce a correlational structure among temporal observations of the same subject. If both the exposure and outcome vary with time, it is natural to specify a mixed effects model for each. If heterogeneity in temporal trajectories is related to unknown subject-level confounders, the corresponding random effects will be correlated, inducing correlation between random effects in the outcome model and the exposure. In this case, there are three different effects of the exposure on the outcome: the within-subject or individual-level effect, the between-subject effect of mean individual exposure, and the marginal or population-average effect. If the existing correlation between random effects and exposure is ignored, the estimated exposure effect(s) will be biased. If exposure is measured with error, there will always be a nonzero correlation between random effects in the outcome model and error-prone exposure, even if this correlation was zero in the model with true exposure. |

09:30 - 10:00 | Discussion (TG4) (TCPL 201) |

10:00 - 10:30 | Coffee Break (TCPL Foyer) |

10:45 - 11:15 |
Carsten Schmidt: Recent and future work of Initial data analysis Topic Group (TG3) ↓ (Presentation 15 min. + Discussion 5 min.) (TCPL 201) |

11:15 - 11:35 |
Marianne Huebner: Undertaking initial data analysis before fitting a regression model: What should a researcher think about? ↓ (Presentation 15 min. + Discussion 5 min.) (TCPL 201) |

11:35 - 13:00 | Lunch (Vistas Dining Room) |

13:00 - 14:30 |
Frank Harrell: Controversies in predictive modeling, machine learning, and validation ↓ This talk will cover a variety of controversial and/or current issues related to statistical modeling and prediction research. Some of the topics covered are why external validation is often not a good idea, why validating researchers is often more efficient than validating models, what distinguishes statistical models from machine learning, how variable selection only gives the illusion of learning from data, and advantages of older measures of model performance. |

14:30 - 15:15 |
Georg Heinze: Recent and future work of Selection of variables and functional forms Topic Group (TG2) ↓ (Presentation 30 min. + Discussion 15 min.) (TCPL 201) |

15:15 - 15:45 | Coffee Break (TCPL Foyer) |

15:45 - 16:00 |
Organization of inter-Topic-Group collaboration meetings ↓ Propositions from: |

16:00 - 18:00 | Inter-Topic-Group Collaboration meetings (TCPL 201) |

18:00 - 19:30 | Dinner (Vistas Dining Room) |

Wednesday, June 5 | |
---|---|

07:00 - 08:30 |
Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |

08:30 - 09:30 |
Lisa McShane: Simulation Panel (SP) ↓ (Presentation 40 min. + Discussion 20 min.) (TCPL 201) |

09:30 - 10:00 |
Saskia le Cessie: The simulation learner (TG7) ↓ The recently submitted tutorial of TG7, “formulating causal concepts and principled statistics answers” is accompanied by the “simulation learner”. This is a simulated dataset, motivated by the Promotion of Breastfeeding Intervention Trial (PROBIT)(1). Mother-infant pairs were randomised to receive either standard care or a breastfeeding encouragement (BFE) intervention, and weight achieved at age 3 months was the main outcome. |

10:00 - 10:30 | Coffee Break (TCPL Foyer) |

10:30 - 11:30 |
Per Kragh Andersen: Pseudo-observations (TG8) ↓ Survival analysis is characterized by the need to deal with incomplete observation of outcome variables, most frequently caused by right-censoring, and several (now standard) inference procedures have been developed to deal with this. Examples include the Kaplan-Meier estimator for the survival function and partial likelihood for estimating regression coefficients in the proportional hazards (Cox) regression model. During the past 15 years, methods based on pseudo-observations have been studied. Here, the idea is to apply a transformation of the incompletely observed survival data and, thereby, to create a simpler data set on which 'standard' techniques (i.e., for complete data) may be applied, e.g., methods using generalized estimating equations (GEE). |
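
The transformation described in the abstract above can be sketched for one common case. This is a minimal illustration and not the speaker's code: it assumes the quantity of interest is the survival probability $S(t)$ at a single fixed time $t$, and it uses the jackknife form of the pseudo-observation, $n\hat{S}(t) - (n-1)\hat{S}^{(-i)}(t)$, where $\hat{S}$ is the Kaplan-Meier estimate and $\hat{S}^{(-i)}$ is the same estimate with subject $i$ left out. The function names `km_survival` and `pseudo_observations` are hypothetical.

```python
import numpy as np


def km_survival(time, event, t):
    """Kaplan-Meier estimate of S(t) from right-censored data.

    time  : observed follow-up times
    event : True where the event was observed, False where censored
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv = 1.0
    # np.unique returns the distinct event times in increasing order
    for u in np.unique(time[event]):
        if u > t:
            break
        at_risk = np.sum(time >= u)
        deaths = np.sum((time == u) & event)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_observations(time, event, t):
    """Jackknife pseudo-observations for S(t), one value per subject."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    n = len(time)
    full = km_survival(time, event, t)
    keep = np.ones(n, dtype=bool)
    pseudo = np.empty(n)
    for i in range(n):
        keep[i] = False  # leave subject i out
        loo = km_survival(time[keep], event[keep], t)
        pseudo[i] = n * full - (n - 1) * loo
        keep[i] = True   # put subject i back
    return pseudo


if __name__ == "__main__":
    times = [2.0, 3.5, 4.0, 5.0, 6.5, 7.0]
    events = [True, False, True, True, False, True]
    print(pseudo_observations(times, events, t=4.5))
```

Without censoring, each pseudo-observation reduces to the indicator that the subject survived past $t$. With censoring, every subject still receives a value, which can then be used as the response in a complete-data method such as GEE.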

11:30 - 13:00 | Lunch (Vistas Dining Room) |

13:00 - 17:00 |
Free Afternoon ↓ Suggestions of activities from the BIRS Station Manager, Jacob Posacki, for participants who do not have their own vehicle:
- Hike up Tunnel Mountain. BIRS is located on the side of Tunnel Mountain. It is about a 2.5 hour round trip from BIRS if you walk slowly and take your time at the top. It is an easy walk and a good introduction to Banff for those who are coming for the first time.
- A visit to Cave and Basin National Historic Site. There is a lot of science and history to interpret here as well as a network of scenic and easy walking trails around the valley bottom. There’s more information about Cave and Basin at the following link: https://www.pc.gc.ca/en/lhn-nhs/ab/caveandbasin/activ
- Hike or take the gondola up Sulphur Mountain. This is a longer hike (6.1 km one-way from parking lot to Sanson's Peak, 655 m elevation gain, 4 hour round trip) but on a fairly established trail. The gondola is expensive (from $57 round trip, half price for going down only) but can be a good experience for those who want to skip the hard part! The parking lot can be reached by taxi or Roam Transit local route #1: http://www.banff.ca/locals-residents/public-transit-buses/bus-routes-schedules.htm
See the Banff map: https://banff.ca/DocumentCenter/View/25 #2 = BIRS, trail to Tunnel Mountain is just above; #7 = Cave and Basin National Historic Site; #19 = trail to Sulphur Mountain and Gondola.
Jacob Posacki will be available in person during the week to provide more insight for participants who choose to venture further into Banff National Park during their free afternoon ([email protected]). |

17:30 - 19:30 | Dinner (Vistas Dining Room) |

Thursday, June 6 | |
---|---|

07:00 - 08:30 |
Breakfast ↓ |

08:30 - 08:50 | Maarten van Smeden: Dataset Panel (DP) (TCPL 201) |

08:50 - 09:30 |
Suzanne Cadarette: Knowledge Translation Panel (TP) ↓ (Presentation 15 min. + Discussion 5 min.) (TCPL 201) |

09:40 - 10:00 | Geraldine Rauch: (Systematic) review on statistical series within medical journals - what medical researchers are told about regression modeling (TCPL 201) |

10:00 - 10:30 | Coffee Break (TCPL Foyer) |

10:30 - 11:30 |
Mark Baillie: How do we make better graphs? Effective visual communication for the quantitative scientist (Visualisation Panel, VP) ↓ The goal of quantitative science is to enable informed decisions and actions through a data‐driven understanding of complex scientific questions. It is the role of any quantitative scientist (pharmacometrician, statistician, epidemiologist, etc.) to support this goal through (1) elucidation of the scientific question of interest, (2) appropriate quantitative methods (experimental design, statistical or mathematical models, etc.) and (3) effective communication of results. All of these aspects work in concert; one without the others is not sufficient. |

11:30 - 13:00 | Lunch (Vistas Dining Room) |

13:00 - 13:30 | Martin Boeker: Glossary Panel (GP) (TCPL 201) |

13:30 - 15:00 | Short talks or TG meetings (TCPL 201) |

15:00 - 15:30 | Coffee Break (TCPL Foyer) |

15:30 - 17:30 | Collaborations (TCPL 201) |

17:30 - 19:30 | Dinner (Vistas Dining Room) |

Friday, June 7 | |
---|---|

07:00 - 08:30 |
Breakfast ↓ |

08:30 - 08:50 |
Orlagh Carroll: How are missing data handled in observational time-to-event studies? A systematic review. ↓ Missing data in covariates are known to result in biased estimates of association with the outcome and loss of power to detect associations. Missing data can also lead to other challenges in time-to-event analyses including the handling of time-varying effects of covariates, selection of covariates and their flexible modelling. This review aimed to understand how researchers are approaching time-to-event analyses when missing data are present. Medline and Embase were searched for observational time-to-event studies published from January 2011 to January 2018. We assessed the covariate selection procedure, the assumptions of proportional hazards models, whether functional forms were considered, and how missing data affected these. We recorded the extent of missing data and how it was addressed in the analysis, for example using a complete-case analysis or multiple imputation. 148 studies were included in the review. On average, 15% of data were discarded due to missingness while determining the study population and 32% during the analysis stage. In total, 86% did not state any missing data assumptions. Complete-case analysis was common (56%) while 22% used multiple imputation. |

08:50 - 10:00 | Discussion / Reports (TCPL 201) |

10:00 - 10:30 | Coffee Break (TCPL Foyer) |

10:30 - 11:30 | Discussion / Reports (TCPL 201) |

11:30 - 12:00 |
Checkout by Noon ↓ 5-day workshop participants are welcome to use BIRS facilities (BIRS Coffee Lounge, TCPL and Reading Room) until 3 pm on Friday, although participants are still required to check out of the guest rooms by 12 noon. (Front Desk - Professional Development Centre) |

12:00 - 13:00 | Lunch from 11:30 to 13:30 (Vistas Dining Room) |