Schedule for: 19w5198 - Toward a Comprehensive, Integrated Framework for Advanced Statistical Analyses of Observational Studies

Beginning on Sunday, June 2 and ending Friday June 7, 2019

All times in Banff, Alberta time, MDT (UTC-6).

Sunday, June 2
16:00 - 17:30 Check-in begins at 16:00 on Sunday and is open 24 hours (Front Desk - Professional Development Centre)
17:30 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
20:00 - 22:00 Informal gathering (Corbett Hall Lounge (CH 2110))
Monday, June 3
07:00 - 08:30 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
08:30 - 08:45 Introduction by Organizers (M. Abrahamowicz, W. Sauerbrei) (TCPL 201)
08:45 - 09:00 Introduction and Welcome by BIRS Staff
A brief introduction to BIRS with important logistical information, technology instruction, and opportunity for participants to ask questions.
(TCPL 201)
09:00 - 09:20 Introductions by all participants (TCPL 201)
09:20 - 10:00 Willi Sauerbrei: Recent developments in the STRATOS initiative
The STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative: http://www.stratos-initiative.org/
(TCPL 201)
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:30 Mitchell Gail: Potential collaborations between Design Topic Group (TG5) and other STRATOS Topic Groups

The aim of Topic Group 5 is to provide accessible and accurate guidance in the design of observational studies, and our efforts have focused on exposition of established designs. Collaboration with other Topic Groups in STRATOS offers potential opportunities for investigation of new designs or design adaptations needed for emerging areas of research. Some examples follow.

1. Many observational study designs, such as case-cohort and nested case-control designs, can be regarded as subsampling a cohort with missingness by design. There may be opportunities for strengthening such designs by augmenting the samples or by obtaining ancillary data on all cohort members to strengthen the analyses of the subsampled data (1) (TG1, TG8).
2. Dose-response modeling is an important goal of some observational studies. What aspects of study design can improve information for dose-response modeling (TG2)?
3. Validation sub-samples can improve inference from error-prone observational data from electronic databases (2). Can one advance this approach by judiciously selecting the validation sub-samples (TG4)?
4. The development and validation of risk models for predicting incident disease, for diagnosis, and for prognosis require data for model building and for validation. Often, the data for independent validation do not include complete covariate information on some or all members of the validation sample. Can the design of the validation sample be improved by including supplemental data (TG6)?
5. There has been an explosion of interest in causal analysis and related topics such as Mendelian randomization and mediation analysis. What work on design has been done in these areas, and what design considerations might improve studies in these areas (TG7)?
6. The design and analysis of observational studies with high-dimensional outcome or exposure data pose special challenges in data quality and preprocessing, control of false positive discoveries, and replicability of results. Is it possible to improve the design of a sequence of such studies for the discovery and replication of associations with phenotype (TG9)?
7. Study design can promote data quality and completeness, limit biases in estimates of exposure effects from errors in laboratory measurements, improve control for confounding, and limit or help define selection biases. Such issues are of importance to all TGs.

These seven topics are meant to start a conversation that will lead to more and better suggestions for collaboration on design issues by members of the various TGs.

1. Breslow NE, Lumley T, Ballantyne CM, Chambless LE, Kulich M. Using the Whole Cohort in the Analysis of Case-Cohort Data. Am J Epidemiol. 2009;169(11):1398-405.
2. Oh EJ, Shepherd BE, Lumley T, Shaw PA. Considerations for analysis of time-to-event outcomes measured with error: Bias and correction with SIMEX. Stat Med. 2018;37(8):1276-89.

(Presentation 40 min. + Discussion 20 min.)

(TCPL 201)
11:30 - 13:00 Lunch
Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
13:00 - 14:00 Guided Tour of The Banff Centre
Meet in the Corbett Hall Lounge for a guided tour of The Banff Centre campus.
(Corbett Hall Lounge (CH 2110))
14:00 - 14:15 Group Photo
Meet in foyer of TCPL to participate in the BIRS group photo. The photograph will be taken outdoors, so dress appropriately for the weather. Please don't be late, or you might not be in the official group photo!
(TCPL 201)
14:15 - 15:00 Terry Therneau: Recent and future work of Survival analysis Topic Group (TG8)

I'll describe the basic content of our first paper (currently in draft) as background. More time will be spent discussing three challenges that we have faced in creating it, along with an opportunity for feedback from others.

1. Data. An initial decision was to try to locate an observational data set rich enough to support both simple and complex questions, and which could be freely shared outside the group. Maja Pohar was able to contribute a data set that follows 742 subjects with peripheral arterial disease (PAD), each with one matched control selected from the same practice by the contributing physician. Terry T was able to contribute a second data set on 3854 patients with non-alcoholic fatty liver disease (NAFLD), each with 4 age- and sex-matched controls. The final paper uses these along with several smaller data sets.
Issues:
a. The smaller data sets are in some ways not realistic (no missing values, few variables), but can be easier for targeted demonstrations. They are not all observational. Should we stick with the "serious" data?
b. Are these data sets useful to others?
c. Where will they be formally hosted?

2. The first paper concerns hazard models. With respect to writing style, how much relative effort should be devoted to:
- rigorous definitions
- statistical properties and motivation vs. other motivation
- the checklist
- examples
- discussion (weaknesses, alternatives)
- breadth and detail

3. Minor technical issues. An example is the consensus decision that "check the PH assumption" should be on the checklist, followed by the finding, in reviewing the example, that there are quite varied (and strong) opinions on the best way to perform such a check. More generally, how 'perfect' do the examples have to be with respect to, say, methods that work reasonably well in practice but are subject to challenge?

(Presentation 30 min. + Discussion 15 min.)

(TCPL 201)
15:00 - 15:30 Coffee Break (TCPL Foyer)
15:30 - 16:15 Ewout Steyerberg: Recent and future work of Diagnostic tests and prediction models Topic Group (TG6)
(Presentation 30 min. + Discussion 15 min.)
(TCPL 201)
16:15 - 17:00 Els Goetghebeur: Recent and future work of Causal inference Topic Group (TG7)
(Presentation 30 min. + Discussion 15 min.)
(TCPL 201)
17:00 - 17:45 Jörg Rahnenführer: Recent and future work of High-dimensional data Topic Group (TG9)
(Presentation 30 min. + Discussion 15 min.)
(TCPL 201)
17:45 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
Tuesday, June 4
07:00 - 08:30 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
08:30 - 08:35 Laurence Freedman: Recent and future work of Measurement error and misclassification Topic Group (TG4)

This 5-minute presentation will review the basic philosophy behind the group's work, its past achievements, and its future plans. For details, see the report from TG4.

(TCPL 201)
08:35 - 09:00 Ruth Keogh: Guidance papers on measurement error: Overview and some special topics (TG4)

This talk will discuss two guidance papers for biostatisticians on the topic of measurement error, written by Topic Group 4 (Measurement Error and Misclassification) of the STRATOS Initiative. I will provide some background to this work and give an overview of the material covered in the two papers. The topics in the first paper range from an introduction to errors of different types and a discussion of their impact, to study design considerations, to the simpler methods of measurement error correction (regression calibration and simulation extrapolation). The second paper covers more advanced topics, including flexible methods for error correction such as Bayesian methods and multiple imputation, the design and analysis of studies when the outcome is measured with error, and the use of sensitivity analyses. I will also highlight some of the available software for implementing the methods discussed.

The rest of the talk will focus on two particular topics from the guidance paper which involve more recent findings: use of multiple imputation for correcting for the impacts of measurement error in covariates, and special issues arising when there is measurement error in an outcome variable. Examples will be given from a study in nutritional epidemiology with error-prone covariates, and from a trial with an error-prone outcome.

In the following presentation, Laurence Freedman will discuss another particularly interesting and challenging topic covered in the guidance papers, that of Berkson error.
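
To make the regression calibration idea above concrete, here is a minimal base-R sketch assuming an internal validation subsample in which the true exposure is observed (all names and parameter values are illustrative assumptions, not taken from the papers):

    set.seed(3)
    n   <- 5000
    x   <- rnorm(n, 5, 1)                         # true exposure
    xs  <- x + rnorm(n)                           # error-prone measurement (classical error)
    y   <- 2 + 0.4 * x + rnorm(n)                 # outcome model; true effect 0.4
    dat <- data.frame(y, x, xs)
    val <- seq_len(500)                           # validation subsample with x observed

    coef(lm(y ~ xs, data = dat))                  # naive: attenuated to about 0.2

    cal      <- lm(x ~ xs, data = dat[val, ])     # calibration model for E(X | X*)
    dat$xhat <- predict(cal, newdata = dat)       # calibrated exposure for everyone
    coef(lm(y ~ xhat, data = dat))                # regression calibration: about 0.4

Standard errors after this two-stage procedure need adjustment, e.g. by bootstrapping both stages together.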

(TCPL 201)
09:00 - 09:10 Laurence Freedman: Berkson error in epidemiology (TG4)

While working on our guidance paper for biostatisticians, we came across some little-known facts related to a type of measurement error known as Berkson error, named after Joseph Berkson, who wrote about this type of error in a JASA paper published in 1951. The main characteristic of such measurement error is that it is independent of the mismeasured observation. In other words, if we denote the true measurement by $X$, the mismeasured one by $X^{*}$, and the error by $e$ (with mean zero and constant variance), then $X=X^{*}+e$, where $e$ is independent of $X^{*}$. This is in contradistinction to the more commonly occurring classical error, where $X^{*}=X+e$ and $e$ is independent of $X$.

The common perception is that when a covariate $X^{*}$ in a regression model has Berkson error, the regression coefficients in the model are estimated without bias. In other words, if outcome $Y$ is related to $X$ and other covariates $Z$ in a regression model, e.g. $E(Y|X,Z)=\beta_0+\beta_X X+\beta_Z Z$, then it is also true that $E(Y|X^{*},Z)=\beta_0+\beta_X X^{*}+\beta_Z Z$. The only condition thought to be required for this was "non-differentiality" of the error $e$, i.e. the condition that the conditional distribution of $Y$ given $(X^{*},X,Z)$ equals that of $Y$ given $(X,Z)$. We have found that this is not generally true and that a further condition is required, namely that the Berkson error $e$ is also independent of $Z$.

It is possible that this fact affects many studies in occupational health. Industrial exposures of workers are often estimated from the mean exposures of specific subgroups. For example, those who perform hands-on laboratory work may be at one level of exposure, whereas those who are office workers and occasionally walk through the laboratory are at a different (lower) level. Ascribing the subgroup mean exposure $X^{*}$ to each worker in a specific subgroup induces Berkson error. But suppose now that the outcome in question is a form of cancer that is related to gender, so that the risk model includes gender among the covariates $Z$. If the subgroups include both men and women and the true exposure of men is higher than that of women, then the Berkson error $e$ will be related to gender, and our second condition will not be satisfied. Our group will explore whether this type of problem indeed arises in occupational health studies. In addition, we are working on related problems with Berkson error arising from the use of prediction equations in place of observed exposure values, as Pamela Shaw will explain.
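
A minimal base-R sketch of this point, with all variable names and parameter values as illustrative assumptions: mean-zero Berkson error that depends on $Z$ leaves the exposure coefficient intact in this linear model, but the coefficient of $Z$ absorbs $\beta_X E(e|Z)$ and is biased.

    set.seed(1)
    n  <- 1e5
    z  <- rbinom(n, 1, 0.5)                  # gender indicator (covariate Z)
    xs <- rnorm(n, 10, 2)                    # assigned subgroup mean exposure X*
    e  <- rnorm(n, z - 0.5, 1)               # Berkson error: mean zero, but depends on Z
    x  <- xs + e                             # true exposure X = X* + e
    y  <- 1 + 0.5 * x + 0.3 * z + rnorm(n)   # true model: beta_X = 0.5, beta_Z = 0.3

    coef(lm(y ~ x + z))                      # true exposure: recovers about (0.5, 0.3)
    coef(lm(y ~ xs + z))                     # Z coefficient inflated to about 0.8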

(TCPL 201)
09:10 - 09:20 Pamela Shaw: Use and misuse of predicted values in epidemiologic data analyses (TG4)

Pamela A. Shaw, Paul Gustafson, Daniela Sotres-Alvarez, Victor Kipnis, and Laurence Freedman

For many epidemiologic settings, the principal exposure or outcome under study can only be imprecisely measured. In an attempt to address errors in variables, the analyst will sometimes adjust these variables, say through a calibration or prediction equation, and use the resulting predicted value in the analysis in place of the observed value. When a predicted quantity is used in place of an observed value in a data analysis, consideration of the impact of the uncertainty in the predicted quantity on the study results is needed, but this is not always done in practice. Such predicted variables usually have Berkson error. The result of ignoring this uncertainty, or prediction error, in some settings could be that the parameter estimates are biased, the standard errors are biased, or both. We examine three common examples of how predicted values are used in an analysis in place of an error-prone variable: 1) to estimate the distribution of a variable, 2) to compare values of a variable between groups by using the predicted value in a two-group statistic (e.g. t-statistic) or as an outcome variable in a regression, and 3) to estimate the effect of an error-prone variable on an outcome, where the predicted quantity is used as the exposure variable in a regression. For each example, we present an overview of the potential consequences of using a predicted quantity in an analysis in place of the true value without appropriate statistical adjustment. We further illustrate some concepts with data from a large population-based cohort, the Hispanic Community Health Study/Study of Latinos (HCHS/SOL).
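
As a toy illustration of example (1) above (a sketch under assumed values, not HCHS/SOL data), a predicted value behaves like a Berkson-error measurement and understates the spread of the true variable, distorting estimates of its distribution:

    set.seed(6)
    n    <- 1e5
    x    <- rnorm(n, 10, 2)                     # true exposure
    w    <- x + rnorm(n, 0, 2)                  # error-prone measurement
    xhat <- fitted(lm(x ~ w))                   # predicted value used in place of x
    c(sd(x), sd(xhat))                          # about 2.0 vs 1.4: spread understated
    c(quantile(x, 0.95), quantile(xhat, 0.95))  # upper tail pulled toward the mean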

(TCPL 201)
09:20 - 09:30 Victor Kipnis: New insights into the effects of error-prone exposure in the analysis of longitudinal studies with mixed models (TG4)

Mixed effects models have become one of the major approaches to the analysis of longitudinal studies. Random effects in these models play a twofold role. First, they address heterogeneity among individual temporal trajectories, and, second, they induce a correlation structure among temporal observations of the same subject. If both the exposure and the outcome vary with time, it is natural to specify mixed effects models for both. If heterogeneity in temporal trajectories is related to unknown subject-level confounders, the corresponding random effects will be correlated, inducing correlation between the random effects in the outcome model and the exposure. In this case, there are three different effects of the exposure on the outcome: the within-subject or individual-level effect, the between-subject effect of mean individual exposure, and the marginal or population-average effect. If the existing correlation between random effects and exposure is ignored, the estimated exposure effect(s) will be biased. If the exposure is measured with error, there will always be a nonzero correlation between the random effects in the outcome model and the error-prone exposure, even if this correlation was zero in the model with the true exposure.

Because of this critical result, unbiased estimation of the exposure effect(s) in a mixed model with an error-prone exposure requires taking the correlation between the random effects and the error-prone exposure into account. The theoretical developments are exemplified by an analysis of data on physical activity energy expenditure from a large validation study of different physical activity instruments, using doubly labeled water as the unbiased reference measurement.
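
The following small simulation is an illustrative sketch of this phenomenon under assumed parameter values (not the validation-study data), using lmer from the lme4 package. The pooled coefficient is biased when the random intercept is correlated with exposure; a between-within decomposition separates the two exposure effects, although the within-subject coefficient remains attenuated by the measurement error and would need a further correction.

    library(lme4)
    set.seed(4)
    m  <- 300; k <- 5                        # subjects and repeated measures
    id <- rep(seq_len(m), each = k)
    mu <- rnorm(m)                           # subject mean exposure
    b  <- 0.8 * mu + rnorm(m, 0, 0.5)        # random intercept correlated with exposure
    x  <- mu[id] + rnorm(m * k)              # true time-varying exposure
    xs <- x + rnorm(m * k)                   # error-prone exposure
    y  <- b[id] + 0.5 * x + rnorm(m * k)     # true within-subject effect = 0.5

    fixef(lmer(y ~ xs + (1 | id)))           # single pooled coefficient: biased
    xbar <- ave(xs, id)                      # subject means of error-prone exposure
    fixef(lmer(y ~ I(xs - xbar) + xbar + (1 | id)))  # within vs between effects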

(TCPL 201)
09:30 - 10:00 Discussion (TG4) (TCPL 201)
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:45 - 11:15 Carsten Schmidt: Recent and future work of Initial data analysis Topic Group (TG3)
(Presentation 15 min. + Discussion 5 min.)
(TCPL 201)
11:15 - 11:35 Marianne Huebner: Undertaking initial data analysis before fitting a regression model: What should a researcher think about?
(Presentation 15 min. + Discussion 5 min.)
(TCPL 201)
11:35 - 13:00 Lunch (Vistas Dining Room)
13:00 - 14:30 Frank Harrell: Controversies in predictive modeling, machine learning, and validation

This talk will cover a variety of controversial and/or current issues related to statistical modeling and prediction research. Some of the topics covered are why external validation is often not a good idea, why validating researchers is often more efficient than validating models, what distinguishes statistical models from machine learning, how variable selection only gives the illusion of learning from data, and advantages of older measures of model performance.

(Presentation 60 min. + Discussion 30 min.)

(TCPL 201)
14:30 - 15:15 Georg Heinze: Recent and future work of Selection of variables and functional forms Topic Group (TG2)
(Presentation 30 min. + Discussion 15 min.)
(TCPL 201)
15:15 - 15:45 Coffee Break (TCPL Foyer)
15:45 - 16:00 Organization of inter-Topic-Group collaboration meetings

Proposals from:
- Michal Abrahamowicz
- Mitchell Gail
- Marianne Huebner
- Jörg Rahnenführer and Lisa McShane

(TCPL 201)
16:00 - 18:00 Inter-Topic-Group Collaboration meetings (TCPL 201)
18:00 - 19:30 Dinner (Vistas Dining Room)
Wednesday, June 5
07:00 - 08:30 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
08:30 - 09:30 Lisa McShane: Simulation Panel (SP)
(Presentation 40 min. + Discussion 20 min.)
(TCPL 201)
09:30 - 10:00 Saskia le Cessie: The simulation learner (TG7)

The recently submitted TG7 tutorial, "Formulating causal questions and principled statistical answers", is accompanied by the "simulation learner". This is a simulated dataset motivated by the Promotion of Breastfeeding Intervention Trial (PROBIT) (1). Mother-infant pairs were randomised to receive either standard care or a breastfeeding encouragement (BFE) intervention, and weight achieved at age 3 months was the main outcome.

The simulated path from randomized assignment to outcome runs via uptake of the intervention (an education program), which could be followed by the start and a specific duration of breastfeeding. The data were further enriched by generating alternative exposure levels with their potential outcomes in addition to the 'observed' data. This enables the reader of the tutorial, or a student in a course, to better understand concepts of causal inference through visualization of potential outcomes under different treatments, and of different causal effect estimands in different populations. It further allows the reader to explore distinct estimation methods and compare their results with the simulated population parameters. The simulation learner showed us, for example, how approaches valid for one type of exposure (e.g., receiving an offer of the BFE programme, or actually following the BFE programme) are not automatically valid for other exposures (e.g., actually starting breastfeeding). R code for data generation and analysis is available at www.ofcaus.org, where SAS and Stata code for analysis is also provided.

1. Kramer MS, Chalmers B, Hodnett ED, et al. Promotion of breastfeeding intervention trial (PROBIT) - A randomized trial in the Republic of Belarus. Journal of the American Medical Association. 2001;285(4):413-420.
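
To make the potential-outcomes idea concrete, here is a minimal base-R sketch in the same spirit (purely illustrative; the actual generating code for the simulation learner is available at www.ofcaus.org). Because both potential outcomes are generated for every pair, any estimand can be checked against the simulated truth:

    set.seed(5)
    n  <- 1e4
    u  <- rnorm(n)                           # unmeasured mother-level factor
    y0 <- 6000 + 400 * u + rnorm(n, 0, 300)  # weight (g) at 3 months without breastfeeding
    y1 <- y0 + 250                           # potential outcome with breastfeeding
    a  <- rbinom(n, 1, plogis(u))            # actual exposure depends on u
    y  <- ifelse(a == 1, y1, y0)             # observed outcome

    mean(y1 - y0)                            # true average causal effect: 250 g
    mean(y[a == 1]) - mean(y[a == 0])        # naive contrast: confounded by u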

(Presentation 20 min. + Discussion 10 min.)

(TCPL 201)
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:30 Per Kragh Andersen: Pseudo-observations (TG8)

Survival analysis is characterized by the need to deal with incomplete observation of outcome variables, most frequently caused by right-censoring, and several now-standard inference procedures have been developed to deal with this. Examples include the Kaplan-Meier estimator of the survival function and partial likelihood for estimating regression coefficients in the proportional hazards (Cox) regression model. During the past 15 years, methods based on pseudo-observations have been studied. Here, the idea is to apply a transformation of the incompletely observed survival data and, thereby, to create a simpler data set to which 'standard' techniques (i.e., those for complete data) may be applied, e.g., methods using generalized estimating equations (GEE).

As an example, we can consider the problem of relating the survival probability, $S(t_0)$ at a single time point, $t_0$, to covariates, $z$, based on right-censored survival times $T_i$ and failure indicators $D_i$ for independent observations $i=1,...,n$. Here, $T_i=\min(X_i,U_i)$ for potential complete failure times $X_i$ and right-censoring times $U_i$, and $D_i=I(T_i=X_i)$. Let $\hat{S}(t)$ be the Kaplan-Meier estimator of $S(t)=P(X>t)$. Then the pseudo-observations for the incompletely observed survival indicators $I(X_i>t_0)$, $i=1,...,n$, are $$S_i=n\hat{S}(t_0)-(n-1)\hat{S}^{(-i)}(t_0),\;\;\;i = 1,...,n,$$ where $\hat{S}^{(-i)}(t)$ is the Kaplan-Meier estimator applied to the sample of size $n-1$ with observation $i$ taken out. Regression coefficients in a generalized linear model $$g(S(t_0|z)) =\beta_0 +\beta^{T}z$$ with link function $g$ are then estimated by solving the GEE $$\sum_{i} A(\beta,z_i)\big(S_i-g^{-1}(\beta_0+\beta^{T}z_i)\big)=0,$$ where, typically, $A(\beta,z_i)=\frac{\partial}{\partial \beta}g^{-1}(\beta_0+\beta^{T}z_i)$.

An advantage of this approach is that it applies quite generally to parameters for which no other regression methods are directly available (including the average time spent in a state of a multi-state model), whereas disadvantages include that the method is not fully efficient and that, in its simplest form, it requires the distribution of the censoring times, $U$, to be independent of the covariates, $z$. We will review developments in this field since the method was put forward by Andersen, Klein and Rosthøj (2003, Biometrika), with special emphasis on recent results by Overgaard, Parner and Pedersen (2017, Ann. Statist.) and Pavlic, Martinussen and Andersen (2019, Lifetime Data Anal.).
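
The jackknife construction above is straightforward to compute directly. Below is a small self-contained sketch using the survival package on simulated data (variable names, parameter values and the time point $t_0$ are illustrative assumptions). With the identity link, the GEE at a single time point reduces to ordinary least squares on the pseudo-values:

    library(survival)
    set.seed(2)
    n  <- 200
    z  <- rbinom(n, 1, 0.5)
    ft <- rexp(n, rate = 0.1 * exp(0.5 * z))       # latent failure times X
    ct <- rexp(n, rate = 0.05)                     # censoring times U
    time   <- pmin(ft, ct)                         # T = min(X, U)
    status <- as.numeric(ft <= ct)                 # D = I(T = X)
    t0 <- 5

    km_at <- function(tt, dd, at)                  # Kaplan-Meier estimate at time `at`
      summary(survfit(Surv(tt, dd) ~ 1), times = at, extend = TRUE)$surv

    S_full <- km_at(time, status, t0)
    pseudo <- sapply(seq_len(n), function(i)       # S_i = n*S(t0) - (n-1)*S^(-i)(t0)
      n * S_full - (n - 1) * km_at(time[-i], status[-i], t0))

    summary(lm(pseudo ~ z))$coefficients           # identity-link GEE = OLS here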

(Presentation 40 min. + Discussion 20 min.)

(TCPL 201)
11:30 - 13:00 Lunch (Vistas Dining Room)
13:00 - 17:00 Free Afternoon

Suggestions of activities from the BIRS Station Manager, Jacob Posacki, for participants who do not have their own vehicle:

  1. Hike up Tunnel Mountain. BIRS is located on the side of Tunnel Mountain. It is about a 2.5-hour round trip from BIRS if you walk slowly and take your time at the top. It is an easy walk and a good introduction to Banff for those coming for the first time.

  2. A visit to Cave and Basin National Historic site. There is a lot of science and history to interpret here as well as a network of scenic and easy walking trails around the valley bottom. There’s more information about Cave and Basin at the following link:

    https://www.pc.gc.ca/en/lhn-nhs/ab/caveandbasin/activ

  3. Hike or take the gondola up Sulphur Mountain. This is a longer hike (6.1 km one way from the parking lot to Sanson's Peak, 655 m elevation gain, 4-hour round trip) but on a fairly established trail. The gondola is expensive (from $57 round trip, half price for the ride down only) but can be a good experience for those who want to skip the hard part! The parking lot can be reached by taxi or Roam Transit local route #1:
    http://www.banff.ca/locals-residents/public-transit-buses/bus-routes-schedules.htm

    See the Banff map:
    https://banff.ca/DocumentCenter/View/25
    #2 = BIRS, trail to Tunnel Mountain is just above; #7 = Cave and Basin National Historic site; #19 = trail to Sulphur Mountain and Gondola.

Jacob Posacki will be available in person during the week to provide more insight for participants who choose to venture further into Banff National Park during their free afternoon.

(Banff National Park)
17:30 - 19:30 Dinner (Vistas Dining Room)
Thursday, June 6
07:00 - 08:30 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
08:30 - 08:50 Maarten van Smeden: Dataset Panel (DP) (TCPL 201)
08:50 - 09:30 Suzanne Cadarette: Knowledge Translation Panel (TP)
(Presentation 15 min. + Discussion 5 min.)
(TCPL 201)
09:40 - 10:00 Geraldine Rauch: (Systematic) review on statistical series within medical journals - what medical researchers are told about regression modeling (TCPL 201)
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:30 Mark Baillie: How do we make better graphs? Effective visual communication for the quantitative scientist (Visualisation Panel, VP)

The goal of quantitative science is to enable informed decisions and actions through a data‐driven understanding of complex scientific questions. It is the role of any quantitative scientist (pharmacometrician, statistician, epidemiologist, etc.) to support this goal through (1) elucidation of the scientific question of interest, (2) appropriate quantitative methods (experimental design, statistical or mathematical models, etc.) and (3) effective communication of results. All of these aspects work in concert; one without the others is not sufficient.

Scientific influence relies on effective communication; however, we often focus on the science and neglect the communication, so that sophisticated investigations remain without impact. Effective visual communication is a core competency for the quantitative scientist [1]. It is essential in every step of the quantitative workflow, from scoping to execution and communication of results and conclusions. With this competency, we can better understand data and influence decisions towards appropriate actions. Without it, we can fool ourselves and others and pave the way to wrong conclusions and actions.

In this talk, I will present an example of a concerted effort to improve the way we (visually) communicate as statisticians at Novartis, sharing experiences of an internal initiative to help foster the use of good graphs in pharmaceutical statistics [2]. I will also discuss the role of the new STRATOS visualization panel [3] in promoting the use of good graphical principles for effective visual communication. The aim of the panel is to provide guidance and recommendations covering the design, implementation, and review of statistical graphics.

[1] https://arxiv.org/abs/1903.09512
[2] https://onlinelibrary.wiley.com/doi/full/10.1002/pst.1912
[3] STRATOS visualisation panel : http://www.stratos-initiative.org/node/61

(Presentation 40 min. + Discussion 20 min.)

(TCPL 201)
11:30 - 13:00 Lunch (Vistas Dining Room)
13:00 - 13:30 Martin Boeker: Glossary Panel (GP) (TCPL 201)
13:30 - 15:00 Short talks or TG meetings (TCPL 201)
15:00 - 15:30 Coffee Break (TCPL Foyer)
15:30 - 17:30 Collaborations (TCPL 201)
17:30 - 19:30 Dinner (Vistas Dining Room)
Friday, June 7
07:00 - 08:30 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
08:30 - 08:50 Orlagh Carroll: How are missing data handled in observational time-to-event studies? A systematic review.

Missing data in covariates are known to result in biased estimates of association with the outcome and loss of power to detect associations. Missing data can also lead to other challenges in time-to-event analyses, including the handling of time-varying effects of covariates, the selection of covariates, and their flexible modelling. This review aimed to understand how researchers approach time-to-event analyses when missing data are present. Medline and Embase were searched for observational time-to-event studies published from January 2011 to January 2018. We assessed the covariate selection procedure, the assumptions of proportional hazards models, whether functional forms were considered, and how missing data affected these steps. We recorded the extent of missing data and how they were addressed in the analysis, for example using a complete-case analysis or multiple imputation. A total of 148 studies were included in the review. On average, 15% of data were discarded due to missingness when determining the study population and 32% during the analysis stage. Overall, 86% of studies did not state any missing data assumptions. Complete-case analysis was common (56%), while 22% used multiple imputation.

While guidelines are in place, few studies implement their recommendations in practice. Missing data are present in many studies, but few state clearly how they were handled or the assumptions that were made.
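
As a concrete sketch of the two approaches most often counted in the review, the following R code runs a complete-case analysis and multiple imputation (via the mice package) on a small simulated time-to-event data set with a partially missing covariate. The setup and parameter values are illustrative assumptions; in practice the event indicator and the cumulative hazard, rather than the raw survival time, are recommended as predictors in the imputation model (White and Royston, 2009, Stat Med).

    library(survival)
    library(mice)
    set.seed(7)
    n  <- 500
    z1 <- rnorm(n); z2 <- rnorm(n)
    ft <- rexp(n, rate = exp(0.5 * z1 + 0.3 * z2))   # failure times
    ct <- rexp(n, rate = 0.3)                        # censoring times
    time   <- pmin(ft, ct)
    status <- as.numeric(ft <= ct)
    z1[runif(n) < 0.3] <- NA                         # 30% of z1 missing at random
    dat <- data.frame(time, status, z1, z2)

    coxph(Surv(time, status) ~ z1 + z2, data = dat)  # complete-case: drops ~30% of rows

    imp  <- mice(dat, m = 5, printFlag = FALSE)      # imputes z1 from time, status, z2
    fits <- with(imp, coxph(Surv(time, status) ~ z1 + z2))
    summary(pool(fits))                              # combined by Rubin's rules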

(Presentation 15 min. + Discussion 5 min.)

(TCPL 201)
08:50 - 10:00 Discussion / Reports (TCPL 201)
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:30 Discussion / Reports (TCPL 201)
11:30 - 12:00 Checkout by Noon
5-day workshop participants are welcome to use BIRS facilities (BIRS Coffee Lounge, TCPL and Reading Room) until 3 pm on Friday, although participants are still required to checkout of the guest rooms by 12 noon.
(Front Desk - Professional Development Centre)
12:00 - 13:00 Lunch from 11:30 to 13:30 (Vistas Dining Room)