Schedule for: 16w5063 - Newest Developments and Urgent Issues in Measurement Error and Latent Variable Problems

Arriving in Banff, Alberta on Sunday, August 14 and departing Friday August 19, 2016
Sunday, August 14
16:00 - 17:30 Check-in begins at 16:00 on Sunday and is open 24 hours (Front Desk - Professional Development Centre)
17:30 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
20:00 - 22:00 Informal gathering (Corbett Hall Lounge (CH 2110))
Monday, August 15
07:00 - 09:20 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
09:20 - 09:30 Introduction and Welcome by BIRS Station Manager (TCPL 201)
09:30 - 10:00 Raymond Carroll: New Measurement Error Data Structures
This is a survey overview of some relatively recent measurement error data structures, with brief discussions of how one might analyze them. Problems include (a) mixtures of continuous and discrete variables both subject to error; (b) problems in which the target covariate varies over time and is measured with error; (c) classical additive and Berkson multiplicative errors in radiation exposure doses; (d) calibration and seasonal adjustment for matched case-control studies; (e) spatial regression with covariate measurement error; (f) using functional data analysis to assess measurement error in energy intake; (g) deconvolution with heteroscedastic measurement error.
(TCPL 201)
10:00 - 11:00 Coffee Break (TCPL Foyer)
11:00 - 11:30 Donna Spiegelman: Generalized methods-of-moments estimation and inference for the assessment of multiple imperfect measures of diet and physical activity in validation studies
Accurate and precise measurement of diet and physical activity (PA) in free-living populations is difficult. As a result, key findings in nutritional epidemiology have been controversial, a notable example of this being the relation of dietary fat intake to breast cancer risk. There is a long literature in statistics on methods to adjust relative risk estimates for cancer and other chronic diseases for bias due to measurement error in long-term dietary intake and PA. To use the popular regression calibration method to correct for the bias, the de-attenuation factor needs to be estimated. Popular is used here in the sense that it is virtually the only method for correcting for bias due to exposure measurement error that has been used in applications, and there are hundreds of published instances of this. In this talk, we develop semi-parametric generalized methods of moments estimators for the de-attenuation factor and other quantities of interest, in particular, the correlation of each surrogate measure with the unobserved truth and intra-class correlation coefficients characterizing the random within-person variation around each measurement. The method makes assumptions only about the first two moments of the multivariate distribution between the measures. The robust variance is derived to allow asymptotic inference. We consider a one-step method which is theoretically inefficient, as well as fully efficient methods that are iterative. For some variables of interest, such as total energy intake, protein density, and total PA, there may be unbiased gold standards (X) available and when they are available, they are used. When these are not available and even when so, we consider other objective (W) and subjective measures (Z), such as biased (concentration) biomarkers, self-report, accelerometer and pulse, as means of estimating the de-attenuation factor and other quantities of interest. 
Measurements denoted W are assumed to have errors uncorrelated with all other measurements, and those denoted Z are allowed to have errors correlated with one or more of the other measures. Harvard's Women's Lifestyle Validation Study (WLVS) assessed diet and physical activity over a 1-year period among 777 women. Total physical activity was assessed by doubly labeled water, often considered to be the gold standard for energy expenditure, accelerometer, resting pulse, physical activity questionnaire (PAQ), and ACT24, an on-line PA assessment tool. Thus, dim(X)=1, dim(Z)=2, and dim(W)=2. Using all 5 of these measures, the de-attenuation factor (kcal/MET-hours) for total physical activity assessed by the PAQ was estimated to be 4.09 (95% CI 0.94, 7.24) and for ACT24 5.59 (1.43, 9.75). These de-attenuation factors calibrate the units of the PAQ and ACT24 from MET-hours/day to kcal/day, as well as adjusting for bias due to measurement error. In addition, using all 5 measures, the correlations of PAQ and ACT24 with truth were 0.36 (0.30, 0.41) and 0.32 (0.26, 0.38), respectively, and the correlations of the accelerometer and resting pulse with truth were 0.891 (0.887, 0.893) and -0.20 (-0.32, -0.71), respectively. Little gain in efficiency between the one-step and fully iterated estimators was evident in this example. User-friendly publicly available software is under development.
(TCPL 201)
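As a rough illustration of the de-attenuation idea in the abstract above, the sketch below corrects a naive regression slope using the simplest replicate-based form of regression calibration. It is not the generalized method-of-moments estimator described in the talk. The function name `deattenuated_slope`, the classical additive error model W = X + U, and the simulated data are all assumptions made for this example.

```python
import numpy as np


def deattenuated_slope(y, w_reps):
    """Correct a naive slope using replicate error-prone measurements.

    y      : (n,) outcome
    w_reps : (n, k) replicates of the surrogate, k >= 2
    Assumes (hypothetically) classical error: W_ij = X_i + U_ij,
    with U independent of X and of the outcome.
    Returns (naive slope, lambda, corrected slope).
    """
    y = np.asarray(y, dtype=float)
    w_reps = np.asarray(w_reps, dtype=float)
    n, k = w_reps.shape
    w_bar = w_reps.mean(axis=1)

    # Within-person variance of the replicates estimates var(U).
    sigma2_u = w_reps.var(axis=1, ddof=1).mean()

    # var(w_bar) = var(X) + var(U) / k under the classical model.
    var_wbar = w_bar.var(ddof=1)
    var_x = var_wbar - sigma2_u / k

    # lambda is the slope of the unobserved truth X on the surrogate mean.
    lam = var_x / var_wbar

    naive = np.cov(w_bar, y, ddof=1)[0, 1] / var_wbar
    return naive, lam, naive / lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    x = rng.normal(size=n)                          # unobserved truth
    w = x[:, None] + rng.normal(size=(n, 2))        # two noisy replicates
    y = 2.0 * x + rng.normal(size=n)                # true slope is 2.0
    naive, lam, corrected = deattenuated_slope(y, w)
    print(f"naive={naive:.2f} lambda={lam:.2f} corrected={corrected:.2f}")
```

In this toy setting the naive slope is shrunk toward zero by the factor lambda, and dividing by the estimated lambda undoes the shrinkage. The abstract's setting is richer: it combines several surrogates with possibly correlated errors and a change of units.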
11:30 - 13:00 Lunch (Vistas Dining Room)
13:00 - 14:00 Guided Tour of The Banff Centre
Meet in the Corbett Hall Lounge for a guided tour of The Banff Centre campus.
(Corbett Hall Lounge (CH 2110))
14:30 - 15:30 Coffee Break (TCPL Foyer)
15:00 - 15:10 Group Photo (TCPL Foyer)
15:30 - 16:00 Xianzheng Huang: Nonparametric Modal Regression in the Presence of Measurement Error
In the context of regressing a response Y on a predictor X, we consider estimating the local modes of the distribution of Y given X = x when the data for X are contaminated with measurement error. We propose two nonparametric estimation methods. In one approach we relate this problem to estimating the partial derivative of the joint density of (X, Y) in the presence of measurement error; and the second approach is built upon estimating the partial derivative of the conditional density of Y given X = x using error-prone data. We study the asymptotic properties of the mode estimator resulting from each method, and demonstrate their performance via simulation experiments.
(TCPL 201)
16:00 - 17:30 Problem Session 1: Malka Gorfine, Alicia Carriquiry, Len Stefanski et al. (TCPL 201)
16:15 - 16:50 Malka Gorfine: Nonparametric Adjustment for Measurement Error in Time to Event Data: Application to Risk Prediction Models (TCPL 201)
16:50 - 17:25 Alicia Carriquiry: Bivariate kernel deconvolution density estimation: An application to vitamin D (TCPL 201)
17:30 - 19:30 Dinner
(Vistas Dining Room)
Tuesday, August 16
07:00 - 09:30 Breakfast (Vistas Dining Room)
09:30 - 10:00 Yijian Huang: On heteroscedastic covariate measurement error in Cox regression
Many survival studies have error-contaminated covariates, which may lack a gold standard of measurement. Furthermore, the error distribution can depend on the true covariates but the dependence structure is typically difficult to quantify; heteroscedasticity is a common manifestation. In this talk, we suggest an additive measurement error model in this circumstance, and develop a functional modeling method for Cox regression when an instrumental variable is available. The estimated regression coefficients are consistent and asymptotically normal. Preliminary numerical studies, including simulations, will be provided.
(TCPL 201)
10:00 - 10:30 Victor Kipnis: Time-varying models for longitudinal data measured with error, with application to physical activity and sleep.
Modern accelerometers provide interesting and objective longitudinal data on different characteristics of physical activity that may influence important health outcomes. Those characteristics may fluctuate over a short span of time due to life demands, and their dynamic nature at the individual level is often of principal interest. The current research is motivated by the problem of estimating the temporal effect of moderate and vigorous physical activity on sleep using accelerometry measurements. We analyze weekly data from the BodyMedia study of 3650 women and 1009 men who wore accelerometers continuously for 12 consecutive weeks. On an appropriate scale, we propose a joint multivariate linear mixed model when both the exposure and bivariate outcome (lying down minutes and sleep minutes) vary over time and are subject to measurement error. To accommodate the possibility that heterogeneities in person-specific trajectories in physical activity and sleep characteristics may be related, we allow random effects in the corresponding parts of the model to be correlated. This correlation leads to important differences among the individual-level (or within-person), between-person, and population-level (or marginal) effects, as is exemplified by our data. Our simulations also demonstrate that ignoring correlated random effects, as is common in the mixed model approach to longitudinal data that are subject to measurement error, leads to substantial biases in estimated exposure effects.
(TCPL 201)
10:30 - 11:00 Coffee Break (TCPL Foyer)
11:00 - 11:30 Arthur Lewbel: Unobserved Preference Heterogeneity in Demand Using Generalized Random Coefficients
We model unobserved preference heterogeneity in demand systems as random Barten scales in utility functions. These Barten scales appear as random coefficients multiplying prices in demand functions. Consumer demands are nonlinear in prices and may have unknown functional structure. We therefore prove identification of additive Generalized Random Coefficients models, defined as additive nonparametric regressions where each regressor is multiplied by an unobserved random coefficient having an unknown distribution. Using Canadian data, we estimate energy demand functions with and without random coefficient Barten scales. We find that not accounting for this unobserved preference heterogeneity substantially biases estimated consumer-surplus costs of an energy tax.
(TCPL 201)
11:30 - 13:30 Lunch (Vistas Dining Room)
13:30 - 14:00 Joan Hu: Application of Latent Class Models in a Cancer Survivorship Study
Cancer survivors are often at risk of subsequent and ongoing health problems that are primarily treatment-related. This talk presents an analysis of the medical cost data associated with the longitudinal physician claims of a cancer survivor cohort and a sample from the general population under latent class models. It allows us to classify the survivors into two groups, the at-risk and not-at-risk groups, and to make comparisons between the survivor cohort and the general population.
(TCPL 201)
14:00 - 14:30 Donglin Zeng: Threshold-Dependent Proportional Hazards Model to Assess Risk Factors for Incident Diabetes Defined by Plasma Glucose Levels
The Atherosclerosis Risk in Communities (ARIC) Study is a prospective study of risk factors for atherosclerosis being conducted in four U.S. communities. One important objective of this study is to assess the risk factors for diabetes. A participant is classified as diabetic when his or her fasting plasma glucose (FPG) value crosses a specified, fixed threshold. However, the exact time when the threshold is crossed is not observable when FPG values are subject to substantial measurement error. In this paper, we propose a semiparametric regression model based on the generalized extreme-value distribution to model the longitudinal FPG values. Our model is equivalent to modeling threshold-dependent time to diabetes via a Cox proportional hazards model, where the threshold-dependent event time is defined as the time of the FPG values crossing a given threshold. To account for measurement error in the FPG values, we estimate the model parameters using the nonparametric pseudo-likelihood approach and implement computation via the pseudo-EM algorithm. In analyzing the ARIC Study data, several factors were found to be significantly associated with diabetes.
(TCPL 201)
14:30 - 15:30 Coffee Break (TCPL Foyer)
15:30 - 17:30 Problem Session 2: Yair Goldberg, Haiying Wang, Yuanjia Wang, Magne Thoresen et al. (TCPL 201)
15:30 - 16:30 Yair Goldberg: RKHS and mixed effect models (TCPL 201)
16:30 - 17:30 Haiying Wang: Subsampling big data for logistic and linear models (TCPL 201)
17:30 - 19:30 Dinner (Vistas Dining Room)
Wednesday, August 17
07:00 - 09:30 Breakfast (Vistas Dining Room)
09:30 - 10:00 Weixing Song: Regression Calibration in Measurement Error Modeling
When a p-dimensional parameter θ is defined through the moment condition Em(X,θ) = 0, a simple estimation procedure for θ is proposed by Hong and Tamer when X, a k-dimensional random vector, is contaminated with Laplace measurement error U, that is, we can only observe Z = X + U. However, the estimation procedure was designed particularly for the cases where the components of the measurement error vector U are independent. We shall introduce a general multivariate Laplace distribution, then extend the Hong-Tamer moment estimation procedure to a more general multivariate scenario. Moreover, the Hong-Tamer moment estimation procedure is based on the unconditional expectation Em(X,θ) = EH(Z,θ) for some function H. An example shows that this technique does not work in some cases. We will further discuss an estimation procedure based on the conditional expectation E(m(X,θ)|Z), which can be treated as an extension of the regression calibration technique. Large sample properties of the proposed estimation procedure will be investigated. Next, I will try to extend the above extended regression technique to a nonparametric setup, particularly focusing on normal and Laplace measurement error.
(TCPL 201)
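To make the relation Em(X,θ) = EH(Z,θ) in the abstract above concrete, the sketch below checks it numerically in the simplest univariate case. It assumes a scalar X, a mean-zero Laplace error U independent of X with known variance, and a smooth moment function m. Under those assumptions a function of the form H(z) = m(z) - (var(U)/2) m''(z) satisfies the relation. The function name `corrected_moment` and the choice m(x) = x^4 are illustrative only, and the multivariate extension discussed in the talk is not attempted.

```python
import numpy as np


def corrected_moment(z, m, m_second, sigma2_u):
    """Sample mean of H(Z) = m(Z) - (sigma2_u / 2) * m''(Z).

    z        : observed values Z = X + U
    m        : moment function of the unobserved X
    m_second : second derivative of m
    sigma2_u : variance of the Laplace error U (assumed known here)
    """
    z = np.asarray(z, dtype=float)
    return np.mean(m(z) - 0.5 * sigma2_u * m_second(z))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200_000
    b = 0.5                               # Laplace scale; var(U) = 2 * b**2
    x = rng.normal(size=n)                # unobserved; E[X**4] = 3 for N(0, 1)
    z = x + rng.laplace(scale=b, size=n)  # what is actually observed

    naive = np.mean(z ** 4)               # ignores the measurement error
    fixed = corrected_moment(
        z,
        m=lambda t: t ** 4,
        m_second=lambda t: 12.0 * t ** 2,
        sigma2_u=2.0 * b ** 2,
    )
    print(f"target=3.00 naive={naive:.2f} corrected={fixed:.2f}")
```

The correction needs only the observed Z and the error variance, which is what makes the unconditional-moment approach simple when it applies.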
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:00 Ingrid Van Keilegom: Frontier estimation in the presence of measurement error with unknown variance
We consider the problem of estimating a stochastic frontier, i.e. a frontier that is subject to (additive) measurement error. Contrary to other papers in the literature that work with unknown frontiers and normal noise variables, we consider the case where the variance of the noise is unknown. We show that under weak model assumptions this variance is identifiable, and we propose three ways to estimate this variance. The first proposal is given in Kneip, Simar and Van Keilegom (2015), who study the asymptotic theory and finite sample behavior in detail. The two other proposals are currently under investigation. Preliminary results will be given showing their excellent finite sample behavior. All three methods will first be studied in the univariate case (i.e. in the case where the boundary of the support of a univariate variable is of interest). The extension to (two- or more-dimensional) frontier models will be given in a second step.
(TCPL 201)
11:00 - 11:30 Yingyao Hu: Microeconomic Models with Latent Variables: Econometric Methods and Empirical Applications (TCPL 201)
11:30 - 13:30 Lunch (Vistas Dining Room)
13:30 - 15:30 Free Afternoon (Banff National Park)
17:30 - 19:30 Dinner (Vistas Dining Room)
Thursday, August 18
07:00 - 09:30 Breakfast (Vistas Dining Room)
09:30 - 10:00 Samiran Sinha: Analysis of proportional odds models with censoring and errors-in-covariates
We propose a consistent method for estimating both the finite and infinite dimensional parameters of the proportional odds model when a covariate is subject to measurement error and time-to-events are subject to right censoring. The proposed method does not rely on the distributional assumption of the true covariate which is not observed in the data. In addition, the proposed estimator does not require the measurement error to be normally distributed or to have any other specific distribution, and we do not attempt to assess the error distribution. Instead, we construct martingale based estimators through inversion, using only the moment properties of the error distribution, estimable from multiple erroneous measurements of the true covariate. The theoretical properties of the estimators are established and the finite sample performance is demonstrated via simulations. We illustrate the usefulness of the method by analyzing a dataset from a clinical study on AIDS.
(TCPL 201)
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:00 Aurore Delaigle: Methodology for nonparametric deconvolution when the error distribution is unknown
We consider nonparametric estimation of a regression curve when the data are observed with multiplicative distortion which depends on an observed confounding variable. We suggest several estimators, ranging from a relatively simple one that relies on restrictive assumptions usually made in the literature, to a sophisticated piecewise approach that involves reconstructing a smooth curve from an estimator of a constant multiple of its absolute value, and which can be applied in much more general scenarios. We show that, although our nonparametric estimators are constructed from predictors of the unobserved undistorted data, they have the same first order asymptotic properties as the standard estimators that could be computed if the undistorted data were available. We illustrate the good numerical performance of our methods on both simulated and real datasets.
(TCPL 201)
11:20 - 11:45 Qihua Wang: Least product relative error criterion based estimating equation approaches for the error-in-covariables multiplicative regression models
In this paper, we propose two estimating equation based methods to estimate the regression parameter vector in the multiplicative regression model when a subset of covariates are subject to measurement error but replicate measurements of their surrogates are available. Both methods allow the number of replicate measurements to vary between subjects. No parametric assumption is imposed on the measurement error term or on the true covariates, which are not observed in the data set. Under some regularity conditions, asymptotic normality is established for both methods. Some simulation studies are conducted to assess the performance of the proposed methods. Real data analysis is used to illustrate our methods.
(TCPL 201)
11:30 - 13:30 Lunch (Vistas Dining Room)
13:30 - 14:00 Di Shu: IPTW estimation in marginal structural models with error-prone time-varying confounders
The inverse-probability-of-treatment weighted (IPTW) method, proposed by Robins et al. (2000), is a useful approach for estimation of causal parameters pertaining to marginal structural models. This method requires that the associated variables be measured precisely. In practice, however, measurement error arises commonly. In this talk, I will discuss how measurement error in time-varying confounders can bias the IPTW estimators for causal effects. To adjust for the measurement error effects, we develop several methods to consistently estimate causal parameters. Numerical studies are conducted to assess the performance of our methods.
(TCPL 201)
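The bias mechanism mentioned in the abstract above can be seen in a much simpler setting than the one treated in the talk. The sketch below uses a single time point, one binary confounder, and stabilized IPTW weights estimated from stratum frequencies, then repeats the analysis with a misclassified version of the confounder. The function name `iptw_effect`, the data-generating model, and the 20% misclassification rate are all assumptions for illustration. The time-varying case and the correction methods of the talk are not shown.

```python
import numpy as np


def iptw_effect(a, y, l):
    """Stabilized IPTW estimate of E[Y(1)] - E[Y(0)].

    a : binary treatment (0/1)
    y : outcome
    l : binary confounder (0/1) used to build the weights
    """
    a = np.asarray(a)
    y = np.asarray(y, dtype=float)
    l = np.asarray(l)

    # Propensity score P(A = 1 | L), estimated within each stratum of L.
    ps = np.empty(len(a), dtype=float)
    for level in (0, 1):
        mask = l == level
        ps[mask] = a[mask].mean()

    # Stabilized weights: marginal treatment probability over conditional.
    p_a = a.mean()
    w = np.where(a == 1, p_a / ps, (1.0 - p_a) / (1.0 - ps))

    mu1 = np.sum(w * a * y) / np.sum(w * a)
    mu0 = np.sum(w * (1 - a) * y) / np.sum(w * (1 - a))
    return mu1 - mu0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 100_000
    l = rng.binomial(1, 0.5, size=n)                # true confounder
    a = rng.binomial(1, 0.2 + 0.6 * l)              # treatment depends on L
    y = 1.0 * a + 2.0 * l + rng.normal(size=n)      # true causal effect is 1.0

    # Error-prone confounder: each value flipped with probability 0.2.
    l_star = np.where(rng.random(n) < 0.2, 1 - l, l)

    print("unadjusted     :", y[a == 1].mean() - y[a == 0].mean())
    print("IPTW, true L   :", iptw_effect(a, y, l))
    print("IPTW, noisy L* :", iptw_effect(a, y, l_star))
```

Weighting on the true confounder removes the confounding in this toy model, while weighting on the misclassified version leaves residual confounding, so the estimate stays biased.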
14:00 - 14:30 Tanja Hoegg: Bayesian analysis of matched case-control data subject to outcome misclassification, with application to database studies of multiple sclerosis
Health administrative databases collected by the Canadian provincial governments are often used as a cost-effective data source for multiple sclerosis (MS) research at the population level. Due to a high misdiagnosis rate in MS, identification of study subjects from administrative data results in high numbers of false positives and thus requires statistical techniques allowing for imperfect outcome variables. Motivated by an ongoing Canada-wide matched case-control study examining healthcare utilization between MS cases and healthy controls, we investigate the impact of outcome misclassification on association measures under this sampling scheme. Further, we aim to develop a Bayesian model for the analysis of associations between a binary exposure and a misclassified outcome variable. Emphasis will be placed on allowing for non-constant misclassification probabilities among the subjects as a way to incorporate information contributing to the certainty of an individual's true outcome, such as the total count of MS-related physician contacts.
(TCPL 201)
14:30 - 15:30 Coffee Break (TCPL Foyer)
15:30 - 17:00 Problem Session 4: Malka Gorfine, Ingrid Van Keilegom, Arthur Lewbel, Xinyu Zhang et al. (TCPL 201)
15:30 - 16:30 Malka Gorfine: Heritability Estimation Using a Regularized Regression Approach (HERRA) (TCPL 201)
17:00 - 17:30 Zhongyi Zhu: Simultaneous Mean and Covariance Estimation of Partially Linear Models for Longitudinal Data with Missing Responses and Covariate Measurement Error (TCPL 201)
Missing responses and covariate measurement error are very commonly seen in practice. New estimating equations are developed to simultaneously estimate the mean and covariance under a partially linear model for longitudinal data with missing responses and covariate measurement error. Specifically, a novel approach is proposed to handle measurement error by using independent replicate measurements. Compared with existing methods, the proposed method requires fewer assumptions. For example, it does not require specifying the distribution of the mismeasured covariate or the measurement error, and does not need a parametric model to estimate the probability of being observed or to impute the missing responses. Additionally, the proposed estimating equations are easy to implement in most popular statistical software by applying existing algorithms for standard generalized estimating equations. The asymptotic properties of the proposed estimators are established under regularity conditions, and simulation studies demonstrate desired properties. Finally, the proposed method is applied to data from the Lifestyle Education for Activity and Nutrition (LEAN) study. This data analysis confirms the effectiveness of the intervention in producing weight loss at month nine.
(TCPL 201)
17:30 - 19:30 Dinner (Vistas Dining Room)
Friday, August 19
07:00 - 09:00 Breakfast (Vistas Dining Room)
09:00 - 09:30 Tanya Garcia: Simultaneous treatment of unspecified heteroskedastic model error distribution and mismeasured covariates for restricted moment models
This paper is concerned with the consistent and efficient estimation of parameters in general regression models with mismeasured covariates. We assume the distributions of the model error and covariates are completely unspecified, and that the measurement error distribution is a general parametric distribution with unknown variance-covariance. In this general setting, we construct root-n consistent, asymptotically normal and locally efficient estimators based on the semiparametric efficient score. Constructing the consistent estimator does not involve estimating the unknown distributions, nor modeling the potential model error heteroskedasticity. Instead, a consistent estimator is formed under possibly incorrect working models for the model error distribution, the error-prone covariate distribution, or both. A simulation study demonstrates that our method is robust and performs well for different incorrect working models, and various homoskedastic and heteroskedastic regression models with error-prone covariates. The usefulness of the method is further illustrated in a real data example.
(TCPL 201)
09:30 - 10:00 Magne Thoresen: Ultrahigh dimensional variable screening with measurement error
We frequently encounter regression problems where the number of potential predictors / covariates exceeds the number of observations. Penalized regression methods may in such situations be a good solution. However, these days we also have to deal with situations where the number of potential predictors is too high even for these methods, hence some initial variable filtering is necessary. There is a rather large literature on this, but it is unclear how one should deal with measurement error in such situations. I will briefly present one example, where the variable selection is further complicated by a number of issues, like non-linearities and missing data.
(TCPL 201)
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:28 - 11:14 Wenqing He: Measurement error problems in image co-registration: a prostate cancer investigation (TCPL 201)
10:30 - 11:30 Problem Session 5: Wenqing He, Grace Yi, Paul Gustafson et al. (TCPL 201)
11:30 - 12:00 Checkout by 12 (TCPL 201)
11:30 - 13:30 Lunch from 11:30 to 13:30 (Vistas Dining Room)