Tutorial: Handling missing data in R with MICE
Stef van Buuren, TNO Quality of Life, Leiden and Faculty of Social Sciences, University of Utrecht, The Netherlands
Karin Groothuis-Oudshoorn, Health Technology and Services Research, University of Twente, The Netherlands
Abstract
Multiple imputation (Rubin 1987, 1996) is the method of choice for complex incomplete data problems. Missing data that occur in more than one variable present a special challenge. Two general approaches for imputing multivariate data have emerged: joint modeling (JM) and fully conditional specification (FCS) (van Buuren 2007). Multivariate Imputation by Chained Equations (MICE) is the name of software for imputing incomplete multivariate data by FCS.
In this tutorial we present the R package mice v2.1, which extends the functionality of mice v1.0 in several ways (van Buuren and Groothuis-Oudshoorn 2009). The tutorial takes a hands-on, stepwise approach to using mice v2.1 for solving incomplete data problems in real data. Its goal is to provide sound and practical imputation techniques for obtaining appropriate statistical inferences from incomplete data.
The tutorial focuses on the specification of the imputation model, the most challenging step in multiple imputation. There is no magical setting that produces appropriate imputations in every problem. The tutorial will teach you how to go beyond the default
settings. In addition, we outline practical tools and techniques for analyzing the imputed data.
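As a rough illustration of what going beyond the default settings can look like, the sketch below changes one imputation method and one entry of the predictor matrix before imputing, then analyses and pools the results. It assumes the mice package is installed and uses its bundled nhanes example data. The argument names follow recent CRAN releases and may differ in v2.1, and the particular changes are arbitrary choices for illustration, not recommendations.

```r
# Sketch only: assumes the mice package is installed; argument names follow
# recent CRAN releases and may differ from v2.1.
library(mice)

# Dry run (maxit = 0) to obtain the default settings without imputing.
ini  <- mice(nhanes, maxit = 0, printFlag = FALSE)
meth <- ini$method
pred <- ini$predictorMatrix

# Illustrative changes to the defaults (assumptions, not recommendations):
meth["bmi"] <- "norm"     # Bayesian linear regression instead of the default
pred["bmi", "hyp"] <- 0   # do not use hyp as a predictor when imputing bmi

# Create m = 5 imputed datasets with the modified imputation model.
imp <- mice(nhanes, m = 5, method = meth, predictorMatrix = pred,
            seed = 123, printFlag = FALSE)

# Repeat the analysis on each imputed dataset, then pool the results.
fit    <- with(imp, lm(chl ~ age + bmi))
pooled <- pool(fit)
summary(pooled)
```

Inspecting `meth` and `pred` after the dry run is a convenient way to see what the defaults are before deciding which of them to override.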
Outline
Topics will include:
- Concise theory on multiple imputation
- A description of how the algorithm in MICE works
- Specification of the imputation model
- Special cases and features of MICE
- After MICE: repeated analysis and pooling
- Adding custom imputation functions
- Sensitivity analysis under MNAR
- Interacting with other software
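The pooling step named in the outline can be sketched in base R with Rubin's rules (Rubin 1987) for a single scalar estimate. The function name `pool_rubin` and the input numbers are hypothetical and serve only as an illustration; they are not part of mice and do not come from real data.

```r
# Pool m point estimates and their squared standard errors with Rubin's rules.
# pool_rubin() is a hypothetical helper for illustration, not part of mice.
pool_rubin <- function(est, se2) {
  m     <- length(est)
  qbar  <- mean(est)                # pooled point estimate
  ubar  <- mean(se2)                # within-imputation variance
  b     <- var(est)                 # between-imputation variance
  total <- ubar + (1 + 1 / m) * b   # total variance
  c(estimate = qbar, within = ubar, between = b, total = total,
    se = sqrt(total))
}

# Made-up numbers standing in for one coefficient estimated on m = 5
# imputed datasets.
est <- c(1.0, 1.2, 0.8, 1.1, 0.9)
se2 <- rep(0.04, 5)
res <- pool_rubin(est, se2)
print(res)
```

The sketch stops at the total variance. Inference in practice also needs the associated degrees of freedom, which this example does not compute.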
Prerequisites
Elementary knowledge of general statistical concepts and (linear) statistical models is assumed. Moreover, basic programming in R is useful.
Potential attendees
R users, researchers who want to analyse datasets with missing data.
References
- Rubin DB (1987). Multiple imputation for nonresponse in surveys. Wiley, New York.
- Rubin DB (1996). Multiple Imputation after 18+ Years. Journal of the American Statistical Association, 91(434), 473-489.
- Van Buuren S (2007). Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16(3), 219-242.
- Van Buuren S, Groothuis-Oudshoorn K (2009). MICE: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, forthcoming.