Universiteit Leiden


Missing Data

This course focuses on the treatment of missing data. Participants will learn how to handle missing data using multiple imputation for basic statistical analyses in several software packages.

Target group: PhD candidate
Teacher: Joost van Ginkel (Assistant Professor)
Method: Workshop. Combination of lectures and computer assignments.

The enrollment deadline is 8 June 2019.

Description

Missing data are a very common problem in the social sciences. By default, many statistical software packages remove any case with missing data from the statistical analysis of interest, a method known as listwise deletion. Listwise deletion reduces statistical power because the sample becomes smaller. It may also bias the results of the analysis if the respondents who are removed from the sample differ systematically from the respondents who remain.
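As a minimal sketch of what listwise deletion does, the Python example below uses a small made-up dataset (the variable names `age` and `income` and all values are hypothetical, chosen only for illustration). It drops every record with a missing value and compares the mean age before and after deletion.

```python
# Hypothetical toy data; None marks a missing value.
rows = [
    {"age": 25, "income": 30000},
    {"age": 40, "income": None},
    {"age": 31, "income": 42000},
    {"age": None, "income": 51000},
    {"age": 58, "income": None},
]


def listwise_delete(records):
    """Keep only the records in which no variable is missing."""
    return [r for r in records if all(v is not None for v in r.values())]


complete = listwise_delete(rows)
n_dropped = len(rows) - len(complete)

# Mean age over everyone with an observed age, versus complete cases only.
observed_ages = [r["age"] for r in rows if r["age"] is not None]
mean_age_observed = sum(observed_ages) / len(observed_ages)
mean_age_complete = sum(r["age"] for r in complete) / len(complete)

print(f"dropped {n_dropped} of {len(rows)} cases")
print(f"mean age, all observed ages: {mean_age_observed}")
print(f"mean age, complete cases only: {mean_age_complete}")
```

In this made-up example the older respondents are the ones with missing income, so the complete cases are younger on average than the full sample. That is the kind of systematic difference that can bias results.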

Multiple imputation is an alternative way to deal with missing data that does not throw away information. In multiple imputation, plausible values are filled in for the missing data several times, resulting in several plausible complete versions of the incomplete dataset. Each of these datasets is then analyzed using the statistical analysis that the researcher is interested in, and the results of these analyses are combined into one overall analysis.
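The combining step can be sketched as follows. The course description does not name a pooling method; this example assumes Rubin's rules, a widely used way to pool a point estimate and its variance across imputed datasets. The function name `pool_rubin` and the input numbers are hypothetical and stand in for the results of running the same analysis on five imputed datasets.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b    # total variance of the pooled estimate
    return q_bar, t


# Hypothetical results: one regression coefficient and its squared
# standard error, estimated separately in m = 5 imputed datasets.
estimates = [0.50, 0.54, 0.47, 0.52, 0.47]
variances = [0.010, 0.011, 0.009, 0.010, 0.010]

pooled_estimate, total_variance = pool_rubin(estimates, variances)
pooled_se = total_variance ** 0.5

print(f"pooled estimate: {pooled_estimate:.3f}")
print(f"pooled standard error: {pooled_se:.3f}")
```

The total variance adds the between-imputation variance to the average within-imputation variance, so the pooled standard error reflects the extra uncertainty caused by the missing data.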

In this course, multiple imputation is explained at a theoretical level. Participants will practice multiple imputation using empirical datasets, apply statistical analyses to these multiply imputed datasets, and practice combining the results. Finally, participants will apply their acquired skills to their own dataset with missing values.

 

Prerequisites

Basic proficiency in R is required. Participants are advised to have taken the course ‘Introduction to R’ or should otherwise make sure that their R programming skills are at a similar level.

Please bring a laptop with R and SPSS installed to the course. Also bring an empirical dataset with missing data from your own research.

Reading list

  • Van Buuren, S. (2012). Flexible imputation of missing data. Boca Raton: Chapman & Hall/CRC Press. Chapter 1 and Chapter 2, pp. 31–35, are especially important. An online version is available.
  • Van Buuren, S. (2007). Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16, 219–242. doi: 10.1177/0962280206074463. You can skip sections 6 and 7.
  • Van Ginkel, J.R. & Kroonenberg, P.M. (2014). Analysis of variance of multiply imputed data. Multivariate Behavioral Research, 49, 78–91.
  • Van Ginkel, J.R. (2016). MI-MUL2.pdf [Software manual].
  • Van Ginkel, J.R. (2019). Significance tests and estimates for R² for multiple regression in multiply imputed datasets: a cautionary note on earlier findings, and alternative solutions. Multivariate Behavioral Research.

Assessment method

Active participation.
