useR! 2010: Interval censored data analysis

Tutorial: Interval censored data analysis

Michael P. Fay, National Institute of Allergy and Infectious Diseases (NIAID), USA

Abstract

Interval censored data analysis is important in biomedical statistics for any time-to-event response where the time of the event is not known exactly but is known only to lie between two assessment times, one before the event and one after it. Some examples are:

Standard survival methods (e.g., Kaplan-Meier curves, logrank tests, accelerated failure time regression models) must be modified to properly account for the interval censoring. For example, naively imputing the failure time as the midpoint of the interval and then performing the usual logrank test for right-censored data can inflate the type I error rate. This topic is relevant for the R users conference because, for some important methods for this type of data, the only readily available software is implemented in R packages. The goal of this tutorial is to show why these interval censored data methods are needed and useful, and to show that some of the methods are easily performed in R.
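As a small illustration of the difference between keeping the intervals and imputing midpoints, here is a minimal sketch in R. It assumes the survival package is available; the five observations and the names left, right, y_interval, and y_midpoint are made up for illustration and are not taken from the tutorial.

```r
library(survival)

# Hypothetical toy data: each event is known only to lie between an
# assessment at time `left` and an assessment at time `right`.
# right = NA marks a subject still event-free at the last assessment
# (right censored).
left  <- c(0, 2, 4, 6, 3)
right <- c(2, 5, NA, 9, 7)

# Interval-censored encoding: the whole interval is kept
y_interval <- Surv(left, right, type = "interval2")

# Naive midpoint imputation: each interval is collapsed to a single
# "exact" time, which discards the uncertainty about when the event occurred
observed   <- !is.na(right)
midpoint   <- ifelse(observed, (left + right) / 2, left)
y_midpoint <- Surv(midpoint, as.integer(observed))

print(y_interval)
print(y_midpoint)
```

The first object can be passed to methods that handle interval censoring, while the second treats the imputed midpoints as if they had been observed exactly.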

Outline

Topics will include:

Types of interval censoring (non-informative vs. informative; Case 1, Case 2, Case k)
Nonparametric maximum likelihood estimation (NPMLE) of the survival curve
- Right censored case (Kaplan-Meier). Graphical description of Efron's redistribution-to-the-right algorithm.
- Interval censored case. Graphical description of Turnbull's self-consistent algorithm, to give intuition on the NPMLE.
- Calculation of NPMLE in R
  - survfit in survival package, including review of Surv function and different types of censoring
  - interval package
  - Icens package and its algorithms.
Testing the difference between two groups
- Why we usually use rank tests for time-to-event responses
- Basic permutation tests
- Generalizing the Wilcoxon-Mann-Whitney test for survival data
- Likelihoods for interval censored data.
  - Marginal likelihood of the ranks
  - Grouped continuous model
- Weighted logrank tests as score tests on semiparametric models
  - Logrank test (two versions) / proportional hazards
  - Wilcoxon test / proportional odds
- Multiple imputation
- Why midpoint imputation can inflate type I error rates
- What if the inspection process differs between treatment groups?
- Overview of type I error problems and different rank tests
- Weighted logrank tests in R using interval package
  - Choosing model/score
  - Choosing method
Regression
- Parametric models (accelerated failure time models)
  - Examples using survival R package
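
The R-related topics in the outline can be sketched roughly as follows. This is an illustrative example under stated assumptions, not the tutorial's own code: the data are simulated, and the sample size, visit schedule, Weibull parameters, and all variable names are hypothetical. The first part uses only the survival package; the last part calls icfit and ictest from the interval package and runs only if that package is installed.

```r
library(survival)

set.seed(1)
n     <- 100
group <- factor(rep(c("control", "treated"), each = n / 2))

# Simulated (hypothetical) true event times; "treated" tends to fail later
true_time <- rweibull(n, shape = 1.5,
                      scale = exp(1 + 0.5 * (group == "treated")))

# Assessments at times 1, 2, ..., 6: the event is only known to fall
# between the last visit before it and the first visit after it
last_visit <- 6
left  <- pmin(floor(true_time), last_visit)
right <- ifelse(true_time > last_visit, NA, left + 1)  # NA = right censored
left[left == 0] <- NA                                  # NA = left censored

dat <- data.frame(left, right, group)

# NPMLE of the survival curve in each group, using survfit
fit_np <- survfit(Surv(left, right, type = "interval2") ~ group, data = dat)
print(fit_np)

# Parametric accelerated failure time model (Weibull), using survreg
fit_aft <- survreg(Surv(left, right, type = "interval2") ~ group,
                   data = dat, dist = "weibull")
print(coef(fit_aft))

# NPMLE and a weighted logrank test from the interval package, if installed
if (requireNamespace("interval", quietly = TRUE)) {
  fit_ic  <- interval::icfit(Surv(left, right, type = "interval2") ~ group,
                             data = dat)
  test_ic <- interval::ictest(Surv(left, right, type = "interval2") ~ group,
                              data = dat, scores = "logrank1")
  print(test_ic)
}
```

In ictest, the scores argument selects the model/score (for example "logrank1" or "wmw"), and the method argument, left at its default here, selects how the p-value is computed.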

Potential attendees

Potential attendees are those who analyze interval censored data or plan clinical trials with endpoints of that type.

Required knowledge

Minimal knowledge of R is required. The tutorial will assume that participants have previously been exposed to standard right-censored data analysis methods (Kaplan-Meier curves, logrank tests, etc.), although in-depth knowledge of those methods is not necessary.

Tutorial Materials

Slides are here.