Statistical Analysis of Computer Models using R

Rui Paulo, ISEG, Technical University of Lisbon, Portugal (rui@iseg.utl.pt)
Jesús Palomo, Universidad Rey Juan Carlos, Spain (jesus.palomo@urjc.es)

Overview

A 'computer model' is a computer implementation of a complex mathematical model of a real phenomenon. Such models are typically built so that the phenomenon can be studied through computer, rather than physical, experimentation, and this raises questions that are inherently statistical. This tutorial introduces statistical methodology for the emulation, calibration and validation of computer models, as implemented in the R package 'SAVE'.

Goals

Computer models are often very computationally intensive, which precludes evaluating them directly in tasks such as optimization or Markov chain Monte Carlo, where the model must be run many times. We address the problem of constructing an emulator of a computer model, that is, a fast approximation to its output together with an associated measure of uncertainty, using Gaussian process response surface methodology;
Computer models typically involve two types of inputs: controllable inputs and calibration inputs. Calibration inputs usually correspond to unknown quantities that must be estimated from experimental data if the computer model is to be used as a surrogate for physical experimentation. We describe how experimental data can be combined with a statistical model relating the real process to the computer model to obtain estimates of the calibration parameters;
Ultimately, practitioners want to ascertain how effective the computer model is as a surrogate for reality, a task often referred to as 'validation'. We approach this problem by producing computer model-based estimates of reality together with an associated measure of uncertainty, the so-called tolerance bars.
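The emulation and calibration steps above can be illustrated with a small, self-contained sketch. It is written in Python with NumPy rather than R, and it neither uses nor reproduces the 'SAVE' package API or the full Bayesian analysis of Bayarri et al. (2007): the Gaussian process hyperparameters are fixed by hand instead of being estimated, the model discrepancy (bias) term is omitted, and the class `GPEmulator`, the toy `simulator`, the design and all numerical values are hypothetical. The emulator uses a constant mean and a power-exponential correlation, a common choice in the computer-model literature, and returns a predictive mean and variance. Calibration then evaluates a grid posterior for the calibration input `u` under a flat prior, adding the emulator variance to the observation-error variance.

```python
import numpy as np


def power_exp_corr(a, b, beta, alpha):
    """Power-exponential correlation between the rows of a and the rows of b.

    c(x, x') = exp(-sum_k beta_k * |x_k - x'_k|**alpha_k); beta and alpha are
    treated as known here (an assumption of this sketch).
    """
    diff = np.abs(a[:, None, :] - b[None, :, :])
    return np.exp(-np.sum(beta * diff ** alpha, axis=2))


class GPEmulator:
    """Hypothetical minimal GP emulator: constant mean, plug-in hyperparameters."""

    def __init__(self, X, y, beta, alpha, nugget=1e-6):
        self.X = np.asarray(X, float)
        self.y = np.asarray(y, float)
        self.beta = np.asarray(beta, float)
        self.alpha = np.asarray(alpha, float)
        n = len(self.y)
        R = power_exp_corr(self.X, self.X, self.beta, self.alpha) + nugget * np.eye(n)
        self.L = np.linalg.cholesky(R)  # R = L L^T
        ones = np.ones(n)
        # Generalized least-squares estimate of the constant mean
        self.mu = ones @ self._solve(self.y) / (ones @ self._solve(ones))
        self.resid_w = self._solve(self.y - self.mu)  # R^{-1} (y - mu)
        # Maximum-likelihood plug-in estimate of the process variance
        self.sigma2 = (self.y - self.mu) @ self.resid_w / n

    def _solve(self, b):
        # Solve R z = b via the two triangular factors of R
        return np.linalg.solve(self.L.T, np.linalg.solve(self.L, b))

    def predict(self, Xnew):
        """Return the emulator mean and variance at each row of Xnew."""
        Xnew = np.atleast_2d(np.asarray(Xnew, float))
        r = power_exp_corr(Xnew, self.X, self.beta, self.alpha)  # shape (m, n)
        mean = self.mu + r @ self.resid_w
        v = np.linalg.solve(self.L, r.T)  # L^{-1} r^T, shape (n, m)
        # Plug-in variance; ignores uncertainty in mu, sigma2, beta and alpha
        var = self.sigma2 * np.clip(1.0 - np.sum(v ** 2, axis=0), 0.0, None)
        return mean, var


def simulator(x, u):
    # Hypothetical cheap stand-in for an expensive computer model
    return np.exp(-u * x)


# Design: 6 x 6 grid over controllable input x in [0, 1] and calibration input u in [0.5, 2.5]
xg, ug = np.meshgrid(np.linspace(0.0, 1.0, 6), np.linspace(0.5, 2.5, 6))
design = np.column_stack([xg.ravel(), ug.ravel()])
runs = simulator(design[:, 0], design[:, 1])
emu = GPEmulator(design, runs, beta=[5.0, 2.0], alpha=[2.0, 2.0])

# Synthetic "field" data generated with u = 1.3 and noise sd 0.01 (made-up values)
rng = np.random.default_rng(0)
x_field = np.linspace(0.1, 0.9, 5)
y_field = simulator(x_field, 1.3) + rng.normal(0.0, 0.01, size=5)


def log_post(u, noise_var=0.01 ** 2):
    # Flat prior on u; Gaussian likelihood whose variance adds the emulator variance
    pts = np.column_stack([x_field, np.full_like(x_field, u)])
    m, v = emu.predict(pts)
    s2 = noise_var + v
    return -0.5 * np.sum((y_field - m) ** 2 / s2 + np.log(s2))


u_grid = np.linspace(0.5, 2.5, 401)
lp = np.array([log_post(u) for u in u_grid])
post = np.exp(lp - lp.max())  # subtract the max for numerical stability
post /= post.sum()            # normalize over the grid
u_map = u_grid[np.argmax(post)]
u_mean = np.sum(u_grid * post)
print(f"posterior mode {u_map:.3f}, posterior mean {u_mean:.3f}")
```

Because the emulator variance enters the likelihood, field points where the emulator is uncertain are down-weighted. A fuller treatment would also estimate the correlation parameters and include a bias function relating the computer model to reality, which is what makes the tolerance bars used for validation possible.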

The plan for the tutorial is to start by describing the methodology (most of which is contained in Bayarri et al. 2007, Technometrics) while simultaneously discussing the contents of the 'SAVE' package. We will then provide hands-on experience with the package using real examples. Attendees are encouraged to bring their own problems to the tutorial.

Prerequisites

The methodology is Bayesian, so some familiarity with Bayesian statistics is recommended. Familiarity with the terminology of computer models is also recommended: see Bayarri et al. 2007, Technometrics, and references therein.

Intended Audience

Practitioners interested in analyzing specific computer models, or classes of computer models, in their area of expertise
Statisticians interested in the general area of analysis of computer experiments

Workshop Materials

The SAVE package is available from CRAN:
http://cran.r-project.org/web/packages/SAVE/index.html.