Tutorial: Statistical Analysis of Computer Models using R
Rui Paulo, ISEG, Technical University of Lisbon, Portugal (rui@iseg.utl.pt)
Jesús Palomo, Universidad Rey Juan Carlos, Spain (jesus.palomo@urjc.es)
Overview
A 'computer model' is a computer implementation of a complex mathematical model of a real phenomenon. The purpose of building such a model
is typically to study the phenomenon through computer, rather
than physical, experimentation, and that raises questions that are inherently statistical. This tutorial introduces statistical methodology,
implemented in the R package 'SAVE', for addressing
the problems of emulation, calibration and validation of computer models.
Goals
Outline
- Computer models are often computationally very intensive, and that
precludes their direct use in tasks that require many evaluations, such as
optimization and Markov chain Monte Carlo algorithms. We address the problem of constructing
an emulator of a computer model, that is, a fast approximation to its
output with an associated measure of uncertainty, using Gaussian process
response surface methodology;
- Computer models typically involve two types of inputs: controllable
and calibration. Calibration inputs are usually associated with quantities that are unknown and need to be estimated from experimental data
if one wishes to utilize the computer model as a surrogate for physical
experimentation. We describe how experimental data can be used in
conjunction with a statistical model relating the real process to the computer
model to obtain estimates of the calibration parameters;
- Ultimately, practitioners want to ascertain how effective the computer model is as a surrogate for reality, a task that is often referred
to as ‘validation.’ We approach this problem by producing computer
model-based estimates of reality along with an associated measure of
uncertainty -- the so-called tolerance bars.
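To make the first two goals concrete, here is a minimal sketch in base R. It is not the SAVE implementation and makes no calls to the package. Everything in it is an assumption chosen for illustration: the toy function `simulator`, the fixed correlation range and nugget, the design grid, and the simulated "field" data. The emulator is a zero-mean Gaussian process with a squared-exponential correlation and unit process variance. The calibration step is a plain least-squares plug-in that ignores model bias, whereas the methodology described above is Bayesian and accounts for it.

```r
# Toy stand-in for an expensive computer model (hypothetical):
# x is a controllable input, u is a calibration input.
simulator <- function(x, u) sin(2 * pi * x) * u + x

# Squared-exponential correlation between the rows of A and the rows of B.
# The range is fixed by assumption here, not estimated from data.
sq_exp <- function(A, B, range = 0.3) {
  A <- as.matrix(A)
  B <- as.matrix(B)
  d2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * A %*% t(B)
  exp(-pmax(d2, 0) / (2 * range^2))
}

# A small design: the only points where the "expensive" model is run.
design <- expand.grid(x = seq(0, 1, length.out = 7),
                      u = seq(0, 1, length.out = 5))
y_design <- simulator(design$x, design$u)

# Correlation matrix with a small nugget for numerical stability.
K <- sq_exp(design, design) + diag(1e-6, nrow(design))
U <- chol(K)  # upper triangular, t(U) %*% U equals K
alpha <- backsolve(U, forwardsolve(t(U), y_design))  # solves K alpha = y

# Emulator: GP predictive mean and standard deviation at new inputs.
# The sd is relative to a unit process variance (an assumption).
emulate <- function(x, u) {
  new <- cbind(x = x, u = u)
  k_new <- sq_exp(new, design)             # n_new x n_design
  v <- forwardsolve(t(U), t(k_new))        # n_design x n_new
  list(mean = drop(k_new %*% alpha),
       sd = sqrt(pmax(1 - colSums(v^2), 0)))
}

# Simulated field data: the toy model at an "unknown" u plus noise.
set.seed(1)
u_true <- 0.6
x_field <- seq(0.1, 0.9, length.out = 9)
y_field <- simulator(x_field, u_true) + rnorm(length(x_field), sd = 0.02)

# Least-squares calibration using the emulator in place of the model.
sse <- function(u) sum((y_field - emulate(x_field, u)$mean)^2)
u_hat <- optimize(sse, interval = c(0, 1))$minimum
```

After running the block, `u_hat` can be compared with `u_true`, and `emulate(x_field, u_hat)$sd` shows how the emulator's uncertainty varies across the field inputs. A validation analysis in the sense described above would additionally model the discrepancy between the computer model and reality, which this sketch omits.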
The plan for the tutorial is to start by describing the methodology (for
the most part contained in Bayarri et al. 2007, Technometrics), and to simultaneously discuss the contents of the 'SAVE' package. We will then provide
hands-on experience with the package with the aid of real examples. Attendees
are encouraged to bring their own problems to the tutorial.
Prerequisites
The methodology is Bayesian, so some familiarity with that approach to Statistics is recommended. Familiarity with the terminology of computer models is also recommended: see Bayarri et al. 2007, Technometrics, and references therein.
Intended Audience
- Practitioners interested in analyzing specific computer models or classes of computer models of their area of expertise
- Statisticians interested in the general area of analysis of computer experiments
Workshop Materials
The SAVE package is available from CRAN:
http://cran.r-project.org/web/packages/SAVE/index.html.
Related Links
http://cran.r-project.org/web/packages/SAVE/index.html.