Interacting with local and remote data repositories using the
'stashR' package for R

Sandrah P. Eckel and Roger D. Peng

Department of Biostatistics, Johns Hopkins Bloomberg School of
Public Health

The 'stashR' package (a Set of Tools for Administering
SHared Repositories) for R implements a simple key-value style
database where character string keys are associated with data
values. A key-value database can either be stored locally on the
user's computer or accessed remotely via the Internet. Methods
specific to the 'stashR' package allow users to share data
repositories or access previously created remote data repositories.
In particular, methods are available for the S4 classes "localDB"
and "remoteDB" to insert, retrieve, or delete data from the
database as well as to synchronize local copies of the data to the
remote version of the database. Access to a remote database is
efficient because only the data files indexed by user-specified
keys are retrieved, and these files are cached in a local copy of
the remote database. The local and remote
counterparts of the 'stashR' package offer the potential to enhance
reproducible research by allowing users of 'Sweave' to cache their
R computations for a research paper in a "localDB" database. This
database can then be stored on the Internet as a "remoteDB"
database. When readers of the research paper wish to reproduce the
computations involved in creating a specific figure or calculating
a specific numeric value, they can access the "remoteDB" database
and obtain the R objects involved in the computation.
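
The retrieve-and-cache behaviour described above can be illustrated with a minimal sketch. It is written in Python rather than R so that it stays self-contained, and it does not use or reproduce the 'stashR' API: the class names LocalDB and RemoteDB, their method names, and the one-file-per-key layout are all assumptions made for illustration. A second directory on disk stands in for the remote server.

```python
import pickle
import shutil
import tempfile
from pathlib import Path


class LocalDB:
    """Hypothetical key-value store: one pickled file per key.

    Assumes keys are simple strings that are valid file names.
    """

    def __init__(self, directory):
        self.data = Path(directory) / "data"
        self.data.mkdir(parents=True, exist_ok=True)

    def insert(self, key, value):
        with open(self.data / key, "wb") as fh:
            pickle.dump(value, fh)

    def fetch(self, key):
        with open(self.data / key, "rb") as fh:
            return pickle.load(fh)

    def delete(self, key):
        (self.data / key).unlink()

    def keys(self):
        return sorted(p.name for p in self.data.iterdir())


class RemoteDB:
    """Hypothetical view of a remote store with a lazily filled cache.

    `remote_directory` is a plain directory standing in for a server.
    """

    def __init__(self, remote_directory, cache_directory):
        self.remote = Path(remote_directory) / "data"
        self.cache = LocalDB(cache_directory)

    def cached_keys(self):
        return self.cache.keys()

    def fetch(self, key):
        # Copy the file for this key only if it is not cached yet.
        if key not in self.cached_keys():
            shutil.copy(self.remote / key, self.cache.data / key)
        return self.cache.fetch(key)

    def sync(self):
        # Refresh every cached key that still exists remotely.
        for key in self.cached_keys():
            source = self.remote / key
            if source.exists():
                shutil.copy(source, self.cache.data / key)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        server = LocalDB(Path(tmp) / "server")
        server.insert("figure1_fit", {"slope": 0.5})
        server.insert("table2_counts", [3, 1, 4])

        reader = RemoteDB(Path(tmp) / "server", Path(tmp) / "cache")
        fit = reader.fetch("figure1_fit")   # only this file is copied
        print(fit, reader.cached_keys())
```

In this sketch a reader who fetches a single key copies a single file, and sync() refreshes only the keys already cached, which mirrors the access pattern the abstract describes without claiming to match the package's implementation.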