Christian Stratowa
Distributed Storage and Analysis of Microarray Data in the Terabyte Range:
An Alternative to Bioconductor
**************************************************************************

Novel high-throughput technologies such as DNA microarrays allow
biologists to generate data sets in the terabyte range. Many of these data
will be deposited in the public domain, necessitating a common standard.
Currently available database systems are not well suited to this purpose.

In this paper, I will introduce ROOT (http://root.cern.ch), an object-oriented
framework that has been developed at CERN for distributed data warehousing and
data mining of particle data in the petabyte range. Data are stored as sets of
objects in machine-independent files, and specialized methods are used to get
direct access to separate attributes of selected data objects. ROOT has been
designed in such a way that it can query its databases in parallel on SMP/MPP
machines, on clusters of PCs, or using common GRID services.

In order to demonstrate the applicability of ROOT to microarray data, I will
present a functional prototype system called XPS (eXpression Profiling System),
which can be considered an alternative to the Bioconductor project. The
current implementation handles the storage of Affymetrix GeneChip schemes and
data, and the pre-processing, normalization and filtering of GeneChip data.
Based on this system, I will propose a novel standard for the distributed storage
of microarray data.

Finally, I will emphasize the similarities between R and ROOT, and show how R
could easily be extended to access ROOT.