Title: Efficient pattern matching on very large strings using
       the Biostrings package

Author: Hervé Pagès

Biostrings is an R package that provides an efficient infrastructure
for searching for patterns in strings of hundreds of millions of
letters. The package implements the BString class, which stores a
single string much like R's raw type (a byte array), with the
important difference that the data are not copied when an object is
duplicated or a substring is extracted. The matchPattern
function implements fast algorithms for matching patterns against a
BString object. The BStringViews class allows compact storage
of a set of views on the same BString object, typically the
matches returned by the matchPattern function.
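
As a minimal sketch of the workflow described above, the R snippet
below builds a small BString and searches it with matchPattern. It
assumes a Biostrings release in which BString(), matchPattern(),
start() and end() are available with these signatures; the toy subject
string is hypothetical, and the class of the returned views object may
carry a different name than BStringViews depending on the installed
version.

```r
# Assumes the Bioconductor package Biostrings is installed.
library(Biostrings)

# A short stand-in for a very large string (hypothetical data).
subject <- BString("the cat sat on the mat")

# Find every exact occurrence of the pattern in the subject.
hits <- matchPattern("at", subject)

# The result is a set of views on the subject: each match is
# described by its start and end coordinates.
start(hits)  # 6 10 21
end(hits)    # 7 11 22
```

Keeping only coordinates per match is what makes the views
representation compact when a search returns many hits.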

In addition to the general-purpose BString and BStringViews classes,
the package also provides biology-oriented BString subclasses
such as DNAString, for storing a DNA sequence, and AAString, for
storing a sequence of amino acids.
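
The following sketch shows how the biology-oriented containers named
above might be created and queried. It assumes the DNAString(),
AAString() and subseq() functions of a current Biostrings release; the
sequences are made up for illustration.

```r
# Assumes the Bioconductor package Biostrings is installed.
library(Biostrings)

# Toy sequences (hypothetical data, not taken from a real genome).
dna     <- DNAString("ACGTTGCA")
protein <- AAString("MKVLA")

# Both containers report their length in letters.
length(dna)      # 8
length(protein)  # 5

# Extract a substring by its start and end positions.
fragment <- subseq(dna, start = 2, end = 5)
as.character(fragment)  # "CGTT"
```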

We discuss the implementation of the BString class and compare R's
standard pattern-matching tools with those provided by Biostrings by
searching for thousands of short patterns in the fly genome.