Revisiting Array Languages for Statistical Computing

L. Fraser Jackson

The array-oriented language APL was popular with many statisticians after 
its development in the 1960s, with books by several well-known statisticians 
and probabilists and numerous collections of functions appearing.  It 
influenced several later languages, and some of its ideas were adopted in 
S.  A number of array-oriented languages now exploit the many developments 
in computing since that time.  They are especially well suited to some 
problems: they provide extremely efficient methods for handling arrays whose 
elements may be of a wide range of types, and powerful tools for selecting 
elements from arrays in alternative ways for use in calculations.  Many 
generalised forms of tensor product are available. 
These languages also have elegant functional and object-oriented programming 
tools, and some of them are designed to interface easily with other tools.  
A notable feature is that a single piece of code will usually handle arrays 
of all ranks and shapes, from empty arrays to multidimensional arrays, and 
for some problems that is very valuable.
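
As a rough illustration of what "a single piece of code for all ranks" and a "generalised tensor product" can mean, the sketch below uses Python with nested lists rather than J.  The names shape, total and outer are hypothetical helpers invented for this example, not part of any array language or library, and the code assumes regular (non-ragged) nested lists.

```python
from operator import add, mul

def shape(a):
    """Shape of a regular nested list; a scalar has the empty shape ()."""
    if not isinstance(a, list):
        return ()
    if len(a) == 0:
        return (0,)
    return (len(a),) + shape(a[0])

def total(a):
    """Sum every element; the same code serves scalars, empty arrays
    and arrays of any rank."""
    if not isinstance(a, list):
        return a
    return sum(total(x) for x in a)

def outer(f, x, y):
    """Generalised outer product: apply f to every pairing of a scalar
    from x with a scalar from y.  The result shape is shape(x) + shape(y)."""
    if isinstance(x, list):
        return [outer(f, xi, y) for xi in x]
    if isinstance(y, list):
        return [outer(f, x, yi) for yi in y]
    return f(x, y)

# The same functions applied at several ranks.
print(total(5), total([]), total([[1, 2], [3, 4]]))
print(outer(mul, [1, 2], [3, 4]))
print(shape(outer(add, [[1, 2], [3, 4]], [1, 2, 3])))
```

Replacing mul by any other two-argument function gives a different "product" with the same shape rule, which is the sense of generalisation intended here.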

This paper describes aspects of using the array language J, which 
provides many advances on the structure of APL.  In addition to the features 
noted above, it has excellent sparse array tools.  It is easily interfaced 
with R using the RCom interface, so that R and J can each act as a server 
for the other.  We illustrate features of the language with a range of 
functions for processing contingency tables, the interface with R, and some 
new approaches to contingency tables that enable exploration of the 
structure of the large tables generated from national data collections such 
as a census.
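
To make the contingency-table setting concrete, the following minimal sketch (in Python, not J, and not the functions described in the paper) stores a multi-way table sparsely, keeping only the non-empty cells, and collapses it to any chosen margin.  The function names crosstab and margin and the sample records are assumptions made for illustration only.

```python
from collections import Counter

def crosstab(records):
    """Sparse contingency table: maps each observed combination of
    category labels to its count; unobserved cells are simply absent."""
    return Counter(tuple(r) for r in records)

def margin(table, keep):
    """Collapse the table onto the axes listed in keep (by position),
    summing over all other axes.  Works for tables of any rank."""
    out = Counter()
    for cell, n in table.items():
        out[tuple(cell[i] for i in keep)] += n
    return out

# Hypothetical records: (sex, area, age group).
records = [
    ("F", "urban", "young"),
    ("F", "rural", "old"),
    ("M", "urban", "young"),
    ("F", "urban", "young"),
]
table = crosstab(records)
print(margin(table, [0]))     # one-way margin over the first variable
print(margin(table, [0, 1]))  # two-way margin over the first two variables
print(margin(table, []))      # grand total, held in the single cell ()
```

Storing only occupied cells matters for census-scale tables, where the number of possible cells grows multiplicatively with each added variable while the number of occupied cells is bounded by the number of records.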

We also explore ways in which the array theory and array manipulation model 
of these languages, and their elegant tools for functional composition, can 
contribute to the development of languages for statistical computing.  R 
users may find that J provides very powerful tools that complement the R 
tool set, especially in data manipulation.