Open-Source Machine Learning: R Meets Weka
Kurt Hornik, Christian Buchta, Achim Zeileis
********************************************

Weka (http://www.cs.waikato.ac.nz/~ml/weka/) is the leading open-source
project in machine learning. It is a comprehensive collection of
machine-learning algorithms for data mining tasks, written in Java and
containing tools for data pre-processing, classification, regression,
clustering, association rules, and visualization. For many of these
algorithms Weka provides de facto reference implementations, hence a
variety of additional projects are based on Weka. To enhance the
statistics and statistical learning toolbox already available in R with
popular machine learning techniques, an interface from R to Weka would
clearly be desirable.

The R extension package RWeka provides such an interface.  At the
highest level, its main features are R functions for a variety of
machine-learning algorithms such as tree learners (C4.5/J4.8,
M5', logistic model trees) and popular meta and rule-based learners.
These learners are in fact obtained via interface generators,
i.e., functions which return functions providing interfaces to Weka's
classes. Such generators are available for Weka's "classifiers"
(i.e., regression and classification learners), clusterers, association
learners, and filters. The generated interface functions have enough
meta information to allow for dynamic documentation, in particular for
listing the available Weka control options via WOW, the Weka option
wizard. The R objects obtained by calling the interface functions are
suitably (S3) classed, making it possible to provide general-purpose
prediction methods for Weka's classifiers and clusterers and more
specialized methods for visualization. Users can easily add interfaces
to additional Weka learners and filters, and add R classes and methods
for the results of applying these interfaces. The low-level interaction
between R and Java is based on package rJava, with only minimal
amounts of Java glue code added for performance enhancements.
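
As an illustration of the interface functions and generators described
above, the following is a minimal sketch of an R session. It assumes
that RWeka and a working Java installation are available, and it uses
function names from the current CRAN release of RWeka (J48,
make_Weka_classifier, WOW, Weka_control), which may differ in detail
from the version described here. The iris data set and the option value
are chosen only for convenience.

```r
# Assumption: the RWeka package and a Java runtime are installed.
library(RWeka)

# A ready-made interface function: fit a C4.5-style tree (Weka's J48)
# through the usual R formula interface.
tree <- J48(Species ~ ., data = iris)

# General-purpose predict() method for Weka classifiers; by default it
# returns the predicted class labels.
pred <- predict(tree, newdata = iris)
print(table(predicted = pred, actual = iris$Species))

# An interface generator: create an R interface function for a Weka
# classifier, given its class name in JNI notation.
NB <- make_Weka_classifier("weka/classifiers/bayes/NaiveBayes")
nb_fit <- NB(Species ~ ., data = iris)

# Dynamic documentation: list the Weka control options of an interface
# via WOW, the Weka option wizard.
WOW(NB)

# Control options are passed with Weka_control(); here J48's -M option
# (minimum number of instances per leaf) is set to 5 as an example.
tree_m5 <- J48(Species ~ ., data = iris, control = Weka_control(M = 5))
```

The generated function NB is used exactly like the ready-made J48
interface, which is what makes it straightforward to register further
Weka learners without writing additional glue code.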

We also discuss possible enhancements, which relate to general issues
arising when interfacing R with other systems. First, too much of what
Weka's objects contain is private and hence effectively unavailable for
interfacing.
Second, ideally data would be shared between R and Weka in a way that
conversion between the native formats would happen only when needed.
Finally, it would be very valuable to have Weka interfaces to some of
R's functionality.

Keywords: machine learning, statistical learning, R, Weka, Java, interface