Keywords: data-wrangling, high-performance, big-data, in-memory
data.table is one of the fastest open-source in-memory data manipulation packages available today. The package's syntax has a learning curve, but once you understand the philosophy behind it, it just clicks! First released to CRAN by Matt Dowle in 2006, it continues to grow in popularity. Over 300 CRAN and Bioconductor packages now import or depend on data.table. Its StackOverflow tag has attracted ~5,000 questions from users in many fields, making it one of the three most asked-about R packages. It is the 8th most starred R package on GitHub.
This three-hour tutorial will start with basic queries and go all the way to advanced topics. At the beginning, you will be asked to solve a few commonly occurring data manipulation tasks of varying complexity using your favourite package in R, or even your favourite programming language (~10-15 min). After a short discussion, we will proceed to learning data.table (see Outline). You will be asked to solve a few exercises after each section to internalise each concept. Finally, we will come back to the tasks from the start, but use data.table this time.
fread
DT[i, j, by] (or, for those familiar with SQL: DT[where, select|update, group by])
by = .EACHI
fwrite
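As a rough preview of the outline items above, the following sketch strings them together on a tiny made-up dataset. The column names (id, grp, val), the temporary file paths and the lookup table are assumptions invented for illustration and are not part of the tutorial material; only fread, the DT[i, j, by] form, by = .EACHI and fwrite come from the outline.

```r
library(data.table)

# Hypothetical toy data, written to a temporary file so fread has something to read
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,grp,val",
             "1,a,10", "2,a,20", "3,b,30", "4,b,40", "5,b,50"),
           csv_path)

# fread: read the file into a data.table
DT <- fread(csv_path)

# DT[i, j, by]: keep rows where val > 10 (where), sum val (select), per grp (group by)
res <- DT[val > 10, .(total = sum(val)), by = grp]

# The "update" flavour of j: add a column by reference with :=
DT[, val2 := val * 2]

# by = .EACHI: evaluate j once for each row of the table supplied in i (a join)
lookup <- data.table(grp = c("a", "b"))
counts <- DT[lookup, .(n = .N), on = "grp", by = .EACHI]

# fwrite: write the aggregated result back to disk
out_path <- tempfile(fileext = ".csv")
fwrite(res, out_path)
```

Here res holds one row per group, and counts holds the number of matching rows in DT for each row of lookup.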
Familiarity with base R and/or SQL is useful but is not absolutely essential.
In base R, a good understanding of the list data structure is a plus, in particular manipulating lists with mapply, lapply, Map, Reduce, etc.
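For readers who want to check their footing with these functions, here is a minimal base R sketch of the kind of list manipulation meant here. The list scores and the vector weights are made-up example data, not something taken from the tutorial.

```r
# Hypothetical example data: a named list of numeric vectors and one weight per element
scores  <- list(a = c(1, 2, 3), b = c(4, 5), c = 6)
weights <- c(1, 2, 3)

# lapply: apply a function to each element, always returning a list
lens <- lapply(scores, length)

# Map: walk several inputs in parallel, returning a list
weighted <- Map(function(x, w) sum(x) * w, scores, weights)

# mapply: like Map, but simplifies the result to a vector when it can
weighted_vec <- mapply(function(x, w) sum(x) * w, scores, weights)

# Reduce: fold a list into a single value with a binary function
grand_total <- Reduce(`+`, weighted)
```

Map and mapply differ mainly in the shape of the result: weighted is a named list, while weighted_vec is a named numeric vector.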
You will need your laptop with the latest version of R and the latest stable (CRAN) version of data.table already installed.
Homepage: http://r-datatable.com
Vignettes: https://github.com/Rdatatable/data.table/wiki/Getting-started
Articles: https://github.com/Rdatatable/data.table/wiki/Articles
Matt Dowle (main author, @MattDowle), Arun Srinivasan (co-author, @arun_sriniv) and many other contributors.