Keywords: data-wrangling, high-performance, big-data, in-memory
data.table is one of the fastest open-source in-memory data manipulation packages available today. Its syntax has a learning curve, but once you understand the philosophy behind it, it just clicks! First released to CRAN by Matt Dowle in 2006, it continues to grow in popularity. Over 300 CRAN and Bioconductor packages now import or depend on data.table. Its StackOverflow tag has attracted ~5,000 questions from users in many fields, making it one of the three most asked-about R packages. It is the 8th most starred R package on GitHub.
This three-hour tutorial starts with basic queries and goes all the way to advanced topics. At the beginning, you will be asked to solve a few commonly occurring data manipulation tasks of varying complexity using your favourite R package, or even your favourite programming language (~10-15 min). After a short discussion, we will proceed to learning data.table (see Outline). You will be asked to solve a few exercises after each section to internalise each concept. Finally, we will come back to the tasks from the start, this time using data.table.
Outline:
fread
DT[i, j, by] (or, for those familiar with SQL: DT[where, select|update, group by])
by = .EACHI
fwrite

Familiarity with base R and/or SQL is useful but not essential.
In base R, a good understanding of the list data structure is a plus, in particular manipulating lists with mapply, lapply, Map, Reduce, etc.
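To give a flavour of the DT[i, j, by] form and the other outline items, here is a minimal sketch. The toy table and its column names (id, year, value) are hypothetical and are not the tutorial's own data. It also hints at why list skills help: a data.table is a list of columns, so lapply can be used directly in j.

```r
library(data.table)

# Hypothetical toy data, invented for illustration only.
DT <- data.table(
  id    = c("a", "a", "b", "b", "c"),
  year  = c(2015L, 2016L, 2015L, 2016L, 2016L),
  value = c(1.5, 2.5, 4, 6, 10)
)

# DT[i, j, by]: subset rows in i, compute in j, grouped by `by`.
# Rough SQL analogue: SELECT id, sum(value) ... WHERE year = 2016 GROUP BY id.
res <- DT[year == 2016L, .(total = sum(value)), by = id]

# The "update" side of j: := adds a column by reference, here per group.
DT[, share := value / sum(value), by = id]

# .SD is the per-group subset of columns; lapply works on it like on any list.
means <- DT[, lapply(.SD, mean), by = id, .SDcols = "value"]

# by = .EACHI evaluates j once for each row of the table given in i (a join).
lookup <- data.table(id = c("a", "c"))
eachi  <- DT[lookup, .(n = .N), on = "id", by = .EACHI]

# fwrite and fread: write the table to a temporary CSV and read it back.
tmp <- tempfile(fileext = ".csv")
fwrite(DT, tmp)
DT2 <- fread(tmp)
```

Printing res, means, eachi and DT2 at the console shows the result of each step; the tutorial itself covers these forms in much more depth.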
You will need a laptop with the latest version of R and the latest stable (CRAN) version of data.table already installed.
Homepage: http://r-datatable.com
Vignettes: https://github.com/Rdatatable/data.table/wiki/Getting-started
Articles: https://github.com/Rdatatable/data.table/wiki/Articles
Matt Dowle (main author, @MattDowle), Arun Srinivasan (co-author, @arun_sriniv) and many other contributors.