June 27 - June 30, 2016
Stanford University, Stanford, California
The tutorial materials are not yet available because Matt's session was a live demo with question-and-answer segments. Matt says he is preparing a Markdown document and will send it along as soon as he can.
data.table is known for its speed on large data in RAM (e.g. 100 GB), but it also has a consistent and flexible syntax for more advanced data manipulation tasks on small data. First released to CRAN in 2006, it continues to grow in popularity: 180 CRAN and Bioconductor packages now import or depend on data.table. Its Stack Overflow tag has attracted 4,000 questions from users in many fields, making it one of the three most asked-about R packages. It is the 7th most starred R package on GitHub.
This three-hour tutorial will guide complete beginners from basic queries through to advanced topics via examples you will run on your laptop. There is a short learning curve to data.table, but once it clicks, it sticks.
fread() - basic to advanced usage and its convenience features
DT[i, j, by]
DT[where, select|update|do, group by]
setkey(), DT[==,] and DT[,on=]
roll=TRUE|+n|n
.SD, .N and .I
for() loops again! := and set()
by=.EACHI
DT[...][...]
DT[...]
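As a rough orientation for the outline above, here is a minimal sketch that touches most of the listed features. It is not the tutorial's own material: the tables `flights`, `airlines`, `quotes`, `trades` and `scores` and all of their values are hypothetical, made up for illustration only. It assumes a reasonably recent data.table release (one that supports `on=`).

```r
library(data.table)

# fread(): input containing a newline is parsed as data rather than a file name.
# All values below are hypothetical.
flights <- fread("carrier,origin,delay
AA,SFO,10
AA,LAX,25
UA,SFO,5
UA,SFO,40
DL,LAX,15")

# DT[i, j, by]: filter rows in i, compute in j, group with by.
sfo_mean <- flights[origin == "SFO", .(mean_delay = mean(delay)), by = carrier]

# .SD is the subset of data for each group; .SDcols restricts its columns.
origin_mean <- flights[, lapply(.SD, mean), by = origin, .SDcols = "delay"]

# .I holds row numbers: here, the row with the largest delay per carrier.
max_rows <- flights[, .I[which.max(delay)], by = carrier]$V1

# := adds or updates a column by reference.
flights[, late := delay > 20]

# Join with on=: look up each flights row in the (hypothetical) airlines table.
airlines <- data.table(carrier = c("AA", "UA", "DL"),
                       name    = c("American", "United", "Delta"))
joined <- airlines[flights, on = "carrier"]

# by=.EACHI: evaluate j once for each row of i (each airline).
per_airline <- flights[airlines, .(n = .N), on = "carrier", by = .EACHI]

# Chaining DT[...][...]: aggregate, then order the result.
top <- flights[, .(total = sum(delay)), by = carrier][order(-total)]

# roll=TRUE: carry the last earlier quote forward to each trade time.
quotes <- data.table(time = c(1, 5, 10), price = c(100, 101, 102))
trades <- data.table(time = c(3, 7, 12))
rolled <- quotes[trades, on = "time", roll = TRUE]

# set() inside a for() loop: replace NA with 0, column by column, by reference.
scores <- data.table(a = c(1, NA, 3), b = c(NA, 5, 6))
for (col in names(scores)) {
  set(scores, i = which(is.na(scores[[col]])), j = col, value = 0)
}
```

Printing any of the intermediate objects (for example `per_airline` or `rolled`) at the console is a simple way to see what each form of the `DT[i, j, by]` query returns.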
Familiarity with base R and/or SQL is an advantage but not required.
Attendees need R with the latest CRAN release of data.table installed.
Homepage: https://github.com/Rdatatable/data.table/wiki
Vignettes: https://github.com/Rdatatable/data.table/wiki/Getting-started