June 27 - June 30 2016
Stanford University, Stanford, California
The materials used in the tutorial are available here.
This hands-on tutorial introduces the Jupyter notebook project (previously called IPython) in the context of programming in R for reproducible research and knowledge transfer. After this tutorial, attendees will understand why and how to use Jupyter notebooks for research, training and/or instruction.
Notebooks 1) have “live” code, which is interactive and modifiable; 2) allow for easy collaboration on a notebook system; and 3) serve as a scratchpad for both code and notes. It can therefore be argued that Jupyter notebooks fill an important gap between the resources associated with technical projects and a way to communicate and share those resources.
The Jupyter project had its inception over 15 years ago. We’ll provide some context behind the project, then and now, and show how it has evolved into its current form: a browser-based framework for live coding, a scratchpad, a shareable document, a fun way to teach R (or 50 other languages), et cetera.
We’ll guide attendees through a hands-on lab involving some simple image processing with k-means clustering. The lab will take attendees through what training looks like in a notebook system.
Following the lab, we will present arguments on when to use, and when not to use, notebooks with R rather than R Markdown. As an example, reproducible research revolves around being able to recreate an “experiment” and customize it for collaboration or for new applications.
A longer hands-on lab will build upon the first lab, using k-means clustering for image analysis. Attendees will be encouraged to break out into groups, be creative and come up with ideas for clustering or other forms of image analysis on real-world use cases, and then implement their ideas in notebooks.
There will be a quick mention of both the tools provided by Microsoft for the labs as well as resources for custom local or cloud setups which serve similar purposes. Good practices when using notebooks for reproducible research and training will be presented as well. Along with the tips, there will be a demo on creating slideshows and rendering into impactful static documents.
All components will be provided so no local installs will be necessary. Additional notebooks building on what was learned in the tutorial will be provided along with introductory material for using the R kernel in a Jupyter notebook. All notebooks created by the instructors fall under the MIT License.
By the end of the tutorial, attendees will have practical knowledge of how and when to use notebooks and how to create quality notebooks, and will be able to begin using them immediately for any computational project, training event or publication.
This tutorial requires no background knowledge of Jupyter or Python. Knowledge of R for data processing and analysis is highly recommended.
We will provide an online, hosted Jupyter system for this tutorial. If users wish to install their own system, we’ll provide detailed download and installation instructions on the GitHub site.
Andrie de Vries and Micheleen Harris will jointly teach this tutorial.
Andrie de Vries is a programme manager at Microsoft, responsible for
the development of Microsoft R Open and connectivity between R, the
Azure cloud and other Microsoft products, e.g. Excel.
At useR! 2015, Andrie taught a tutorial session on RHadoop, a very
popular session that drew more than 100 attendees.
Andrie is also a regular speaker at R and industry events, including: