June 27 - June 30, 2016
Stanford University, Stanford, California
The materials used in the tutorial are available here.
Data analysts can use the Git version control system to manage a motley assortment of project files in a sane way (e.g., data, code, reports, etc.). This has benefits for the solo analyst and, especially, for anyone who wants to communicate and collaborate with others. Git helps you organize your project over time and across different people and computers. Hosting services like GitHub, Bitbucket, and GitLab provide a home for your Git-based projects on the internet.
What's special about using R and Git(Hub)?
The tutorial will be structured as ~5 task-oriented units. Indicative topics:
This will be a hands-on tutorial, so bring your prepared laptop and pre-register a free GitHub account (see below).
This tutorial will teach novices about Git on a strict "need to know" basis. Git was built to manage development of the Linux kernel, which is probably very different from what you do. Most people need a small subset of Git's functionality and that will be our focus. If you want a full-blown exposition of Git as a directed acyclic graph or a treatise on the Git-Flow branching strategy, you will be sad.
Our target audience is someone who uses R to analyze data. While R package development with Git(Hub) is absolutely in scope, it's not an explicit focus or requirement.
We target GitHub - not Bitbucket or GitLab - for the sake of specificity. However, all the big-picture principles and even some mechanics will carry over to these alternative hosting platforms.
The tutorial is aimed at intermediate to advanced R users who are comfortable writing R scripts and managing R projects. You should have a good grasp of files and directories and generally know where things live on your computer.
Although we will show alternatives for most Git operations, we will inevitably spend some time in the shell, and we assume some prior experience with it. For example, you should know how to open a shell, navigate to a given directory, and list the files there. You should be comfortable using shell commands to view, move, and rename files and to work with your command history.
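As a rough self-check of that baseline, the snippet below exercises each skill listed above in a throwaway directory. It is a minimal sketch assuming a POSIX-style shell (bash or zsh on macOS/Linux, or Git Bash on Windows); the directory and file names are hypothetical placeholders, not part of the tutorial materials.

```shell
# Work in a throwaway directory so no real files are touched.
# All names below (data/, example.csv, notes.txt) are hypothetical.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Create a couple of practice files.
mkdir -p data
printf 'country,year\nexample,2000\n' > data/example.csv
printf 'Notes for the analysis\n' > notes.txt

pwd                     # print the current directory
ls                      # list files in the current directory
ls data                 # list files in a subdirectory

cat data/example.csv    # view a file's contents
mv notes.txt README.md  # rename a file
mv README.md data/      # move it into a subdirectory
ls data                 # confirm the move
```

Command history is easiest to practice interactively: in bash or zsh, the up arrow recalls earlier commands and `history` lists them. If any of these steps feel unfamiliar, it is worth practicing before the workshop.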
R Markdown and RStudio will feature prominently in most of the units, so this tutorial will be most rewarding for people who already use them or are eager to try them out.
Preparation instructions can be found here.
It is vital that you attempt to set up your system in advance. You cannot show up at 9am with no preparation and keep up! These are battle-tested instructions, so most people will succeed, but setup could easily take 1 to 2 hours. We believe in you! TAs will be in the room from 8:15am and throughout the workshop.
Jenny Bryan (Twitter, GitHub) is a professor at the University of British Columbia. She's been using and teaching R (or S!) for 20 years, most recently in STAT 545 and Software Carpentry. Other aspects of her R life include work with rOpenSci, development of the googlesheets and gapminder packages, and serving as academic director of UBC's Master of Data Science program.
Dean Attali and Bernhard Konrad will be teaching assistants. They both have experience in teaching this material (and much more) in STAT 545 and Software Carpentry. Added bonus: they know how to use Windows.