Francois Michonneau1 and Tracy K. Teal2
1. University of Florida
2. Data Carpentry
Keywords: Reproducible Research, RMarkdown, Open Science, Data Carpentry, GitHub
The goal of this tutorial is to teach participants the skills and perspectives needed to conduct open research with R, including literate programming and best practices in sharing and publishing code and data. We will focus on how to make the work navigable, interpretable, and repeatable by others. This tutorial is modified from the Data Carpentry “Reproducible Research with R” workshop, and we will focus on an exemplar project to model the workflow of going from code and data to a published, archived product.
The objectives (in bold) and outline for the workshop are:
Lessons:
As an increasing number of people use R to conduct research, it becomes possible to share and publish their code and data, enabling reproducible research and advancing research progress. Additionally, there is increasing acceptance of and interest in ‘open’ research practices, where code and data are shared as part of the ongoing or final research product. In Data Carpentry surveys of the skills scientists are looking to learn, people with some R experience are most interested in learning how to distribute, share and publish their code and data using RMarkdown. In particular, they are interested in learning about literate programming, sharing code through GitHub, and publishing code and data in repositories like FigShare. These skills and perspectives are necessary for open research and are clear next steps in learning best practices for effective research with R. We expect that many participants at UseR! will be looking for opportunities to learn these skills.
The active learning style of Data Carpentry workshops has been shown to be effective in teaching people the skills and perspectives needed to work with data. The vast majority (>90%) of participants responding to post-workshop surveys say that participating in the workshop was worth their time and led to improvements in their data management and data analysis skills. Additionally, 94% of participants agree or strongly agree that they would recommend a workshop to a friend or colleague [1].
In most domains of research and industry there is an increasing capacity to generate data and an increasing knowledge of how to analyze this data using programming languages such as R. With this increasing capacity comes the potential to more effectively communicate, share and publish both the results of these analyses and the complete workflow for open research. However, many people do not yet have the skills or know the best practices to effectively share and disseminate research products.
This tutorial is for people who are looking to move towards open research practices and want to learn the next steps in how to structure code and data, share these research products, and publish them for broader dissemination, reuse and attribution. Participants will learn how to generate documents with literate programming by writing RMarkdown, share code via GitHub, use appropriate licenses, and publish and track data and code research products.
Workshop participants should be familiar with working in R.
Many people in academia and in other organizations are required to share and publish their research products and process, or see the value in doing so. Potential attendees are people who are conducting work in R and are interested in learning how to share and publish their code and data for open and reproducible research.
Tracy Teal is a co-founder and the Executive Director of Data Carpentry. She has taught over 30 workshops on working more effectively with data, and has helped develop curriculum on Reproducible Research as well as domain-specific curriculum for working with ecology data, genomic data, and geospatial data.
Francois Michonneau is a post-doctoral researcher at the University of Florida interested in using and teaching about best practices in Reproducible Research. He has taught over 15 workshops and developed a semester-long course to help people learn how to use R to work with their data.
[1] Jordan, Kari. (2016). Data Carpentry Assessment Report: Analysis of Post-Workshop Survey Results. Zenodo. doi: 10.5281/zenodo.165858