This book teaches the concepts and tools behind reporting modern data analyses in a reproducible manner. Reproducibility is the idea that data analyses should be published or made available with their data and software code so that others may verify the findings and build upon them. The need for reproducible report writing is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations. Reproducibility lets people focus on the actual content of a data analysis, rather than on superficial details reported in a written summary. In addition, reproducibility makes an analysis more useful to others because the data and code that actually conducted the analysis are available. This book focuses on literate statistical analysis tools, which let one publish a data analysis in a single document that others can easily execute to obtain the same results.
This ebook is a particularly useful guide for researchers of all kinds, with a strong emphasis on how to systematically create reproducible reports. Reproducibility is key to effectively communicating our analyses and research findings, because it lets anyone else re-run or examine specific steps in our investigations to validate our results. And since most people now have access to substantial computing power, free open-source statistical software, and cloud-based data infrastructure (e.g. GitHub), creating reproducible reports has never been easier. The author also provides a general introduction to R Markdown using RStudio, which has become one of the most popular tools for creating reproducible reports.