Bayesian modeling with PyMC3 and exploratory analysis of Bayesian models with ArviZ
Key Features
A step-by-step guide to conducting Bayesian data analyses using PyMC3 and ArviZ
A modern, practical and computational approach to Bayesian statistical modeling
A tutorial for Bayesian analysis and best practices with the help of sample problems and practice exercises
Code and figures
You can find the code and figures in this GitHub repository: github.com/aloctavodia/BAP/. You can also use this repository to report any problem you find with the book or code.
Book Description
The second edition of Bayesian Analysis with Python is an introduction to the main concepts of applied Bayesian inference and its practical implementation in Python using PyMC3, a state-of-the-art probabilistic programming library, and ArviZ, a new library for exploratory analysis of Bayesian models.
The main concepts of Bayesian statistics are covered using a practical and computational approach. Synthetic and real data sets are used to introduce several types of models, such as generalized linear models for regression and classification, mixture models, hierarchical models, and Gaussian processes, among others.
By the end of the book, you will have a working knowledge of probabilistic modeling and you will be able to design and implement Bayesian models for your own data science problems. After reading the book, you will be better prepared to delve into more advanced material or specialized statistical modeling if you need it.
What you will learn
Build probabilistic models using the Python library PyMC3
Analyze probabilistic models with the help of ArviZ
Acquire the skills required to sanity check models and modify them if necessary
Understand the advantages and caveats of hierarchical models
Find out how different models can be used to answer different data analysis questions
Compare models and choose between alternative ones
Discover how different models are unified under a probabilistic perspective
Think probabilistically and benefit from the flexibility of the Bayesian framework
Who This Book Is For
If you are a student, data scientist, researcher in the natural or social sciences, or a developer looking to get started with Bayesian data analysis and probabilistic programming, this book is for you. The book is introductory so no previous statistical knowledge is required, although some experience in using Python and NumPy is expected.
Table of Contents
Thinking Probabilistically
Programming Probabilistically
Modeling with Linear Regression
Generalizing Linear Models
Model Comparison
Mixture Models
Gaussian Processes
Inference Engines
Where to go next?
This is a pretty good hands-on book on using the PyMC3 library in Python to do Bayesian analysis. It also includes some introductory stuff on Bayesian statistics. I would recommend reading it if you want to learn more about Bayesian analysis.
My main takeaway is that PyMC3 (and apparently its intellectual ancestor, Stan) are amazing. I had no idea that building Bayesian models was this easy to do. PyMC3 allows a level of clarity, and a match of code to modeling concepts, that is almost unmatched. The author is also one of the core developers of PyMC3, and this is clearly quite an achievement.
I like how hands-on a lot of the examples are, but the book is sort of weak on theory. Some of the notation used in the equations is poorly explained (or not explained at all), and I had to visit Wikipedia to make sense of it. However, after reading the book, I do feel like I have a good grasp of some basics that I had really not gotten in previous attempts to understand Bayesian statistics. In particular, the concept of a conjugate prior now makes a lot of sense, as does the pragmatic process for selecting and updating priors. (The book obviously doesn't solve the problem of choosing priors, but does give helpful starting places, rules of thumb and lots of examples.)
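As a minimal illustration of the conjugate-prior idea (a sketch, not an example taken from the book): with a Beta prior on a coin's bias and binomial data, the posterior is again a Beta distribution, so updating amounts to adding counts. The function names and the numbers below are made up for illustration.

```python
# Beta-Binomial conjugate update: prior Beta(alpha, beta) plus observed
# successes/failures gives posterior Beta(alpha + successes, beta + failures).
# Function names and data are hypothetical, chosen only for illustration.

def beta_binomial_update(alpha, beta, successes, failures):
    """Return the posterior Beta parameters after observing the data."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Start from a flat Beta(1, 1) prior and observe 7 heads and 3 tails.
prior = (1.0, 1.0)
posterior = beta_binomial_update(*prior, successes=7, failures=3)

# Updating in two batches gives the same posterior as updating all at once:
# yesterday's posterior simply becomes today's prior.
step_one = beta_binomial_update(*prior, successes=4, failures=1)
step_two = beta_binomial_update(*step_one, successes=3, failures=2)

print("posterior parameters:", posterior)
print("posterior mean:", beta_mean(*posterior))
print("batched update matches:", step_two == posterior)
```

The batched update at the end is the "updating priors" part in miniature: the result does not depend on whether the data arrive together or in pieces.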
My other criticism is that the book needs more rounds with a good editor. The prose is often clunky: many sentences could be both shorter and clearer. It also has a lot of typos, some of which are pretty confusing. I'd suggest getting the hard copy, and marking it up with the errata by hand (see the errata here).
The conceptual organization should be reworked as well. I found myself repeatedly saying "Wow, how does PyMC3 do any of this?" and "What does this warning output mean?" when running through the examples, but the underpinnings of how it works are not described until Chapter 8. I think the "Inference Engines" chapter should have been much earlier in the book, since it would've let me build more intuition about how inference works and when it might fail.
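For a rough sense of what an inference engine does, here is a minimal sketch of random-walk Metropolis, one of the simplest MCMC samplers. This is not PyMC3's implementation (PyMC3 defaults to the more sophisticated NUTS sampler for continuous variables), and the function names, step size, and target density are assumptions chosen for illustration.

```python
import math
import random
import statistics


def metropolis(log_prob, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_prob: function returning the log of the (unnormalized) target density.
    Returns the list of samples and the fraction of accepted proposals.
    """
    rng = random.Random(seed)
    x = start
    logp = log_prob(x)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # Propose a nearby point by adding Gaussian noise.
        proposal = x + rng.gauss(0.0, step)
        logp_new = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
            accepted += 1
        # A rejected proposal repeats the current point in the chain.
        samples.append(x)
    return samples, accepted / n_samples


def log_std_normal(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


samples, acceptance_rate = metropolis(log_std_normal, start=0.0, n_samples=20000)
print("sample mean:", statistics.fmean(samples))
print("sample variance:", statistics.pvariance(samples))
print("acceptance rate:", acceptance_rate)
```

Even this toy version shows where things can go wrong: a step size that is too small or too large makes the chain explore slowly, which is the kind of problem sampler warnings and diagnostics are meant to flag.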
I'd suggest reading it in the following order:
1. Chapters 1-2, to see introductory examples of how Bayesian modeling works with PyMC3.
2. Chapter 8 up to the "Diagnosing the samples" section, and also this blog post.
3. Chapter 2 again.
4. Chapters 3-4.
5. The remainder of Chapter 8.
6. Chapters 5-7.
7. Possibly Chapter 8 again.
I also found it very helpful to read it with a Jupyter notebook open, typing in the code examples (all of which actually work! 😵), along with jotting down notes in Markdown cells. My notebook for this book will probably be handy as a reference for years to come.