Saturday, March 8, 2014

Ideal book for self-study: "Doing Bayesian Data Analysis"

In this post, I'd like to heartily recommend a book for anyone doing self-study who doesn't have much statistics or math in their background: "Doing Bayesian Data Analysis" by John Kruschke.
This book is head-and-shoulders better than the others I've seen.  I'm using it myself right now.  Here's what's good about it:
  • It builds from very simple foundations.
  • Math is minimized.  No proofs.
  • From start to finish, everything is demonstrated through R programs. Anyone learning statistics today should be learning a statistics programming language at the same time.  R is the most popular choice and by some measures the best choice.
  • It helps you learn Empirical Bayesian methods from every angle.  It does great both with the fundamental concepts and the practical applications.
  • It takes you as far as you want to go, well into advanced territory if you like.  But you don't have to read the whole textbook to benefit.
For what it's worth, this book was voted the most popular introductory book on Stack Exchange.
Empirical Bayesian methods are an attractive alternative to "Null Hypothesis Significance Testing" and related methods (linear regression, logistic regression, Analysis of Variance or ANOVA).  If you don't know anything about either one, don't worry.  This book does a great job of introducing you to both and explaining why Bayesian methods are better (i.e. yield more robust/reliable results) -- see Chapters 10 and 11.

Empirical Bayesian methods have only become practical for ordinary people in the last 10 years or so, thanks to a computational method called Markov Chain Monte Carlo (MCMC) that is built into software like BUGS (Windows), and also JAGS and more recently STAN (both are cross-platform).  Even today, most undergraduate and graduate statistics courses teach the older methods of Null Hypothesis Significance Testing because they don't require computational methods if the data fit certain assumptions (e.g. normal distributions).  Here's a 14-minute video of Dr. Kruschke explaining a Bayesian alternative to the Student t-test.  (This isn't an introductory video, so if you are new to hypothesis testing, this may not make much sense to you.)


Instead of mastering math and proofs, learning Empirical Bayesian methods is mostly about mastering programming and tuning the MCMC software package.  This should be more comfortable for InfoSec people, who usually have some coding skills. Because the results are immediately visualized via plots, it's relatively easy to see what is happening at each step in the learning process. The book is full of really good graphs and other visualizations.
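
To make "MCMC" a bit less mysterious, here's a minimal sketch of the idea in Python. This is not code from the book (which uses R with BUGS/JAGS), and it uses a bare-bones Metropolis sampler rather than the Gibbs sampling that BUGS and JAGS are named for. The coin-flip data (14 heads in 20 flips), the uniform prior, and the function names are all made up for illustration.

```python
import math
import random


def log_posterior(theta, heads, flips):
    """Log of (uniform prior x binomial likelihood), up to a constant."""
    if not 0.0 < theta < 1.0:
        return float("-inf")  # zero posterior density outside (0, 1)
    return heads * math.log(theta) + (flips - heads) * math.log(1.0 - theta)


def metropolis(heads, flips, n_steps=20000, step_sd=0.1, seed=1):
    """Random-walk Metropolis sampler for a coin's bias theta."""
    rng = random.Random(seed)
    theta = 0.5  # arbitrary starting point inside (0, 1)
    samples = []
    for _ in range(n_steps):
        # Propose a small random jump from the current position.
        proposal = theta + rng.gauss(0.0, step_sd)
        log_ratio = (log_posterior(proposal, heads, flips)
                     - log_posterior(theta, heads, flips))
        # Accept with probability min(1, posterior ratio); otherwise stay put.
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal
        samples.append(theta)
    return samples


# Hypothetical data: 14 heads in 20 flips.
samples = metropolis(heads=14, flips=20)
kept = samples[2000:]  # discard the early "burn-in" steps
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 2))
```

With a uniform prior, the exact posterior here is Beta(15, 7), whose mean is 15/22 (about 0.68), so the printed estimate should land close to that. Playing with step_sd and the burn-in length is a small taste of the tuning described above; packages like JAGS handle most of it for you.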

I call this "a book for self-study" because I couldn't find an on-line course or MOOC that uses it as a text. That's a crying shame because it would be awesome if there were an on-line course. I only found this course at Indiana University.  Dr. Kruschke does present 3-4 day seminars at various places, too.

Last thing: don't confuse Empirical Bayesian methods with "subjective probability" (i.e. degrees of belief). Though Bayesian methods are used extensively in cognitive theories based on subjective probability, this book doesn't get into that territory.

4 comments:

  1. For folks interested in picking up this text, otherwise known as the 'puppies' text, do yourself a favor and surf to the author's website (http://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/) and grab the revised sample code. Kruschke has converted all the code in the text from the largely deprecated BUGS to JAGS, which will be much more useful for carrying these concepts over to your production data.

    1. Great comment, David. Yes, JAGS is the preferred MCMC package now, and also necessary for Mac and Linux users anyway. STAN is even more capable, but it would take further translation from the author's code. Possible, but for people getting into this for the first time, you are best off sticking with JAGS.

  2. Any opinion on the Johns Hopkins Data Science specialization program offered through Coursera?

    1. I don't have an opinion on that particular program. Looking at the web site isn't much help since I don't see a syllabus for each course. I don't know what textbooks they are using, if any. The "Statistical Inference" course is only 4 weeks and they don't say whether they teach traditional methods or Bayesian methods. Maybe if you emailed the instructors you could find out.
