Context
I am not sure I have found an authoritative text on how Epidemiologists like myself would practically apply Bayesian models. There are a number of good books on Bayesian analysis, and I have mentioned some of these books in other places, but the focus is not quite what I would prefer. As an epidemiologist, I don’t just want to know how to set up a hierarchical model; it would be so much more helpful if there were a stronger discussion of how this works in research applications. Or, imagine a basic epidemiology text that either focuses on, or at least does not exclude, Bayesian methods as an alternative, e.g. a Bayesian (or Bayesian-Frequentist) Rothman text. Maybe in time, that book will exist.
I want a thorough discussion of applied Bayesian methods in epidemiological studies – not just a few papers, and not just books intended for biostatistics or math classes. I want to know more about how I can do what I do in a way that makes sense to me. What I do is plan studies, provide causal inference, model complex real-world problems in genomics, work with health records, and synthesize evidence across studies. I don’t know of a single book that discusses these topics with sufficient depth and breadth. I will not be able to cover all of that in these notes, but I can point out some instances where I really wish more epidemiologists were instructed on Bayesian applications. In these notes I will discuss the sorts of things I discovered in the literature after my PhD that I found interesting and that might be good to know.
Power Analyses vs. Planning for Precision
Bayesian analyses do not rely on statistical tests, so the idea of a power analysis does not translate directly. This does not mean, however, that you cannot plan a study. Instead of estimating a sample size that provides adequate power for a test, one plans to achieve a specific value of precision (a specific quantity in Bayesian analysis, often represented by \(\phi\) or \(\tau\), and equal to \(\frac{1}{\sigma^2}\), where \(\sigma^2\) is the variance). Perhaps more practically, one can extend the idea of precision and plan for a specific width of the credible interval, which is really the goal of planning the sample size.
There are other ways to plan sample sizes for studies when using Bayesian models (e.g. using Bayes factors between competing hypotheses), but this is the most intuitive and straightforward way in my opinion, and it avoids potential logical or statistical issues. In fact, it is arguably a very scientific way to plan, since the ultimate goal in Bayesian analysis is to provide the most accurate and precise estimate (of the correct data-generating model). This is also something that can be easily simulated with brms, so that you can establish a curve similar to what you might see from the power analysis R package simr for frequentist models. I will likely provide a working example in separate notes at a later date, but a minimal sketch of the idea follows.
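As a rough illustration, here is a simulation of the expected 95% credible interval width as a function of sample size. To keep the sketch self-contained it uses a conjugate beta-binomial model rather than brms, and the true proportion, prior, and target width are all made-up values:

```r
# Minimal sketch: a precision-based "sample size curve" via simulation.
# Assumes a beta-binomial model with a flat Beta(1, 1) prior on a proportion,
# purely for illustration; a brms-based version would follow the same pattern.
set.seed(42)

ci_width <- function(n, true_p = 0.3, n_sims = 500) {
  widths <- replicate(n_sims, {
    y <- rbinom(1, size = n, prob = true_p)
    # Conjugate posterior: Beta(1 + y, 1 + n - y)
    q <- qbeta(c(0.025, 0.975), 1 + y, 1 + n - y)
    diff(q)
  })
  mean(widths)
}

n_grid <- c(50, 100, 200, 400, 800)
widths <- sapply(n_grid, ci_width)

# Smallest n whose expected 95% credible interval is narrower than 0.10
n_grid[which(widths < 0.10)[1]]
```

Plotting `widths` against `n_grid` gives the Bayesian analogue of a power curve: instead of power crossing 80%, you look for where the expected interval width drops below your target.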
Likelihood Ratios as Bayesian Updating
This is not exactly a new trick, and it has been used in clinical methods for some time now. But, in case you didn’t know, simple Bayesian applications allow for updated probabilities of a status or event (i.e., essentially smashing likelihood ratios together over many predictors in order to find a probability of a diagnosis). One example that I am familiar with is the estimation of a person’s probability of being prodromal for Parkinson’s Disease. This is a complex probability influenced by a number of factors, and something still being researched (both the probability resulting from specific exposures and the nature of prodromal and subclinical Parkinson’s Disease). It’s a great example of taking a series of odds ratios, converting them to likelihood ratios, and then assigning an individual a probability.
This is actually a very simple form of Bayesian updating. It does not carry uncertainty forward into the estimated probability, but it does incorporate evidence over multiple points in order to update evidence for or against a hypothesis. Proper updating would combine whole posterior distributions, but this simpler method can be useful when confidence in the established likelihood ratio point estimates is high, or when the estimated probability is used as a quick rule-of-thumb aid.
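Mechanically, the update is simple: convert the pre-test probability to odds, multiply by each likelihood ratio, and convert back. A minimal sketch (the prior and LR values below are invented for illustration; they are not the published prodromal Parkinson’s estimates):

```r
# Minimal sketch of updating a pre-test probability with likelihood ratios.
prob_to_odds <- function(p) p / (1 - p)
odds_to_prob <- function(o) o / (1 + o)

prior_prob <- 0.02                   # hypothetical age-based prior probability
lrs <- c(hyposmia     = 4.0,         # hypothetical LR+ values per marker
         rbd          = 2.5,
         constipation = 1.5)

# Multiply the prior odds by each marker's likelihood ratio, then convert back
posterior_odds <- prob_to_odds(prior_prob) * prod(lrs)
odds_to_prob(posterior_odds)         # updated probability given all markers
```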
Bayesian Networks and Joint Distributions
When conducting causal research, Bayesian methods can give some teeth to Directed Acyclic Graphs, which might be useful on occasion. If you can draw a DAG for a process and you know the full joint distribution for the system, something interesting happens: the DAG essentially becomes a tool for simulating effects when the system changes (e.g. new clinical observations), in any direction in the graph.
If you have read Judea Pearl’s “Causality,” then you already knew what I was going to say, but those in my field really tend to lean more towards the Rubin Causal Model or the approaches championed by Miguel Hernán. All of the methods I have seen are useful and informative when properly applied, but I suspect many in the field may not even know that you can use a DAG quantitatively, not just qualitatively – combined with Bayes’ theorem, it can be useful for decisions and for examining system states and perturbations. In fact, I might suggest that this is the philosophically “correct” way to approach Bayesian causal analysis: taking this approach and simulating effects to understand causal associations. But there is no denying the practicality of propensity scores and the g-formula, nor the catch of having to know the full joint distribution for a sufficiently complete causal model.
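To make this concrete, here is a minimal sketch of a three-node DAG (Exposure → Disease → Test) with entirely made-up conditional probability tables. Once the joint distribution is written down as the product of the DAG’s factors, it can be queried in any direction, including “against” the arrows:

```r
# The joint factorizes over the DAG: P(E, D, Tst) = P(E) * P(D | E) * P(Tst | D)
joint <- expand.grid(E = c(TRUE, FALSE), D = c(TRUE, FALSE), Tst = c(TRUE, FALSE))

p_E <- function(e) ifelse(e, 0.20, 0.80)          # P(exposed), invented
p_D_given_E <- function(d, e) {                   # P(disease | exposure), invented
  p <- ifelse(e, 0.10, 0.02)
  ifelse(d, p, 1 - p)
}
p_T_given_D <- function(t, d) {                   # P(test+ | disease), invented
  p <- ifelse(d, 0.90, 0.05)
  ifelse(t, p, 1 - p)
}

joint$p <- p_E(joint$E) * p_D_given_E(joint$D, joint$E) * p_T_given_D(joint$Tst, joint$D)

# A "diagnostic" query against the arrow direction: P(D = TRUE | Tst = TRUE)
sum(joint$p[joint$D & joint$Tst]) / sum(joint$p[joint$Tst])
```

Real systems need far larger tables (or parametric models), which is exactly the catch mentioned above: you have to know the full joint distribution.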
Speeding Up Bayesian Analyses with Approximation (INLA)
Usually in classes focusing on Bayesian analysis, what people teach is the use of Stan or WinBUGS, which work… but there are also methods that don’t rely on Markov chain Monte Carlo estimation algorithms. If you want to skip all of that entirely, you could consider the Integrated Nested Laplace Approximation (INLA).1
Conceptualized in 2009, INLA is a method for approximating, rather than estimating, posterior values, using numerical integration over a grid or central composite design in the hyperparameter space (models are assumed to be representable as latent Gaussian Markov random fields) (Rue et al. 2017). It might be something you’ve never heard of, but don’t think it’s crazy talk: by now the method is well established, with a good introductory book and the versatile R package R-INLA. R-INLA can’t do absolutely everything – only generalized linear models, time-to-event and survival models, multilevel and mixed effects models, splines and smoothing, time series models, and my personal favorite, spatial autocorrelation models using Gaussian lattices. It is highly versatile, easy to implement, and extremely fast.
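To give a flavor of the interface, here is a minimal sketch fitting a Bayesian logistic regression with R-INLA on simulated data. (Note that the INLA package is installed from the project’s own repository at r-inla.org rather than CRAN; the data and coefficients below are invented.)

```r
library(INLA)

# Simulate a simple binary outcome with one covariate
set.seed(1)
d <- data.frame(x = rnorm(200))
d$y <- rbinom(200, 1, plogis(-1 + 0.8 * d$x))

# Fit via approximation: no chains, no convergence diagnostics, near-instant
fit <- inla(y ~ x, family = "binomial", data = d,
            control.compute = list(dic = TRUE))

summary(fit)
fit$summary.fixed   # posterior summaries for the intercept and slope
```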
However, it should be noted that INLA is only approximately as accurate as MCMC estimation methods (full MCMC estimation tends to be slightly more accurate in simulations, though the difference may be ignorable depending on the research question), and MCMC methods are essentially infinitely flexible. So, I would advise against thinking of INLA as a replacement for MCMC simulation when learning Bayesian methods.
Accurate Risk Estimation in Case-Control Studies Without Poisson Links
This is not super complicated, but it is a fun tip to know. The flexibility of estimating the likelihood function in Bayesian models means that you don’t have to redefine the link function between the dependent and independent variables in a regression model. It’s accurate as-is!
The trick of swapping a Poisson link into logistic regression so that the resulting estimates approximate risk rather than odds ratios was stressed to me in my graduate classes, but the accuracy of risk versus odds ratios under different links is a non-issue for Bayesian models.2 So, if you find the issue confusing, just take the Bayesian approach and you will deal with less mental gymnastics.
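As a sketch of what this looks like in practice, one can fit an ordinary Bayesian logistic model and derive a risk contrast directly from the posterior, rather than re-fitting with a Poisson link. The data frame and variable names below are invented, and the sketch assumes a design where the baseline risk is identified (e.g. cohort-style data):

```r
library(brms)

# Hypothetical data: study_data with a binary case indicator and 0/1 exposure
fit <- brm(case ~ exposure, family = bernoulli(), data = study_data)

# Posterior draws of P(case) under exposed vs. unexposed
newd <- data.frame(exposure = c(1, 0))
p_draws <- posterior_epred(fit, newdata = newd)   # draws x 2 matrix

rr_draws <- p_draws[, 1] / p_draws[, 2]           # posterior risk ratio draws
quantile(rr_draws, c(0.025, 0.5, 0.975))          # median and 95% interval
```

The same draws could just as easily give a risk difference (`p_draws[, 1] - p_draws[, 2]`); the link function never has to change.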
The T-Test Is Not the Most Efficient Method
I have had researchers, whom I very much respect, advocate for a t-test because it is the best-powered test for detecting a difference between two groups. Well… that is true only if you do not consider the Bayesian approach.
John Kruschke has apparently taken this sort of interaction to heart, and has written an extremely thorough paper demonstrating how a Bayesian comparison of two groups provides better “power” and more meaningful results.3 (The paper does include a spirited defense of Kruschke’s statistical philosophy, but he is quite convincing.) For more from Kruschke, also see “Doing Bayesian Data Analysis,” an authoritative text on Bayesian statistics.
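Kruschke’s robust two-group model is straightforward to approximate with brms: a Student-t likelihood with group-specific means and scales. A minimal sketch (the data frame and group names below are invented):

```r
library(brms)

# Hypothetical data: two_group_data with a numeric `value` and factor `group`
fit <- brm(
  bf(value ~ group, sigma ~ group),  # separate means and scales per group
  family = student(),                # heavy tails downweight outliers
  data = two_group_data
)

# Posterior evidence about the group difference; the coefficient name
# (here "groupB") depends on the factor's level names
hypothesis(fit, "groupB = 0")
```

Beyond a point estimate and p-value, this gives full posteriors for the difference in means, the difference in scales, and the tail heaviness, which is a large part of what Kruschke means by “more meaningful results.”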
Conclusions
Bayesian methods are not just the other side of the coin relative to traditionally taught frequentist methods. They have a number of differences that may be misunderstood or underappreciated, but hopefully I have introduced some reader somewhere to interesting ideas they will follow up on by reading more. Most of these items are just interesting points I’ve found in others’ research.
Additionally, I feel that these facets of Bayesian methods are underexplored in Epidemiology, and I hope that further work can be done to bring an authoritative text to the field. Such a text is needed not just to introduce the basics, but to show how, in application, things are not the same as traditionally taught, and can in fact make life easier and analyses more sensible. It should also make these methods accessible to those at all levels, so that Bayesian statistics isn’t relegated to that one advanced class only certain PhD students take, where they struggle with WinBUGS and leave believing the myth that both approaches are the same, just with different assumptions.