MC-Stan on Spark?


I would like to use MC-Stan on Spark, but a Google search turns up no relevant pages.

Is this approach even possible on Spark? I would appreciate it if someone could let me know.

I also wonder what the widely used approach to running MCMC on Spark is. I have heard that Scala is widely used, but I need a language that has a decent MCMC library, such as MC-Stan.

Zitella answered 8/4, 2016 at 11:30 Comment(1)
Maybe rstan with sparklyr::spark_apply is your best choice. (Composition)

Yes, it's certainly possible, but it requires a bit more work. Stan, like the other popular MCMC tools that I know of, is not designed to run in a distributed setting, via Spark or otherwise. In general, distributed MCMC is an area of active research. For a recent review, I'd recommend section 4 of Patterns of Scalable Bayesian Inference (PoFSBI). There are multiple ways you might split up a big MCMC computation, but I think one of the more straightforward is to split up the data and run an off-the-shelf tool like Stan, with the same model, on each partition. Each run produces a subposterior, and the subposteriors can then be reduced together to form a (generally approximate) full posterior. PoFSBI discusses several ways of combining such subposteriors.
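As a minimal sketch of one such combining rule, here is a weighted average of subposterior draws for a single scalar parameter, with each partition weighted by the inverse sample variance of its own draws. The function name `consensus_combine` and the list-of-lists input layout are assumptions made for illustration; they are not taken from Stan, Spark, or PoFSBI.

```python
from statistics import variance


def consensus_combine(subposterior_draws):
    """Combine per-partition draws of one scalar parameter by weighted averaging.

    subposterior_draws[k][s] is draw s from the sampler run on partition k.
    Each partition is weighted by the inverse sample variance of its draws
    (an assumed weighting choice for this sketch).
    """
    # Use only as many draws as the shortest chain provides.
    n_draws = min(len(draws) for draws in subposterior_draws)
    weights = [1.0 / variance(draws) for draws in subposterior_draws]
    total = sum(weights)
    # Draw s of the combined posterior is the weighted mean of draw s
    # across all partitions.
    return [
        sum(w * draws[s] for w, draws in zip(weights, subposterior_draws)) / total
        for s in range(n_draws)
    ]
```

Partitions whose draws are tightly concentrated pull the combined draws towards themselves, while diffuse subposteriors contribute little. This kind of averaging is only exact when the subposteriors are Gaussian, so treat the result as an approximation.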

I've put together a very rough proof of concept using pyspark and pystan (Python is the common language with the most Stan and Spark support). It's a rough and limited implementation of the weighted-average consensus algorithm in PoFSBI, running on the tiny 8-schools dataset. I don't think this example would be very useful in practice, but it should give some idea of what is needed to run Stan as a Spark program: partition the data, run Stan on each partition, and combine the subposteriors.
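The three steps can be illustrated without a cluster. The sketch below is not the proof of concept itself: it uses plain Python instead of pyspark, and a stand-in sampler instead of Stan (a Normal mean model with known noise, where the subposterior can be drawn from directly). All function names are hypothetical. The point is the shape of the computation: the combine step is written as a map followed by an associative reduce, which is the form that would presumably carry over to `RDD.mapPartitions` and `RDD.reduce` in PySpark.

```python
import random
from functools import reduce
from statistics import fmean, variance

SIGMA = 1.0  # assumed known observation noise for the toy model


def partition(data, n_parts):
    """Split data into n_parts roughly equal chunks (Spark would do this itself)."""
    return [data[i::n_parts] for i in range(n_parts)]


def sample_subposterior(chunk, n_draws=2000, seed=0):
    """Stand-in for a Stan run on one partition (hypothetical, not pystan).

    For y ~ Normal(mu, SIGMA^2) with a flat prior on mu, the subposterior is
    Normal(mean(chunk), SIGMA^2 / len(chunk)), so we can draw from it directly.
    """
    rng = random.Random(seed)
    centre = fmean(chunk)
    scale = SIGMA / len(chunk) ** 0.5
    return [rng.gauss(centre, scale) for _ in range(n_draws)]


def to_weighted(draws):
    """Map step: attach the inverse-variance weight to a partition's draws."""
    w = 1.0 / variance(draws)
    return ([w * d for d in draws], w)


def add_weighted(a, b):
    """Reduce step: elementwise sum, associative so partial results can merge."""
    return ([x + y for x, y in zip(a[0], b[0])], a[1] + b[1])


def consensus_posterior(data, n_parts=4):
    """Partition the data, sample each chunk, and combine by weighted average."""
    chunks = partition(data, n_parts)
    subposteriors = [sample_subposterior(c, seed=k) for k, c in enumerate(chunks)]
    sums, total = reduce(add_weighted, map(to_weighted, subposteriors))
    return [s / total for s in sums]
```

A quick way to try it is to simulate data, for example `data = [random.Random(42).gauss(3.0, SIGMA) for _ in range(400)]` replaced by a loop over one shared generator, and compare `fmean(consensus_posterior(data))` with `fmean(data)`; for this toy model the two should be close. With a real Stan model, `sample_subposterior` is the piece to swap out, and with a proper prior each partition would also need an adjusted prior so that it is not counted once per partition.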

Plenish answered 19/7, 2016 at 22:9 Comment(1)
I thought this was interesting, so I started adapting this into a rough library: github.com/strongh/stark (Plenish)
