Bayesian network in Python: both construction and sampling
D

5

6

For a project, I need to create synthetic categorical data containing specific dependencies between the attributes. This can be done by sampling from a pre-defined Bayesian network. After some searching online, I found that Pomegranate is a good package for Bayesian networks. However, as far as I can tell, it seems impossible to sample from such a pre-defined Bayesian network. For example, model.sample() raises a NotImplementedError (even though this solution says it should work).

Does anyone know of a library that provides a good interface for both constructing a Bayesian network and sampling from it?

Dialectic answered 29/11, 2019 at 15:15 Comment(3)
Are you willing to: 1) switch languages or 2) implement sampling yourself? - Argyll
Please note that questions asking for recommendations are usually off-topic here (see the help center). The first part is okay though. I don't know the answer, maybe the Pomegranate package isn't that mature so far. - Photocompose
@Argyll I am looking for a library that provides a good interface for defining a Bayesian Network from which I can then sample to obtain a synthetic data-set. - Dialectic
D
1

I found out that pyAgrum (https://agrum.gitlab.io/pages/pyagrum.html) does the job. It can be used both to create a Bayesian network via the BayesNet() class and to sample from such a network using the .drawSamples() method of the BNDatabaseGenerator() class.

Dialectic answered 2/12, 2019 at 14:47 Comment(0)
C
5

Using pyAgrum, you just have to:

# import pyAgrum
import pyAgrum as gum

# create a BN (fastBN fills every CPT randomly)
bn = gum.fastBN("A->B[3]<-C{yes|No}->D")
# overwrite one CPT with explicit values
bn.cpt("A").fillWith([0.3, 0.7])

# and then generate a database of 1000 samples
gum.generateCSV(bn, "sample.csv", 1000, with_labels=True, random_order=False)
# generateCSV returns the log-likelihood LL(database)

the code in a notebook

See http://webia.lip6.fr/~phw/aGrUM/docs/last/notebooks/ for more notebooks using pyAgrum

Disclaimer: I am one of the authors of pyAgrum :-)

Conduit answered 3/1, 2020 at 16:7 Comment(1)
Yes, I found out about this. Pretty cool that you're responding, because yesterday I put a reference to pyAgrum (containing your name) in my paper, as I'm using pyAgrum for many things, but mostly for inference in BNs with soft evidence! - Dialectic
S
4

Another option is pgmpy, a Python library for learning (structure and parameters) and inference (statistical and causal) in Bayesian networks.

You can generate forward and rejection samples as a pandas DataFrame or a NumPy recarray.

The following code generates 20 forward samples from the Bayesian network "diff -> grade <- intel" as a recarray.

# Written for older pgmpy releases; newer releases rename BayesianModel to BayesianNetwork
from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
from pgmpy.sampling import BayesianModelSampling

# Structure: diff -> grade <- intel
student = BayesianModel([('diff', 'grade'), ('intel', 'grade')])

# Priors for the two root nodes
cpd_d = TabularCPD('diff', 2, [[0.6], [0.4]])
cpd_i = TabularCPD('intel', 2, [[0.7], [0.3]])
# P(grade | intel, diff): one row per grade state, one column per parent combination
cpd_g = TabularCPD('grade', 3,
                   [[0.3, 0.05, 0.9, 0.5],
                    [0.4, 0.25, 0.08, 0.3],
                    [0.3, 0.7, 0.02, 0.2]],
                   ['intel', 'diff'], [2, 2])

student.add_cpds(cpd_d, cpd_i, cpd_g)
inference = BayesianModelSampling(student)
df_samples = inference.forward_sample(size=20, return_type='recarray')

print(df_samples)
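
For readers curious about what rejection sampling does under the hood, here is a minimal pure-Python sketch on the same student network. It does not use pgmpy. The helper names (draw, forward_sample_once, rejection_sample) and the max_tries cap are made up for illustration, and the mapping of CPD columns to (intel, diff) pairs assumes that the last evidence variable varies fastest across the columns.

```python
import random

# CPDs copied from the student network above, written as plain Python data.
P_DIFF = [0.6, 0.4]
P_INTEL = [0.7, 0.3]
# Keyed by (intel, diff); assumes the last evidence variable varies fastest.
P_GRADE = {
    (0, 0): [0.3, 0.4, 0.3],
    (0, 1): [0.05, 0.25, 0.7],
    (1, 0): [0.9, 0.08, 0.02],
    (1, 1): [0.5, 0.3, 0.2],
}


def draw(rng, probs):
    """Draw a state index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def forward_sample_once(rng):
    """Sample parents first, then the child given the sampled parents."""
    diff = draw(rng, P_DIFF)
    intel = draw(rng, P_INTEL)
    grade = draw(rng, P_GRADE[(intel, diff)])
    return {"diff": diff, "intel": intel, "grade": grade}


def rejection_sample(evidence, size, seed=None, max_tries=1_000_000):
    """Keep only forward samples that agree with the evidence.

    Returns fewer than `size` rows if max_tries is exhausted first.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == size:
            break
        row = forward_sample_once(rng)
        if all(row[var] == val for var, val in evidence.items()):
            accepted.append(row)
    return accepted


# 20 samples conditioned on grade being in state 2
rows = rejection_sample({"grade": 2}, size=20, seed=1)
print(rows[:3])
```

Rejection sampling wastes every draw that disagrees with the evidence, so it becomes slow when the evidence is unlikely under the network.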
Stereochrome answered 20/7, 2020 at 11:50 Comment(0)
V
1

Another option is BayesPy (https://www.bayespy.org/index.html). You build the network out of nodes, and on every node you can call random(), which essentially samples from that node's distribution: https://www.bayespy.org/dev_api/generated/generated/bayespy.inference.vmp.nodes.stochastic.Stochastic.random.html#bayespy.inference.vmp.nodes.stochastic.Stochastic.random

Veracity answered 2/12, 2019 at 14:55 Comment(0)
J
1

I was also searching for a Python library for learning, sampling, and inference with Bayesian networks, and I found bnlearn. I tried a couple of examples and they worked. It can import several existing network repositories or any .bif file. According to the library's documentation:

Sampling of data is based on forward sampling from joint distribution of the Bayesian network. In order to do that, it requires as input a DAG connected with CPDs. It is also possible to create a DAG manually (see create DAG section) or load an existing one
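
To make the idea of forward sampling concrete, here is a minimal standard-library sketch. It is not bnlearn's implementation; the network (cloudy -> rain -> wet), its probabilities, and the function names are hypothetical and chosen only for illustration. Each node is sampled after its parents, using the CPD row selected by the parents' sampled values.

```python
import random

# Hypothetical network: cloudy -> rain -> wet.
# Each node maps to (parents, CPT); the CPT maps a tuple of parent
# values to a {value: probability} distribution.
NETWORK = {
    "cloudy": ((), {(): {"yes": 0.5, "no": 0.5}}),
    "rain": (("cloudy",), {("yes",): {"yes": 0.8, "no": 0.2},
                           ("no",): {"yes": 0.1, "no": 0.9}}),
    "wet": (("rain",), {("yes",): {"yes": 0.9, "no": 0.1},
                        ("no",): {"yes": 0.0, "no": 1.0}}),
}


def topological_order(network):
    """Order the nodes so that every parent precedes its children."""
    order, placed = [], set()
    while len(order) < len(network):
        progressed = False
        for node, (parents, _) in network.items():
            if node not in placed and all(p in placed for p in parents):
                order.append(node)
                placed.add(node)
                progressed = True
        if not progressed:
            raise ValueError("network has a cycle or a missing parent")
    return order


def forward_sample(network, n, seed=None):
    """Draw n joint samples by sampling each node given its parents."""
    rng = random.Random(seed)
    order = topological_order(network)
    samples = []
    for _ in range(n):
        row = {}
        for node in order:
            parents, cpt = network[node]
            dist = cpt[tuple(row[p] for p in parents)]
            values = list(dist)
            weights = [dist[v] for v in values]
            row[node] = rng.choices(values, weights=weights, k=1)[0]
        samples.append(row)
    return samples


samples = forward_sample(NETWORK, 1000, seed=0)
print(samples[:3])
```

The resulting list of dictionaries can be written to a CSV file or loaded into a DataFrame to serve as a synthetic categorical data set.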

Journalism answered 3/11, 2020 at 21:13 Comment(0)
