I have written a C++ program that simulates a certain process I'm studying. It outputs discrete "states" each timestep of the simulation. For example:
a
b
c
b
c
b
would be the output of a simulation run with a as the initial condition (set by me or randomly generated) and b & c would be the states the system keeps oscillating between.
I would like to combine many of these runs into a Markov chain, so that it turns into a graph with the following vertices and edges. (Preferably at runtime, because saving the output first takes a lot of diskspace.) The number between the parentheses indicate the number of times a certain vertex or edge was encountered, so this should also be stored.
Vertices: a(1), b(3) and c(2).
Edges: a->b(1), b->c(2), c->b(2).
The real states contain 112 bits of information and I'm generating billions of these transitions. The problem is that I haven't found a graph library or program to generate the Markov chain efficiently and fast. I have been toying around with:
- Google sparse hash to construct my own graph class in C++.
- Neo4J (I was just getting started with this one)
- Lemon library
I just completed the "Google sparse hash graph", but it turns out to get real slow halfway into the runs. After about a day (memory usage goes above 20 GB, not a problem in itself, because there is way more), it slows down and takes about three weeks to complete.
I have access to computers with 12 or 16 cores and 256 or 512 GB of memory, and my feeling is they should be up for the job.
Since I'm not a trained programmer and I code quite slowly, I'm looking for some information before I spent a lot of time working on another imperfect solution.
- What would be the best program/library that can quickly accept large numbers of vertices and edges to construct the Markov chain?
- Is the slowness a result of using the wrong tools or imperfect coding (which I suspect) or am I simply trying to do something that will always take a lot of time?
I hope I was able to make my issue clear. Thanks in advance for any wisdom or answers.
EDIT:
Based on the questions and answers in the comments I guess my question should have been: what is a suitable fast matrix library for C++?
a
is a non-recurrent state. do you want to generate a different markov chain? or to generate a graph out of an output of an instance of this markov chain? the definition of the markov chain here is a bit unclear. – PepperandsaltNxN
matrix. – PepperandsaltNxN
array in memory for counting, or is the number of states too big for that? – Pepperandsalt