Problem
I have a set of equations with variables denoted with lowercase variables and constants with uppercase variables as such
A = a + b
B = c + d
C = a + b + c + d + e
I'm provided the information as to the structure of these equations in a pandas DataFrame with two columns: Constants and Variables
E.g.
df = pd.DataFrame([['A','a'],['A','b'],['B','c'],['B','d'],['C','a'],['C','b'],
['C','c'],['C','d'],['C','e']],columns=['Constants','Variables'])
I then convert this to a sparse CSC matrix by using NetworkX
table = nx.bipartite.biadjacency_matrix(nx.from_pandas_dataframe(df,'Constants','Variables')
,df.Constants.unique(),df.Variables.unique(),format='csc')
When converted to a dense matrix, table looks like the following
matrix([[1, 1, 0, 0, 0],[0, 0, 1, 1, 0],[1, 1, 1, 1, 1]], dtype=int64)
What I want from here is to find which variables are solvable (in this example, only e is solvable) and for each solvable variable, what constants is its value dependent on (in this case, since e = C-B-A, it is dependent on A, B, and C)
Attempts at Solution
I first tried to use rref to solve for the solvable variables. I used the symbolics library sympy and the function sympy.Matrix.rref, which gave me exactly what I wanted, since any solvable variable would have its own row with almost all zeros and 1 one, which I could check for row by row.
However, this solution was not stable. Primarily, it was exceedingly slow, and didn't make use of the fact that my datasets are likely to be very sparse. Moreover, rref doesn't do too well with floating points. So I decided to move on to another approach motivated by Removing unsolvable equations from an underdetermined system, which suggested using svd
Conveniently, there is a svd function in the scipy.sparse library, namely scipy.sparse.linalg.svds. However, given my lack of linear algebra background, I don't understand the results outputted by running this function on my table, or how to use those results to get what I want.
Further Details in the Problem
- The coefficient of every variable in my problem is 1. This is how the data can be expressed in the two column pandas DataFrame shown earlier
- The vast majority of variables in my actual examples will not be solvable. The goal is to find the few that are solvable
- I'm more than willing to try an alternate approach if it fits the constraints of this problem.
This is my first time posting a question, so I apologize if this doesn't exactly follow guidelines. Please leave constructive criticism but be gentle!