I have experimental data of the form `(X, Y)` and a theoretical model of the form `(x(t; *params), y(t; *params))`, where `t` is a physical (but unobservable) variable and `*params` are the parameters that I want to determine. `t` is a continuous variable, and in the model there is a 1:1 relationship between `x` and `t` and between `y` and `t`.
In a perfect world, I would know the value of `T` (the real-world value of the parameter) and would be able to do an extremely basic least-squares fit to find the values of `*params`. (Note that I am not trying to "connect" the values of `x` and `y` in my plot, as in questions 31243002 or 31464345.) I cannot guarantee that in my real data the latent value `T` is monotonic, as my data is collected across multiple cycles.
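For reference, the "perfect world" fit would look something like this: with `T` known, the `x` and `y` residuals can be stacked and handed to `scipy.optimize.least_squares`. This is only a sketch using a toy model of the same shape as the pseudo-code below; the true parameter values, noise level, and synthetic data are made up for illustration.

```python
import numpy as np
from scipy.optimize import least_squares

# Toy model of the same shape as the pseudo-code below; A, B play the role of *params.
def x_model(t, A, B):
    return A**t + B**t

def y_model(t, A, B):
    return A * t + B

# Synthetic "perfect world" data: the latent T is known for every point.
# A_true, B_true and the noise level are invented for this sketch.
rng = np.random.default_rng(0)
A_true, B_true = 2.0, 12.0
T = np.linspace(0.1, 3.0, 50)
X = x_model(T, A_true, B_true) + rng.normal(0.0, 0.1, T.size)
Y = y_model(T, A_true, B_true) + rng.normal(0.0, 0.1, T.size)

# With T known, the fit is ordinary least squares on the stacked residuals.
def residuals(params):
    A, B = params
    return np.concatenate([x_model(T, A, B) - X, y_model(T, A, B) - Y])

fit = least_squares(residuals, x0=[1.5, 10.0],
                    bounds=([0.1, 1.0], [10.0, 20.0]))  # keep A, B positive
A_fit, B_fit = fit.x
```

The bounds keep the optimizer away from negative `A`, where `A**t` is undefined for non-integer `t`.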
I'm not very experienced at doing curve fitting manually, and have had to use extremely crude methods without easy access to a basic scipy function. My basic approach is:
- Choose some value of `*params` and apply it to the model
- Take an array of `t` values and put it into the model to create an array `model(*params) = (x(*params), y(*params))`
- Interpolate `X` (the data values) into `model` to get `Y_predicted`
- Run a least-squares (or other) comparison between `Y` and `Y_predicted`
- Do it again for a new set of `*params`
- Eventually, choose the best values for `*params`
There are several obvious problems with this approach.
1) I'm not experienced enough with coding to develop a very good "do it again" beyond "try everything in the solution space," or maybe "try everything on a coarse grid, then try everything again on a slightly finer grid in the hotspots of the coarse grid." I tried MCMC methods, but I never found any optimum values, largely because of problem 2.
2) Steps 2-4 are super inefficient in their own right.
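For what it's worth, both problems can be folded together: wrap steps 1-4 in a single residual function and let `scipy.optimize.least_squares` take care of the "do it again" part, including step sizes. A sketch under the same toy model as the pseudo-code below; the data arrays here are synthetic stand-ins, since the real model and measurements aren't shown.

```python
import numpy as np
from scipy import interpolate
from scipy.optimize import least_squares

# Toy stand-in for the real model; A, B play the role of *params.
def x_model(t, A, B):
    return A**t + B**t

def y_model(t, A, B):
    return A * t + B

# Synthetic stand-ins for the measured X_data, Y_data (true values invented).
rng = np.random.default_rng(1)
t_true = rng.uniform(0.5, 2.5, 40)   # latent t values, deliberately non-monotonic
X_data = x_model(t_true, 2.0, 12.0) + rng.normal(0.0, 0.2, t_true.size)
Y_data = y_model(t_true, 2.0, 12.0) + rng.normal(0.0, 0.2, t_true.size)

# Steps 1-4 as one residual function: build the model curve, interpolate the
# data's X into it, and return Y_predicted - Y.
def residuals(params):
    A, B = params
    ts = np.linspace(0.1, 3.0, 500)
    f = interpolate.interp1d(x_model(ts, A, B), y_model(ts, A, B),
                             bounds_error=False, fill_value="extrapolate")
    return f(X_data) - Y_data

# least_squares replaces the grid search over *params.
fit = least_squares(residuals, x0=[1.5, 11.0],
                    bounds=([0.1, 10.0], [10.0, 20.0]))
A_opt, B_opt = fit.x
```

This still interpolates once per function evaluation, but the optimizer needs far fewer evaluations than an exhaustive grid.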
I've tried something like the following (resembling pseudo-code; the actual functions are made up). There are many minor quibbles that could be made about using broadcasting on `A, B`, but those are less significant than the problem of needing to interpolate at every single step.
People I know have recommended using some sort of Expectation Maximization algorithm, but I don't know enough about that to code one up from scratch. I'm really hoping there's some awesome scipy (or otherwise open-source) algorithm I haven't been able to find that covers my whole problem, but at this point I am not hopeful.
```python
import numpy as np
from scipy import interpolate

# X_data, Y_data: the measured arrays, defined elsewhere

def x(t, A, B):
    return A**t + B**t

def y(t, A, B):
    return A*t + B

def interp(A, B):
    ts = np.arange(-10, 10, 0.1)
    xs = x(ts, A, B)
    ys = y(ts, A, B)
    f = interpolate.interp1d(xs, ys)
    return f

N = 101
lsqs = np.zeros((N**2, 3))  # columns: A, B, sum of squares
count = 0
for i in range(N):
    A = 0.1*i           # checks A between 0 and 10
    for j in range(N):
        B = 10 + 0.1*j  # checks B between 10 and 20
        f = interp(A, B)
        y_fit = f(X_data)
        squares = np.sum((y_fit - Y_data)**2)
        lsqs[count] = (A, B, squares)  # puts the values in place for comparison later
        count += 1                     # allows us to move to the next row

i = np.argmin(lsqs[:, 2])
A_optimal = lsqs[i, 0]
B_optimal = lsqs[i, 1]
```
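The interpolation step itself can be made much cheaper: `np.interp` avoids constructing an `interp1d` object for every parameter pair, and the results table can be preallocated. A sketch of the same grid search, with a toy model and synthetic stand-ins for `X_data` and `Y_data` so the block is self-contained; the `t` grid is restricted to positive values so `x` stays increasing over this parameter box, which `np.interp` requires.

```python
import numpy as np

# Toy stand-in for the real model; A, B play the role of *params.
def x_model(t, A, B):
    return A**t + B**t

def y_model(t, A, B):
    return A * t + B

# Synthetic stand-ins for X_data, Y_data so the block runs on its own.
rng = np.random.default_rng(3)
t_true = rng.uniform(0.5, 2.5, 40)
X_data = x_model(t_true, 2.0, 12.0) + rng.normal(0.0, 0.2, t_true.size)
Y_data = y_model(t_true, 2.0, 12.0) + rng.normal(0.0, 0.2, t_true.size)

As = np.arange(0.1, 10.05, 0.1)   # A = 0 is excluded: 0**t blows up for t < 0
Bs = np.arange(10.0, 20.05, 0.1)
ts = np.arange(0.1, 3.0, 0.01)    # positive t keeps x increasing here

squares = np.empty((As.size, Bs.size))  # preallocated results table
for i, A in enumerate(As):
    for j, B in enumerate(Bs):
        xs = x_model(ts, A, B)             # increasing in t over this grid
        ys = y_model(ts, A, B)
        y_fit = np.interp(X_data, xs, ys)  # np.interp needs xs increasing
        squares[i, j] = np.sum((y_fit - Y_data) ** 2)

i, j = np.unravel_index(np.argmin(squares), squares.shape)
A_optimal, B_optimal = As[i], Bs[j]
```

Storing the sums of squares in a 2-D table indexed by `(i, j)` also removes the manual `count` bookkeeping.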
- `scipy.interpolate.splprep` – Stutz
- That's for smoothing `(X,Y)` data, and not for fitting to find `*params` – Abradant