Python and lmfit: How to fit multiple datasets with shared parameters?
Asked Answered
S

2

15

I would like to use the lmfit module to fit a function to a variable number of data-sets, with some shared and some individual parameters.

Here is an example generating Gaussian data, and fitting to each data-set individually:

import numpy as np
import matplotlib.pyplot as plt
from lmfit import minimize, Parameters, report_fit

def func_gauss(params, x, data=[]):
    A = params['A'].value
    mu = params['mu'].value
    sigma = params['sigma'].value
    model = A*np.exp(-(x-mu)**2/(2.*sigma**2))

    if data == []:
        return model
    return data-model

x  = np.linspace( -1, 2, 100 )
data = []
for i in np.arange(5):
    params = Parameters()
    params.add( 'A'    , value=np.random.rand() )
    params.add( 'mu'   , value=np.random.rand()+0.1 )
    params.add( 'sigma', value=0.2+np.random.rand()*0.1 )
    data.append(func_gauss(params,x))

plt.figure()
for y in data:
    fit_params = Parameters()
    fit_params.add( 'A'    , value=0.5, min=0, max=1)
    fit_params.add( 'mu'   , value=0.4, min=0, max=1)
    fit_params.add( 'sigma', value=0.4, min=0, max=1)
    minimize(func_gauss, fit_params, args=(x, y))
    report_fit(fit_params)

    y_fit = func_gauss(fit_params,x)
    plt.plot(x,y,'o',x,y_fit,'-')
plt.show()


# ideally I would like to write:
#
# fit_params = Parameters()
# fit_params.add( 'A'    , value=0.5, min=0, max=1)
# fit_params.add( 'mu'   , value=0.4, min=0, max=1)
# fit_params.add( 'sigma', value=0.4, min=0, max=1, shared=True)
# minimize(func_gauss, fit_params, args=(x, data))
#
# or:
#
# fit_params = Parameters()
# fit_params.add( 'A'    , value=0.5, min=0, max=1)
# fit_params.add( 'mu'   , value=0.4, min=0, max=1)
#
# fit_params_shared = Parameters()
# fit_params_shared.add( 'sigma', value=0.4, min=0, max=1)
# call_function(func_gauss, fit_params, fit_params_shared, args=(x, data))
Silsbye answered 2/12, 2013 at 22:33 Comment(0)
S
18

I think you're most of the way there. You need to put the data sets into an array or structure that can be used in a single, global objective function that you give to minimize() and fits all data sets with a single set of Parameters for all the data sets. You can share this set among data sets as you like. Expanding on your example a bit, the code below does work to do a single fit to the 5 different Gaussian functions. For an example of tying parameters across data sets, I used nearly identical value for sigma the 5 datasets the same value. I created 5 different sigma Parameters ('sig_1', 'sig_2', ..., 'sig_5'), but then forced these to have the same values using a mathematical constraint. Thus there are 11 variables in the problem, not 15.

import numpy as np
import matplotlib.pyplot as plt
from lmfit import minimize, Parameters, report_fit

def gauss(x, amp, cen, sigma):
    "basic gaussian"
    return amp*np.exp(-(x-cen)**2/(2.*sigma**2))

def gauss_dataset(params, i, x):
    """calc gaussian from params for data set i
    using simple, hardwired naming convention"""
    amp = params['amp_%i' % (i+1)].value
    cen = params['cen_%i' % (i+1)].value
    sig = params['sig_%i' % (i+1)].value
    return gauss(x, amp, cen, sig)

def objective(params, x, data):
    """ calculate total residual for fits to several data sets held
    in a 2-D array, and modeled by Gaussian functions"""
    ndata, nx = data.shape
    resid = 0.0*data[:]
    # make residual per data set
    for i in range(ndata):
        resid[i, :] = data[i, :] - gauss_dataset(params, i, x)
    # now flatten this to a 1D array, as minimize() needs
    return resid.flatten()

# create 5 datasets
x  = np.linspace( -1, 2, 151)
data = []
for i in np.arange(5):
    params = Parameters()
    amp   =  0.60 + 9.50*np.random.rand()
    cen   = -0.20 + 1.20*np.random.rand()
    sig   =  0.25 + 0.03*np.random.rand()
    dat   = gauss(x, amp, cen, sig) + np.random.normal(size=len(x), scale=0.1)
    data.append(dat)

# data has shape (5, 151)
data = np.array(data)
assert(data.shape) == (5, 151)

# create 5 sets of parameters, one per data set
fit_params = Parameters()
for iy, y in enumerate(data):
    fit_params.add( 'amp_%i' % (iy+1), value=0.5, min=0.0,  max=200)
    fit_params.add( 'cen_%i' % (iy+1), value=0.4, min=-2.0,  max=2.0)
    fit_params.add( 'sig_%i' % (iy+1), value=0.3, min=0.01, max=3.0)

# but now constrain all values of sigma to have the same value
# by assigning sig_2, sig_3, .. sig_5 to be equal to sig_1
for iy in (2, 3, 4, 5):
    fit_params['sig_%i' % iy].expr='sig_1'

# run the global fit to all the data sets
result = minimize(objective, fit_params, args=(x, data))
report_fit(result)

# plot the data sets and fits
plt.figure()
for i in range(5):
    y_fit = gauss_dataset(fit_params, i, x)
    plt.plot(x, data[i, :], 'o', x, y_fit, '-')

plt.show()

For what it's worth, I would consider holding the multiple data sets in a dictionary or list of DataSet class instead of a multi-dimensional array. Anyway, I hope this helps get you going onto what you really need to do.

Strepitous answered 3/12, 2013 at 2:28 Comment(7)
This is a great example! It really shows how useful the expr argument for the parameters is.Silsbye
I can't understand something here. When you do a global fit you want to find the best common parameters that fit best to your multiple datasets. With this approach don't you just tell all of the other sigmas to equate to the first? How can the global minimum be found this was?Stifling
Furthermore, in the latest version 0.9 of lmfit the parameters object is not changed but copied. So in order for this script to work properly you have to change in the last lines: minimize(objective, fit_params, args=(x, data)) --> result = minimize(objective, fit_params, args=(x, data)) and where: fit_params --> result.paramsStifling
iasonas: 1. the original question asked how to set all sigmas to have the same value. Sometimes that is exactly what you want. 2.. You are right that the example was for lmfit prior to version 0.9. I changed the example.Lucillalucille
Actually, I tried to edit the original answer to correct the documented and changed behavior of the API between 0.8 and 0.9 that was pointed out in a recent comment. Being the lead author of the software in question, I would think this would be acceptable. It was rejected by three SO reviewers who would rather keep an incorrect answer than have an answer updated to reflect a changed API. Sometimes SO seems to prefer disinformation....Lucillalucille
Just replaced result.fit_params by fit_params (result.fit_params not working under python 3.8 and lmfit==1.0.1)Ghassan
@Ghassan thanks for pointing out that this old example was incorrect. But using report_fit(fit_params) will just write out the input values for the parameters. The better thing to do would be to use report_fit(result.params), or better yet report_fit(result) which will report the best-fit parameters and give fit statistics too.Lucillalucille
T
1

I've used simple approach: define a function firs n( = cargsnum) of arguments is common for all data set's other is individual {

def likelihood_common(var, xlist, ylist, mlist, cargsnum):
    cvars = var[:cargsnum]
    iargnum = [model.func_code.co_argcount - 1 - cargsnum for model in mlist]
    argpos = [cargsnum,] + list(np.cumsum(iargnum[:-1]) + cargsnum)
    args = [list(cvars) + list(var[pos:pos+iarg]) for pos, iarg in zip(argpos, iargnum)]
    res = [likelihood(*arg) for arg in zip(args, xlist, ylist, mlist)]
    return np.sum(res)

} here supposed that each data set have the same weight. The issue I faced in this approach is weary low computation speed and instability in case of large number of fitted parameters and data sets.

Tmesis answered 11/2, 2014 at 18:52 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.