[SciPy-User] Vectorizing scipy.optimize.curve_fit

Paul Hobson pmhobson@gmail....
Sun Feb 3 12:29:57 CST 2013


On Sun, Feb 3, 2013 at 4:28 AM, <josef.pktd@gmail.com> wrote:

> On Sat, Feb 2, 2013 at 9:38 PM, Ryan Nelson <rnelsonchem@gmail.com> wrote:
> > Hi Paul,
> >
> > I've played around with this for scipy.optimize.leastsq, and I've never
> been
> > successful. I'm not an expert, but I don't think the original code was
> set
> > up for that type of usage. (Just speculating here... An equally likely
> > explanation is that I'm not smart enough to figure it out :)
> >
> > What I've found really useful for this type of problem is IPython's new
> > parallel architecture (IPython version 0.13.1). It's easy to set up an
> > IPython cluster and get things running in parallel through the notebook
> > interface, so I've attached a simple notebook that does a trivial fitting
> > (using leastsq) of some noisy data. Before you run the notebook, you'll
> need
> > to set up a cluster through the Cluster tab in the IPython notebook
> > dashboard.
> >
> > Unfortunately, in my experience I've found that the speed improvement is
> > only noticeable until the number of IPython engines in your cluster
> equals
> > the number of cores not the number of processor threads. (This might be
> > obvious for those in the know, but it took me a while to figure out.) But
> > every little bit of improvement helps, I suppose.
> >
> > Ryan
> >
> >
> > On 2/1/2013 6:07 PM, Paul Hobson wrote:
> >
> > Hey folks,
> >
> > I've run  into a bit of a roadblock. I've got some model runs (x) in an
> Nx2
> > array where the first column is the input, and the second column is the
> > output. So in a single case, I'd do:
> >
> > popt, pcov = scipy.optimize.curve_fit(myFit, x[:,0], x[:,1])
> >
> > But how should I handle, say, 5000 model runs such that x.shape = (500,
> N,
> > 2) and I want the 5000 results for popt?
> >
> > This works:
> > popt_array = np.empty(5000, 2)
> > for r, layer in enumerate(model_runs):
> >     popt, pcov = scipy.optimize.curve_fit(myFit, layer[:,0] layer[:,1])
> >     popt_array[r] = popt
> >
> > But is there a better (faster way)? The number of model runs and data
> points
> > may grow sustatially (~10^4 runs and 10^3 data points).
>
> I think there is no efficient way to avoid the loop.
>
> besides going parallel:
>
> If you are only interested in popt, then using optimize.leastsq
> directly will save some calculations.
>
> The other major speedup for this kind of problems is to find good
> starting values. For example, if the solutions in the sequence are
> close to each other, then using previous solutions as starting values
> for the next case will speed up convergence.
> IIRC, in statsmodels we use an average of previous solutions, after
> some initial warm-up, in a similar problem. A moving average could
> also be useful.
>
> Josef
>
> >
> > Thanks,
> > -paul
>
>
Thanks for the advice, Ryan and Josef. I've been meaning to get a feel of
for IPython Parallel for some time now, so this will be a good impetus to
do so.

Also, Josef, the suggestion about using optimize.leastsq with an initial
guess, will be a very good one, I think. A better description of what I'm
doing is actually bootstrapping the regression parameters to find their 95%
confidence intervals. So it make a lot sense, really, to get the initial
guess at the parameters from the full data set for use in the bootstrap
iterations.

Cheers,
-paul
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20130203/e58d11b6/attachment.html 


More information about the SciPy-User mailing list