[SciPy-dev] Mpfit optimizer
Wed Aug 19 18:09:41 CDT 2009
I received an old python version of the mpfit optimizer earlier this summer.
My advisor had previously been in contact with SciPy about including this
optimizer in the optimize package. He was sent to following message with
tips to help improve the package:
> ---------- Forwarded message ----------
From: Pauli Virtanen <firstname.lastname@example.org>
Date: Sat, May 9, 2009 at 11:55 AM
Subject: Re: [SciPy-dev] adding mpfit to scipy optimize (Please respond)
> Fri, 08 May 2009 16:09:35 -0400, william ratcliff wrote:
> > Hi! For a long time, there has been a standard package used by IDL
> users for fitting functions to data called MPFIT:
> However, I see some points where the code could be improved for better
reusability and maintainability.
> If we lived in an ideal world with infinite time to polish everything,
I'd like to see all of the points below addressed before landing this to
Scipy. But since this would be lots of error-prone work, it's probably
best to try to reach some compromise.
> Given these constraints, I'd still like to see at least the coding style
and error handling fixed (which probably are not too difficult to
change), in addition to having better test coverage. The rest could come
later, even if we accrue yet more technical debt with this...
> First, broad maintenance concerns:
> - We already have `leastsq` from MINPACK. Having two MINPACK-derived
least squares fitting routines is not good.
> So, I'd perhaps like to see the `leastsq` core part extracted out of
MPFIT, and the MPFIT interface implemented on top of it as a thin
wrapper, or the other way around.
> Maybe, if the modifications made on MINPACK are small, they can be
backported to the Fortran code and MPFIT can be reimplemented on top
> Any thoughts on this?
> - What is the performance of the Python implementation as compared to the
Fortran code? For small data sets, the Python code is probably much
slower, but for large datasets is the performance is comparable?
> - Much of the code is Fortran written in Python: long routines,
goto-equivalents, 6-letter variable names.
> Good commenting offset this, though.
> Then more specific points of refactoring:
> - The code implements QR factorization with column pivoting from scratch.
> Perhaps some of this could be replaced with suitable LAPACK routines,
or with stuff from scipy.linalg. (Cf. DGEQPF, DGEQP3)
> I'm not sure whether there's something suitable for qrsolve in LAPACK;
the triangular system solver could be replaced with DTRTRS.
> Anyway, it might be useful to refactor qrfac and qrsolve out of MPFIT;
there may be other applications when it's useful to be able to solve
||(A + I lambda) x - b||_2 = min! efficiently for multiple different
`lambda` in sequence.
> - fdjac2 should probably be refactored out of MPFIT; other optimization
algorithms that need Jacobians computed by forward differences can then
use it. Do we already have something similar in Scipy already?
> - `enorm` should be replaced with BLAS xNRM2; it does appropriate scaling
automatically and is probably much faster.
> enorm = scipy.lib.blas.get_blas_funcs(['nrm2'], [some_array])
> - The long mpfit.__init__ routine should be split into smaller parts,
the function body is too long. I'm not sure exactly what parts, though.
> Perhaps at least the covariance matrix computation and input argument
parsing should be separated.
> - "self.errmsg = 'ERROR ...; return'".
Probably should raise exceptions instead, at least for errors in input
> In general, I think the error handling should make better use of the
Python exception and warning system; using `print` is not a correct
way to signal an error in library code.
> - I'm not sure about implementing everything in a class. This tends to
couple tightly parts of the code that wouldn't really need such a strong
> - Does it work with complex-valued inputs?
> Numpy issues:
> - Use numpy.finfo instead of machar
> - Lots of nonzero/put/take combos, probably dates from Numeric.
> I think these should be replaced with boolean array indexing, to
enhance readability and performance.
> - numpy.min/max -> amin/amax
> Minor stylistic issues in the code:
> - `catch_msg` is not used for anything
> - The original fortran documentation could be moved into the method
> - "if foo:", not "if (foo):"
> - if foo:
> if foo: bar
> - "return foo", not "return(foo)"
> - "# comment", not "## comment"
> - It's not necessary to "import types"; you can just use
> I have written a nose-test compatible test. Could someone look at it
> and tell me if it meets the scipy style before I continue adding more
> tests from Craig's test-suite for the C version of his program?
> The test style is OK; if it works with nosetests, it should be OK.
Suggest using "import numpy as np", though, to stay in line with the rest
of the code.
Scipy-dev mailing list
We cleaned up most of these issues and would like SciPy to reconsider the
package. Here is a short list of things we feel we accomplished:
- Minor stylic issues cleaned up
- Numpy issues (still a lot of nonzero calls but no more put/take combos)
- There is an approx_jacobian that uses a forward diff method in
scipy.optimize.fmin_slsq. We were not sure whether we should still factor
out our jacobian approximation. We also just checked for jacobian
approximators in SciPy - we haven't checked to see which is better, cleaner,
- enorm fixed
- long init routine split in half
- Exceptions included (although a lot of them to show specific errors - this
can be changed)
Things that still need to be fixed:
- Complex numbers. We didn't have this in mind in during implementation and
still haven't thoroughly tested.
- Still written as a class. What is the best alternative? Just a simple
- QR from scratch.
- Performance testing vs. Fortran code.
Please let us know if there is anymore work you would like done on this
optimizer. We use it for our own applications and it would be nice to have
it included in SciPy for others to use. Thanks for your time.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Scipy-dev