[SciPy-Dev] adding chkfinite flags to linalg functions
Fri Aug 26 12:02:28 CDT 2011
On Fri, Aug 26, 2011 at 10:38 AM, Bruce Southey <firstname.lastname@example.org> wrote:
> On 08/26/2011 09:40 AM, Nathaniel Smith wrote:
> Would calling it check_finite instead be too horrible?
> - Nathaniel
> On Aug 26, 2011 2:47 AM, "Ralf Gommers" <email@example.com> wrote:
>> https://github.com/scipy/scipy/pull/48 adds a chkfinite flag to linalg
>> functions that allows for disabling the check on whether there are infs/nans
>> in the array, which can be a speed gain if you already know that there are
>> no infs/nans in your input array.
>> Is anyone opposed to merging this?
> I do oppose this because I do not think it is the correct solution to an
> apparent issue of computational speed. I do say apparent because there is no
> code and no timings to support it.
Some timing data is at the pull request link. (I was comparing the
then-current chkfinite against a modification Ralf suggested.)
"I tried that (calling it chkfinite_mod) on an ndarray A of shape
(10000,500). (The extreme size was mostly to make the differences
clear. The issues in my application had been appearing when A was
(3000,2).) The results of timeit are as follows.
np.asarray_chkfinite(A): 153 ms
chkfinite_mod(A): 59.3 ms
np.asarray(A): 1.9 us
So asarray_chkfinite is roughly 80,000 times as slow, and chkfinite_mod is
roughly 30,000 times as slow. That's enough of a change that I'm going to make
the small modification to numpy and make a pull request, but it still
isn't big enough.
To give a feel for the sort of issue I'm thinking of, let's say that A
is an ndarray of shape (100000,5) and b is one of shape (100000,). Then
currently a call to lstsq shows (using prun in ipython) that the
entire solve takes 0.060 s, of which 0.017 s is spent in
asarray_chkfinite. That's a huge amount of time wasted verifying the
input, proportionally, so I'd really like orders of magnitude
reduction in the time taken to pass the input on to the solver.
Especially since I want to use these things in situations where this
solver is being called thousands of times over and over on data that's
already been verified as finite."
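
For anyone who wants to reproduce this kind of comparison, here is a
rough sketch. The array shape is the one quoted above; the helper name
time_call is made up for illustration, and the absolute numbers will
differ by machine and numpy version.

```python
import timeit

import numpy as np


def time_call(func, arr, number=5):
    """Best per-call time in seconds (hypothetical helper, not part of numpy)."""
    timings = timeit.repeat(lambda: func(arr), number=number, repeat=3)
    return min(timings) / number


A = np.random.rand(10000, 500)

# asarray_chkfinite has to scan every element for inf/nan;
# asarray on an existing ndarray just hands the same object back.
t_check = time_call(np.asarray_chkfinite, A)
t_plain = time_call(np.asarray, A)

print("np.asarray_chkfinite(A): %.3g s" % t_check)
print("np.asarray(A):           %.3g s" % t_plain)
```

The point is only the relative cost: the check grows with the size of
the array, while the plain conversion does not.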
> The proposed argument is really not a check but to avoid something that has
> to be done in order to obtain a valid outcome. Hence a possible reason why
> the original author put this check in these functions. If we want to avoid
> this then we need to change the workflow by breaking this into two steps
> rather than this approach.
The checks aren't something that must be done to obtain a valid
outcome. They're a way to ensure the input is valid. If you know for
other reasons that your input is valid then it's a waste of resources.
Breaking it into two steps has its own issues as well. If I understand
your suggestion correctly, the user would then have to verify their own
inputs before passing them to scipy linalg methods. Why would that be a
better approach?
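
To make the use case concrete, here is a minimal sketch of the pattern
I have in mind. It is not the code in the pull request: lstsq_sketch is
a hypothetical wrapper, the check_finite keyword is just an
illustrative name for the flag, and np.linalg.lstsq stands in for the
LAPACK-backed solver.

```python
import numpy as np


def lstsq_sketch(a, b, check_finite=True):
    """Hypothetical wrapper: optionally skip the inf/nan check on the inputs."""
    convert = np.asarray_chkfinite if check_finite else np.asarray
    a1 = convert(a)
    b1 = convert(b)
    # Stand-in for the real solver call.
    x, residuals, rank, sv = np.linalg.lstsq(a1, b1, rcond=None)
    return x


rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))
x_true = np.arange(1.0, 6.0)
b = A @ x_true

# Verify the inputs once, up front (raises ValueError on inf/nan) ...
np.asarray_chkfinite(A)
np.asarray_chkfinite(b)

# ... then skip the per-call check inside the hot loop.
for _ in range(100):
    x = lstsq_sketch(A, b, check_finite=False)
```

The default keeps the check on, so nothing changes for users who don't
ask for it; only callers who have already validated their data opt out.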
> All this patch does is say if you want to go 'fast' then set
> 'chkfinite=False' regardless of type of input and fails to address any
> issue. We quite often see on numpy/scipy lists where a user has made
> incorrect assumptions about their input, which this will not catch, or will
> catch only at the end.
> I do not know if Chris provided any tickets for the issues he pointed out
> "But if you pass a numpy array with inf's in it, then it hangs."
> This should have a ticket with a clear, easy example because it highlights a
> large problem.
It has to do with the underlying LAPACK call, IIRC. Not something
> I also agree with Nathaniel that the name of the argument is bad but
> check_finite would not solve my issue with the name.
chkfinite was used because of the underlying numpy function being
bypassed. I don't have a strong preference, but that does make it
easier to relate the option to the actual code.
> Yes, I am being negative because this is a community and it needs to be
> shown that this patch will improve scipy as a whole.