[SciPy-Dev] adding chkfinite flags to linalg functions
Fri Aug 26 15:13:47 CDT 2011
On Fri, Aug 26, 2011 at 1:45 PM, Bruce Southey <firstname.lastname@example.org> wrote:
> On 08/26/2011 12:02 PM, Christopher Jordan-Squire wrote:
>> On Fri, Aug 26, 2011 at 10:38 AM, Bruce Southey<email@example.com> wrote:
>>> On 08/26/2011 09:40 AM, Nathaniel Smith wrote:
>>> Would calling it check_finite instead be too horrible?
>>> - Nathaniel
>>> On Aug 26, 2011 2:47 AM, "Ralf Gommers"<firstname.lastname@example.org> wrote:
>>>> https://github.com/scipy/scipy/pull/48 adds a chkfinite flag to
>>>> functions that allows for disabling a check on whether there are infs/nans
>>>> in the array, which can be a speed gain if you already know that there's
>>>> infs/nans in your input array.
>>>> Is anyone opposed to merging this?
>>> SciPy-Dev mailing list
>>> I do oppose this because I do not think it is the correct solution to an
>>> apparent issue of computational speed. I do say apparent because there is no
>>> code and no timings to support it.
>> Some timing data is at the pull request link. (I was comparing the
>> then-current chkfinite against a modification Ralf suggested.)
>> "I tried that (calling it chkfinite_mod) on an ndarray A of shape
>> (10000,500). (The extreme size was mostly to make the differences
>> clear. The issues in my application had been appearing when A was
>> (3000,2).) The results of timeit are as follows.
>> np.asarray_chkfinite(A): 153 ms
>> chkfinite_mod(A): 59.3 ms
>> np.asarray(A): 1.9 us
>> So asarray_chkfinite is 75,000 times as slow, and chkfinite_mod is
>> 30,000 times as slow. That's enough of a change that I'm going to make
>> the small modification to numpy and make a pull request, but it still
>> isn't big enough.
> There is no example or timing code so this is meaningless.
> Also 'asarray_chkfinite' is numpy not scipy so it is not relevant here.
> Also how many different computers have you tested this on that gave this
> difference? Hint one is not enough.
Please don't be rude.
I cut and pasted the text from the github pull request, where
asarray_chkfinite was part of a short exchange with Ralf.
I added code to the pull request, but it simple translates the above
description directly into code. I wasn't trying to hide any fanciness
or massage numbers.
>> To give a feel for the sort of issue I'm thinking of, let's say that A
>> and b are ndarrays of shape (100000,5) and b is (100000,). Then
>> currently a call to lstsqr produces (using prun in ipython) that the
>> entire solve takes 0.060s, and 0.017 seconds is spent in
>> asarray_chkfinite. That's a huge amount of time wasted verifying the
>> input, proportionally, so I'd really like orders of magnitude
>> reduction in the time taken to pass the input on to the solver.
>> Especially since I want to use these things in situations where this
>> solver is being called thousands of times over and over on data that's
>> already been verified as finite. "
> I do understand the time issue but that is your very specific
> situation. You have to provide the sample code that replicates this as
> are using it might be just your environment and computer.
>>> The proposed argument is really not a check but to avoid something that has
>>> to be done in order to obtain a valid outcome. Hence a possible reason why
>>> the original author put this check in these functions. If we want to avoid
>>> this then we need to change the workflow by breaking this into two steps
>>> rather than this approach.
>> The check's aren't something that must be done to obtain a valid
>> outcome. They're a way to ensure the input is valid. If you know for
>> other reasons that your input is valid then it's a waste of resources.
> Yes, I do understand that but I do not think that you are the average
> scipy user nor is your situation typical of scipy.
I think it's a situation anyone with long, thin matrices will run
into, particularly if they're iterating over many replications of some
The situation is typical for fitting statistical models with a small
number of parameters on a large dataset. This is the context where I
ran into the problem. Specifically when I was fitting a linear model
repeatedly on large subsamples of a very large dataset. Both the
general problem and that particular instance seemed like common and
generic use cases.
>> Breaking it into two steps has its own issues associated with it as
>> well. Then the user must verify their own inputs before passing them
>> to scipy linalg methods. If I understand your suggestion correctly.
> That is exactly what your patch is doing because people will misuse that
> argument just because they perceive it as faster.
If it works then the user wasn't misusing it. If it fails then the
user made the wrong choice because they didn't understand their
algorithm. I don't think I understand your point.
>> Why would that be a better approach?
> I currently have no problems with the scipy approach as what you
> indicate below regarding lapack.
>>> All this patch does is say if you want to go 'fast' then set
>>> 'chkfinite=False' regardless of type of input and fails to address any
>>> issue. We quite often see on numpy/scipy lists where a user has made
>>> incorrect assumptions about their input which is will not catch or catch at
>>> the end.
>>> I do not know if Chris provided any tickets for the issues he pointed out
>>> "But if you pass a numpy array with inf's in it, then it hangs."
>>> This should have a ticket with an clear easy example because it highlights a
>>> large problem.
>> It has to do with the underlying Lapack call, if IIRC. Not something
>> in numpy/scipy.
> So what?
> File a bug report so people know it is a problem and can fix it!
> If it is in Lapack then it needs to be pushed upstream. So in the
> meantime we do have a work around it. But this patch permits users to
> incorrectly remove that work around.
>>> I also agree with Nathaniel that the name of the argument is bad but
>>> check_finite would not solve my issue with the name.
>> chkfinite was used because of the underlying numpy function being
>> bypassed. I don't have a strong preference, but that does make it
>> easier to relate the option to the actual code.
> That is poor reason for a name as only someone who knows the code will
> understand the meaning - users don't read documentation :-)
But users do use google. And google will recognize chkfinite faster
than check_finite. Besides, the default is chkfinite=True, so they'd
have to read the documentation to know what to change.
>>> Yes, I am being negative because this is a community and it needs to be
>>> shown that this patch will improve scipy as whole.
> SciPy-Dev mailing list
More information about the SciPy-Dev