[SciPy-Dev] adding chkfinite flags to linalg functions
Christopher Jordan-Squire
cjordan1@uw....
Fri Aug 26 15:13:47 CDT 2011
On Fri, Aug 26, 2011 at 1:45 PM, Bruce Southey <bsouthey@gmail.com> wrote:
> On 08/26/2011 12:02 PM, Christopher Jordan-Squire wrote:
>> On Fri, Aug 26, 2011 at 10:38 AM, Bruce Southey<bsouthey@gmail.com> wrote:
>>> On 08/26/2011 09:40 AM, Nathaniel Smith wrote:
>>>
>>> Would calling it check_finite instead be too horrible?
>>>
>>> - Nathaniel
>>>
>>> On Aug 26, 2011 2:47 AM, "Ralf Gommers"<ralf.gommers@googlemail.com> wrote:
>>>> Hi,
>>>>
>>>> https://github.com/scipy/scipy/pull/48 adds a chkfinite flag to
>>>> scipy.linalg
>>>> functions that allows for disabling a check on whether there are infs/nans
>>>> in the array, which can be a speed gain if you already know that there's
>>>> no
>>>> infs/nans in your input array.
>>>>
>>>> Is anyone opposed to merging this?
>>>>
>>>> Ralf
>>> _______________________________________________
>>> SciPy-Dev mailing list
>>> SciPy-Dev@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>>>
>>> I do oppose this because I do not think it is the correct solution to an
>>> apparent issue of computational speed. I do say apparent because there is no
>>> code and no timings to support it.
>>>
>> Some timing data is at the pull request link. (I was comparing the
>> then-current chkfinite against a modification Ralf suggested.)
>>
>> "I tried that (calling it chkfinite_mod) on an ndarray A of shape
>> (10000,500). (The extreme size was mostly to make the differences
>> clear. The issues in my application had been appearing when A was
>> (3000,2).) The results of timeit are as follows.
>>
>> np.asarray_chkfinite(A): 153 ms
>> chkfinite_mod(A): 59.3 ms
>> np.asarray(A): 1.9 us
>>
>> So asarray_chkfinite is 75,000 times as slow, and chkfinite_mod is
>> 30,000 times as slow. That's enough of a change that I'm going to make
>> the small modification to numpy and make a pull request, but it still
>> isn't big enough.
>
> There is no example or timing code so this is meaningless.
> Also 'asarray_chkfinite' is numpy not scipy so it is not relevant here.
> Also how many different computers have you tested this on that gave this
> difference? Hint one is not enough.
>
Please don't be rude.
I cut and pasted the text from the github pull request, where
asarray_chkfinite was part of a short exchange with Ralf.
I added code to the pull request, but it simple translates the above
description directly into code. I wasn't trying to hide any fanciness
or massage numbers.
>> To give a feel for the sort of issue I'm thinking of, let's say that A
>> and b are ndarrays of shape (100000,5) and b is (100000,). Then
>> currently a call to lstsqr produces (using prun in ipython) that the
>> entire solve takes 0.060s, and 0.017 seconds is spent in
>> asarray_chkfinite. That's a huge amount of time wasted verifying the
>> input, proportionally, so I'd really like orders of magnitude
>> reduction in the time taken to pass the input on to the solver.
>> Especially since I want to use these things in situations where this
>> solver is being called thousands of times over and over on data that's
>> already been verified as finite. "
> I do understand the time issue but that is your very specific
> situation. You have to provide the sample code that replicates this as
> are using it might be just your environment and computer.
>
>>> The proposed argument is really not a check but to avoid something that has
>>> to be done in order to obtain a valid outcome. Hence a possible reason why
>>> the original author put this check in these functions. If we want to avoid
>>> this then we need to change the workflow by breaking this into two steps
>>> rather than this approach.
>>>
>> The check's aren't something that must be done to obtain a valid
>> outcome. They're a way to ensure the input is valid. If you know for
>> other reasons that your input is valid then it's a waste of resources.
> Yes, I do understand that but I do not think that you are the average
> scipy user nor is your situation typical of scipy.
I think it's a situation anyone with long, thin matrices will run
into, particularly if they're iterating over many replications of some
computation.
The situation is typical for fitting statistical models with a small
number of parameters on a large dataset. This is the context where I
ran into the problem. Specifically when I was fitting a linear model
repeatedly on large subsamples of a very large dataset. Both the
general problem and that particular instance seemed like common and
generic use cases.
>> Breaking it into two steps has its own issues associated with it as
>> well. Then the user must verify their own inputs before passing them
>> to scipy linalg methods. If I understand your suggestion correctly.
> That is exactly what your patch is doing because people will misuse that
> argument just because they perceive it as faster.
>
If it works then the user wasn't misusing it. If it fails then the
user made the wrong choice because they didn't understand their
algorithm. I don't think I understand your point.
>> Why would that be a better approach?
> I currently have no problems with the scipy approach as what you
> indicate below regarding lapack.
>>> All this patch does is say if you want to go 'fast' then set
>>> 'chkfinite=False' regardless of type of input and fails to address any
>>> issue. We quite often see on numpy/scipy lists where a user has made
>>> incorrect assumptions about their input which is will not catch or catch at
>>> the end.
>>>
>>> I do not know if Chris provided any tickets for the issues he pointed out
>>> in:
>>> http://mail.scipy.org/pipermail/scipy-dev/2011-July/016386.html
>>> "But if you pass a numpy array with inf's in it, then it hangs."
>>>
>>> This should have a ticket with an clear easy example because it highlights a
>>> large problem.
>>>
>> It has to do with the underlying Lapack call, if IIRC. Not something
>> in numpy/scipy.
> So what?
> File a bug report so people know it is a problem and can fix it!
> If it is in Lapack then it needs to be pushed upstream. So in the
> meantime we do have a work around it. But this patch permits users to
> incorrectly remove that work around.
>
>>> I also agree with Nathaniel that the name of the argument is bad but
>>> check_finite would not solve my issue with the name.
>>>
>> chkfinite was used because of the underlying numpy function being
>> bypassed. I don't have a strong preference, but that does make it
>> easier to relate the option to the actual code.
> That is poor reason for a name as only someone who knows the code will
> understand the meaning - users don't read documentation :-)
>
But users do use google. And google will recognize chkfinite faster
than check_finite. Besides, the default is chkfinite=True, so they'd
have to read the documentation to know what to change.
-Chris JS
>>> Yes, I am being negative because this is a community and it needs to be
>>> shown that this patch will improve scipy as whole.
>>>
>>> Bruce
>>>
>>>
>
> Bruce
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
More information about the SciPy-Dev
mailing list