[Numpy-discussion] suggested change of behavior for interp
Charles R Harris
charlesr.harris@gmail....
Wed Jun 5 13:08:45 CDT 2013
On Wed, Jun 5, 2013 at 12:00 PM, Charles R Harris <charlesr.harris@gmail.com
> wrote:
>
>
> On Wed, Jun 5, 2013 at 11:59 AM, Charles R Harris <
> charlesr.harris@gmail.com> wrote:
>
>>
>>
>> On Wed, Jun 5, 2013 at 11:48 AM, Nathaniel Smith <njs@pobox.com> wrote:
>>
>>> On Wed, Jun 5, 2013 at 6:08 PM, Julian Taylor
>>> <jtaylor.debian@googlemail.com> wrote:
>>> > On 05.06.2013 16:33, Nathaniel Smith wrote:
>>> >> The slow down people are worried about is, suppose that 'xp' has
>>> >> 1,000,000 entries, and the user wants to interpolate 1 point. If we
>>> >> can assume the array is sorted, then we can find which bin the 1 point
>>> >> falls into by using binary search (np.searchsorted), which will
>>> >> require examining ~log2(1,000,000) = 20 entries in the array. Checking
>>> >> that it's sorted, though, will require examining all 1,000,000 points
>>> >> -- it turns an O(log n) operation into an O(n) operation. And if this
>>> >> is being called as part of some larger algorithm, this effect can
>>> >> cascade -- if it gets called m times from a loop, now that's O(mn)
>>> >> instead of (m log n), etc. That's often the difference between
>>> >> tractable and intractable.
>>> >>
>>> >> If you submit a PR that gives interp the option to check for
>>> >> monotonicity, then we'll almost certainly merge it :-). If you also
>>> >> make it the default then there might be some debate, though I'd argue
>>> >> for it...
>>> >
>>> > I would not like it when the performance goes from log to linear by
>>> default.
>>> > It is documented that the coordinates must be increasing after all.
>>>
>>> I agree, I don't like it either. But the problem is there are two
>>> contradictory things and I don't like either of them:
>>> 1) performance going from log to linear by default (for a subset of
>>> somewhat extreme cases)
>>> 2) people silently getting the wrong results (and according to reports
>>> in this thread, the warning in the docs does not actually prevent
>>> this)
>>>
>>> The question is which one do we dislike more. My feeling is that in
>>> the cases where (1) comes up, it will annoy people and get them to
>>> check the docs and find the go-faster switch, while the first warning
>>> of (2) is when your spaceship crashes, so we should worry more about
>>> (2).
>>>
>>> > How about as a compromise the function checks one or two closest
>>> > neighbors around the interpolation point and warns/errors if those are
>>> > not monotonic.
>>> >
>>> > Its not fool proof but should at least catch the very wrong cases with
>>> > an acceptable performance loss.
>>>
>>> There are tons of ways for data to accidentally end up sorted within
>>> each local region, but unsorted overall, e.g., if you're combining
>>> results from parallel simulation runs. Such data would systematically
>>> pass this check, but still give nonsensical answers.
>>>
>>
>> Some actual benchmarks might be useful to the discussion.
>>
>
> And perhaps an inplace C function ismonotone would be generally useful.
>
Or make greater.reduce operate correctly.
Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20130605/5be3a0f6/attachment.html
More information about the NumPy-Discussion
mailing list