[Numpy-discussion] New functions.
Wed Jun 1 10:59:52 CDT 2011
Short-circuiting find would be nice. Right now, to 'find' something you first
make a bool array, then iterate over it. If all you want is the first index
where x[i] == e, that is not very efficient.
What I just described is a find with a '==' predicate. Not sure if it's
worthwhile to consider other predicates.
Maybe call it 'find_first'
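A rough sketch of what I mean, in plain NumPy. The name find_first and the chunk parameter are made up for illustration and are not an existing NumPy function. Scanning in chunks only approximates short-circuiting: it stops at the first chunk containing a match, so the bool temporary never grows beyond one chunk. A real implementation would presumably do the loop in C.

```python
import numpy as np

def find_first(x, e, chunk=4096):
    """Return the index of the first element of 1-d array x equal to e, or -1.

    Hypothetical helper: compares one chunk at a time and returns as soon
    as a chunk contains a match, instead of building a full bool array.
    """
    x = np.asarray(x)
    for start in range(0, x.shape[0], chunk):
        # Only this slice is compared, so the temporary is at most `chunk` long.
        hits = np.flatnonzero(x[start:start + chunk] == e)
        if hits.size:
            return start + int(hits[0])
    return -1

x = np.array([3, 1, 4, 1, 5, 9, 2, 6])
print(find_first(x, 1))   # 1
print(find_first(x, 7))   # -1
```

Returning -1 for "not found" is just one choice; raising an exception would work as well.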
Mark Miller wrote:
> I'd love to see something like a "count_unique" function included. The
> numpy.unique function is handy, but it can be a little awkward to
> efficiently go back and get counts of each unique value after the
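For reference, one way to get the counts with existing pieces is numpy.unique with return_inverse=True followed by numpy.bincount. The wrapper name count_unique below is hypothetical and simply follows the suggestion above; it flattens its input, which is an assumption on my part. Later NumPy releases (1.9 and up) also accept return_counts=True in numpy.unique, which does this in one call.

```python
import numpy as np

def count_unique(a):
    """Return (sorted unique values, count of each value) for array-like a.

    Hypothetical helper, not a NumPy function. The input is flattened first.
    """
    a = np.asarray(a).ravel()
    # inverse[i] is the position of a[i] in `values`.
    values, inverse = np.unique(a, return_inverse=True)
    # Counting how often each position occurs gives the per-value counts.
    counts = np.bincount(inverse, minlength=values.size)
    return values, counts

values, counts = count_unique([3, 1, 3, 2, 1, 3])
print(values)   # [1 2 3]
print(counts)   # [2 1 3]
```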
> On Wed, Jun 1, 2011 at 8:17 AM, Keith Goodman <firstname.lastname@example.org> wrote:
>> On Tue, May 31, 2011 at 8:41 PM, Charles R Harris
>> <email@example.com> wrote:
>>> On Tue, May 31, 2011 at 8:50 PM, Bruce Southey <firstname.lastname@example.org> wrote:
>>>> How about including all or some of Keith's Bottleneck package?
>>>> He has tried to include some of the discussed functions and tried to
>>>> make them very fast.
>>> I don't think they are sufficiently general as they are limited to 2
>>> dimensions. However, I think the moving filters should go into scipy, either
>>> in ndimage or maybe signals. Some of the others we can still speed up
>>> significantly, for instance nanmedian, by using the new functionality in
>>> numpy, i.e., numpy sort has worked with nans for a while now. It looks like
>>> call overhead dominates the nanmax times for small arrays and this might
>>> improve if the ufunc machinery is cleaned up a bit more, I don't know how
>>> far Mark got with that.
>> Currently Bottleneck accelerates 1d, 2d, and 3d input. Anything else
>> falls back to a slower, non-cython version of the function. The same
>> goes for int32, int64, float32, float64.
>> It should not be difficult to extend to higher nd and more dtypes
>> since everything is generated from template. The problem is that there
>> would be a LOT of cython auto-generated C code since there is a
>> separate function for each ndim, dtype, axis combination.
>> Each of the ndim, dtype, axis functions currently has its own copy of
>> the algorithm (such as median). Pulling that out and reusing it should
>> save a lot of trees by reducing the auto-generated C code size.
>> I recently added a partsort and argpartsort.
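To show why a partial sort helps with something like median, here is a small sketch. It uses numpy.partition, which was added to NumPy after this thread (1.8), as a stand-in; it is not Bottleneck's partsort and says nothing about its signature. The function name median_by_partition is made up, and the sketch assumes a non-empty input with no NaNs.

```python
import numpy as np

def median_by_partition(a):
    """Median of a flattened, non-empty, NaN-free array via a partial sort.

    Illustrative only: np.partition places the k-th smallest element at
    index k without fully sorting the rest.
    """
    a = np.asarray(a, dtype=float).ravel()
    n = a.size
    mid = n // 2
    if n % 2:
        # Odd length: the middle element is the median.
        return float(np.partition(a, mid)[mid])
    # Even length: average the two middle elements.
    part = np.partition(a, [mid - 1, mid])
    return float(0.5 * (part[mid - 1] + part[mid]))

print(median_by_partition([5, 1, 4, 2, 3]))   # 3.0
print(median_by_partition([4, 1, 3, 2]))      # 2.5
```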