[SciPy-dev] Should ndimage.measurements.* return lists if index is a list?

josef.pktd@gmai... josef.pktd@gmai...
Mon May 4 15:43:04 CDT 2009


On Mon, May 4, 2009 at 4:00 PM, Thouis (Ray) Jones <thouis@broad.mit.edu> wrote:
> On Fri, May 1, 2009 at 16:11,  <josef.pktd@gmail.com> wrote:
>> After a recent comment by Anne, I looked at the weights option in np.bincount.
>> It is as fast to calculate group/label means with np.bincount as with
>> the current ndimage and it scales the same, but it works only for all
>> indices and not for min, max.
>> weights can be calculated for any element wise function.
>> group labels can be anything that np.unique1d can handle, string
>> labels take twice as long
>
> I've attached a rewrite of mean() using this method.  It's still about
> 5-10 times slower than the existing ndimage code.  Perhaps I'm missing
> some obvious optimization.
>
> Ray Jones

The use case I wrote my application for also allowed labels that are
strings. The return_inverse output of unique1d is very flexible, since
it can handle many different types including structured arrays, but I
just found out that for numeric labels it carries a large speed
penalty. If the labels are already integers, or can easily be converted
to integers, then skipping unique1d when creating the integer labels
should be much faster.
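
To make the flexible route concrete, here is a minimal sketch of group
means for arbitrary (here string) labels. It assumes a current NumPy,
where np.unique(..., return_inverse=True) plays the role of unique1d;
the function name group_means and the sample data are made up for
illustration and are not taken from the attached code.

```python
import numpy as np

def group_means(labels, values):
    """Mean of `values` for each distinct entry in `labels` (hypothetical helper)."""
    labels = np.asarray(labels)
    values = np.asarray(values, dtype=float)
    # `inverse` maps every element to the position of its label in `uniq`,
    # which gives the non-negative integer codes that bincount needs.
    uniq, inverse = np.unique(labels, return_inverse=True)
    inverse = inverse.ravel()
    sums = np.bincount(inverse, weights=values.ravel())  # per-group sum
    counts = np.bincount(inverse)                        # per-group size
    return uniq, sums / counts

# Illustrative data only.
labels = np.array(["a", "b", "a", "c", "b", "a"])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
uniq, means = group_means(labels, values)
```

Every label in uniq occurs at least once, so counts never contains a
zero here. The sorting done inside the unique call is the extra cost
that numeric labels pay for this flexibility.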

In my timing comparison, both the bincount and the ndimage.mean timings
included the unique1d call.
bincount should be able to work directly with integer labels, which is
also the restriction in the current (broken) ndimage.measurements.
However, I haven't yet tried to find the best way to use bincount for
float or fixed-decimal labels without using unique1d.
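
For labels that are already non-negative integers, the direct route
might look like the sketch below. It is an illustration under that
assumption only (the name int_label_means and the data are
hypothetical), and it does not address the float or fixed-decimal case.

```python
import numpy as np

def int_label_means(labels, values):
    """Per-label means when `labels` are non-negative integers (hypothetical helper)."""
    labels = np.asarray(labels).ravel()
    values = np.asarray(values, dtype=float).ravel()
    counts = np.bincount(labels)                 # group sizes, indexed by label
    sums = np.bincount(labels, weights=values)   # group sums, indexed by label
    # Labels that never occur have a count of zero; report NaN for them
    # instead of dividing by zero.
    means = np.full(counts.shape, np.nan)
    nonempty = counts > 0
    means[nonempty] = sums[nonempty] / counts[nonempty]
    return means

# Illustrative data only; labels 1, 3 and 4 do not occur.
labels = np.array([0, 2, 2, 5, 0])
values = np.array([1.0, 2.0, 4.0, 7.0, 3.0])
means = int_label_means(labels, values)
```

The result is indexed by label value, so its length is max(label) + 1.
With sparse or very large label values that array can become much
bigger than the number of groups actually present.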

Josef

