[Numpy-discussion] Enum type

Ognen Duzlevski ognen@enthought....
Tue Jan 3 16:34:42 CST 2012


Nathaniel,

On Tue, Jan 3, 2012 at 2:02 PM, Nathaniel Smith <njs@pobox.com> wrote:
> On Tue, Jan 3, 2012 at 9:46 AM, Ognen Duzlevski <ognen@enthought.com> wrote:
>> Hello,
>>
>> I am playing with adding an enum dtype to numpy (to get my feet wet in
>> numpy really). I have looked at
>> https://github.com/martinling/numpy_quaternion and I feel comfortable
>> with my understanding of adding a simple type to numpy in technical
>> terms.
>
> Hi Ognen,
>
> I'm in the middle of an intercontinental move, so I can't help much,
> but I'd also love to see a proper enum/categorical type in numpy, so
> here are a few notes:
>
> - I wrote a simple cython implementation of this last year, which
> might be useful -- code attached.
>
> - The barrier I ran into, which you'll surely run into as well, is a
> flaw in the ufunc API in numpy. Currently, ufunc inner loops do not
> have any way to access the dtype of the array they are being called
> on. For most dtypes, this isn't an issue -- the inner loop for adding
> together int32's knows that it is being called on an array of int32's,
> it doesn't need to see the dtype to figure that out. But with enums,
> each array has a different set of possible categories, and these will
> be attached to the dtype object somehow. So if you want to do, say,
> equality comparison between an enum-array and a string-array:
>  np.enumarray(["a", "b", "c"]) == ["a", "c", "b"] -> np.array([True,
> False, False])
> ...you can't actually make this work in current numpy. The solution is
> that the ufunc API needs to be changed to make dtypes somehow
> available to inner loops. (Probably by passing a pointer to the array
> object, like all the PyArray_ArrFuncs do.)
>
> See this thread:
> http://mail.scipy.org/pipermail/numpy-discussion/2010-August/052401.html
>
> - Both the statistical folk (pandas, statsmodels) and the hdf5 folk
> (pytables, h5py) have reasons to want better enum support. (Maybe
> there are other use cases too -- anyone I'm forgetting?) You should
> make sure to talk to both groups to make sure what you come up with
> will work for them.
>
> Cheers,
> -- Nathaniel

Thanks! The above input is exactly what I was looking for (in addition
to my original question). This "corner case" knowledge is good to have
;)
Ognen
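
The dtype problem Nathaniel describes can be illustrated in plain Python.
This is only a minimal sketch, not NumPy API: the helpers encode() and
enum_equal() are hypothetical, and the "enum array" is emulated as an
int32 array of codes plus a separate list of categories standing in for
what an enum dtype would carry. The point is that the comparison cannot
be done from the integer codes alone; it needs the category table, which
is exactly what a ufunc inner loop cannot see.

```python
import numpy as np

# Hypothetical stand-in for an enum dtype: integer codes plus a per-array
# category table. Neither helper below is part of NumPy.
def encode(values, categories):
    """Map each string to its index in `categories`."""
    lookup = {cat: i for i, cat in enumerate(categories)}
    return np.array([lookup[v] for v in values], dtype=np.int32)

def enum_equal(codes, categories, strings):
    """Compare enum codes with strings; requires the category table."""
    decoded = np.asarray(categories)[codes]  # codes -> category labels
    return decoded == np.asarray(strings)

codes = encode(["a", "b", "c"], ["a", "b", "c"])
result = enum_equal(codes, ["a", "b", "c"], ["a", "c", "b"])

# The same codes under a different category table give a different answer,
# so an inner loop that only sees the raw integers cannot decide equality.
other = enum_equal(codes, ["a", "c", "b"], ["a", "c", "b"])
```

Here the category list is passed explicitly to enum_equal(); in a real
enum dtype it would live on the dtype object, which is why the inner loop
would need some way to reach it.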

