[Numpy-discussion] draft enum NEP

Nathaniel Smith njs@pobox....
Thu Mar 15 18:02:30 CDT 2012


On Wed, Mar 14, 2012 at 1:44 AM, Mark Wiebe <mwwiebe@gmail.com> wrote:
> On Fri, Mar 9, 2012 at 8:55 AM, Bryan Van de Ven <bryanv@continuum.io>
> wrote:
>>
>> Hi all,
>>
>> I have started working on a NEP for adding an enumerated type to NumPy.
>> It is on my GitHub:
>>
>>     https://github.com/bryevdv/numpy/blob/enum/doc/neps/enum.rst
>>
>> It is still very rough, and incomplete in places. But I would like to
>> get feedback sooner rather than later in order to refine it. In
>> particular there are a few questions inline in the document that I would
>> like input on. Any comments, suggestions, questions, concerns, etc. are
>> very welcome.
>
>
> This looks like a great start to me.
>
> I think the open/closed enum distinction will need to be explored a little
> bit more, because it interacts with dtype immutability/hashability. Do you
> know if there are any examples of Python objects in the wild that
> dynamically convert from not being hashable (i.e. raising an exception if
> used as a dict key) to become hashable?

I haven't run into any...

Thinking about it, I'm not sure I have any use case for this type
being mutable. Maybe someone else can think of one? The first case
that came to mind was in reading a large text file, where you want to
(1) auto-create an enum, (2) use a pre-allocated array, and (3) don't
know ahead of time what the levels are:

  a = np.empty(lines_in_file, dtype=np.dtype(Enum()))
  for i, line in enumerate(f):
    field = line.split()[0]
    a.dtype.add_level(field)
    a[i] = field
  a.dtype.seal()

But really this can be done just as easily and efficiently
without a mutable dtype:

  a = np.empty(lines_in_file, dtype=np.int32)
  intern_table = {}
  next_level = 0
  for i, line in enumerate(f):
    field = line.split()[0]
    val = intern_table.setdefault(field, next_level)
    if val == next_level:
      next_level += 1
    a[i] = val
  a = a.view(dtype=np.dtype(Enum(map=intern_table)))
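
For anyone who wants to try the interning step today, here is a minimal
sketch that runs with plain NumPy and a dict. It stops short of the final
view() call, since the Enum dtype is only a proposal; the function name
intern_column and the sample lines are made up for illustration:

```python
import numpy as np

def intern_column(lines):
    """Map the first whitespace-separated field of each line to an int code."""
    lines = list(lines)
    codes = np.empty(len(lines), dtype=np.int32)  # pre-allocated, as above
    intern_table = {}  # level string -> integer code, in order of first appearance
    for i, line in enumerate(lines):
        field = line.split()[0]
        # len(intern_table) is the next unused code; setdefault stores it
        # only when the field has not been seen before.
        codes[i] = intern_table.setdefault(field, len(intern_table))
    return codes, intern_table

# Hypothetical sample input standing in for the text file.
codes, table = intern_column(["red 1", "green 2", "red 3", "blue 4"])
```

The resulting integer array plus the dict is everything a closed enum
dtype would need to be constructed in one shot.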

I notice that the HDF5 C library has a concept of open versus closed
enums, but I can't tell from the documentation at hand why this is; it
looks like it might just be a limitation of the implementation. (Like,
a workaround for C's lack of a standard mapping type, which makes it
inconvenient to pass all the mappings in to a single API call.)

> It might be worth adding a section which briefly compares and contrasts the
> proposed functionality with enums in various programming languages. Here are
> two links I found to try and get an idea:
>
> MS on C# enum usage:
> http://msdn.microsoft.com/en-us/library/cc138362.aspx
> Wikipedia on C++ enum class:
> http://en.wikipedia.org/wiki/C%2B%2B11#Strongly_typed_enumerations
>
> For example, the C# enum has a way to enable a "flags" mode, which will
> create successive powers of 2. This may not be a feature NumPy needs, but if
> people are finding it useful in C#, maybe it would be useful here too.

There's also a long, ongoing debate about how to do enums in Python -- e.g.:
  http://www.python.org/dev/peps/pep-0354/
  http://pypi.python.org/pypi/enum/
  http://pypi.python.org/pypi/enum_meta/
  http://pypi.python.org/pypi/flufl.enum/
  http://pypi.python.org/pypi/lazr.enum/
  http://pypi.python.org/pypi/pyutilib.enum/
  http://pypi.python.org/pypi/coding/
  http://stackoverflow.com/questions/36932/whats-the-best-way-to-implement-an-enum-in-python
I guess Guido likes flufl.enum:
  http://mail.python.org/pipermail/python-ideas/2011-July/010909.html

BUT, I'm not sure any of this is relevant at all. "Enums" are a
programming language feature that is, first and foremost, about
injecting names into your code's namespace. What I'm hoping to see is
a dtype for holding categorical data, similar to an R "factor"
  http://stat.ethz.ch/R-manual/R-devel/library/base/html/factor.html
  https://svn.r-project.org/R/trunk/src/library/base/R/factor.R (NB:
This is GPL code if anyone is paranoid about contamination, but also
the most complete API description available)
or an HDF5 "enum"
  http://www.hdfgroup.org/HDF5/doc/H5.user/Datatypes.html#Datatypes_Enum
I believe pandas has some functionality along these lines too, though
I can't find it in the online docs -- hopefully Wes will fill us in.

These are basically objects that act for most purposes like string
arrays, but in which all strings are required to come from a finite,
specified list. This list acts like some metadata attached to the
array; its order may or may not be significant. And they're
implemented internally as integer arrays.
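
To make that representation concrete, here is a rough sketch using only
existing NumPy calls. It is not the proposed dtype, just the "list of
levels plus integer codes" layout it would wrap; the variable names and
the sample data are made up for illustration:

```python
import numpy as np

# Hypothetical categorical data stored as plain strings.
values = np.array(["b", "a", "c", "a", "b"])

# levels: the finite list of allowed strings (sorted here by np.unique).
# codes: one integer per element, indexing into levels.
levels, codes = np.unique(values, return_inverse=True)

# Indexing the levels with the codes recovers the string view of the data.
decoded = levels[codes]
```

Note that np.unique sorts the levels, so this sketch fixes one particular
order; a real categorical dtype would let the user choose it.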

I'm not sure what it would even mean to treat this kind of data as
"flags", since you can't take the bitwise-or of two strings...

-- Nathaniel

