[Numpy-discussion] Enum/Factor NEP (now with code)
Dag Sverre Seljebotn
Thu Jun 14 05:12:53 CDT 2012
On 06/14/2012 12:06 AM, Bryan Van de Ven wrote:
> On 6/13/12 1:12 PM, Nathaniel Smith wrote:
>> your-branch's-base-master but not in your-repo's-master are new stuff
>> that you did on your branch. Solution is just to do
>> git push <your github remote name> master
> Fixed, thanks.
>> Yes, of course we *could* write the code to implement these "open"
>> dtypes, and then write the documentation, examples, tutorials, etc. to
>> help people work around their limitations. Or, we could just implement
>> np.fromfile properly, which would require no workarounds and take less
>> code to boot.
>> So would a proper implementation of np.fromfile that normalized the
>> level ordering.
> My understanding of the impetus for the open type was sensitivity to the
> performance of having to make two passes over large text datasets. We'll
> have to get more feedback from users here and input from Travis, I think.
Can't you just build up the file using uint8, collecting enum values in
a separate dict, and then recast the array with the final enum at the end?
Or, recast the array with a new enum type every time one wants to add an
enum value? (Similar to how you append to a tuple...)
(Yes, normalizing level ordering requires another pass through the
parsed data array, but that's unavoidable and rather orthogonal to
whether one has an open enum dtype API or not.)
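
To make the first suggestion concrete, here is a rough sketch in plain
Python, with a bytearray standing in for a uint8 array. The names
parse_levels and normalize_levels are made up for illustration and are
not part of NumPy or of the NEP branch; the final "recast" to a real
enum dtype is left out, since that API is exactly what is under
discussion.

```python
def parse_levels(tokens):
    """One pass over the text: store small integer codes, collect levels in a dict."""
    codes = bytearray()      # stands in for a uint8 array
    level_to_code = {}       # enum value -> code, in order of first appearance
    for tok in tokens:
        # A new level gets the next free code; a known level reuses its code.
        code = level_to_code.setdefault(tok, len(level_to_code))
        if code > 255:
            raise OverflowError("more than 256 levels; a wider code type is needed")
        codes.append(code)
    return codes, level_to_code


def normalize_levels(codes, level_to_code):
    """Remap codes so levels are in sorted order (a pass over the codes, not the text)."""
    levels = sorted(level_to_code)
    new_code = {lvl: i for i, lvl in enumerate(levels)}
    table = bytearray(256)   # old code -> new code lookup table
    for lvl, old in level_to_code.items():
        table[old] = new_code[lvl]
    return codes.translate(table), tuple(levels)


codes, seen = parse_levels(["b", "a", "b", "c", "a"])
codes, levels = normalize_levels(codes, seen)
decoded = [levels[c] for c in codes]
```

The text is only read once; the normalization step touches the already
parsed codes, and the level tuple is only fixed once everything has
been seen.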
A mutable dtype gives me the creeps. dtypes currently implement
__hash__ and __eq__ and can be used as dict keys, which I think is very
valuable. Making them sometimes mutable would cause confusing
situations. There are cases for mutable objects that become immutable,
but it should be very well motivated as it makes for a much more