[Numpy-discussion] Enum/Factor NEP (now with code)

Dag Sverre Seljebotn d.s.seljebotn@astro.uio...
Thu Jun 14 05:12:53 CDT 2012


On 06/14/2012 12:06 AM, Bryan Van de Ven wrote:
> On 6/13/12 1:12 PM, Nathaniel Smith wrote:
>> your-branch's-base-master but not in your-repo's-master are new stuff
>> that you did on your branch. Solution is just to do
>>     git push <your github remote name> master
>
> Fixed, thanks.
>
>> Yes, of course we *could* write the code to implement these "open"
>> dtypes, and then write the documentation, examples, tutorials, etc. to
>> help people work around their limitations. Or, we could just implement
>> np.fromfile properly, which would require no workarounds and take less
>> code to boot.
>>
>> [snip]
>> So would a proper implementation of np.fromfile that normalized the
>> level ordering.
>
> My understanding of the impetus for the open type was sensitivity to the
> performance of having to make two passes over large text datasets. We'll
> have to get more feedback from users here and input from Travis, I think.

Can't you just build up the array using uint8 codes, collecting enum values
in a separate dict, and then recast the array with the final enum at the end?
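A minimal sketch of this first suggestion, assuming the file has already been tokenized into strings. NumPy has no enum dtype (that is what the NEP proposes), so the final "recast" is represented here by freezing the collected levels into a tuple next to the uint8 code array; `parse_categorical` is a hypothetical name. Only one pass over the text is needed.

```python
import numpy as np


def parse_categorical(tokens):
    """Single pass: store each token as a uint8 code, collecting levels in a dict.

    Hypothetical helper; the returned (codes, levels) pair stands in for an
    array with the proposed enum dtype, which NumPy does not provide.
    """
    level_to_code = {}  # level string -> code, in order of first appearance
    codes = np.empty(len(tokens), dtype=np.uint8)
    for i, tok in enumerate(tokens):
        # New levels get the next free code; known levels reuse theirs.
        code = level_to_code.setdefault(tok, len(level_to_code))
        if code > np.iinfo(np.uint8).max:
            raise ValueError("more than 256 levels; use a wider integer type")
        codes[i] = code
    # "Recast" at the end: freeze the collected levels into an immutable tuple.
    levels = tuple(level_to_code)
    return codes, levels


codes, levels = parse_categorical(["low", "high", "low", "mid"])
print(codes)   # [0 1 0 2]
print(levels)  # ('low', 'high', 'mid')
```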

Or, recast the array with a new enum type every time one wants to add an 
enum value? (Similar to how you append to a tuple...)
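The second suggestion, sketched with a hypothetical immutable `EnumType` stand-in (not the NEP's API): each new level produces a new type object, the way `t + (x,)` produces a new tuple. Because levels are only ever appended, codes already written to the array stay valid under each new type.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class EnumType:
    """Hypothetical immutable stand-in for an enum dtype: an ordered tuple of levels.

    frozen=True makes instances immutable and hashable, like np.dtype.
    """
    levels: tuple = ()

    def with_level(self, level):
        # Like tuple concatenation: return a *new* type instead of mutating self.
        if level in self.levels:
            return self
        return EnumType(self.levels + (level,))

    def code(self, level):
        return self.levels.index(level)


def parse_with_recast(tokens):
    etype = EnumType()
    codes = np.empty(len(tokens), dtype=np.uint8)
    for i, tok in enumerate(tokens):
        # "Recast" to a new type when a level is new; old codes remain valid
        # because existing levels keep their positions.
        etype = etype.with_level(tok)
        if len(etype.levels) > 256:
            raise ValueError("more than 256 levels; use a wider integer type")
        codes[i] = etype.code(tok)
    return codes, etype


codes, etype = parse_with_recast(["low", "high", "low", "mid"])
print(codes)         # [0 1 0 2]
print(etype.levels)  # ('low', 'high', 'mid')
```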

(Yes, normalizing level ordering requires another pass through the 
parsed data array, but that's unavoidable and rather orthogonal to 
whether one has an open enum dtype API or not.)
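The normalization pass, sketched assuming "normalized" means levels sorted lexicographically: build a small old-code to new-code lookup table from the levels, then apply it with a single fancy-indexing pass over the parsed uint8 array, without touching the text again. `normalize_levels` is a hypothetical helper.

```python
import numpy as np


def normalize_levels(codes, levels):
    """Sort levels lexicographically and remap codes in one pass over the array."""
    perm = np.argsort(np.array(levels))         # perm[new_code] = old_code
    remap = np.empty(len(levels), dtype=codes.dtype)
    remap[perm] = np.arange(len(levels))        # remap[old_code] = new_code
    new_levels = tuple(levels[i] for i in perm)
    # Fancy indexing is the one extra pass over the parsed data.
    return remap[codes], new_levels


codes = np.array([0, 1, 0, 2], dtype=np.uint8)
new_codes, new_levels = normalize_levels(codes, ("low", "high", "mid"))
print(new_codes)   # [1 0 1 2]
print(new_levels)  # ('high', 'low', 'mid')
```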

A mutable dtype gives me the creeps. dtypes currently implement __hash__
and __eq__ and can be used as dict keys, which I think is very valuable.
Making them sometimes mutable would cause confusing situations. There are
cases for mutable objects that later become immutable, but that should be
very well motivated, as it makes for a much more confusing API...
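A small illustration of the hashing concern. `np.dtype` objects work as dict keys today; `MutableEnumType` below is a hypothetical mutable enum type (not anything proposed in the NEP) whose hash depends on its levels, showing how mutating a key that is already in a dict orphans the entry.

```python
import numpy as np

# np.dtype implements __eq__ and __hash__, so equal dtypes find the same entry.
handlers = {np.dtype("uint8"): "small ints"}
print(handlers[np.dtype(np.uint8)])  # small ints


class MutableEnumType:
    """Hypothetical *mutable* enum type whose hash depends on its levels."""

    def __init__(self, levels):
        self.levels = list(levels)

    def add_level(self, level):
        self.levels.append(level)  # mutates in place

    def __eq__(self, other):
        return isinstance(other, MutableEnumType) and self.levels == other.levels

    def __hash__(self):
        return hash(tuple(self.levels))


t = MutableEnumType(["a", "b"])
cache = {t: "entry for levels (a, b)"}
t.add_level("c")  # mutate a type that is already a dict key
# Same hash as the stored entry, but the stored key no longer compares equal:
print(MutableEnumType(["a", "b"]) in cache)       # False
# Equal to the stored key, but its hash differs from the one stored in the dict:
print(MutableEnumType(["a", "b", "c"]) in cache)  # False
```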

Dag

