[Numpy-discussion] Adding the ability to "clone" a few fields from a data-type

Robert Kern robert.kern@gmail....
Thu Oct 30 10:26:45 CDT 2008


On Thu, Oct 30, 2008 at 04:33, Francesc Alted <faltet@pytables.org> wrote:
> On Thursday 30 October 2008, Robert Kern wrote:
>> On Wed, Oct 29, 2008 at 19:05, Travis E. Oliphant <oliphant@enthought.com> wrote:
>> > Hi all,
>> >
>> > I'd like to add to NumPy the ability to clone a data-type object so
>> > that only a few fields are copied over but that it retains the
>> > same total size.
>> >
>> > This would allow, for example, the ability to "select out a few
>> > records" from a structured array using
>> >
>> > subarr = arr.view(cloned_dtype)
>> >
>> > Right now, it is hard to do this because you have to at least add a
>> > "dummy" field at the end.  A simple method on the dtype class
>> > (fromfields or something) would be easy to add.
>>
>> I'm not sure what this accomplishes. Would the dummy fields that fill
>> in the space be inaccessible? E.g. tuple(subarr[i,j,k]) gives a tuple
>> with no numpy.void scalars? That would be a novel feature, but I'm
>> not sure it fits the problem. On the contrary:
>> > It was thought in the past to do this with indexing
>> >
>> > arr['field1', 'field2']
>> >
>> > And that would still be possible (and mostly implemented) if this
>> > feature is added.
>>
>> This appears more like the interface that people want. Except that I
>> think people were thinking that it would follow fancy indexing
>> syntax:
>>
>>   arr[['field1', 'field2']]
>
> I've thought about that too.  That would be a great thing to have, IMO.
>
>> I guess there are two ways to implement this. One is to make a new
>> array that just contains the desired fields. Another is to make a
>> view that just points to the desired fields in the original array
>> provided that we have a new feature for inaccessible dummy fields.
>> One point for the former approach is that it is closer to fancy
>> indexing which must always make a copy. The latter approach breaks
>> that connection.
>
> Yeah.  I'd vote for avoiding the copy.
>
>> OTOH, now that I think about it, I don't think there is really any
>> coherent way to mix field selection with any other indexing
>> operations. At least, not within the same brackets. Hmm. So maybe the
>> link to fancy indexing can be ignored as, ahem, fanciful.
>
> Well, one can always check that fields in the fancy list are either
> strings (map to name fields) or integers (map to positional fields).
> However, I'm not sure if this check would be too expensive.

That's not my concern. The problem is that the field-indexing applies
to the entire array, not just an axis. So what would the following
mean?

  a[['foo', 'bar'], [1,2,3]]

Compared to

  a[[5,8,10], [1,2,3]]
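
To make the contrast concrete, here is a minimal sketch. It assumes a
NumPy release newer than this thread, one where indexing with a list of
field names is supported. The array `a`, its shape, and the field names
`foo`, `bar` and `baz` are made up for illustration.

```python
import numpy as np

# Hypothetical 2-D structured array; the field names are illustrative only.
dt = np.dtype([('foo', 'i4'), ('bar', 'f8'), ('baz', 'S4')])
a = np.zeros((12, 4), dtype=dt)

# Integer fancy indexing pairs the lists up axis by axis:
# it picks elements (5, 1), (8, 2) and (10, 3).
picked = a[[5, 8, 10], [1, 2, 3]]

# Field selection is not tied to any axis. It applies to every element,
# so the result keeps the full shape of `a`.
sub = a[['foo', 'bar']]

# Combining the two therefore takes two separate steps.
both = a[['foo', 'bar']][[5, 8, 10], [1, 2, 3]]
```

The two-step form keeps each pair of brackets unambiguous: one selects
fields, the other selects elements.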

>> Overall, I guess, I would present the feature slightly differently.
>> Provide a kind of inaccessible and invisible dtype for implementing
>> dummy fields. This is useful in other places like file parsing. At
>> the same time, implement a function that uses this capability to make
>> views with a subset of the fields of a structured array. I'm not sure
>> that people need an API for replacing the fields of a dtype like
>> this.
>
> Mmh, I'm not sure what you are proposing there.  Do you mean something like:
>
> In [21]: t = numpy.dtype([('f0','i4'),('f1', 'f8'), ('f2', 'S20')])
>
> In [22]: nt = t.astype(['f2', 'f0'])
>
> In [23]: ra = numpy.zeros(10, dtype=t)
>
> In [24]: nra = ra.view(nt)
>
> In [25]: ra
> Out[25]:
> array([(0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''),
>       (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''),
>       (0, 0.0, ''), (0, 0.0, '')],
>      dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '|S20')])
>
> In [26]: nra
> Out[26]:
> array([('', 0), ('', 0), ('', 0), ('', 0), ('', 0), ('', 0), ('', 0),
>       ('', 0), ('', 0), ('', 0)],
>      dtype=[('f2', '|S20'), ('f0', '<i4')])
>
> ?
>
> In that case, that would be a great feature to add.

That's what Travis is proposing. I would like to see a function that
does this (however it is implemented under the covers):

  nra = subset_fields(ra, ['f0', 'f2'])

With the view, I don't think you can reorder the fields as in your example.
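
As a rough sketch of one way such a function could be written: build a
dtype that names only the requested fields, but keeps their original
offsets and the original itemsize, then view the array through it.
`subset_fields` is a hypothetical helper, not an existing NumPy
function, and the sketch assumes a NumPy version that accepts the
dictionary form of a dtype specification with an 'itemsize' key.

```python
import numpy as np

def subset_fields(arr, names):
    """Hypothetical helper: view `arr` so that only `names` are visible."""
    fields = arr.dtype.fields  # maps name -> (dtype, offset[, title])
    new_dtype = np.dtype({
        'names': list(names),
        'formats': [fields[n][0] for n in names],
        'offsets': [fields[n][1] for n in names],
        # Same total size as the original, so the skipped bytes act as
        # unnamed padding rather than as dummy fields.
        'itemsize': arr.dtype.itemsize,
    })
    return arr.view(new_dtype)

t = np.dtype([('f0', 'i4'), ('f1', 'f8'), ('f2', 'S20')])
ra = np.zeros(10, dtype=t)
nra = subset_fields(ra, ['f0', 'f2'])

# No copy is made, so a write through the view shows up in `ra`.
nra['f0'][3] = 7
```

The field order here follows the original layout, and 'f1' is simply
not reachable through `nra`.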

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

