[Numpy-discussion] Zeros in strides
Sasha
ndarray at mac.com
Fri Feb 3 15:03:16 CST 2006
On 2/3/06, Travis Oliphant <oliphant at ee.byu.edu> wrote:
> I'm very concerned about the speed of PyArray_NewFromDescr. So, I
> don't really want to make changes that will cause it to be slower for
> all cases unless absolutely essential.
>
It is easy to change the code so that it only affects the branch in
PyArray_NewFromDescr that currently raises an exception, namely when
strides are provided but no buffer. There is no need to call
_array_buffer_size if data is provided.
> Could you give more examples of how you will be using these zero-stride
> arrays? What problem are they actually solving?
>
Currently, when I need to represent a statistic that is constant across
a population, I use scalars. In many cases this works because, thanks to
broadcasting rules, a scalar behaves almost like a vector with equal
elements. With the changes introduced in numpy, generic code that
works on both scalars and vectors is becoming increasingly easy to
write, but there are some cases where a scalar cannot replace a vector
with equal elements. For example, if you want to combine data for two
populations and the data comes as two scalars, you need to somehow
know the size of each population in order to determine the size of the
result. A zero-stride array would solve this problem: it takes little
memory but, unlike a scalar, knows its size.
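
To make the point concrete, here is a minimal sketch using present-day
NumPy. It assumes np.broadcast_to, which did not exist when this message
was written, and the population sizes and values are made up purely for
illustration. It is not the change to PyArray_NewFromDescr discussed
above.

```python
import numpy as np

# Hypothetical statistics, each constant across its population.
# np.broadcast_to returns a read-only view whose stride is 0, so all
# elements share the storage of a single value.
stat_a = np.broadcast_to(np.float64(2.5), (1000,))
stat_b = np.broadcast_to(np.float64(4.0), (500,))

print(stat_a.strides)   # byte strides of the view
print(stat_a.size)      # the array still knows its population size

# Combining the two populations needs no extra size bookkeeping,
# which two bare scalars could not provide.
combined = np.concatenate([stat_a, stat_b])
print(combined.shape)
```

The concatenated result is an ordinary array with one element per
member of the combined population; only the inputs are zero-stride.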
Another use that I was contemplating was to represent a per-row or
per-column mask in ma. It is often the case that in a rectangular
matrix data is missing only for entire rows. It is tempting to
use a rank-1 mask with an element for each row to represent this case.
That will work fine, but then you would not be able to use vectors to
specify either a per-row or a per-column mask. With a zero-stride array,
you can use strides=(1,0) or strides=(0,1) and have the same memory
use as with a vector.
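
A rough sketch of the mask idea, again in present-day NumPy and under
the assumption that np.broadcast_to is an acceptable stand-in for
constructing the zero-stride array directly. The data and the choice of
missing row and column are hypothetical. Because a boolean element
occupies one byte, the byte strides reported here happen to match the
(1,0) and (0,1) mentioned above.

```python
import numpy as np

data = np.arange(12.0).reshape(3, 4)

# One flag per row and one flag per column (hypothetical values).
row_missing = np.array([False, True, False])
col_missing = np.array([False, False, True, False])

# Rank-2 views over the rank-1 flags; no per-element mask is stored.
row_mask = np.broadcast_to(row_missing[:, None], data.shape)
col_mask = np.broadcast_to(col_missing[None, :], data.shape)

print(row_mask.strides)  # zero stride along the columns
print(col_mask.strides)  # zero stride along the rows

# Either view can be passed to ma as a full-shape mask.
masked_rows = np.ma.array(data, mask=row_mask)
masked_cols = np.ma.array(data, mask=col_mask)
print(masked_rows.sum())
print(masked_cols.count())
```

Both masks have the shape of the data, so code that consumes them does
not need to know whether the mask was per-row or per-column.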