[Numpy-discussion] nditer: possible to manually handle dimensions with different lengths?
John Salvatier
jsalvati@u.washington....
Thu Oct 6 12:24:04 CDT 2011
I ended up fixing my problem by removing the 'buffering' flag and adding the
'copy' flag to each of the input arrays.
I think that nested_iters might be improved by an operand axes specification
for each layer of nesting like nditer uses, though I suppose that 3 layers
of nesting might be confusing for users.
I get an "array too big error" on "values, groups, outs = it1.itviews" when
the shape of the iterator is larger than ~(4728, 125285) even if each of the
arrays should only have actual size along one dimension.
Code:
@cython.boundscheck(False)
def groupwise_accumulate(vals, group, axis ):
cdef np.ndarray[double, ndim = 2] values
cdef np.ndarray[long, ndim = 2] groups
cdef np.ndarray[double, ndim = 2] outs
cdef int size
cdef long g
cdef int i, j
#copy so that swaping the axis doesn't mess up the original arrays
vals = vals.copy()
group = group.copy()
#add a dimension to match up with the new dimension and swap the given axis
to the end
vals.shape = vals.shape + (1,)
vaxes = range(vals.ndim)
vaxes.append(axis)
vaxes.remove(axis)
vals = np.transpose(vals, vaxes)
vals = vals.copy()
#the output should have the same shape as the values except along the
#last two axes (which are the given axis and the new axis)
oshape = list(vals.shape)
bins = len(np.unique(group))
oshape[-1] = 1
oshape[-2] = bins
out = np.empty(tuple(oshape))
#line up grouping with the given axis
gshape = [1] * (len(oshape) - 1) + [vals.shape[-1]]
group.shape = gshape
#nested iterator should go along the last two axes
axes = range(vals.ndim)
axes0 = axes[:-2]
axes1 = axes[-2:]
it0, it1 = np.nested_iters([vals,group, out],
[axes0, axes1],
op_dtypes=['float64', 'int32', 'float64'],
op_flags = [['readonly', 'copy'], ['readonly','copy'],
['readwrite']],
flags = ['multi_index', 'reduce_ok' ])
size = vals.shape[-1]
for x in it0:
values, groups, outs = it1.itviews
i = 0
j = 0
while i < size:
g = groups[0,i]
#accumulation initialization
while i < size and groups[0,i] == g:
#groupwise accumulation
i += 1
outs[j,0] = calculation()
j += 1
#swap back the new axis to the original location of the given axis
out.shape = out.shape[:-1]
oaxes = range(vals.ndim -1)
oaxes.insert(axis, out.ndim-1)
oaxes = oaxes[:-1] #remove the now reduced original given axis
out = np.transpose(out, oaxes)
return out
On Mon, Oct 3, 2011 at 2:03 PM, John Salvatier <jsalvati@u.washington.edu>wrote:
> Some observations and questions about nested_iters. Nested_iters seems to
> require that all input arrays have the same number of dimensions (so you
> will have to pad some input shapes with 1s). Is there a way to specify how
> the axes line are matched together like for nditer?
>
>
> When I try to run the following program,
>
> @cython.boundscheck(False)
> def vars(vals, group, axis ):
> cdef np.ndarray[double, ndim = 2] values
> cdef np.ndarray[long long, ndim = 2] groups
> cdef np.ndarray[double, ndim = 2] outs
> cdef int size
> cdef double value
> cdef int i, j
> cdef long long cgroup
> cdef double min
> cdef double max
> cdef double open
> oshape = list(vals.shape)
> bins = len(np.unique(group))
> oshape = oshape+[bins]
> oshape[axis] = 1
> out = np.empty(tuple(oshape))
> axes = range(vals.ndim)
> axes.remove(axis)
> gshape = [1] * len(oshape)
> gshape[axis] = len(group)
> group.shape = gshape
> vals = vals[...,np.newaxis]
> it0, it1 = np.nested_iters([vals,group, out],
> [axes, [axis,len(oshape) -1]],
> op_dtypes=['float64', 'int64', 'float64'],
> flags = ['multi_index', 'buffered'])
> size = vals.shape[axis]
> for x in it0:
> values, groups, outs = it1.itviews
>
> j = -1
> for i in range(size):
> if cgroup != groups[i,0]:
> if j != -1:
> outs[0,j] = garmanklass(open, values[i,0], min, max)
> cgroup = groups[i,0]
> min = inf
> max = -inf
> open = values[i,0]
> j += 1
>
> min = fmin(min, values[i,0])
> max = fmax(max, values[i,0])
>
> outs[0,j+1] = garmanklass(open, values[size -1], min, max)
> return out
>
>
> I get an error
>
> File "comp.pyx", line 58, in varscale.comp.vars (varscale\comp.c:1565)
> values, groups, outs = it1.itviews
> ValueError: cannot provide an iterator view when buffering is enabled
>
>
> Which I am not sure how to deal with. Any advice?
>
> What I am trying to do here is to do a "grouped" calculation (the group
> specified by the group argument) on the values along the given axis. I try
> to use nested_iter to iterate over the specified axis and a new axis (the
> length of the number of groups) separately so I can do my calculation.
>
> On Mon, Oct 3, 2011 at 9:03 AM, John Salvatier <jsalvati@u.washington.edu>wrote:
>
>> Thanks mark! I think that's exactly what I'm looking for. We even had a
>> previous discussion about this (oops!) (
>> http://mail.scipy.org/pipermail/numpy-discussion/2011-January/054421.html
>> ).
>>
>> I didn't find any documentation, I will try to add some once I understand
>> how it works better.
>>
>> John
>>
>>
>> On Sat, Oct 1, 2011 at 2:53 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
>>
>>> On Sat, Oct 1, 2011 at 1:45 PM, John Salvatier <
>>> jsalvati@u.washington.edu> wrote:
>>>
>>>> I apologize, I picked a poor example of what I want to do. Your
>>>> suggestion would work for the example I provided, but not for a more complex
>>>> example. My actual task is something like a "group by" operation along a
>>>> particular axis (with a known number of groups).
>>>>
>>>> Let me try again: What I would like to be able to do is to specify some
>>>> of the iterator dimensions to be handled manually by me. For example lets
>>>> say I have some kind of a 2d smoothing algorithm. If I start with an array
>>>> of shape [a,b,c,d] and I'd like to do the 2d smoothing over the 2nd and 3rd
>>>> dimensions, I'd like to be able to tell nditer to do normal broadcasting and
>>>> iteration over the 1st and 4th dimensions but leave iteration over the 2nd
>>>> and 3rd dimensions to me and my algorithm. Each iteration of nditer would
>>>> give me a 2d array to which I apply my algorithm. This way I could write
>>>> more arbitrary functions that operate on arrays and support broadcasting.
>>>>
>>>> Is clearer?
>>>>
>>>
>>> Maybe this will work for you:
>>>
>>> In [15]: a = np.arange(2*3*4*5).reshape(2,3,4,5)
>>>
>>> In [16]: it0, it1 = np.nested_iters(a, [[0,3], [1,2]],
>>> flags=['multi_index'])
>>>
>>> In [17]: for x in it0:
>>> ....: print it1.itviews[0]
>>> ....:
>>> [[ 0 5 10 15]
>>> [20 25 30 35]
>>> [40 45 50 55]]
>>> [[ 1 6 11 16]
>>> [21 26 31 36]
>>> [41 46 51 56]]
>>> [[ 2 7 12 17]
>>> [22 27 32 37]
>>> [42 47 52 57]]
>>> [[ 3 8 13 18]
>>> [23 28 33 38]
>>> [43 48 53 58]]
>>> [[ 4 9 14 19]
>>> [24 29 34 39]
>>> [44 49 54 59]]
>>> [[ 60 65 70 75]
>>> [ 80 85 90 95]
>>> [100 105 110 115]]
>>> [[ 61 66 71 76]
>>> [ 81 86 91 96]
>>> [101 106 111 116]]
>>> [[ 62 67 72 77]
>>> [ 82 87 92 97]
>>> [102 107 112 117]]
>>> [[ 63 68 73 78]
>>> [ 83 88 93 98]
>>> [103 108 113 118]]
>>> [[ 64 69 74 79]
>>> [ 84 89 94 99]
>>> [104 109 114 119]]
>>>
>>> Cheers,
>>> Mark
>>>
>>>
>>>
>>>
>>>>
>>>>
>>>> On Fri, Sep 30, 2011 at 5:04 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
>>>>
>>>>> On Fri, Sep 30, 2011 at 8:03 AM, John Salvatier <
>>>>> jsalvati@u.washington.edu> wrote:
>>>>>
>>>>>> Using nditer, is it possible to manually handle dimensions with
>>>>>> different lengths?
>>>>>>
>>>>>> For example, lets say I had an array A[5, 100] and I wanted to sample
>>>>>> every 10 along the second axis so I would end up with an array B[5,10]. Is
>>>>>> it possible to do this with nditer, handling the iteration over the second
>>>>>> axis manually of course (probably in cython)?
>>>>>>
>>>>>> I want something like this (modified from
>>>>>> http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#putting-the-inner-loop-in-cython
>>>>>> )
>>>>>>
>>>>>> @cython.boundscheck(False)
>>>>>> def sum_squares_cy(arr):
>>>>>> cdef np.ndarray[double] x
>>>>>> cdef np.ndarray[double] y
>>>>>> cdef int size
>>>>>> cdef double value
>>>>>> cdef int j
>>>>>>
>>>>>> axeslist = list(arr.shape)
>>>>>> axeslist[1] = -1
>>>>>>
>>>>>> out = zeros((arr.shape[0], 10))
>>>>>> it = np.nditer([arr, out], flags=['reduce_ok', 'external_loop',
>>>>>> 'buffered', 'delay_bufalloc'],
>>>>>> op_flags=[['readonly'], ['readwrite',
>>>>>> 'no_broadcast']],
>>>>>> op_axes=[None, axeslist],
>>>>>> op_dtypes=['float64', 'float64'])
>>>>>> it.operands[1][...] = 0
>>>>>> it.reset()
>>>>>> for xarr, yarr in it:
>>>>>> x = xarr
>>>>>> y = yarr
>>>>>> size = x.shape[0]
>>>>>> j = 0
>>>>>> for i in range(size):
>>>>>> #some magic here involving indexing into x[i] and y[j]
>>>>>> return it.operands[1]
>>>>>>
>>>>>> Does this make sense? Is it possible to do?
>>>>>>
>>>>>
>>>>> I'm not sure I understand precisely what you're asking. Maybe you
>>>>> could reshape A to have shape [5, 10, 10], so that one of those 10's can
>>>>> match up with the 10 in B, perhaps with the op_axes?
>>>>>
>>>>> -Mark
>>>>>
>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> NumPy-Discussion mailing list
>>>>>> NumPy-Discussion@scipy.org
>>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> NumPy-Discussion mailing list
>>>>> NumPy-Discussion@scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion@scipy.org
>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>>
>>>>
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20111006/810cced0/attachment-0001.html
More information about the NumPy-Discussion
mailing list