[Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

Matthew Brett matthew.brett@gmail....
Sat Mar 30 23:50:00 CDT 2013


Hi,

On Sat, Mar 30, 2013 at 9:37 PM,  <josef.pktd@gmail.com> wrote:
> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
>> Hi,
>>
>> On Sat, Mar 30, 2013 at 7:02 PM,  <josef.pktd@gmail.com> wrote:
>>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> On Sat, Mar 30, 2013 at 7:50 PM,  <josef.pktd@gmail.com> wrote:
>>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle
>>>>> <brad.froehle@gmail.com> wrote:
>>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett <matthew.brett@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Sat, Mar 30, 2013 at 2:20 PM,  <josef.pktd@gmail.com> wrote:
>>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM,  <josef.pktd@gmail.com> wrote:
>>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett
>>>>>>> >> <matthew.brett@gmail.com> wrote:
>>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM,  <josef.pktd@gmail.com> wrote:
>>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett
>>>>>>> >>>> <matthew.brett@gmail.com> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> Ravel and reshape use the terms 'C' and 'F' in the sense of index
>>>>>>> >>>>> ordering.
>>>>>>> >>>>>
>>>>>>> >>>>> This is very confusing.  We think the index ordering and memory
>>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should
>>>>>>> >>>>> avoid
>>>>>>> >>>>> using "C" and "F" to refer to index ordering.
>>>>>>> >>>>>
>>>>>>> >>>>> Proposal
>>>>>>> >>>>> -------------
>>>>>>> >>>>>
>>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards
>>>>>>> >>>>> index ordering for ravel, reshape
>>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling
>>>>>>> >>>>> in
>>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent
>>>>>>> >>>>> naming idea by Paul Ivanov)
>>>>>>> >>>>>
>>>>>>> >>>>> What do y'all think?
>>>>>>> >>>>
>>>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought
>>>>>>> >>>> about
>>>>>>> >>>> the content and never about the memory when using it.
>>>>>>> >>
>>>>>>> >> changing the names doesn't make it easier to understand.
>>>>>>> >> I think the confusion is because the new A and K refer to existing
>>>>>>> >> memory
>>>>>>> >>
>>>>>>>
>>>>>>> I disagree, I think it's confusing, but I have evidence, and that is
>>>>>>> that four out of four of us tested ourselves and got it wrong.
>>>>>>>
>>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's
>>>>>>> rash to assert there is no problem here.
>>>>>
>>>>> I think you are overcomplicating things or phrased it as a "trick question"
>>>>
>>>> I don't know what you mean by trick question - was there something
>>>> over-complicated in the example?  I deliberately didn't include
>>>> various much more confusing examples in "reshape".
>>>
>>> I meant making the "candidates" think about memory instead of just
>>> column versus row stacking.
>>
>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D
>> array, it was an image, with time as the 4th dimension (N time
>> points).   Raveling and reshaping 3D and 4D arrays is a common thing
>> to do in neuroimaging, as you can imagine.
>>
>> A student asked what he would get back from raveling this array, a
>> concatenated time series, or something spatial?
>>
>> We showed (I'd worked it out by this time) that the first N values
>> were the time series given by [0, 0, 0, :].
>>
>> He said - "Oh - I see - so the data is stored as a whole lot of time
>> series one by one, I thought it would be stored as a series of
>> images".
>>
>> Ironically, this was a Fortran-ordered array in memory, and he was wrong.
>>
>> So, I think the idea of memory ordering and index ordering is very
>> easy to confuse, and comes up naturally.
>>
>> I would like, as a teacher, to be able to say something like:
>>
>> This is what C memory layout is (it's the memory layout  that gives
>> arr.flags.C_CONTIGUOUS=True)
>> This is what F memory layout is (it's the memory layout  that gives
>> arr.flags.F_CONTIGUOUS=True)
>> It's rather easy to get something that is neither C nor F memory layout
>> Numpy does many memory layouts.
>> Ravel and reshape and numpy in general do not care (normally) about C
>> or F layouts, they only care about index ordering.
>>
>> My point, that I'm repeating, is that my job is made harder by
>> 'arr.ravel('F')'.
>
> But once you know that ravel and reshape don't care about memory, the
> ravel is easy to predict (maybe not easy to visualize in 4-D):

But this assumes that you already know that there's such a thing as
memory layout, and there's such a thing as index ordering, and that
'C' and 'F' in ravel refer to index ordering.  Once you have that,
you're golden.  I'm arguing it's markedly harder to get this
distinction, and keep it in mind, and teach it, if we are using the
'C' and 'F' names for both things.
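
To make the distinction concrete, here is a minimal sketch (the array
names are made up for illustration, and it assumes a reasonably recent
numpy).  It builds the same values in three memory layouts and shows
that the order argument to ravel changes the index ordering of the
result, whatever the memory layout of the input:

```python
import numpy as np

# Hypothetical example arrays: same indexing, different memory layouts.
a_c = np.arange(6).reshape(2, 3)   # C-contiguous in memory
a_f = np.asfortranarray(a_c)       # F-contiguous in memory, same values
a_neither = a_c[:, ::2]            # strided view: neither C nor F contiguous

# Memory layout is what the flags report: (C_CONTIGUOUS, F_CONTIGUOUS).
layout = {
    "a_c": (a_c.flags.c_contiguous, a_c.flags.f_contiguous),
    "a_f": (a_f.flags.c_contiguous, a_f.flags.f_contiguous),
    "a_neither": (a_neither.flags.c_contiguous, a_neither.flags.f_contiguous),
}

# order= in ravel selects the index ordering of the output only.
c_from_c = a_c.ravel(order="C")    # last axis changes fastest
c_from_f = a_f.ravel(order="C")    # same result, despite F memory layout
f_from_c = a_c.ravel(order="F")    # first axis changes fastest
f_from_f = a_f.ravel(order="F")    # same result, despite F memory layout

print(layout)
print(c_from_c, c_from_f)
print(f_from_c, f_from_f)
```

The two 'C' ravels agree with each other, and so do the two 'F' ravels,
even though the inputs are laid out differently in memory.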

> order=C: stack the last dimension, N, the time series of one 3D pixel,
> then stack the time series of the next pixel...
>     process pixels by depth and then row by row (like old TVs)
>
> I assume you did this because your underlying array is C contiguous.
> so your ravel('C') is a c-contiguous view (instead of some weird
> strides or a copy)

Sorry - what do you mean by 'this' in 'did this'?  Reshape?   Why
would it matter what my underlying array memory layout was?

> I usually prefer time in the first dimension, and stack order=F, then
> I can start at the front, stack all time periods of the first pixel,
> keep going and work pixels down the columns, first page, next page,
> ...
> (and I hope I have a F-contiguous array, so my raveled array is also
> F-contiguous.)
>
> (note: I'm bringing memory back in as optimization, but not to predict
> the stacking)
>
> Josef
> (I think brains are designed for Fortran order and C-ordering in numpy
> is an accident,
> except, reading a Western language book is neither)

Yes, I find first axis fastest changing easier to think about, and I
came from MATLAB (about 8 years ago mind), so that also made it more
natural.

I had (until yesterday) simply assumed that numpy unraveled that way,
because it seemed more obvious to me, and knew that the unravel index
order need have nothing to do with the memory order, or the fact that
arrays are C contiguous by default.   Not so of course.  That's not my
complaint as you know - it's just a convention, I guessed the
convention wrong.
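
For the record, a small sketch of the convention on a 4D array like the
one from the class (the sizes and variable names here are invented for
illustration).  The array is deliberately Fortran-ordered in memory, and
the default ravel still has the last axis changing fastest:

```python
import numpy as np

# Hypothetical small (I, J, K, N) image, with time as the last axis.
I, J, K, N = 2, 3, 4, 5
img = np.asfortranarray(np.arange(I * J * K * N).reshape(I, J, K, N))

# The array is F-contiguous in memory, not C-contiguous.
is_f_memory = img.flags.f_contiguous and not img.flags.c_contiguous

# Default ravel (order='C'): the last index changes fastest, so the
# first N values are the time series at voxel [0, 0, 0].
flat = img.ravel()
first_n = flat[:N]

# order='F': the first index changes fastest, so the first I values
# run along the first spatial axis at [:, 0, 0, 0].
flat_f = img.ravel(order="F")
first_i = flat_f[:I]

print(is_f_memory)
print(first_n, img[0, 0, 0, :])
print(first_i, img[:, 0, 0, 0])
```

So the output of ravel tells you about the index ordering that was
asked for, and nothing about how the data is stored.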

Cheers,

Matthew

