[Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

Matthew Brett matthew.brett@gmail....
Sat Mar 30 17:21:45 CDT 2013


Hi,

On Sat, Mar 30, 2013 at 2:20 PM,  <josef.pktd@gmail.com> wrote:
> On Sat, Mar 30, 2013 at 4:57 PM,  <josef.pktd@gmail.com> wrote:
>> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
>>> Hi,
>>>
>>> On Sat, Mar 30, 2013 at 4:14 AM,  <josef.pktd@gmail.com> wrote:
>>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> We were teaching today, and found ourselves getting very confused
>>>>> about ravel and shape in numpy.
>>>>>
>>>>> Summary
>>>>> --------------
>>>>>
>>>>> There are two separate ideas needed to understand ordering in ravel and reshape:
>>>>>
>>>>> Idea 1): ravel / reshape can proceed from the last axis to the first,
>>>>> or the first to the last.  This is "ravel index ordering"
>>>>> Idea 2): The physical layout of the array (on disk or in memory) can be
>>>>> "C" or "F" contiguous or neither.
>>>>> This is "memory ordering"
>>>>>
>>>>> The index ordering is usually (but see below) orthogonal to the memory ordering.
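
To make the two ideas concrete, here is a minimal sketch. It assumes only plain numpy; np.asfortranarray is used as one way to get an F-contiguous copy, and the variable names are mine. The two arrays hold the same values at the same indices and differ only in how the bytes are laid out:

```python
import numpy as np

arr = np.arange(10).reshape((2, 5))   # C-contiguous memory
arr_F = np.asfortranarray(arr)        # same values, F-contiguous memory

# Idea 2 (memory ordering): the strides, i.e. bytes to step along each
# axis, are different for the two arrays.
layouts_differ = arr.strides != arr_F.strides

# Indexing is unaffected by the layout: arr[i, j] == arr_F[i, j] everywhere.
same_values = np.array_equal(arr, arr_F)

# Idea 1 (index ordering): raveling last-axis-first gives the same answer
# for both memory layouts.
same_ravel = np.array_equal(arr.ravel(order='C'), arr_F.ravel(order='C'))

print(layouts_differ, same_values, same_ravel)  # True True True
```
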
>>>>>
>>>>> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of
>>>>> index ordering, and this mixes the two ideas and is confusing.
>>>>>
>>>>> What the current situation looks like
>>>>> ----------------------------------------------------
>>>>>
>>>>> Specifically, we've been rolling this around among 4 experienced numpy
>>>>> users, and we all predicted at least one of the results below wrongly.
>>>>>
>>>>> This was what we knew, or should have known:
>>>>>
>>>>> In [2]: import numpy as np
>>>>>
>>>>> In [3]: arr = np.arange(10).reshape((2, 5))
>>>>>
>>>>> In [5]: arr.ravel()
>>>>> Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>>>
>>>>> So, the 'ravel' operation unravels over the last axis (1) first,
>>>>> followed by axis 0.
>>>>>
>>>>> So far so good (even if it is the opposite of MATLAB and Octave).
>>>>>
>>>>> Then we found the 'order' flag to ravel:
>>>>>
>>>>> In [10]: arr.flags
>>>>> Out[10]:
>>>>>   C_CONTIGUOUS : True
>>>>>   F_CONTIGUOUS : False
>>>>>   OWNDATA : False
>>>>>   WRITEABLE : True
>>>>>   ALIGNED : True
>>>>>   UPDATEIFCOPY : False
>>>>>
>>>>> In [11]: arr.ravel('C')
>>>>> Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>>>
>>>>> But we soon got confused.  How about this?
>>>>>
>>>>> In [12]: arr_F = np.array(arr, order='F')
>>>>>
>>>>> In [13]: arr_F.flags
>>>>> Out[13]:
>>>>>   C_CONTIGUOUS : False
>>>>>   F_CONTIGUOUS : True
>>>>>   OWNDATA : True
>>>>>   WRITEABLE : True
>>>>>   ALIGNED : True
>>>>>   UPDATEIFCOPY : False
>>>>>
>>>>> In [16]: arr_F
>>>>> Out[16]:
>>>>> array([[0, 1, 2, 3, 4],
>>>>>        [5, 6, 7, 8, 9]])
>>>>>
>>>>> In [17]: arr_F.ravel('C')
>>>>> Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>>>
>>>>> Right - so the flag 'C' to ravel has nothing to do with *memory*
>>>>> ordering; it is about *index* ordering.
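
The session above only shows 'C'. As a sketch of the matching 'F' case (same arr and arr_F as in the session, rebuilt here so the snippet runs on its own), 'F' walks the first axis fastest, again regardless of memory layout:

```python
import numpy as np

arr = np.arange(10).reshape((2, 5))   # C-contiguous
arr_F = np.array(arr, order='F')      # F-contiguous copy, same values

# 'F' is an index ordering: first axis varies fastest, for either layout.
out_from_c_layout = arr.ravel('F')
out_from_f_layout = arr_F.ravel('F')

print(out_from_c_layout)  # [0 5 1 6 2 7 3 8 4 9]
print(out_from_f_layout)  # [0 5 1 6 2 7 3 8 4 9]
```
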
>>>>>
>>>>> And in fact, we can ask for memory ordering specifically:
>>>>>
>>>>> In [22]: arr.ravel('K')
>>>>> Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>>>
>>>>> In [23]: arr_F.ravel('K')
>>>>> Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])
>>>>>
>>>>> In [24]: arr.ravel('A')
>>>>> Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>>>
>>>>> In [25]: arr_F.ravel('A')
>>>>> Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])
>>>>>
>>>>> There are some confusions to get into with the 'order' flag to reshape
>>>>> as well, of the same type.
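
A minimal sketch of the reshape case, under the same assumptions as the ravel examples (the variable names are illustrative only). The 'order' flag says which axis is filled fastest; it does not ask about the memory layout of the input:

```python
import numpy as np

flat = np.arange(10)

# order='C': fill the last axis fastest.
by_c = flat.reshape((2, 5), order='C')
# order='F': fill the first axis fastest.
by_f = flat.reshape((2, 5), order='F')

print(by_c)  # [[0 1 2 3 4]
             #  [5 6 7 8 9]]
print(by_f)  # [[0 2 4 6 8]
             #  [1 3 5 7 9]]

# The memory layout of the input does not change the answer.
by_c_in_f_memory = np.array(by_c, order='F')
same = np.array_equal(by_c.reshape((5, 2), order='C'),
                      by_c_in_f_memory.reshape((5, 2), order='C'))
print(same)  # True
```
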
>>>>>
>>>>> Ravel and reshape use the terms "C" and "F" in the sense of index ordering.
>>>>>
>>>>> This is very confusing.  We think the index ordering and memory
>>>>> ordering ideas need to be separated, and specifically, we should avoid
>>>>> using "C" and "F" to refer to index ordering.
>>>>>
>>>>> Proposal
>>>>> -------------
>>>>>
>>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards
>>>>> index ordering for ravel, reshape
>>>>> * Prefer "Z" and "N", being graphical representations of unraveling in
>>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent
>>>>> naming idea by Paul Ivanov)
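
To show what the proposed spelling might feel like, here is a purely hypothetical wrapper. numpy does not accept 'Z' or 'N'; the function name ravel_index_order and the mapping are my own illustration, not an existing API:

```python
import numpy as np

# Hypothetical names from the proposal, mapped onto the existing flags.
_INDEX_ORDER = {
    'Z': 'C',   # last axis varies fastest (rows read like the strokes of a Z)
    'N': 'F',   # first axis varies fastest (columns read like the strokes of an N)
}

def ravel_index_order(a, order='Z'):
    """Ravel `a` by index order only ('Z' or 'N'), ignoring memory layout."""
    if order not in _INDEX_ORDER:
        raise ValueError("order must be 'Z' or 'N'")
    return np.ravel(a, order=_INDEX_ORDER[order])

arr = np.arange(10).reshape((2, 5))
print(ravel_index_order(arr, 'Z'))  # [0 1 2 3 4 5 6 7 8 9]
print(ravel_index_order(arr, 'N'))  # [0 5 1 6 2 7 3 8 4 9]
```
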
>>>>>
>>>>> What do y'all think?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Matthew
>>>>> Paul Ivanov
>>>>> JB Poline
>>>>
>>>>
>>>>
>>>> I always thought "F" and "C" were easy to understand; I always thought about
>>>> the content and never about the memory when using them.
>>>
>>> I can only say that 4 out of 4 experienced numpy developers found
>>> themselves unable to predict the behavior of these functions before
>>> they saw the output.
>>>
>>> The problem is always that explaining something makes it clearer for a
>>> moment, but, for those who do not have the explanation or who have
>>> forgotten it, at least among us here, the outputs were generating
>>> groans and / or high fives as we incorrectly or correctly guessed what
>>> was going to happen.
>>>
>>> I think the only way to find out whether this really is confusing or
>>> not, is to put someone in front of these functions without any
>>> explanation and ask them to predict what is going to come out of the
>>> various inputs and flags.   Or to try and teach it, which was the
>>> problem we were having.
>>
>> changing the names doesn't make it easier to understand.
>> I think the confusion is because the new A and K refer to existing memory
>>
>>
>> ``ravel`` is just stacking columns ('F') or stacking rows ('C'), I
>> don't remember having seen any weird cases.
>
> example from our statistics use:
> rows are observations/time periods, columns are variables/individuals
>
> using "F" or "C", we can stack either by time-periods (observations)
> or individuals (cross-section units)
> that's easy to understand.
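
For concreteness, the stacking described in that example might look like the sketch below. The panel array and its values are made up for illustration (value = 10 * period + individual, so each entry names its own position):

```python
import numpy as np

# Hypothetical panel: 3 time periods (rows) x 2 individuals (columns).
panel = np.array([[0, 1],
                  [10, 11],
                  [20, 21]])

# 'C': row after row, i.e. all individuals for period 0, then period 1, ...
by_period = panel.ravel('C')
# 'F': column after column, i.e. all periods for individual 0, then 1, ...
by_individual = panel.ravel('F')

print(by_period)      # [ 0  1 10 11 20 21]
print(by_individual)  # [ 0 10 20  1 11 21]
```
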

I disagree: I think it's confusing, and I have evidence, namely that
four out of four of us tested ourselves and got it wrong.

Perhaps we are particularly dumb or poorly informed, but I think it's
rash to assert there is no problem here.

Cheers,

Matthew

