[Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

josef.pktd@gmai... josef.pktd@gmai...
Sun Mar 31 15:43:36 CDT 2013

On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
> Hi,
> On Sat, Mar 30, 2013 at 10:38 PM,  <josef.pktd@gmail.com> wrote:
>> On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
>>> Hi,
>>> On Sat, Mar 30, 2013 at 9:37 PM,  <josef.pktd@gmail.com> wrote:
>>>> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
>>>>> Hi,
>>>>> On Sat, Mar 30, 2013 at 7:02 PM,  <josef.pktd@gmail.com> wrote:
>>>>>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
>>>>>>> Hi,
>>>>>>> On Sat, Mar 30, 2013 at 7:50 PM,  <josef.pktd@gmail.com> wrote:
>>>>>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle
>>>>>>>> <brad.froehle@gmail.com> wrote:
>>>>>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett <matthew.brett@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> On Sat, Mar 30, 2013 at 2:20 PM,  <josef.pktd@gmail.com> wrote:
>>>>>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM,  <josef.pktd@gmail.com> wrote:
>>>>>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett
>>>>>>>>>> >> <matthew.brett@gmail.com> wrote:
>>>>>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM,  <josef.pktd@gmail.com> wrote:
>>>>>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett
>>>>>>>>>> >>>> <matthew.brett@gmail.com> wrote:
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Ravel and reshape use the terms 'C' and 'F' in the sense of index
>>>>>>>>>> >>>>> ordering.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> This is very confusing.  We think the index ordering and memory
>>>>>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should
>>>>>>>>>> >>>>> avoid using "C" and "F" to refer to index ordering.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Proposal
>>>>>>>>>> >>>>> -------------
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards
>>>>>>>>>> >>>>>   index ordering for ravel, reshape
>>>>>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling
>>>>>>>>>> >>>>>   in 2 dimensions, axis1 first and axis0 first respectively
>>>>>>>>>> >>>>>   (excellent naming idea by Paul Ivanov)
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> What do y'all think?
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> I always thought "F" and "C" are easy to understand; I always thought
>>>>>>>>>> >>>> about the content and never about the memory when using it.
>>>>>>>>>> >>
>>>>>>>>>> >> Changing the names doesn't make it easier to understand.
>>>>>>>>>> >> I think the confusion is because the new A and K refer to existing
>>>>>>>>>> >> memory.
>>>>>>>>>> >>
>>>>>>>>>> I disagree, I think it's confusing, but I have evidence, and that is
>>>>>>>>>> that four out of four of us tested ourselves and got it wrong.
>>>>>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's
>>>>>>>>>> rash to assert there is no problem here.
>>>>>>>> I think you are overcomplicating things or phrased it as a "trick question"
>>>>>>> I don't know what you mean by trick question - was there something
>>>>>>> over-complicated in the example?  I deliberately didn't include
>>>>>>> various much more confusing examples in "reshape".
>>>>>> I meant making the "candidates" think about memory instead of just
>>>>>> column versus row stacking.
>>>>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D
>>>>> array, it was an image, with time as the 4th dimension (N time
>>>>> points).   Raveling and reshaping 3D and 4D arrays is a common thing
>>>>> to do in neuroimaging, as you can imagine.
>>>>> A student asked what he would get back from raveling this array, a
>>>>> concatenated time series, or something spatial?
>>>>> We showed (I'd worked it out by this time) that the first N values
>>>>> were the time series given by [0, 0, 0, :].
>>>>> He said - "Oh - I see - so the data is stored as a whole lot of time
>>>>> series one by one, I thought it would be stored as a series of
>>>>> images".
>>>>> Ironically, this was a Fortran-ordered array in memory, and he was wrong.
>>>>> So, I think the idea of memory ordering and index ordering is very
>>>>> easy to confuse, and comes up naturally.
>>>>> I would like, as a teacher, to be able to say something like:
>>>>> This is what C memory layout is (it's the memory layout  that gives
>>>>> arr.flags.C_CONTIGUOUS=True)
>>>>> This is what F memory layout is (it's the memory layout  that gives
>>>>> arr.flags.F_CONTIGUOUS=True)
>>>>> It's rather easy to get something that is neither C nor F memory layout.
>>>>> Numpy does many memory layouts.
>>>>> Ravel and reshape and numpy in general do not care (normally) about C
>>>>> or F layouts, they only care about index ordering.
>>>>> My point, that I'm repeating, is that my job is made harder by
>>>>> 'arr.ravel('F')'.
>>>> But once you know that ravel and reshape don't care about memory, the
>>>> ravel is easy to predict (maybe not easy to visualize in 4-D):
>>> But this assumes that you already know that there's such a thing as
>>> memory layout, and there's such a thing as index ordering, and that
>>> 'C' and 'F' in ravel refer to index ordering.  Once you have that,
>>> you're golden.  I'm arguing it's markedly harder to get this
>>> distinction, and keep it in mind, and teach it, if we are using the
>>> 'C' and 'F' names for both things.
>> No, I think you are still missing my point.
>> I think explaining ravel and reshape F and C is easy (kind of) because the
>> students don't need to know at that stage about memory layouts.
>> All they need to know is that we look at n-dimensional objects in
>> C-order or in  F-order
>> (whichever index runs fastest)
> Would you accept that it may or may not be true that it is desirable
> or practical not to mention memory layouts when teaching numpy?

I think they should be in two different sections.

basic usage:
ravel, reshape in pure index order, and indexing, broadcasting, ...

advanced usage:
memory layout and some ability to predict when you get a view and
when you get a copy.
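
A minimal sketch of the two levels, assuming a small 2-D array (the
names ``a``, ``flat_c``, ``flat_f``, ``is_view_c`` and ``is_view_f`` are
only for illustration). The first part needs nothing but index order;
the second part is where memory layout starts to matter:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]], C-contiguous

# Basic usage: the values you get depend only on the index order you ask for.
flat_c = a.ravel(order="C")      # last index runs fastest
flat_f = a.ravel(order="F")      # first index runs fastest
print(flat_c)                    # [0 1 2 3 4 5]
print(flat_f)                    # [0 3 1 4 2 5]

# Advanced usage: whether the result is a view or a copy depends on how
# the requested index order relates to the existing memory layout.
is_view_c = np.shares_memory(a, flat_c)   # True: no copy was needed
is_view_f = np.shares_memory(a, flat_f)   # False: a copy was made
print(is_view_c, is_view_f)
```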

And I still think words can mean different things in different contexts
(with a qualifier, maybe):
indexing in Fortran order
memory in Fortran order
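
To illustrate the two meanings (a sketch only; ``a`` and ``f`` are
made-up names), here are two arrays with the same values but different
memory layouts. Indexing in Fortran order gives the same sequence for
both, whatever the memory order is:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # memory in C order
f = np.asfortranarray(a)         # same values, memory in Fortran order

# "memory in Fortran order": a property of how the array is stored
print(a.flags["C_CONTIGUOUS"], a.flags["F_CONTIGUOUS"])   # True False
print(f.flags["C_CONTIGUOUS"], f.flags["F_CONTIGUOUS"])   # False True

# "indexing in Fortran order": a property of how we walk over the indices
ravel_a = a.ravel(order="F")
ravel_f = f.ravel(order="F")
print(ravel_a)                   # [0 3 1 4 2 5]
print(ravel_f)                   # [0 3 1 4 2 5]
```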

Disclaimer: I have never tried to teach numpy,
and with GSOC students my explanations only went a little bit
beyond what they needed to know for the purpose at hand (I hope).

> You believe it is desirable, I believe that it is not - that teaching
> numpy naturally involves some discussion of memory layout.
> As evidence:
> * My student, without any prompting about memory layouts, is asking about it
> * Travis' numpy book has a very early section on this (section 2.3 -
> memory layout)
> * I often think about memory layouts, and from your discussion, you do
> too.  It's uncommon that you don't have to teach something that
> experienced users think about often.

I'm mentioning memory layout because I'm talking to you.
I wouldn't talk about memory layout if I were trying to explain ravel,
reshape and indexing for the first time to a student.

> * The most common use of 'order' only refers to memory layout.  For
> example np.array "order" doesn't refer to index ordering but to memory
> layout.

No, as I tried to show with the statsmodels example.
I don't require GSOC students (who are relatively new to numpy) to understand
much about memory layout.
The only use of ``order`` in statsmodels refers to *index* order in
ravel and reshape.

> * The current docstring of 'reshape' cannot be explained without
> referring to memory order.

Really?
I thought reshape only refers to *index* order for "F" and "C".
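
As a sketch of what I mean, using a small made-up (I, J, K, N) array
like the image-with-time example above (the shape values and the names
``img_c``, ``img_f`` and ``results`` are assumptions for illustration),
ravel and reshape give the same answer whether the memory is C or
Fortran ordered:

```python
import numpy as np

I, J, K, N = 2, 3, 4, 5                               # hypothetical shape
img_c = np.arange(I * J * K * N).reshape(I, J, K, N)  # C memory layout
img_f = np.asfortranarray(img_c)                      # Fortran memory layout

results = []
for arr in (img_c, img_f):
    # order="C": the last index runs fastest, so the first N values
    # are the time series at voxel [0, 0, 0].
    first_c = np.array_equal(arr.ravel(order="C")[:N], arr[0, 0, 0, :])
    # order="F": the first index runs fastest, so the first I values
    # run along axis 0 at a fixed time point.
    first_f = np.array_equal(arr.ravel(order="F")[:I], arr[:, 0, 0, 0])
    # reshape walks the indices in the same way as ravel.
    rows = np.array_equal(arr.reshape(-1, N, order="C")[0], arr[0, 0, 0, :])
    results.append((first_c, first_f, rows))

print(results)   # the same tuple for both memory layouts
```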

I don't think I can express my preference for reshape order="F" any
better than I did, so maybe it's time for some additional users/developers
to chime in.


> Cheers,
> Matthew