[Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

Matthew Brett matthew.brett@gmail....
Mon Apr 1 14:10:09 CDT 2013


Hi,

On Mon, Apr 1, 2013 at 10:23 AM, Sebastian Berg
<sebastian@sipsolutions.net> wrote:
> On Sun, 2013-03-31 at 14:04 -0700, Matthew Brett wrote:
>> Hi,
>>
>> On Sun, Mar 31, 2013 at 1:43 PM,  <josef.pktd@gmail.com> wrote:
>> > On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
>> >> Hi,
>> >>
>> >> On Sat, Mar 30, 2013 at 10:38 PM,  <josef.pktd@gmail.com> wrote:
>> >>> On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
>> >>>> Hi,
>> >>>>
>> >>>> On Sat, Mar 30, 2013 at 9:37 PM,  <josef.pktd@gmail.com> wrote:
>> >>>>> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> On Sat, Mar 30, 2013 at 7:02 PM,  <josef.pktd@gmail.com> wrote:
>> >>>>>>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
>> >>>>>>>> Hi,
>> >>>>>>>>
>> >>>>>>>> On Sat, Mar 30, 2013 at 7:50 PM,  <josef.pktd@gmail.com> wrote:
>> >>>>>>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle
>> >>>>>>>>> <brad.froehle@gmail.com> wrote:
>> >>>>>>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett <matthew.brett@gmail.com>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Sat, Mar 30, 2013 at 2:20 PM,  <josef.pktd@gmail.com> wrote:
>> >>>>>>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM,  <josef.pktd@gmail.com> wrote:
>> >>>>>>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett
>> >>>>>>>>>>> >> <matthew.brett@gmail.com> wrote:
>> >>>>>>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM,  <josef.pktd@gmail.com> wrote:
>> >>>>>>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett
>> >>>>>>>>>>> >>>> <matthew.brett@gmail.com> wrote:
>> >>>>>>>>>>> >>>>>
>> >>>>>>>>>>> >>>>> Ravel and reshape use the terms 'C' and 'F' in the sense of index
>> >>>>>>>>>>> >>>>> ordering.
>> >>>>>>>>>>> >>>>>
>> >>>>>>>>>>> >>>>> This is very confusing.  We think the index ordering and memory
>> >>>>>>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should
>> >>>>>>>>>>> >>>>> avoid using "C" and "F" to refer to index ordering.
>> >>>>>>>>>>> >>>>>
>> >>>>>>>>>>> >>>>> Proposal
>> >>>>>>>>>>> >>>>> -------------
>> >>>>>>>>>>> >>>>>
>> >>>>>>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards
>> >>>>>>>>>>> >>>>> index ordering for ravel, reshape
>> >>>>>>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling
>> >>>>>>>>>>> >>>>> in 2 dimensions, axis1 first and axis0 first respectively
>> >>>>>>>>>>> >>>>> (excellent naming idea by Paul Ivanov)
>> >>>>>>>>>>> >>>>>
>> >>>>>>>>>>> >>>>> What do y'all think?
>> >>>>>>>>>>> >>>>
>> >>>>>>>>>>> >>>> I always thought "F" and "C" are easy to understand; I always thought
>> >>>>>>>>>>> >>>> about the content and never about the memory when using them.
>> >>>>>>>>>>> >>
>> >>>>>>>>>>> >> changing the names doesn't make it easier to understand.
>> >>>>>>>>>>> >> I think the confusion is because the new A and K refer to existing
>> >>>>>>>>>>> >> memory
>> >>>>>>>>>>> >>
>> >>>>>>>>>>>
>> >>>>>>>>>>> I disagree. I think it's confusing, and I have evidence: four out
>> >>>>>>>>>>> of four of us tested ourselves and got it wrong.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's
>> >>>>>>>>>>> rash to assert there is no problem here.
>> >>>>>>>>>
>> >>>>>>>>> I think you are overcomplicating things or phrasing it as a "trick question".
>> >>>>>>>>
>> >>>>>>>> I don't know what you mean by trick question - was there something
>> >>>>>>>> over-complicated in the example?  I deliberately didn't include
>> >>>>>>>> various much more confusing examples in "reshape".
>> >>>>>>>
>> >>>>>>> I meant making the "candidates" think about memory instead of just
>> >>>>>>> column versus row stacking.
>> >>>>>>
>> >>>>>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D
>> >>>>>> array, it was an image, with time as the 4th dimension (N time
>> >>>>>> points).   Raveling and reshaping 3D and 4D arrays is a common thing
>> >>>>>> to do in neuroimaging, as you can imagine.
>> >>>>>>
>> >>>>>> A student asked what he would get back from raveling this array, a
>> >>>>>> concatenated time series, or something spatial?
>> >>>>>>
>> >>>>>> We showed (I'd worked it out by this time) that the first N values
>> >>>>>> were the time series given by [0, 0, 0, :].
>> >>>>>>
>> >>>>>> He said - "Oh - I see - so the data is stored as a whole lot of time
>> >>>>>> series one by one, I thought it would be stored as a series of
>> >>>>>> images".
>> >>>>>>
>> >>>>>> Ironically, this was a Fortran-ordered array in memory, and he was wrong.
>> >>>>>>
>> >>>>>> So, I think the idea of memory ordering and index ordering is very
>> >>>>>> easy to confuse, and comes up naturally.
>> >>>>>>
>> >>>>>> I would like, as a teacher, to be able to say something like:
>> >>>>>>
>> >>>>>> * This is what C memory layout is (it's the memory layout that gives
>> >>>>>> arr.flags.c_contiguous == True)
>> >>>>>> * This is what F memory layout is (it's the memory layout that gives
>> >>>>>> arr.flags.f_contiguous == True)
>> >>>>>> * It's rather easy to get something that is neither C nor F memory layout
>> >>>>>> * Numpy does many memory layouts.
>> >>>>>> * Ravel and reshape and numpy in general do not care (normally) about C
>> >>>>>> or F layouts, they only care about index ordering.
>> >>>>>>
>> >>>>>> My point, that I'm repeating, is that my job is made harder by
>> >>>>>> 'arr.ravel('F')'.
>> >>>>>
>> >>>>> But once you know that ravel and reshape don't care about memory, the
>> >>>>> ravel is easy to predict (maybe not easy to visualize in 4-D):
>> >>>>
>> >>>> But this assumes that you already know that there's such a thing as
>> >>>> memory layout, and there's such a thing as index ordering, and that
>> >>>> 'C' and 'F' in ravel refer to index ordering.  Once you have that,
>> >>>> you're golden.  I'm arguing it's markedly harder to get this
>> >>>> distinction, and keep it in mind, and teach it, if we are using the
>> >>>> 'C' and 'F' names for both things.
>> >>>
>> >>> No, I think you are still missing my point.
>> >>> I think explaining ravel and reshape F and C is easy (kind of) because the
>> >>> students don't need to know at that stage about memory layouts.
>> >>>
>> >>> All they need to know is that we look at n-dimensional objects in
>> >>> C-order or in  F-order
>> >>> (whichever index runs fastest)
>> >>
>> >> Would you accept that it may or may not be true that it is desirable
>> >> or practical not to mention memory layouts when teaching numpy?
>> >
>> > I think they should be in two different sections.
>> >
>> > basic usage:
>> > ravel, reshape in pure index order, and indexing, broadcasting, ...
>> >
>> > advanced usage:
>> > memory layout and some ability to predict when you get a view and
>> > when you get a copy.
>>
>> Right - that is what you think - but I was asking - do you agree that
>> it's possible that that is not the best way to teach it?
>>
>> What evidence would you give that it was the best way to teach it?
>>
>> > And I still think words can mean different things in different context
>> > (with a qualifier maybe)
>> > indexing in fortran order
>> > memory in fortran order
>>
>> Right - but you'd probably also accept that using the same word for
>> different and related things is likely to cause confusion?   I'm sure
>> we could come up with some experimental evidence for that if you do
>> doubt it.
>>
>> > Disclaimer: I never tried to teach numpy
>> > and with GSOC students my explanations only went a little bit
>> > beyond what they needed to know for the purpose at hand (I hope)
>> >
>> >>
>> >> You believe it is desirable, I believe that it is not - that teaching
>> >> numpy naturally involves some discussion of memory layout.
>> >>
>> >> As evidence:
>> >>
>> >> * My student, without any prompting about memory layouts, is asking about it
>> >> * Travis' numpy book has a very early section on this (section 2.3 -
>> >> memory layout)
>> >> * I often think about memory layouts, and from your discussion, you do
>> >> too.  It's uncommon that you don't have to teach something that
>> >> experienced users think about often.
>> >
>> > I'm mentioning memory layout because I'm talking to you.
>> > I wouldn't talk about memory layout if I were trying to explain ravel,
>> > reshape and indexing for the first time to a student.
>> >
>> >> * The most common use of 'order' only refers to memory layout.  For
>> >> example np.array "order" doesn't refer to index ordering but to memory
>> >> layout.
>> >
>> > No, as I tried to show with the statsmodels example.
>> > I don't require GSOC students (that are relatively new to numpy) to understand
>> > much about memory layout.
>> > The only use of ``order`` in statsmodels refers to *index* order in
>> > ravel and reshape.
>> >
>> >> * The current docstring of 'reshape' cannot be explained without
>> >> referring to memory order.
>> >
>> > Really?
>> > I thought reshape only refers to *index* order for "F" and "C"
>>
>> Here's the docstring for 'reshape':
>>
>> order : {'C', 'F', 'A'}, optional
>>     Determines whether the array data should be viewed as in C
>>     (row-major) order, FORTRAN (column-major) order, or the C/FORTRAN
>>     order should be preserved.
>>
>> The 'A' option cannot be explained without reference to 'C' or 'F'
>> *memory* layout - i.e. a different meaning of 'C' and 'F' from the
>> indexing interpretation.
>>
>> Actually, as a matter of interest - how would you explain the behavior
>> of 'A' when the array is neither 'C' nor 'F' memory layout?  Maybe that
>> could be a good test case?
>>
>
> The 'A' means C-order unless `ndarray.flags.fnc == True` (which means
> "fortran not C"). The detail about "not C" should not matter really for
> copies, for reshape it should maybe be mentioned more clearly. Though
> honestly, reshaping with 'A' seems so weird to me, I doubt anyone ever
> does it. As for ravel... you can probably just as well use 'K' instead
> which is even less restrictive.
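To make the description above concrete, here is a minimal sketch of how 'A' behaves for a C-contiguous array, a Fortran-contiguous array, and an array that is neither. It assumes a reasonably recent NumPy; the array names are made up for illustration.

```python
import numpy as np

c_arr = np.arange(6).reshape(2, 3)               # C-contiguous in memory
f_arr = np.asfortranarray(c_arr)                 # same values, Fortran-contiguous
sliced = np.arange(24).reshape(4, 6)[::2, ::2]   # neither C nor F contiguous

# flags.fnc ("Fortran, not C") is True only for the Fortran-contiguous array.
print(c_arr.flags.fnc, f_arr.flags.fnc, sliced.flags.fnc)

# With 'A', the index order used for raveling is chosen from the memory layout:
# Fortran-like index order if fnc is True, C-like index order otherwise.
a_from_c = c_arr.ravel('A')
a_from_f = f_arr.ravel('A')
a_from_sliced = sliced.ravel('A')

print(a_from_c)       # same as c_arr.ravel('C')
print(a_from_f)       # same as f_arr.ravel('F')
print(a_from_sliced)  # same as sliced.ravel('C')
```

So c_arr and f_arr hold the same values at the same indices, yet ravel('A') returns them in a different sequence, which is only explicable by looking at memory layout.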

I was arguing that it is not possible to explain the docstring(s)
without reference to memory order - I guess you agree.
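
For anyone who wants to try the teaching example from earlier in the thread, here is a small sketch. The sizes (I, J, K, N) = (2, 3, 4, 5) and the variable names are invented for illustration; it shows that the default ravel follows index order whatever the memory layout, while 'K' follows memory.

```python
import numpy as np

I, J, K, N = 2, 3, 4, 5                  # hypothetical image and time sizes
values = np.arange(I * J * K * N).reshape(I, J, K, N)
img_f = np.asfortranarray(values)        # Fortran memory layout
img_c = np.ascontiguousarray(values)     # C memory layout

# Default ravel ('C' index order): last index runs fastest, so the first N
# values are the time series at voxel [0, 0, 0], for either memory layout.
print(img_f.ravel()[:N], img_f[0, 0, 0, :])
print(np.array_equal(img_f.ravel(), img_c.ravel()))

# 'F' index order: first index runs fastest, again for either memory layout.
print(img_f.ravel('F')[:I], img_f[:, 0, 0, 0])
print(np.array_equal(img_f.ravel('F'), img_c.ravel('F')))

# 'K' is the option that actually follows memory order.
print(np.array_equal(img_f.ravel('K'), img_f.ravel('F')))
print(np.array_equal(img_c.ravel('K'), img_c.ravel('C')))
```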

Cheers,

Matthew

