[Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

Matthew Brett matthew.brett@gmail....
Mon Apr 1 17:29:34 CDT 2013


Hi,

On Mon, Apr 1, 2013 at 1:34 PM,  <josef.pktd@gmail.com> wrote:
> On Mon, Apr 1, 2013 at 3:10 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
>> Hi,
>>
>> On Mon, Apr 1, 2013 at 10:23 AM, Sebastian Berg
>> <sebastian@sipsolutions.net> wrote:
>>> On Sun, 2013-03-31 at 14:04 -0700, Matthew Brett wrote:
>>>> Hi,
>>>>
>>>> On Sun, Mar 31, 2013 at 1:43 PM,  <josef.pktd@gmail.com> wrote:
>>>> > On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
>>>> >> Hi,
>>>> >>
>>>> >> On Sat, Mar 30, 2013 at 10:38 PM,  <josef.pktd@gmail.com> wrote:
>>>> >>> On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
>>>> >>>> Hi,
>>>> >>>>
>>>> >>>> On Sat, Mar 30, 2013 at 9:37 PM,  <josef.pktd@gmail.com> wrote:
>>>> >>>>> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
>>>> >>>>>> Hi,
>>>> >>>>>>
>>>> >>>>>> On Sat, Mar 30, 2013 at 7:02 PM,  <josef.pktd@gmail.com> wrote:
>>>> >>>>>>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
>>>> >>>>>>>> Hi,
>>>> >>>>>>>>
>>>> >>>>>>>> On Sat, Mar 30, 2013 at 7:50 PM,  <josef.pktd@gmail.com> wrote:
>>>> >>>>>>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle
>>>> >>>>>>>>> <brad.froehle@gmail.com> wrote:
>>>> >>>>>>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett <matthew.brett@gmail.com>
>>>> >>>>>>>>>> wrote:
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> On Sat, Mar 30, 2013 at 2:20 PM,  <josef.pktd@gmail.com> wrote:
>>>> >>>>>>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM,  <josef.pktd@gmail.com> wrote:
>>>> >>>>>>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett
>>>> >>>>>>>>>>> >> <matthew.brett@gmail.com> wrote:
>>>> >>>>>>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM,  <josef.pktd@gmail.com> wrote:
>>>> >>>>>>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett
>>>> >>>>>>>>>>> >>>> <matthew.brett@gmail.com> wrote:
>>>> >>>>>>>>>>> >>>>>
>>>> >>>>>>>>>>> >>>>> Ravel and reshape use the terms 'C' and 'F' in the sense of index
>>>> >>>>>>>>>>> >>>>> ordering.
>>>> >>>>>>>>>>> >>>>>
>>>> >>>>>>>>>>> >>>>> This is very confusing.  We think the index ordering and memory
>>>> >>>>>>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should
>>>> >>>>>>>>>>> >>>>> avoid
>>>> >>>>>>>>>>> >>>>> using "C" and "F" to refer to index ordering.
>>>> >>>>>>>>>>> >>>>>
>>>> >>>>>>>>>>> >>>>> Proposal
>>>> >>>>>>>>>>> >>>>> -------------
>>>> >>>>>>>>>>> >>>>>
>>>> >>>>>>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards
>>>> >>>>>>>>>>> >>>>> index ordering for ravel, reshape
>>>> >>>>>>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling
>>>> >>>>>>>>>>> >>>>> in
>>>> >>>>>>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent
>>>> >>>>>>>>>>> >>>>> naming idea by Paul Ivanov)
>>>> >>>>>>>>>>> >>>>>
>>>> >>>>>>>>>>> >>>>> What do y'all think?
>>>> >>>>>>>>>>> >>>>
>>>> >>>>>>>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought
>>>> >>>>>>>>>>> >>>> about
>>>> >>>>>>>>>>> >>>> the content and never about the memory when using it.
>>>> >>>>>>>>>>> >>
>>>> >>>>>>>>>>> >> changing the names doesn't make it easier to understand.
>>>> >>>>>>>>>>> >> I think the confusion is because the new A and K refer to existing
>>>> >>>>>>>>>>> >> memory
>>>> >>>>>>>>>>> >>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> I disagree, I think it's confusing, but I have evidence, and that is
>>>> >>>>>>>>>>> that four out of four of us tested ourselves and got it wrong.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's
>>>> >>>>>>>>>>> rash to assert there is no problem here.
>>>> >>>>>>>>>
>>>> >>>>>>>>> I think you are overcomplicating things or phrased it as a "trick question"
>>>> >>>>>>>>
>>>> >>>>>>>> I don't know what you mean by trick question - was there something
>>>> >>>>>>>> over-complicated in the example?  I deliberately didn't include
>>>> >>>>>>>> various much more confusing examples in "reshape".
>>>> >>>>>>>
>>>> >>>>>>> I meant making the "candidates" think about memory instead of just
>>>> >>>>>>> column versus row stacking.
>>>> >>>>>>
>>>> >>>>>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D
>>>> >>>>>> array, it was an image, with time as the 4th dimension (N time
>>>> >>>>>> points).   Raveling and reshaping 3D and 4D arrays is a common thing
>>>> >>>>>> to do in neuroimaging, as you can imagine.
>>>> >>>>>>
>>>> >>>>>> A student asked what he would get back from raveling this array, a
>>>> >>>>>> concatenated time series, or something spatial?
>>>> >>>>>>
>>>> >>>>>> We showed (I'd worked it out by this time) that the first N values
>>>> >>>>>> were the time series given by [0, 0, 0, :].
>>>> >>>>>>
>>>> >>>>>> He said - "Oh - I see - so the data is stored as a whole lot of time
>>>> >>>>>> series one by one, I thought it would be stored as a series of
>>>> >>>>>> images".
>>>> >>>>>>
>>>> >>>>>> Ironically, this was a Fortran-ordered array in memory, and he was wrong.
>>>> >>>>>>
>>>> >>>>>> So, I think the idea of memory ordering and index ordering is very
>>>> >>>>>> easy to confuse, and comes up naturally.
>>>> >>>>>>
>>>> >>>>>> I would like, as a teacher, to be able to say something like:
>>>> >>>>>>
>>>> >>>>>> This is what C memory layout is (it's the memory layout  that gives
>>>> >>>>>> arr.flags.C_CONTIGUOUS=True)
>>>> >>>>>> This is what F memory layout is (it's the memory layout  that gives
>>>> >>>>>> arr.flags.F_CONTIGUOUS=True)
>>>> >>>>>> It's rather easy to get something that is neither C nor F memory layout
>>>> >>>>>> Numpy does many memory layouts.
>>>> >>>>>> Ravel and reshape and numpy in general do not care (normally) about C
>>>> >>>>>> or F layouts, they only care about index ordering.
>>>> >>>>>>
>>>> >>>>>> My point, that I'm repeating, is that my job is made harder by
>>>> >>>>>> 'arr.ravel('F')'.
>>>> >>>>>
>>>> >>>>> But once you know that ravel and reshape don't care about memory, the
>>>> >>>>> ravel is easy to predict (maybe not easy to visualize in 4-D):
>>>> >>>>
>>>> >>>> But this assumes that you already know that there's such a thing as
>>>> >>>> memory layout, and there's such a thing as index ordering, and that
>>>> >>>> 'C' and 'F' in ravel refer to index ordering.  Once you have that,
>>>> >>>> you're golden.  I'm arguing it's markedly harder to get this
>>>> >>>> distinction, and keep it in mind, and teach it, if we are using the
>>>> >>>> 'C' and 'F' names for both things.
>>>> >>>
>>>> >>> No, I think you are still missing my point.
>>>> >>> I think explaining ravel and reshape F and C is easy (kind of) because the
>>>> >>> students don't need to know at that stage about memory layouts.
>>>> >>>
>>>> >>> All they need to know is that we look at n-dimensional objects in
>>>> >>> C-order or in  F-order
>>>> >>> (whichever index runs fastest)
>>>> >>
>>>> >> Would you accept that it may or may not be true that it is desirable
>>>> >> or practical not to mention memory layouts when teaching numpy?
>>>> >
>>>> > I think they should be in two different sections.
>>>> >
>>>> > basic usage:
>>>> > ravel, reshape in pure index order, and indexing, broadcasting, ...
>>>> >
>>>> > advanced usage:
>>>> > memory layout and some ability to predict when you get a view and
>>>> > when you get a copy.
>>>>
>>>> Right - that is what you think - but I was asking - do you agree that
>>>> it's possible that that is not the best way to teach it?
>>>>
>>>> What evidence would you give that it was the best way to teach it?
>>>>
>>>> > And I still think words can mean different things in different context
>>>> > (with a qualifier maybe)
>>>> > indexing in fortran order
>>>> > memory in fortran order
>>>>
>>>> Right - but you'd probably also accept that using the same word for
>>>> different and related things is likely to cause confusion?   I'm sure
>>>> we could come up with some experimental evidence for that if you do
>>>> doubt it.
>>>>
>>>> > Disclaimer: I never tried to teach numpy
>>>> > and with GSOC students my explanations only went a little bit
>>>> > beyond what they needed to know for the purpose at hand (I hope)
>>>> >
>>>> >>
>>>> >> You believe it is desirable, I believe that it is not - that teaching
>>>> >> numpy naturally involves some discussion of memory layout.
>>>> >>
>>>> >> As evidence:
>>>> >>
>>>> >> * My student, without any prompting about memory layouts, is asking about it
>>>> >> * Travis' numpy book has a very early section on this (section 2.3 -
>>>> >> memory layout)
>>>> >> * I often think about memory layouts, and from your discussion, you do
>>>> >> too.  It's uncommon that you don't have to teach something that
>>>> >> experienced users think about often.
>>>> >
>>>> > I'm mentioning memory layout because I'm talking to you.
>>>> > I wouldn't talk about memory layout if I would try to explain ravel,
>>>> > reshape and indexing for the first time to a student.
>>>> >
>>>> >> * The most common use of 'order' only refers to memory layout.  For
>>>> >> example np.array "order" doesn't refer to index ordering but to memory
>>>> >> layout.
>>>> >
>>>> > No, as I tried to show with the statsmodels example.
>>>> > I don't require GSOC students (that are relatively new to numpy) to understand
>>>> > much about memory layout.
>>>> > The only use of ``order`` in statsmodels refers to *index* order in
>>>> > ravel and reshape.
>>>> >
>>>> >> * The current docstring of 'reshape' cannot be explained without
>>>> >> referring to memory order.
>>>> >
>>>> > really ?
>>>> > I thought reshape only refers to *index* order for "F" and "C"
>>>>
>>>> Here's the docstring for 'reshape':
>>>>
>>>> order : {'C', 'F', 'A'}, optional
>>>>     Determines whether the array data should be viewed as in C
>>>>     (row-major) order, FORTRAN (column-major) order, or the C/FORTRAN
>>>>     order should be preserved.
>>>>
>>>> The 'A' option cannot be explained without reference to 'C' or 'F'
>>>> *memory* layout - i.e. a different meaning from the 'C' and 'F' of the
>>>> indexing interpretation.
>>>>
>>>> Actually, as a matter of interest - how would you explain the behavior
>>>> of 'A' when the array is neither 'C' nor 'F' memory layout?  Maybe that
>>>> could be a good test case?
>>>>
>>>
>>> The 'A' means C-order unless `ndarray.flags.fnc == True` (which means
>>> "fortran not C"). The detail about "not C" should not matter really for
>>> copies, for reshape it should maybe be mentioned more clearly. Though
>>> honestly, reshaping with 'A' seems so weird to me, I doubt anyone ever
>>> does it. As for ravel... you can probably just as well use 'K' instead
>>> which is even less restrictive.
>>
>> I was arguing that it is not possible to explain the docstring(s)
>> without reference to memory order - I guess you agree.
>
> I was careful to always refer to the "C" and "F" options.
>
> I've never seen a usage of "A", nor the "K" in ravel ("K" is not
> available in numpy 1.5)
> and I don't expect to run into a case where I need "A" or "K".

Right.  I am only pointing out that one cannot explain the docstring
without reference to memory order.
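
To make that concrete, here is a minimal sketch, assuming a reasonably
recent numpy.  The array names (c_arr, f_arr, strided) are made up for
illustration.  'C' and 'F' give the same answer whatever the memory
layout, but the result of 'A' depends on the layout, including the
"neither C nor F" case raised above:

```python
import numpy as np

c_arr = np.arange(6).reshape(2, 3)   # C-contiguous memory
f_arr = np.asfortranarray(c_arr)     # same values, Fortran-contiguous memory
# Neither C- nor F-contiguous: every other column of a larger array.
strided = np.arange(12).reshape(3, 4)[:, ::2]

# 'C' and 'F' are pure index orderings: memory layout does not matter.
assert np.array_equal(c_arr.ravel(order='C'), f_arr.ravel(order='C'))
assert np.array_equal(c_arr.ravel(order='F'), f_arr.ravel(order='F'))

# 'A' consults the memory layout before choosing an index ordering.
a_from_c = c_arr.ravel(order='A')
a_from_f = f_arr.ravel(order='A')
a_from_strided = strided.ravel(order='A')

print(a_from_c)        # [0 1 2 3 4 5]
print(a_from_f)        # [0 3 1 4 2 5]
print(strided.flags.c_contiguous, strided.flags.f_contiguous)  # False False
print(a_from_strided)  # [ 0  2  4  6  8 10]
```

So the same keyword value, applied to arrays that compare equal
element by element, returns different sequences, and the only way to
say why is to talk about memory.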

> My impression is that both "A" and "K" are only good for memory
> optimization, when we do *not* care (much) about the actual sequence.
> (So, in my opinion, it's mostly useless to try to figure out what the
> sequence is.)
>
> So, I would categorize a question for predicting what happens with "A" or "K"
> as a question to separate developers in the style of,
> Do you really understand the tricky parts of numpy? or
> Do you just have a working knowledge of numpy?
>
> (I just avoid certain parts of numpy because they make my head spin.
> e.g. mixing slices and fancy indexing in more than 2d ?)
>
> I'm just against taking away the easy to understand and frequently used
> (names) "F" and "C", to come back to the original question

I agree 'F' and 'C' are frequently used, but I estimate they are most
frequently used with a different meaning.

"Easy to understand" is obviously subjective, and not much use for the
discussion, hence my attempt to try and find some evidence on the
point.

'F' and 'C' are clearly not simple, in a technical sense, because they
have two different meanings.
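
A small sketch of the two meanings side by side, assuming only numpy
itself; the variable names are mine.  In np.array, order= selects the
memory layout and leaves indexing untouched.  In ravel, order= selects
the index ordering and ignores the memory layout:

```python
import numpy as np

values = [[0, 1, 2], [3, 4, 5]]

# Meaning 1: memory layout.  The arrays index identically but are
# stored differently.
mem_c = np.array(values, order='C')
mem_f = np.array(values, order='F')
assert np.array_equal(mem_c, mem_f)
assert mem_c.flags.c_contiguous and mem_f.flags.f_contiguous
assert mem_c.strides != mem_f.strides

# Meaning 2: index ordering.  The result is the same for both
# arrays, whatever their memory layout.
flat_from_c = mem_c.ravel(order='F')
flat_from_f = mem_f.ravel(order='F')
assert np.array_equal(flat_from_c, flat_from_f)
print(flat_from_c)  # [0 3 1 4 2 5]
```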

The use of C and F is of course familiar, and that gives us a bias to
believe they are easy for someone else to understand.  I was
hoping for some attempt to get past that bias, which is obviously
going to be strong.

I believe that evidence on that point is your requirement that someone
learning this stuff does not come across 'C' or 'F' in the sense of
memory layout, until they are advanced, and my earlier assertion (with
some evidence) that that is neither desirable nor practical.
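
For reference, the student's question from earlier in the thread can
be reproduced with a sketch like the following.  The sizes are
arbitrary stand-ins for a real (I, J, K, N) image, and I am assuming
the array is Fortran-ordered in memory, as ours was:

```python
import numpy as np

I, J, K, N = 2, 3, 4, 5  # arbitrary small sizes for illustration
img = np.asfortranarray(np.arange(I * J * K * N).reshape(I, J, K, N))

# Default ravel uses 'C' index ordering: the last axis runs fastest,
# so the first N values are the time series at voxel [0, 0, 0].
flat = img.ravel()
assert np.array_equal(flat[:N], img[0, 0, 0, :])

# But the memory is Fortran-ordered: the first axis has the smallest
# stride, so the data is not stored as one time series after another.
assert img.flags.f_contiguous and not img.flags.c_contiguous
assert img.strides[0] == img.itemsize

# Reading in memory order ('K') walks the first axis first.
in_memory = img.ravel(order='K')
assert np.array_equal(in_memory[:I], img[:, 0, 0, 0])
```

The ravel result says nothing about how the data is stored, which is
exactly the inference the student made.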

Cheers,

Matthew

