[Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

Matthew Brett matthew.brett@gmail....
Sat Mar 30 23:12:51 CDT 2013


Hi,

On Sat, Mar 30, 2013 at 9:05 PM,  <josef.pktd@gmail.com> wrote:
> On Sat, Mar 30, 2013 at 11:43 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
>> Hi,
>>
>> On Sat, Mar 30, 2013 at 7:02 PM,  <josef.pktd@gmail.com> wrote:
>>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> On Sat, Mar 30, 2013 at 7:50 PM,  <josef.pktd@gmail.com> wrote:
>>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle
>>>>> <brad.froehle@gmail.com> wrote:
>>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett <matthew.brett@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Sat, Mar 30, 2013 at 2:20 PM,  <josef.pktd@gmail.com> wrote:
>>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM,  <josef.pktd@gmail.com> wrote:
>>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett
>>>>>>> >> <matthew.brett@gmail.com> wrote:
>>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM,  <josef.pktd@gmail.com> wrote:
>>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett
>>>>>>> >>>> <matthew.brett@gmail.com> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> Ravel and reshape use the terms 'C' and 'F' in the sense of index
>>>>>>> >>>>> ordering.
>>>>>>> >>>>>
>>>>>>> >>>>> This is very confusing.  We think the index ordering and memory
>>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should
>>>>>>> >>>>> avoid
>>>>>>> >>>>> using "C" and "F" to refer to index ordering.
>>>>>>> >>>>>
>>>>>>> >>>>> Proposal
>>>>>>> >>>>> -------------
>>>>>>> >>>>>
>>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards
>>>>>>> >>>>> index ordering for ravel, reshape
>>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling
>>>>>>> >>>>> in
>>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent
>>>>>>> >>>>> naming idea by Paul Ivanov)
>>>>>>> >>>>>
>>>>>>> >>>>> What do y'all think?
>>>>>>> >>>>
>>>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought
>>>>>>> >>>> about
>>>>>>> >>>> the content and never about the memory when using it.
>>>>>>> >>
>>>>>>> >> changing the names doesn't make it easier to understand.
>>>>>>> >> I think the confusion is because the new A and K refer to existing
>>>>>>> >> memory
>>>>>>> >>
>>>>>>>
>>>>>>> I disagree, I think it's confusing, but I have evidence, and that is
>>>>>>> that four out of four of us tested ourselves and got it wrong.
>>>>>>>
>>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's
>>>>>>> rash to assert there is no problem here.
>>>>>
>>>>> I think you are overcomplicating things or phrased it as a "trick question"
>>>>
>>>> I don't know what you mean by trick question - was there something
>>>> over-complicated in the example?  I deliberately didn't include
>>>> various much more confusing examples in "reshape".
>>>
>>> I meant making the "candidates" think about memory instead of just
>>> column versus row stacking.
>>> I don't think I ever get confused about reshape "F" in 2d.
>>> But when I work with 3d or larger ndim nd-arrays, I always have to
>>> try an example to check my intuition (in general not just reshape).
>>>
>>>>
>>>>> ravel F and C have *nothing* to do with memory layout.
>>>>
>>>> We do agree on this of course - but you said in an earlier mail that
>>>> you thought of 'C' and 'F' as referring to target memory layout (which
>>>> they don't in this case) so I think we also agree that "C" and "F" do
>>>> often refer to memory layout elsewhere in numpy.
>>>
>>> I guess that wasn't so helpful.
>>> (emphasis on *target*. There are very few places where an order
>>> keyword refers to *existing* memory layout.)
>>
>> It is helpful because it shows how easy it is to get confused between
>> memory order and index order.
>>
>>> What's reverse index order?
>>
>> I am not being clear, sorry about that:
>>
>> import numpy as np
>>
>> def ravel_iter_last_fastest(arr):
>>     res = []
>>     for i in range(arr.shape[0]):
>>         for j in range(arr.shape[1]):
>>             for k in range(arr.shape[2]):
>>                 # Iterating over last dimension fastest
>>                 res.append(arr[i, j, k])
>>     return np.array(res)
>>
>>
>> def ravel_iter_first_fastest(arr):
>>     res = []
>>     for k in range(arr.shape[2]):
>>         for j in range(arr.shape[1]):
>>             for i in range(arr.shape[0]):
>>                 # Iterating over first dimension fastest
>>                 res.append(arr[i, j, k])
>>     return np.array(res)
>
> good example
>
> that's just C and F order in the terminology of numpy
> http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#controlling-iteration-order
> (independent of memory)
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.flatiter.html#numpy.flatiter
>
> I don't think we want to rename a large part of the basic terminology of numpy

Sometimes two ideas get conflated, and it seems natural to keep them
together, until people get confused and you realize that they are
two separate ideas.
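
As a minimal sketch of the two ideas kept apart (the array and the
variable names here are made up for illustration): the same values
stored in two different memory layouts ravel to the same result for a
given `order`, because `order` in `ravel` selects index order, not
memory order.

```python
import numpy as np

# Same values, two memory layouts (illustrative array only).
arr_c = np.arange(24).reshape(2, 3, 4)   # C-contiguous in memory
arr_f = np.asfortranarray(arr_c)         # Fortran-contiguous in memory

# order='C': last index changes fastest, whatever the memory layout.
last_fastest_c = np.ravel(arr_c, order='C')
last_fastest_f = np.ravel(arr_f, order='C')

# order='F': first index changes fastest, whatever the memory layout.
first_fastest_c = np.ravel(arr_c, order='F')
first_fastest_f = np.ravel(arr_f, order='F')

# The memory layout of the input makes no difference to the result.
assert np.array_equal(last_fastest_c, last_fastest_f)
assert np.array_equal(first_fastest_c, first_fastest_f)
```

The only thing that differs between `arr_c` and `arr_f` is how the
bytes are laid out; the raveled values depend on `order` alone.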

For example here's a quote from the 'flatiter' doc :

    Iteration is done in C-contiguous style

Now - that seems really ugly to me.  'Contiguous' should not be in
that sentence, although it's easy to see why it is, and it seems to
me to be a sign of the confusion between the two ideas.
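
To make that concrete, here is a small sketch (the arrays are invented
for illustration): `.flat` visits elements with the last index
changing fastest even when the array is not C-contiguous in memory, so
contiguity is not what determines the iteration order.

```python
import numpy as np

# Illustrative 2 x 3 array in two memory layouts.
arr_c = np.arange(6).reshape(2, 3)    # C-contiguous in memory
arr_f = np.asfortranarray(arr_c)      # same values, Fortran-contiguous

# .flat walks the indices with the last axis varying fastest,
# independent of how the data sit in memory.
flat_c = [int(v) for v in arr_c.flat]
flat_f = [int(v) for v in arr_f.flat]

assert flat_c == flat_f
assert not arr_f.flags['C_CONTIGUOUS']
```

Both lists come out in the same index order, although only one of the
two arrays is C-contiguous.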

Cheers,

Matthew

