[Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

josef.pktd@gmai... josef.pktd@gmai...
Thu Apr 4 13:26:51 CDT 2013


On Thu, Apr 4, 2013 at 12:21 PM, Chris Barker - NOAA Federal
<chris.barker@noaa.gov> wrote:
> On Wed, Apr 3, 2013 at 6:13 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
>>> We all agree that 'order' is used with two different and orthogonal
>>> meanings in numpy.
>
> well, not entirely orthogonal -- they are the same concept, used in
> different contexts, so there is some benefit to their having
> similarity. So I'd advocate for using the same flag names in any case
> -- i.e. "C" and "F" in both cases.
>
>>> I think we are now more or less agreeing that:
>>>
>>> np.reshape(a, (3, 4), index_order='F')
>>>
>>> is at least as clear as:
>>>
>>> np.reshape(a, (3, 4), order='F')
>
> sure.
>
> The trick is:
>
> np.reshape(a, (3, 4), index_order='A')
>
> which is mingling index_order and memory order......
>
>> I believe our job here is to come to some consensus.
>
> yup.
>
>> In that spirit, I think we do agree on these statements above.
>
> with the caveats I just added...
>
>> Now we have the cost / benefit.
>>
>> Benefit : Some people may find it easier to understand numpy when
>> these constructs are separated.
>>
>> Cost : There might be some confusion because we have changed the
>> default keywords.
>>
>> Benefit
>> -----------
>>
>> What proportion of people would find it easier to understand with the
>> order constructs separated?
>
> It's not just numbers -- it's depth of confusion -- if, once you "get"
> it, you remember it for the rest of your numpy use, then it's not a big
> deal. However, if you need to re-think and test every time you
> re-visit reshape or ravel, then there's a significant benefit.

I would also add: If you need it, it's easy to find and understand, even
if it's not completely "obvious" just reading the current docstring.
("Proof": I haven't seen anyone having problems with "column-stacking"
in statsmodels.)
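
For readers who have not met "column-stacking": a minimal sketch of what
it means in terms of the order keyword. The array and variable names are
made up for illustration and are not taken from statsmodels.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # rows: [0..3], [4..7], [8..11]

# order='F' lets the first index vary fastest, i.e. it stacks the columns.
stacked = a.ravel(order='F')

# The same result built by hand, one column after the other.
by_hand = np.concatenate([a[:, j] for j in range(a.shape[1])])

# reshape with the same index order undoes the stacking.
restored = stacked.reshape(a.shape, order='F')

print(stacked.tolist())   # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```

Nothing here depends on how `a` is laid out in memory; only the index
order changes.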

>
> We are talking about "separating the concepts", but I think it takes
> more than a keyword change to do that -- the 'A' and 'K' flags mingle
> the concepts, and are going to be confusing with new keywords -- maybe
> even more so (it says index_order, but the docstring talks about
> memory order)
>
> Does anyone think we should deprecate the 'A' and 'K' flags?
>
> Before you answer that -- does anyone see a use case for the 'A' and
> 'K' flags that can't be reasonably easily accomplished with .view() or
> asarray() or ???

What order does   a[a>2]  use to create the returned 1-D array?
I didn't know, don't remember if I ever knew, and I had to try it out.
How do you find a docstring for this?
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html?highlight=order#boolean

However, I never needed to know and never cared:
a[a>2] = 5
a[a>2] = b[a>2]

Now, after this thread, I know about "K", and there might be cases
where it would be appropriate to minimize copying memory, as
Sebastian said, when (index) order doesn't matter.
(Although I'm still using an older numpy, and won't have it for a while.)
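
A small sketch of the difference, assuming a numpy recent enough to
have 'K'. The arrays are invented for illustration: boolean indexing
returns the selected elements in C (row-major) index order whatever the
memory layout, while ravel('K') follows the memory layout.

```python
import numpy as np

a = np.array([[1, 4, 2],
              [5, 3, 6]])         # C-contiguous
f = np.asfortranarray(a)          # same values, column-major memory

# Boolean indexing: C index order for both layouts.
picked_c = a[a > 2]
picked_f = f[f > 2]
print(picked_c.tolist())          # [4, 5, 3, 6]
print(picked_f.tolist())          # [4, 5, 3, 6]

# 'K' reads the elements in the order they sit in memory,
# so the two layouts give different 1-D results.
k_c = a.ravel('K')
k_f = f.ravel('K')
print(k_c.tolist())               # [1, 4, 2, 5, 3, 6]
print(k_f.tolist())               # [1, 5, 4, 3, 2, 6]
```

For assignments like a[a>2] = b[a>2] the order cancels out, which is why
it rarely needs to be known.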

>
> if we get rid of the 'A' and 'K' flags, I think the docstring
> will be more clear, and there may be less need for two names for the
> different "order" concepts (though we could change the flags and the
> keywords...)
>
>> The ravel docstring would look something like this:
>>
>> index_order : {'C','F', 'A', 'K'}, optional
>>     ...   This keyword used to be called simply 'order', and you can
>> also use the keyword 'order' to specify index_order (this parameter).
>>
>> The problem would then be that, for a while, there will be older code
>> and docs using 'order' instead of 'index_order'.  I think this would
>> not cause much trouble.  Reading the docstring will explain the
>> change.  The old code will continue to work.
>
> not a killer, I agree.

not a killer, but not worth the effort either, I still think.

As I tried to explain, order is used consistently in the documentation,
both in the introduction and in many functions, as a general concept with
two levels of application.
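
A minimal sketch of the two levels, with invented arrays: order in
np.array() chooses the memory layout, while order in ravel() chooses
the index order, and the second does not depend on the first.

```python
import numpy as np

c = np.arange(6).reshape(2, 3)    # C-contiguous memory
f = np.array(c, order='F')        # memory level: same values, F-contiguous

# The memory layouts differ ...
print(c.flags['C_CONTIGUOUS'], f.flags['C_CONTIGUOUS'])   # True False
print(c.flags['F_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])   # False True

# ... but the values, and any ravel with an explicit index order, agree.
same_values = bool((c == f).all())
ravel_c = (c.ravel(order='C').tolist(), f.ravel(order='C').tolist())
ravel_f = (c.ravel(order='F').tolist(), f.ravel(order='F').tolist())
print(ravel_c[0])                 # [0, 1, 2, 3, 4, 5]
print(ravel_f[0])                 # [0, 3, 1, 4, 2, 5]
```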

Either you have to rewrite it everywhere, or you get inconsistency.
Newbie: "Why are they talking suddenly about index_order, did I miss
something, which other orders are there?"

I think adding a section to explain order more explicitly (Sebastian above)
and improving the docstrings would be very helpful,
but changing the name of the keyword is secondary. (and will mainly
help as a reminder for users that are focused on memory, and not
on the values in their arrays.)

Josef
----------------------
>>> aa.shape
(5, 5)
>>> aa.var()
340.0
>>> np.all(aa.ravel("A") == aa.ravel("C"))
True
>>> np.all(aa.ravel("A") == aa.ravel("F"))
True
>>> np.all(aa.ravel("C") == aa.ravel("F"))
True
---------------------
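
The session above does not show how aa was built. For a square array,
all three ravels agree exactly when the array equals its transpose, so
a symmetric array reproduces the behaviour. The array below is a
hypothetical stand-in, not the original aa (its variance is different).

```python
import numpy as np

# Hypothetical symmetric 5x5 array: sym[i, j] == i + j == sym[j, i].
x = np.arange(5)
sym = np.add.outer(x, x)

is_symmetric = bool((sym == sym.T).all())

# ravel('F') of an array is ravel('C') of its transpose, so for a
# symmetric array the index order makes no difference.
a_eq_c = bool(np.all(sym.ravel("A") == sym.ravel("C")))
a_eq_f = bool(np.all(sym.ravel("A") == sym.ravel("F")))
c_eq_f = bool(np.all(sym.ravel("C") == sym.ravel("F")))

print(sym.shape, is_symmetric, a_eq_c, a_eq_f, c_eq_f)
```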

>
> -Chris
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker@noaa.gov
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

