[Numpy-discussion] C vs. Fortran order -- misleading documentation?

David Goldsmith d.l.goldsmith@gmail....
Tue Jun 8 16:17:06 CDT 2010


On Tue, Jun 8, 2010 at 1:56 PM, Benjamin Root <ben.root@ou.edu> wrote:

>
> On Tue, Jun 8, 2010 at 1:36 PM, Eric Firing <efiring@hawaii.edu> wrote:
>
>> On 06/08/2010 08:16 AM, Eric Firing wrote:
>> > On 06/08/2010 05:50 AM, Charles R Harris wrote:
>> >>
>> >> On Tue, Jun 8, 2010 at 9:39 AM, David Goldsmith<
>> d.l.goldsmith@gmail.com
>> >> <mailto:d.l.goldsmith@gmail.com>>  wrote:
>> >>
>> >>      On Tue, Jun 8, 2010 at 8:27 AM, Pavel Bazant<MaxPlanck@seznam.cz
>> >>      <mailto:MaxPlanck@seznam.cz>>  wrote:
>> >>
>> >>           >  >  Correct me if I am wrong, but the paragraph
>> >>           >  >
>> >>           >  >  Note to those used to IDL or Fortran memory order as it
>> >>          relates to
>> >>           >  >  indexing. Numpy uses C-order indexing. That means that
>> the
>> >>          last index
>> >>           >  >  usually (see xxx for exceptions) represents the most
>> >>          rapidly changing memory
>> >>           >  >  location, unlike Fortran or IDL, where the first index
>> >>          represents the most
>> >>           >  >  rapidly changing location in memory. This difference
>> >>          represents a great
>> >>           >  >  potential for confusion.
>> >>           >  >
>> >>           >  >  in
>> >>           >  >
>> >>           >  >
>> http://docs.scipy.org/doc/numpy/user/basics.indexing.html
>> >>           >  >
>> >>           >  >  is quite misleading, as C-order means that the last
>> index
>> >>          changes rapidly,
>> >>           >  >  not the
>> >>           >  >  memory location.
>> >>           >  >
>> >>           >  >
>> >>           >  Any index can change rapidly, depending on whether is in
>> an
>> >>          inner loop or
>> >>           >  not. The important distinction between C and Fortran order
>> is
>> >>          how indices
>> >>           >  translate to memory locations. The documentation seems
>> >>          correct to me,
>> >>           >  although it might make more sense to say the last index
>> >>          addresses a
>> >>           >  contiguous range of memory. Of course, with modern
>> >>          processors, actual
>> >>           >  physical memory can be mapped all over the place.
>> >>           >
>> >>           >  Chuck
>> >>
>> >>          To me, saying that the last index represents the most rapidly
>> >>          changing memory
>> >>          location means that if I change the last index, the memory
>> >>          location changes
>> >>          a lot, which is not true for C-order. So for C-order, supposed
>> >>          one scans the memory
>> >>          linearly (the desired scenario),  it is the last *index* that
>> >>          changes most rapidly.
>> >>
>> >>          The inverted picture looks like this: For C-order,  changing
>> the
>> >>          first index
>> >>          leads to the most rapid jump in *memory*.
>> >>
>> >>          Still have the feeling the doc is very misleading at this
>> >>          important issue.
>> >>
>> >>          Pavel
>> >>
>> >>
>> >>      The distinction between your two perspectives is that one is using
>> >>      for-loop traversal of indices, the other is using
>> pointer-increment
>> >>      traversal of memory; from each of your perspectives, your
>> >>      conclusions are "correct," but my inclination is that the
>> >>      pointer-increment traversal of memory perspective is closer to the
>> >>      "spirit" of the docstring, no?
>> >>
>> >>
>> >> I think the confusion is in "most rapidly changing memory location",
>> >> which is kind of ambiguous because a change in the indices is always a
>> >> change in memory location if one hasn't used index tricks and such. So
>> >> from a time perspective it means nothing, while from a memory
>> >> perspective the largest address changes come from the leftmost indices.
>> >
>> > Exactly.  Rate of change with respect to what, or as you do what?
>> >
>> > I suggest something like the following wording, if you don't mind the
>> > verbosity as a means of conjuring up an image (although putting in
>> > diagrams would make it even clearer--undoubtedly there are already good
>> > illustrations somewhere on the web):
>> >
>> > ------------
>> >
>> > Note to those used to Matlab, IDL, or Fortran memory order as it relates
>> > to indexing. Numpy uses C-order indexing by default, although a numpy
>> > array can be designated as using Fortran order. [With C-order,
>> > sequential memory locations are accessed by incrementing the last
>>
>> Maybe change "sequential" to "contiguous".
>>
>> I was thinking maybe "subsequent" might be a better word.
>

IMV, contiguous has more of a "physical" connotation.  (That just isn't
valid in Numpy, correct?)  So I'd prefer subsequent as an alternative to
sequential.

>
> In the end, we need to communicate this clearly.  No matter which language,
> I have always found it difficult to get new programmers to understand the
> importance of knowing the difference between row-major and column-major.  A
> "thick" paragraph isn't going to help to get the idea across to a person who
> doesn't even know that a problem exists.
>
> Maybe a car analogy would be good here...
>
> Maybe if one imagine city streets (where many of the streets are one-way),
> and need to drop off mail at each address.  Would it be more efficient to go
> up and back a street or to drop off mail at the first address of the street
> and then move on to the first address of the next street?
>

But the issue isn't one of efficiency, it's merely an arbitrarily chosen
convention.  (Does anyone know the history of the choices for FORTRAN and C,
esp. why K&R chose the opposite of what was already in common usage in
FORTRAN?  Just curious?)

Does the OP have an opinion on the various alternatives offered so far?

DG

>
> Just my two cents...
>
> Ben Root
>
>
>>
>> > index.]  For a two-dimensional array, think if it as a table.  With
>> > C-order indexing the table is stored as a series of rows, so that one is
>> > reading from left to right, incrementing the column (last) index, and
>> > jumping ahead in memory to the next row by incrementing the row (first)
>> > index. With Fortran order, the table is stored as a series of columns,
>> > so one reads memory sequentially from top to bottom, incrementing the
>> > first index, and jumps ahead in memory to the next column by
>> > incrementing the last index.
>> >
>> > One more difference to be aware of: numpy, like python and C, uses
>> > zero-based indexing; Matlab, [IDL???], and Fortran start from one.
>> >
>> > -----------------
>> >
>> > If you want to keep it short, the key wording is in the sentence in
>> > brackets, and you can chop out the table illustration.
>> >
>> > Eric
>> >
>> >
>> >>
>> >> Chuck
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion@scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
Mathematician: noun, someone who disavows certainty when their uncertainty
set is non-empty, even if that set has measure zero.

Hope: noun, that delusive spirit which escaped Pandora's jar and, with her
lies, prevents mankind from committing a general suicide.  (As interpreted
by Robert Graves)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20100608/9c65b5b3/attachment.html 


More information about the NumPy-Discussion mailing list