[Numpy-discussion] C vs. Fortran order -- misleading documentation?
David Goldsmith
d.l.goldsmith@gmail....
Tue Jun 8 16:17:06 CDT 2010
On Tue, Jun 8, 2010 at 1:56 PM, Benjamin Root <ben.root@ou.edu> wrote:
>
> On Tue, Jun 8, 2010 at 1:36 PM, Eric Firing <efiring@hawaii.edu> wrote:
>
>> On 06/08/2010 08:16 AM, Eric Firing wrote:
>> > On 06/08/2010 05:50 AM, Charles R Harris wrote:
>> >>
>> >> On Tue, Jun 8, 2010 at 9:39 AM, David Goldsmith <d.l.goldsmith@gmail.com> wrote:
>> >>
>> >> On Tue, Jun 8, 2010 at 8:27 AM, Pavel Bazant <MaxPlanck@seznam.cz> wrote:
>> >>
>> >> > > Correct me if I am wrong, but the paragraph
>> >> > >
>> >> > > Note to those used to IDL or Fortran memory order as it relates to
>> >> > > indexing. Numpy uses C-order indexing. That means that the last index
>> >> > > usually (see xxx for exceptions) represents the most rapidly changing memory
>> >> > > location, unlike Fortran or IDL, where the first index represents the most
>> >> > > rapidly changing location in memory. This difference represents a great
>> >> > > potential for confusion.
>> >> > >
>> >> > > in
>> >> > >
>> >> > > http://docs.scipy.org/doc/numpy/user/basics.indexing.html
>> >> > >
>> >> > > is quite misleading, as C-order means that the last index changes rapidly,
>> >> > > not the memory location.
>> >> > >
>> >> > >
>> >> > Any index can change rapidly, depending on whether it is in an inner loop or
>> >> > not. The important distinction between C and Fortran order is how indices
>> >> > translate to memory locations. The documentation seems correct to me,
>> >> > although it might make more sense to say the last index addresses a
>> >> > contiguous range of memory. Of course, with modern processors, actual
>> >> > physical memory can be mapped all over the place.
>> >> >
>> >> > Chuck
>> >>
>> >> To me, saying that the last index represents the most rapidly changing memory
>> >> location means that if I change the last index, the memory location changes
>> >> a lot, which is not true for C-order. So for C-order, supposing one scans the
>> >> memory linearly (the desired scenario), it is the last *index* that changes
>> >> most rapidly.
>> >>
>> >> The inverted picture looks like this: For C-order, changing the first index
>> >> leads to the most rapid jump in *memory*.
>> >>
>> >> Still have the feeling the doc is very misleading on this important issue.
>> >>
>> >> Pavel
>> >>
>> >>
>> >> The distinction between your two perspectives is that one is using
>> >> for-loop traversal of indices, the other is using pointer-increment
>> >> traversal of memory; from each of your perspectives, your
>> >> conclusions are "correct," but my inclination is that the
>> >> pointer-increment traversal of memory perspective is closer to the
>> >> "spirit" of the docstring, no?
>> >>
>> >>
>> >> I think the confusion is in "most rapidly changing memory location",
>> >> which is kind of ambiguous because a change in the indices is always a
>> >> change in memory location if one hasn't used index tricks and such. So
>> >> from a time perspective it means nothing, while from a memory
>> >> perspective the largest address changes come from the leftmost indices.
>> >
>> > Exactly. Rate of change with respect to what, or as you do what?
>> >
>> > I suggest something like the following wording, if you don't mind the
>> > verbosity as a means of conjuring up an image (although putting in
>> > diagrams would make it even clearer--undoubtedly there are already good
>> > illustrations somewhere on the web):
>> >
>> > ------------
>> >
>> > Note to those used to Matlab, IDL, or Fortran memory order as it relates
>> > to indexing. Numpy uses C-order indexing by default, although a numpy
>> > array can be designated as using Fortran order. [With C-order,
>> > sequential memory locations are accessed by incrementing the last
>>
>> Maybe change "sequential" to "contiguous".
>>
>> I was thinking maybe "subsequent" might be a better word.
>
IMV, "contiguous" has more of a "physical" connotation. (And physical
contiguity just isn't guaranteed in Numpy, correct?) So I'd prefer
"subsequent" as an alternative to "sequential".
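
For anyone wanting to check the contiguity question for themselves, here is a rough sketch. The array shape, the int64 dtype, and the names `a` and `view` are arbitrary choices for illustration, not anything taken from the docs under discussion. It only shows that a freshly created array reports itself as C-contiguous, while a strided view onto the same buffer does not.

```python
import numpy as np

# A 3x4 array of 8-byte integers; numpy lays it out in C order by default.
a = np.arange(12, dtype=np.int64).reshape(3, 4)

# Take every other column. This is a view on the same buffer, not a copy.
view = a[:, ::2]

print(a.flags['C_CONTIGUOUS'])     # True
print(view.flags['C_CONTIGUOUS'])  # False
print(view.flags['F_CONTIGUOUS'])  # False
print(np.shares_memory(a, view))   # True

# Strides are the byte steps taken per index increment along each axis.
print(a.strides)     # (32, 8)
print(view.strides)  # (32, 16)
```

So "contiguous" is a property an individual array may or may not have, which may be a reason to avoid the word when describing index order in general.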
>
> In the end, we need to communicate this clearly. No matter which language,
> I have always found it difficult to get new programmers to understand the
> importance of knowing the difference between row-major and column-major. A
> "thick" paragraph isn't going to help to get the idea across to a person who
> doesn't even know that a problem exists.
>
> Maybe a car analogy would be good here...
>
> Maybe imagine city streets (where many of the streets are one-way) and
> needing to drop off mail at each address. Would it be more efficient to go
> up and back a street or to drop off mail at the first address of the street
> and then move on to the first address of the next street?
>
But the issue isn't one of efficiency; it's merely an arbitrarily chosen
convention. (Does anyone know the history of the choices for FORTRAN and C,
esp. why K&R chose the opposite of what was already in common usage in
FORTRAN? Just curious.)
Does the OP have an opinion on the various alternatives offered so far?
DG
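
To make the two conventions concrete, a minimal sketch follows. The 3x4 shape, the int64 dtype, and the helper `byte_offset` are assumptions made for illustration and not part of numpy. It builds the same-shaped array in each order and prints the byte step per index increment.

```python
import numpy as np

# Same shape and dtype, two memory layouts.
c = np.zeros((3, 4), dtype=np.int64, order='C')
f = np.zeros((3, 4), dtype=np.int64, order='F')

# Byte step for incrementing (first index, last index).
print(c.strides)  # (32, 8)
print(f.strides)  # (8, 24)

def byte_offset(arr, i, j):
    """Hypothetical helper: byte offset of element [i, j] from the start of a 2-D array."""
    return i * arr.strides[0] + j * arr.strides[1]

print(byte_offset(c, 0, 1), byte_offset(c, 1, 0))  # 8 32
print(byte_offset(f, 0, 1), byte_offset(f, 1, 0))  # 24 8
```

In C order, incrementing the last index moves to the adjacent element (8 bytes here) and incrementing the first index jumps a whole row; in Fortran order the roles are swapped.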
>
> Just my two cents...
>
> Ben Root
>
>
>>
>> > index.] For a two-dimensional array, think of it as a table. With
>> > C-order indexing the table is stored as a series of rows, so that one is
>> > reading from left to right, incrementing the column (last) index, and
>> > jumping ahead in memory to the next row by incrementing the row (first)
>> > index. With Fortran order, the table is stored as a series of columns,
>> > so one reads memory sequentially from top to bottom, incrementing the
>> > first index, and jumps ahead in memory to the next column by
>> > incrementing the last index.
>> >
>> > One more difference to be aware of: numpy, like python and C, uses
>> > zero-based indexing; Matlab and Fortran start from one (IDL is zero-based).
>> >
>> > -----------------
>> >
>> > If you want to keep it short, the key wording is in the sentence in
>> > brackets, and you can chop out the table illustration.
>> >
>> > Eric
>> >
>> >
>> >>
>> >> Chuck
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion@scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>>
>>
>
>
>
>
--
Mathematician: noun, someone who disavows certainty when their uncertainty
set is non-empty, even if that set has measure zero.
Hope: noun, that delusive spirit which escaped Pandora's jar and, with her
lies, prevents mankind from committing a general suicide. (As interpreted
by Robert Graves)