[Numpy-discussion] arange(start, stop, step) and floating point (Ticket #8)

Tim Hochberg tim.hochberg at cox.net
Wed Feb 8 21:01:01 CST 2006


Sasha wrote:

>On 2/8/06, Tim Hochberg <tim.hochberg at cox.net> wrote:
>  
>
>>Sasha wrote:
>>Isn't that bad numerically? That is, isn't (n*step) much more accurate
>>than (step + step + ...)?
>>    
>>
>
>It does not matter whether n*step is more accurate than step+...+step.
> As long as arange uses an accumulating value += step loop to fill in the
>last element may exceed stop even if start + length*step does not. 
>One may argue that filling with start + i*step is more accurate, but
>that will probably be much slower (even than my O(N) algorithm).
>
>  
>
>>It also seems needlessly inefficient;
>>    
>>
>I proposed O(N) algorithm just to counter Robert's argument that it is
>not possible to ensure the invariant.  On the other hand I don't think
>it is that bad - I would expect the length computing loop to be much
>faster than the main loop that involves main memory.
>
>  
>
>>you
>>should be able to do it in at most a few steps:
>>
>>length = (stop - start)/step
>>while length * step < stop:
>>    length += 1
>>while length * step >= stop:
>>    length -= 1
>>
>>Fix errors, convert to C and enjoy. It should normally take only a few
>>tries to get the right N.
>>    
>>
>
>This will not work (even if you fix the error of missing start+ in the
>conditions :-): start + length*step < stop does not guarantee that
>start + step + ... + step < stop.
>  
>

Indeed. When I was first thinking about this, I assumed that arange was 
computed as essentially start + range(0, n)*step, not as an 
accumulation. After I actually looked at what arange did, I failed to 
update my thinking -- my mistake, sorry.
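To make the failure mode concrete, here is a small Python sketch (mine, 
not from the numpy sources; exact magnitudes depend on the step and the 
platform's rounding) showing how the two formulations can land on 
opposite sides of stop:

start, step, stop = 0.0, 0.1, 1.0

# Multiplication: one rounding per element.
by_mult = [start + i * step for i in range(11)]

# Accumulation, as the fill loop currently does: rounding error compounds.
by_acc, val = [], start
for i in range(11):
    by_acc.append(val)
    val += step

print(by_mult[-1] < stop, by_acc[-1] < stop)   # False True on my machine
print(max(abs(a - m) for a, m in zip(by_acc, by_mult)))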

>  
>
>>I see that the internals of arange use repeated adding to make the range.
>>I imagine that is why you proposed the repeated adding. I think that
>>results in error that's on the order of length ULP, while multiplying
>>would result in error on the order of 1 ULP. So perhaps we should fix
>>XXX_fill to be more accurate if nothing else.
>>
>>    
>>
>
>I don't think the accuracy of XXX_fill for fractional steps is worth improving.
>  
>
I would think that would depend on (a) how hard it is to do (I think the 
answer to that is not hard at all), (b) how much of a performance impact 
it would have (some, probably, since one is adding a multiply), and (c) 
how much one values the minor increase in accuracy versus whatever 
performance impact this might have. The change I was referring to would 
look more or less like:

static void
FLOAT_fill(float *buffer, intp length, void *ignored)
{
    intp i;
    float start = buffer[0];
    float delta = buffer[1];
    delta -= start;                /* step = buffer[1] - buffer[0] */
    /* start += delta + delta;        (accumulating version would advance
                                       past the two seed elements here) */
    buffer += 2;                   /* elements 0 and 1 are already filled */
    for (i = 2; i < length; i++, buffer++) {
        *buffer = start + i*delta; /* one multiply: error stays near 1 ulp */
        /* *buffer = start;           (accumulating version: error grows */
        /* start += delta;             with i) */
    }
}


[I will note that performance doesn't seem to have been of overriding 
concern in writing XXX_fill, since unless I'm mistaken, there's an 
integer add in the inner loop that could be optimized out (buffermax = 
buffer + length could be calculated and used as the loop boundary, and i 
could be dropped entirely) -- of course, doing that would make the 
multiplication-based solution look that much worse performance-wise.]
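If anyone wants to check the length-ULP versus 1-ULP claim empirically, 
something along these lines should do (plain Python; Fraction gives an 
exact reference for the step as actually stored):

from fractions import Fraction

start, step, n = 0.0, 0.1, 1000
exact = [Fraction(start) + i * Fraction(step) for i in range(n)]

acc, acc_err, mult_err = start, 0.0, 0.0
for i in range(n):
    acc_err = max(acc_err, abs(float(exact[i] - Fraction(acc))))
    mult_err = max(mult_err, abs(float(exact[i] - Fraction(start + i * step))))
    acc += step

# The accumulated error grows with n; the multiplied version stays
# within a rounding or two of exact.
print(acc_err, mult_err)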

>In the cases where accuracy matters, one can always use an integral
>step and multiply the result by a float.  However, if anything is done
>to that end, I would suggest generalizing the XXX_fill functions to
>allow accumulation to be performed using a different type, similarly to
>the way op.reduce and op.accumulate use their (new in numpy) dtype
>argument.
>  
>
Really? That seems unnecessarily baroque.
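For concreteness, I take the integral-step idiom Sasha means to be 
something like the following (my phrasing): it makes the length 
unambiguous and keeps each element within a single rounding of exact.

import numpy

a = numpy.arange(0.0, 1.0, 0.1)   # fractional step: length and last
                                  # element are at the mercy of rounding
b = numpy.arange(0, 10) * 0.1     # integral steps, scaled afterwards:
                                  # exactly 10 elements, one correctly
                                  # rounded multiply each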

>>>3. Change arange to ensure that arange(..., stop, ...)[-1] < stop.
>>>
>>>      
>>>
>>I see that Travis has vetoed this in any event, but perhaps we should
>>fix up the fill functions to be more accurate and maybe most of the
>>problem would just magically go away.
>>    
>>
>
>The more I think about this, the more I am convinced that using arange
>with a non-integer step is a bad idea.  Since making it illegal is not
>an option, I don't see much of a point in changing exactly how bad it
>is. Users who want fractional steps should just be educated about
>linspace.
>  
>
Are integer steps with noninteger start and stop safe?  For that matter, 
are integer steps safe for sufficiently large, floating point, but 
integral, values of start and stop?  It seems like they might well not 
be, but I haven't thought it through very well. I suppose that even if 
this were technically unsafe, in practice it would probably be pretty 
hard to get into trouble that way.
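Answering my own question a little: a pure-Python imitation of the 
accumulating fill suggests integral steps do break down once start is 
large enough that the spacing of doubles exceeds the step. A sketch 
(mine, double precision throughout):

start, step, n = 2.0 ** 53, 1.0, 4   # above 2**53 doubles skip odd integers

acc, filled = start, []
for i in range(n):
    filled.append(acc)
    acc += step              # 2**53 + 1 rounds straight back to 2**53

print(filled)                                # four identical values
print([start + i * step for i in range(n)])  # multiplication fares better
                                             # but still collapses 2**53 + 1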

Regards,

-tim
