[Numpy-discussion] summing over more than one axis

Bruce Southey bsouthey@gmail....
Thu Aug 19 16:53:00 CDT 2010


On 08/19/2010 04:20 PM, josef.pktd@gmail.com wrote:
> On Thu, Aug 19, 2010 at 4:03 PM, John Salvatier
> <jsalvati@u.washington.edu>  wrote:
>> Precise in what sense? Numerical accuracy? If so, why is that?
> I don't remember where I ran into this example, maybe integer
> underflow (?) with addition.
> The NIST ANOVA test cases have some nasty, badly scaled variables,
> but I have trouble creating such an example myself; the differences
> show up only in the 10th or higher digit.
>
>>>> a = 1000000*np.random.randn(10000,1000)
>>>> a.sum()
> -820034796.05545747
>>>> np.sort(a.ravel())[::-1].sum()
> -820034795.87886333
>>>> np.sort(a.ravel()).sum()
> -820034795.88172638
>>>> np.sort(a,0)[::-1].sum()
> -820034795.82333243
>>>> np.sort(a,1)[::-1].sum()
> -820034796.05559027
>>>> a.sum(-1).sum(-1)
> -820034796.05551744
>>>> np.sort(a,1)[::-1].sum(-1).sum(-1)
> -820034796.05543578
>>>> np.sort(a,0)[::-1].sum(-1).sum(-1)
> -820034796.05590343
>>>> np.sort(a,1).sum(-1).sum(-1)
> -820034796.05544424
>>>> am = a.mean()
>>>> am*a.size + np.sort(a-am,1).sum(-1).sum(-1)
> -820034796.05554879
>>>> a.size * np.sort(a,1).mean(-1).mean(-1)
> -820034796.05544722
>
> Badly scaled or badly sorted arrays don't add up well, but with
> randomly generated 10000x1000 examples I can't get errors worse
> than the 10th or 11th digit.
>
> Josef
>
>
>
>> On Thu, Aug 19, 2010 at 12:13 PM,<josef.pktd@gmail.com>  wrote:
>>> On Thu, Aug 19, 2010 at 11:29 AM, Joe Harrington<jh@physics.ucf.edu>
>>> wrote:
>>>> On Thu, 19 Aug 2010 09:06:32 -0500, Gökhan Sever<gokhansever@gmail.com>
>>>> wrote:
>>>>
>>>>> On Thu, Aug 19, 2010 at 9:01 AM, greg whittier<gregwh@gmail.com>  wrote:
>>>>>
>>>>>> I frequently deal with 3D data and would like to sum (or find the
>>>>>> mean, etc.) over the last two axes.  I.e. sum a[i,j,k] over j and k.
>>>>>> I find using .sum() really convenient for 2d arrays but end up
>>>>>> reshaping 2d arrays to do this.  I know there has to be a more
>>>>>> convenient way.  Here's what I'm doing
>>>>>>
>>>>>> a = np.arange(27).reshape(3,3,3)
>>>>>>
>>>>>> # sum over axis 1 and 2
>>>>>> result = a.reshape((a.shape[0], a.shape[1]*a.shape[2])).sum(axis=1)
>>>>>>
>>>>>> Is there a cleaner way to do this?  I'm sure I'm missing something
>>>>>> obvious.
>>>>>>
>>>>>> Thanks,
>>>>>> Greg
>>>>>>
>>>>> Using two sums
>>>>>
>>>>> np.sum(np.sum(a, axis=-2), axis=1)
>>>> Be careful.  This works for sums, but not for operations like median;
>>>> the median of the row medians may not be the global median.  So, you
>>>> need to do the medians in one step.  I'm not aware of a method cleaner
>>>> than manually reshaping first.  There may also be speed reasons to do
>>>> things in one step.  But, two steps may look cleaner in code.
>>> I think two .sum() calls are the most accurate, if precision matters;
>>> one big summation is often not very precise.
>>>
>>> Josef
>>>
>>>
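The options raised so far can be compared side by side in a short sketch. It assumes a NumPy newer than this 2010 thread for the axis=(1, 2) form (tuple axes for sum arrived in NumPy 1.7); the 2x3 array b is a made-up example for Joe's median caveat.

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)

# Greg's approach: flatten the last two axes into one, then sum once.
by_reshape = a.reshape(a.shape[0], -1).sum(axis=1)

# Two successive sums, each over the (current) last axis.
by_two_sums = a.sum(axis=-1).sum(axis=-1)

# NumPy 1.7+ (newer than this thread): reduce over several axes at once.
by_tuple = a.sum(axis=(1, 2))

# Medians do not decompose: the median of row medians is not, in general,
# the median of all the values. `b` is a made-up example.
b = np.array([[1, 2, 9],
              [3, 4, 5]])
median_of_medians = np.median(np.median(b, axis=1))  # median of [2, 4] -> 3.0
global_median = np.median(b)                          # median of all six -> 3.5

# A per-slice median therefore needs the one-step (reshape) approach.
slice_medians = np.median(a.reshape(a.shape[0], -1), axis=1)
```

On older NumPy versions, the reshape and two-sum forms are the ones available.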
You can use the dtype option, available in many functions such as sum, to
accumulate in a higher-precision dtype than the input dtype. It also helps
with overflow, for example when summing integers, because you don't have to
convert the input array to a wider dtype first. However, the benefit depends
very much on your platform: notably, Windows builds don't support the widest
float dtypes, so float128 is not going to help over float64 there.
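A minimal sketch of the dtype option in practice, assuming a current NumPy. The arrays are made-up examples; the float64 accumulator is requested through sum's dtype argument rather than by copying the input with astype(), and the long double check only reports whether np.longdouble is wider than float64 on the machine running it.

```python
import numpy as np

# float32 input; dtype= selects a float64 accumulator without an explicit
# astype() copy of the whole array.
x = np.full(10**6, 0.1, dtype=np.float32)
sum_f32 = x.sum()                  # accumulator is float32 (the input dtype)
sum_f64 = x.sum(dtype=np.float64)  # accumulator is float64; result is float64

# Integers: ask for a 64-bit accumulator directly instead of converting first.
counts = np.array([120, 110, 100], dtype=np.int8)
total = counts.sum(dtype=np.int64)  # 330, which does not fit in int8

# Whether long double is wider than float64 is platform dependent
# (on Windows builds it is typically the same as float64).
wider_than_f64 = np.finfo(np.longdouble).eps < np.finfo(np.float64).eps
```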

Alternatively, use another approach that avoids loss of precision, such as
Python's math.fsum():
http://docs.python.org/library/math.html
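A quick comparison, using the same ten-copies-of-0.1 case the Python documentation uses for math.fsum: plain sum() returns 0.9999999999999999 while math.fsum() returns 1.0. The second half is a sketch, assuming the 3-D layout from Greg's question and an arbitrary seeded random array, of applying fsum per leading-axis slice.

```python
import math

import numpy as np

values = [0.1] * 10
naive = sum(values)          # repeated float additions accumulate rounding error
exact = math.fsum(values)    # tracks exact partial sums; result is correctly rounded

# fsum accepts any iterable of floats, including a 1-D NumPy array, so a
# per-slice version of the multi-axis sum is one list comprehension.
a = np.random.default_rng(0).standard_normal((4, 100, 50)) * 1e6
per_slice = np.array([math.fsum(s) for s in a.reshape(a.shape[0], -1)])
```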

Or Recipe 393090: Binary floating point summation accurate to full 
precision:
http://code.activestate.com/recipes/393090/
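The idea behind exact summation of this kind can be sketched with Shewchuk-style partial sums, which, as far as I know, is the approach recipe 393090 and math.fsum are built on. This is an independent sketch, not the recipe's code; the name msum is hypothetical, and because the final step simply adds the partials, it may differ from math.fsum in rare halfway rounding cases.

```python
def msum(values):
    """Sum floats by keeping a list of non-overlapping partial sums."""
    # Non-overlapping floats, increasing in magnitude, whose exact sum
    # equals the exact sum of the inputs seen so far.
    partials = []
    for x in values:
        i = 0
        for y in partials:
            if abs(x) < abs(y):
                x, y = y, x        # ensure |x| >= |y| so lo below is exact
            hi = x + y             # rounded sum
            lo = y - (hi - x)      # exact rounding error of that sum
            if lo:
                partials[i] = lo   # keep only nonzero error terms
                i += 1
            x = hi
        partials[i:] = [x]
    return sum(partials, 0.0)      # final rounding: add partials smallest first
```

For example, msum([1e100, 1.0, -1e100]) keeps the 1.0 that a plain left-to-right sum() loses.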

Or Recipe 298339: More accurate sum (Python)
http://code.activestate.com/recipes/298339/
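Recipe 298339's exact method isn't described here. As one widely used illustration of a compensated approach (a swapped-in technique, not necessarily what that recipe does), Kahan summation carries the rounding error of each addition into the next one. The function kahan_sum and the input values are hypothetical.

```python
import math

def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats."""
    total = 0.0
    compensation = 0.0  # rounding error made by the previous addition
    for x in values:
        y = x - compensation            # re-inject what was lost last time
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover the error of this addition
        total = t
    return total

values = [1.0] + [1e-16] * 10
naive = sum(values)          # each 1e-16 is below half an ulp of 1.0, so lost
compensated = kahan_sum(values)
reference = math.fsum(values)
```

Kahan summation is not exact like the partials method, but it needs only one extra float of state.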

These are probably more accurate than sorting the data from low to high
and then summing in that order.
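A small constructed example (values chosen for illustration, not taken from the thread) of why sorting alone may not be enough: with heavy cancellation, neither ascending order nor ordering by magnitude recovers the 1.0, while math.fsum does.

```python
import math

values = [1e16, 1.0, -1e16]
ascending = sum(sorted(values))              # -1e16, 1.0, 1e16 -> 0.0
by_magnitude = sum(sorted(values, key=abs))  # 1.0, 1e16, -1e16 -> 0.0
exact = math.fsum(values)                    # 1.0
```

In both sorted orders the 1.0 is absorbed by a 1e16 term before the cancellation happens.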

Bruce