[SciPy-User] scipy.stats.nanmedian
josef.pktd@gmai...
Fri Jan 22 11:03:26 CST 2010
On Fri, Jan 22, 2010 at 11:52 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
> On Fri, Jan 22, 2010 at 8:46 AM, <josef.pktd@gmail.com> wrote:
>> On Fri, Jan 22, 2010 at 11:09 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
>>> On Thu, Jan 21, 2010 at 8:18 PM, <josef.pktd@gmail.com> wrote:
>>>> On Thu, Jan 21, 2010 at 10:01 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
>>>>> On Thu, Jan 21, 2010 at 6:41 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
>>>>>> On Jan 21, 2010, at 9:28 PM, Keith Goodman wrote:
>>>>>>> That's the only way I was able to figure out how to pull 1.0 out of
>>>>>>> np.array(1.0). Is there a better way?
>>>>>>
>>>>>>
>>>>>> .item()
>>>>>
>>>>> Thanks. item() looks better than tolist().
>>>>>
>>>>> I simplified the function:
>>>>>
>>>>> def nanmedian(x, axis=0):
>>>>>     x, axis = _chk_asarray(x,axis)
>>>>>     if x.ndim == 0:
>>>>>         return float(x.item())
>>>>>     x = x.copy()
>>>>>     x = np.apply_along_axis(_nanmedian,axis,x)
>>>>>     if x.ndim == 0:
>>>>>         x = float(x.item())
>>>>>     return x
>>>>>
>>>>> and opened a ticket:
>>>>>
>>>>> http://projects.scipy.org/scipy/ticket/1098
>>>>
>>>>
>>>> How about getting rid of apply_along_axis? see attachment
>>>>
>>>> I don't know whether or how much faster it is, but there is a ticket
>>>> that the current version is slow.
>>>> No hidden bug or corner case guarantee yet.
>>>
>>> It is faster. But here is one case it does not handle:
>>>
>>> >> nanmedian([1, 2])
>>> array([ 1.5])
>>> >> np.median([1, 2])
>>> 1.5
>>>
>>> I'm sure it could be fixed. But having to fix it, and the fact that it
>>> is a larger change, decreases the likelihood that it will make it into
>>> the next version of scipy. One option is to make the small bug fix I
>>> suggested (ticket #1098) and add the corresponding unit tests. Then we
>>> can take our time to design a better version of nanmedian.
>>
>> I didn't see the difference from np.median for this case; I think I was
>> taking the shape answer from the other thread on the return of splines
>> and interpolation.
>>
>> If I change the last 3 lines to
>>     if nanmed.size == 1:
>>         return nanmed.item()
>>     return nanmed
>>
>> then I get agreement with numpy for the following test cases
>>
>> print nanmedian(1), np.median(1)
>> print nanmedian(np.array(1)), np.median(1)
>> print nanmedian(np.array([1])), np.median(np.array([1]))
>> print nanmedian(np.array([[1]])), np.median(np.array([[1]]))
>> print nanmedian(np.array([1,2])), np.median(np.array([1,2]))
>> print nanmedian(np.array([[1,2]])), np.median(np.array([[1,2]]),axis=0)
>> print nanmedian([1]), np.median([1])
>> print nanmedian([[1]]), np.median([[1]])
>> print nanmedian([1,2]), np.median([1,2])
>> print nanmedian([[1,2]]), np.median([[1,2]],axis=0)
>> print nanmedian([1j,2]), np.median([1j,2])
>>
>> Am I still missing any cases?
>>
>> The vectorized version should be faster for this case
>> http://projects.scipy.org/scipy/ticket/740
>> but maybe not for long and narrow arrays.
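For readers without the attachment, here is a minimal sketch of one way to compute a NaN-aware median without apply_along_axis, using the size-1 `.item()` return convention described above. It is not the attached code, which is not reproduced in this message. The name `nanmedian_sketch` is hypothetical, the input is assumed to be real-valued (so the complex test case is not covered), and it assumes a NumPy recent enough to provide `np.moveaxis` and `np.take_along_axis`.

```python
import numpy as np


def nanmedian_sketch(x, axis=0):
    """Hypothetical vectorized NaN-aware median (real-valued input only)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 0:
        return float(x.item())
    # Put the reduction axis last and sort; np.sort places NaNs at the end.
    s = np.sort(np.moveaxis(x, axis, -1), axis=-1)
    # Number of non-NaN values in each slice, kept as a trailing axis.
    n = np.sum(~np.isnan(s), axis=-1, keepdims=True)
    # Positions of the lower and upper middle elements among the valid values.
    # For an all-NaN slice (n == 0) both clamp to 0, which selects a NaN.
    lo = np.maximum((n - 1) // 2, 0)
    hi = np.maximum(n // 2, 0)
    med = 0.5 * (np.take_along_axis(s, lo, axis=-1)
                 + np.take_along_axis(s, hi, axis=-1))
    med = med[..., 0]
    # Return a Python scalar when there is a single result.
    if med.size == 1:
        return med.item()
    return med


print(nanmedian_sketch([1, 2]))
print(nanmedian_sketch([[1, 2]]))
print(nanmedian_sketch([1, np.nan, 3]))
```

Sorting every slice in full does more work than a selection-based median, so whether this is faster than apply_along_axis depends on the array shape.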
>
> Here is an odd one:
>
> >> nanmedian(True)
> 1.0
> >> nanmedian([True])
> 0.5 # <--- strange
>
> >> np.median(True)
> 1.0
> >> np.median([True])
> 1.0
definitely weird
>>> (np.array(True)+np.array(True))/2.
0.5
>>> np.array([True, True]).sum()
2
>>> np.array([True, True]).mean()
1.0
I assumed that mean (which np.ma.median uses) is the same as adding and dividing by 2.
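The difference can be reproduced in isolation. The snippet below is only an illustration of the dtype behaviour shown in the session above, not code from scipy; the variable names are made up for the example.

```python
import numpy as np

a = np.array(True)
b = np.array(True)

# Adding two NumPy booleans acts as a logical OR, so the result stays boolean.
total = a + b
print(total, total.dtype)

# Dividing that boolean by 2.0 is what produces the surprising value.
odd = total / 2.0
print(odd)

# sum() and mean() accumulate booleans in a wider dtype instead.
print(np.array([True, True]).sum())
print(np.array([True, True]).mean())

# Casting to float before adding gives the value one would expect.
fixed = (a.astype(float) + b.astype(float)) / 2.0
print(fixed)
```

So "add the two middle values and divide by 2" and "take the mean of the two middle values" agree for numeric dtypes but not for bool, unless the values are cast first.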
Josef