[SciPy-User] multivariate empirical distribution function, avoid double loop ?

josef.pktd@gmai...
Sat Aug 27 19:00:29 CDT 2011


On Wed, Aug 24, 2011 at 9:23 PM,  <josef.pktd@gmail.com> wrote:
> On Wed, Aug 24, 2011 at 7:25 PM, Robert Kern <robert.kern@gmail.com> wrote:
>> On Wed, Aug 24, 2011 at 09:23,  <josef.pktd@gmail.com> wrote:
>>> Does anyone know whether there is an algorithm that avoids the double
>>> loop to get a multivariate empirical distribution function?
>>>
>>> for point in data:
>>>     count how many points in data are smaller than or equal to point (in every coordinate)
>>>
>>> with 1d data it's just argsort(argsort(data))
>>>
>>> double loop version with some test cases is attached.
>>>
>>> I didn't see a way that sorting would help.
>>
>> If you can bear to make a few (nobs, nobs) bool arrays, you can do
>> just a kvars-sized loop in Python:
>>
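>> import numpy as np
>>
>> # dominates[i, j] ends up True if point i is larger than point j in every column
>> dominates = np.ones((len(data), len(data)), dtype=bool)
>> for x in data.T:    # loop over the kvars columns only
>>     dominates &= x[:,np.newaxis] > x
>> sorta_ranks = dominates.sum(axis=1)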
>
> Thanks, that's quite a bit better: compared to my original it is 14 times
> faster for (5000,2), 12 times faster for (10000,3), and still 2.5 times
> faster for (5000,20).
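For reference, the double loop described in the question can be written as a short runnable sketch. The function name `ecdf_counts_loop` is hypothetical and is not the code from the attached file; `data` is assumed to be a NumPy array of shape (nobs, kvars). For each row it counts how many rows are less than or equal to it in every column, and dividing by nobs gives the empirical distribution function at each observation. The last lines check the 1-D remark: without ties, the count equals `argsort(argsort(data)) + 1`, because argsort of argsort gives 0-based ranks.

```python
import numpy as np

def ecdf_counts_loop(data):
    """For each row of data, count the rows that are <= it in every column.

    Plain O(nobs**2 * kvars) double loop; data has shape (nobs, kvars).
    """
    data = np.asarray(data)
    nobs = data.shape[0]
    counts = np.zeros(nobs, dtype=int)
    for i in range(nobs):              # outer loop: the evaluation point
        for j in range(nobs):          # inner loop: every candidate point
            if np.all(data[j] <= data[i]):
                counts[i] += 1
    return counts

# 1-D check (no ties): the counts are the 0-based ranks shifted by one
x = np.array([3.0, 1.0, 4.0, 1.5, 9.0])
ranks = np.argsort(np.argsort(x))
print(ecdf_counts_loop(x[:, None]))  # [3 1 4 2 5]
print(ranks + 1)                     # [3 1 4 2 5]
```

With ties the two disagree: argsort breaks ties by position, while the count includes every tied value. In 1-D, `scipy.stats.rankdata(x, method='max')` gives the tie-aware count.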

Attached is a first draft of what I'm after.

Josef
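Robert's snippet compares with `>`, so it counts the points that are strictly smaller in every coordinate, leaving out the point itself and any point that ties in some coordinate (hence "sorta_ranks"). A minimal sketch that matches the "smaller than or equal" definition only swaps in `>=`. The function names `ecdf_counts_broadcast` and `ecdf_broadcast` are hypothetical, and `data` is assumed to be a NumPy array of shape (nobs, kvars).

```python
import numpy as np

def ecdf_counts_broadcast(data):
    """Count the rows that are <= each row in every column.

    Builds one (nobs, nobs) boolean matrix and loops only over the kvars
    columns in Python, so memory grows as nobs**2.
    """
    data = np.asarray(data)
    nobs = data.shape[0]
    # covers[i, j] stays True while point j is <= point i in every column seen so far
    covers = np.ones((nobs, nobs), dtype=bool)
    for x in data.T:                       # one pass per variable
        covers &= x[:, np.newaxis] >= x    # (nobs, 1) vs (nobs,) broadcasts to (nobs, nobs)
    return covers.sum(axis=1)              # each point counts itself

def ecdf_broadcast(data):
    """Empirical distribution function at each observation: counts / nobs."""
    data = np.asarray(data)
    return ecdf_counts_broadcast(data) / data.shape[0]

data = np.array([[0, 0], [1, 1], [1, 0], [0, 2]])
print(ecdf_counts_broadcast(data))  # [1 3 2 2]
print(ecdf_broadcast(data))
```

The Python loop runs only kvars times, but every pass works on an (nobs, nobs) boolean matrix plus a temporary of the same size. For nobs = 5000 each such array is 25,000,000 bytes (about 25 MB), and memory grows with the square of nobs.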

>
> Josef
>
>>
>> --
>> Robert Kern
>>
>> "I have come to believe that the whole world is an enigma, a harmless
>> enigma that is made terrible by our own mad attempt to interpret it as
>> though it had an underlying truth."
>>   -- Umberto Eco
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mvecdf.py
Type: text/x-python
Size: 5167 bytes
Desc: not available
Url : http://mail.scipy.org/pipermail/scipy-user/attachments/20110827/0284065b/attachment.py 

