[SciPy-User] multivariate empirical distribution function, avoid double loop ?
Sat Aug 27 19:00:29 CDT 2011
On Wed, Aug 24, 2011 at 9:23 PM, <email@example.com> wrote:
> On Wed, Aug 24, 2011 at 7:25 PM, Robert Kern <firstname.lastname@example.org> wrote:
>> On Wed, Aug 24, 2011 at 09:23, <email@example.com> wrote:
>>> Does anyone know whether there is an algorithm that avoids the double
>>> loop to get a multivariate empirical distribution function?
>>> for point in data:
>>>     count how many points in data are smaller or equal to point
>>> with 1d data it's just argsort(argsort(data))
>>> double loop version with some test cases is attached.
>>> I didn't see a way that sorting would help.
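For reference, the double loop described above can be sketched as follows. This is only a minimal illustration, not the attached script: the function name `ecdf_counts_double_loop` is made up here, and "smaller or equal" is assumed to mean smaller or equal in every coordinate.

```python
import numpy as np

def ecdf_counts_double_loop(data):
    """For each row, count the rows that are <= it in every coordinate.

    Hypothetical helper for illustration; O(nobs**2) Python-level loop.
    """
    data = np.asarray(data)
    nobs = len(data)
    counts = np.zeros(nobs, dtype=int)
    for i in range(nobs):
        for j in range(nobs):
            # row j counts towards row i only if it is <= in all columns
            if np.all(data[j] <= data[i]):
                counts[i] += 1
    return counts

# 1d sanity check: with distinct values the count equals the 0-based
# rank from argsort(argsort(x)) plus one.
x = np.array([3.0, 1.0, 2.0])
ranks = np.argsort(np.argsort(x)) + 1
counts_1d = ecdf_counts_double_loop(x[:, np.newaxis])
```

Dividing the counts by `len(data)` would give the empirical distribution function evaluated at each observation. With ties in the 1d case the argsort shortcut and the count no longer agree exactly.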
>> If you can bear to make a few (nobs, nobs) bool arrays, you can do
>> just a kvars-sized loop in Python:
>> dominates = np.ones((len(data), len(data)), dtype=bool)
>> for x in data.T:
>>     dominates &= x[:,np.newaxis] > x
>> sorta_ranks = dominates.sum(axis=1)
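The quoted loop uses a strict `>`, so it counts strictly dominated points (hence "sorta_ranks"). A self-contained variant that uses `>=` instead, to match the "smaller or equal" count asked for in the original question, might look like the sketch below. The function name `ecdf_counts_broadcast` and the small test array are assumptions added for illustration.

```python
import numpy as np

def ecdf_counts_broadcast(data):
    """For each row, count the rows that are <= it in every coordinate.

    Loops only over the kvars columns; needs (nobs, nobs) bool arrays.
    """
    data = np.asarray(data)
    nobs = data.shape[0]
    dominated = np.ones((nobs, nobs), dtype=bool)
    for x in data.T:
        # dominated[i, j] stays True only while x[j] <= x[i] in each column
        dominated &= x[:, np.newaxis] >= x
    return dominated.sum(axis=1)

data = np.array([[1, 1], [2, 2], [1, 3], [3, 1]])
counts = ecdf_counts_broadcast(data)
ecdf_values = counts / len(data)  # empirical CDF at each observation
```

Memory grows as nobs**2, so this trades the inner Python loop for a few large temporary arrays, as noted in the reply.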
> Thanks, quite a bit better: 14 times faster for (5000,2), 12 times
> faster for (10000,3), and still 2.5 times faster for (5000,20),
> compared to my original.
Attached is a first draft of what I'm after.
>> Robert Kern
>> "I have come to believe that the whole world is an enigma, a harmless
>> enigma that is made terrible by our own mad attempt to interpret it as
>> though it had an underlying truth."
>> -- Umberto Eco
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 5167 bytes
Desc: not available
Url : http://mail.scipy.org/pipermail/scipy-user/attachments/20110827/0284065b/attachment.py