[SciPy-User] multivariate empirical distribution function, avoid double loop ?
Wed Aug 24 20:23:09 CDT 2011
On Wed, Aug 24, 2011 at 7:25 PM, Robert Kern <firstname.lastname@example.org> wrote:
> On Wed, Aug 24, 2011 at 09:23, <email@example.com> wrote:
>> Does anyone know whether there is an algorithm that avoids the double
>> loop to get a multivariate empirical distribution function?
>> for point in data:
>>     count how many points in data are smaller or equal to point
>> with 1d data it's just argsort(argsort(data))
>> double loop version with some test cases is attached.
>> I didn't see a way that sorting would help.
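The attachment is not reproduced in this archive. As a rough idea of what the double loop computes, here is a minimal sketch, assuming `data` has shape (nobs, kvars) and that "smaller or equal" means smaller or equal in every coordinate. The function name `mv_ecdf_counts_loop` is made up for illustration and is not the attached code.

```python
import numpy as np

def mv_ecdf_counts_loop(data):
    """Count, for each row, the rows that are <= it in every coordinate.

    Hypothetical baseline, O(nobs**2) Python-level iterations.
    """
    data = np.asarray(data)
    nobs = data.shape[0]
    counts = np.zeros(nobs, dtype=int)
    for i in range(nobs):          # outer loop: the reference point
        for j in range(nobs):      # inner loop: every candidate point
            if np.all(data[j] <= data[i]):
                counts[i] += 1     # the point itself is counted too
    return counts

data = np.array([[1, 1], [2, 2], [0, 3]])
counts = mv_ecdf_counts_loop(data)
```

Dividing the counts by nobs would give the empirical distribution function evaluated at each data point. For 1d data without ties, the counts should agree with `argsort(argsort(data)) + 1`, since the double argsort yields zero-based ranks.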
> If you can bear to make a few (nobs, nobs) bool arrays, you can do
> just a kvars-sized loop in Python:
> dominates = np.ones((len(data), len(data)), dtype=bool)
> for x in data.T:
>     dominates &= x[:,np.newaxis] > x
> sorta_ranks = dominates.sum(axis=1)
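The snippet above uses a strict `>`, so each count covers only the points that are strictly smaller in every coordinate and excludes the point itself. A possible way to wrap it, assuming `data` has shape (nobs, kvars), is sketched below. The function name `mv_ecdf_counts` and the `strict` flag are made up for illustration. With `strict=False` the comparison becomes `>=`, which matches the "smaller or equal" wording of the question.

```python
import numpy as np

def mv_ecdf_counts(data, strict=False):
    """Count, per row, the rows it dominates in every coordinate.

    Hypothetical wrapper around the kvars-sized loop shown above.
    Needs O(nobs**2) memory for the boolean arrays.
    """
    data = np.asarray(data)
    nobs = data.shape[0]
    compare = np.greater if strict else np.greater_equal
    dominates = np.ones((nobs, nobs), dtype=bool)
    for x in data.T:               # one pass per variable (column)
        # entry [i, j] stays True only if x[i] compares above x[j]
        # in this column and in all previous columns
        dominates &= compare(x[:, np.newaxis], x)
    return dominates.sum(axis=1)

data = np.array([[1, 1], [2, 2], [0, 3]])
counts_le = mv_ecdf_counts(data)                # "smaller or equal" counts
counts_lt = mv_ecdf_counts(data, strict=True)   # strict version, as in the reply
```

The trade-off is memory: the (nobs, nobs) boolean array takes about 100 MB for nobs = 10000, plus a temporary of the same size for each comparison.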
Thanks, that is quite a bit better: compared to my original, it is
14 times faster for (5000,2), 12 times faster for (10000,3), and
still 2.5 times faster for (5000,20).
> Robert Kern
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
> -- Umberto Eco