# [SciPy-User] multivariate empirical distribution function, avoid double loop ?

josef.pktd@gmai...
Wed Aug 24 13:59:09 CDT 2011

```
On Wed, Aug 24, 2011 at 2:27 PM, Alan G Isaac <alan.isaac@gmail.com> wrote:
> On 8/24/2011 10:23 AM, josef.pktd@gmail.com wrote:
>> Does anyone know whether there is an algorithm that avoids the double
>> loop to get a multivariate empirical distribution function?
>
> I think that is pretty standard.
> I'll attach something posted a while ago.
> It seemed right at the time, but I did
> not test it.  Once upon a time it was at
> http://svn.scipy.org/svn/scipy/trunk/scipy/sandbox/dhuard/stats.py
>
> Cheers,
> Alan
>
>
> import numpy as np
>
> def empiricalcdf(data, method='Hazen'):
>     """Return the empirical cdf.
>
>     Methods available (here i goes from 1 to N)
>         Hazen:       (i-0.5)/N
>         Weibull:     i/(N+1)
>         Chegodayev:  (i-.3)/(N+.4)
>         Cunnane:     (i-.4)/(N+.2)
>         Gringorten:  (i-.44)/(N+.12)
>         California:  (i-1)/N
>
>     :author: David Huard
>     """
>     # 1-based ranks of the observations (tied values get distinct ranks in sort order)
>     i = np.argsort(np.argsort(data)) + 1.
>     nobs = len(data)
>     method = method.lower()
>     if method == 'hazen':
>         cdf = (i-0.5)/nobs
>     elif method == 'weibull':
>         cdf = i/(nobs+1.)
>     elif method == 'california':
>         cdf = (i-1.)/nobs
>     elif method == 'chegodayev':
>         cdf = (i-.3)/(nobs+.4)
>     elif method == 'cunnane':
>         cdf = (i-.4)/(nobs+.2)
>     elif method == 'gringorten':
>         cdf = (i-.44)/(nobs+.12)
>     else:
>         raise ValueError('Unknown method. Choose among Weibull, Hazen, Chegodayev, Cunnane, Gringorten and California.')
>     return cdf

Unfortunately it's 1d only, and I am working on the multivariate case, or
at least the bivariate one.

Pierre has a similar 1d version in scipy.stats.mstats, and a copy, so far
unused, is in statsmodels.

Thanks,
Josef

>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
```
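The double loop in the original question (every evaluation point against every observation) can at least be moved out of Python and into NumPy broadcasting. The following is a minimal sketch, assuming the observations are stored as a 2-D array of shape `(nobs, k)`; the function name `ecdf_mv` is hypothetical and is not part of scipy.stats or statsmodels. It computes the plain empirical CDF, F_n(x) = #{i : data[i] <= x componentwise} / nobs, not any of the plotting-position variants listed in `empiricalcdf`.

```python
import numpy as np


def ecdf_mv(data, points=None):
    """Multivariate empirical CDF without a Python-level double loop (hypothetical helper).

    data   : array of shape (nobs, k), one observation per row
    points : array of shape (npoints, k); defaults to the observations themselves

    Returns, for each row x of points, the fraction of observations that are
    <= x in every one of the k coordinates.
    """
    data = np.asarray(data, dtype=float)
    points = data if points is None else np.asarray(points, dtype=float)
    if data.ndim != 2 or points.ndim != 2 or data.shape[1] != points.shape[1]:
        raise ValueError("data and points must be 2-D with the same number of columns")
    # Broadcast to shape (npoints, nobs, k): compare every observation with every point
    below = data[np.newaxis, :, :] <= points[:, np.newaxis, :]
    # Count an observation only if it lies below the point in all k coordinates,
    # then divide by nobs via the mean over the observation axis
    return below.all(axis=2).mean(axis=1)


if __name__ == "__main__":
    obs = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
    print(ecdf_mv(obs))  # values 1/3, 2/3, 2/3
```

This removes the Python loop but not the quadratic work: it still performs npoints * nobs * k comparisons and allocates a boolean array of that size, so for large samples the evaluation points would have to be processed in chunks. For a single column without ties, evaluating at the observations gives i/N, with i the rank that `np.argsort(np.argsort(data)) + 1` produces in the quoted code.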