[Numpy-discussion] [ANN] Nanny, faster NaN functions

Keith Goodman kwgoodman@gmail....
Sun Nov 21 17:37:26 CST 2010


On Sun, Nov 21, 2010 at 3:16 PM, Wes McKinney <wesmckinn@gmail.com> wrote:

> What would you say to a single package that contains:
>
> - NaN-aware NumPy and SciPy functions (nanmean, nanmin, etc.)

I'd say yes.
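For readers new to the term, a NaN-aware reduction treats NaN as a missing value instead of letting it propagate through the result. The sketch below shows the semantics of a nanmean using plain NumPy masking. `nanmean_sketch` is a hypothetical name for illustration only. It is not Nanny's actual implementation or API.

```python
import numpy as np

def nanmean_sketch(arr, axis=None):
    """Mean that ignores NaNs (hypothetical helper, not Nanny's API)."""
    arr = np.asarray(arr, dtype=float)
    mask = np.isnan(arr)
    # Replace NaNs with 0 so they do not contribute to the sum.
    total = np.where(mask, 0.0, arr).sum(axis=axis)
    # Count only the non-NaN entries along the reduced axis.
    count = (~mask).sum(axis=axis)
    # All-NaN slices give 0/0; silence the warning and let them come out as NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / count
```

For example, `nanmean_sketch([[1.0, np.nan], [3.0, 5.0]], axis=0)` averages each column over its non-NaN entries only.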

> - moving window functions (moving_{count, sum, mean, var, std, etc.})

Yes.

BTW, I think we both do arr = arr.astype(float) before computing the
moving statistics. Since that already gives us a copy, I sped things up
by running the moving window backwards and writing the result in place.
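One way to read the backwards trick is sketched below for a moving sum, under stated assumptions. Because `astype(float)` returns a new array, the function owns that buffer and can overwrite it. Output element i depends only on inputs at indices i - window + 1 through i. Walking from the end toward the start therefore guarantees that every input still needed sits at a lower index that has not been overwritten yet. `move_sum_inplace` is a hypothetical name. The sketch assumes a 1-D input without NaNs and uses a plain Python loop to show the algorithm, not to be fast.

```python
import numpy as np

def move_sum_inplace(arr, window):
    """Moving-window sum of a 1-D array (hypothetical helper, not Nanny's API).

    Result i is arr[i - window + 1] + ... + arr[i] for i >= window - 1;
    the first window - 1 entries are NaN. Assumes arr contains no NaNs.
    """
    arr = arr.astype(float)  # astype returns a copy, so overwriting it is safe
    n = arr.shape[0]
    if not 1 <= window <= n:
        raise ValueError("window must be between 1 and len(arr)")
    # Sum of the last full window, the one ending at index n - 1.
    total = arr[n - window:].sum()
    # Walk backwards. Step i overwrites arr[i], but the only input still
    # needed, arr[i - window], sits at a lower index and is untouched.
    for i in range(n - 1, window - 1, -1):
        old = arr[i]                     # save the input before overwriting it
        arr[i] = total                   # result for the window ending at i
        total += arr[i - window] - old   # slide the window one step left
    arr[window - 1] = total              # first full window
    arr[: window - 1] = np.nan           # not enough data for a full window yet
    return arr
```

For example, `move_sum_inplace(np.array([1.0, 2, 3, 4, 5]), 3)` returns `[nan, nan, 6., 9., 12.]`. Keeping a running total makes each step O(1) regardless of window size. The trade-off is that rounding error can accumulate over long float arrays.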

> - core subroutines for labeled data

Not sure what this would be. Let's discuss.

> - group-by functions

Yes. I have some ideas on function signatures.

> - other things to add to this list?

A no-op function with a really long doc string!

> In other words, basic computational building blocks for writing
> libraries like larry, pandas, etc., and for doing time series, statistical,
> and other manipulations on real-world (messy) data sets. The focus isn't
> so much "NaN-awareness" per se as practical "data wrangling". I
> would be happy to work on such a package and to move all the Cython
> code I've written into it. There's potentially a little overlap with
> datarray, but I think that's OK.

Maybe we should make a list of function signatures along with brief
doc strings to get a feel for what we (and hopefully others) have in
mind?

Where should we continue the discussion? The pystatsmodels mailing
list? By now the numpy list probably thinks of NaN as "Not ANother"
email from this guy.

