[SciPy-User] Proposal for a new data analysis toolbox
Mon Nov 22 09:58:54 CST 2010
On Mon, Nov 22, 2010 at 11:52 PM, <firstname.lastname@example.org> wrote:
> On Mon, Nov 22, 2010 at 10:35 AM, Keith Goodman <email@example.com> wrote:
> > This thread started on the numpy list:
> > I think we should narrow the focus of the package by only including
> > functions that operate on numpy arrays. That would cut out date
> > utilities, label indexing utilities, and binary operations with
> > various join methods on the labels. It would leave us with three
> > categories: faster versions of numpy/scipy nan functions, moving
> > window statistics, and group functions.
> > I suggest we add a fourth category: normalization.
> > FASTER NUMPY/SCIPY NAN FUNCTIONS
> > This work is already underway: http://github.com/kwgoodman/nanny
> > The function signatures for these are easy: we copy numpy, scipy. (I
> > am tempted to change nanstd from scipy's bias=False to ddof=0.)
> scipy.stats.nanstd is supposed to switch to ddof, so don't copy
> inconsistent signatures that are supposed to be deprecated.
I added a patch for nanstd to make this switch to
http://projects.scipy.org/scipy/ticket/1200 just yesterday. Unfortunately
this cannot be done in a backwards-compatible way. So it would be helpful
to deprecate the current signature in 0.9.0 if this change is to be made.
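
To make the bias/ddof correspondence concrete, here is a minimal sketch. It assumes np.nanstd, which only exists in later NumPy releases and is not the scipy.stats.nanstd under discussion. The old bias=True corresponds to ddof=0 (divide by n) and bias=False to ddof=1 (divide by n - 1), where n counts only the non-NaN elements.

```python
import numpy as np

arr = np.array([1.0, 2.0, np.nan, 4.0])

# ddof=0 divides by n (the old bias=True); ddof=1 divides by n - 1
# (the old bias=False). NaNs are skipped, so n is the non-NaN count.
biased = np.nanstd(arr, ddof=0)
unbiased = np.nanstd(arr, ddof=1)

# Cross-check against a plain std on the NaN-free values.
valid = arr[~np.isnan(arr)]
print(biased, valid.std(ddof=0))
print(unbiased, valid.std(ddof=1))
```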
> I would like statistics (scipy.stats and statsmodels) to stick with
> default axis=0.
> I would be in favor of axis=None for nan extended versions of numpy
> functions and axis=0 for stats functions as defaults, but since it
> will be a standalone package with wider usage, I will be able to keep
> track of axis=-1.
> > I'd like to use a partial sort for nanmedian. Anyone interested in coding
> > dtype: int32, int64, float64 for now
> > ndim: 1, 2, 3 (need some recursive magic for nd > 3; that's an open
> > project for anyone)
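
For illustration, a partial-sort nanmedian could look roughly like the sketch below. It is 1d only and leans on np.partition from later NumPy releases rather than the Cython code proposed above; the name nanmedian_1d is hypothetical.

```python
import numpy as np

def nanmedian_1d(arr):
    """Median of a 1d array, ignoring NaNs, using a partial sort."""
    a = np.asarray(arr, dtype=np.float64)
    a = a[~np.isnan(a)]          # boolean indexing returns a copy
    n = a.size
    if n == 0:
        return np.nan            # all-NaN (or empty) input
    k = n // 2
    if n % 2 == 1:
        # Odd length: only the k-th smallest element is needed.
        return np.partition(a, k)[k]
    # Even length: the two middle elements must be in sorted position.
    part = np.partition(a, [k - 1, k])
    return 0.5 * (part[k - 1] + part[k])
```

The partial sort only guarantees that the selected positions hold the values a full sort would put there, which is all a median needs.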
> > MOVING WINDOW STATISTICS
> > I already have doc strings and unit tests
> > (https://github.com/kwgoodman/la/blob/master/la/farray/mov.py). And I
> > have a cython prototype that moves the window backwards so that the
> > stats can be filled in place. (This assumes we make a copy of the data
> > at the top of the function: arr = arr.astype(float))
> > Proposed function signature: mov_sum(arr, window, axis=-1),
> > mov_nansum(arr, window, axis=-1)
> > If you don't like mov, then: move? roll?
> > I think requesting a minimum number of non-nan elements in a window or
> > else returning NaN is clever. But I do like the simple signature
> > above.
> > Binary moving window functions: mov_nancorr(arr1, arr2, window, axis=-1),
> > Optional: moving window bootstrap estimate of error (std) of the
> > moving statistic. So, what's the std of each estimate in the
> > mov_median output? Too specialized?
> > dtype: float64
> > ndim: 1, 2, 3, recursive for nd > 3
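
A pure-NumPy sketch of the proposed mov_sum/mov_nansum signatures, for reference. It uses a cumulative-sum trick, not the backwards-moving Cython prototype described above. Filling the first window - 1 outputs with NaN is an assumption on my part, as is the use of np.moveaxis from later NumPy releases.

```python
import numpy as np

def mov_sum(arr, window, axis=-1):
    """Moving window sum along axis; first window - 1 outputs are NaN."""
    a = np.array(arr, dtype=np.float64)   # copy, like arr.astype(float)
    a = np.moveaxis(a, axis, -1)          # work along the last axis
    n = a.shape[-1]
    if window < 1 or window > n:
        raise ValueError("window must be between 1 and the axis length")
    csum = np.cumsum(a, axis=-1)
    out = np.full(a.shape, np.nan)
    out[..., window - 1] = csum[..., window - 1]
    # Sum over a[i - window + 1 : i + 1] is csum[i] - csum[i - window].
    out[..., window:] = csum[..., window:] - csum[..., :-window]
    return np.moveaxis(out, -1, axis)

def mov_nansum(arr, window, axis=-1):
    """Like mov_sum, but NaNs count as zero inside each window."""
    a = np.array(arr, dtype=np.float64)
    a[np.isnan(a)] = 0.0                  # an all-NaN window sums to 0
    return mov_sum(a, window, axis=axis)
```

One caveat with the cumulative-sum approach: a NaN in the input to mov_sum makes every later output along that axis NaN, not just the windows that contain it. A window-by-window implementation would not have that problem.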
> > NORMALIZATION
> > I already have nd versions of ranking, zscore, quantile, demean,
> > demedian, etc in larry. We should rename to nanzscore etc.
> > ranking and quantile could use some cython love.
> > I don't know, should we cut this category?
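
As a rough idea of what a nanzscore could look like (this is not the larry implementation): the sketch below assumes np.nanmean and np.nanstd from later NumPy releases, and the axis=0 and ddof=0 defaults are placeholders, since the defaults are exactly what is being debated in this thread.

```python
import numpy as np

def nanzscore(arr, axis=0, ddof=0):
    """Z-score along axis, ignoring NaNs; NaN inputs stay NaN."""
    a = np.asarray(arr, dtype=np.float64)
    # keepdims=True lets mean and std broadcast against a for any ndim.
    mean = np.nanmean(a, axis=axis, keepdims=True)
    std = np.nanstd(a, axis=axis, ddof=ddof, keepdims=True)
    return (a - mean) / std
```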
> > GROUP FUNCTIONS
> > Input: array, sequence of labels such as a list, axis.
> > For an array of shape (n,m), axis=0, and a list of n labels with d
> > distinct values, group_nanmean would return a (d,m) array. I'd also
> > like a groupfilter_nanmean which would return a (n,m) array and would
> > have an additional, optional input: exclude_self=False.
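
A sketch of group_nanmean matching the shapes described above: an (n,m) array with n labels and d distinct values gives a (d,m) array. The row order (sorted unique labels, as returned by np.unique) and the use of np.add.at are my assumptions, not part of the proposal.

```python
import numpy as np

def group_nanmean(arr, labels, axis=0):
    """Mean of each label group along axis, ignoring NaNs.

    Groups appear in the order of np.unique(labels) (sorted).
    """
    a = np.moveaxis(np.asarray(arr, dtype=np.float64), axis, 0)
    labels = np.asarray(labels).ravel()
    if labels.shape[0] != a.shape[0]:
        raise ValueError("need one label per element along axis")
    uniq, inverse = np.unique(labels, return_inverse=True)
    valid = ~np.isnan(a)
    shape = (uniq.size,) + a.shape[1:]
    sums = np.zeros(shape)
    counts = np.zeros(shape)
    # Unbuffered accumulation: rows sharing a label add into one group row.
    np.add.at(sums, inverse, np.where(valid, a, 0.0))
    np.add.at(counts, inverse, valid.astype(np.float64))
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts             # 0/0 -> NaN for all-NaN groups
    return np.moveaxis(means, 0, axis)
```

For example, with arr = [[1, 2], [3, nan], [5, 6]] and labels ['a', 'b', 'a'], group 'a' averages rows 0 and 2 and group 'b' is row 1 alone, so its second column comes back as NaN.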
> > NAME
> > What should we call the package?
> > Numa, numerical analysis with numpy arrays
> > Dana, data analysis with numpy arrays
> > import dana as da (da=data analysis)
> > ARE YOU CRAZY?
> > If you read this far, you are crazy and would be a good fit for this
> > _______________________________________________
> > SciPy-User mailing list
> > SciPy-User@scipy.org
> > http://mail.scipy.org/mailman/listinfo/scipy-user