>> What do others think is missing? Off the top of my head: basic wavelets
>> (dwt primarily) and more complete interpolation strategies (I'd like to
>> finish the basic interpolation approaches I started a while ago).
>> Originally, I used GAMS as an "overview" of the kinds of things needed in
>> SciPy. Are there other relevant taxonomies these days?
> Well, probably not something that fits these ideas for scipy
> one-to-one, but the Berkeley 'thirteen dwarves' list from the 'View
> from Berkeley' paper on parallel computing is not a bad starting
> point; summarized here they are:
> >
> > Dense Linear Algebra
> > Sparse Linear Algebra [1]
> > Spectral Methods
> > N-Body Methods
> > Structured Grids
> > Unstructured Grids
> > MapReduce
> > Combinational Logic
> > Graph Traversal
> > Dynamic Programming
> > Backtrack and Branch-and-Bound
> > Graphical Models
> > Finite State Machines
> Descriptions of each can be found here:
> http://view.eecs.berkeley.edu/wiki/Dwarf_Mine and the full study is here:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html
>
> That list is biased towards the classes of codes used in
> supercomputing environments, and some of the topics are probably
> beyond the scope of scipy (say structured/unstructured grids, at least
> for now).
>
> But it can be a decent guiding outline for reasoning about what the
> 'big areas' of scientific computing are, so that scipy at least provides
> building blocks that would be useful in these directions.
>
> One area that hasn't been directly mentioned too much is the situation
> with statistical tools. On the one hand, we have the phenomenal work
> of pandas, statsmodels and sklearn, which together are helping turn
> python into a great tool for statistical data analysis (understood in
> a broad sense). But it would probably be valuable to have enough of a
> statistical base directly in numpy/scipy so that the 'out of the box'
> experience for statistical work is improved. I know we have
> scipy.stats, but it seems like it needs some love.
> It seems like scipy.stats has received quite a bit of attention. There
> is always more to do, of course, but I'm not sure what specifically you
> think is missing or needs work.
Test coverage, for example. I recently fixed several wildly incorrect
skewness and kurtosis formulas for some distributions, and I now have very
little confidence that any of the other distributions are correct. Of
course, most of them probably *are* correct, but without tests, all are in
doubt.
Warren
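A minimal sketch of the kind of test described above, assuming the check is done by comparing each distribution's analytic skewness and excess kurtosis (as returned by `stats(moments="mvsk")`) against values obtained by numerically integrating the pdf with `expect`. The helper names `numerical_skew_kurt` and `check_skew_kurt`, the chosen distributions, and the tolerances are illustrative assumptions, not part of scipy's actual test suite.

```python
import numpy as np
from scipy import stats


def numerical_skew_kurt(frozen):
    """Skewness and excess kurtosis via numerical integration (frozen.expect)."""
    mu = frozen.expect(lambda x: x)
    var = frozen.expect(lambda x: (x - mu) ** 2)
    sigma = np.sqrt(var)
    # Third standardized central moment.
    skew = frozen.expect(lambda x: (x - mu) ** 3) / sigma ** 3
    # Fourth standardized central moment minus 3 (Fisher/excess kurtosis,
    # the convention scipy uses for the 'k' moment).
    kurt = frozen.expect(lambda x: (x - mu) ** 4) / var ** 2 - 3.0
    return skew, kurt


def check_skew_kurt(frozen, rtol=1e-6, atol=1e-6):
    """Hypothetical helper: fail if the analytic formulas disagree with quadrature."""
    _, _, s, k = frozen.stats(moments="mvsk")
    s_num, k_num = numerical_skew_kurt(frozen)
    np.testing.assert_allclose(float(s), s_num, rtol=rtol, atol=atol)
    np.testing.assert_allclose(float(k), k_num, rtol=rtol, atol=atol)


if __name__ == "__main__":
    for dist in (stats.norm(), stats.gamma(2.5), stats.beta(2, 5)):
        check_skew_kurt(dist)
```

Comparing against quadrature rather than hard-coded constants means the same check can be looped over many distributions and parameter values; its main cost is speed, and heavy-tailed distributions whose higher moments are infinite would need to be skipped.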
> A big question to me is the impact of data-frames as the underlying
> data representation of the algorithms and the relationship between the
> data-frame and a NumPy array.
