[SciPy-Dev] SciPy Goal

Travis Oliphant travis@continuum...
Thu Jan 5 00:26:05 CST 2012


On Jan 5, 2012, at 12:02 AM, Warren Weckesser wrote:

> 
> 
> On Wed, Jan 4, 2012 at 9:29 PM, Travis Oliphant <travis@continuum.io> wrote:
> 
> On Jan 4, 2012, at 8:22 PM, Fernando Perez wrote:
> 
> > Hi all,
> >
> > On Wed, Jan 4, 2012 at 5:43 PM, Travis Oliphant <travis@continuum.io> wrote:
> >> What do others think is missing?  Off the top of my head:   basic wavelets
> >> (dwt primarily) and more complete interpolation strategies (I'd like to
> >> finish the basic interpolation approaches I started a while ago).
> >> Originally, I used GAMS as an "overview" of the kinds of things needed in
> >> SciPy.   Are there other relevant taxonomies these days?
> >
> > Well, probably not something that fits these ideas for scipy
> > one-to-one, but the Berkeley 'thirteen dwarves' list from the 'View
> > from Berkeley' paper on parallel computing is not a bad starting
> > point; summarized here they are:
> >
> >    Dense Linear Algebra
> >    Sparse Linear Algebra [1]
> >    Spectral Methods
> >    N-Body Methods
> >    Structured Grids
> >    Unstructured Grids
> >    MapReduce
> >    Combinational Logic
> >    Graph Traversal
> >    Dynamic Programming
> >    Backtrack and Branch-and-Bound
> >    Graphical Models
> >    Finite State Machines
> 
> 
> This is a nice list, thanks!
> 
> >
> > Descriptions of each can be found here:
> > http://view.eecs.berkeley.edu/wiki/Dwarf_Mine and the full study is
> > here:
> >
> > http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html
> >
> > That list is biased towards the classes of codes used in
> > supercomputing environments, and some of the topics are probably
> > beyond the scope of scipy (say structured/unstructured grids, at least
> > for now).
> >
> > But it can be a decent guiding outline to reason about what are the
> > 'big areas' of scientific computing, so that scipy at least provides
> > building blocks that would be useful in these directions.
> >
> 
> Thanks for the links.
> 
> 
> > One area that hasn't been directly mentioned too much is the situation
> > with statistical tools.  On the one hand, we have the phenomenal work
> > of pandas, statsmodels and sklearn, which together are helping turn
> > python into a great tool for statistical data analysis (understood in
> > a broad sense).  But it would probably be valuable to have enough of a
> > statistical base directly in numpy/scipy so that the 'out of the box'
> > experience for statistical work is improved.  I know we have
> > scipy.stats, but it seems like it needs some love.
> 
> It seems like scipy.stats has received quite a bit of attention. There is always more to do, of course, but I'm not sure what specifically you think is missing or needs work.
> 
> 
> Test coverage, for example.  I recently fixed several wildly incorrect skewness and kurtosis formulas for some distributions, and I now have very little confidence that any of the other distributions are correct.  Of course, most of them probably *are* correct, but without tests, all are in doubt.
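
A check of the kind described above can be sketched as follows: compare a distribution's closed-form skewness and excess kurtosis against values obtained by numerically integrating its density. This is only a minimal, standard-library sketch, not how the scipy.stats test suite is organized. The gamma distribution (unit scale, shape k) is chosen purely for illustration, and the helper names gamma_pdf, numeric_skew_kurtosis and closed_form_skew_kurtosis are hypothetical.

```python
import math


def gamma_pdf(x, k):
    """Density of a gamma distribution with shape k and unit scale (x > 0)."""
    return math.exp((k - 1.0) * math.log(x) - x - math.lgamma(k))


def numeric_skew_kurtosis(pdf, lo, hi, n=100000):
    """Estimate skewness and excess kurtosis by midpoint-rule integration.

    The interval [lo, hi] is assumed to hold essentially all of the
    probability mass; that is an assumption of this sketch.
    """
    h = (hi - lo) / n
    xs = [lo + (i + 0.5) * h for i in range(n)]
    ws = [pdf(x) * h for x in xs]  # probability mass of each small cell
    mean = sum(w * x for w, x in zip(ws, xs))
    m2 = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs))
    m3 = sum(w * (x - mean) ** 3 for w, x in zip(ws, xs))
    m4 = sum(w * (x - mean) ** 4 for w, x in zip(ws, xs))
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def closed_form_skew_kurtosis(k):
    """Textbook gamma results: skewness 2/sqrt(k), excess kurtosis 6/k."""
    return 2.0 / math.sqrt(k), 6.0 / k


k = 3.0
numeric = numeric_skew_kurtosis(lambda x: gamma_pdf(x, k), 0.0, 60.0)
exact = closed_form_skew_kurtosis(k)
formulas_agree = all(abs(a - b) < 1e-6 for a, b in zip(numeric, exact))
```

A mistyped closed form (for example 1/sqrt(k) for the skewness) would make formulas_agree false. Such a check still depends on the integration itself being right, which is the "you also have to trust the tests" caveat raised in the reply below.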

There is such a thing as *over-reliance* on tests as well. Tests help, but it is not the black-and-white matter that many messages on this list make it seem when they discuss which parts of scipy are in "good shape," are "easy to maintain," or "have love." Just because tests exist doesn't mean that you can trust the code; you then also have to trust the tests. Ultimately, trust is built from successful *usage*. Tests are only a pseudo-substitute for that usage. As it happens, tests are usage that ships with the code itself, which makes it easier to iterate on changes and catch some of the errors that can creep in during refactoring.

In summary, tests are good! But they also add overhead and must themselves be maintained, and I don't think it helps to disparage working code. I've seen a lot of terrible code that has *great* tests, and I've seen projects fail because developers focused too much on the tests and not enough on what the code was actually doing. Great tests can catch many things, but they cannot make up for not paying attention when writing the code.

-Travis


> 
> Warren
> 
> 
>    A big question to me is the impact of data-frames as the underlying data-representation of the algorithms and the relationship between the data-frame and a NumPy array.
> 
> -Travis
> 
> 
> >
> > Cheers,
> >
> > f
> > _______________________________________________
> > SciPy-Dev mailing list
> > SciPy-Dev@scipy.org
> > http://mail.scipy.org/mailman/listinfo/scipy-dev
> 
