[SciPy-Dev] SciPy Goal
Thu Jan 5 00:47:13 CST 2012
On Thu, Jan 5, 2012 at 7:26 AM, Travis Oliphant <email@example.com> wrote:
> On Jan 5, 2012, at 12:02 AM, Warren Weckesser wrote:
> On Wed, Jan 4, 2012 at 9:29 PM, Travis Oliphant <firstname.lastname@example.org> wrote:
>> On Jan 4, 2012, at 8:22 PM, Fernando Perez wrote:
>> > Hi all,
>> > On Wed, Jan 4, 2012 at 5:43 PM, Travis Oliphant <email@example.com> wrote:
>> >> What do others think is missing? Off the top of my head: basic wavelets
>> >> (dwt primarily) and more complete interpolation strategies (I'd like to
>> >> finish the basic interpolation approaches I started a while ago).
>> >> Originally, I used GAMS as an "overview" of the kinds of things needed in
>> >> SciPy. Are there other relevant taxonomies these days?
>> > Well, probably not something that fits these ideas for scipy
>> > one-to-one, but the Berkeley 'thirteen dwarves' list from the 'View
>> > from Berkeley' paper on parallel computing is not a bad starting
>> > point; summarized here they are:
>> > Dense Linear Algebra
>> > Sparse Linear Algebra
>> > Spectral Methods
>> > N-Body Methods
>> > Structured Grids
>> > Unstructured Grids
>> > MapReduce
>> > Combinational Logic
>> > Graph Traversal
>> > Dynamic Programming
>> > Backtrack and Branch-and-Bound
>> > Graphical Models
>> > Finite State Machines
>> This is a nice list, thanks!
>> > Descriptions of each can be found here:
>> > http://view.eecs.berkeley.edu/wiki/Dwarf_Mine and the full study is
>> > here:
>> > http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html
>> > That list is biased towards the classes of codes used in
>> > supercomputing environments, and some of the topics are probably
>> > beyond the scope of scipy (say structured/unstructured grids, at least
>> > for now).
>> > But it can be a decent guiding outline to reason about what are the
>> > 'big areas' of scientific computing, so that scipy at least provides
>> > building blocks that would be useful in these directions.
>> Thanks for the links.
>> > One area that hasn't been directly mentioned too much is the situation
>> > with statistical tools. On the one hand, we have the phenomenal work
>> > of pandas, statsmodels and sklearn, which together are helping turn
>> > python into a great tool for statistical data analysis (understood in
>> > a broad sense). But it would probably be valuable to have enough of a
>> > statistical base directly in numpy/scipy so that the 'out of the box'
>> > experience for statistical work is improved. I know we have
>> > scipy.stats, but it seems like it needs some love.
>> It seems like scipy stats has received quite a bit of attention. There
>> is always more to do, of course, but I'm not sure what specifically you
>> think is missing or needs work.
> Test coverage, for example. I recently fixed several wildly incorrect
> skewness and kurtosis formulas for some distributions, and I now have very
> little confidence that any of the other distributions are correct. Of
> course, most of them probably *are* correct, but without tests, all are in
> doubt.
> There is such a thing as *over-reliance* on tests as well.
True in principle, but we're so far from that point that you don't have to
worry about that for the foreseeable future.
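
To make the point about distribution tests concrete, here is a minimal sketch
of the kind of check that would flag a wrong skewness or kurtosis formula:
compare the values a distribution reports with moments integrated numerically
from its pdf. The helper name `moments_by_quadrature`, the choice of
distributions, and the tolerances are assumptions for illustration only, not
an existing scipy test.

```python
import numpy as np
from scipy import integrate, stats


def moments_by_quadrature(dist):
    """Return (skewness, excess kurtosis) of a frozen continuous
    distribution, computed by numerically integrating its pdf."""
    lo, hi = dist.support()
    mean = integrate.quad(lambda x: x * dist.pdf(x), lo, hi)[0]

    def central(k):
        # k-th central moment about the numerically computed mean
        return integrate.quad(lambda x: (x - mean) ** k * dist.pdf(x), lo, hi)[0]

    var = central(2)
    skew = central(3) / var ** 1.5
    kurt = central(4) / var ** 2 - 3.0
    return skew, kurt


# Illustrative selection; a real test would loop over every distribution.
cases = {
    "norm": stats.norm(),
    "expon": stats.expon(),
    "gamma(a=3)": stats.gamma(3.0),
}

for name, dist in cases.items():
    expected = moments_by_quadrature(dist)
    reported = dist.stats(moments="sk")
    np.testing.assert_allclose(reported, expected, rtol=1e-6, atol=1e-6,
                               err_msg=name)
```

A check like this does not prove a closed-form formula right, but a wildly
incorrect one would fail it immediately.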
> Tests help but it is not a black or white kind of thing as seems to come
> across in many of the messages on this list about what part of scipy is in
> "good shape" or "easy to maintain" or "has love." Just because tests
> exist doesn't mean that you can trust the code --- you also then have to
> trust the tests. Ultimately, trust is built from successful *usage*.
> Tests are only a pseudo-substitute for that usage. It so happens that usage
> that comes along with the code itself makes it easier to iterate on changes
> and catch some of the errors that can happen on re-factoring.
> In summary, tests are good! But, they also add overhead and themselves
> must be maintained, and I don't think it helps to disparage working code.
> I've seen a lot of terrible code that has *great* tests and seen projects
> fail because developers focus too much on the tests and not enough on what
> the code is actually doing. Great tests can catch many things but they
> cannot make up for not paying attention when writing the code.
Certainly, but besides giving more confidence that code is correct, a major
advantage is that it is a massive help when working on existing code -
especially for new developers. Now we have to be extremely careful in
reviewing patches to check nothing gets broken (including backwards
compatibility). Tests in that respect are not a maintenance burden, but a
time saver.
As an example, last week I wanted to add a way to easily adjust the
bandwidth of gaussian_kde. This was maybe 10 lines of code, didn't take
long at all. Then I spent some time adding tests and improving the docs,
and thought I was done. After sending the PR, I spent at least an equal
amount of time reworking everything a couple of times to not break any of
the existing subclasses that could be found. In addition it took a lot of
Josef's time to review it all and convince me of the error of my ways. A few
tests could have saved us a lot of time.
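
A sketch of the sort of test that would have helped here, assuming the
gaussian_kde interface as it exists in current scipy, where a subclass may
override `covariance_factor` and a scalar `bw_method` argument sets the
bandwidth factor directly. The subclass name `FixedFactorKDE` and the sample
data are hypothetical.

```python
import numpy as np
from scipy import stats


class FixedFactorKDE(stats.gaussian_kde):
    """Hypothetical user subclass that hard-codes the bandwidth factor."""

    def covariance_factor(self):
        return 0.5


rng = np.random.default_rng(0)
data = rng.normal(size=200)
grid = np.linspace(-3.0, 3.0, 25)

subclassed = FixedFactorKDE(data)
explicit = stats.gaussian_kde(data, bw_method=0.5)

# The override must still be honoured when no bw_method is passed ...
assert subclassed.factor == 0.5
# ... and must give the same density as setting the factor explicitly.
np.testing.assert_allclose(subclassed(grid), explicit(grid))
```

With a test like this in place, a change that silently ignored the overridden
method would fail before review rather than during it.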