[SciPy-Dev] SciPy Goal

Travis Oliphant travis@continuum...
Wed Jan 4 19:43:45 CST 2012


Thanks for the feedback.      My point was to generate discussion and start the ball rolling on exactly the kind of conversation that has started.  

Exactly as Ralf mentioned, the point is to encourage development of sub-packages, something that the scikits effort and other individual efforts have done very, very well. In fact, it has worked so well that it taught me a great deal about what is important in open source. My perhaps irrational dislike of the *name* "scikits" should be read as nothing more than a naming-taste preference (and I am not known for my ability to choose names well anyway). I very much like and admire the community around scikits. I just would have preferred something easier to type (even sci_* as high-level packages would have been better in my mind: sci_learn, sci_image, sci_statsmodels, etc.). I didn't feel I was able to fully participate in that discussion when it happened, so take my comments now as simply historical and something I've been wanting to get off my chest for a while.

Without better packaging and dependency-management systems (especially on Windows and Mac), splitting out code doesn't help users who don't get SciPy through a distribution, and users who do rely on a distribution won't be impacted much either way. There are scenarios under which it could make sense to split up SciPy, but I agree that right now it doesn't make sense to split everything out completely. However, I do think it makes sense to clean things up and move some things out in preparation for SciPy 1.0.

One thing that would be nice to know is how people view the state of the documentation and examples for the different packages. Where is work most needed?

> 
> Looking at Travis' list of non-core packages I'd say that sparse certainly belongs in the core and integrate probably too. Looking at what's left:
> - constants : very small and low cost to keep in core. Not much to improve there.

Agreed.

> - cluster : low maintenance cost, small. not sure about usage, quality.  

I think cluster overlaps with scikits-learn quite a bit. It basically contains K-means vector-quantization code whose functionality I suspect already exists in scikits-learn. I would recommend deprecating and removing it while pointing people to scikits-learn for equivalent functionality (or moving it into scikits-learn).

> - ndimage : difficult one. hard to understand code, may not see much development either way.

This overlaps with scikits-image but has quite a bit of useful functionality on its own.   The package is fairly mature and just needs maintenance.  

> - spatial : kdtree is widely used, of good quality. low maintenance cost.

Good to hear maintenance cost is low. 

> - odr : quite small, low cost to keep in core. pretty much done as far as I can tell.

Agreed.

> - maxentropy : is deprecated, will disappear.

Great. 

> - signal : not in great shape, could be viable independent package. On the other hand, if scikits-signal takes off and those developers take care to improve and build on scipy.signal when possible, that's OK too.

What are the needs of this package? What needs to be fixed or improved? It is a broad field, and I could see improving scipy.signal with a few simple algorithms (filter design, for example) and then pushing a separate package for more advanced signal-processing algorithms. That sounds fine to me. It looks like I can devote some attention to scipy.signal, then, as it was one of the areas I was most interested in originally.

> - weave : no point spending any effort on it. keep for backwards compatibility only, direct people to Cython instead.

Agreed. Any way we can deprecate this for SciPy 1.0?


> Overall, I don't see many viable independent packages there. So here's an alternative to spending a lot of effort on reorganizing the package structure:
> 1. Formulate a coherent vision of what in principle belongs in scipy (current modules + what's missing). 

O.K., so SciPy should contain "basic" modules that are needed for many different kinds of analysis and that can serve as a dependency for more advanced packages. This is somewhat vague, of course.

What do others think is missing?  Off the top of my head:   basic wavelets (dwt primarily) and more complete interpolation strategies (I'd like to finish the basic interpolation approaches I started a while ago).     Originally, I used GAMS as an "overview" of the kinds of things needed in SciPy.   Are there other relevant taxonomies these days? 

http://gams.nist.gov/cgi-bin/serve.cgi


> 2. Focus on making it easier to contribute to scipy. There are many ways to do this; having more accessible developer docs, having a list of "easy fixes", adding info to tickets on how to get started on the reported issues, etc. We can learn a lot from Sympy and IPython here.

Definitely!

> 3. Recognize that quality of code and especially documentation is important, and fill the main gaps.

Is there a write-up of recognized gaps here that we can start with? 

> 4. Deprecate sub-modules that don't belong in scipy (anymore), and remove them for scipy 1.0. I think that this applies only to maxentropy and weave.

I think it also applies to cluster as described above. 

> 5. Find a clear (group of) maintainer(s) for each sub-module. For people familiar with one module, responding to
> tickets and pull requests for that module would not cost so much time.

Is there a list where this is kept?  

> 
> In my opinion, spending effort on improving code/documentation quality and attracting new developers (those go hand in hand) instead of reorganizing will have both more impact and be more beneficial for our users.

Agreed.  Thanks for the feedback. 

Best,

-Travis

