[SciPy-dev] GSoC Project Proposal: Datasource and Jonathan Taylor's statistical models

Skipper Seabold jsseabold@gmail....
Fri Mar 27 12:43:54 CDT 2009

Hello all,

I am a first-year PhD student in Economics at American University, and
I would very much like to participate in GSoC with the NumPy/SciPy
community.  I am looking for some feedback and discussion before I
submit a proposal.

Judging by the ideas page and the discussion in this thread (
http://mail.scipy.org/pipermail/scipy-dev/2009-February/011373.html ),
I think the following project proposal would be useful to the
community.

My proposal has two parts.  The first would be to improve datasource
and integrate it into the numpy/scipy io.  I see this as a way to get
my feet wet working on a project, and I do not expect it to take more
than 2-3 weeks of work on my end.

The second part would be to bring Jonathan Taylor's statistical models
from the NiPy project into scipy.stats.  I think I would be a good
candidate for this work: I am currently studying statistics and
learning the ins and outs of NumPy/SciPy, so I don't mind doing some
of the less appealing work, since it is also a great learning
opportunity.  I also see this as a great way to get involved in the
SciPy community in an area that currently needs some attention.
Since I am a student, I would be able to help maintain the code, fix
bugs, and address other areas of the statistical capabilities that
need improvement.

Below is a general outline of my proposal, with some areas that I have
identified as needing work.  I am eager to discuss aspects of the
projects with anyone who is interested and to work on the appropriate

1) Improve datasource and integrate it into all the numpy/scipy io

Bug Fixes
    Catch and handle malformed URLs

Enhancements
    Improve findfile method
    Improve cache method
    Add zip archive, tar file handling capabilities
    Improve networking interface to handle timeouts and proxies if
    there is sufficient interest

Documentation
    Document changes

Tests
    Implement test coverage for new changes

Copy/Move to scipy.io
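To make two of the datasource items above more concrete, here is a
minimal sketch, assuming only Python's standard urllib.parse, zipfile
and tarfile modules: rejecting malformed URLs up front, and reading a
file out of a zip or tar archive.  The helper names is_valid_url and
read_archive_member, and the set of accepted URL schemes, are
hypothetical and are not part of the existing datasource API.

```python
import tarfile
import zipfile
from urllib.parse import urlparse

# Schemes this sketch is willing to fetch (an assumption, not datasource's list).
SUPPORTED_SCHEMES = ("http", "https", "ftp")


def is_valid_url(path):
    """Return True if `path` looks like a well-formed network URL (hypothetical helper)."""
    try:
        parts = urlparse(path)
    except ValueError:
        # urlparse rejects some malformed inputs outright,
        # e.g. an unbalanced IPv6 bracket such as "http://[::1".
        return False
    # Require both a known scheme and a host, so "http:/data.txt" fails.
    return parts.scheme in SUPPORTED_SCHEMES and bool(parts.netloc)


def read_archive_member(archive_path, member):
    """Return the raw bytes of `member` from a zip or tar archive (hypothetical helper)."""
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            return zf.read(member)
    if tarfile.is_tarfile(archive_path):
        # The default mode detects gzip/bz2/xz compression transparently.
        with tarfile.open(archive_path) as tf:
            fobj = tf.extractfile(member)
            if fobj is None:
                # extractfile returns None for members that are neither
                # regular files nor links (e.g. directories).
                raise ValueError("%r is not a regular file in %r" % (member, archive_path))
            return fobj.read()
    raise ValueError("%r is neither a zip nor a tar archive" % archive_path)
```

Because the tar branch relies on tarfile.open's default mode, a single
code path in this sketch covers .tar, .tar.gz and .tar.bz2 files.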

2) Integrate Jonathan Taylor's statistical models into scipy.stats

These models are currently in the NiPy project.
Merge the relevant branches (I believe the trunk-josef models branch
has the most recent changes).

I will focus mostly on bringing over the linear models, which I
believe would include at least:
bspline.py, contrast.py, gam.py, glm.py, model.py, regression.py, utils.py
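At the heart of these linear models is ordinary least squares, which I
expect regression.py to cover.  As a rough illustration of that
computation, here is a sketch in plain NumPy; the function ols_fit is
hypothetical and is not the API of the NiPy models code.  It returns the
coefficient estimates and their classical standard errors.

```python
import numpy as np


def ols_fit(y, X):
    """Ordinary least squares: estimate beta in y = X @ beta + error.

    Hypothetical helper; returns (beta, standard errors of beta).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    nobs = X.shape[0]
    # lstsq minimizes ||y - X beta||^2 via the SVD, without forming inv(X'X).
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Unbiased estimate of the error variance, using the residual degrees of freedom.
    sigma2 = resid @ resid / (nobs - rank)
    # Cov(beta) = sigma^2 (X'X)^{-1}; pinv tolerates nearly singular designs.
    cov = sigma2 * np.linalg.pinv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))
```

Solving with numpy.linalg.lstsq rather than explicitly inverting X'X is
the numerically safer choice; the pseudo-inverse is used here only for
the covariance matrix.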

Bug Fixes
    Bug hunting
    Improve existing test coverage

Refactoring
    Eliminate duplicate functionality, both pre-existing and
    introduced by the merge
    Make sure parameters are consistent, etc.

Documentation
    Document changes
    Make any necessary changes to stats/info.py

Tests
    Make sure test coverage is adequate
