[SciPy-dev] GSoC Project Proposal: Datasource and Jonathan Taylor's statistical models
Fri Mar 27 17:09:20 CDT 2009
Bruce Southey wrote:
> Not getting into the merits of either part, I think you are asking for
> trouble doing both because there is no clear connection between the two
> parts. Knowing one part is not going to help you with the other. (The
> argument that it helps get 'your feet wet' is rather lame.)
Your point is well taken. I think I will focus on the second part, as
there seems to be much more interest in the statistical functionality,
and a narrower focus would undoubtedly make my work better.
> I would strongly suggest that the main emphasis is just to get
> Jonathan's code integrated into Scipy and perhaps something from various
> places like the Scikit learn (how many logistic regression or least
> squares codes do we really need?) and EconPy.
I will have a closer look through Scikit learn and EconPy and revise accordingly.
> I would think that it is essential to get these to work with masked
> arrays (allows missing observations) or record arrays (enables the use of
> 'variable' names in model statements like most statistics packages do).
I agree. There has been some discussion of the most appropriate way
to handle this in the thread you mentioned previously (e.g., it would not
always be appropriate to force conversion to a masked array, whether
stats and mstats should be merged, etc.), and I would appreciate any
guidance you could offer. I like the idea of the "usemask"
flag here http://mail.scipy.org/pipermail/scipy-dev/2009-February/011414.html
but would obviously defer to others on the best solution. Should I
be spending most of my time looking through mstats rather than stats?
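A minimal sketch of the two data layouts discussed above, assuming NumPy only: a masked array to represent missing observations, a structured (record) array so columns can be referred to by variable name, and a "usemask"-style flag on a hypothetical ols_fit helper. ols_fit and the variable names dose/response are illustrative only; this is not an existing SciPy API or the design proposed in the linked thread.

```python
import numpy as np

# Illustrative data; NaN marks a missing observation (an assumed convention).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, np.nan, 7.0, 8.9, 11.2])

# Masked array: the missing value is masked, not deleted, so x and y
# stay aligned and keep the same length.
ym = np.ma.masked_invalid(y)
print(ym.mean())  # mean over the four observed values only

def ols_fit(x, y, usemask=False):
    """Hypothetical OLS helper returning (intercept, slope).

    usemask=True: y may be a masked array; masked rows are dropped
    from both x and y before fitting.
    usemask=False: x and y are assumed to be complete plain arrays.
    """
    x = np.asarray(x, dtype=float)
    if usemask:
        y = np.ma.asarray(y)
        keep = ~np.ma.getmaskarray(y)      # True where y is observed
        x, y = x[keep], y.compressed()     # listwise deletion
    else:
        y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Structured (record) array: columns are addressed by variable name,
# much like a model statement in a statistics package.
data = np.rec.fromarrays([x, y], names="dose,response")
beta_rec = ols_fit(data.dose, np.ma.masked_invalid(data.response), usemask=True)
print(beta_rec)
```

Dropping masked rows is only one possible policy (listwise deletion); whether functions should convert to masked arrays automatically, or only when a flag asks for it, is the open design question raised above.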
> I would like to see the inclusion of the Statistical Reference Datasets Project:
> The datasets would allow us to validate the accuracy of the code.
Very good idea.
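One way such reference datasets could be used, sketched under assumptions: compare each estimate against its certified value via the log relative error (LRE), roughly the number of agreeing significant digits, a measure commonly used when benchmarking statistical software. The dataset below is constructed so the exact coefficients are known (intercept 1, slope 2); it is not StRD data, and log_relative_error is a hypothetical helper.

```python
import math
import numpy as np

def log_relative_error(estimate, certified):
    """Approximate number of correct significant digits in `estimate`.

    Uses relative error when the certified value is nonzero and
    absolute error otherwise; exact agreement returns infinity.
    """
    err = abs(float(estimate) - float(certified))
    if err == 0.0:
        return math.inf
    scale = abs(float(certified)) if certified != 0 else 1.0
    return -math.log10(err / scale)

# Constructed example (NOT StRD data): y = 1 + 2x exactly.
x = np.arange(10.0)
y = 1.0 + 2.0 * x
certified = {"intercept": 1.0, "slope": 2.0}

slope, intercept = np.polyfit(x, y, 1)  # coefficients, highest degree first
lre = {
    "intercept": log_relative_error(intercept, certified["intercept"]),
    "slope": log_relative_error(slope, certified["slope"]),
}
for name, digits in lre.items():
    print(f"{name}: LRE = {digits:.1f}")
```

A real test suite would replace the constructed data with the published datasets and require a minimum LRE per certified value.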
Thanks for the initial feedback. I will take it under advisement and
revise my proposal as needed.