[SciPy-user] predicting values based on (linear) models

josef.pktd@gmai... josef.pktd@gmai...
Wed Jan 14 23:50:56 CST 2009

On Wed, Jan 14, 2009 at 11:24 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
> On Jan 14, 2009, at 10:15 PM, josef.pktd@gmail.com wrote:
>> The functions in stats that I tested or rewrote are usually identical
>> to around 1e-15, but in some cases R has a more accurate test
>> distribution for small samples (option "exact" in R), while in
>> scipy.stats we only have the asymptotic distribution.
> We could try to reimplement part of it in C. In any case, it might
> be worth outputting a warning (or at least being very explicit in the doc)
> that the results may not hold for samples smaller than 10-20.

I am not a "C" person and I never went much beyond HelloWorld in C.
I just checked some of the doc strings, and I usually mention that
we use the asymptotic distribution, but there are still pretty vague
statements in some of the doc strings, such as

"The p-values are not entirely reliable but are probably reasonable for
datasets larger than 500 or so."
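
To make the exact-versus-asymptotic difference concrete, here is a minimal
sketch using a two-sided sign test (binomial with p = 0.5). This test is
chosen only for illustration and is not one of the scipy.stats functions
discussed above; the names exact_sign_test_p and asymptotic_sign_test_p are
hypothetical, and only the standard library is used.

```python
import math


def exact_sign_test_p(k, n):
    """Exact two-sided p-value for k successes in n trials, H0: p = 0.5."""
    tail = min(k, n - k)
    # P(X <= tail) for X ~ Binomial(n, 0.5), summed term by term.
    cdf = sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    # The null distribution is symmetric, so double the smaller tail.
    return min(1.0, 2.0 * cdf)


def asymptotic_sign_test_p(k, n):
    """Two-sided p-value from the normal approximation (no continuity correction)."""
    z = (k - n / 2) / math.sqrt(n / 4)
    return math.erfc(abs(z) / math.sqrt(2))


# Compare the two p-values for a small, a medium, and a larger sample.
for n, k in [(8, 1), (20, 5), (200, 80)]:
    print(n, k, exact_sign_test_p(k, n), asymptotic_sign_test_p(k, n))
```

For n = 8 and k = 1 the exact p-value is above 0.05 while the normal
approximation is below it, which is the kind of small-sample disagreement a
warning or an explicit doc string note would alert users to.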

>> Also, not all
>> existing functions in scipy.stats are tested (yet).
> We should also try to make sure missing data are properly supported
> (not always possible) and that the results are consistent between the
> masked and non-masked versions.

I added a ticket so we don't forget to check this.
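
A consistency check of this kind could look roughly like the sketch below.
It assumes numpy and scipy are installed and uses pearsonr purely as an
example; which functions the ticket should cover is not specified here, and
the data, seed, and mask positions are made up for illustration.

```python
import numpy as np
from scipy import stats
from scipy.stats import mstats

# Synthetic, made-up data: y is loosely correlated with x.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)

# Case 1: nothing is masked, so both versions should agree.
r_plain, p_plain = stats.pearsonr(x, y)
r_masked, p_masked = mstats.pearsonr(np.ma.array(x), np.ma.array(y))

# Case 2: mask a few entries and compare against dropping those pairs by hand.
mask = np.zeros(50, dtype=bool)
mask[[3, 17, 42]] = True
r_m, p_m = mstats.pearsonr(np.ma.array(x, mask=mask), np.ma.array(y))
r_d, p_d = stats.pearsonr(x[~mask], y[~mask])

print(np.allclose([r_plain, p_plain], [r_masked, p_masked]))
print(np.allclose([r_m, p_m], [r_d, p_d]))
```

The same pattern (unmasked input, then masked input versus manually
compressed input) should carry over to other function pairs that exist in
both stats and mstats.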

> IMHO, the readiness to incorporate user feedback is here. The feedback
> is not, or at least not as much as we'd like.

That depends on the subpackage. Some problems in stats have been
reported and known for quite some time, and the expected lifetime of a
ticket can be pretty long. I was looking at different Python packages
that use statistics, and many of them are reluctant to use scipy, while
numpy looks very well established. But I suppose this will improve
with time and the user base will increase, especially with the recent
improvements in the build/distribution and the documentation.

