[SciPy-Dev] Bootstrap confidence limits code
Wed Aug 8 13:57:20 CDT 2012
On Wed, Aug 8, 2012 at 2:38 PM, Constantine Evans <firstname.lastname@example.org> wrote:
> Hello everyone,
> A few years ago I implemented a scikit for bootstrap confidence limits
> (https://github.com/cgevans/scikits-bootstrap). I didn’t think much
> about it after that until recently, when I realized that some people
> are actually using it, and that there’s apparently been some talk
> about implementing this functionality in either scipy.stats or
> statsmodels (I should thank Randal Olson for discussing this and
> bringing it to my attention).
> As such I’ve rewritten most of the code, and written up some
> docstrings. The current code can do confidence intervals with basic
> percentile interval, bias-corrected accelerated, and approximate
> bootstrap confidence methods, and can also provide bootstrap and
> jackknife indexes. Most of it is implemented from the descriptions in
> Efron and Tibshirani’s Introduction to the Bootstrap, but the ABC code
> at the moment is a port from the modified-BSD-licensed bootstrap
> package for R (not the boot package) as I’m not entirely confident in
> my understanding of the method.
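
For readers unfamiliar with the terms above, here is a minimal sketch of bootstrap indexes, jackknife indexes, and a percentile interval built on them. The function names, defaults, and the seed parameter are assumptions made for illustration; this is not the scikits-bootstrap code or its API.

```python
import numpy as np

def bootstrap_indexes(n, n_samples, rng):
    """Return an (n_samples, n) integer array.

    Each row draws n indexes from range(n) with replacement.
    """
    return rng.integers(0, n, size=(n_samples, n))

def jackknife_indexes(n):
    """Return an (n, n - 1) integer array; row i is range(n) with i left out."""
    base = np.arange(n)
    return np.array([np.delete(base, i) for i in range(n)])

def percentile_ci(data, statfunc=np.mean, alpha=0.05, n_samples=10000, seed=0):
    """Percentile interval: quantiles of the statistic over bootstrap resamples."""
    data = np.asarray(data)
    rng = np.random.default_rng(seed)  # seed is a hypothetical convenience parameter
    idx = bootstrap_indexes(len(data), n_samples, rng)
    stats = np.array([statfunc(data[row]) for row in idx])
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])
```

Calling percentile_ci(data) returns a two-element array holding the lower and upper limits; passing a different statfunc (for example np.median) changes the statistic being bootstrapped.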
> And so, I have a few questions for everyone:
> * Is there any interest in including this sort of code in either
> scipy.stats or statsmodels? If so, where do people think would be the
> better place? The code is relatively small; at the moment it is less
> than 200 lines, with docstrings probably making up 100 of those lines.
I think it would be great to have this in statsmodels. I filed an
enhancement ticket about it this morning (also brought to my attention by
Randy's blog post).
> * Also, if so, what would need to be changed, added, and improved
> beyond what is mentioned in the Contributing to Scipy part of the
> reference guide? I’m never a fan of my own code, and imagine quite a
> bit would need to be fixed; I know tests will need to be added too.
We can discuss further on the statsmodels mailing list (cc'd) unless
someone feels strongly that this should go into scipy. I'm not yet sure
what API would be general enough to be used across all the models in
statsmodels. It's one of the reasons I've put off incorporating code like
this for so long.
> In addition, I have a few questions about what would be better
> practice for the API, and I haven’t really found a guide on best
> practices for Scipy:
> * When I started writing the code, I wrote a single function ci for
> confidence intervals, with a method argument to choose the method.
> This is easy for users, especially so that they don’t have to look
> through documentation to realize that BCA is the most generally useful
> method (at least from everything I’ve read) and that there really
> isn’t any reason to use many of the simpler methods. However, ABC
> takes different parameters, and needs a statistic function that takes
> weights, which makes this single-function organization trickier. At
> the moment, I have a separate function for ABC. Would it be better to
> split up all the methods to their own functions?
I think this might be preferable.
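
As a rough illustration of the two layouts under discussion, the sketch below gives each method its own function and keeps a thin ci(method=...) wrapper on top. All names (ci_percentile, ci_bca, ci, the "pi" and "bca" method keys, the seed parameter) are hypothetical, and the BCa formulas follow the standard textbook description rather than any existing package's code.

```python
from statistics import NormalDist

import numpy as np

_NORM = NormalDist()  # standard normal, used for quantile and CDF lookups

def _bootstrap_stats(data, statfunc, n_samples, rng):
    """Evaluate statfunc on n_samples resamples drawn with replacement."""
    n = len(data)
    idx = rng.integers(0, n, size=(n_samples, n))
    return np.array([statfunc(data[row]) for row in idx])

def ci_percentile(data, statfunc=np.mean, alpha=0.05, n_samples=2000, seed=0):
    """Percentile interval."""
    data = np.asarray(data)
    stats = _bootstrap_stats(data, statfunc, n_samples, np.random.default_rng(seed))
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

def ci_bca(data, statfunc=np.mean, alpha=0.05, n_samples=2000, seed=0):
    """Bias-corrected and accelerated interval."""
    data = np.asarray(data)
    stats = _bootstrap_stats(data, statfunc, n_samples, np.random.default_rng(seed))
    # Bias correction: how far the original estimate sits from the bootstrap median.
    # inv_cdf raises if every (or no) bootstrap statistic is below the estimate.
    z0 = _NORM.inv_cdf(np.mean(stats < statfunc(data)))
    # Acceleration, estimated from leave-one-out (jackknife) statistics.
    jack = np.array([statfunc(np.delete(data, i)) for i in range(len(data))])
    diff = jack.mean() - jack
    a = np.sum(diff ** 3) / (6.0 * np.sum(diff ** 2) ** 1.5)

    def adjusted(q):
        # Map a nominal quantile level to its bias- and skew-adjusted level.
        z = _NORM.inv_cdf(q)
        return _NORM.cdf(z0 + (z0 + z) / (1.0 - a * (z0 + z)))

    return np.quantile(stats, [adjusted(alpha / 2), adjusted(1 - alpha / 2)])

# Hypothetical method keys for the single-entry-point wrapper.
_METHODS = {"pi": ci_percentile, "bca": ci_bca}

def ci(data, statfunc=np.mean, alpha=0.05, n_samples=2000, method="bca", seed=0):
    """Single entry point that dispatches to the per-method functions."""
    if method not in _METHODS:
        raise ValueError(f"unknown method {method!r}; expected one of {sorted(_METHODS)}")
    return _METHODS[method](data, statfunc, alpha, n_samples, seed)
```

With this layout a method such as ABC, which needs a different signature, can live in its own function without complicating the wrapper, while ci() still gives users a sensible default.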
> * ABC requires a statistic function that takes weights. I’ve noticed
> that functions like np.average take a weights= argument. Would it be
> better to require input of a stat(data,weights) function, or input of
> a stat(data,weights=) with weights as a named argument? The latter
> would be nice in terms of allowing the same function to be used for
> all methods, but would make it impossible to use a lambda for the
> function. Is there some other method of doing this entirely?
> * Are there any missing features that anyone thinks should be added?
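
On the weights question above, one point that may be worth checking: a Python lambda can declare a default argument, so the stat(data, weights=) form may not rule out lambdas after all. The sketch below is illustrative only; the names stat and resample_weights are made up here. It also shows that, for a weighted mean, a resample can be expressed as a weight vector over the original data.

```python
import numpy as np

# A lambda with an optional keyword argument; np.average treats
# weights=None as an ordinary unweighted mean.
stat = lambda data, weights=None: np.average(data, weights=weights)

def resample_weights(idx, n):
    """Turn resample indexes into weights: the share of draws hitting each point."""
    return np.bincount(idx, minlength=n) / len(idx)

rng = np.random.default_rng(0)
data = rng.normal(size=20)
idx = rng.integers(0, len(data), size=len(data))

# Resampling-style call (percentile, BCa): statistic of the resampled data.
by_resampling = stat(data[idx])

# Weight-style call (what a method like ABC needs): same data, explicit weights.
by_weights = stat(data, weights=resample_weights(idx, len(data)))
```

Here by_resampling and by_weights agree up to floating-point rounding, which is why one keyword-style function could in principle serve both kinds of method.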
> I apologize if much of this is answered elsewhere, I just haven’t
> found any of it; I also apologize if this is far too long-winded and
> Constantine Evans