[SciPy-User] [ANN] scikit.statsmodels 0.2.0 release

Bruce Southey bsouthey@gmail....
Fri Feb 19 10:57:01 CST 2010

On 02/19/2010 10:29 AM, Gael Varoquaux wrote:
> On Fri, Feb 19, 2010 at 09:42:14AM -0600, Bruce Southey wrote:
>> I really do think that the scikits learn and statsmodels must talk
>> together now that learn has had a release as well (I don't recall
>> seeing it mentioned, hint hint!).
> That's a good point. In the long run, I think I would like statsmodels to
> be a dependency of scikit learn, because I hate reimplementing stuff.
I agree about reimplementing stuff, but it is hard to find common ground.

> The difference that I see between scikit.learn and statsmodels is that we
> have C code[*], and we will most probably end up with C++ code.
Will it end up as Cython?
(I just used the supplied Python bindings of libsvm so this could be 
> Let's say that the focus of scikit.learn and statsmodels is most
> probably going to be slightly different.
Having done both (with papers), I find this type of comment amusing,
because the same concepts underlie both. What I would like to avoid
is having different user syntax for the same basic model. For
example, with logistic regression in SAS you have to be careful about which
event level is the default, as it varies across procedures. At least
these SAS procedures use the same unmodified dataset, unlike some of the
R packages that do lars/lasso.
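To make the event-coding pitfall concrete, here is a minimal sketch. It is not code from either scikit or from SAS; `fit_logit` is a hypothetical helper that fits a logistic regression by plain Newton-Raphson (IRLS) on numpy arrays, and the simulated data are made up for illustration. Fitting the same data with the event coded as y == 1 versus y == 0 gives coefficients of opposite sign, which is exactly why a differing default between procedures or packages is easy to miss.

```python
import numpy as np


def fit_logit(X, y, n_iter=25):
    """Hypothetical helper: logistic regression via Newton-Raphson (IRLS).

    X: (n, p) design matrix (include a column of ones for an intercept).
    y: (n,) array of 0/1 outcomes; the value 1 is treated as the "event".
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted P(event)
        w = p * (1.0 - p)                     # IRLS weights
        grad = X.T @ (y - p)                  # score vector
        hess = X.T @ (X * w[:, None])         # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Simulated data (illustrative only): one predictor plus an intercept.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-0.5, 1.5])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_event1 = fit_logit(X, y)        # event coded as y == 1
beta_event0 = fit_logit(X, 1.0 - y)  # same data, event coded as y == 0

print(beta_event1)
print(beta_event0)  # same magnitudes, opposite signs
```

Because P(1 - y = 1) = 1 - sigma(X beta) = sigma(-X beta), the two fits are exact negatives of each other (up to rounding), so nothing in the output itself warns you which coding was used.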

>> What would be nice is for learn and statsmodels to accept the same input
>> data types, especially for things like logistic regression. While I
>> understand the need for duplicate functions, it may be desirable to share
>> at least code, since both code bases are still relatively 'new'.
> Well, as far as I am concerned, data types are numpy arrays. I am wary
> of implementing higher-level abstractions. It's more the APIs that may
> differ, and that we will have to keep in sync.
> My 2 cents,
> Gaël
> [*] For instance, we are starting to get really nice libsvm bindings.
I do agree especially now that I have learnt the 'array' approach of 
doing things.

In some ways my view of integrating things is Zelig (not that I have
really looked at it, as it is in R):

The seamless ability to link packages is rather appealing and both 
scikits share at least numpy.

