[Numpy-discussion] Multiple Linear Regression?

Alexandre Alexandre.Fayolle at logilab.fr
Mon Apr 29 06:20:03 CDT 2002


On Mon, Apr 29, 2002 at 03:13:44AM -0700, Jasper Phillips wrote:
> I'm helping my wife with programming for her economics thesis, which needs
> to calculate a "Multiple Linear Regression" on her data.
> 
> Does anyone know of any (preferably though not necessarily free) software
> that can do this? I'm working in Python, but not limited to it as I
> can relatively freely access other languages.
> 
> I'm still looking for a library written in Python, but haven't had any luck.
> 

I'm helping my wife with her History PhD, and have to deal with similar
stuff. I found R to be a very useful environment for statistical
computations. R is a free software clone of S-PLUS (both implement the S
language), which is to statistics what Matlab is to linear algebra and
automation.

Pros: 
 - programming environment, with a high level programming language
 - extensive statistical and linalg library (using C and FORTRAN code)
 - lots of third party code available, covering a very wide range of
   situations
 - Python bindings available if you don't want to learn R's own
   Scheme-like language
 - Tons of documentation available
 - Excellent support through the mailing lists
 - GPL'd
 - Tons of ways to import data (ranging from CSV files to ODBC queries)
 - 2 printed books available from Springer-Verlag
 - PostScript, PNG, WMF and X11 output, with precise control of the
   layout of graphs and figures, for a nice colourful thesis

Cons:
 - the language can be a bit weird at times (it took me some time to get
   used to '.' being used instead of '_' and vice versa in the scoping
   and variable naming), but you can use Python to script R, thanks to
   RPython
 - it's quite a big piece of code, with a rather steep learning curve
   and you need time to get inside it
 - the documentation is aimed at professional statisticians. I had to
   dig back into my statistics courses and buy a couple of books on
   the topic before the software became really useful. Beginner-level
   statistics questions are off-topic on the r-help mailing list
 - the Springer-Verlag books are very expensive (Modern Applied
   Statistics with S-PLUS costs something like 70 euros), but they are
   great

So you have a powerful tool available at your fingertips, designed to do
precisely what you need. I think it's worth taking the time to look at
it carefully. The more I get to understand the topic, the more ideas I
get for new ways of exploring the data of my wife's PhD. 
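If only the point estimates are needed, a multiple linear regression is an ordinary least-squares problem, so it can also be sketched directly in NumPy. This is a minimal sketch, assuming an intercept is wanted and the explanatory variables are not collinear; the function name `fit_ols` and the toy data are hypothetical, made up for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X is an (n, k) array of explanatory variables (one column per variable),
    y is a length-n response. Returns (coefficients, r_squared), where
    coefficients[0] is the intercept b0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the model includes an intercept term.
    design = np.column_stack([np.ones(len(y)), X])
    # Solve min ||design @ b - y||^2 with lstsq rather than explicitly
    # inverting X'X, which is numerically more stable.
    coefficients, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefficients
    ss_res = residuals @ residuals
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    return coefficients, r_squared


# Hypothetical toy data generated exactly from y = 1 + 2*x1 - 3*x2,
# so the fit should recover intercept 1, slopes 2 and -3, and R^2 = 1.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1 + 2 * x1 - 3 * x2

coefficients, r_squared = fit_ols(np.column_stack([x1, x2]), y)
print("coefficients:", coefficients)
print("R^2:", r_squared)
```

Unlike R's `lm()` combined with `summary()`, this sketch reports no standard errors, t statistics or residual diagnostics, which is exactly the part where a dedicated statistics environment earns its keep.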

 
Alexandre Fayolle
-- 
LOGILAB, Paris (France).
http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org
Narval, the first software agent available as free software (GPL).