[SciPy-User] R vs Python for simple interactive data analysis

josef.pktd@gmai... josef.pktd@gmai...
Sat Aug 27 14:55:29 CDT 2011


On Sat, Aug 27, 2011 at 2:44 PM, Christopher Jordan-Squire
<cjordan1@uw.edu> wrote:
> On Sat, Aug 27, 2011 at 2:27 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
>> Hi,
>>
>> On Sat, Aug 27, 2011 at 11:19 AM, Christopher Jordan-Squire
>> <cjordan1@uw.edu> wrote:
>>> Hi--I've been a moderately heavy R user for the past two years, so
>>> about a month ago I took an (abbreviated) version of a simple data
>>> analysis I did in R and tried to rewrite as much of it as possible,
>>> line by line, into python using numpy and statsmodels. I didn't use
>>> pandas, and I can't comment on how much it might have simplified
>>> things.
>>>
>>> This comparison might be useful to some people, so I stuck it up on a
>>> github repo. My overall impression is that R is much stronger for
>>> interactive data analysis. Click on the link for more details why,
>>> which are summarized in the README file.
>>>
>>> https://github.com/chrisjordansquire/r_vs_py
>>>
>>> The code examples should run out of the box with no downloads (other
>>> than R, Python, numpy, scipy, and statsmodels) required.
>>
>> Thank you very much for doing that - it's a very useful exercise.  I
>> hope we can make use of it to discuss how to get better, in the true
>
> Hopefully. I suppose I should also mention, for those that don't want
> to click on the link, that the two largest reasons R was much simpler
> to use were because it was easier to construct models and easier to
> view entries I'd stuck into matrices. R's graphing capabilities seemed
> slightly more friendly, but that might have just been my familiarity
> with them.
>
> (As an aside, numpy arrays' print method doesn't make them friendly for
> interactive viewing. Even ipython couldn't make a few of the matrices
> I made very intelligible, and it's easy to construct examples that
> make numpy arrays hideous to behold. For example,

For interactive viewing, Spyder has an array viewer (the variable explorer)
similar to MATLAB's.

>
> x = np.arange(5).reshape(5,1)
> y = np.ones(5).reshape(1,5)
> z = x*y
> z[0,0] += 0.0001
> print z
>
> [[  1.00000000e-04   0.00000000e+00   0.00000000e+00   0.00000000e+00
>    0.00000000e+00]
>  [  1.00000000e+00   1.00000000e+00   1.00000000e+00   1.00000000e+00
>    1.00000000e+00]
>  [  2.00000000e+00   2.00000000e+00   2.00000000e+00   2.00000000e+00
>    2.00000000e+00]
>  [  3.00000000e+00   3.00000000e+00   3.00000000e+00   3.00000000e+00
>    3.00000000e+00]
>  [  4.00000000e+00   4.00000000e+00   4.00000000e+00   4.00000000e+00
>    4.00000000e+00]]

>>> from scikits.statsmodels.iolib import SimpleTable
>>> print SimpleTable(z)
======================
0.0001 0.0 0.0 0.0 0.0
 1.0   1.0 1.0 1.0 1.0
 2.0   2.0 2.0 2.0 2.0
 3.0   3.0 3.0 3.0 3.0
 4.0   4.0 4.0 4.0 4.0
----------------------

>>> z[0,0] = 1e-6
>>> print SimpleTable(z)
=====================
1e-06 0.0 0.0 0.0 0.0
 1.0  1.0 1.0 1.0 1.0
 2.0  2.0 2.0 2.0 2.0
 3.0  3.0 3.0 3.0 3.0
 4.0  4.0 4.0 4.0 4.0
---------------------
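Staying within numpy, the print options can also be adjusted. A minimal
sketch, assuming it is acceptable to suppress scientific notation and round
to four decimals (the precision value is an arbitrary choice for this
example, not something from the original code):

```python
import numpy as np

# Rebuild the array from the quoted example.
x = np.arange(5).reshape(5, 1)
y = np.ones(5).reshape(1, 5)
z = x * y
z[0, 0] += 0.0001

# suppress_small=True asks numpy not to switch to scientific notation
# just because one entry is tiny; precision=4 keeps enough digits to
# still show the 0.0001 entry.
text = np.array2string(z, precision=4, suppress_small=True)
print(text)
```

For a whole interactive session, np.set_printoptions(suppress=True,
precision=4) should have the same effect on plain print calls. As far as I
understand, numpy chooses scientific notation when the smallest nonzero
entry is very small relative to the largest, which would also explain why
the np.arange(1,6) variant mentioned below prints tolerably.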

>
> (Strangely, it looks much more tolerable if x  =
> np.arange(1,6).reshape(5,1) instead.)
>
> If you do the same thing in R,
>
> x = rep(0:4,5)
> x = matrix(x,ncol=5)
> x[1,1] = 0.000001
> x
>
> you get
>
>      [,1] [,2] [,3] [,4] [,5]
> [1,] 1e-06    0    0    0    0
> [2,] 1e+00    1    1    1    1
> [3,] 2e+00    2    2    2    2
> [4,] 3e+00    3    3    3    3
> [5,] 4e+00    4    4    4    4
>
> much more readable.)
>
>
> As a simple metric, my .r file was about 1/2 the size of the .py file,
> even though I couldn't do everything in python that I could in R.
> (These commands were meant to be entered interactively, so the length
> of the file is, perhaps, a more valid metric than usual to be
> concerned about.)

Predefining your categorical variables would save quite a few lines.
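
One way to read this suggestion, sketched with plain numpy: build the
indicator (dummy) columns for each categorical variable once, up front, and
reuse them in every model instead of recreating them per fit. The helper
name dummy_columns and the sample data are hypothetical and not taken from
the repo.

```python
import numpy as np

def dummy_columns(labels):
    """Return (levels, dummies) for a 1-D sequence of category labels.

    levels  : sorted unique labels
    dummies : float array of shape (n_obs, n_levels) with 0/1 indicators
    """
    labels = np.asarray(labels)
    # return_inverse gives, for each observation, the index of its level.
    levels, codes = np.unique(labels, return_inverse=True)
    # Compare each code against every level index to get one column per level.
    dummies = (codes.reshape(-1, 1) == np.arange(len(levels))).astype(float)
    return levels, dummies

# Made-up data for illustration only.
group = ["a", "b", "a", "c"]
levels, dummies = dummy_columns(group)
print(levels)
print(dummies)
```

When combining these columns with a constant in a regression, one level
usually has to be dropped (for example dummies[:, 1:]) to avoid perfect
collinearity.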

Josef

>
> -Chris Jordan-Squire
>
>
>> spirit of:
>>
>> Confront the Brutal Facts
>> http://en.wikipedia.org/wiki/Good_to_Great
>>
>> See you,
>>
>> Matthew
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>