[SciPy-User] R vs Python for simple interactive data analysis

Christopher Jordan-Squire cjordan1@uw....
Sat Aug 27 13:44:12 CDT 2011


On Sat, Aug 27, 2011 at 2:27 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
> Hi,
>
> On Sat, Aug 27, 2011 at 11:19 AM, Christopher Jordan-Squire
> <cjordan1@uw.edu> wrote:
>> Hi--I've been a moderately heavy R user for the past two years, so
>> about a month ago I took an (abbreviated) version of a simple data
>> analysis I did in R and tried to rewrite as much of it as possible,
>> line by line, into python using numpy and statsmodels. I didn't use
>> pandas, and I can't comment on how much it might have simplified
>> things.
>>
>> This comparison might be useful to some people, so I stuck it up on a
>> github repo. My overall impression is that R is much stronger for
>> interactive data analysis. Click on the link for more details why,
>> which are summarized in the README file.
>>
>> https://github.com/chrisjordansquire/r_vs_py
>>
>> The code examples should run out of the box with no downloads (other
>> than R, Python, numpy, scipy, and statsmodels) required.
>
> Thank you very much for doing that - it's a very useful exercise.  I
> hope we can make use of it to discuss how to get better, in the true

Hopefully. I suppose I should also mention, for those who don't want
to click on the link, that the two biggest reasons R was much simpler
to use were that it was easier to construct models and easier to
view entries I'd stuck into matrices. R's graphing capabilities seemed
slightly friendlier, but that might just have been my familiarity
with them.

(As an aside, the way numpy arrays print doesn't make them friendly
for interactive viewing. Even ipython couldn't make a few of the
matrices I made very intelligible, and it's easy to construct examples
that make numpy arrays hideous to behold. For example,

import numpy as np

x = np.arange(5).reshape(5,1)
y = np.ones(5).reshape(1,5)
z = x*y  # broadcasts to a 5x5 array; row i is filled with i
z[0,0] += 0.0001
print(z)

[[  1.00000000e-04   0.00000000e+00   0.00000000e+00   0.00000000e+00
    0.00000000e+00]
 [  1.00000000e+00   1.00000000e+00   1.00000000e+00   1.00000000e+00
    1.00000000e+00]
 [  2.00000000e+00   2.00000000e+00   2.00000000e+00   2.00000000e+00
    2.00000000e+00]
 [  3.00000000e+00   3.00000000e+00   3.00000000e+00   3.00000000e+00
    3.00000000e+00]
 [  4.00000000e+00   4.00000000e+00   4.00000000e+00   4.00000000e+00
    4.00000000e+00]]

(Strangely, it looks much more tolerable if
x = np.arange(1,6).reshape(5,1) instead.)

If you do the same thing in R,

x = rep(0:4,5)
x = matrix(x,ncol=5)
x[1,1] = 0.000001
x

you get

      [,1] [,2] [,3] [,4] [,5]
[1,] 1e-06    0    0    0    0
[2,] 1e+00    1    1    1    1
[3,] 2e+00    2    2    2    2
[4,] 3e+00    3    3    3    3
[5,] 4e+00    4    4    4    4

much more readable.)
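
A minimal sketch of one way to tame the numpy output above, assuming
numpy's print options are acceptable for interactive use. The settings
(suppress=True and a precision of 4) are illustrative choices, not
something taken from the original comparison:

```python
import numpy as np

# Same construction as the example above.
x = np.arange(5).reshape(5, 1)
y = np.ones(5).reshape(1, 5)
z = x * y
z[0, 0] += 0.0001

# suppress=True asks numpy to avoid scientific notation for small values;
# precision=4 caps the digits shown after the decimal point.
# Both settings are global and affect every array printed afterwards.
np.set_printoptions(suppress=True, precision=4)
print(z)

# array2string formats a single array with the same kind of options,
# passed explicitly rather than read from the global settings.
text = np.array2string(z, precision=4, suppress_small=True)
print(text)
```

With these options the small entry is shown in fixed-point form rather
than forcing the whole array into exponent notation.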


As a simple metric, my .r file was about half the size of the .py file,
even though I couldn't do everything in python that I could in R.
(These commands were meant to be entered interactively, so the length
of the file is, perhaps, a more valid metric than usual to be
concerned about.)

-Chris Jordan-Squire


> spirit of:
>
> Confront the Brutal Facts
> http://en.wikipedia.org/wiki/Good_to_Great
>
> See you,
>
> Matthew
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

