[SciPy-User] R vs Python for simple interactive data analysis
josef.pktd@gmai...
Sat Aug 27 14:55:29 CDT 2011
On Sat, Aug 27, 2011 at 2:44 PM, Christopher Jordan-Squire
<cjordan1@uw.edu> wrote:
> On Sat, Aug 27, 2011 at 2:27 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
>> Hi,
>>
>> On Sat, Aug 27, 2011 at 11:19 AM, Christopher Jordan-Squire
>> <cjordan1@uw.edu> wrote:
>>> Hi--I've been a moderately heavy R user for the past two years, so
>>> about a month ago I took an (abbreviated) version of a simple data
>>> analysis I did in R and tried to rewrite as much of it as possible,
>>> line by line, into python using numpy and statsmodels. I didn't use
>>> pandas, and I can't comment on how much it might have simplified
>>> things.
>>>
>>> This comparison might be useful to some people, so I stuck it up on a
>>> github repo. My overall impression is that R is much stronger for
>>> interactive data analysis. Click on the link for more details on why,
>>> which are summarized in the README file.
>>>
>>> https://github.com/chrisjordansquire/r_vs_py
>>>
>>> The code examples should run out of the box with no downloads (other
>>> than R, Python, numpy, scipy, and statsmodels) required.
>>
>> Thank you very much for doing that - it's a very useful exercise. I
>> hope we can make use of it to discuss how to get better, in the true
>
> Hopefully. I suppose I should also mention, for those that don't want
> to click on the link, that the two largest reasons R was much simpler
> to use were that it was easier to construct models and easier to
> view entries I'd stuck into matrices. R's graphing capabilities seemed
> slightly more friendly, but that might have just been my familiarity
> with them.
>
> (As an aside, numpy arrays' print method doesn't make them friendly for
> interactive viewing. Even ipython couldn't make a few of the matrices
> I made very intelligible, and it's easy to construct examples that
> make numpy arrays hideous to behold. For example,
For interactive viewing, Spyder has an array viewer (the variable
explorer) similar to MATLAB's.
>
> x = np.arange(5).reshape(5,1)
> y = np.ones(5).reshape(1,5)
> z = x*y
> z[0,0] += 0.0001
> print z
>
> [[ 1.00000000e-04 0.00000000e+00 0.00000000e+00 0.00000000e+00
> 0.00000000e+00]
> [ 1.00000000e+00 1.00000000e+00 1.00000000e+00 1.00000000e+00
> 1.00000000e+00]
> [ 2.00000000e+00 2.00000000e+00 2.00000000e+00 2.00000000e+00
> 2.00000000e+00]
> [ 3.00000000e+00 3.00000000e+00 3.00000000e+00 3.00000000e+00
> 3.00000000e+00]
> [ 4.00000000e+00 4.00000000e+00 4.00000000e+00 4.00000000e+00
> 4.00000000e+00]]
>>> from scikits.statsmodels.iolib import SimpleTable
>>> print SimpleTable(z)
======================
0.0001 0.0 0.0 0.0 0.0
1.0 1.0 1.0 1.0 1.0
2.0 2.0 2.0 2.0 2.0
3.0 3.0 3.0 3.0 3.0
4.0 4.0 4.0 4.0 4.0
----------------------
>>> z[0,0] = 1e-6
>>> print SimpleTable(z)
=====================
1e-06 0.0 0.0 0.0 0.0
1.0 1.0 1.0 1.0 1.0
2.0 2.0 2.0 2.0 2.0
3.0 3.0 3.0 3.0 3.0
4.0 4.0 4.0 4.0 4.0
---------------------
>
> (Strangely, it looks much more tolerable if x =
> np.arange(1,6).reshape(5,1) instead.)
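A possible explanation for the difference, offered as a sketch rather than a definitive account: as far as I understand numpy's float formatter, it switches the whole array to scientific notation when the largest and smallest nonzero magnitudes differ by a large factor. With np.arange(5) the smallest nonzero entry is 0.0001, while with np.arange(1,6) it is 1.0001, so only the first case triggers it. The suppress_small argument of np.array2string (or np.set_printoptions(suppress=True) for the whole session) asks numpy to stay in fixed-point notation. The variable names below are only illustrative, and the example uses Python 3 syntax rather than the Python 2 syntax quoted above.

```python
import numpy as np

# Same construction as in the quoted example.
x = np.arange(5).reshape(5, 1)
y = np.ones(5).reshape(1, 5)
z = x * y
z[0, 0] += 0.0001

# Default formatting: the wide range of magnitudes triggers
# scientific notation for every entry.
default_text = np.array2string(z)

# suppress_small=True keeps fixed-point notation instead.
fixed_text = np.array2string(z, suppress_small=True)

# The np.arange(1, 6) variant: the smallest entry is 1.0001, so the
# range of magnitudes is narrow and the default output is fixed-point.
z2 = np.arange(1, 6).reshape(5, 1) * y
z2[0, 0] += 0.0001
narrow_text = np.array2string(z2)

print(default_text)
print(fixed_text)
print(narrow_text)
```

Calling np.set_printoptions(suppress=True) once at the start of an interactive session should have the same effect on plain print calls.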
>
> If you do the same thing in R,
>
> x = rep(0:4,5)
> x = matrix(x,ncol=5)
> x[1,1] = 0.000001
> x
>
> you get
>
>       [,1] [,2] [,3] [,4] [,5]
> [1,] 1e-06    0    0    0    0
> [2,] 1e+00    1    1    1    1
> [3,] 2e+00    2    2    2    2
> [4,] 3e+00    3    3    3    3
> [5,] 4e+00    4    4    4    4
>
> much more readable.)
>
>
> As a simple metric, my .r file was about 1/2 the size of the .py file,
> even though I couldn't do everything in python that I could in R.
> (These commands were meant to be entered interactively, so the length
> of the file is, perhaps, a more valid metric than usual to be
> concerned about.)
Predefining your categorical variables would save quite a few lines.
Josef
>
> -Chris Jordan-Squire
>
>
>> spirit of:
>>
>> Confront the Brutal Facts
>> http://en.wikipedia.org/wiki/Good_to_Great
>>
>> See you,
>>
>> Matthew
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>