[SciPy-User] calculating the mean for each factor (like tapply in R)
Wes McKinney
wesmckinn@gmail....
Wed Aug 1 20:32:39 CDT 2012
On Wed, Aug 1, 2012 at 9:35 AM, Oleksandr Huziy <guziy.sasha@gmail.com> wrote:
> Hi,
>
> It is pretty much the same as looping, but you could do the following
>
> In [1]: import numpy as np
>
> In [2]: x = np.array([10,13,12,3,4,6,33,44,55])
>
> In [3]: exps = np.array([1,1,1,2,2,2,3,3,3])
>
> In [4]: z = [np.mean(x[exps == i]) for i in np.unique(exps)]
>
> --
> Oleksandr (Sasha) Huziy
>
>
> 2012/8/1 Andreas Hilboll <lists@hilboll.de>
>>
>> > Hi there,
>> >
>> > I've just moved from R to IPython and wondered if there was a good way of
>> > finding the means and/or variance of values in a dataframe given a factor
>> >
>> > e.g.:
>> > if df =
>> >  x  experiment
>> > 10  1
>> > 13  1
>> > 12  1
>> >  3  2
>> >  4  2
>> >  6  2
>> > 33  3
>> > 44  3
>> > 55  3
>> >
>> > in tapply you would do:
>> >
>> > tapply(df$x, list(df$experiment), mean)
>> > tapply(df$x, list(df$experiment), var)
>> >
>> > I guess I can always loop through the array for each experiment type, but
>> > thought that this is the kind of functionality that would be included in a
>> > core library.
>>
>> Pandas (http://pandas.pydata.org/) seems to be what you're looking for. It
>> has a DataFrame class which allows grouping of data.
>>
>> Cheers, Andreas.
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>
For the #lazyweb, here is what this looks like in pandas:
In [24]: df
Out[24]:
    x  experiment
0  10           1
1  13           1
2  12           1
3   3           2
4   4           2
5   6           2
6  33           3
7  44           3
8  55           3

In [25]: df.groupby('experiment').x.mean()
Out[25]:
experiment
1    11.666667
2     4.333333
3    44.000000
Name: x

In [26]: df.groupby('experiment').x.var()
Out[26]:
experiment
1      2.333333
2      2.333333
3    121.000000
Name: x

or if you want to be fancy:
In [27]: df.groupby('experiment').x.agg(['mean', 'var'])
Out[27]:
                 mean         var
experiment
1           11.666667    2.333333
2            4.333333    2.333333
3           44.000000  121.000000

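The session above starts from an existing df. A minimal self-contained sketch that builds the same frame from the values in the original question and computes both statistics in one call might look like this (the variable name summary is only illustrative):

```python
import pandas as pd

# Same data as in the original question: one value column, one factor column.
df = pd.DataFrame({
    "x": [10, 13, 12, 3, 4, 6, 33, 44, 55],
    "experiment": [1, 1, 1, 2, 2, 2, 3, 3, 3],
})

# Group by the factor, then compute the mean and the sample variance
# of x within each group (var uses ddof=1 by default, like R's var).
summary = df.groupby("experiment")["x"].agg(["mean", "var"])
print(summary)
```

The result is a DataFrame indexed by experiment with one column per statistic, so a single value can be looked up with summary.loc[1, "mean"].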
There are good reasons to use pandas over a DIY approach with NumPy
array operations; notably I use smart algorithms so that the runtime
scales linearly with the size of the data instead of quadratically.
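To illustrate the scaling point: the list-comprehension approach quoted earlier builds one boolean mask over the whole array per group, so its cost grows with (number of rows) x (number of groups). A rough NumPy sketch that avoids the per-group scan is shown below. It is not a reproduction of what pandas does internally, only an assumption-laden illustration: np.unique sorts the labels, so the labelling step is O(n log n), and the np.bincount passes after it are linear in the number of rows.

```python
import numpy as np

x = np.array([10, 13, 12, 3, 4, 6, 33, 44, 55], dtype=float)
exps = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])

# Map each factor level to an integer code 0..k-1.
labels, codes = np.unique(exps, return_inverse=True)

# One pass each for the group sizes and the group sums.
counts = np.bincount(codes)
sums = np.bincount(codes, weights=x)
means = sums / counts

# Sample variance (ddof=1): sum the squared deviations per group.
# Groups with a single member would divide by zero here.
sq_dev = np.bincount(codes, weights=(x - means[codes]) ** 2)
variances = sq_dev / (counts - 1)

for label, m, v in zip(labels, means, variances):
    print(label, m, v)
```

For real work the groupby call is still the simpler choice, since it also handles missing values and non-numeric keys.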
- Wes