[Numpy-discussion] How to implement a 'pivot table?'

Travis Vaught travis@enthought....
Wed Aug 1 23:22:31 CDT 2007


Greetings,

Speaking of brute force... I've attached a rather ugly module that  
let's you do things with a pretty simple interface (session shown  
below).  I haven't fully tested the performance, but a million  
records with 5 fields takes about 11 seconds on my Mac to do a  
'mean'.  I'm not sure what your performance considerations are, but  
this may be useful.  Record arrays are really nice if they make sense  
for your data.

Travis


(from an ipython command prompt)

In [1]: import testpivot as p

In [2]: a = p.sample_data()

In [3]: a
Out[3]:
recarray([('ACorp', 'Region 1', 'Q1', 20000.0),
        ('ACorp', 'Region 1', 'Q2', 22000.0),
        ('ACorp', 'Region 1', 'Q3', 21000.0),
        ('ACorp', 'Region 1', 'Q4', 26000.0),
        ('ACorp', 'Region 2', 'Q1', 23000.0),
        ('ACorp', 'Region 2', 'Q2', 20000.0),
        ('ACorp', 'Region 2', 'Q3', 22000.0),
        ('ACorp', 'Region 2', 'Q4', 21000.0),
        ('ACorp', 'Region 3', 'Q1', 26000.0),
        ('ACorp', 'Region 3', 'Q2', 23000.0),
        ('ACorp', 'Region 3', 'Q3', 29000.0),
        ('ACorp', 'Region 3', 'Q4', 27000.0),
        ('BCorp', 'Region 1', 'Q1', 20000.0),
        ('BCorp', 'Region 1', 'Q2', 20000.0),
        ('BCorp', 'Region 1', 'Q3', 24000.0),
        ('BCorp', 'Region 1', 'Q4', 24000.0),
        ('BCorp', 'Region 2', 'Q1', 21000.0),
        ('BCorp', 'Region 2', 'Q2', 21000.0),
        ('BCorp', 'Region 2', 'Q3', 22000.0),
        ('BCorp', 'Region 2', 'Q4', 29000.0),
        ('BCorp', 'Region 3', 'Q1', 28000.0),
        ('BCorp', 'Region 3', 'Q2', 25000.0),
        ('BCorp', 'Region 3', 'Q3', 22000.0),
        ('BCorp', 'Region 3', 'Q4', 21000.0)],
       dtype=[('company', '|S5'), ('region', '|S8'), ('quarter', '| 
S2'), ('income', '<f8')])

In [4]: p.pivot(a, 'company', 'region', 'income', p.psum)
######## Summary by company and region ##########
cols:['ACorp' 'BCorp']
rows:['Region 1' 'Region 2' 'Region 3']
[[  89000.   88000.]
[  86000.   93000.]
[ 105000.   96000.]]

In [5]: p.pivot(a, 'company', 'quarter', 'income', p.psum)
######## Summary by company and quarter ##########
cols:['ACorp' 'BCorp']
rows:['Q1' 'Q2' 'Q3' 'Q4']
[[ 69000.  69000.]
[ 65000.  66000.]
[ 72000.  68000.]
[ 74000.  74000.]]

In [6]: p.pivot(a, 'company', 'quarter', 'income', p.pmean)
######## Summary by company and quarter ##########
cols:['ACorp' 'BCorp']
rows:['Q1' 'Q2' 'Q3' 'Q4']
[[ 23000.          23000.        ]
[ 21666.66666667  22000.        ]
[ 24000.          22666.66666667]
[ 24666.66666667  24666.66666667]]

-------------- next part --------------
A non-text attachment was scrubbed...
Name: testpivot.py
Type: text/x-python-script
Size: 3833 bytes
Desc: not available
Url : http://projects.scipy.org/pipermail/numpy-discussion/attachments/20070801/82cd7c40/attachment.bin 
-------------- next part --------------


On Aug 1, 2007, at 2:02 PM, Bruce Southey wrote:

> Hi,
> The hard part is knowing what aggregate function that you want. So a
> hard way, even after cheating, to take the data provided is given
> below. (The Numpy Example List was very useful especially on the where
> function)!
>
> I tried to be a little generic so you can replace the sum by any
> suitable function and probably the array type as well. Of course it is
> not complete because you still need to know the levels of the 'rows'
> and 'columns' and also is not efficient as it has loops.
>
> Bruce
>
> from numpy import *
> A=array([[1,1,10],
>          [1,1,20],
>          [1,2,30],
>          [2,1,40],
>          [2,2,50],
>          [2,2,60] ])
> C = zeros((2,2))
>
> for i in range(2):
>       crit1 = (A[:,0]==1+i)
>       subA=A[crit1,1:]
>       for j in range(2):
>             crit2 = (subA[:,0]==1+j)
>             subB=subA[crit2,1:]
>             C[i,j]=subB.sum()
>
>
> print C
>
> On 7/30/07, Geoffrey Zhu <zyzhu2000@gmail.com> wrote:
>> Hi Everyone,
>>
>> I am wondering what is the best (and fast) way to build a pivot table
>> aside from the 'brute force way?'
>>
>> I want to transform an numpy array into a pivot table. For  
>> example, if
>> I have a numpy array like below:
>>
>> Region     Date          # of Units
>> ----------    ----------        --------------
>> East        1/1             10
>> East        1/1             20
>> East        1/2             30
>> West       1/1             40
>> West       1/2             50
>> West       1/2             60
>>
>> I want  to transform this into the following table, where f() is a
>> given aggregate function:
>>
>>            Date
>> Region           1/1          1/2
>> ----------
>> East         f(10,20)         f(30)
>> West        f(40)             f(50,60)
>>
>>
>> I can regroup them into 'sets' and do it the brute force way, but  
>> that
>> is kind of slow to execute. Does anyone know a better way?
>>
>>
>> Thanks,
>> Geoffrey
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion@scipy.org
>> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>



More information about the Numpy-discussion mailing list