[Numpy-discussion] recarray slow?

Pauli Virtanen pav@iki...
Wed Jul 21 14:44:14 CDT 2010


Wed, 21 Jul 2010 15:12:14 -0400, wheres pythonmonks wrote:

> I have a recarray -- the first column is date.
> 
> I have the following function to compute the number of unique dates in
> my data set:
> 
> 
> def byName(): return(len(list(set(d['Date'])) ))

What this code does is:

1. d['Date']

   Extract an array slice containing the dates. This is fast.

2. set(d['Date'])

   Make a copy of each array item and box it into a Python object.
   This is slow.

   Insert each of the objects into the set. This is also somewhat slow.

3. list(set(d['Date']))

   Get each item from the set and insert it into a new list.
   This is somewhat slow, and unnecessary if you only want to
   count.

4. len(list(set(d['Date'])))

   Count the items in the list. This is fast.


So the slowness arises because the code is copying data around, and 
boxing it into Python objects.
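
As a rough sketch of the steps above, written out one at a time. The
array `d` here is a hypothetical stand-in for the original data: the
real dates are not shown in the post, so plain int64 values are assumed.

```python
import numpy as np

# Hypothetical stand-in for the poster's recarray `d`.
# Dates are assumed to be stored as int64 values such as 20100719.
dates = np.array([20100719, 20100720, 20100720, 20100721, 20100719],
                 dtype=np.int64)
d = np.rec.fromarrays([dates], names="Date")

# Step 1: field access on the record array.
col = d["Date"]

# Iterating over the array (which set() does internally) creates one
# separate Python-level scalar object per element. This is the boxing.
boxed = [x for x in col]

# Step 2: every boxed object is hashed and inserted into the set.
unique_set = set(col)

# Step 3: the set is copied into a new list (not needed just to count).
unique_list = list(unique_set)

# Step 4: count the items.
n_unique = len(unique_list)
```

Note that len(unique_set) would already give the count, so step 3 can
be dropped even if the set-based approach is kept.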

You should try using the NumPy set routines, such as numpy.unique, to do
this. They work on the array directly and don't re-box the data:
http://docs.scipy.org/doc/numpy/reference/routines.set.html
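
For example, a minimal sketch using numpy.unique. The sample array and
the function names are made up for illustration, and the dates are
again assumed to be int64 values rather than whatever type the original
data uses.

```python
import numpy as np

# Hypothetical sample data standing in for the original recarray.
dates = np.array([20100719, 20100720, 20100720, 20100721, 20100719],
                 dtype=np.int64)
d = np.rec.fromarrays([dates], names="Date")


def count_unique_dates(rec):
    """Count distinct dates with numpy.unique, which sorts and
    de-duplicates the column without building a Python set."""
    return len(np.unique(rec["Date"]))


def count_unique_dates_set(rec):
    """The set-based count, without the unneeded list(), for comparison."""
    return len(set(rec["Date"]))
```

Both functions should return the same count. How large the speed
difference is depends on the array size and dtype, so it is worth
timing both on the real data.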

-- 
Pauli Virtanen


