[Numpy-discussion] Timing array construction

Bruce Southey bsouthey@gmail....
Thu Apr 30 14:32:47 CDT 2009


Mark Janikas wrote:
> Thanks Eric!
>
> I have a lot of array constructions in my code that use NUM.array([list of values])... I am going to replace them with the empty allocation and insertion.  It is indeed twice as fast as "c_" (when it matters, i.e. when N is relatively large):
>
> N, "c_", "empty"
> 100, 0.0007, 0.0230
> 200, 0.0007, 0.0002
> 400, 0.0007, 0.0002
> 800, 0.0020, 0.0002
> 1600, 0.0009, 0.0003
> 3200, 0.0010, 0.0003
> 6400, 0.0013, 0.0005
> 12800, 0.0058, 0.0032
>
> -----Original Message-----
> From: numpy-discussion-bounces@scipy.org [mailto:numpy-discussion-bounces@scipy.org] On Behalf Of Eric Firing
> Sent: Wednesday, April 29, 2009 11:49 PM
> To: Discussion of Numerical Python
> Subject: Re: [Numpy-discussion] Timing array construction
>
> Mark Janikas wrote:
>
>> Hello All,
>>
>> I was exploring some different ways to concatenate arrays, and using 
>> "c_" is the fastest by far.  Is there a difference I am missing that can 
>> account for the huge disparity?  Obviously the "zip" function makes the 
>> "asarray" and "array" calls slower, but the same arguments (xCoords, 
>> yCoords) are being passed to the methods... so if there is no difference 
>> in the outputs (there doesn't appear to be) then what reason would I 
>> have to use "array" or "asarray" in this context?  Thanks so much ahead 
>> of time.
>
> If you really want speed, use something like this:
>
> import numpy as np
> def useEmpty(xCoords, yCoords):
>      out = np.empty((len(xCoords), 2), dtype=xCoords.dtype)
>      out[:,0] = xCoords
>      out[:,1] = yCoords
>      return out
>
> It is quite a bit faster than using c_; more than a factor of two on my 
> machine for all your test cases.
>
> All your methods using zip and array are doing a lot of unpacking, 
> repacking, checking, iterating... Even the c_ method is slower than it 
> needs to be for this case because it is more general and flexible.
>
> Eric
>   
>> MJ
>>
>> ############## Snippet ###################
>>
>> import numpy as NUM
>>
>> def useAsArray(xCoords, yCoords):
>>     return NUM.asarray(zip(xCoords, yCoords))
>>
>> def useArray(xCoords, yCoords):
>>     return NUM.array(zip(xCoords, yCoords))
>>
>> def useC(xCoords, yCoords):
>>     return NUM.c_[xCoords, yCoords]
>>
>> if __name__ == "__main__":
>>     from timeit import Timer
>>     import numpy.random as RAND
>>     import collections as COLL
>>
>>     resAsArray = COLL.defaultdict(float)
>>     resArray = COLL.defaultdict(float)
>>     resMat = COLL.defaultdict(float)
>>     numTests = 0.0
>>     sameTests = 0.0
>>     N = [100, 200, 400, 800, 1600, 3200, 6400, 12800]
>>     for i in N:
>>         print "Time Join List into Array for N = " + str(i)
>>         xCoords = RAND.normal(10, 1, i)
>>         yCoords = RAND.normal(10, 1, i)
>>
>>         statement = 'from __main__ import xCoords, yCoords, useAsArray'
>>         t1 = Timer('useAsArray(xCoords, yCoords)', statement)
>>         resAsArray[i] = t1.timeit(10)
>>
>>         statement = 'from __main__ import xCoords, yCoords, useArray'
>>         t2 = Timer('useArray(xCoords, yCoords)', statement)
>>         resArray[i] = t2.timeit(10)
>>
>>         statement = 'from __main__ import xCoords, yCoords, useC'
>>         t3 = Timer('useC(xCoords, yCoords)', statement)
>>         resMat[i] = t3.timeit(10)
>>
>>     for n in N:
>>         print "%i, %0.4f, %0.4f, %0.4f" % (n, resAsArray[n], resArray[n], resMat[n])
>>
>> ###############################################################
>>
>> RESULT
>>
>> N, useAsArray, useArray, useC
>> 100, 0.0066, 0.0065, 0.0007
>> 200, 0.0137, 0.0140, 0.0008
>> 400, 0.0277, 0.0288, 0.0007
>> 800, 0.0579, 0.0577, 0.0008
>> 1600, 0.1175, 0.1289, 0.0009
>> 3200, 0.2291, 0.2309, 0.0012
>> 6400, 0.4561, 0.4564, 0.0013
>> 12800, 0.9218, 0.9122, 0.0019
>>
>> Mark Janikas
>> Product Engineer
>> ESRI, Geoprocessing
>> 380 New York St.
>> Redlands, CA 92373
>> 909-793-2853 (2563)
>> mjanikas@esri.com

Hi,
Because the result you want is a 2-D array with the inputs as its columns, you can also use column_stack:
numpy.column_stack((xCoords, yCoords))
numpy.concatenate() is the more general function: it joins arrays along any existing axis.
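
As a small sketch of the two calls (the three-element arrays below are made up for illustration and are not from the thread), concatenate needs the 1-D inputs reshaped into columns first, while column_stack does that step itself:

```python
import numpy as np

# Made-up stand-ins for xCoords and yCoords.
xCoords = np.array([1.0, 2.0, 3.0])
yCoords = np.array([4.0, 5.0, 6.0])

# column_stack turns each 1-D input into a column of the 2-D result.
stacked = np.column_stack((xCoords, yCoords))

# concatenate only joins along an axis that already exists, so each
# 1-D input is first given a second axis, shape (N,) -> (N, 1).
joined = np.concatenate(
    (xCoords[:, np.newaxis], yCoords[:, np.newaxis]), axis=1
)

print(stacked.shape)                    # (3, 2)
print(np.array_equal(stacked, joined))  # True
```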

While not as fast as preallocating with numpy.empty(), column_stack does 
provide a more readable and flexible syntax (for example, you do not have 
to know in advance how many columns there will be).
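
To make the trade-off concrete, here is a rough sketch rather than a benchmark. The helper names use_empty and use_column_stack, the third array z, and the array length are assumptions for illustration. use_empty generalises the useEmpty function quoted above to any number of columns, which takes an explicit loop, whereas column_stack accepts any number of inputs directly:

```python
import timeit

import numpy as np


def use_empty(*cols):
    """Preallocate the result, then fill it one column at a time."""
    out = np.empty((len(cols[0]), len(cols)), dtype=np.result_type(*cols))
    for j, col in enumerate(cols):
        out[:, j] = col
    return out


def use_column_stack(*cols):
    """Let column_stack work out the number of columns itself."""
    return np.column_stack(cols)


# Illustrative inputs, drawn the same way as in the original snippet.
x = np.random.normal(10, 1, 1000)
y = np.random.normal(10, 1, 1000)
z = np.random.normal(10, 1, 1000)

# Both approaches should build the same (1000, 3) array.
assert np.array_equal(use_empty(x, y, z), use_column_stack(x, y, z))

# Timings depend on the machine and NumPy version, so none are quoted here.
for func in (use_empty, use_column_stack):
    elapsed = timeit.timeit(lambda: func(x, y, z), number=1000)
    print("%s: %.4f s for 1000 calls" % (func.__name__, elapsed))
```

If you run this, treat the numbers as indicative only; the relative ordering may differ between small and large N, as the tables earlier in the thread suggest.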

Bruce