[Numpy-discussion] Alternative to record array

Ryan May rmay31@gmail....
Mon Dec 29 11:38:56 CST 2008


Jean-Baptiste Rudant wrote:
> Hello,
> 
> I like to use record arrays to access fields by their name, and because
> they are easy to use with pytables. But I think it's not very efficient
> for what I have to do. Maybe I'm misunderstanding something.
> 
> Example : 
> 
> import numpy as np
> age = np.random.randint(0, 99, int(10e6))
> weight = np.random.randint(0, 200, int(10e6))
> data = np.rec.fromarrays((age, weight), names='age, weight')
> # the kind of operations I do is :
> data.age += 1
> # but it's far less efficient than doing :
> age += 1
> # because I think the record array stores
> # [(age_0, weight_0) ... (age_n, weight_n)]
> # and not [age_0 ... age_n] then [weight_0 ... weight_n].
> 
> So I think I don't use record arrays for the right purpose. I only need
> something which would make it easy to manipulate data by accessing
> fields by their name.
> 
> Am I wrong ? Is there something in numpy for my purpose ? Do I have to
> implement my own class, with something like :
> 
> 
> class FieldArray:
>     def __init__(self, array_dict):
>         self.array_list = array_dict
>             
>     def __getitem__(self, field):
>         return self.array_list[field]
>     
>     def __setitem__(self, field, value):
>         self.array_list[field] = value
>     
> my_arrays = {'age': age, 'weight' : weight}
> data = FieldArray(my_arrays)
> 
> data['age'] += 1

You can get the by-name field access of your FieldArray class from a plain
array with a structured numpy dtype:

	import numpy as np
	dt = np.dtype([('age', np.int32), ('weight', np.int32)])
	N = int(10e6)
	data = np.empty(N, dtype=dt)
	data['age'] = np.random.randint(0, 99, N)
	data['weight'] = np.random.randint(0, 200, N)

	data['age'] += 1
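
On the storage-layout question: a structured array does keep each record's fields next to each other, so data['age'] is a strided view into the records rather than a contiguous block. The sketch below is one way to check that, assuming the same two int32 fields as above; the names age_view and standalone and the small N are only for illustration.

```python
import numpy as np

# Small N so the check runs instantly; the thread uses 10e6 elements.
N = 1000
dt = np.dtype([('age', np.int32), ('weight', np.int32)])
data = np.zeros(N, dtype=dt)

age_view = data['age']                    # a view into the records, not a copy
standalone = np.zeros(N, dtype=np.int32)  # a separate, contiguous array

print(dt.itemsize)          # bytes per (age, weight) record
print(age_view.strides)     # step in bytes between consecutive ages
print(standalone.strides)   # step in bytes in the contiguous array
print(np.shares_memory(age_view, data))

age_view += 1               # in-place update writes through to data
```

With two 4-byte fields, each record takes 8 bytes, so consecutive ages sit 8 bytes apart instead of 4. In-place updates on a field therefore touch twice as much memory as the same update on a standalone array.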

Timing for recarrays (your code):

In [10]: timeit data.age += 1
10 loops, best of 3: 221 ms per loop

Timing for my example:

In [2]: timeit data['age']+=1
10 loops, best of 3: 150 ms per loop
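
To reproduce the comparison outside IPython, something like the following should work. It is only a sketch: the names rec, struct and plain are made up here, N is reduced to keep the run short, and the third case (a separate contiguous array, as in the original "age += 1") is added for reference. Absolute numbers will depend on the machine and the numpy version.

```python
import timeit
import numpy as np

N = 100_000  # smaller than the 10e6 used in the thread
rng = np.random.default_rng(0)
age = rng.integers(0, 99, N).astype(np.int32)
weight = rng.integers(0, 200, N).astype(np.int32)

# Record array with attribute access, as in the original question.
rec = np.rec.fromarrays((age, weight), names='age,weight')

# Plain structured array with access by field name.
dt = np.dtype([('age', np.int32), ('weight', np.int32)])
struct = np.empty(N, dtype=dt)
struct['age'] = age
struct['weight'] = weight

# Separate contiguous array for comparison.
plain = age.copy()

def bump_rec():
    rec.age += 1

def bump_struct():
    struct['age'] += 1

def bump_plain():
    np.add(plain, 1, out=plain)  # in-place, same effect as plain += 1

for label, fn in [('recarray attribute', bump_rec),
                  ('structured field', bump_struct),
                  ('separate array', bump_plain)]:
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f'{label}: {best * 1e3:.3f} ms per loop')
```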

Hope this helps.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

