[Numpy-discussion] recarray slow?

wheres pythonmonks wherespythonmonks@gmail....
Wed Jul 21 15:57:31 CDT 2010


My code had a bug:

idx_by_name = dict((n,i) for i,n in enumerate(d.dtype.names))
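
For context, here is a minimal sketch of how that mapping can be used. The sample array and the 'Price' and 'Volume' field names are made up for illustration; only 'Date' and the idx_by_name line come from this thread.

```python
import numpy as np

# Hypothetical sample data; field names other than 'Date' are invented.
d = np.array(
    [(20100721, 1.5, 100), (20100722, 2.5, 200)],
    dtype=[("Date", "i8"), ("Price", "f8"), ("Volume", "i4")],
)

# Map each field name to its position within a record.
idx_by_name = dict((n, i) for i, n in enumerate(d.dtype.names))

row = d[0]
# A single record accepts either a field name or an integer position.
date_by_position = row[idx_by_name["Date"]]
date_by_name = row["Date"]
```

Both lookups return the same value, so the dict only helps if positional access turns out to be measurably faster for the data at hand.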



On Wed, Jul 21, 2010 at 4:49 PM, Pauli Virtanen <pav@iki.fi> wrote:
> Wed, 21 Jul 2010 16:22:37 -0400, wheres pythonmonks wrote:
>> However: is there an automatic way to convert a named index to a
>> position?
>
> It's not really a named index -- it's a field name. Since the fields of
> an array element can be of different size, they cannot be referred to
> with an array index (in the sense that Numpy understands the concept).
>
>> What about looping over tuples of my recarray:
>>
>> for t in d:
>>     date = t['Date']
>>     ....
>>
>> I guess that the above does have to lookup 'Date' each time.
>
> As Pierre said, you can move the lookups outside the loop.
>
>        for date in d['Date']:
>            ...
>
> If you want to iterate over multiple fields, it may be best to use
> itertools.izip so that you unbox a single element at a time.
>
> However, I'd be quite surprised if the hash lookups would actually take a
> significant part of the run time:
>
> 1) Python dictionaries are ubiquitous and the implementation appears
>   heavily optimized to be fast with strings.
>
> 2) The hash of a Python string is cached, and computed only once.
>
> 3) Identifier-like string literals such as 'Date' are interned (in
>   CPython), and represented by a single object only:
>
>   >>> 'Date' is 'Date'
>   True
>
>   So when running the above Python code, the hash of 'Date' is computed
>   exactly once.
>
> 4) For small dictionaries containing strings, such as the fields
>   dictionary, I'd expect 1-3) to be dwarfed by the overhead involved
>   in making Python function calls (PyArg_*) and interpreting the
>   bytecode.
>
> So the usual optimization mantra applies here: measure first :)
>
> Of course, if you measure and show that the expectations 1-4) are
> actually wrong, that's fine.
>
> --
> Pauli Virtanen
>
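
The suggestions quoted above (hoist the field lookup out of the loop, iterate over several fields together, and measure first) could be sketched roughly as follows. This is an illustration under assumptions, not code from the thread: the array contents, the 'Price' field and the function names are hypothetical, and zip is used as the Python 3 counterpart of itertools.izip.

```python
import timeit
import numpy as np

# Hypothetical data; only the 'Date' field name comes from the thread.
n = 1000
d = np.zeros(n, dtype=[("Date", "i8"), ("Price", "f8")])
d["Date"] = np.arange(n)
d["Price"] = np.arange(n) * 0.5

def per_row_lookup(arr):
    # The field lookup is repeated for every record.
    total = 0
    for t in arr:
        total += t["Date"]
    return total

def hoisted_lookup(arr):
    # The field lookup happens once; the loop walks a plain column.
    total = 0
    for date in arr["Date"]:
        total += date
    return total

def two_columns(arr):
    # Iterate over two fields in lockstep, one element of each at a time.
    total = 0.0
    for date, price in zip(arr["Date"], arr["Price"]):
        total += date * price
    return total

if __name__ == "__main__":
    # Measure before drawing conclusions; timings vary by machine.
    for fn in (per_row_lookup, hoisted_lookup):
        seconds = timeit.timeit(lambda: fn(d), number=20)
        print(fn.__name__, seconds)
```

The two single-field loops compute the same sum, so any difference in the printed timings reflects how the records are accessed rather than the work done per element.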

