[Numpy-discussion] recarray slow?
Wed Jul 21 15:57:31 CDT 2010
My code had a bug:
idx_by_name = dict((n,i) for i,n in enumerate(d.dtype.names))
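For context, a minimal sketch of how that name-to-index mapping behaves (the field names and data here are made up for illustration):

```python
import numpy as np

# Hypothetical structured array with two named fields.
d = np.array([(1, 2.0), (3, 4.0)],
             dtype=[('Date', 'i4'), ('Value', 'f8')])

# Map each field name to its positional index in the dtype.
idx_by_name = dict((n, i) for i, n in enumerate(d.dtype.names))

print(idx_by_name['Date'])   # 0
print(idx_by_name['Value'])  # 1
```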
On Wed, Jul 21, 2010 at 4:49 PM, Pauli Virtanen <firstname.lastname@example.org> wrote:
> Wed, 21 Jul 2010 16:22:37 -0400, wheres pythonmonks wrote:
>> However: is there an automatic way to convert a named index to a
> It's not really a named index -- it's a field name. Since the fields of
> an array element can be of different size, they cannot be referred to
> with an array index (in the sense that Numpy understands the concept).
>> What about looping over tuples of my recarray:
>> for t in d:
>> date = t['Date']
>> I guess that the above does have to lookup 'Date' each time.
> As Pierre said, you can move the lookups outside the loop.
> for date in d['Date']:
> If you want to iterate over multiple fields, it may be best to use
> itertools.izip so that you unbox a single element at a time.
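A sketch of that pattern (Python 3's built-in zip plays the role of Python 2's itertools.izip; the field names and data are made up):

```python
import numpy as np

d = np.array([(1, 10.0), (2, 20.0)],
             dtype=[('Date', 'i4'), ('Value', 'f8')])

# Look up each field column once, outside the loop...
dates = d['Date']
values = d['Value']

# ...then unbox one element from each column per iteration.
pairs = []
for date, value in zip(dates, values):
    pairs.append((int(date), float(value)))

print(pairs)  # [(1, 10.0), (2, 20.0)]
```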
> However, I'd be quite surprised if the hash lookups would actually take a
> significant part of the run time:
> 1) Python dictionaries are ubiquitous and the implementation appears
> heavily optimized to be fast with strings.
> 2) The hash of a Python string is cached, and computed only once.
> 3) String literals are interned, and represented by a single object only:
> >>> 'Date' is 'Date'
> True
> So when running the above Python code, the hash of 'Date' is computed
> exactly once.
> 4) For small dictionaries containing strings, such as the fields
> dictionary, I'd expect 1-3) to be dwarfed by the overhead involved
> in making Python function calls (PyArg_*) and interpreting the bytecode.
> So as the usual optimization mantra applies here: measure first :)
> Of course, if you measure and show that the expectations 1-4) are
> actually wrong, that's fine.
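One way to take that measurement (a rough sketch using timeit; the array size and field names are arbitrary):

```python
import timeit
import numpy as np

d = np.zeros(10000, dtype=[('Date', 'i4'), ('Value', 'f8')])

def lookup_inside():
    # Field lookup by name on every element.
    total = 0
    for t in d:
        total += t['Date']
    return total

def lookup_outside():
    # Single field lookup, then plain iteration over the column.
    total = 0
    for date in d['Date']:
        total += date
    return total

print(timeit.timeit(lookup_inside, number=1))
print(timeit.timeit(lookup_outside, number=1))
```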
> Pauli Virtanen
> NumPy-Discussion mailing list