[Numpy-discussion] array constructor from generators?
tim.hochberg at cox.net
Tue Apr 4 17:41:15 CDT 2006
Zachary Pincus wrote:
> Hi folks,
> Sorry if this has already been discussed, but do you all think it a
> good idea to extend the array constructor so that it can accept
> generators instead of lists?
> I often construct arrays from list comprehensions on generators, e.g.
> to read a tab-delimited file in:
> numpy.array([map(float, line.split()) for line in file])
> or making an array of pairs of numbers:
> numpy.array([f for f in unique_combinations(input, 2)])
> If the array constructor accepted generators (and turned them into
> lists behind the scenes, or even evaluated them lazily while filling
> in the memory buffer, not sure what would be more efficient), the
> above could be written somewhat more cleanly:
> numpy.array(map(float, line.split()) for line in file) (using a
> generator expression)
> numpy.array(unique_combinations(input, 2))
> the latter is especially a win.
> Moreover, it's becoming more standard for any python thing that can
> accept a list to also accept a generator.
> The downside is that currently, passing array() an object makes a 0-d
> object array with that object. If this were changed, then passing
> array() an iterator object would be handled differently than passing
> array() any other object. This might possibly be a fatal flaw in this
> idea.
You pretty much can't count on anything when trying to implicitly create
object arrays anyway. There are already buckets of special cases to make
the other array types user friendly. In other words, I don't think we
should care. You do have to be careful to special-case iterators after
all the other special case machinery, so that lists and whatnot that are
treated efficiently don't get slowed down.
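A rough pure-Python sketch of that ordering, to make the point concrete. The function name normalize_input is hypothetical and this is not how the array constructor is actually written; it only shows the fast paths being checked before the iterator case.

```python
from collections.abc import Iterator


def normalize_input(obj):
    """Hypothetical pre-processing step placed in front of an array constructor."""
    # Fast path first: real sequences are returned untouched, so the
    # existing, efficient handling of lists and tuples is not slowed down.
    if isinstance(obj, (list, tuple)):
        return obj
    # Only after the fast paths do we special-case iterators (generators
    # included) by draining them into a list.
    if isinstance(obj, Iterator):
        return list(obj)
    # Anything else is passed through for the existing rules to handle.
    return obj
```

With this ordering a list costs one isinstance check, and only genuine iterators pay for the extra copy.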
> I'd be happy to look in to implementing this functionality if people
> think it is a good idea, and could give me some tips as to the best
> way to implement it.
I brought this up last week and Travis was OK with it. I have it on my
todo list, but if you are in a hurry you're welcome to do it instead.
If you do look at it, consider looking into the '__length_hint__'
method that's slated to go into Python 2.5. When this is present,
it's potentially a big win, since you can preallocate the array and fill
it directly from the iterator. Without this, you probably can't do much
better than just building a list from the iterator. What would work well
would be to build a list, then steal its memory. I'm not sure if that's
feasible without leaking a reference to the list though.
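Here is a minimal sketch of the preallocate-and-fill idea at the Python level, assuming a modern Python where operator.length_hint (3.4 and later) is available and assuming the dtype is given. The name array_from_iterable is made up for illustration, and a real implementation would live in C inside the constructor.

```python
import operator

import numpy as np


def array_from_iterable(iterable, dtype=float):
    """Build a 1-d array, preallocating when the iterator offers a length hint."""
    it = iter(iterable)
    # length_hint returns the default (0 here) when the iterator provides
    # neither __len__ nor __length_hint__, as is the case for generators.
    hint = operator.length_hint(it, 0)
    if hint == 0:
        # No usable hint: fall back to building a list first.
        return np.array(list(it), dtype=dtype)
    out = np.empty(hint, dtype=dtype)
    n = 0
    for value in it:
        if n == len(out):
            # The hint is only an estimate, so grow if it was too small.
            out = np.resize(out, 2 * len(out))
        out[n] = value
        n += 1
    # Trim in case the hint was too large.
    return out[:n].copy()
```

For example, array_from_iterable(iter(range(5))) takes the preallocated path, while array_from_iterable(x * x for x in range(4)) falls back to the list because a generator gives no hint.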
Also, with iterators, specifying dtype will make a huge difference. If
an object has __length_hint__ and you specify dtype, then you can
preallocate the array as I suggested above. However, if dtype is not
specified, you still need to build the list completely, determine what
type it is, allocate the array memory and then copy the values into it.
Much less efficient!
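To illustrate the difference, here is a small sketch using numpy.fromiter as it exists in current NumPy, which requires a dtype and accepts an optional count. The squares generator is just a made-up example input.

```python
import numpy as np


def squares(n):
    """Example generator; it offers no length information of its own."""
    for i in range(n):
        yield i * i


# dtype known (and count supplied): values go straight into the array buffer.
direct = np.fromiter(squares(5), dtype=np.int64, count=5)

# dtype unknown: materialize a list so NumPy can inspect every value,
# pick a type, allocate the array and then copy the values across.
inferred = np.array(list(squares(5)))
```

Both calls produce the same values; the second simply pays for the intermediate list and the type-discovery pass.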