[Numpy-discussion] array constructor from generators?

Tim Hochberg tim.hochberg at cox.net
Tue Apr 4 17:41:15 CDT 2006


Zachary Pincus wrote:

> Hi folks,
>
> Sorry if this has already been discussed, but do you all think it a  
> good idea to extend the array constructor so that it can accept  
> generators instead of lists?
>
> I often construct arrays from list comprehensions on generators, e.g.  
> to read a tab-delimited file in:
> numpy.array([map(float, line.split()) for line in file])
> or making an array of pairs of numbers:
> numpy.array([f for f in unique_combinations(input, 2)])
>
> If the array constructor accepted generators (and turned them into  
> lists behind the scenes, or even evaluated them lazily while filling  
> in the memory buffer, not sure what would be more efficient), the  
> above could be written somewhat more cleanly:
> numpy.array(map(float, line.split()) for line in file) (using a  
> generator expression)
> and
> numpy.array(unique_combinations(input, 2))
>
> the latter is especially a win.
>
> Moreover, it's becoming more standard for any python thing that can  
> accept a list to also accept a generator.
>
> The downside is that currently, passing array() an object makes a 0-d  
> object array with that object. If this were changed, then passing  
> array() an iterator object would be handled differently than passing  
> array any other object. This might possibly be a fatal flaw in this  
> idea.

You pretty much can't count on anything when trying to implicitly create 
object arrays anyway. There are already buckets of special cases to make 
the other array types user friendly. In other words, I don't think we 
should care. You do have to be careful to special-case iterators after 
all the other special-case machinery, so that lists and whatnot that are 
already handled efficiently don't get slowed down.
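
As a rough illustration of that ordering, here is a minimal Python-level sketch. The helper name `asarray_with_iterators` is hypothetical and this is not how the array constructor is implemented; it only shows the idea of trying the existing fast paths first and checking for iterators last, so that lists, tuples and arrays never pay for the extra test.

```python
import collections.abc

import numpy as np


def asarray_with_iterators(obj, dtype=None):
    """Hypothetical wrapper: existing fast paths first, iterators last."""
    # Fast paths: objects the constructor already handles efficiently.
    if isinstance(obj, (np.ndarray, list, tuple)):
        return np.array(obj, dtype=dtype)
    # Only now look for iterators (generators included). The simplest
    # fallback is to materialize a list and reuse the normal path.
    if isinstance(obj, collections.abc.Iterator):
        return np.array(list(obj), dtype=dtype)
    # Anything else keeps the constructor's usual behaviour.
    return np.array(obj, dtype=dtype)


squares = asarray_with_iterators(x * x for x in range(4))
plain = asarray_with_iterators([1, 2, 3])
```

Here the iterator check costs nothing for the common list case, at the price of iterators being the slowest path.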

>
> I'd be happy to look in to implementing this functionality if people  
> think it is a good idea, and could give me some tips as to the best  
> way to implement it.

Hi Zach,

I brought this up last week and Travis was OK with it. I have it on my 
todo list, but if you are in a hurry you're welcome to do it instead.

If you do look at it, consider looking into the __length_hint__ 
method that's slated to go into Python 2.5. When this is present, 
it's potentially a big win, since you can preallocate the array and fill 
it directly from the iterator. Without this, you probably can't do much 
better than just building a list from the iterator. What would work well 
would be to build a list, then steal its memory. I'm not sure if that's 
feasible without leaking a reference to the list though.
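
To make the preallocation idea concrete, here is a small sketch at the Python level. It assumes a modern Python where `operator.length_hint` is available as the public way to query `__length_hint__`; the function name `fill_from_iterator` and the `min_capacity` parameter are made up for the example, and a real implementation would live in C inside the constructor.

```python
import operator

import numpy as np


def fill_from_iterator(it, dtype, min_capacity=16):
    """Hypothetical helper: preallocate from the length hint, then fill."""
    # length_hint returns the default (0) when the iterator offers no hint,
    # as is the case for plain generators.
    hint = operator.length_hint(it, 0)
    capacity = hint if hint > 0 else min_capacity
    out = np.empty(capacity, dtype=dtype)
    n = 0
    for value in it:
        if n == capacity:
            # The hint was missing or too small: double the buffer and copy.
            capacity *= 2
            grown = np.empty(capacity, dtype=dtype)
            grown[:n] = out
            out = grown
        out[n] = value
        n += 1
    # Trim any unused tail; the hint is only an estimate.
    return out[:n].copy()


hinted = fill_from_iterator(iter(range(5)), np.float64)
unhinted = fill_from_iterator((i for i in range(40)), np.int64)
```

With an accurate hint the loop never reallocates; without one it degrades to the usual doubling strategy.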

Also, with iterators, specifying dtype will make a huge difference. If 
an object has __length_hint__ and you specify dtype, then you can 
preallocate the array as I suggested above. However, if dtype is not 
specified, you still need to build the list completely, determine what 
type it is, allocate the array memory and then copy the values into it. 
Much less efficient!
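
For comparison, the two cases look roughly like this from the user's side. The sketch uses `numpy.fromiter` as it exists in current NumPy, which requires a dtype and accepts an optional `count`; the generator `squares` is just a stand-in for any iterator. Whether the array constructor itself would end up behaving this way is an assumption.

```python
import numpy as np


def squares():
    # Stand-in iterator; a generator carries no length information.
    return (x * x for x in range(5))


# dtype specified: values can be written straight into the array.
direct = np.fromiter(squares(), dtype=np.float64)

# dtype and length both known: the buffer can be sized once up front.
sized = np.fromiter(squares(), dtype=np.float64, count=5)

# dtype not specified: build the whole list first, let NumPy inspect it
# to pick a type, then copy the values into freshly allocated memory.
inferred = np.array(list(squares()))
```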

Regards,

-tim


>
> Zach





