[Numpy-discussion] fromiter

Tim Hochberg tim.hochberg at cox.net
Sat Jun 10 17:28:55 CDT 2006


David M. Cooke wrote:

>On Sat, Jun 10, 2006 at 01:18:05PM -0700, Tim Hochberg wrote:
>  
>
>>I finally got around to cleaning up and checking in fromiter. As Travis 
>>suggested, this version does not require that you specify count. From 
>>the docstring:
>>
>>    fromiter(...)
>>        fromiter(iterable, dtype, count=-1) returns a new 1d array
>>    initialized from iterable. If count is nonnegative, the new array
>>    will have count elements, otherwise its size is determined by the
>>    generator.
>>
>>If count is specified, it allocates the full array ahead of time. If it 
>>is not, it periodically reallocates space for the array, allocating 50% 
>>extra space each time and reallocating back to the final size at the end 
>>(to give realloc a chance to reclaim any extra space).
>>
>>Speedwise, "fromiter(iterable, dtype, count)" is about twice as fast as 
>>"array(list(iterable),dtype=dtype)". Omitting count slows things down by 
>>about 15%; still much faster than using "array(list(...))".  It also is 
>>going to chew up more memory than if you include count, at least 
>>temporarily, but still should typically use much less than the 
>>"array(list(...))" approach.
>>    
>>
>
>Can this be integrated into array() so that array(iterable, dtype=dtype)
>does the expected thing?
>  
>
It gets a little sticky, since the expected thing is probably that 
array([iterable, iterable, iterable], dtype=dtype) would work and produce 
an array of shape (3, N).  That looks like it would be hard to do 
efficiently.
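
For the several-iterables case, a workaround that stays outside array() 
is to build each row with fromiter and stack the results. This is only a 
sketch: rows_fromiter is a made-up helper name, not anything in numpy, 
and it assumes every iterable yields the same number of items.

```python
import numpy as np

def rows_fromiter(iterables, dtype):
    """Hypothetical helper: one fromiter call per row, then stack."""
    # Each row is consumed straight into a 1-d array, no intermediate list.
    rows = [np.fromiter(it, dtype=dtype) for it in iterables]
    # vstack raises ValueError if the rows differ in length.
    return np.vstack(rows)

# Three iterators of equal length give an array of shape (3, N).
its = [iter(range(k, k + 4)) for k in range(3)]
stacked = rows_fromiter(its, int)
```

Each row still gets its own temporary array before stacking, so this 
does not make the general case efficient; it just keeps fromiter simple.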

>Can you try to find the length of the iterable, with PySequence_Size() on
>the original object? This gets a bit iffy, as that might not be correct
>(but it could be used as a hint).
>  
>
The way the code is set up, a hint could be used with little 
additional complexity. Allegedly, some objects in 2.5 will grow 
__length_hint__, which could be used as well. I'm not very 
motivated to mess with this at the moment, though, as the benefit is 
relatively small.
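
To make the idea concrete, here is a rough pure-Python sketch of the 
grow-by-50% strategy with a length hint used as the starting size. It is 
not the actual C code in fromiter: fromiter_sketch and the minimum 
starting capacity of 8 are my own illustrative choices, and it assumes a 
Python recent enough to provide operator.length_hint (3.4 or later).

```python
import itertools
import operator
import numpy as np

def fromiter_sketch(iterable, dtype, count=-1):
    """Illustrative only: over-allocate by 50%, trim at the end."""
    it = iter(iterable)
    if count >= 0:
        # Known size: allocate once and read at most count items.
        it = itertools.islice(it, count)
        capacity = count
    else:
        # The hint may be wrong or missing (0), so it is only a starting size.
        capacity = max(operator.length_hint(iterable, 0), 8)
    buf = np.empty(capacity, dtype=dtype)
    n = 0
    for item in it:
        if n == capacity:
            # Out of room: grow by roughly 50% and copy what we have so far.
            capacity += capacity // 2 + 1
            grown = np.empty(capacity, dtype=dtype)
            grown[:n] = buf[:n]
            buf = grown
        buf[n] = item
        n += 1
    if count >= 0 and n < count:
        raise ValueError("iterator too short")
    # Copy down to the final size so the extra space can be released.
    return buf[:n].copy()
```

If the hint is too small, the growth step takes over; if it is too 
large, the final copy trims the excess, so a bad hint costs memory or 
time but not correctness.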

>What about iterables that return, say, tuples? Maybe add a shape argument,
>so that fromiter(iterable, dtype, count, shape=(None, 3)) expects elements
>from iterable that can be turned into arrays of shape (3,)? That could
>replace count, too.
>  
>
I expect that this would double (or more) the complexity of the current 
code (which is nice and simple at present). I'm inclined to leave it as 
it is and advocate solutions of this type:

    >>> import numpy
    >>> tupleiter = ((x, x+1, x+2) for x in range(10))  # Just for example
    >>> def flatten(x):
    ...     for y in x:
    ...         for z in y:
    ...             yield z
    ...
    >>> numpy.fromiter(flatten(tupleiter), int).reshape(-1, 3)
    array([[ 0,  1,  2],
           [ 1,  2,  3],
           [ 2,  3,  4],
           [ 3,  4,  5],
           [ 4,  5,  6],
           [ 5,  6,  7],
           [ 6,  7,  8],
           [ 7,  8,  9],
           [ 8,  9, 10],
           [ 9, 10, 11]])


[As a side note, I'm quite surprised that there isn't a way to flatten 
stuff already in itertools, but if there is, I can't find it.]
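
For anyone on a newer Python: itertools.chain.from_iterable (added in 
2.6) does this kind of one-level flattening lazily, so the hand-written 
generator is not needed there. A minimal sketch, reusing the same 
example data as above:

```python
import itertools
import numpy as np

# Same example data as above: an iterator of 3-tuples.
tupleiter = ((x, x + 1, x + 2) for x in range(10))

# chain.from_iterable flattens one level without building a list.
flat = itertools.chain.from_iterable(tupleiter)
arr = np.fromiter(flat, dtype=int).reshape(-1, 3)
```

It only flattens one level, which is all that is needed for an iterator 
of tuples.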

-tim











