[Numpy-discussion] bad generator behaviour with sum
Tim Hochberg
tim.hochberg at ieee.org
Sun Aug 27 18:03:03 CDT 2006
Tom Denniston wrote:
> I was thinking about this in the context of Guido's comments at SciPy
> 2006 that much of the language is moving away from lists toward
> iterators. He gave the keys of a dict as an example.
>
> Numpy treats iterators, generators, etc. as 0-d object arrays rather than
> lazy generators of n-dimensional data. I guess my question for Travis
> (or any others much more expert than I in numpy) is: is this intentional,
> or is it something that was never implemented because of the obvious
> subtleties of defining the correct semantics to make this work?
>
More the latter than the former.
> Personally I find it no big deal to use array(list(iter)) in the 1d
> case and the list function combined with a list comprehension for the
> 2d case.
There is a relatively new function, fromiter, which materialized the last
time this discussion came up and covers the 1D case above. For example:
numpy.fromiter((i*i for i in range(10)), int)
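
To put the options mentioned so far side by side, here is a minimal sketch. The variable names are illustrative only, and the count argument is optional; it is shown on the assumption that the length happens to be known in advance.

```python
import numpy as np

# 1D case: materialize the generator as a list first, then convert.
squares_via_list = np.array(list(i * i for i in range(10)))

# 1D case with fromiter: the dtype must be given up front.
squares_via_fromiter = np.fromiter((i * i for i in range(10)), dtype=int)

# If the length is known, count lets fromiter allocate the result once.
squares_with_count = np.fromiter((i * i for i in range(10)), dtype=int, count=10)

# 2D case: fromiter is 1D only, so fall back to a list of lists.
rows = ((i * j for j in range(3)) for i in range(2))
table = np.array([list(row) for row in rows])
```

The first and last forms build intermediate Python lists; only the fromiter forms write straight into the array.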
> I usually know how many dimensions I expect, so I find this
> easy, and I know about this peculiar behavior. I find, however, that
> this behavior is very surprising and confusing to new users, and I
> don't usually have a good justification for it to give them.
>
> The ideal semantics, in my mind, would be if an iterator of iterators
> of iterators, etc. was no different in numpy than a list of lists of
> lists, etc. But I have no doubt that there are subtleties I am not
> considering. Has anyone more familiar than I with the bowels of numpy
> thought about this problem and seen reasons why this is a bad idea or
> just prohibitively difficult to implement?
>
There was some discussion about this several months ago and I even set
out to implement it. I realized after not too long, however, that a
complete solution, as you describe above, was going to be difficult and
that I only really cared about the 1D case anyway, so I punted and
implemented fromiter instead. As I recall, there are two issues that
complicate the general case:
   1. You need to specify the type or you gain no advantage over just
      instantiating the list. This is because you need to know the type
      before you allocate space for the array. Normally you do this by
      traversing the structure and looking at the contents. However, for
      an iterable, you have to stash the results when you iterate over
      it looking for the type. This means that unless the array type is
      specified up front, you might as well just convert everything to
      lists as far as performance goes.
   2. For 1D arrays you can get away without knowing the shape by
      doing successive overallocation of memory, similar to the way list
      and array.array work. This is what fromiter does. I suppose the
      same tactic would work for iterators of iterators, but the
      bookkeeping becomes quite daunting.
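
The overallocation tactic in point 2 can be sketched in pure Python. This is only an illustration, not how fromiter is actually written: the function name fromiter_sketch, the initial_capacity parameter, and the growth factor of 2 are all assumptions made for the example.

```python
import numpy as np

def fromiter_sketch(iterable, dtype, initial_capacity=8):
    """Hypothetical helper: fill a 1D buffer, doubling it when it is full."""
    # The dtype is needed here, before a single item has been seen (point 1).
    buf = np.empty(max(1, initial_capacity), dtype=dtype)
    n = 0
    for item in iterable:
        if n == len(buf):
            # Out of room: allocate a buffer twice as large and copy over.
            grown = np.empty(2 * len(buf), dtype=dtype)
            grown[:n] = buf
            buf = grown
        buf[n] = item
        n += 1
    # Trim the unused tail; copy so the oversized buffer can be released.
    return buf[:n].copy()

squares = fromiter_sketch((i * i for i in range(10)), int)
```

Even in one dimension there is a capacity and a fill count to track. For iterators of iterators the same state would be needed per axis, which is where the bookkeeping gets unpleasant.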
Issue 1 is the real killer -- because of that, a solution would either
sometimes (mysteriously for the uninitiated) be really inefficient, or one
would be required to specify types for array(iterable). The latter is my
preference, but I'm beginning to think it would actually be better to
always have to specify types. It's tempting to take another stab at
this, in Python this time, and see if I can get a Python-level solution
working. However, I don't have the time to try it right now.
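
For what it's worth, a rough idea of what a Python-level version might look like, assuming the caller is required to pass both the type and the number of dimensions. The name array_from_iterables and its signature are hypothetical; nothing like this exists in numpy, and it sidesteps the overallocation bookkeeping by converting each innermost iterable with fromiter and stacking the pieces.

```python
import numpy as np

def array_from_iterables(iterable, dtype, ndim):
    """Hypothetical helper: build an ndim-D array from nested iterables.

    dtype and ndim are both required, so nothing has to be inspected
    (and stashed) in order to guess them.
    """
    if ndim == 1:
        return np.fromiter(iterable, dtype=dtype)
    # Convert each sub-iterable as soon as the outer iterable yields it.
    parts = [array_from_iterables(sub, dtype, ndim - 1) for sub in iterable]
    if not parts:
        return np.empty((0,) * ndim, dtype=dtype)
    # np.stack raises ValueError if the sub-arrays have differing shapes.
    return np.stack(parts)

nested = ((i * j for j in range(3)) for i in range(2))
result = array_from_iterables(nested, int, 2)
```

This still holds a list of sub-arrays and copies them once in np.stack, so it is not a true single-pass solution, but it never builds lists of Python scalars.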
-tim
> On 8/27/06, Charles R Harris <charlesr.harris at gmail.com> wrote:
>
>> Hi,
>>
>> The problem seems to arise in the array constructor, which treats the
>> generator as a python object and creates an array containing that object.
>> So, do we want the possibility of an array of generators or should we
>> interpret it as a sort of list? I vote for that latter.
>>
>> Chuck
>>
>>
>> On 8/27/06, Charles R Harris <charlesr.harris at gmail.com> wrote:
>>
>>> Hi Christopher,
>>>
>>>
>>>
>>> On 8/27/06, Charles R Harris < charlesr.harris at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> On 8/27/06, listservs at mac.com <listservs at mac.com> wrote:
>>>>
>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>> Hash: SHA1
>>>>>
>>>>> It seems like numpy.sum breaks generator expressions:
>>>>>
>>>>> In [1]: sum(i*i for i in range(10))
>>>>> Out[1]: 285
>>>>>
>>>>> In [2]: from numpy import sum
>>>>>
>>>>> In [3]: sum(i*i for i in range(10))
>>>>> Out[3]: <generator object at 0x10eca58>
>>>>>
>>>>> Is this intentional? If so, how do I get the behaviour that I am
>>>>> after?
>>
>>>>
>>>>
>>>>
>>>> In [3]: sum([i*i for i in range(10)])
>>>>
>>>> Out[3]: 285
>>>>
>>>> Chuck
>>>>
>>>
>>> The numarray.sum also fails to accept a generator as an argument. Because
>>> Python's built-in sum does accept one and the imported sum shadows it, we
>>> should probably check the argument type and make it do the right thing.
>>
>>> Chuck
>>>
>>>
>>>
>>>
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
>>
>>
>>
>>
>