[Numpy-discussion] np.bincount raises MemoryError when given an empty array

josef.pktd@gmai... josef.pktd@gmai...
Mon Feb 1 23:57:28 CST 2010


On Tue, Feb 2, 2010 at 12:31 AM, Charles R Harris
<charlesr.harris@gmail.com> wrote:
>
>
> On Mon, Feb 1, 2010 at 10:02 PM, <josef.pktd@gmail.com> wrote:
>>
>> On Mon, Feb 1, 2010 at 11:45 PM, Charles R Harris
>> <charlesr.harris@gmail.com> wrote:
>> >
>> >
>> > On Mon, Feb 1, 2010 at 9:36 PM, David Cournapeau <cournape@gmail.com>
>> > wrote:
>> >>
>> >> On Tue, Feb 2, 2010 at 1:05 PM,  <josef.pktd@gmail.com> wrote:
>> >>
>> >> > I think this could be considered a correct answer: the count of
>> >> > any integer is zero.
>> >>
>> >> Maybe, but this shape is random - it would be different in different
>> >> conditions, as the length of the returned array is just some random
>> >> memory location.
>> >>
>> >> >
>> >> > Returning an array with one zero, or the empty array or raising an
>> >> > exception? I don't see much of a pattern
>> >>
>> >> Since there is no obvious solution, the only rationale for not raising
>> >> an exception I could see is to accommodate often-encountered special
>> >> cases. I find returning [0] more confusing than returning empty
>> >> arrays, though - maybe there is a use case I don't know about.
>> >>
>> >
>> > In this case I would expect an empty input to be a programming error and
>> > raising an error to be the right thing.
>>
>> Not necessarily, if you run bincount over groups in a dataset and
>> you're not sure whether every group is actually observed. The main
>> question is whether the user needs or wants to check for empty groups
>> before or after the loop over bincount.
>>
>
> How would they know which bin to check? This seems like an unlikely way to
> check for an empty input.

# grade (e.g. SAT) distribution by school and race
for s in schools:
    for r in race:
        # boolean mask selecting the students of school s and race r
        print(s, r, np.bincount(allstudentgrades[(sch == s) & (ra == r)]))

All-white and all-black schools raise an exception, because the mask
for the missing group selects no students.

I just made up that story; my first attempt was: for all sectors and
all firm-size groups, bincount something. Some size groups will have
empty cells, e.g. nuclear power in family businesses.
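
A self-contained sketch of that grouped loop, in case it helps. The
data below are made up for illustration, and the names grades, sch,
ra and counts are hypothetical. It also assumes a NumPy release in
which bincount accepts an empty integer array and takes a minlength
argument, which may not match the version discussed in this thread.

```python
import numpy as np

# Hypothetical data: one grade (0..3), school id and group id per student.
grades = np.array([0, 1, 1, 3, 2, 2, 3])
sch = np.array([0, 0, 0, 0, 1, 1, 1])
ra = np.array([0, 0, 1, 1, 1, 1, 1])

# Use a common length so every group yields a count vector of the same shape.
n_grades = int(grades.max()) + 1

counts = {}
for s in np.unique(sch).tolist():
    for r in np.unique(ra).tolist():
        mask = (sch == s) & (ra == r)
        # For school 1 / group 0 the mask selects nothing; assuming a
        # NumPy with minlength support, this gives all-zero counts.
        counts[(s, r)] = np.bincount(grades[mask], minlength=n_grades)

for key in sorted(counts):
    print(key, counts[key])
```

With a fixed minlength, the empty cell comes back as a row of zeros
like any other group, so the check for empty groups can be done after
the loop rather than before it.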

Josef

>
>>
>> Like
>> >>> np.sum([])
>> 0.0
>> >>> sum([])
>> 0
>> the empty array or the array([0]) can be considered as the default
>> argument. In this case it is not really a programming error.
>>
>
> I like that better than an empty array.
>
>>
>> Since bincount usually returns redundant zero counts unless
>> np.unique(data) == np.arange(data.max()+1),
>> array([0]) would also make sense as a minimum answer
>> >>> np.bincount([7,8,9])
>> array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
>>
>> I use bincount quite a lot but only with fixed sized arrays, so I
>> never actually used it in this way (yet).
>>
>
> Chuck
