[Numpy-discussion] np.bincount raises MemoryError when given an empty array
josef.pktd@gmai...
Tue Feb 2 07:55:43 CST 2010
On Tue, Feb 2, 2010 at 8:53 AM, <josef.pktd@gmail.com> wrote:
> 2010/2/2 Ernest Adrogué <eadrogue@gmx.net>:
>> 2/02/10 @ 00:01 (-0700), thus spake Charles R Harris:
>>> On Mon, Feb 1, 2010 at 10:57 PM, <josef.pktd@gmail.com> wrote:
>>>
>>> > On Tue, Feb 2, 2010 at 12:31 AM, Charles R Harris
>>> > <charlesr.harris@gmail.com> wrote:
>>> > >
>>> > >
>>> > > On Mon, Feb 1, 2010 at 10:02 PM, <josef.pktd@gmail.com> wrote:
>>> > >>
>>> > >> On Mon, Feb 1, 2010 at 11:45 PM, Charles R Harris
>>> > >> <charlesr.harris@gmail.com> wrote:
>>> > >> >
>>> > >> >
>>> > >> > On Mon, Feb 1, 2010 at 9:36 PM, David Cournapeau <cournape@gmail.com>
>>> > >> > wrote:
>>> > >> >>
>>> > >> >> On Tue, Feb 2, 2010 at 1:05 PM, <josef.pktd@gmail.com> wrote:
>>> > >> >>
>>> > >> >> > I think this could be considered as a correct answer, the count of
>>> > >> >> > any integer is zero.
>>> > >> >>
>>> > >> >> Maybe, but this shape is random - it would be different in different
>>> > >> >> conditions, as the length of the returned array is just some random
>>> > >> >> memory location.
>>> > >> >>
>>> > >> >> >
>>> > >> >> > Returning an array with one zero, or the empty array or raising an
>>> > >> >> > exception? I don't see much of a pattern
>>> > >> >>
>>> > >> >> Since there is no obvious solution, the only rationale for not raising
>>> > >> >> an exception I could see is to accommodate often-encountered special
>>> > >> >> cases. I find returning [0] more confusing than returning empty
>>> > >> >> arrays, though - maybe there is a usecase I don't know about.
>>> > >> >>
>>> > >> >
>>> > >> > In this case I would expect an empty input to be a programming error and
>>> > >> > raising an error to be the right thing.
>>> > >>
>>> > >> Not necessarily, if you run the bincount over groups in a dataset and
>>> > >> you're not sure if every group is actually observed. The main question
>>> > >> is whether the user needs or wants to check for empty groups before or
>>> > >> after the loop over bincount.
>>> > >>
>>> > >
>>> > > How would they know which bin to check? This seems like an unlikely way to
>>> > > check for an empty input.
>>> >
>>> > # grade (e.g. SAT) distribution by school and race
>>> > for s in schools:
>>> >     for r in race:
>>> >         print s, r, np.bincount(allstudentgrades[(sch==s)*(ra==r)])
>>> >
>>> > allwhite schools and allblack schools raise an exception.
>>> >
>>> > I just made up the story, my first attempt was: all sectors, all
>>> > firmsize groups, bincount something, will have empty cells for some
>>> > size groups, e.g. nuclear power in family business.
>>> >
>>> >
>>> OK, point taken. What do you think would be the best thing to do?
>>
>> In my opinion, returning an empty array makes more sense than
>> array([0]). An empty array means "there are no bins", whereas
>> an array of length 1 implies that there is one.
>
> Since bincount sometimes returns zero-count bins, the implication is
> not necessarily true.
>
> But now I'm also in favor of the empty array, as the least-surprise
> solution, and the user can decide whether, when or how to handle empty
> arrays.
>
>
> just one more example, before discovering bincount, I used histogram
> to count integers
>
without typo:
>>> x=np.arange(5); np.histogram(x[x == 7], bins=np.arange(7+1))
(array([0, 0, 0, 0, 0, 0, 0]), array([0, 1, 2, 3, 4, 5, 6, 7]))
>>> x=np.arange(5); np.histogram(x[x == 7], bins=[])
(array([], dtype=int32), array([], dtype=float64))
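
For the bincount case, here is a rough sketch of how a user could handle empty groups themselves while looping, in line with the "return an empty array" proposal. The helper name bincount_or_empty and the small arrays are made up for illustration and are not part of NumPy; the explicit size check means the result does not depend on what the installed NumPy does with empty input.

```python
import numpy as np


def bincount_or_empty(values):
    """Hypothetical helper: np.bincount, but an empty input gives an empty array."""
    values = np.asarray(values)
    if values.size == 0:
        # Skip np.bincount entirely, so the behaviour for empty input
        # is decided here rather than by the library.
        return np.zeros(0, dtype=np.intp)
    return np.bincount(values)


# Made-up data: one grade per student, plus two grouping variables.
grades = np.array([1, 3, 3, 2, 1, 1])
school = np.array([0, 0, 1, 1, 1, 1])
race = np.array([0, 1, 1, 1, 1, 1])

for s in (0, 1):
    for r in (0, 1):
        # The group (s=1, r=0) has no students, so its counts come back empty.
        counts = bincount_or_empty(grades[(school == s) & (race == r)])
        print(s, r, counts)
```

The caller can then test counts.size after the loop instead of guarding every call.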
>
> Josef
>
>
>>
>> Cheers.
>>
>> Ernest
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>