[Numpy-discussion] advanced indexing bug with huge arrays?

Aronne Merrelli aronne.merrelli@gmail....
Mon Jan 23 14:04:22 CST 2012


On Mon, Jan 23, 2012 at 1:33 PM, Travis Oliphant <teoliphant@gmail.com> wrote:

> Can you determine precisely where the problem is? In other words, can
> you verify that c is not getting filled in correctly?
>
> You will no doubt get overflow in the summation, since you have a
> uint8 parameter. But having that overflow come out as exactly 0 would be
> surprising.
>
> Can you verify that a and b are getting created correctly?   Also, 'c'
> should be a 2-d array, can you verify that?  Can you take the sum along the
> -1 axis and the 0 axis separately:
>
> print a.shape
> print b.shape
> print c.shape
>
> c[1000000:].sum(axis=0)
> d = c[1000000:].sum(axis=-1)
> print d[:100]
> print d[-100:]
>
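
Travis's questions come down to checking c against a[b] directly. Here is a minimal sketch of that check (Python 3 syntax). It assumes b holds integer row indices into a, as the later use of a[b[574519]] in this message suggests. The helper name first_bad_row, the chunk size, and the small random arrays are hypothetical stand-ins, not the arrays from this thread.

```python
import numpy as np


def first_bad_row(a, b, c, chunk=100_000):
    """Return the first row i where c[i] differs from a[b[i]], or None.

    The reference a[b[start:stop]] is rebuilt one chunk at a time, so each
    fancy-indexing call stays far smaller than the single large call under
    suspicion.
    """
    if c.shape != (len(b), a.shape[1]):
        raise ValueError(f"unexpected shape {c.shape}")
    for start in range(0, len(b), chunk):
        stop = min(start + chunk, len(b))
        ref = a[b[start:stop]]
        # Rows in this chunk where at least one column disagrees.
        bad = np.flatnonzero((c[start:stop] != ref).any(axis=1))
        if bad.size:
            return start + int(bad[0])
    return None


# Hypothetical scaled-down stand-ins for a, b and c.
rng = np.random.default_rng(0)
a = rng.integers(1, 256, size=(5000, 50), dtype=np.uint8)  # a contains no zeros
b = np.sort(rng.choice(len(a), size=4000, replace=False))
c = a[b]
print(first_bad_row(a, b, c, chunk=512))  # None: c matches a[b]

c[700:] = 0  # simulate rows that were never filled in
print(first_bad_row(a, b, c, chunk=512))  # 700
```

Because a is drawn from 1..255, any zeroed row is guaranteed to differ from its reference row, so the first mismatch found is exactly where the simulated fill stopped.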


I am getting the same results as David. It looks like c just "stopped
filling in" partway through the array. I don't think there is any overflow
issue, since sum() accumulates the uint8 values as uint64.
Travis, here are the outputs at my end (I cut out many zeros for brevity):

In [7]: print a.shape
(5000000, 972)
In [8]: print b.shape
(4993210,)
In [9]: print c.shape
(4993210, 972)

In [10]: c[1000000:].sum(axis=0)
Out[10]:
array([0, 0, 0, .... , 0])

In [11]: d = c[1000000:].sum(axis=-1)

In [12]: print d[:100]
[0 0 0 ... 0 0]

In [13]: print d[-100:]
[0 0 0 ... 0 0 0]
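
On the overflow question: for uint8 input, NumPy's sum() accumulates in a wider unsigned integer by default (uint64 on 64-bit Linux; the exact width is platform-dependent), so it should not wrap here. Overflow only appears if a uint8 accumulator is forced. A small sketch with made-up values:

```python
import numpy as np

x = np.full(300, 255, dtype=np.uint8)  # 300 values of 255

total = x.sum()                  # default: wider unsigned accumulator
wrapped = x.sum(dtype=np.uint8)  # forced uint8 accumulator wraps modulo 256

print(total, total.dtype)  # 76500 uint64 (on 64-bit Linux)
print(wrapped)             # 76500 % 256 == 212
```

Even a wrapped uint8 sum would only be exactly 0 by coincidence, which is why all-zero sums over many rows point at c itself rather than at the summation.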

I looked at sparse subsamples with matplotlib, specifically
imshow(a[::1000, :]), and the a array looks correct (random values
everywhere), but c is zero past a certain row number. In fact, it looks
like it becomes zero partway through row 574519, and I think that for all
rows in c beyond row 574519 the values will be zero. Lower rows appear to be
correctly filled (at least judging by the sparse view in matplotlib).

In [15]: a[b[574519], 350:360]
Out[15]: array([143, 155,  11,  30, 212, 149, 110, 164, 165, 120],
dtype=uint8)

In [16]: c[574519, 350:360]
Out[16]: array([143, 155,  11,  30, 212, 149,   0,   0,   0,   0],
dtype=uint8)
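
Here is a back-of-the-envelope check of where the fill stops. It tests a hypothesis that is not established in this thread. c holds more than 2**32 elements. If the count of elements to copy were truncated to 32 bits somewhere (an assumption), copying would stop at flat index total mod 2**32. The arithmetic uses only the shape printed above:

```python
# Shape of c as printed in In [9] above.
rows, cols = 4993210, 972

total = rows * cols          # number of elements in c
wrapped = total % 2**32      # count left if truncated to 32 bits (hypothesis)
row, col = divmod(wrapped, cols)

print(total)     # 4853400120, more than 2**32 = 4294967296
print(wrapped)   # 558432824
print(row, col)  # 574519 356
```

The predicted cutoff, row 574519 column 356, lines up with Out[16], where columns 350 to 355 are filled and column 356 onward is zero. This shows only that the numbers are consistent with a 32-bit truncation. It does not locate the bug.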


I'm using EPD 7.1 and numpy 1.6.1 on Linux (I don't know the kernel
details).

HTH,
Aronne