[SciPy-user] Handle large array

Robert Kern robert.kern@gmail....
Fri Jan 16 02:44:00 CST 2009


On Fri, Jan 16, 2009 at 02:23, Tan Tran <fragon25@yahoo.com> wrote:
> Hello,
>
> I'm trying to combine several conditions with &, like this:
>
> xx = (d[:,0:1] == 0) & (d[:,2:3] == 2) & (d[:, 1:2]==1) & (d[:, 1:2]==2)
>
> If d is small (19 columns and about 5000 rows), the code runs fine. But if d
> is large, with about 40k rows, I get an error message: MemoryError
>
> I tried making separate variables, but I still have the problem when I try
> to & them:
> aa = d[:,0:1] == 0
> bb =  d[:,2:3] == 2
> cc = d[:, 1:2]==1
> dd = d[:, 1:2]==2
>
> xx = aa & bb & cc & dd <-- MemoryError's here
>
> Has anybody seen this problem before? How should I work with large data?

I usually chunk things up using iterators. For example:


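import numpy as np

def chunked_slices(ntotal, chunksize):
    # Yield slices that cover range(ntotal) in pieces of at most chunksize.
    nchunks, nlast = divmod(ntotal, chunksize)
    for i in range(nchunks):
        yield slice(i*chunksize, (i+1)*chunksize)
    if nlast > 0:
        # Use nchunks rather than the loop variable i, which is undefined
        # when ntotal < chunksize and the loop body never runs.
        start = nchunks*chunksize
        yield slice(start, start+nlast)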

xx = np.empty([len(d)], dtype=bool)

for slc in chunked_slices(len(d), 1000):
    xx[slc] = (d[slc,0] == 0) & (d[slc,2] == 2) & (d[slc,1]==1) & (d[slc,1]==2)
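
To check that chunking gives the same answer as evaluating everything at
once, here is a minimal self-contained sketch. It makes several assumptions
that are not in the original post: d is synthetic random integer data with
19 columns and 40k rows, chunked_mask is a hypothetical helper name, and
chunked_slices is written in an equivalent range-based form. The sketch also
drops the (d[:,1]==2) term, since a column cannot equal 1 and 2 at the same
time and the mask would otherwise be all False.

```python
import numpy as np

def chunked_slices(ntotal, chunksize):
    """Yield slices covering range(ntotal) in pieces of at most chunksize."""
    for start in range(0, ntotal, chunksize):
        yield slice(start, min(start + chunksize, ntotal))

def chunked_mask(d, chunksize=1000):
    """Hypothetical helper: build the row mask one chunk at a time."""
    xx = np.empty(len(d), dtype=bool)
    for slc in chunked_slices(len(d), chunksize):
        # Only chunk-sized temporaries are created on each pass.
        xx[slc] = (d[slc, 0] == 0) & (d[slc, 2] == 2) & (d[slc, 1] == 1)
    return xx

# Synthetic stand-in for the poster's data (an assumption, not real data).
rng = np.random.default_rng(0)
d = rng.integers(0, 3, size=(40000, 19))

# Reference result computed in one shot, for comparison only.
full = (d[:, 0] == 0) & (d[:, 2] == 2) & (d[:, 1] == 1)
assert np.array_equal(chunked_mask(d), full)
```

Indexing a single column as d[slc, 0] gives a 1-D array, so xx is a 1-D
mask with one entry per row, unlike the (N, 1) arrays that d[:, 0:1]
produces.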

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

