[Numpy-discussion] memory usage question

Eric Firing efiring@hawaii....
Sun Jun 6 20:00:16 CDT 2010

On 06/06/2010 02:17 PM, Tom Kuiper wrote:
> Greetings all.
> I have a feeling that, coming at this with a background in FORTRAN and
> C, I'm missing some subtlety, possibly of an OO nature.   Basically, I'm
> looping over very large data arrays and memory usage just keeps growing
> even though I re-use the arrays.  Below is a stripped-down version of
> what I'm doing.  You'll recognize it as gulping a great quantity of data
> (1 million complex samples), Fourier transforming it in 1000-sample
> blocks into spectra, co-adding the spectra, and doing this 255 times,
> for a grand total of one 1000-point spectrum.  At iteration 108 of the outer
> loop, I get a memory error.  By then, according to 'top', ipython (or
> python) is using around 85% of 3.5 GB of memory.
>    nsecs = 255
>    fft_size = 1000
>    P = zeros(fft_size)
>    for i in range(nsecs):
>        header, data = get_raw_record(fd_in)
>        num_bytes = len(data)
>        label, reclen, recver, softver, spcid, vsrid, schanid, \
>            bits_per_sample, ksamps_per_sec, sdplr, prdx_dss_id, \
>            prdx_sc_id, prdx_pass_num, prdx_uplink_band, \
>            prdx_downlink_band, trk_mode, uplink_dss_id, ddc_lo, \
>            rf_to_if_lo, data_error, year, doy, sec, data_time_offset, \
>            frov, fro, frr, sfro, rf_freq, schan_accum_phase, \
>            (scpp0, scpp1, scpp2, scpp3), schan_label = header
>        # ksamps_per_sec = 1e3, number of complex samples in 'data' = 1e6
>        num_32bit_words = len(data)*8/BITS_PER_32BIT_WORD
>        cmplx_samp_per_word = BITS_PER_32BIT_WORD/(2*bits_per_sample)
>        cmplx_samples = unpack_vdr_data(num_32bit_words,
>                                        cmplx_samp_per_word, data)
>        del data  # This makes no difference
>        for j in range(0, ksamps_per_sec*1000/fft_size):
>            index = int(j*fft_size)
>            S = fft(cmplx_samples[index:index+fft_size])
>            P += S*conjugate(S)
>        del cmplx_samples  # This makes no difference
>        if (i % 20) == 0:
>            gc.collect(0)  # This makes no difference
>    P /= nsecs
>    sample_period = 1./ksamps_per_sec  # kHz
>    f = fftfreq(fft_size, d=sample_period)
> What am I missing?

I don't know, but I would suggest that you strip the example down 
further: instead of reading data from a file, use numpy.random.randn to 
generate fake data as needed.  In other words, use only numpy 
functions--no readers, no unpackers.  Put this minimal script into a 
file and run it from the command line, not in ipython.  (Have you 
verified that you get the same result running a standalone script from 
the command line as running from ipython?)  Put a memory-monitoring step 
inside, maybe at each outer loop iteration.  You can use the 
matplotlib.cbook.report_memory function or similar:

def report_memory(i=0):  # argument may go away
    'return the memory consumed by process'
    import os, sys
    from subprocess import Popen, PIPE
    pid = os.getpid()
    if sys.platform == 'sunos5':
        a2 = Popen('ps -p %d -o osz' % pid, shell=True,
                   stdout=PIPE).stdout.readlines()
        mem = int(a2[-1].strip())
    elif sys.platform.startswith('linux'):
        a2 = Popen('ps -p %d -o rss,sz' % pid, shell=True,
                   stdout=PIPE).stdout.readlines()
        mem = int(a2[1].split()[1])
    elif sys.platform.startswith('darwin'):
        a2 = Popen('ps -p %d -o rss,vsz' % pid, shell=True,
                   stdout=PIPE).stdout.readlines()
        mem = int(a2[1].split()[0])

    return mem
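
For example, here is the sort of minimal standalone script I have in 
mind: just a sketch, with numpy.random.randn standing in for your 
reader and unpacker, and using report_memory from above:

import numpy as np

nsecs = 255
fft_size = 1000
ksamps_per_sec = 1000
n = ksamps_per_sec * 1000            # complex samples per fake "record"
P = np.zeros(fft_size)
for i in range(nsecs):
    # fake data in place of get_raw_record + unpack_vdr_data
    cmplx_samples = np.random.randn(n) + 1j * np.random.randn(n)
    for j in range(n // fft_size):
        S = np.fft.fft(cmplx_samples[j*fft_size:(j+1)*fft_size])
        P += (S * np.conjugate(S)).real   # the power spectrum is real
    if i % 20 == 0:
        print i, report_memory()          # watch for monotonic growth
P /= nsecs

If memory stays flat here but grows in your real script, the leak is 
in the reader or unpacker; if it grows here too, that would point at 
numpy itself.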

I'm suspecting the problem may be in your data reader and/or unpacker, 
not in the application of numpy functions.  Also, ipython can confuse 
the issue by keeping references to objects.  In any case, with a simpler 
test script and regular memory monitoring, it should be easier for you 
to track down the problem.
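
For instance, every result that ipython displays is also stored in its 
Out cache (and bound to _, __, ___), so a large array can stay alive 
after you think you have deleted it:

In [1]: import numpy as np
In [2]: a = np.random.randn(10**7)   # ~80 MB
In [3]: a[:5]                        # the displayed slice is a view, cached in Out[3]
In [4]: del a                        # does not free the 80 MB; the view in Out[3]
                                     # still references the whole buffer

Running the same code as a standalone script avoids this entirely, 
which is another reason to test from the command line.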


> Best regards
> Tom
> p.s.  Many of you will see this twice, for which I apologize.
