[Numpy-discussion] memory usage (Emil Sidky)

Perry Greenfield perry@stsci....
Wed Oct 15 14:09:27 CDT 2008


When you slice an array, the original array is kept in memory until
the slice is deleted. A slice is a view that shares the original
array's memory; it is not a copy. Your second example explicitly
makes a copy, so nothing holds on to the large array.
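
A minimal sketch of the distinction (numpy records which array a view
borrows its memory from in the view's .base attribute):

import numpy as np

a = np.random.randn(512**2)   # about 2 MB of float64
view = a[-100:]               # a view: shares a's buffer, no data copied
copy = a[-100:].copy()        # a copy: owns its own 100 elements

print(view.base is a)         # True: the view keeps a alive
print(copy.base is None)      # True: the copy is independent

del a                         # the 2 MB buffer is not freed; view still uses it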

Perry


On Oct 15, 2008, at 2:31 PM, emil wrote:

>
>> Huang-Wen Chen wrote:
>>> Robert Kern wrote:
>>>>> from numpy import *
>>>>> for i in range(1000):
>>>>>     a = random.randn(512**2)
>>>>>     b = a.argsort(kind='quicksort')
>>>> Can you try upgrading to numpy 1.2.0? On my machine with numpy 1.2.0
>>>> on OS X, the memory usage is stable.
>>>>
>>> I tried the code fragment on two platforms and the memory usage is
>>> also normal.
>>>
>>> 1. numpy 1.1.1, python 2.5.1 on Vista 32bit
>>> 2. numpy 1.2.0, python 2.6 on RedHat 64bit
>>
>> If I recall correctly, there were some major improvements in python's
>> memory management/garbage collection from version 2.4 to 2.5. If you
>> could try to upgrade your python to 2.5 (and possibly also your numpy
>> to 1.2.0), you'd probably see some better behaviour.
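>>
>> To confirm which versions you are actually running, a quick check is:
>>
>> import sys
>> import numpy
>> print(sys.version)        # Python interpreter version
>> print(numpy.__version__)  # installed numpy version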
>>
>> Regards,
>> Vincent.
>>
>
> Problem fixed. Thanks.
>
> But it turns out there were two things going on:
> (1) Upgrading to numpy 1.2 (even with python 2.4) fixed the memory
> usage for the loop with argsort in it.
> (2) Unfortunately, when I went back to my original program and ran it
> with the upgraded numpy, it was still chewing up tons of memory. I
> finally found the problem.
> Consider the following two code snippets (an extension of my previous
> example).
> from numpy import *
> d = []
> for i in range(1000):
>     a = random.randn(512**2)
>     b = a.argsort(kind='quicksort')
>     c = b[-100:]
>     d.append(c)
>
> and
>
> from numpy import *
> d = []
> for i in range(1000):
>     a = random.randn(512**2)
>     b = a.argsort(kind='quicksort')
>     c = b[-100:].copy()
>     d.append(c)
>
> The difference is that c is a view of the last 100 elements of b in
> the first example, while c is a copy of the last 100 in the second.
> Both examples yield identical results (provided randn is run with the
> same seed value), but the former chews up tons of memory and the
> latter doesn't.
> I don't know if this explanation makes any sense, but it is as if
> python has to keep all the generated b's around in the first example
> because c is only a reference into them.
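>
> To make the cost concrete (an illustrative sketch; sizes assume the
> 8-byte integer indices argsort returns on a 64-bit build):
>
> import numpy as np
>
> b = np.random.randn(512**2).argsort(kind='quicksort')
> c_view = b[-100:]           # view: keeps the whole ~2 MB index
>                             # array alive via c_view.base
> c_copy = b[-100:].copy()    # copy: owns just 100 integers (~800 bytes)
> print(c_view.base is b)     # True
> print(c_copy.base is None)  # True
>
> Over 1000 iterations the appended views pin roughly 2 GB of index
> arrays, while the copies total well under 1 MB.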
>
> Anyway, bottom line is that my problem is solved.
> Thanks,
> Emil


