[Numpy-discussion] memory usage (Emil Sidky)
Perry Greenfield
perry@stsci....
Wed Oct 15 14:09:27 CDT 2008
When you slice an array, you keep the original array in memory until
the slice is deleted. The slice uses the original array memory and is
not a copy. The second example explicitly makes a copy.
Perry
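The distinction Perry describes can be checked directly via NumPy's `.base` attribute: a slice is a view whose `.base` points at the array that owns the memory, while `.copy()` produces an independent array. A minimal sketch (not from the original thread; `kind='quicksort'` is the modern spelling of the `kind` argument):

```python
import numpy as np

a = np.random.randn(512**2)
b = a.argsort(kind='quicksort')

view = b[-100:]          # a view: keeps b's ~2 MB index buffer alive
copy = b[-100:].copy()   # an independent array owning only ~800 bytes

assert view.base is b    # the view references b, so b cannot be freed
assert copy.base is None # the copy owns its data; b can be collected
```

Appending `view` to a list therefore pins the whole of `b` in memory for each loop iteration, which is exactly the accumulation Emil observed below.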
On Oct 15, 2008, at 2:31 PM, emil wrote:
>
>> Huang-Wen Chen wrote:
>>> Robert Kern wrote:
>>>>> from numpy import *
>>>>> for i in range(1000):
>>>>>     a = random.randn(512**2)
>>>>>     b = a.argsort(kind='quick')
>>>> Can you try upgrading to numpy 1.2.0? On my machine with numpy
>>>> 1.2.0
>>>> on OS X, the memory usage is stable.
>>>>
>>> I tried the code fragment on two platforms and the memory usage
>>> is also
>>> normal.
>>>
>>> 1. numpy 1.1.1, python 2.5.1 on Vista 32bit
>>> 2. numpy 1.2.0, python 2.6 on RedHat 64bit
>>
>> If I recall correctly, there were some major improvements in python's
>> memory management/garbage collection from version 2.4 to 2.5. If you
>> could try to upgrade your python to 2.5 (and possibly also your
>> numpy to
>> 1.2.0), you'd probably see some better behaviour.
>>
>> Regards,
>> Vincent.
>>
>
> Problem fixed. Thanks.
>
> But it turns out there were two things going on:
> (1) Upgrading to numpy 1.2 (even with python 2.4) fixed the memory
> usage
> for the loop with argsort in it.
> (2) Unfortunately, when I went back to my original program and ran it
> with the upgraded numpy, it still was chewing up tons of memory. I
> finally found the problem:
> Consider the following two code snippets (extension of my previous
> example).
> from numpy import *
> d = []
> for i in range(1000):
>     a = random.randn(512**2)
>     b = a.argsort(kind='quick')
>     c = b[-100:]
>     d.append(c)
>
> and
>
> from numpy import *
> d = []
> for i in range(1000):
>     a = random.randn(512**2)
>     b = a.argsort(kind='quick')
>     c = b[-100:].copy()
>     d.append(c)
>
> The difference being that c is a reference to the last 100 elements
> of b
> in the first example, while c is a copy of the last 100 in the second
> example.
> Both examples yield identical results (provided randn is run with the
> same seed value). But the former chews up tons of memory, and the
> latter doesn't.
> I don't know if this explanation makes any sense, but it is as if
> python
> has to keep all the generated b's around in the first example
> because c
> is only a reference.
>
> Anyway, bottom line is that my problem is solved.
> Thanks,
> Emil
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion