On Sunday 23 March 2008, Francesc Altet wrote:
> On Sunday 23 March 2008, Anne Archibald wrote:
> > On 23/03/2008, Damian Eads <eads@soe.ucsc.edu> wrote:
> > > Hi,
> > >
> > > I am working on a memory-intensive experiment with very large
> > > arrays so I must be careful when allocating memory. Numpy already
> > > supports a number of in-place operations (+=, *=) making the task
> > > much more manageable. However, it is not obvious to me how I set
> > > values based on a very simple condition.
> > >
> > > The expression
> > >
> > > y[y<0]=-1
> > >
> > > generates a binary index mask y<0 of the same size as the array
> > > y, which is problematic when y is quite large.
> > >
> > > I was wondering if there was anything like a set_where(A, cmp,
> > > B, setval, [optional elseval]) function where cmp would be a
> > > comparison operator expressed as a string.
> > >
> > > The code below illustrates what I want to do. Admittedly, it
> > > needs to be cleaned up but it's a proof of concept. Does numpy
> > > provide any functions that support the functionality of the code
> > > below?
> >
> > That's a good question, but I'm pretty sure it doesn't, apart from
> > numpy.clip(). The way I'd try to solve that problem would be with
> > the dreaded for loop. Don't iterate over single elements, but if
> > you have a gargantuan array, working in chunks of ten thousand (or
> > whatever) won't have too much overhead:
> >
> >     block = 100000
> >     for n in range(0, len(y), block):
> >         yc = y[n:n+block]   # a view into y, not a copy
> >         yc[yc<0] = -1       # the mask is only block elements long
> >
> > It's a bit of a pain, but working with arrays that nearly fill RAM
> > *is* a pain, as I'm sure you are all too aware by now.
> >
> > You might look into numexpr; this is the sort of thing it does
> > (though I've never used it and can't say whether it can do this).
>
> Well, Numexpr is designed to minimize the number of temporaries, and
> can do what Damian wants without having to store the mask in a
> temporary. However, the output will require new space. The usage
> should be something like:
>
> In [11]: y = numpy.random.normal(0, 10, 10)
>
> In [12]: numexpr.evaluate('where(y<0, -1, y)')
> Out[12]:
> array([  7.11784295,  -1.        ,  10.92876842,  -1.        ,
>          0.76092629,  -1.        ,  14.07021792,  -1.        ,
>          5.67173405,  31.28631822])

Oops. I realised that, for this particular case, Numexpr's memory usage
is similar to that of its NumPy counterpart:

    y[:] = numpy.where(y<0, -1, y)

So I think the best option for you is to work in chunks, as
Anne suggested.
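
For what it's worth, here is a minimal sketch of the chunked approach wrapped in a small helper. The name set_where_negative and its parameters are hypothetical (nothing like this ships with NumPy), and it assumes a 1-D array. The point is only that each slice is a view, so the boolean mask never grows beyond one block:

```python
import numpy as np

def set_where_negative(y, value=-1, block=100000):
    """Set every negative element of the 1-D array y to value, in place.

    Hypothetical helper for illustration; not part of NumPy.
    """
    for start in range(0, len(y), block):
        chunk = y[start:start + block]  # basic slicing returns a view, no copy
        chunk[chunk < 0] = value        # temporary mask has at most block elements

# Usage: y is modified in place and no full-size mask is allocated.
y = np.random.normal(0, 10, 1000000)
set_where_negative(y)
```

The block size is a trade-off: larger blocks mean fewer Python-level iterations, smaller blocks mean a smaller temporary mask. Something in the tens of thousands of elements should keep the loop overhead negligible, but I have not benchmarked it.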
