[Numpy-discussion] A random.normal function with stdev as array
tim.hochberg at cox.net
Wed Apr 5 09:58:08 CDT 2006
Eric Emsellem wrote:
> this is illuminating in fact. These are things I would not have
> thought about.
> I am trying at the moment to understand why two versions of my program
> have a difference of about 10% (here it is 2sec for 1000 points, so
> you can imagine for 10^6...) although the code is nearly the same.
> I have loops such as:
> bigarray = array of Nbig points
> for i in range(N):
>     bigarray = bigarray + calculation
If you tell us more about calculation, we could probably help more. This
sounds like you want to vectorize the inner loop, but you may in fact
have already done that. There's nothing wrong with looping in Python as
long as you amortize the loop overhead over a large number of
operations. Hence the advice to vectorize your *inner* loop, not to
vectorize all loops. Attempting the latter can lead to impenetrable
code, usually doesn't help significantly, and sometimes slows things down
as you overflow the cache with big matrices.
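Just to illustrate the shape of what I mean (the sizes and the calculation
here are made up, so treat this as a sketch, not your actual code):

>>> import numpy
>>> Nbig, N = 100000, 1000          # made-up sizes
>>> bigarray = numpy.zeros(Nbig)
>>> for i in range(N):
...     # stand-in for your real calculation; the key point is that each
...     # pass works on the whole array, so the Python loop overhead is
...     # amortized over Nbig element-wise operations
...     calculation = numpy.sin(numpy.arange(Nbig) * float(i))
...     bigarray += calculation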
> I thought to do it by:
> bigarray = numpy.sum(array([calculation for i in range(N)]))
> not sure this is good...
I suspect not, but timeit is your friend....
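Something along these lines (with a fixed, made-up stand-in for your
calculation, so it's only a sketch of the measurement, not a real benchmark)
will tell you which approach wins on your machine:

>>> import timeit
>>> setup = ("import numpy; N = 100; bigarray = numpy.zeros(100000); "
...          "calc = numpy.arange(1e5) % 3")
>>> timeit.Timer("for i in range(N): bigarray += calc", setup).timeit(10)
>>> timeit.Timer("numpy.sum(numpy.array([calc for i in range(N)]), axis=0)",
...              setup).timeit(10)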
> And you are basically saying that
> bigarray = bigarray + calculation
> is better than
> bigarray += calculation
> or is it strictly equivalent? (in terms of CPU...)
Actually the reverse. "bigarray += calculation" should be better in
terms of both speed and memory usage. In this case it's also clearer, so
it's an improvement all around. They both do the same number of adds,
but the first allocates more memory and pushes more data back and forth
between main memory and the cache.
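If you want to see the allocation difference directly, a quick (purely
illustrative) check is that "+" builds a new array while "+=" reuses the
existing buffer:

>>> import numpy
>>> a = numpy.zeros(5)
>>> b = numpy.ones(5)
>>> before = id(a)
>>> a = a + b          # allocates a brand-new result array, then rebinds the name
>>> id(a) == before
False
>>> before = id(a)
>>> a += b             # adds into the existing buffer, no new allocation
>>> id(a) == before
True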
The point I was making about += versus + was that I wouldn't in general write
    a = some_func()
    a += something_else
instead of
    a = some_func() + something_else
because it's less clear. In cases where you really do need the speed, it's
fine, but most of the time that's not the case. In your case, the speedup is
fairly minor, I believe because random.normal is fairly expensive. If you
instead compare these two ways of computing a cube, you'll see a much larger
difference (37%):
>>> setup = "import numpy; stddev=numpy.arange(1e6,dtype=float)%3"
>>> timeit.Timer('stddev * stddev * stddev', setup).timeit(20)
>>> timeit.Timer('result = stddev*stddev; result *= stddev', setup).timeit(20)
However, if you work with smaller matrices, the effect almost disappears:
>>> setup = "import numpy; stddev=numpy.arange(1e4,dtype=float)%3"
>>> timeit.Timer('result = stddev*stddev; result *= stddev', setup).timeit(2000)
>>> timeit.Timer('stddev * stddev * stddev', setup).timeit(2000)
I believe that's because the speedup is nearly all due to reducing the
amount of data you move around. In the second case everything fits in
the cache, so this effect is minor. In the first you are pushing data
back and forth to main memory so it's fairly large. On my machine these
sort of effects kick in somewhere between 10,000 and 100,000 elements.
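To see roughly where the crossover happens on your own machine, a little
sweep along these lines (the sizes and repeat counts are just guesses,
adjust to taste) does the job:

>>> import timeit
>>> for n in (10**3, 10**4, 10**5, 10**6):
...     setup = "import numpy; stddev=numpy.arange(%d,dtype=float)%%3" % n
...     naive = timeit.Timer('stddev*stddev*stddev', setup).timeit(20)
...     inplace = timeit.Timer('result = stddev*stddev; result *= stddev',
...                            setup).timeit(20)
...     print n, naive / inplace    # ratio > 1 means the in-place version wins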
> thanks for the help, and sorry for the dumb questions
Not a problem. These are all legitimate questions; you can't really be
expected to know this stuff without a fair amount of experience with numpy
or its predecessors. It would be cool if someone added a page to the wiki
on the topic so we could start collecting and organizing this
information. For all I know there's one already there though -- I
should probably check.
> Tim Hochberg wrote:
>> Eric Emsellem wrote:
>>>> Since stdev essentially scales the output of random, wouldn't the
>>>> following be equivalent to the above?
>>>> result = numpy.random.normal(0, 1, stddev.shape)
>>>> result *= stddev
>>> yes indeed, this is a good option where in fact I could do
>>> result = stddev * numpy.random.normal(0, 1, stddev.shape)
>>> in one line.
>>> thanks for the tip
>> Indeed you can. However, keep in mind that the one line version is
>> equivalent to:
>> temp = numpy.random.normal(0, 1, stddev.shape)
>> result = stddev * temp
>> That is, it creates an extra temporary variable only to throw it
>> away. The two line version I posted above avoids that temporary and
>> thus should be both faster and less memory hungry. It's always good
>> to check these things however:
>> >>> setup = "import numpy; stddev=numpy.arange(1e6,dtype=float)%3"
>> >>> timeit.Timer('stddev * numpy.random.normal(0, 1, stddev.shape)', setup).timeit(20)
>> >>> timeit.Timer('result = numpy.random.normal(0, 1, stddev.shape);
>> result*=stddev', setup).timeit(20)
>> So, yes, the two line method is marginally faster (about 10%). Most
>> of the time you shouldn't care about this: the one line version is
>> clearer and most of the code you write isn't a bottleneck. Starting
>> out writing this as the two line version is premature optimization. I
>> used it here since the question was about optimization.
>> I see Robert Kern just posted my list. If you want to put this in
>> terms of that list, then:
>> 0. Think about your algorithm
>> => Recognize that stddev is a scale parameter
>> 1. Vectorize your inner loop.
>> => This is a no brainer after 0 resulting in the one line version
>> 2. Eliminate temporaries
>> => This results in the two line version.
>> Also key here is recognizing when to stop. Step 0 is always
>> appropriate and step 1 is almost always good, resulting in code that
>> is both clearer and faster. However, once you get to step 2 and
>> beyond you tend to trade speed/memory usage for clarity. Not always:
>> sometimes *= and friends are clearer, but often, particularly if you
>> start resorting to three-arg ufuncs. So, my advice is to stop
>> optimizing as soon as your code is fast enough.
>>> (of course this is not strictly equivalent depending on the random
>>> generator, but that will be fine for my purpose)
>> I'll have to take your word for it -- after the normal distribution
>> my knowledge in the area peters out rapidly.
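For what it's worth, a quick numerical sanity check of the scaling point
(just a sketch; the array of standard deviations is arbitrary) is to compare
the sample statistics of the two formulations:

>>> import numpy
>>> stddev = numpy.arange(1e6, dtype=float) % 3 + 1.0   # arbitrary positive stddevs
>>> a = stddev * numpy.random.normal(0, 1, stddev.shape)
>>> b = numpy.random.normal(0, 1, stddev.shape)
>>> b *= stddev
>>> # since the means are all zero, both should have a sample std close to
>>> # the RMS of stddev, i.e. sqrt((1 + 4 + 9) / 3) ~ 2.16
>>> a.std(), b.std(), numpy.sqrt((stddev**2).mean())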