# [Numpy-discussion] [Newbie] Fast plotting

Bruce Southey bsouthey@gmail....
Tue Jan 6 08:44:42 CST 2009

```Francesc Alted wrote:
> On Tuesday 06 January 2009, Franck Pommereau wrote:
>
>> Hi all, and happy new year!
>>
>> I'm new to NumPy and looking for a way to compute, from a set of points
>> (x,y), the mean of the y values associated with each distinct x value.
>> Each point corresponds to a measurement in a benchmark (x = parameter, y
>> = computation time) and I'd like to plot the graph of mean
>> computation time wrt parameter values. (I know how to plot, but not
>> how to compute mean values.)
>>
>> My points are stored as two arrays X, Y (same size).
>> In pure Python, I'd do as follows:
>>
>> s = {} # sum of y values for each distinct x (as keys)
>> n = {} # number of summed values (same keys)
>> for x, y in zip(X, Y) :
>>     s[x] = s.get(x, 0.0) + y
>>     n[x] = n.get(x, 0) + 1
>> new_x = array(list(sorted(s)))
>> new_y = array([s[x]/n[x] for x in sorted(s)])
>>
>> Unfortunately, this code is much too slow because my arrays have
>> millions of elements. But I'm pretty sure that NumPy offers a way to
>> handle this more elegantly and much faster.
>>
>> As a bonus, I'd be happy if the solution would allow me to compute
>> also standard deviation, min, max, etc.
>>
>
> The following would do the trick:
>
> In [92]: x = np.random.randint(100,size=100)
>
> In [93]: y = np.random.rand(100)
>
> In [94]: u = np.unique(x)
>
> In [95]: means = [ y[x == i].mean() for i in u ]
>
> In [96]: stds = [ y[x == i].std() for i in u ]
>
> In [97]: maxs = [ y[x == i].max() for i in u ]
>
> In [98]: mins = [ y[x == i].min() for i in u ]
>
> and the data you want will be in the means, stds, maxs and mins lists.  This
> approach has the drawback that you have to process the array each time
> that you want to extract the desired info.  If what you want is to
> always retrieve the same set of statistics, you can do this in one
> single loop:
>
> In [99]: means, stds, maxs, mins = [], [], [], []
>
> In [100]: for i in u:
>     g = y[x == i]
>     means.append(g.mean())
>     stds.append(g.std())
>     maxs.append(g.max())
>     mins.append(g.min())
>    .....:
>
> which has the same effect as above, but is faster because each mask
> x == i is computed only once instead of four times.
>
> Hope that helps,
>
>
If you use Knuth's one-pass approach
(http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#III._On-line_algorithm)
you can write a function to get the min, max, mean and variance/standard
deviation in a single pass through the array rather than one pass for
each. I do not know if this will provide any advantage as that will
probably depend on the size of the arrays.
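
# A minimal sketch of the one-pass idea described above. This is an
# illustration only, not code from this thread. It uses the online update
# from the Wikipedia page linked above (often credited to Welford) to track
# min, max, mean and standard deviation in a single loop.
# Assumptions: the name one_pass_stats is hypothetical, and the result is
# the population standard deviation (divide by n), which matches the
# default ddof=0 of NumPy's std().
import math

def one_pass_stats(values):
    """Return (min, max, mean, std) of a non-empty iterable in one pass."""
    n = 0
    mean = 0.0
    m2 = 0.0          # running sum of squared deviations from the mean
    lo = math.inf
    hi = -math.inf
    for v in values:
        n += 1
        delta = v - mean
        mean += delta / n
        m2 += delta * (v - mean)   # second factor uses the updated mean
        if v < lo:
            lo = v
        if v > hi:
            hi = v
    if n == 0:
        raise ValueError("one_pass_stats needs at least one value")
    return lo, hi, mean, math.sqrt(m2 / n)

# Possible usage for one group, with x, y and i as in the quoted session:
#     lo, hi, mean, std = one_pass_stats(y[x == i])
# A pure-Python loop like this is likely to be slower than the vectorized
# g.mean(), g.std(), g.max() and g.min() calls, so the one-pass approach
# probably only pays off when the loop itself is compiled.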