[Numpy-discussion] [Newbie] Fast plotting

Sebastian Stephan Berg sebastian@sipsolutions....
Tue Jan 6 08:45:31 CST 2009


Just thinking. If the parameters take only a limited number of distinct
values, you may be able to use the histogram feature: do one histogram
with Y as weights and one without weights, then calculate the mean
yourself by dividing the first by the second. That should be pretty
speedy, I imagine. Other than that, maybe sort the whole thing, then use
searchsorted with side='right' to find the end of each run of equal x
values and work on those slices. I mean something like this:

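def spam(x, y, work_on_copy=False):
    """Take the arrays x and y and return
    unique_x_values, means, stds, maxs, mins
    as lists. means, stds, maxs and mins are those
    of the corresponding y values.
    If work_on_copy is true, x and y are copied to ensure
    that they are not sorted in place.
    """
    u, means, stds, maxs, mins = [], [], [], [], []
    s = x.argsort()
    if work_on_copy:
        # Indexing with s builds sorted copies; the caller's arrays are untouched.
        x = x[s]
        y = y[s]
    else:
        # Overwrite the caller's arrays with their sorted contents.
        x[:] = x[s]
        y[:] = y[s]

    start = 0
    value = x[0]
    while True:
        # Index just past the last element equal to value.
        next = x.searchsorted(value, side='right')
        group = y[start:next]
        u.append(value)
        means.append(group.mean())
        stds.append(group.std())
        maxs.append(group.max())
        mins.append(group.min())
        if next == len(x):
            break
        value = x[next]
        start = next
    return u, means, stds, maxs, mins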
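
To make the searchsorted step concrete, here is a tiny sketch on made-up
data (the arrays below are hypothetical, not from the benchmark). With x
already sorted, side='right' gives the index just past the last
occurrence of a value, so consecutive results delimit the slice of y
that belongs to each distinct x:

```python
import numpy as np

# Hypothetical sample data, already sorted by x.
x = np.array([1, 1, 2, 2, 2, 5])
y = np.array([10.0, 20.0, 1.0, 2.0, 3.0, 7.0])

# End index (exclusive) of the run of each distinct x value.
stops = x.searchsorted(np.unique(x), side='right')
# Each run starts where the previous one stopped.
starts = np.concatenate(([0], stops[:-1]))

# Mean of y over each slice y[start:stop].
group_means = [y[a:b].mean() for a, b in zip(starts, stops)]
```

Here the runs end at indices 2, 5 and 6, so the slices are y[0:2],
y[2:5] and y[5:6].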

This is of course basically the same as what Francesc suggested, but a
quick test shows that it seems to scale better. I didn't try the speed
of histogram.
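
For completeness, the histogram idea could look roughly like the sketch
below. It is untested for speed, and the function name histogram_means
is made up for illustration. It assumes the number of distinct x values
is small enough to enumerate with np.unique, and it maps each x to the
index of its distinct value so that every value gets exactly one bin:

```python
import numpy as np

def histogram_means(x, y):
    """Return (distinct x values, mean of y for each), via two histograms."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    # values is sorted; inverse[i] is the position of x[i] within values.
    values, inverse = np.unique(x, return_inverse=True)
    # One bin per distinct value: bin k collects the points with inverse == k.
    edges = np.arange(len(values) + 1)
    sums, _ = np.histogram(inverse, bins=edges, weights=y)  # sum of y per value
    counts, _ = np.histogram(inverse, bins=edges)           # number of points per value
    return values, sums / counts

# Small made-up example.
values, means = histogram_means([3, 1, 3, 1, 2], [4.0, 1.0, 6.0, 3.0, 5.0])
```

This only gives the means; min and max do not fall out of a histogram,
so the sorted-slices version is still needed for those.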


On Tue, 2009-01-06 at 10:35 +0100, Franck Pommereau wrote:
> Hi all, and happy new year!
> I'm new to NumPy and searching for a way to compute from a set of points
> (x,y) the mean value of y values associated with each distinct x value.
> Each point corresponds to a measurement in a benchmark (x = parameter, y =
> computation time) and I'd like to plot the graph of mean computation
> time with respect to parameter values. (I know how to plot, but not how
> to compute mean values.)
> My points are stored as two arrays X, Y (same size).
> In pure Python, I'd do as follows:
> s = {} # sum of y values for each distinct x (as keys)
> n = {} # number of summed values (same keys)
> for x, y in zip(X, Y) :
>     s[x] = s.get(x, 0.0) + y
>     n[x] = n.get(x, 0) + 1
> new_x = array(list(sorted(s)))
> new_y = array([s[x]/n[x] for x in sorted(s)])
> Unfortunately, this code is much too slow because my arrays have
> millions of elements. But I'm pretty sure that NumPy offers a way to
> handle this more elegantly and much faster.
> As a bonus, I'd be happy if the solution also allowed me to compute the
> standard deviation, min, max, etc.
> Thanks in advance for any help!
> Franck
