[Numpy-discussion] diff of find of diff trick

Norbert Nemec Norbert.Nemec.list@gmx...
Thu Oct 11 10:34:41 CDT 2007

Basically, what you want to do is a histogram. Numpy has that
functionality built in. However: the version built in to numpy is about
as suboptimal as yours. The problem is the unnecessary sorting of the data.

In principle, a histogram does not need any sorting, so it could be done
in strictly O(N). I don't see any simply way of doing so efficiently in
numpy without resorting to a costly loop. We had a discussion about this
a while ago but I cannot remember any resolution. The proper thing to do
would probably be a C routine.

If performance is crucial to you, I would suggest using weave. A minimal
example as starting point is given below.


#!/usr/bin/env python

import numpy
import scipy.weave
import scipy.weave.converters

def histogram(data, nR):
    res = numpy.zeros(nR,int)
    dataflat = data.flatten()
    Ndata = len(dataflat)
        for(int i=0;i<Ndata;i++) {
            int d = dataflat(i);
            if(d < nR && d >= 0) {
                res(d) += 1;
        arg_names = ["dataflat","res","Ndata","nR"],
        type_converters = scipy.weave.converters.blitz,
    return res

a = numpy.array([0,3,1,2,3,5,2,3,2,4,5,1,3,4,2])
print a
print histogram(a,4)

Robin wrote:
> Hello,
> As a converting MATLAB user I am bringing over some of my code. A core
> function I need is to sample the probability distribution of a given
> set of data. Sometimes these sets are large so I would like to make it
> as efficient as possible. (the data are integers representing members
> of a discrete space)
> In MATLAB the best way I found was the "diff of find of diff" trick
> which resulted in the completely vectorised solution (below). Does it
> make sense to translate this into numpy? I don't have a feel yet for
> what is fast/slow - are the functions below built in and so quick (I
> thought diff might not be).
> Is there a better/more pythonic way to do it?
> --------
> function Pr=prob(data, nR)
> Pr    = zeros(nR,1);
> % diff of find of diff trick for counting number of elements
> temp     = sort(data(data>0));    % want vector excluding P(0)
> dtemp    = diff([temp;max(temp)+1]);
> count     = diff(find([1;dtemp]));
> indx     = temp(dtemp>0);
> Pr(indx)= count ./ numel(data);    % probability
> --------
> Thanks
> Robin
> ------------------------------------------------------------------------
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

More information about the Numpy-discussion mailing list