[SciPy-dev] Statistical review month : weighted histogram and cumfreq

Robert Kern robert.kern at gmail.com
Tue Apr 11 19:53:28 CDT 2006

David Huard wrote:
> I recently had to compute a weighted cumulative frequency distribution
> so I modified the scipy.stats.histogram and scipy.stats.cumfreq
> functions. I added a keyword argument to both function calls, namely weight=None,
> where the default is simply uniform weights. I wanted to ask whether this
> change would be welcome before submitting the patch.

Well, it's frequently difficult to talk about whether a change is welcome or not
without seeing the code, so please submit the patch to the tracker.
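The change described above can be sketched with NumPy, assuming `numpy.histogram` (which accepts a `weights` array) does the binning. The helper names `weighted_histogram` and `weighted_cumfreq` are hypothetical and are not the scipy.stats signatures or the submitted patch. With `weights=None` every item counts once, so the plain integer counts are unchanged.

```python
import numpy as np


def weighted_histogram(a, numbins=10, weights=None):
    """Hypothetical helper: sum of weights in each of `numbins` equal-width bins.

    With weights=None every item counts once, so the result is a plain count.
    """
    a = np.asarray(a, dtype=float)
    if weights is not None:
        weights = np.asarray(weights, dtype=float)
        if weights.shape != a.shape:
            raise ValueError("weights must have the same shape as a")
    # Bins span the data's min..max; the last bin includes its right edge.
    hist, edges = np.histogram(a, bins=numbins, weights=weights)
    return hist, edges


def weighted_cumfreq(a, numbins=10, weights=None):
    """Hypothetical helper: running total of the (weighted) histogram."""
    hist, edges = weighted_histogram(a, numbins, weights)
    return np.cumsum(hist), edges


data = [1.0, 2.0, 2.5, 4.0]
w = [0.5, 1.0, 1.0, 2.0]
hist, edges = weighted_histogram(data, numbins=3, weights=w)  # sums: [0.5, 2.0, 2.0]
cum, _ = weighted_cumfreq(data, numbins=3, weights=w)         # running: [0.5, 2.5, 4.5]
counts, _ = weighted_histogram(data, numbins=3)               # counts: [1, 2, 1]
```

Leaving the weights unnormalized keeps the weighted result on the same scale as a count when all weights are 1.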


> My concern is that
> the change modifies the result returned by the function. Presently, the
> histogram and cumfreq functions return integer arrays, the number of
> items lying in certain intervals. When these items are weighted, an
> integer count doesn't make much sense, and I normalized the histogram
> and cumfreq results. In other words, the new histogram function returns
> a float array of the frequency, instead of a count. I feel that having a
> normalized output is more practical, but it would break existing code.
> There is always the possibility of creating a whistogram and wcumfreq
> functions, but this is not a pretty solution.
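The normalization discussed above amounts to dividing each bin's weight sum by the total weight. The sketch below assumes NumPy and a hypothetical helper name, `weighted_frequency`. With uniform weights the result is simply each count divided by the number of items, which is why normalization could be offered as a separate step instead of replacing the integer counts.

```python
import numpy as np


def weighted_frequency(a, numbins=10, weights=None):
    """Hypothetical helper: fraction of the total weight in each bin (sums to 1)."""
    a = np.asarray(a, dtype=float)
    # Uniform weights by default, so the result equals count / len(a).
    w = np.ones_like(a) if weights is None else np.asarray(weights, dtype=float)
    hist, edges = np.histogram(a, bins=numbins, weights=w)
    # Bins cover the full data range here, so every weight lands in some bin.
    total = w.sum()
    if total == 0:
        raise ValueError("total weight is zero; cannot normalize")
    return hist / total, edges


data = [1.0, 2.0, 2.5, 4.0]
freq, _ = weighted_frequency(data, numbins=3, weights=[0.5, 1.0, 1.0, 2.0])
uniform, _ = weighted_frequency(data, numbins=3)  # [0.25, 0.5, 0.25]
```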

My feeling is that there are a lot of ways to compute histograms. A lot of those
choices are orthogonal to each other. Also largely orthogonal are the ways in
which you might *use* a histogram. Trying to manage all of those choices with
keyword arguments or differently-named functions is a nightmare. This is a place
for classes. For example, Konrad Hinsen has nice Histogram and WeightedHistogram
classes in ScientificPython (Scientific.Statistics.Histogram).

I think we should leave the interface of scipy.histogram() alone and write a
Histogram class instead.
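A minimal sketch of what such a class might look like, assuming NumPy does the binning. The names `Histogram`, `WeightedHistogram`, `values`, `normalized`, and `cumulative` are hypothetical here and are not the ScientificPython or SciPy API. The binning choice (weighted or not) lives in the class, while the ways of *using* the result (normalized frequencies, cumulative values, bin centers) are methods, so the two sets of choices stay independent.

```python
import numpy as np


class Histogram:
    """Hypothetical sketch: bin once, then ask the object for the view you need."""

    def __init__(self, data, numbins=10, limits=None):
        data = np.asarray(data, dtype=float)
        # `values` holds per-bin counts here; subclasses may store weighted sums.
        self.values, self.edges = self._bin(data, numbins, limits)

    def _bin(self, data, numbins, limits):
        # Equal-width bins over `limits` (defaults to the data's min and max).
        return np.histogram(data, bins=numbins, range=limits)

    @property
    def centers(self):
        # Midpoint of each bin, handy for plotting.
        return 0.5 * (self.edges[:-1] + self.edges[1:])

    def normalized(self):
        # Fraction of the in-range total falling in each bin (sums to 1).
        total = self.values.sum()
        if total == 0:
            raise ValueError("histogram is empty; cannot normalize")
        return self.values / total

    def cumulative(self, normalize=False):
        # Running total across bins: a cumfreq-style view of the same binning.
        return np.cumsum(self.normalized() if normalize else self.values)


class WeightedHistogram(Histogram):
    """Same views as Histogram, but each item contributes its weight instead of 1."""

    def __init__(self, data, weights, numbins=10, limits=None):
        # Store weights before the base constructor calls _bin().
        self.weights = np.asarray(weights, dtype=float)
        super().__init__(data, numbins, limits)

    def _bin(self, data, numbins, limits):
        return np.histogram(data, bins=numbins, range=limits, weights=self.weights)


data = [1.0, 2.0, 2.5, 4.0]
h = Histogram(data, numbins=3)                               # values: [1, 2, 1]
wh = WeightedHistogram(data, [0.5, 1.0, 1.0, 2.0], numbins=3)  # values: [0.5, 2.0, 2.0]
```

Because a plain `Histogram` still reports integer counts, existing count-based behavior is untouched while weighting and normalization become opt-in.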

Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
