[SciPy-User] my own hist_equal (Vectorization gods required! ; )

David Baddeley david_baddeley@yahoo.com...
Thu Apr 8 17:59:22 CDT 2010

Hi Micheal,

I think the folowing should also work (& is probably faster):

def hist_equal(data):
    I = data.ravel().argsort()
    return numpy.linspace(0, 255, I.size)[I].reshape(data.shape)

this works because you can effectively create the cdf by plotting the sorted data on the x axis, with something going linearly form 0 to 1 on the y (rather than using histograms and cumsums). The downside of this approach is that intensity values which are the same in the input data are not going to be exactly the same in the output (they'll be spread over the interval between one value and the next highest). For practical usage (particularly seeing as histogram equalisation is a horribly non-linear operation to start with), this might well be a problem you can live with. 

If this isn't acceptable, you should be able to speed your code up a bit by doing the following: 
1) - precomputing round((cdf - cdfmin)*255/nPixels)
2) - getting rid of numpy.where (which tends to be slow) and just using boolean/fancy indexing (fast) eg:
nData[data == lum]


----- Original Message ----
From: Michael Aye <kmichael.aye@googlemail.com>
To: scipy-user@scipy.org
Sent: Fri, 9 April, 2010 4:11:50 AM
Subject: [SciPy-User] my own hist_equal (Vectorization gods required! ;)

Hi all,

I am implementing my own histogram equalization, as the few lines of
code that were mentioned before in this forum didn't really work for
me and I wanted to really understand what is going on there as well

So here is what I got so far, main effort was to wrap my brain around
the double lookup table I needed to find the right values to replace
and the new values, based on the transformed cdf.
So, it seems to work fine, but the remaining loop of course disturbs
me in terms of efficiency and I wondered if some of you vectorization
gods could have a look if there's something possible.
Especially when I have 16 bit images, there are a lot of different
luminances I have to loop through, so it's not the fastest.
Apart from that I think it can't be optimized further much (I think?)

def hist_equal(data):
    # range is +2 to have the highest luminance to get into correct
    bins = numpy.arange(data.min(), data.max() + 2)
    # first the histogram of luminances
    h, bins = numpy.histogram(data, bins=bins)
    # now get the cdf
    cdf = h.cumsum()
    # now get the unique luminance values
    uniques = numpy.unique(data)
    # this to not look it up again everytime in the numpy array
    nPixel = data.size
    cdfmin = cdf.min()
    # don't change the incoming data
    nData = data.copy()
    for lum in uniques:
        # algorithm taken from wikipedia entry for histogram
        # always scaling to 256 luminance values
        nData[numpy.where(data == lum)] = \
            numpy.round((cdf[numpy.where(bins == lum)] - cdfmin) *
255 / nPixel)
    return nData

I am thinking that this might even be useful to implement somewhere
(unless it is already and I missed it??)

Best regards,
SciPy-User mailing list


More information about the SciPy-User mailing list