[SciPy-User] my own hist_equal (Vectorization gods required! ; )

Wes McKinney wesmckinn@gmail....
Thu Apr 8 18:37:54 CDT 2010


On Thu, Apr 8, 2010 at 6:59 PM, David Baddeley
<david_baddeley@yahoo.com.au> wrote:
> Hi Michael,
>
> I think the following should also work (and is probably faster):
>
> def hist_equal(data):
>    I = data.ravel().argsort()
>    # assign each pixel its rank in the sorted order, rescaled to 0..255
>    out = numpy.empty(I.size)
>    out[I] = numpy.linspace(0, 255, I.size)
>    return out.reshape(data.shape)
>
> this works because you can effectively build the cdf by plotting the sorted data on the x axis against something going linearly from 0 to 1 on the y axis (rather than using histograms and cumsums). The downside of this approach is that intensity values which are the same in the input data will not be exactly the same in the output (they'll be spread over the interval between one value and the next highest). For practical usage (particularly since histogram equalisation is a horribly non-linear operation to start with), this might well be something you can live with.
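>
> If exact ties matter, here is a minimal sketch of a tie-preserving variant (the name hist_equal_ties is mine, for illustration; untested on real images): it ranks each pixel with numpy.searchsorted, so equal inputs always map to equal outputs:
>
> import numpy
>
> def hist_equal_ties(data):
>    flat = numpy.sort(data.ravel())
>    # number of pixels <= each pixel = the empirical cdf (counts run 1..N)
>    ranks = numpy.searchsorted(flat, data.ravel(), side='right')
>    # rescale the cdf to grey values; equal inputs get equal outputs
>    return 255.0 * ranks.reshape(data.shape) / flat.size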
>
> If this isn't acceptable, you should be able to speed your code up a bit with two changes (a combined sketch follows below):
> 1) precompute numpy.round((cdf - cdfmin) * 255 / nPixels) once, outside the loop
> 2) get rid of numpy.where (which tends to be slow) in favour of plain boolean indexing (fast), e.g. nData[data == lum]
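>
> A minimal sketch combining both suggestions (the name hist_equal_fast is mine; it assumes integer-valued image data, so that lum - data.min() indexes the precomputed table):
>
> import numpy
>
> def hist_equal_fast(data):
>    offset = data.min()
>    bins = numpy.arange(offset, data.max() + 2)
>    h, bins = numpy.histogram(data, bins=bins)
>    cdf = h.cumsum()
>    # suggestion 1: precompute the whole value mapping once
>    mapping = numpy.round((cdf - cdf.min()) * 255.0 / data.size)
>    nData = data.copy()
>    for lum in numpy.unique(data):
>        # suggestion 2: boolean indexing instead of numpy.where
>        nData[data == lum] = mapping[lum - offset]
>    return nData
>
> (For integer data the loop can in fact be dropped entirely: mapping[data - offset].astype(data.dtype) gives the same result in one vectorised step.)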
>
> David
>
> ----- Original Message -----
> From: Michael Aye <kmichael.aye@googlemail.com>
> To: scipy-user@scipy.org
> Sent: Fri, 9 April, 2010 4:11:50 AM
> Subject: [SciPy-User] my own hist_equal (Vectorization gods required! ;)
>
> Hi all,
>
> I am implementing my own histogram equalization, since the few lines
> of code mentioned before on this list didn't really work for me, and
> I wanted to understand what is actually going on there anyway.
>
> So here is what I have so far. The main effort was wrapping my brain
> around the double lookup I needed: finding the values to replace and
> their new values, based on the transformed cdf.
> It seems to work fine, but the remaining loop bothers me in terms of
> efficiency, and I wondered whether some of you vectorization gods
> could have a look to see if something better is possible.
> Especially with 16-bit images there are a lot of different luminance
> values to loop through, so it's not the fastest.
> Apart from that, I don't think it can be optimized much further (I think?).
>
> import numpy
>
> def hist_equal(data):
>    # range is +2 so the highest luminance lands in the correct bin
>    bins = numpy.arange(data.min(), data.max() + 2)
>    # first the histogram of luminances
>    h, bins = numpy.histogram(data, bins=bins)
>    # now get the cdf
>    cdf = h.cumsum()
>    # now get the unique luminance values
>    uniques = numpy.unique(data)
>    # cache these so they aren't looked up every time in the loop
>    nPixel = data.size
>    cdfmin = cdf.min()
>    # don't change the incoming data
>    nData = data.copy()
>    for lum in uniques:
>        # algorithm taken from the wikipedia entry for histogram
>        # equalization, always scaling to 256 luminance values
>        nData[numpy.where(data == lum)] = \
>            numpy.round((cdf[numpy.where(bins == lum)] - cdfmin) * 255 / nPixel)
>    return nData
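>
> A quick smoke test (a sketch with made-up 12-bit data, not from the original post):
>
> img = numpy.random.randint(0, 4096, size=(64, 64))
> eq = hist_equal(img)
> assert 0 <= eq.min() and eq.max() <= 255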
>
> I am thinking that this might even be useful to implement somewhere
> (unless it already exists and I missed it?)
>
> Best regards,
> Michael
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

In my experience, for setting values through a boolean mask like the
above, numpy.putmask is significantly faster than fancy indexing. The
same goes for numpy.take versus fancy indexing (though I don't know
whether take can be used here).
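
A minimal sketch of the substitution (illustrative names and values; timings not measured here):

import numpy

data = numpy.random.randint(0, 256, size=(512, 512))
nData = data.copy()
mask = data == 128
nData[mask] = 42                  # boolean fancy indexing
numpy.putmask(nData, mask, 42)    # same effect, often faster for this pattern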

- Wes

