[Numpy-discussion] changed behavior of numpy.histogram

Mark.Miller mpmusu@cc.usu....
Wed Jan 23 12:03:20 CST 2008


Nah...no worries, Stuart.  Again, I recognize that what I was doing
deviated from the likely true intent of the histogram function.  But it 
was a nice convenient bit of code, for sure.

I'll take a look at your suggestion...it's different than what I 
previously used.  So, thanks for the input.

And yes...your description of a countunique() function is precisely what 
I had in mind.  Might be very useful if it could also work on object arrays.
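
For what it's worth, here is a minimal sketch of what such a countunique() might look like.  The name comes from this thread and is hypothetical; it is not an existing numpy function.  The sketch assumes 1-D input whose elements can be sorted and compared with !=, which should cover string arrays and object arrays of mutually comparable items.  It uses the same sort-then-find-run-boundaries idea as Stuart's code below, rather than calling histogram.

```python
import numpy as np


def countunique(values):
    """Return (unique_values, counts) for a 1-D array-like.

    Hypothetical helper, not part of numpy. Assumes the elements can be
    sorted and compared with != (strings, numbers, comparable objects).
    """
    arr = np.sort(np.asarray(values).ravel())
    if arr.size == 0:
        return arr, np.zeros(0, dtype=np.intp)
    # True at the first element of each run of equal values in the sorted array.
    is_start = np.concatenate(([True], arr[1:] != arr[:-1]))
    starts = np.flatnonzero(is_start)
    # The length of each run is the gap between consecutive run starts.
    counts = np.diff(np.append(starts, arr.size))
    return arr[is_start], counts


a = np.array(('atcg', 'aaaa', 'atcg', 'actg', 'aaaa'))
uniques, counts = countunique(a)
print(uniques)  # ['aaaa' 'actg' 'atcg']
print(counts)   # [2 1 2]
```

Passing the same strings as an object array (dtype=object) goes through the same code path.  Newer NumPy releases also accept return_counts=True in numpy.unique, which returns the same pair directly.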

Thanks again,

-Mark


Stuart Brorson wrote:
> Hi again --
> 
> You made me feel guilty about breaking your code.  Here's some
> suggested substitute code :
> 
> In [10]: import numpy
> 
> In [11]: a = numpy.array(('atcg', 'aaaa', 'atcg', 'actg', 'aaaa'))
> 
> In [12]: b = numpy.sort(a)
> 
> In [13]: c = numpy.unique(b)
> 
> In [14]: d = numpy.searchsorted(b, c)
> 
> In [15]: e = numpy.append(d[1:], len(a))
> 
> In [16]: f = e - d
> 
> In [17]: print c
> ['aaaa' 'actg' 'atcg']
> 
> In [18]: print f
> [2 1 2]
> 
> Note that histogram also uses searchsorted to do its stuff.
> 
> Personally, I think the way to go is to have a "countunique" function
> which returns a list of the unique elements of the array
> (regardless of their type), and a list of their counts.  The above code
> could be a basis for this function.
> 
> I'm not sure that this should be implemented using histogram, since
> at least I ordinarily consider histogram as a numeric function.
> Others may have different opinions.
> 
> Cheers,
> 
> Stuart Brorson
> Interactive Supercomputing, inc.
> 135 Beaver Street | Waltham | MA | 02452 | USA
> http://www.interactivesupercomputing.com/
> 
> 
> On Wed, 23 Jan 2008, Mark.Miller wrote:
> 
>> Greetings:  I just noticed a changed behavior of numpy.histogram.  I
>> think that a recent 'fix' to the code has changed my ability to use that
>> function (albeit in an unconventional manner).  I previously used the
>> histogram function to obtain counts of each unique string within a
>> string array.  Again, I recognize that it is not a typical use of the
>> histogram function, but it did work very nicely for me.
>>
>> Here's an example:
>>
>> ###numpy 1.0.3  --works just fine
>> >>> import numpy
>> >>> numpy.__version__
>> '1.0.3'
>> >>> a=numpy.array(('atcg', 'atcg', 'aaaa', 'aaaa'))
>> >>> a
>> array(['atcg', 'atcg', 'aaaa', 'aaaa'],
>>       dtype='|S4')
>> >>> b=numpy.unique(a)
>> >>> numpy.histogram(a,b)
>> (array([2, 2]), array(['aaaa', 'atcg'],
>>       dtype='|S4'))
>>
>> ###numpy 1.0.4  --no longer functions
>> >>> import numpy
>> >>> numpy.__version__
>> '1.0.4'
>> >>> a=numpy.array(('atcg', 'atcg', 'aaaa', 'aaaa'))
>> >>> a
>> array(['atcg', 'atcg', 'aaaa', 'aaaa'],
>>       dtype='|S4')
>> >>> b=numpy.unique(a)
>> >>> numpy.histogram(a,b)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/opt/libraries/python/python-2.5.1/numpy-1.0.4-gnu/lib/python2.5/site-packages/numpy/lib/function_base.py", line 154, in histogram
>>     if(any(bins[1:]-bins[:-1] < 0)):
>> TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'
>>
>> Is this something that can possibly be fixed (should I submit a ticket)?
>>  Or should I revert to some other approaches for implementing the same
>> idea?  It really was a nice convenience.  Or, alternately, would some
>> sort of new function along the lines of a numpy.countunique() ultimately
>> be useful?
>>
>> Thanks,
>>
>> -Mark
>>
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion@scipy.org
>> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>>


