[Numpy-discussion] changed behavior of numpy.histogram
Mark.Miller
mpmusu@cc.usu....
Wed Jan 23 12:03:20 CST 2008
Nah...no worries Stuart. Again, I recognize that what I was doing
deviated from the likely true intent of the histogram function. But it
was a nice convenient bit of code, for sure.
I'll take a look at your suggestion...it's different than what I
previously used. So, thanks for the input.
And yes...your description of a countunique() function is precisely what
I had in mind. Might be very useful if it could also work on object arrays.
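
For reference, here is a minimal sketch of what such a countunique() might look like, simply wrapping the sort / unique / searchsorted steps quoted below in a function. The name countunique and its (values, counts) return shape are assumptions taken from this thread; numpy itself has no function by that name. It should also handle object arrays as long as the elements can be compared with each other, though that case is not exercised here.

```python
import numpy as np


def countunique(values):
    """Return (unique_values, counts) for a 1-D array-like of sortable items.

    Hypothetical helper, not part of numpy.
    """
    arr = np.asarray(values)
    ordered = np.sort(arr)                       # equal items become adjacent
    uniques = np.unique(ordered)                 # sorted unique values
    starts = np.searchsorted(ordered, uniques)   # first index of each unique value
    ends = np.append(starts[1:], len(ordered))   # one past the last index of each
    return uniques, ends - starts


strings = np.array(['atcg', 'aaaa', 'atcg', 'actg', 'aaaa'])
uniques, counts = countunique(strings)
print(uniques)
print(counts)
```

Later numpy releases can produce the same pair directly with numpy.unique(a, return_counts=True), which is probably the simpler choice where it is available.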
Thanks again,
-Mark
Stuart Brorson wrote:
> Hi again --
>
> You made me feel guilty about breaking your code. Here's some
> suggested substitute code:
>
> In [10]: import numpy
>
> In [11]: a = numpy.array(('atcg', 'aaaa', 'atcg', 'actg', 'aaaa'))
>
> In [12]: b = numpy.sort(a)
>
> In [13]: c = numpy.unique(b)
>
> In [14]: d = numpy.searchsorted(b, c)
>
> In [15]: e = numpy.append(d[1:], len(a))
>
> In [16]: f = e - d
>
> In [17]: print c
> ['aaaa' 'actg' 'atcg']
>
> In [18]: print f
> [2 1 2]
>
> Note that histogram also uses searchsorted to do its stuff.
>
> Personally, I think the way to go is to have a "countunique" function
> which returns a list of unique occurrences of the array elements
> (regardless of their type), and a list of their counts. The above code
> could be a basis for this function.
>
> I'm not sure that this should be implemented using histogram, since
> at least I ordinarily consider histogram as a numeric function.
> Others may have different opinions.
>
> Cheers,
>
> Stuart Brorson
> Interactive Supercomputing, inc.
> 135 Beaver Street | Waltham | MA | 02452 | USA
> http://www.interactivesupercomputing.com/
>
>
> On Wed, 23 Jan 2008, Mark.Miller wrote:
>
>> Greetings: I just noticed a changed behavior of numpy.histogram. I
>> think that a recent 'fix' to the code has changed my ability to use that
>> function (albeit in an unconventional manner). I previously used the
>> histogram function to obtain counts of each unique string within a
>> string array. Again, I recognize that it is not a typical use of the
>> histogram function, but it did work very nicely for me.
>>
>> Here's an example:
>>
>> ### numpy 1.0.3 -- works just fine
>> >>> import numpy
>> >>> numpy.__version__
>> '1.0.3'
>> >>> a=numpy.array(('atcg', 'atcg', 'aaaa', 'aaaa'))
>> >>> a
>> array(['atcg', 'atcg', 'aaaa', 'aaaa'],
>>       dtype='|S4')
>> >>> b=numpy.unique(a)
>> >>> numpy.histogram(a,b)
>> (array([2, 2]), array(['aaaa', 'atcg'],
>>       dtype='|S4'))
>>
>> ### numpy 1.0.4 -- no longer functions
>> >>> import numpy
>> >>> numpy.__version__
>> '1.0.4'
>> >>> a=numpy.array(('atcg', 'atcg', 'aaaa', 'aaaa'))
>> >>> a
>> array(['atcg', 'atcg', 'aaaa', 'aaaa'],
>>       dtype='|S4')
>> >>> b=numpy.unique(a)
>> >>> numpy.histogram(a,b)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/opt/libraries/python/python-2.5.1/numpy-1.0.4-gnu/lib/python2.5/site-packages/numpy/lib/function_base.py", line 154, in histogram
>>     if(any(bins[1:]-bins[:-1] < 0)):
>> TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'
>>
>> Is this something that can possibly be fixed (should I submit a ticket)?
>> Or should I revert to some other approaches for implementing the same
>> idea? It really was a nice convenience. Or, alternately, would some
>> sort of new function along the lines of a numpy.countunique() ultimately
>> be useful?
>>
>> Thanks,
>>
>> -Mark
>>
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion@scipy.org
>> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>>