[Numpy-discussion] Problems understanding histogram2d
Aronne Merrelli
aronne.merrelli@gmail....
Fri Jul 20 20:10:06 CDT 2012
On Fri, Jul 20, 2012 at 10:11 AM, Andreas Hilboll <lists@hilboll.de> wrote:
> Hi,
>
> I have a problem using histogram2d:
>
> from numpy import linspace, histogram2d
> bins_x = linspace(-180., 180., 360)
> bins_y = linspace(-90., 90., 180)
> data_x = linspace(-179.96875, 179.96875, 5760)
> data_y = linspace(-89.96875, 89.96875, 2880)
> histogram2d(data_x, data_y, (bins_x, bins_y))
>
> AttributeError: The dimension of bins must be equal to the dimension of
> the sample x.
>
> I would expect histogram2d to return a 2d array of shape (360,180), which
> is full of 256s. What am I missing here?
>
It is a joint histogram, so the x and y inputs represent each
dimension of a 2-dimensional sample. So, the x and y arrays must be
the same length. (the documentation does appear to be incorrect here).
The bins do not need to have the same length. Here is your example
adjusted (with many fewer bins so I could print it in the console) -
note since you just have two "ramps" from linspace as the data, most
of the points are near the diagonal.
In [15]: bins_x = linspace(-180,180,6)
In [16]: bins_y = linspace(-90,90,4)
In [17]: data_x = linspace(-179.96875, 179.96875, 2880)
In [18]: data_y = linspace(-89.96875, 89.96875, 2880)
In [19]: H, x_edges, y_edges = np.histogram2d(data_x, data_y, (bins_x, bins_y))
In [20]: H
Out[20]:
array([[ 576.,    0.,    0.],
       [ 384.,  192.,    0.],
       [   0.,  576.,    0.],
       [   0.,  192.,  384.],
       [   0.,    0.,  576.]])
In [21]: x_edges
Out[21]: array([-180., -108., -36., 36., 108., 180.])
In [22]: y_edges
Out[22]: array([-90., -30., 30., 90.])
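Since the x and y arrays must have the same length, that requirement can be checked before calling histogram2d. The following is only a sketch: `joint_histogram` is a hypothetical wrapper name (not part of NumPy), and the choice of ValueError and its message are assumptions made for illustration.

```python
import numpy as np

def joint_histogram(x, y, x_edges, y_edges):
    """Hypothetical wrapper: validate the inputs, then call np.histogram2d."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A joint histogram needs one (x, y) pair per sample.
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError(
            "x and y must be 1-D arrays of equal length, got shapes "
            f"{x.shape} and {y.shape}"
        )
    return np.histogram2d(x, y, bins=(x_edges, y_edges))

bins_x = np.linspace(-180, 180, 6)   # 6 edges along x
bins_y = np.linspace(-90, 90, 4)     # 4 edges along y
data_x = np.linspace(-179.96875, 179.96875, 2880)
data_y = np.linspace(-89.96875, 89.96875, 2880)

H, x_edges, y_edges = joint_histogram(data_x, data_y, bins_x, bins_y)
print(H.shape)  # one fewer bin than edges along each axis
```

Note that the bin arrays hold edges, so the histogram has one fewer bin than edges in each direction, as the outputs above also show (6 and 4 edges give a 5 by 3 array).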
So, back to that AttributeError - it is clearly unhelpful. Looking
through the code, it looks like the x,y input arrays are joined into a
2D array with a numpy core function 'atleast_2d'. If this function
sees inputs that are not the same length, it actually produces a
2-element numpy object array:
In [57]: data_x.shape, data_y.shape
Out[57]: ((5760,), (2880,))
In [58]: data_xy = atleast_2d([data_x, data_y])
In [59]: data_xy.shape, data_xy.dtype
Out[59]: ((1, 2), dtype('object'))
In [60]: data_xy[0,0].shape, data_xy[0,1].shape
Out[60]: ((5760,), (2880,))
If the x, y arrays have the same length this looks a lot more logical:
In [62]: data_x.shape, data_y.shape
Out[62]: ((2880,), (2880,))
In [63]: data_xy = atleast_2d([data_x, data_y])
In [64]: data_xy.shape, data_xy.dtype
Out[64]: ((2, 2880), dtype('float64'))
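The two cases can be put side by side in one script. This is a sketch with one assumption stated loudly: recent NumPy releases refuse to build an object array implicitly from a ragged list, so the ragged case is constructed explicitly with dtype=object here rather than by passing the list straight to atleast_2d as in the session above.

```python
import numpy as np

data_x_long = np.linspace(-179.96875, 179.96875, 5760)
data_x_short = np.linspace(-179.96875, 179.96875, 2880)
data_y = np.linspace(-89.96875, 89.96875, 2880)

# Ragged case: build the 2-element object array by hand (assumption: this
# mirrors what older NumPy produced implicitly from [data_x, data_y]).
ragged = np.empty(2, dtype=object)
ragged[0] = data_x_long
ragged[1] = data_y
ragged_2d = np.atleast_2d(ragged)
print(ragged_2d.shape, ragged_2d.dtype)

# Equal-length case: the list stacks into an ordinary 2-D float array.
regular_2d = np.atleast_2d([data_x_short, data_y])
print(regular_2d.shape, regular_2d.dtype)
```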
So, that AttributeError comes up because histogramdd (which actually does
the work) expects the data array to be [Ndimension, Nsample], and the
number of dimensions is set by the number of bin arrays that were
input (2). Since it sees that [1,2] shaped object array, it treats
that as a 2-element, 1-dimension dataset; thus, at that level, the
AttributeError actually makes sense.
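The comparison described above can be sketched as follows. This is a simplified stand-in and not the actual histogramdd source: `check_sample_against_bins` is a hypothetical name, it follows the [Ndimension, Nsample] layout as described in this message, and it raises ValueError purely for illustration.

```python
import numpy as np

def check_sample_against_bins(sample_2d, bins):
    """Hypothetical stand-in for the dimension check, not NumPy's own code."""
    # Rows are taken as dimensions, columns as samples.
    n_dim, n_samples = sample_2d.shape
    if len(bins) != n_dim:
        raise ValueError(
            f"got {len(bins)} bin arrays for a {n_dim}-dimensional sample "
            f"with {n_samples} elements"
        )
    return n_dim, n_samples

bins = (np.linspace(-180, 180, 6), np.linspace(-90, 90, 4))

# Mismatched lengths: a (1, 2) object array, built explicitly.
ragged = np.empty(2, dtype=object)
ragged[0] = np.zeros(5760)
ragged[1] = np.zeros(2880)
ragged_2d = np.atleast_2d(ragged)

try:
    check_sample_against_bins(ragged_2d, bins)
    mismatch_message = None
except ValueError as exc:
    mismatch_message = str(exc)
print(mismatch_message)

# Equal lengths: a (2, 2880) float array, which matches the two bin arrays.
regular_2d = np.atleast_2d([np.zeros(2880), np.zeros(2880)])
dims = check_sample_against_bins(regular_2d, bins)
print(dims)
```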
Hope that helps,
Aronne