# [Numpy-discussion] Crosstabulation

Vincent Schut schut@sarvision...
Tue Jul 20 03:44:20 CDT 2010

```
On 07/19/2010 10:14 PM, Friedrich Romstedt wrote:
> 2010/7/19 sandric ionut<sandricionut@yahoo.com>:
>> For land-use a class would be for example forest, other would be orchard
>> etc. For Slope gradient I would have values which<3 and between 3 and 7
>> etc. So, I will have 2 raster data with, let's say, 3 classes each: forest,
>> orchards and built-up area and for slope gradient: 0-3, 3-15, 15-35. The
>> cross-tabulation analysis should give me a table like:
>>
>>              forest orchards built-up
>> 0-3         10         20           15
>> 3-15       5           10           20
>> 15-35     5           15           15
>>
>> where the numbers represents all the common cells, for example: 10 cells
>> with forest correspond to 10 cells with 0-3 slope gradient interval and so
>> on
>> (by cells I mean the pixel from a raster data)
>
> Okay everything is clear now.  I would suggestest looping over the
> cells of the table.  E.g. when:
>
> landuse_catmap
> slope_map
>
> are the respective maps, you can categorise the slope_map with
>
> slope_catmap = 0 * (0<= slope_map) * (slope_map<  3) + 1 * (3<=
> slope_map) * (slope_map<  15) + 2 * (15<= slope_map)
>
> to get categories 0, 1, and 2, where the zero case is included here
> only for illustration, since it's just zero it can be omitted.
>
> Then you are maybe supposed to do:
>
> table = numpy.zeros((N_slope_cats, N_landuse_cats))
>
> and finally the looping over its elements, where the looping over the
> map cells is done entirely by numpy:
>
> for slope_cat in xrange(0, N_slope_cats):
> 	for landuse_cat in xrange(0, N_landuse_cats):
> 		table[slope_cat, landuse_cat] = \
> 			((slope_catmap == slope_cat) * \
> 			(landuse_catmap == landuse_cat)).sum()
>
> I believe there is some slightly faster but more obscure way doing
> this table creation in one single large step by putting the maps into
> a hyperdimensianal array but I also believe this is beyond our scope
> here if this is already fast enough.

That 'slightly faster but more obscure way' would be histogram2d :-)
Especially when you have many landuse and/or slope classes over which to
loop, histogram2d will be more than just slightly faster. About obscure,
well, I leave that to the user. Probably depends on your background.
Personally I use it all the time. Loops are evil :-)

your 'more obscure' code would become something like this:

slope_bin_edges = [0, 3, 15, 35]
landuse_bin_edges = [0, 1, 2, 3]
crosstab = numpy.histogram2d(landuse, slope, bins=(landuse_bin_edges,
slope_bin_edges))

VS.
>
> Friedrich

```