[SciPy-user] Naive Question about Data Representations

Anne Archibald peridot.faceted@gmail....
Sat Jun 21 00:23:48 CDT 2008


2008/6/21 Brian Lewis <brian.lewis17@gmail.com>:

> How do we go from 2^32 addresses to 4 GiB?  To make this jump, it seems we
> associate each address with 1 B (8 bits).  Then 2^32 = 4 * 2^30 = 4
> Gibi-addresses = 4 GibiBytes.  I understand that now, but if we associated
> each address with something larger than 1 B, then the upper limit would be
> larger.  So why does 1 address == 1 B?  It seems that having a 32-bit
> processor/operating system, alone, is not the only constraint on why there
> is a 4 GiB upper bound.

> Now I understand that we can store 2^32 Bytes / 4 Bytes = 2^30 integers
> (upper bound).  Previously, I would have said:  "We have 2^32 locations,
> which is just 32 bits....each integer requires 32 bits....so we should only
> be able to store one integer".  Obviously, I know this not to be the case,
> and the missing link was that each location corresponded to 1 B.   But why?
> If we could associate each address with 2 Bytes, shouldn't the upper bound
> for a 32-bit system be 8 GiB instead?
>
> Relevance: I'm trying to understand the largest array (upper bounds) I can
> make on my system, and I am not so familiar with these things.
>
>>>> import struct; print struct.calcsize('P')
> 4
>
> This means each pointer will take up 4 Bytes. So it is the size of an
> integer, and I should be able to store 2^30, 32-bit integers (on a sunny
> day).  Approx: 4 GiB of RAM.
>
> Thanks again for your patience.   Please correct my mistakes and if
> possible, shed light on why each address represents 1 B.

It's a design decision. It's cumbersome to address memory at a finer
granularity than your addresses allow (though it is possible), so many
years ago computer designers settled on the 8-bit byte as the basic
addressable unit of memory (for PCs). To get at anything smaller than
the addressable unit you have to do bit-fiddling, which is slow and
painful. Since bytes are a reasonably common unit - for strings, for
example - having to do bit-fiddling to get at them would be a nuisance.
That said, I think some specialized architectures, for example some
DSPs, do address memory in whole words rather than bytes, for
efficiency. But since the only benefit of addressing 32-bit words
would have been a factor of four in address space, it didn't seem
worth it - especially as the decision was being made when a whole
gigabyte of RAM was barely imaginable. Since any change would have
meant breaking backward compatibility, nobody bothered until 64-bit
addresses became available, at which point it seemed moot again.
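
To make the arithmetic concrete, here is a rough sketch in Python. The
names address_space_bytes and read_byte are made up for illustration,
and read_byte assumes a little-endian layout of bytes within each
32-bit word; it only mimics the kind of bit-fiddling a word-addressed
machine would need and is not how any particular DSP does it.

```python
import struct

# Pointer width of this Python build: 32 on a 32-bit build, 64 on a 64-bit one.
POINTER_BITS = 8 * struct.calcsize("P")
GIB = 2 ** 30


def address_space_bytes(address_bits, unit_bytes=1):
    """Bytes reachable when each address names a unit of `unit_bytes` bytes."""
    return (2 ** address_bits) * unit_bytes


def read_byte(words, byte_index):
    """Fetch one byte from memory that can only be addressed in 32-bit words.

    Assumes byte 0 is the least significant byte of word 0 (little-endian).
    """
    word = words[byte_index // 4]      # which word holds the byte
    shift = 8 * (byte_index % 4)       # where the byte sits inside that word
    return (word >> shift) & 0xFF      # shift it down and mask off the rest


print(address_space_bytes(32) // GIB)      # byte addressing: 4 (GiB)
print(address_space_bytes(32, 4) // GIB)   # 32-bit word addressing: 16 (GiB)
print(address_space_bytes(32) // 4)        # most 32-bit integers: 1073741824 = 2**30
```

With one byte per address, 32-bit addresses reach 4 GiB, which leaves
room for at most 2^30 four-byte integers. With one 32-bit word per
address the same addresses would reach 16 GiB, but every single-byte
access would then go through something like read_byte.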

Anne

