[SciPy-user] Naive Question about Data Representations

David Warde-Farley dwf@cs.toronto....
Sat Jun 21 02:58:16 CDT 2008


On 21-Jun-08, at 12:47 AM, Brian Lewis wrote:

> Now I understand that we can store 2^32 Bytes / 4 Bytes = 2^30  
> integers (upper bound).  Previously, I would have said:  "We have  
> 2^32 locations, which is just 32 bits....each integer requires 32  
> bits....so we should only be able to store one integer".  Obviously,  
> I know this not to be the case, and the missing link was that each  
> location corresponded to 1 B.   But why?  If we could associate each  
> address with 2 Bytes, shouldn't the upper bound for a 32-bit system  
> be 8 GiB instead?

Anne's reply is far more astute than I could ever manage; the short
version is that modern CPUs are byte-addressable: each address names
exactly one byte, the smallest unit the hardware lets you refer to.
That convention is baked in for a variety of historical reasons (8-bit
character codes descended from ASCII being one of them, I think), and
it's just The Way It Is on all modern CPUs; changing it would require
fabricating new chips and updating everything from the OS upward to
work with the new addressing scheme. You're right that a 32-bit
machine could address 8 GiB if each address named 2 bytes; some early
machines were in fact word-addressed, but byte addressing won out.
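
To make the arithmetic concrete, here's a small sketch (mine, not from
the thread) that derives the theoretical limits from the pointer size,
assuming the usual one-address-per-byte convention:

    import struct

    ptr_bytes = struct.calcsize('P')     # 4 on 32-bit, 8 on 64-bit
    addr_space = 2 ** (8 * ptr_bytes)    # addressable bytes, 1 per address
    max_int32 = addr_space // 4          # 32-bit ints that fit, no overhead

    print("pointer size:  %d bytes" % ptr_bytes)
    print("address space: %d bytes" % addr_space)
    print("32-bit ints:   %d" % max_int32)

On a 32-bit system that gives 2^32 bytes of address space and 2^30
ints, matching your numbers.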

> Relevance: I'm trying to understand the largest array (upper bounds)  
> I can make on my system, and I am not so familiar with these things.
>
> >>> import struct; print struct.calcsize('P')
> 4
>
> This means each pointer will take up 4 Bytes. So it is the size of  
> an integer, and I should be able to store 2^30, 32-bit integers (on  
> a sunny day).  Approx: 4 GiB of RAM.

Yes, though keep in mind that on modern multi-tasking OSes you won't
get the whole 4 GiB: the kernel typically reserves a chunk of each
process's address space for itself (1-2 GiB is common on 32-bit
systems), and the kernel plus the always-running services compete for
physical RAM as well.
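
On a Unix box you can compare that theoretical address space against
the physical RAM actually installed; a quick sketch (assuming the
sysconf names below exist, which they do on Linux):

    import os
    import struct

    page_size = os.sysconf('SC_PAGE_SIZE')    # bytes per memory page
    phys_pages = os.sysconf('SC_PHYS_PAGES')  # pages of physical RAM

    print("physical RAM:  %d bytes" % (page_size * phys_pages))
    print("address space: %d bytes" % (2 ** (8 * struct.calcsize('P'))))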

Also there's virtual memory to think about: every process gets its own
address space, which can be larger than the physical memory actually
available, with the deficit handled by transparently swapping unused
pages out to disk. The address-space limitation still holds, though:
it caps what you can _address_, even if the memory your program uses
isn't all physically in RAM at the same time.
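
You can see the address-space cap directly by asking NumPy for an
array near the limit; on a 32-bit Python this sketch should raise
MemoryError no matter how much swap is free (the exact threshold
varies with what else the process has mapped):

    import numpy as np

    try:
        big = np.zeros(2 ** 30, dtype=np.int32)  # ~4 GiB of int32
    except MemoryError:
        print("can't address that much in one process")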

David

