[Numpy-discussion] fromstring, tostring slow?

Mark Janikas mjanikas@esri....
Tue Feb 13 18:31:00 CST 2007

This is all very good info, especially the byteswap.  I'll be testing
it momentarily.  As for a detailed explanation of the problem...

In essence, I am applying sparse matrix multiplication.  The matrix I am
dealing with is nxn and is generally 1-20% sparse.  I use it in spatial
data analysis, where the matrix W represents the spatial association
between n observations.  The operations I perform on it are generally
related to the spatial lag of a variable, i.e. Wy, where y is an nxk
matrix (usually k=1).  As k is generally small, y and the result are
held in numpy arrays.  I can usually keep nxkx2 values in memory.  What
I can't keep is n**2.  So I store each row of W in a file as a record
consisting of 3 parts:

1) row, nn (# of neighbors)
2) nhs: (nn x 1) vector of integers giving the columns where row[i] != 0
3) weights: (nn x 1) vector of floats, one for each column index in
part 2

The first two parts of the record are known as a GAL or geographic
algorithm library.  Since a lot of my W matrices have distance metrics
associated with them, I added the third.  Someone else might call this an
enhanced GAL.  At any rate, this allows me to perform this operation on
large datasets w/o running out of mem.
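
As a rough sketch of the scheme described above (not the actual code): the
byte layout below, with int32 for the header and column indices, float64
for the weights, and everything little-endian, is an assumption, and
write_w and spatial_lag are hypothetical names. It computes Wy while
holding only one record of W in memory at a time.

```python
import numpy as np

# Assumed on-disk layout per record (the thread does not specify one):
#   header : 2 x int32    -> row index, nn (number of neighbors)
#   nhs    : nn x int32   -> column indices of the nonzero entries
#   weights: nn x float64 -> weight for each of those columns
INT = np.dtype("<i4")
FLT = np.dtype("<f8")


def write_w(path, rows):
    """Write W to disk; rows is an iterable of (row, nhs, weights)."""
    with open(path, "wb") as f:
        for row, nhs, weights in rows:
            nhs = np.asarray(nhs, dtype=INT)
            weights = np.asarray(weights, dtype=FLT)
            np.array([row, nhs.size], dtype=INT).tofile(f)
            nhs.tofile(f)
            weights.tofile(f)


def spatial_lag(path, y):
    """Compute W @ y, reading one record of W at a time."""
    y = np.asarray(y, dtype=float)
    result = np.zeros_like(y)
    with open(path, "rb") as f:
        while True:
            header = np.fromfile(f, dtype=INT, count=2)
            if header.size < 2:  # no more records
                break
            row, nn = int(header[0]), int(header[1])
            if nn == 0:  # a row with no neighbors contributes zero
                continue
            nhs = np.fromfile(f, dtype=INT, count=nn)
            weights = np.fromfile(f, dtype=FLT, count=nn)
            # Dot the stored weights with the matching entries of y.
            result[row] = weights @ y[nhs]
    return result
```

Peak memory here is y, the result, and a single record, which matches the
nxkx2 budget mentioned above.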

-----Original Message-----
From: numpy-discussion-bounces@scipy.org
[mailto:numpy-discussion-bounces@scipy.org] On Behalf Of Christopher
Sent: Tuesday, February 13, 2007 4:07 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] fromstring, tostring slow?

Mark Janikas wrote:
> I don't think I can do that because I have heterogeneous rows of
> data.... I.e. the columns in each row are different in length.

Like I said, show us your whole problem...

But you don't have to write/read all the data at once with from/tofile()
anyway. Each of your "rows" has to be in a separate array anyway, as
numpy doesn't support "ragged" arrays, but each row can be written
with tofile().

> Furthermore, when reading it back in, I want to read only part of the
> info at a time so I can save memory.  In this case, I only want to
> have one record in mem at once.

You can make multiple calls to fromfile(), though you'll have to know how
long each record is.
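
One way to know how long each record is, sketched here as an assumption
rather than anything prescribed in this thread, is to write a length
prefix before each row. The helper names write_rows and iter_rows are
made up for illustration; successive fromfile() calls on the same open
file pick up where the previous one stopped.

```python
import numpy as np


def write_rows(path, rows):
    """Write each ragged row as an int64 length, then that many float64s."""
    with open(path, "wb") as f:
        for row in rows:
            row = np.asarray(row, dtype=np.float64)
            np.array([row.size], dtype=np.int64).tofile(f)
            row.tofile(f)


def iter_rows(path):
    """Yield one row at a time; only the current row is held in memory."""
    with open(path, "rb") as f:
        while True:
            length = np.fromfile(f, dtype=np.int64, count=1)
            if length.size == 0:  # nothing left to read
                break
            yield np.fromfile(f, dtype=np.float64, count=int(length[0]))
```

Looping with "for row in iter_rows(path): ..." then processes the file
one record at a time.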

> Another issue has arisen from taking this routine cross-platform...
> namely, if I write the file on Windows I can't read it on Solaris.  I
> assume the big- vs. little-endian issue is at play here.


> I know using the struct
> module that I can pack using either one.

So can numpy. See the "byteswap" method, and you can specify a
particular endianness with a datatype when you read with fromfile():

a = N.fromfile(DataFile, dtype=N.dtype("<d"), count=20)

reads 20 little-endian doubles from DataFile, regardless of the native
endianness of the machine you're on.
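
To illustrate both options, here is a small sketch that works on an
in-memory byte string instead of a file (the sample values are made up);
the same dtype strings can be passed to fromfile(). It shows that an
explicit byte order in the dtype reads the data correctly, and that
byteswap() repairs data that was read with the wrong order.

```python
import numpy as np

values = np.array([1.0, 2.5, -3.0])

# Bytes as a big-endian machine would have written them.
raw = values.astype(">f8").tobytes()

# Reading with an explicit big-endian dtype works on any platform.
ok = np.frombuffer(raw, dtype=">f8")

# Reading the same bytes as little-endian gives meaningless numbers...
wrong = np.frombuffer(raw, dtype="<f8")

# ...and byteswap() returns a copy with each element's bytes reversed,
# which recovers the original values.
fixed = wrong.byteswap()
```

Putting the byte order in the dtype up front is usually simpler than
swapping afterwards, since the file then reads the same way everywhere.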


Christopher Barker, Ph.D.

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
