[Numpy-discussion] fromstring, tostring slow?
Tue Feb 13 18:31:00 CST 2007
This is all very good info, especially the byteswap. I'll be testing
it momentarily. As for a detailed explanation of the problem....
In essence, I am applying sparse matrix multiplication. The matrix I am
dealing with here is nxn. Generally, this
matrix is 1-20% sparse. I use it in spatial data analysis, where the
matrix W represents the spatial association between n observations. The
operations I perform on it are generally related to the spatial lag of a
variable... or Wy, where y is an nxk matrix (usually k=1). As k is
generally small, the y vector and the result vector are represented by
numpy arrays. I can have nxkx2 pieces of info in mem (usually). What I
can't have is n**2. So, I store each row of W in a file as a record
consisting of 3 parts:
1) row, nn (# of neighbors)
2) nhs (nx1) vector of integers representing the columns in row[i] != 0
3) weights (nx1) vector of floats corresponding to the index in the
The first two parts of the record are known as a GAL or geographic
algorithm library. Since a lot of my W matrices have distance metrics
associated with them I added the third. I think this might be termed by
someone else as an enhanced GAL. At any rate, this allows me to perform
this operation on large datasets w/o running out of mem.
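
As a rough illustration of the row-at-a-time idea described above, here is a minimal sketch. The function name spatial_lag and the (row, nhs, weights) tuple layout are assumptions made for this example, not the actual code, and the 3x3 values are made up. It only shows that Wy can be accumulated one record at a time, so the full nxn matrix never has to be in memory.

```python
import numpy as np

def spatial_lag(records, y):
    """Compute W @ y from an iterable of (row, nhs, weights) records.

    Hypothetical helper: `nhs` holds the column indices of the nonzero
    entries of one row of W, and `weights` holds the matching values.
    """
    y = np.asarray(y, dtype=float)
    wy = np.zeros_like(y)
    for row, nhs, weights in records:
        # Only the nonzero columns of this row contribute to the lag.
        wy[row] = np.dot(weights, y[nhs])
    return wy

# Tiny 3x3 example with made-up neighbors and weights.
records = [
    (0, np.array([1, 2]), np.array([0.5, 0.5])),
    (1, np.array([0]), np.array([1.0])),
    (2, np.array([0, 1]), np.array([0.25, 0.75])),
]
y = np.array([10.0, 20.0, 30.0])
wy = spatial_lag(records, y)
```

Because records can be any iterable, it could just as well be a generator that reads one record from disk per step.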
[mailto:email@example.com] On Behalf Of Christopher
Sent: Tuesday, February 13, 2007 4:07 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] fromstring, tostring slow?
Mark Janikas wrote:
> I don't think I can do that because I have heterogeneous rows of
> data.... I.e. the columns in each row are different in length.
like I said, show us your whole problem...
But you don't have to write/read all the data at once with from/tofile()
anyway. Each of your "rows" has to be in a separate array anyway, as
numpy arrays don't support "ragged" arrays, but each row can be written
> Furthermore, when reading it back in, I want to read only bytes of the
> info at a time so I can save memory. In this case, I only want to
> have one record in mem at once.
you can make multiple calls to fromfile(), though you'll have to know how
long each record is.
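
A minimal sketch of what multiple fromfile() calls could look like, assuming each record starts with a small header that stores the row index and the number of neighbors, so the reader knows how long the rest of the record is. The helper names write_record and read_records, and the choice of 32-bit integers and 64-bit floats, are assumptions for illustration only.

```python
import numpy as np

INT = np.dtype("<i4")   # assumed: little-endian 32-bit integers
FLT = np.dtype("<f8")   # assumed: little-endian 64-bit floats

def write_record(f, row, nhs, weights):
    """Append one ragged record to an open binary file (hypothetical layout)."""
    np.array([row, len(nhs)], dtype=INT).tofile(f)   # header: row, nn
    np.asarray(nhs, dtype=INT).tofile(f)             # nn column indices
    np.asarray(weights, dtype=FLT).tofile(f)         # nn weights

def read_records(path):
    """Yield (row, nhs, weights) one record at a time."""
    with open(path, "rb") as f:
        while True:
            header = np.fromfile(f, dtype=INT, count=2)
            if header.size < 2:      # end of file reached
                return
            row, nn = int(header[0]), int(header[1])
            # The header tells us how many items the next two reads need.
            nhs = np.fromfile(f, dtype=INT, count=nn)
            weights = np.fromfile(f, dtype=FLT, count=nn)
            yield row, nhs, weights
```

Only one record is held in memory per iteration, and the explicit "<" in the dtypes keeps the file readable on machines with either byte order.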
> Another issue has arisen from taking this routine cross-platform....
> namely, if I write the file on Windows I can't read it on Solaris. I
> assume the big-little endian is at hand here.
> I know using the struct
> module that I can pack using either one.
so can numpy. see the "byteswap" method, and you can specify a
particular endianness with a datatype when you read with fromfile():
a = N.fromfile(DataFile, dtype=N.dtype("<d"), count=20)
reads 20 little-endian doubles from DataFile, regardless of the native
endianness of the machine you're on.
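
To make the two options concrete, here is a small sketch that works on an in-memory byte string instead of a real file. The sample values are made up. It simulates data written in big-endian order, then recovers it either by naming the byte order in the dtype or by calling byteswap() on data that was read with the wrong byte order.

```python
import numpy as np

values = np.array([1.0, 2.5, -3.0])

# Simulate bytes written on a big-endian machine.
raw = values.astype(">f8").tobytes()

# Option 1: state the byte order in the dtype when reading.
right = np.frombuffer(raw, dtype=">f8")

# Convert to the machine's native order for later computation.
native = right.astype(np.float64)

# Option 2: data read with the wrong byte order can be repaired
# with byteswap(), which swaps the bytes and keeps the dtype.
wrong = np.frombuffer(raw, dtype="<f8")
fixed = wrong.byteswap()
```

Stating the byte order in the dtype is usually the less error-prone of the two, since the file format then no longer depends on which machine wrote it.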
Christopher Barker, Ph.D.
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception