[Scipy-tickets] [SciPy] #314: read_array is slow for large files
SciPy
scipy-tickets at scipy.net
Tue Nov 28 04:16:55 CST 2006
-------------------------+--------------------------------------------------
Reporter: pv | Owner: somebody
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: scipy.io | Version:
Severity: normal | Resolution:
Keywords: |
-------------------------+--------------------------------------------------
Comment (by wnbell):
In general, it would be nice to be able to save ndarrays and sparse
matrices to files and load them back without any loss of information
(e.g. dimensions, rank, or type of sparse matrix). Both ASCII and binary
formats should be supported, with the binary format portable across
endianness. Standard compression (gzip, bzip2) would be a nice bonus.
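To make the lossless round-trip requirement concrete, here is a minimal sketch in Python using only the standard library. The text layout (a first line holding the rank followed by each dimension, then one value per line in row-major order) and the names `save_text`/`load_text` are hypothetical, not pyarrayio's actual format. `repr()` is used so every double survives the trip exactly, and gzip supplies the optional compression.

```python
import gzip
import io
from math import prod


def save_text(fileobj, flat, shape):
    """Write a dense float array as text (hypothetical layout).

    Line 1: "<rank> <dim0> <dim1> ...", then one value per line in
    row-major order. repr() gives a string that parses back to the
    identical double, so no precision is lost.
    """
    if prod(shape) != len(flat):
        raise ValueError("shape does not match the number of values")
    header = " ".join(str(n) for n in (len(shape), *shape))
    body = "\n".join(repr(float(x)) for x in flat)
    fileobj.write((header + "\n" + body + "\n").encode("ascii"))


def load_text(fileobj):
    """Inverse of save_text; returns (flat values, shape)."""
    lines = fileobj.read().decode("ascii").splitlines()
    header = [int(t) for t in lines[0].split()]
    rank, shape = header[0], tuple(header[1:])
    if len(shape) != rank:
        raise ValueError("corrupt header: rank does not match dimensions")
    n = prod(shape)
    flat = [float(s) for s in lines[1:1 + n]]
    if len(flat) != n:
        raise ValueError("file is truncated")
    return flat, shape


# Round trip through an in-memory gzip stream (a real file works the same way).
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    save_text(gz, [0.1, 2.0, -3.5, 1e-300, 5.0, 6.0], (2, 3))
buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
    flat, shape = load_text(gz)
```

Because the rank and every dimension are stored explicitly, a 2x3 array comes back as a 2x3 array rather than as a flat vector or a 6x1 column.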
I wasn't able to find an existing project (MatrixMarket, HDF, Matlab
format, etc.) that did all this and worked with SciPy, so I hacked up
something that accomplishes most of these goals here:
[http://graphics.cs.uiuc.edu/~wnbell/pyarrayio/]
Currently it can:
 * load/save ndarrays while retaining rank and dimensions
 * load/save sparse matrices (CSR, CSC, and COO)
 * load/save a human-readable 'basic' format (2-D matrices with space and
   newline delimiters)
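For the sparse case, one way to keep the matrix type is to write the nonzeros as coordinate triplets under a header that names the original format, then rebuild that format on load. The sketch below is hypothetical (the layout and the function names are assumptions, not pyarrayio's API) and handles only CSR, represented as plain `indptr`/`indices`/`data` lists; CSC could reuse it on the transpose, since the CSC arrays of a matrix are the CSR arrays of its transpose.

```python
import io


def csr_to_triplets(indptr, indices, data):
    """Expand CSR arrays into (row, col, value) triplets."""
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            yield row, indices[k], data[k]


def triplets_to_csr(n_rows, triplets):
    """Rebuild CSR arrays from triplets given in any order."""
    triplets = sorted(triplets)  # row-major order
    indptr = [0] * (n_rows + 1)
    for row, _, _ in triplets:
        indptr[row + 1] += 1  # count entries per row
    for row in range(n_rows):
        indptr[row + 1] += indptr[row]  # prefix sum -> row start offsets
    indices = [col for _, col, _ in triplets]
    data = [val for _, _, val in triplets]
    return indptr, indices, data


def save_sparse(f, fmt, shape, triplets):
    """Header "<format> <rows> <cols> <nnz>", then one "row col value" per line."""
    triplets = list(triplets)
    f.write(f"{fmt} {shape[0]} {shape[1]} {len(triplets)}\n")
    for row, col, val in triplets:
        f.write(f"{row} {col} {val!r}\n")


def load_sparse(f):
    """Return (format tag, shape, triplets) as written by save_sparse."""
    fmt, rows, cols, nnz = f.readline().split()
    triplets = []
    for _ in range(int(nnz)):
        row, col, val = f.readline().split()
        triplets.append((int(row), int(col), float(val)))
    return fmt, (int(rows), int(cols)), triplets


# 3x4 matrix [[1, 0, 2, 0], [0, 0, 0, 0], [0, 3, 0, 4]] in CSR form.
indptr, indices, data = [0, 2, 2, 4], [0, 2, 1, 3], [1.0, 2.0, 3.0, 4.0]
buf = io.StringIO()
save_sparse(buf, "csr", (3, 4), csr_to_triplets(indptr, indices, data))
buf.seek(0)
fmt, shape, triplets = load_sparse(buf)
if fmt != "csr":  # the stored tag decides which structure to rebuild
    raise NotImplementedError(f"no loader for format {fmt!r} in this sketch")
restored = triplets_to_csr(shape[0], triplets)
```

Storing the shape in the header matters for sparse data: an empty trailing row or column has no triplets, so it could not be recovered from the entries alone.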
The binary support does not yet handle endianness, so that would need to
be added.
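One way to add endianness portability is to fix the on-disk byte order and let the reader convert. This hedged sketch writes the rank, the dimensions, and float64 data explicitly as little-endian using the `<` prefix of Python's `struct` module, so the file is byte-for-byte identical on big- and little-endian hosts. The layout and the names `save_binary`/`load_binary` are assumptions for illustration only.

```python
import io
import struct
from math import prod


def save_binary(f, flat, shape):
    """Write rank, dims and float64 data, all explicitly little-endian."""
    if prod(shape) != len(flat):
        raise ValueError("shape does not match the number of values")
    f.write(struct.pack("<I", len(shape)))           # rank: uint32
    f.write(struct.pack(f"<{len(shape)}Q", *shape))  # dims: uint64 each
    f.write(struct.pack(f"<{len(flat)}d", *flat))    # data: IEEE 754 doubles


def load_binary(f):
    """Read the layout written by save_binary; the host byte order never matters."""
    (rank,) = struct.unpack("<I", f.read(4))
    shape = struct.unpack(f"<{rank}Q", f.read(8 * rank))
    n = prod(shape)
    flat = list(struct.unpack(f"<{n}d", f.read(8 * n)))
    return flat, shape


buf = io.BytesIO()
save_binary(buf, [1.5, -2.0, 3.25, 4.0], (2, 2))
raw = buf.getvalue()
buf.seek(0)
flat, shape = load_binary(buf)
```

Since both writer and reader name the byte order, no swap flag or host check is needed; the trade-off is a conversion step on big-endian machines.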
The linked code is probably far from ideal, so I present it only to
motivate further discussion and, hopefully, a more robust
implementation. Feel free to adapt it (or discard it completely :) as
you wish. Comments and criticism are welcome.
Examples of the (ascii) output are here:
[http://graphics.cs.uiuc.edu/~wnbell/pyarrayio/examples/]
The need for yet another format could be avoided if SciPy's Matlab file
support were brought up to date (when I last looked, it handled V4 or V5
files, so no sparse matrices). I don't know whether that's really the
path of least resistance, though.
--
Ticket URL: <http://projects.scipy.org/scipy/scipy/ticket/314#comment:1>
SciPy <http://www.scipy.org/>
SciPy is open-source software for mathematics, science, and engineering.