[Scipy-tickets] [SciPy] #314: read_array is slow for large files

SciPy scipy-tickets at scipy.net
Tue Nov 28 04:16:55 CST 2006


#314: read_array is slow for large files
-------------------------+--------------------------------------------------
 Reporter:  pv           |        Owner:  somebody
     Type:  enhancement  |       Status:  new     
 Priority:  normal       |    Milestone:          
Component:  scipy.io     |      Version:          
 Severity:  normal       |   Resolution:          
 Keywords:               |  
-------------------------+--------------------------------------------------
Comment (by wnbell):

 In general, it would be nice to be able to save ndarrays and sparse
 matrices to files and load them back without any loss of information (i.e.
 dimensions, rank, or type of sparse matrix).  Both ASCII and binary
 formats should be supported with the latter being portable w.r.t.
 endianness.  Standard compression (gzip, bzip2) would be a nice bonus.


 I wasn't able to find an existing project (MatrixMarket, HDF, Matlab
 format, etc.) that did all of this (and worked with SciPy), so I hacked up
 something that accomplishes most of these goals here:
 [http://graphics.cs.uiuc.edu/~wnbell/pyarrayio/]

 Currently it can
  * load/save ndarrays while retaining rank and dimensions
  * load/save sparse matrices (csr, csc, and coo)
  * load/save a human-readable 'basic' format (2d matrices with space and
    newline delimiters)
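
 To make the text side of this list concrete, here is a minimal sketch of
 the two layouts.  It is not the pyarrayio code: the function names and the
 'n_rows n_cols nnz' header line are assumptions made for illustration, and
 the dense case simply leans on NumPy's savetxt/loadtxt.

```python
import io

import numpy as np


def save_basic(fileobj, matrix):
    """Write a 2-D array as rows of space-separated values."""
    matrix = np.asarray(matrix)
    if matrix.ndim != 2:
        raise ValueError("the 'basic' layout only describes 2-D matrices")
    # %.17g carries enough digits to round-trip any float64 exactly.
    np.savetxt(fileobj, matrix, fmt="%.17g", delimiter=" ")


def load_basic(fileobj):
    # ndmin=2 keeps a single-row or single-column file two-dimensional.
    return np.loadtxt(fileobj, ndmin=2)


def save_coo(fileobj, shape, rows, cols, vals):
    """Write 'n_rows n_cols nnz', then one 'row col value' line per entry."""
    fileobj.write("%d %d %d\n" % (shape[0], shape[1], len(vals)))
    for r, c, v in zip(rows, cols, vals):
        fileobj.write("%d %d %.17g\n" % (r, c, v))


def load_coo(fileobj):
    """Return (shape, rows, cols, vals) as written by save_coo."""
    n_rows, n_cols, nnz = (int(tok) for tok in fileobj.readline().split())
    rows = np.empty(nnz, dtype=np.intp)
    cols = np.empty(nnz, dtype=np.intp)
    vals = np.empty(nnz, dtype=np.float64)
    for k in range(nnz):
        r, c, v = fileobj.readline().split()
        rows[k], cols[k], vals[k] = int(r), int(c), float(v)
    return (n_rows, n_cols), rows, cols, vals
```

 The explicit header is what preserves the dimensions when trailing rows or
 columns are empty.  The triplets returned by load_coo can be handed to
 scipy.sparse.coo_matrix((vals, (rows, cols)), shape=shape) and converted to
 csr or csc from there.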

 The binary support does not consider endianness, so that would need to be
 added.
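
 One way the endianness gap could be closed is to pin the on-disk byte
 order and record the dtype and shape in a small header.  The sketch below
 assumes a hypothetical layout (a 4-byte little-endian length, a JSON
 header, then the raw data) and hypothetical function names; it handles
 plain numeric dtypes only and is not taken from pyarrayio.

```python
import gzip
import io
import json
import math
import struct

import numpy as np


def dump_array(arr, fileobj):
    """Write a length-prefixed JSON header, then the data as little-endian bytes."""
    arr = np.asarray(arr)
    # Pin the on-disk byte order to little-endian regardless of the host CPU.
    le_dtype = arr.dtype.newbyteorder("<")
    payload = np.ascontiguousarray(arr, dtype=le_dtype).tobytes()
    header = json.dumps({"dtype": le_dtype.str, "shape": list(arr.shape)})
    header = header.encode("ascii")
    fileobj.write(struct.pack("<I", len(header)))  # header length, little-endian
    fileobj.write(header)
    fileobj.write(payload)


def load_array(fileobj):
    """Read an array written by dump_array, restoring dtype and shape."""
    (header_len,) = struct.unpack("<I", fileobj.read(4))
    header = json.loads(fileobj.read(header_len).decode("ascii"))
    dtype = np.dtype(header["dtype"])
    shape = tuple(header["shape"])
    payload = fileobj.read(math.prod(shape) * dtype.itemsize)
    # frombuffer gives a read-only view of the bytes; copy for a writable array.
    return np.frombuffer(payload, dtype=dtype).reshape(shape).copy()


def roundtrip_gzip(arr):
    """Dump arr through gzip into memory and load it back."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as out:
        dump_array(arr, out)
    buf.seek(0)
    with gzip.GzipFile(fileobj=buf, mode="rb") as src:
        return load_array(src)
```

 Because the functions only need an object with read() and write(), the
 same code works with a plain file, gzip.open() or bz2.open(), which covers
 the compression wish without changing the format itself.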


 The code above is probably far from ideal, so I present it only to
 motivate further discussion and hopefully prompt a more robust
 implementation.  Feel free to adapt it (or completely discard it :) as you
 wish.  Comments and criticism are welcome.

 Examples of the (ascii) output are here:
 [http://graphics.cs.uiuc.edu/~wnbell/pyarrayio/examples/]


 The need for yet another format could be avoided if the Matlab file
 support were brought up to date (last I looked it handled only V4 or V5,
 so no sparse matrices).  I don't know if that's really the path of least
 resistance though.

-- 
Ticket URL: <http://projects.scipy.org/scipy/scipy/ticket/314#comment:1>
SciPy <http://www.scipy.org/>
SciPy is open-source software for mathematics, science, and engineering.

