[SciPy-dev] Binary i/o package
Tue Jun 5 17:02:21 CDT 2007
Hi Ravi -
I may not have been clear in my description.
The code of interest, which is C++, just reads from a binary file into
a numpy array:
readfields.readfields(file or fileobj, dtype, nrows, rows=, fields=)
You give it a file or file object, a dtype (list of tuples) describing
each row of the file, and the number of rows. It creates the numpy array
internally and reads into it. You can request a subset of rows
with the rows= keyword, or a subset of fields by name with the fields=
keyword. In that case it takes the definitions for just those fields
from the dtype you entered, sizes the output according to the rows
keyword, and copies only the requested fields.
That's it: the most basic reader that can select subsets of rows and fields.
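To make the row/field selection concrete, here is a rough pure-Python sketch of the same semantics. It is not the C++ readfields code; the name read_subset is hypothetical, it accepts only an already-open binary file object, and it assumes the records start at the file's current position. For each requested row it seeks straight to that record and copies only the requested fields into a packed output array.

```python
import io
import numpy as np


def read_subset(fobj, dtype, nrows, rows=None, fields=None):
    """Hypothetical sketch of readfields-style reading (not the real module).

    Reads fixed-size records described by `dtype` starting at the current
    position of the binary file object `fobj`, keeping only the requested
    `rows` (record indices) and `fields` (field names).
    """
    dt = np.dtype(dtype)
    if fields is None:
        fields = list(dt.names)
    if rows is None:
        rows = range(nrows)
    # Output dtype holds only the requested fields, packed together.
    out_dt = np.dtype([(name, dt.fields[name][0]) for name in fields])
    out = np.zeros(len(rows), dtype=out_dt)

    start = fobj.tell()  # assumption: records begin here
    for i, row in enumerate(rows):
        if not 0 <= row < nrows:
            raise IndexError(f"row {row} out of range for {nrows} rows")
        # Jump directly to the record instead of reading those before it.
        fobj.seek(start + row * dt.itemsize)
        rec = np.frombuffer(fobj.read(dt.itemsize), dtype=dt, count=1)
        for name in fields:
            out[name][i] = rec[name][0]
    return out


# Usage: three records in memory; take rows 2 and 0, fields "id" and "x".
dt = np.dtype([("x", "<f8"), ("id", "<i4"), ("flag", "u1")])
data = np.array([(1.5, 10, 0), (2.5, 20, 1), (3.5, 30, 0)], dtype=dt)
f = io.BytesIO(data.tobytes())
sub = read_subset(f, dt, nrows=3, rows=[2, 0], fields=["id", "x"])
```

A compiled reader avoids the per-row Python overhead shown here, but the bookkeeping (subset dtype, output length from rows, seek by itemsize) is the same idea.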
I also included, just as a working example, a little python module
called readfields.simple_format that has functions for reading/writing
to a self-describing file format. The write() function writes a numpy
array to a file with a header; it just calls tofile() after writing
the header, so nothing new there. Then there are functions
read_header() which just reads the header and read() which reads
data+header. That was just a working example and doesn't necessarily
need to be included in scipy since this is by no means a standard file
format. It is just the simplest format I could come up with that is
natural for numpy and my readfields() module.
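For illustration only, a self-describing format in this spirit could look like the sketch below. The header layout ("NROWS = ...", "DTYPE = ...", terminated by an "END" line) is an assumption, not the actual simple_format layout, and it handles 1-D structured arrays only. It writes raw bytes with tobytes() rather than tofile() so that it also works with in-memory file objects.

```python
import ast
import io
import numpy as np


# Hypothetical header layout: ASCII "KEY = value" lines, then an "END" line.
def write(fobj, arr):
    """Write a header describing the 1-D structured `arr`, then its bytes."""
    header = f"NROWS = {len(arr)}\nDTYPE = {arr.dtype.descr!r}\nEND\n"
    fobj.write(header.encode("ascii"))
    fobj.write(np.ascontiguousarray(arr).tobytes())


def read_header(fobj):
    """Parse header lines up to END; return (nrows, dtype)."""
    hdr = {}
    while True:
        raw = fobj.readline()
        if not raw:
            raise ValueError("header ended without an END line")
        line = raw.decode("ascii").strip()
        if line == "END":
            break
        key, _, value = line.partition("=")
        hdr[key.strip()] = value.strip()
    return int(hdr["NROWS"]), np.dtype(ast.literal_eval(hdr["DTYPE"]))


def read(fobj):
    """Read header and data; return (array, header info)."""
    nrows, dt = read_header(fobj)
    data = np.frombuffer(fobj.read(nrows * dt.itemsize), dtype=dt, count=nrows)
    return data, {"nrows": nrows, "dtype": dt}


# Usage: round-trip a small structured array through an in-memory file.
dt = np.dtype([("x", "<f8"), ("id", "<i4")])
arr = np.array([(1.0, 1), (2.0, 2)], dtype=dt)
buf = io.BytesIO()
write(buf, arr)
buf.seek(0)
back, info = read(buf)
```

Because read_header() leaves the file positioned at the first record, the same header could be paired with a subset reader instead of reading everything.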
Hope this clears things up,
On 6/5/07, Ravikiran Rajagopal <email@example.com> wrote:
> On Sunday 03 June 2007 4:42:04 pm Erin Sheldon wrote:
> > This package fills the niche and is the backbone of such systems. And
> > it is a small chunk of code. You can extract what you want from the
> > file and store it directly into a numpy array in the most efficient
> > manner possible.
> Apologies for the slow reply, but only now did I find time to go through your
> code. I agree with you that this is a pretty useful piece of code. However,
> the functionality offered by your code, IMHO, should be split into two parts:
> - readbinarray / writebinarray / skipfields
> - readheader / writeheader
> Possible prototypes would be as follows:
> readbinarray( fid, fieldtuple, columns, lines, headerskip=0 )
> writebinarray( fid, fieldtuple, columns, lines )
> skipfields( fid, fieldtuple, lines )
> "fieldtuple" describes the structure of each record. This set of functions
> would make your code the equivalent of read_array and write_array without
> involving "self-documentation" of binary files. This allows arbitrary headers
> and arbitrary parsers of the header data.
> The second set of functions provides default methods for reading/writing
> headers. Combining these orthogonal functions gives the current interface.
> I would be very interested in seeing the first part in scipy.io. If no one
> else is interested in having a binary equivalent of read_array/write_array in
> scipy, something like this is a perfect candidate for a scikit.
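A rough sketch of the proposed split, under stated assumptions: fieldtuple is a list of (name, format) tuples accepted by np.dtype, columns is a list of field names (None for all), lines is a record count, and headerskip is a byte count from the current position. These implementations are hypothetical; writebinarray is omitted because the quoted prototype has no data argument. The point is that the raw reader knows nothing about headers, so any header parser only has to report how many bytes to skip.

```python
import io
import numpy as np


def readbinarray(fid, fieldtuple, columns, lines, headerskip=0):
    """Hypothetical raw reader: skip `headerskip` bytes, read `lines` records."""
    dt = np.dtype(fieldtuple)
    fid.seek(headerskip, io.SEEK_CUR)  # header format is not our concern
    # Note: this sketch reads whole records, then keeps only `columns`.
    raw = np.frombuffer(fid.read(lines * dt.itemsize), dtype=dt, count=lines)
    if columns is None:
        return raw.copy()
    out = np.empty(lines, dtype=[(c, dt.fields[c][0]) for c in columns])
    for c in columns:
        out[c] = raw[c]
    return out


def skipfields(fid, fieldtuple, lines):
    """Hypothetical: advance `fid` past `lines` records without reading them."""
    fid.seek(lines * np.dtype(fieldtuple).itemsize, io.SEEK_CUR)


# Usage: an arbitrary 8-byte header, then three records.
fields = [("t", "<f8"), ("n", "<i4")]
arr = np.array([(0.5, 1), (1.5, 2), (2.5, 3)], dtype=fields)
f = io.BytesIO(b"MYHEADER" + arr.tobytes())
first = readbinarray(f, fields, ["n"], 1, headerskip=len(b"MYHEADER"))
skipfields(f, fields, 1)  # skip record 1
last = readbinarray(f, fields, None, 1)
```

Layering a header reader on top (parse the header, then call readbinarray) would recover the combined interface described above.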