[Numpy-discussion] numpy.load raising IOError but EOFError expected

"V. Armando Solé" sole@esrf...
Thu Jul 1 07:26:35 CDT 2010


Ruben Salvador wrote:
> Great! Thanks for all your answers!
>
> I actually have the files created as .npy (appending a new array each 
> time). I know it's weird, and it's not its intended use. But, for 
> whatever reason, I came to use that. No turning back now. 
>
> Fortunately, I am able to read the files correctly, so, weird as it 
> is, at least it works. Repeating the tests would be very time 
> consuming. I'll just try the different options mentioned for the 
> following tests. 
>
> Anyway, I think this is quite a common situation. Tests run for a 
> loooooong time and produce results at very different times (not 
> necessarily huge amounts of data; it could be just a single 
> float or array), and they are repeated many times. That makes it 
> absolutely necessary to have numpyish functions/filetypes to APPEND 
> freshly produced data each time it becomes available. Having to 
> load a .npz file, add the new data and save it again wastes 
> resources unnecessarily. Having a single file for each run of the 
> test, though possible, complicates the post-processing for me and 
> increases the time needed to copy these files (many small files tend 
> to take longer to copy than one single bigger file). Why not just a 
> modified .npy filetype/function with a header indicating it's hosting 
> more than one array?
>

Well, at our lab we are collecting images and saving them into HDF5 
files. Since the files are self-describing, it is quite convenient. You 
can decide whether you want the images as individual arrays or stacked 
into a bigger one, because you know which it is when you open the file. 
You can keep adding items at any time because HDF5 does not force you to 
specify the final size of the array, and you can access it like any numpy 
array without needing to load the whole array into memory or being 
limited by the memory available on 32-bit machines. I am currently 
working on a 100 GB array on a 32-bit machine without problems.

Really, I would give HDF5 a try. In our case we are using h5py, but 
the latest release candidate of PyTables seems to have the same 
"numpy like" functionality.
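
A rough sketch of the appending pattern with h5py, in case it helps. The 
file name run.h5, the dataset name "images" and the helper append_image 
are made up for illustration; this is not our acquisition code, just a 
minimal example assuming every image has the same shape and dtype.

```python
import os
import tempfile

import h5py
import numpy as np


def append_image(path, image, name="images"):
    """Append one image to a resizable HDF5 dataset (hypothetical helper).

    The dataset is created on first use with an unlimited first axis,
    so the final number of images never has to be known in advance.
    """
    image = np.asarray(image)
    with h5py.File(path, "a") as f:
        if name not in f:
            f.create_dataset(
                name,
                shape=(0,) + image.shape,          # start empty
                maxshape=(None,) + image.shape,    # None = unlimited axis
                dtype=image.dtype,
                chunks=(1,) + image.shape,         # one image per chunk
            )
        dset = f[name]
        n = dset.shape[0]
        dset.resize(n + 1, axis=0)   # grow by one slot along the first axis
        dset[n] = image              # write only the new image
        return dset.shape[0]


# Simulate a long-running test that produces one result at a time.
path = os.path.join(tempfile.mkdtemp(), "run.h5")
for i in range(3):
    append_image(path, np.full((4, 4), i, dtype=np.float32))

# Read back: the shape is known from the file, and indexing reads
# only the requested image from disk, not the whole stack.
with h5py.File(path, "r") as f:
    stack = f["images"]
    shape = stack.shape
    last = stack[shape[0] - 1]
```

Reopening the file for every append keeps the example short; in a real 
run you would probably keep the file open and flush from time to time.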

Armando


