[Numpy-discussion] how to pipe into numpy arrays?
Dag Sverre Seljebotn
Thu Oct 25 01:19:30 CDT 2012
On 10/25/2012 08:17 AM, Dag Sverre Seljebotn wrote:
> On 10/24/2012 09:00 PM, Michael Aye wrote:
>> As numpy.fromfile seems to require full file object functionalities
>> like seek, I can not use it with the sys.stdin pipe.
>> So how could I stream a binary pipe directly into numpy?
>> I can imagine storing the data in a string and using StringIO, but the
>> files are 3.6 GB large, just the binary, and that will most likely be
>> much more as a string object.
> A Python 2 string is just a bytes object and would take 3.6 GB as well
> (or did you mean in text encoding?)
>> Reading binary files on disk is NOT the problem, I would like to avoid
>> the temporary file if possible.
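> Read in chunks? Something like
>
> 1) Create array arr
> 2) Fill it through a flat byte view:
>
>     arr_bytes = arr.view(np.uint8).reshape(-1)
>     # check that modifying arr_bytes modifies arr,
>     # if not, work with reshape arguments
>     i = 0
>     while i < arr_bytes.size:
>         chunk = f.read(min(chunk_size, arr_bytes.size - i))
>         if not chunk:
>             break  # EOF before arr was filled
>         # wrap the raw bytes as uint8 so they can be assigned to the slice
>         arr_bytes[i:i + len(chunk)] = np.frombuffer(chunk, dtype=np.uint8)
>         i += len(chunk)
>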
> Alternatively, one could write some C or Cython code to read directly
> into the NumPy array buffer, which avoids an extra copy over the memory
> bus of the data. (Since unfortunately it doesn't look like "fromfile"
> has an out argument.)
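
A pure-Python route to much the same effect may also be possible: binary file objects expose `readinto`, which writes into any writable buffer, so the bytes can land in the array's memory without an intermediate string. The following is only a minimal sketch, not something proposed in this thread. It assumes Python 3 (where the binary side of stdin is `sys.stdin.buffer`) and a writable, C-contiguous array; the helper name `read_exact_into` is hypothetical.

```python
import io
import numpy as np


def read_exact_into(f, arr):
    """Fill arr from the binary stream f via readinto; return bytes read.

    Hypothetical helper: assumes arr is writable and C-contiguous, so that
    the uint8 view below shares memory with arr.
    """
    # Flat byte view of arr, exposed through the buffer protocol.
    buf = memoryview(arr.view(np.uint8).reshape(-1))
    total = 0
    while total < len(buf):
        # On a pipe, readinto may deliver fewer bytes than requested.
        n = f.readinto(buf[total:])
        if not n:  # 0 (EOF) or None (no data on a non-blocking stream)
            break
        total += n
    return total


if __name__ == "__main__":
    # Stand-in for a pipe: an in-memory binary stream.
    src = np.arange(6, dtype=np.float64)
    dst = np.empty(6, dtype=np.float64)
    nbytes = read_exact_into(io.BytesIO(src.tobytes()), dst)
    print(nbytes, np.array_equal(src, dst))
```

With a real pipe the call would look like `read_exact_into(sys.stdin.buffer, arr)`, with `arr` preallocated to the expected size and dtype.
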
Actually, as long as you make sure chunk_size is on the order of 1 MB or
so, the Python overhead may not matter, and the chunks fit in cache so the
extra copy over the memory bus is avoided. A C solution may be overkill.
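
For reference, the chunked approach can be written as a self-contained function. This is a sketch under assumptions rather than a tested recipe for 3.6 GB inputs: the function name `fill_from_stream` and the exact 1 MiB default are illustrative choices, and the array is assumed to be preallocated with the right size and dtype.

```python
import io
import numpy as np

CHUNK_SIZE = 1 << 20  # 1 MiB, in line with the "order of 1 MB" suggestion


def fill_from_stream(f, arr, chunk_size=CHUNK_SIZE):
    """Copy bytes from binary stream f into arr chunk by chunk.

    Returns the number of bytes copied. Hypothetical helper name.
    """
    arr_bytes = arr.view(np.uint8).reshape(-1)
    # Guard against reshape having silently made a copy.
    if not np.shares_memory(arr_bytes, arr):
        raise ValueError("byte view does not share memory with arr")
    i = 0
    while i < arr_bytes.size:
        # Never ask for more than the array can still hold.
        chunk = f.read(min(chunk_size, arr_bytes.size - i))
        if not chunk:
            break  # stream ended before the array was full
        arr_bytes[i:i + len(chunk)] = np.frombuffer(chunk, dtype=np.uint8)
        i += len(chunk)
    return i


if __name__ == "__main__":
    # Stand-in for a pipe: an in-memory binary stream and a tiny chunk size.
    src = np.arange(10, dtype=np.float32)
    dst = np.empty(10, dtype=np.float32)
    copied = fill_from_stream(io.BytesIO(src.tobytes()), dst, chunk_size=7)
    print(copied, np.array_equal(src, dst))
```

On Python 3 the stream argument for a pipe would be `sys.stdin.buffer`; on Python 2, as used in this thread, `sys.stdin` itself returns byte strings.
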