[Numpy-discussion] Designing a new storage format for numpy recarrays
Fri Oct 30 11:19:42 CDT 2009
I should clarify what I meant.
Suppose I have a recarray with 50 fields and want to read just one of
those fields. PyTables/HDF will read in the compressed data for chunks
of complete rows, decompress the full 50 fields, and then give me back
the data for just one field.
I'm after a solution where asking for a single field reads in the bytes
for just that field from disk and decompresses it.
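A minimal sketch of that idea follows. It assumes a hypothetical layout of one zlib-compressed file per field, and the helper names `write_columns` and `read_column` are illustrative rather than part of any existing library. zlib stands in for whatever compressor a real format would use. Reading one field opens and decompresses only that field's file:

```python
import os
import tempfile
import zlib

import numpy as np


def write_columns(arr, directory, level=6):
    """Store each field of a structured array as its own zlib-compressed file.

    Hypothetical layout: one file per field, named '<field>.col', in `directory`.
    """
    for name in arr.dtype.names:
        # The field view is strided (one record apart); copy it into a packed buffer
        column = np.ascontiguousarray(arr[name])
        with open(os.path.join(directory, name + ".col"), "wb") as f:
            f.write(zlib.compress(column.tobytes(), level))


def read_column(directory, name, dtype):
    """Read and decompress the bytes for a single field only."""
    with open(os.path.join(directory, name + ".col"), "rb") as f:
        raw = zlib.decompress(f.read())
    return np.frombuffer(raw, dtype=dtype)


# Example: a small recarray with three fields
data = np.zeros(1000, dtype=[("id", "i8"), ("price", "f8"), ("qty", "i4")])
data["id"] = np.arange(1000)
data["price"] = np.linspace(1.0, 2.0, 1000)
data["qty"] = 7

with tempfile.TemporaryDirectory() as d:
    write_columns(data, d)
    # Only price.col is touched here; id.col and qty.col are never read
    prices = read_column(d, "price", data.dtype["price"])
```

With 50 fields, the same read would still open and decompress one file out of 50.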
This is similar to the difference between databases that store their data
by rows and those that store it by columns. See, for example, Mike
Stonebraker's C-Store column-oriented database (http://db.lcs.mit.edu/projects/cstore/vldb.pdf).
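The row/column distinction is visible directly in NumPy's memory layout. In this sketch (the field names f0...f49 are illustrative), one field of a 50-field recarray is a strided view whose consecutive values are a whole 400-byte record apart, so a sequential scan of that field spans the entire buffer. A separate per-field array is contiguous. This shows only the in-memory analogue, not how C-Store itself lays out data on disk:

```python
import numpy as np

n_fields, n_rows = 50, 10_000

# Row-oriented: all 50 float64 fields of a record sit next to each other
row_dtype = np.dtype([(f"f{i}", "f8") for i in range(n_fields)])
rows = np.zeros(n_rows, dtype=row_dtype)

# One field is a strided view: each value is a full record (50 * 8 bytes) apart
field_view = rows["f0"]
print(field_view.strides)  # (400,)

# Column-oriented: each field is its own contiguous array
columns = {name: np.zeros(n_rows, dtype="f8") for name in row_dtype.names}
print(columns["f0"].strides)  # (8,)

# Bytes spanned when scanning one field in each layout
print(rows.nbytes, columns["f0"].nbytes)  # 4000000 80000
```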
Francesc Alted wrote:
> On Friday 30 October 2009 14:18:05, Stephen Simmons wrote:
>> - PyTables (HDF using chunked storage for recarrays with LZO
>> compression and shuffle filter)
>> - can't extract an individual field from a recarray
> Er... Have you tried the ``cols`` accessor?
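For reference, a rough sketch of the ``cols`` accessor on a PyTables table follows. It assumes PyTables 3.x is installed, and it uses zlib instead of LZO because LZO support depends on how PyTables was built. The file and table names are illustrative. The call returns only the requested column. However, a PyTables table is stored as an HDF5 compound dataset, so each compressed chunk still holds complete rows, and that is the limitation described above:

```python
import os
import tempfile

import numpy as np
import tables  # PyTables

data = np.zeros(1000, dtype=[("id", "i8"), ("price", "f8"), ("qty", "i4")])
data["price"] = np.linspace(1.0, 2.0, 1000)

# zlib stands in for LZO here; shuffle is enabled as in the original setup
filters = tables.Filters(complevel=5, complib="zlib", shuffle=True)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "example.h5")  # illustrative file name
    with tables.open_file(path, mode="w") as h5:
        h5.create_table("/", "records", obj=data, filters=filters)
    with tables.open_file(path, mode="r") as h5:
        # Returns only the 'price' values as a NumPy array, but each chunk
        # read from disk still contains all fields of its rows.
        prices = h5.root.records.cols.price[:]
```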