[Numpy-discussion] Designing a new storage format for numpy recarrays

Stephen Simmons mail@stevesimmons....
Fri Oct 30 11:19:42 CDT 2009


I should clarify what I meant.

Suppose I have a recarray with 50 fields and want to read just one of 
those fields. PyTables/HDF will read in the compressed data for chunks 
of complete rows, decompress the full 50 fields, and then give me back 
the data for just one field.

I'm after a solution where asking for a single field reads in the bytes 
for just that field from disk and decompresses it.

This is similar to the difference between databases storing their data 
as rows or columns. See, for example, Mike Stonebraker's C-Store 
column-oriented database (http://db.lcs.mit.edu/projects/cstore/vldb.pdf).
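
As a rough illustration of the column-wise idea, the sketch below compresses 
each field of a structured array into its own blob, so that asking for one 
field decompresses only that field's bytes. The helper names save_columns and 
load_column are hypothetical, zlib stands in for LZO, and an in-memory dict 
stands in for separate files or byte ranges on disk. It is not how PyTables 
or HDF5 store data.

```python
import zlib

import numpy as np


def save_columns(rec):
    """Compress every field of a structured array into its own blob.

    Hypothetical helper: returns {field name: (dtype string, shape, bytes)}.
    """
    blobs = {}
    for name in rec.dtype.names:
        # rec[name] is a strided view into the row-wise buffer; make it
        # contiguous so the blob holds only this field's bytes.
        col = np.ascontiguousarray(rec[name])
        blobs[name] = (col.dtype.str, col.shape, zlib.compress(col.tobytes()))
    return blobs


def load_column(blobs, name):
    """Decompress a single field; the other blobs are never touched."""
    dtype, shape, payload = blobs[name]
    return np.frombuffer(zlib.decompress(payload), dtype=dtype).reshape(shape)


# Small example record array with three fields (illustrative data only).
rec = np.zeros(1000, dtype=[("id", "<i4"), ("price", "<f8"), ("qty", "<i2")])
rec["id"] = np.arange(1000)
rec["price"] = np.linspace(0.0, 1.0, 1000)

blobs = save_columns(rec)
price = load_column(blobs, "price")  # only the "price" blob is decompressed
```

With a row-wise chunked layout, the equivalent read would have to decompress 
whole rows (all three fields here, all 50 in my case) before discarding the 
unwanted ones.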

Stephen



Francesc Alted wrote:
> On Friday 30 October 2009 14:18:05 Stephen Simmons wrote:
>   
>>  - PyTables (HDF using chunked storage for recarrays with LZO
>> compression and shuffle filter)
>>     - can't extract individual field from a recarray
>>     
>
> Er... Have you tried the ``cols`` accessor?
>
> http://www.pytables.org/docs/manual/ch04.html#ColsClassDescr
>
> Cheers,
>
>   


