[Numpy-discussion] Designing a new storage format for numpy recarrays

Anne Archibald peridot.faceted@gmail....
Fri Oct 30 11:35:10 CDT 2009


2009/10/30 Stephen Simmons <mail@stevesimmons.com>:
> I should clarify what I meant......
>
> Suppose I have a recarray with 50 fields and want to read just one of
> those fields. PyTables/HDF will read in the compressed data for chunks
> of complete rows, decompress the full 50 fields, and then give me back
> the data for just one field.
>
> I'm after a solution where asking for a single field reads in the bytes
> for just that field from disk and decompresses it.
>
> This is similar to the difference between databases storing their data
> as rows or columns. See for example Mike Stonebraker's C-store
> column-oriented database (http://db.lcs.mit.edu/projects/cstore/vldb.pdf).

Is there any reason not to simply store the data as a collection of
separate arrays, one per column? It shouldn't be too hard to write a
wrapper that gives this a nicer syntax, while implementing it under the
hood with HDF5...
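
As a rough illustration of the one-array-per-column idea, here is a minimal
sketch. It is not HDF5: it swaps in NumPy's own .npz container, where
np.savez_compressed writes each array as a separately compressed zip member
and np.load only reads a member when it is asked for. The names save_columns
and ColumnFile are hypothetical, made up for this example.

```python
import os
import tempfile

import numpy as np


def save_columns(path, rec):
    """Write each field of a structured array as its own compressed member."""
    # Each keyword argument becomes a separate compressed .npy entry
    # inside a single zip archive.
    columns = {name: np.ascontiguousarray(rec[name]) for name in rec.dtype.names}
    np.savez_compressed(path, **columns)


class ColumnFile:
    """Hypothetical wrapper: asking for a field reads only that column."""

    def __init__(self, path):
        # np.load on an .npz file is lazy; members are read on access.
        self._npz = np.load(path)

    @property
    def fields(self):
        return list(self._npz.files)

    def __getitem__(self, name):
        # Reads and decompresses just this one member of the archive.
        return self._npz[name]

    def close(self):
        self._npz.close()


# Build a small structured array with three fields.
rec = np.zeros(1000, dtype=[("id", "i4"), ("price", "f8"), ("qty", "i2")])
rec["id"] = np.arange(1000)
rec["price"] = np.linspace(0.0, 1.0, 1000)

path = os.path.join(tempfile.mkdtemp(), "table.npz")
save_columns(path, rec)

cols = ColumnFile(path)
names = cols.fields
price = cols["price"]  # the "id" and "qty" members are never decompressed
cols.close()
```

The same wrapper shape should carry over to an HDF5 file holding one
dataset per column; only the two storage calls would change.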

Anne

> Stephen
>
>
>
> Francesc Alted wrote:
>> On Friday 30 October 2009 14:18:05, Stephen Simmons wrote:
>>
>>>  - Pytables (HDF using chunked storage for recarrays with LZO
>>> compression and shuffle filter)
>>>     - can't extract individual field from a recarray
>>>
>>
>> Er... Have you tried the ``cols`` accessor?
>>
>> http://www.pytables.org/docs/manual/ch04.html#ColsClassDescr
>>
>> Cheers,
>>
>>
>

