[Numpy-discussion] Designing a new storage format for numpy recarrays
Fri Oct 30 11:35:10 CDT 2009
2009/10/30 Stephen Simmons <email@example.com>:
> I should clarify what I meant...
> Suppose I have a recarray with 50 fields and want to read just one of
> those fields. PyTables/HDF will read in the compressed data for chunks
> of complete rows, decompress the full 50 fields, and then give me back
> the data for just one field.
> I'm after a solution where asking for a single field reads in the bytes
> for just that field from disk and decompresses it.
> This is similar to the difference between databases storing their data
> as rows or columns. See for example Mike Stonebraker's C-store
> column-oriented database (http://db.lcs.mit.edu/projects/cstore/vldb.pdf).
Is there any reason not to simply store the data as a collection of
separate arrays, one per column? It shouldn't be too hard to write a
wrapper to give this nicer syntax, while implementing it under the
hood with HDF5...
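
A rough sketch of the one-array-per-column idea, for illustration only. To keep it self-contained it uses NumPy's own .npz container rather than HDF5 (an assumption made for the example, not what the thread proposes): np.savez_compressed writes each array as a separately compressed zip member, and np.load returns a lazy object that reads and decompresses a member only when it is accessed. The names save_columns and ColumnReader are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_columns(path, rec):
    """Write each field of a structured array as its own compressed array."""
    # One zip member per field; each member is compressed independently.
    columns = {name: np.ascontiguousarray(rec[name]) for name in rec.dtype.names}
    np.savez_compressed(path, **columns)


class ColumnReader:
    """Hypothetical wrapper: reader["field"] loads only that field."""

    def __init__(self, path):
        # The object returned by np.load for an .npz file is lazy:
        # a member is read and decompressed only when it is accessed.
        self._npz = np.load(path)

    @property
    def names(self):
        return list(self._npz.files)

    def __getitem__(self, name):
        return self._npz[name]

    def close(self):
        self._npz.close()


# Usage: build a small structured array, store it, read back one field.
rec = np.zeros(5, dtype=[("id", "i4"), ("price", "f8"), ("qty", "i4")])
rec["id"] = np.arange(5)
rec["price"] = np.linspace(1.0, 2.0, 5)

path = os.path.join(tempfile.mkdtemp(), "table.npz")
save_columns(path, rec)

reader = ColumnReader(path)
column_names = reader.names
price = reader["price"]  # only the "price" member is read from disk
reader.close()
```

The same layout maps onto HDF5 by making each column its own dataset in a group, so that chunking and compression filters apply per column instead of per row.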
> Francesc Alted wrote:
>> On Friday 30 October 2009 14:18:05 Stephen Simmons wrote:
>>> - PyTables (HDF using chunked storage for recarrays with LZO
>>> compression and shuffle filter)
>>> - can't extract individual field from a recarray
>> Er... Have you tried the ``cols`` accessor?