[Numpy-discussion] Designing a new storage format for numpy recarrays

Anne Archibald peridot.faceted@gmail....
Fri Oct 30 11:35:10 CDT 2009

2009/10/30 Stephen Simmons <mail@stevesimmons.com>:
> I should clarify what I meant.
> Suppose I have a recarray with 50 fields and want to read just one of
> those fields. PyTables/HDF will read in the compressed data for chunks
> of complete rows, decompress the full 50 fields, and then give me back
> the data for just one field.
> I'm after a solution where asking for a single field reads in the bytes
> for just that field from disk and decompresses it.
> This is similar to the difference between databases storing their data
> as rows or columns. See for example Mike Stonebraker's C-store
> column-oriented database (http://db.lcs.mit.edu/projects/cstore/vldb.pdf).

Is there any reason not to simply store the data as a collection of
separate arrays, one per column? It shouldn't be too hard to write a
wrapper to give this nicer syntax, while implementing it under the
hood with HDF5...
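
A minimal sketch of the one-array-per-column idea, under some loud
assumptions: it uses NumPy's own compressed .npz container as a stand-in
for HDF5 (each column becomes a separately compressed member, and np.load
only reads and decompresses a member when it is accessed), and the names
save_columns and ColumnStore are hypothetical, not part of NumPy or
PyTables. An HDF5 version would presumably put one dataset per column in
a group instead.

```python
import os
import tempfile

import numpy as np


def save_columns(path, rec):
    """Write each field of a structured array as its own compressed member.

    Note: field names are passed as keyword arguments, so a field that is
    literally called "file" would clash with savez_compressed's own argument.
    """
    columns = {name: np.ascontiguousarray(rec[name]) for name in rec.dtype.names}
    np.savez_compressed(path, **columns)


class ColumnStore:
    """Hypothetical read-only wrapper: store["x"] loads only column "x"."""

    def __init__(self, path):
        # np.load on an .npz file is lazy: a member is read from disk and
        # decompressed only when it is indexed by name.
        self._npz = np.load(path)

    @property
    def names(self):
        return list(self._npz.files)

    def __getitem__(self, name):
        return self._npz[name]

    def close(self):
        self._npz.close()


# Build a small three-field structured array as example data.
rec = np.zeros(1000, dtype=[("id", "i8"), ("price", "f8"), ("qty", "i4")])
rec["id"] = np.arange(1000)
rec["price"] = np.linspace(0.0, 1.0, 1000)

path = os.path.join(tempfile.mkdtemp(), "table.npz")
save_columns(path, rec)

store = ColumnStore(path)
price = store["price"]  # only the "price" member is decompressed here
store.close()
```

The trade-off is the usual row-versus-column one: fetching a single field
is cheap, but reassembling a complete row means touching every column.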


> Stephen
> Francesc Alted wrote:
>> On Friday 30 October 2009 14:18:05 Stephen Simmons wrote:
>>>  - Pytables (HDF using chunked storage for recarrays with LZO
>>> compression and shuffle filter)
>>>     - can't extract individual field from a recarray
>> Er... Have you tried the ``cols`` accessor?
>> http://www.pytables.org/docs/manual/ch04.html#ColsClassDescr
>> Cheers,
