[Numpy-discussion] ragged array implementation

Francesc Alted faltet@pytables....
Mon Mar 7 13:18:00 CST 2011


On Monday 07 March 2011 19:42:00 Christopher Barker wrote:
> But now that you've entered the conversation, does HDF and/or
> pytables have a standard way of dealing with this?

Well, I don't think there is a 'standard' way of dealing with 
ragged arrays, but yes, PyTables has support for them.  Creating one is 
easy:

import tables
from numpy import array

# Create a VLArray:
fileh = tables.openFile('vlarray1.h5', mode='w')
vlarray = fileh.createVLArray(fileh.root, 'vlarray1',
                              tables.Int32Atom(shape=()),
                              "ragged array of ints",
                              filters=tables.Filters(1))
# Append some (variable length) rows:
vlarray.append(array([5, 6]))
vlarray.append(array([5, 6, 7]))
vlarray.append([5, 6, 9, 8])

Then, you can access the rows in a variety of ways, like iterators:

print '-->', vlarray.title
for x in vlarray:
    print '%s[%d]--> %s' % (vlarray.name, vlarray.nrow, x)

--> ragged array of ints
vlarray1[0]--> [5 6]
vlarray1[1]--> [5 6 7]
vlarray1[2]--> [5 6 9 8]

or via __getitem__, using general fancy indexing:

a_row = vlarray[2]
a_list = vlarray[::2]
a_list2 = vlarray[[0,2]]   # get list of coords
a_list3 = vlarray[[0,-2]]  # negative values accepted
a_list4 = vlarray[numpy.array([True,...,False])]  # array of bools

but note that multi-row selections come back as plain Python lists, 
not as numpy arrays of 'object' elements.  More info on the VLArray 
object at:

http://www.pytables.org/docs/manual/ch04.html#VLArrayClassDescr
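
For readers on a current install: the snippets above use the PyTables 2.x
names (openFile, createVLArray) and Python 2 print statements. PyTables 3.0
renamed these calls to open_file and create_vlarray. The following is a
minimal sketch of the same round trip under that newer API; the scratch
directory and the variable names rows, every_other and picked are
illustrative choices, not part of the original example.

```python
import os
import tempfile

import numpy as np
import tables

# Write into a scratch directory (an assumption made for this sketch).
path = os.path.join(tempfile.mkdtemp(), "vlarray1.h5")

# Create the VLArray and append three rows of different lengths.
with tables.open_file(path, mode="w") as fileh:
    vlarray = fileh.create_vlarray(
        fileh.root, "vlarray1",
        atom=tables.Int32Atom(shape=()),
        title="ragged array of ints",
        filters=tables.Filters(complevel=1),
    )
    vlarray.append(np.array([5, 6]))
    vlarray.append(np.array([5, 6, 7]))
    vlarray.append([5, 6, 9, 8])

# Reopen read-only and access the rows in a few of the ways shown above.
with tables.open_file(path, mode="r") as fileh:
    vlarray = fileh.root.vlarray1
    # Iteration yields one row at a time; convert to lists for display.
    rows = [row.tolist() for row in vlarray]
    # Slices and coordinate lists return a plain Python list of rows.
    every_other = [row.tolist() for row in vlarray[::2]]
    picked = [row.tolist() for row in vlarray[[0, 2]]]

print(rows)
print(every_other)
print(picked)
```

The with blocks close the file on exit, which the original snippet leaves
to the reader.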
 
> is a "vlen array" stored contiguously in netcdf?

I don't really know, but one limitation of variable length arrays in 
HDF5 (and hence NetCDF4) is that they cannot be compressed (but that 
should be addressed in the future).

-- 
Francesc Alted

