[SciPy-user] Fast saving/loading of huge matrices

Gael Varoquaux gael.varoquaux@normalesup....
Fri Apr 20 01:24:20 CDT 2007


I agree that PyTables lacks a really simple interface, say something that
dumps a dict to an hdf5 file and vice versa (although hdf5 -> dict is a
bit harder, as not all hdf5 types convert nicely to Python types).

In my experiment I use this code to load the data:

"""
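import tables
# CharArray is assumed to be imported from the array package in use.

def load_h5(file_name):
    """ Loads an hdf5 file and returns a dict with the hdf5 data in it.
    """
    file = tables.openFile(file_name)
    out_dict = {}
    # file.leaves maps each leaf's path (e.g. "/v1") to the leaf node.
    for key, value in file.leaves.iteritems():
        # Skip datasets whose hdf5 type PyTables cannot handle.
        if isinstance(value, tables.UnImplemented):
            continue
        try:
            value = value.read()
            try:
                if isinstance(value, CharArray):
                    value = value.tolist()
            except Exception, inst:
                print "Couldn't convert %s to a list" % key
                print inst
            # Unwrap one-element containers.
            if len(value) == 1:
                value = value[0]
            # Strip the leading "/" from the path to get the dict key.
            out_dict[key[1:]] = value
        except Exception, inst:
            print "Couldn't load %s" % key
            print inst
    file.close()
    return out_dict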
"""

It works well on our files, but our files are produced by code I wrote,
so they do not explore all the possibilities of hdf5.
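
For anyone reading this with Python 3 and a PyTables 3.x install, a rough equivalent of the loader might look like the sketch below. It assumes the snake_case API (`open_file`, `walk_nodes`) and treats NumPy byte-string arrays as the stand-in for the `CharArray` case; the function name `load_h5_py3` is made up for illustration, and the one-element unwrapping is restricted to 1-D values.

```python
import numpy as np
import tables


def load_h5_py3(file_name):
    """Load every readable leaf of an HDF5 file into a dict (sketch)."""
    out_dict = {}
    with tables.open_file(file_name, mode="r") as h5:
        for leaf in h5.walk_nodes("/", classname="Leaf"):
            # Skip datasets whose HDF5 type PyTables cannot map.
            if isinstance(leaf, tables.UnImplemented):
                continue
            key = leaf._v_pathname[1:]  # drop the leading "/"
            try:
                value = leaf.read()
            except Exception as exc:
                print("Couldn't load %s: %s" % (key, exc))
                continue
            # Byte-string arrays become plain Python lists.
            if isinstance(value, np.ndarray) and value.dtype.kind == "S":
                value = value.tolist()
            # Unwrap one-element 1-D values, mirroring the len(value) == 1
            # check in the loader above.
            if np.ndim(value) == 1 and len(value) == 1:
                value = value[0]
            out_dict[key] = value
    return out_dict
```

Leaves inside nested groups end up under keys such as `group/name`, since only the leading slash is removed.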

Similarly, I have some Python code to dump a dict of arrays to an hdf5
file:

"""
import tables
from numpy import ndarray

def dic_to_h5(filename, dic):
    """ Saves all the arrays in a dictionary to an hdf5 file.
    """
    out_file = tables.openFile(filename, mode="w")
    for key, value in dic.iteritems():
        # Only arrays are saved; other values are silently skipped.
        if isinstance(value, ndarray):
            out_file.createArray('/', str(key), value)
    out_file.close()
"""
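
As a minimal sketch of the same idea for Python 3 and PyTables 3.x, applied to the `v1`/`v2` example from the question quoted below: it assumes the snake_case names `open_file` and `create_array`, and the function name `dict_to_h5_py3` and the temporary file path are only illustrative.

```python
import os
import tempfile

import numpy as np
import tables


def dict_to_h5_py3(filename, dic):
    """Save every NumPy array in a dict as a top-level HDF5 array (sketch)."""
    with tables.open_file(filename, mode="w") as out_file:
        for key, value in dic.items():
            # Only arrays are saved; other values are skipped.
            if isinstance(value, np.ndarray):
                out_file.create_array("/", str(key), value)


# Store two channels in one file, then read a single channel back by name.
channels = {"v1": np.random.rand(1000), "v2": np.random.rand(1000)}
h5_path = os.path.join(tempfile.mkdtemp(), "channels.h5")
dict_to_h5_py3(h5_path, channels)

with tables.open_file(h5_path, mode="r") as h5:
    v1_back = h5.root.v1.read()  # only this channel is read from disk
```

Reading through `h5.root.v1` relies on the keys being valid Python identifiers; other names can still be reached with `h5.get_node("/", name)`.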

This code is not general enough to go into PyTables, but if the list wants
to improve it a bit, then we could propose it for inclusion, or at least
put it in the cookbook.

Cheers,

Gaël

On Thu, Apr 19, 2007 at 06:01:44PM -0500, Ryan Krauss wrote:
> I have a very similar question.  PyTables clearly has much more
> capability than I need and the documentation is a bit intimidating.  I
> have tests that involve multiple channels of data that I need to
> store.  Can you give a simple example of using PyTables to store 3
> separate Nx1 vectors in the same file and easily retrieve the
> individual channels?  The cPickle equivalent would be something like:

> v1=rand(1000,)
> v2=rand(1000,)
> mydict={'v1':v1,'v2':v2}

> and then dump mydict to a pickle file.  How would I do the same thing
> in PyTables?

> Thanks,

> Ryan

> On 4/19/07, Vincent Nijs <v-nijs@kellogg.northwestern.edu> wrote:
> > PyTables looks very interesting and clearly has a ton of features. However,
> > if I am just trying to read in a csv file, can it figure out the correct data
> > types on its own (e.g., dates, floats, strings)? Read: "I am too lazy to
> > type in variable names and types myself if the names are already in the
> > file" :)

> > Similarly, can you just dump a dictionary or rec-array into a PyTables file with
> > one 'save' command and have PyTables figure out the variable names and
> > types? This seems relevant since you wouldn't have to do that with cPickle,
> > which saves user time if not computer time.

> > Sorry if this is too off-topic.

> > Vincent


