[Numpy-discussion] Advice on converting iterator into array efficiently

Francesc Alted faltet@pytables....
Fri Aug 29 04:29:46 CDT 2008

On Friday 29 August 2008, Alan Jackson wrote:
> Looking for advice on a good way to handle this problem.
> I'm dealing with large tables (Gigabyte large). I would like to
> efficiently subset values from one column based on the values in
> another column, and get arrays out of the operation. For example,
> say I have 2 columns, "energy" and "collection". Collection is
> basically an index that flags values that go together, so all the
> energy values with a collection value of 18 belong together. I'd
> like to be able to set up an iterator on collection that would
> hand me an array of energy on each iteration :
> if table is all my data, then something like
> for c in table['collection'] :
>     e = c['energy']
>     ... do array operations on e
> I've been playing with pytables, and they help, but I can't quite
> seem to get there. I can get an iterator for energy within a
> collection, but I can't figure out an efficient way to get an array
> out.
> What I have so far is
> for c in np.unique(table.col('collection')) :
>     rows = table.where('collection == c')
>     for row in rows :
>         print c,' : ', row['energy']
> but I really want to convert rows['energy'] to an array.
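[One direct way to turn such a row iterator into an array, without building an intermediate list, is `numpy.fromiter`. A minimal sketch, using a list of plain dicts as a hypothetical stand-in for the PyTables row iterator:]

```python
import numpy as np

# Hypothetical stand-in for the rows yielded by table.where(...);
# in real code this would be the PyTables row iterator.
rows = [{'collection': 18, 'energy': 1.5},
        {'collection': 18, 'energy': 2.5},
        {'collection': 18, 'energy': 4.0}]

# Build the energy array in one pass over the iterator.
e = np.fromiter((row['energy'] for row in rows), dtype=np.float64)

print(e)         # -> [1.5 2.5 4. ]
print(e.mean())  # ordinary array operations now work
```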

You can use a list to accumulate the values and then convert it to an 
array.  Also, you can use a dictionary keyed on the unique collection 
values.  The following should do the trick:

import numpy

energies = {}
for row in table:
    c = row['collection']
    e = row['energy']
    if c in energies:
        energies[c].append(e)
    else:
        energies[c] = [e]

# Convert the lists into numpy arrays
for key in energies:
    energies[key] = numpy.array(energies[key])

This solution is fairly efficient in that it avoids loading the whole 
table into memory and needs only a single pass over the table to get 
the job done.
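[A self-contained version of this idea, using a small NumPy structured array as a hypothetical stand-in for the PyTables table; the column names are taken from the thread:]

```python
import numpy as np

# Toy stand-in for the on-disk table: a structured array with the
# two columns discussed in the thread.
table = np.array([(18, 1.0), (18, 2.0), (19, 5.0), (18, 3.0), (19, 7.0)],
                 dtype=[('collection', 'i4'), ('energy', 'f8')])

# Group energies by collection in a single pass.
energies = {}
for row in table:
    c = row['collection']
    e = row['energy']
    if c in energies:
        energies[c].append(e)
    else:
        energies[c] = [e]

# Convert each list into a numpy array.
for key in energies:
    energies[key] = np.array(energies[key])

print(energies[18])  # -> [1. 2. 3.]
print(energies[19])  # -> [5. 7.]
```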

Hope this helps,

Francesc Alted
