[Numpy-discussion] ragged array implementation

Christopher Barker Chris.Barker@noaa....
Thu Mar 10 11:05:11 CST 2011


On 3/7/11 5:51 PM, Sturla Molden wrote:
> Den 07.03.2011 18:28, skrev Christopher Barker:
>> 1, 2, 3, 4
>> 5, 6
>> 7, 8, 9, 10, 11, 12
>> 13, 14, 15
>> ...
>>

> A ragged array, as implemented in C++, Java or C# is just an array of
> arrays (or 'a pointer to an array of pointers').

Sure, but as a rule I don't find direct translation of C++ or Java code 
to Python the best approach ;-)


> Basically, that is an
> ndarray of ndarrays (or a list of ndarrays, whatever you prefer).
>
>   >>>  ra = np.zeros(4, dtype=np.ndarray)
>   >>>  ra[0] = np.array([1,2,3,4])
>   >>>  ra[1] = np.array([5,6])
>   >>>  ra[2] = np.array([7,8,9,10,11,12])
>   >>>  ra[3] = np.array([13,14,15])
>   >>>  ra
> array([[1 2 3 4], [5 6], [ 7  8  9 10 11 12], [13 14 15]], dtype=object)
>   >>>  ra[1][1]
> 6
>   >>>  ra[2][:]
> array([ 7,  8,  9, 10, 11, 12])

yup -- or I could use a list to store the rows, which would add the 
ability to append rows.
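A minimal sketch of the list-of-rows variant, assuming each row is a 1-D NumPy array (the name `rows` is only for illustration). Appending is just `list.append`, but anything that touches every row still has to loop over the list in Python:

```python
import numpy as np

# Rows stored as a plain Python list of 1-D arrays.
rows = [np.array([1, 2, 3, 4]),
        np.array([5, 6]),
        np.array([7, 8, 9, 10, 11, 12]),
        np.array([13, 14, 15])]

# Adding a row does not copy or move the existing rows.
rows.append(np.array([16, 17]))

print(len(rows))   # 5
print(rows[1][1])  # 6
print(rows[4])     # [16 17]
```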

> Slicing in two dimensions does not work as some might expect:
>
>   >>>  ra[:2][:2]
> array([[1 2 3 4], [5 6]], dtype=object)

yup -- might want to overload indexing to do something smarter about 
that, though in my use case, slicing "vertically" isn't really useful 
anyway -- the nth element in one row doesn't necessarily have anything 
to do with the nth element in another row.
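If one did want smarter indexing, a small wrapper class could map `ra[i, j]` to element `j` of row `i`. This is a hypothetical sketch (`RaggedList` is not an existing NumPy class) that deliberately refuses vertical slices, since those don't carry much meaning for this kind of data:

```python
import numpy as np


class RaggedList:
    """Hypothetical list-of-rows ragged array with 2-D style element access."""

    def __init__(self, rows):
        # Store each row as its own 1-D array.
        self.rows = [np.asarray(r) for r in rows]

    def __getitem__(self, index):
        # ra[i, j] -> element(s) j of row i; ra[i] -> the whole row i.
        if isinstance(index, tuple):
            i, j = index
            if not isinstance(i, (int, np.integer)):
                raise TypeError("row index must be an integer; "
                                "vertical slicing is not supported")
            return self.rows[i][j]
        return self.rows[index]

    def __len__(self):
        return len(self.rows)


ra = RaggedList([[1, 2, 3, 4], [5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15]])
print(ra[1, 1])    # 6
print(ra[2, 1:3])  # [8 9]
print(ra[3])       # [13 14 15]
```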

However, aside from the slicing syntax issue, what I lose with this 
approach is the ability to get reasonable performance on operations on 
the entire array:

ra *= 3.3

I'd like that to be numpy-efficient, i.e. a single vectorized operation 
rather than a Python loop over the rows.
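One way to get that behavior is to keep every value in a single flat 1-D array plus an array of row offsets. This is a sketch under my own assumptions (the names `data`, `offsets` and `row` are made up for illustration); rows come back as views, so one in-place operation on `data` updates all of them at once:

```python
import numpy as np

rows = [[1, 2, 3, 4], [5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15]]

# All values in one contiguous float array.
data = np.concatenate([np.asarray(r, dtype=float) for r in rows])
# offsets[i]:offsets[i + 1] is the slice of `data` holding row i.
offsets = np.concatenate(([0], np.cumsum([len(r) for r in rows])))


def row(i):
    # Basic slicing returns a view, so no values are copied.
    return data[offsets[i]:offsets[i + 1]]


data *= 3.3  # one vectorized in-place operation over every row

print(np.diff(offsets))  # row lengths 4, 2, 6, 3
print(row(1))            # 5 * 3.3 and 6 * 3.3
```

The values are stored as float because, under NumPy's default casting rules, `*=` with a float on an integer array raises an error rather than silently truncating.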

What I need to grapple with is:

1) Is there a point to trying to build a general-purpose ragged array? 
Or should I just build something that satisfies the use case at hand?

2) What's the balance I need between performance and flexibility? 
Putting the rows in a list gives a lot more flexibility; putting it all 
in one 1-d numpy array could give better performance.
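The flip side of the flat layout is appending. In the sketch below (the helper `append_row` is hypothetical), each new row means allocating a new array and copying everything already stored, which is exactly what the list layout avoids:

```python
import numpy as np


def append_row(data, offsets, new_row):
    """Return new (data, offsets) with `new_row` added at the end.

    np.concatenate allocates a fresh array and copies all existing values,
    so each append costs O(total elements), unlike list.append.
    """
    new_row = np.asarray(new_row, dtype=data.dtype)
    data = np.concatenate([data, new_row])
    offsets = np.append(offsets, offsets[-1] + len(new_row))
    return data, offsets


data = np.array([1., 2., 3., 4., 5., 6.])
offsets = np.array([0, 4, 6])
data, offsets = append_row(data, offsets, [7, 8, 9])
print(offsets)                        # [0 4 6 9]
print(data[offsets[2]:offsets[3]])    # [7. 8. 9.]
```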

NOTE: this looks like it could use a "growable" numpy array, much like 
one I've written before -- maybe it's time to revive that project and 
use it here, fixing some performance issues while I'm at it.
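A "growable" array of this kind usually over-allocates and doubles its capacity when full, so appends are cheap on average. The following is a hypothetical sketch of that idea (`GrowableArray` is an illustrative name, not the earlier project mentioned here), used as flat storage for ragged rows:

```python
import numpy as np


class GrowableArray:
    """Hypothetical 1-D array with amortized O(1) appends (capacity doubling)."""

    def __init__(self, dtype=float, capacity=16):
        self._buf = np.empty(capacity, dtype=dtype)
        self._n = 0  # number of slots actually in use

    def extend(self, values):
        values = np.asarray(values, dtype=self._buf.dtype)
        needed = self._n + len(values)
        if needed > len(self._buf):
            # Grow geometrically so the copying cost is amortized over appends.
            new_buf = np.empty(max(needed, 2 * len(self._buf)),
                               dtype=self._buf.dtype)
            new_buf[:self._n] = self._buf[:self._n]
            self._buf = new_buf
        self._buf[self._n:needed] = values
        self._n = needed

    @property
    def array(self):
        # View of the filled part; stale after the next reallocation.
        return self._buf[:self._n]

    def __len__(self):
        return self._n


data = GrowableArray(capacity=4)
offsets = [0]
for r in ([1, 2, 3, 4], [5, 6], [7, 8, 9, 10, 11, 12]):
    data.extend(r)
    offsets.append(len(data))

values = data.array
values *= 3.3    # in-place and vectorized over all rows
print(offsets)   # [0, 4, 6, 12]
```

Because `.array` is a view into the current buffer, it has to be fetched again after any `extend` that reallocates.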

Thanks for all your ideas,

-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov


More information about the NumPy-Discussion mailing list