[Numpy-discussion] how can we concatenate structured arrays ?

josef.pktd@gmai... josef.pktd@gmai...
Wed Sep 16 03:45:30 CDT 2009


On Wed, Sep 16, 2009 at 4:25 AM,  <josef.pktd@gmail.com> wrote:
> I have two structured arrays of different types. How can I
> horizontally concatenate the two arrays? Is there a direct way, or do
> I need to start from scratch?
>
> nobs = 10
> testdata = np.random.randint(3,
> size=(nobs,4)).view([('a',int),('b',int),('c',int),('d',int)])
> testdatacont = np.random.normal( size=(nobs,2)).view([('e',float), ('f',float)])
>
>>>> np.hstack((testdata,testdatacont))
> Traceback (most recent call last):
>  File "C:\Programs\Python25\Lib\site-packages\numpy\lib\shape_base.py",
> line 505, in hstack
>    return _nx.concatenate(map(atleast_1d,tup),1)
> TypeError: expected a readable buffer object
>
>>>> np.column_stack((testdata,testdatacont))
> Traceback (most recent call last):
>  File "C:\Programs\Python25\Lib\site-packages\numpy\lib\shape_base.py",
> line 552, in column_stack
>    return _nx.concatenate(arrays,1)
> TypeError: expected a readable buffer object
>
>
> the following works, but looks like a big detour for a simple column_stack:
>
>>>> import numpy.lib.recfunctions
>>>> dt2 = numpy.lib.recfunctions.zip_descr((testdata,testdatacont),flatten=True)
>>>> joinedarr = np.array([tuple(i+j) for i,j in zip(testdata.base.tolist(), testdatacont.base.tolist())], dtype = dt2)
>>>> joinedarr.dtype
> dtype([('a', '<i4'), ('b', '<i4'), ('c', '<i4'), ('d', '<i4'), ('e',
> '<f8'), ('f', '<f8')])
>
>
> if I want to convert the dtypes to float (which I don't want in this
> case), then it's easier
>
>>>> np.column_stack((testdata.base,testdatacont.base)).dtype
> dtype('float64')
>
>
> Josef
>

Looping over the columns also works, and this looks more efficient:

>>> tt = np.empty((10,1), dt2)
>>> tt.shape
(10, 1)
>>> tt['a'].shape
(10, 1)
>>> testdata['a'].shape  # has ndim=2
(10, 1)
>>> for n in testdata.dtype.names: tt[n] = testdata[n]
...
>>> for n in testdatacont.dtype.names: tt[n] = testdatacont[n]
...
>>> tt
array([[(2, 0, 1, 1, 0.61282791440084505, 0.29305903681720574)],
       [(1, 1, 1, 2, -1.5331947180856178, -0.62794592132997662)],
       [(1, 0, 1, 1, 0.34850521437127446, -0.71435625605096553)],
       [(2, 1, 2, 1, -0.035021646994300569, 0.14235131301077331)],
       [(2, 0, 2, 0, -0.072940874291085214, 1.257392635986091)],
       [(1, 0, 1, 0, 0.19764464613444582, 3.1907154468379528)],
       [(1, 2, 2, 1, 1.0584100502205742, -1.8249604812902063)],
       [(1, 1, 0, 0, -0.1580364093187942, 0.0314819593087034)],
       [(1, 2, 2, 0, -2.0938485304115289, 1.0133998231900494)],
       [(0, 2, 0, 0, 0.042563869142945909, 1.2643518145105357)]],
      dtype=[('a', '<i4'), ('b', '<i4'), ('c', '<i4'), ('d', '<i4'),
('e', '<f8'), ('f', '<f8')])
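
The column loop above can be wrapped in a small helper. This is only a sketch: the name hstack_fields is hypothetical (it is not a numpy function), it assumes all inputs have the same shape and no field names in common, and it builds the combined dtype by hand instead of using zip_descr. Explicit int64/float64 types are used so that the views behave the same on every platform.

```python
import numpy as np

def hstack_fields(*arrays):
    """Join structured arrays of equal shape field by field.

    Hypothetical helper, not part of numpy. Assumes the field names
    of the inputs do not overlap.
    """
    shape = arrays[0].shape
    descr = []
    for arr in arrays:
        if arr.shape != shape:
            raise ValueError("all arrays must have the same shape")
        # collect (name, dtype) pairs in the order the fields appear
        descr.extend((name, arr.dtype[name]) for name in arr.dtype.names)
    out = np.empty(shape, dtype=descr)
    for arr in arrays:
        for name in arr.dtype.names:
            out[name] = arr[name]  # copy one column at a time
    return out

nobs = 10
testdata = np.random.randint(3, size=(nobs, 4)).astype(np.int64).view(
    [('a', np.int64), ('b', np.int64), ('c', np.int64), ('d', np.int64)])
testdatacont = np.random.normal(size=(nobs, 2)).view(
    [('e', np.float64), ('f', np.float64)])

joined = hstack_fields(testdata, testdatacont)
```

The copy is done once per field rather than once per row, so it avoids the tolist() round trip of the first approach.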

Is this the best we can do?

Josef

