[SciPy-user] Ordering and Counting the Repetitions of the Rows of a Matrix

Emmanuelle Gouillart emmanuelle.gouillart@normalesup....
Thu Jun 25 14:27:59 CDT 2009


Hi Lorenzo,

If you just sort each row of your array in place (A.sort(axis=1) sorts along
the last axis, i.e. within each row), then you're back to your former
problem, right? For your 4-column array you can sort only the two central
columns.
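A minimal sketch of this idea in current NumPy, assuming NumPy 1.13 or later (where np.unique accepts axis=0 and return_counts=True). Each row is sorted so that (3 2) becomes (2 3), and then the distinct rows are counted. For the 4-column matrix only columns 1:3 are sorted, so the first and fourth entries must still match exactly. The helper name canonical_rows is hypothetical, introduced only for this illustration.

```python
import numpy as np

def canonical_rows(arr, start=0, stop=None):
    """Hypothetical helper: return a copy of arr with columns start:stop
    sorted within each row, so rows that differ only by the order of
    those entries become identical."""
    out = np.array(arr, copy=True)
    out[:, start:stop] = np.sort(out[:, start:stop], axis=1)
    return out

# Matrix A from the question: (2 3) and (3 2) should count as the same row.
A = np.array([[1, 2], [2, 3], [9, 9], [4, 4], [1, 2], [3, 2]])
rows_A, counts_A = np.unique(canonical_rows(A), axis=0, return_counts=True)
print(rows_A.tolist())    # [[1, 2], [2, 3], [4, 4], [9, 9]]
print(counts_A.tolist())  # [2, 2, 1, 1]

# Matrix B: first and fourth entries must match, the middle two may be swapped.
B = np.array([[2, 1, 2, 4], [4, 2, 3, 9], [8, 9, 9, 7],
              [5, 4, 4, 1], [6, 1, 2, 2], [4, 3, 2, 9]])
rows_B, counts_B = np.unique(canonical_rows(B, 1, 3), axis=0, return_counts=True)
print(rows_B.tolist())    # [[2, 1, 2, 4], [4, 2, 3, 9], [5, 4, 4, 1], [6, 1, 2, 2], [8, 9, 9, 7]]
print(counts_B.tolist())  # [1, 2, 1, 1, 1]
```

Note that np.unique returns the rows in lexicographic order, not in order of first appearance.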

BTW, I had a look at your previous question and here's a solution that
hadn't been proposed (if I read through the thread correctly).

>>> a = np.array([[1, 2], [2, 3], [1, 2], [3, 4], [2,3]])
>>> dt = a.dtype
>>> newdt = [('',dt)]*2
>>> b = a.view(newdt)
>>> b = b.ravel()
>>> c = np.unique1d(b)
>>> c
array([(1, 2), (2, 3), (3, 4)], 
      dtype=[('f0', '<i4'), ('f1', '<i4')])
>>> c = c.view(dt)
>>> c
array([1, 2, 2, 3, 3, 4])
>>> c = np.c_[c[::2], c[1::2]]
>>> c
array([[1, 2],
       [2, 3],
       [3, 4]])

(not as short as Stefan's solution, though :D).
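np.unique1d no longer exists in current NumPy releases; np.unique plays the same role there and also accepts structured arrays. A sketch of the same view trick under that assumption (the field names f0/f1 are chosen here for illustration, and reshape stands in for the np.c_ step):

```python
import numpy as np

a = np.array([[1, 2], [2, 3], [1, 2], [3, 4], [2, 3]])
a = np.ascontiguousarray(a)  # the view below needs each row contiguous in memory
dt = a.dtype

# Reinterpret each 2-element row as one record with two fields of the same dtype.
rowdt = np.dtype([('f0', dt), ('f1', dt)])
b = a.view(rowdt).ravel()    # shape (5,): one record per row of a

# np.unique sorts the records field by field and drops duplicates.
c = np.unique(b)

# View the records as plain integers again and restore the 2-column shape.
c = c.view(dt).reshape(-1, 2)
print(c.tolist())  # [[1, 2], [2, 3], [3, 4]]
```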

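For the occurrence array, use the optional index arrays returned by
np.unique1d: the inverse array (c3) maps each row of b to its position in
c1, so a histogram of it with one bin per unique row gives the counts:

>>> c1, c2, c3 = np.unique1d(b, return_index=True, return_inverse=True)
>>> occurrence = np.histogram(c3, bins=np.arange(c1.shape[0] + 1))
>>> occurrence
(array([2, 2, 1]), array([0, 1, 2, 3]))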
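Because the inverse array holds integer labels 0..len(c1)-1, np.bincount gives the same counts as the histogram call without building bin edges. A sketch assuming a NumPy recent enough to have np.unique with return_inverse and return_counts (used here in place of np.unique1d):

```python
import numpy as np

a = np.array([[1, 2], [2, 3], [1, 2], [3, 4], [2, 3]])
rowdt = np.dtype([('f0', a.dtype), ('f1', a.dtype)])
b = np.ascontiguousarray(a).view(rowdt).ravel()  # one record per row of a

# inverse[i] is the position in uniq of the i-th record of b.
uniq, inverse = np.unique(b, return_inverse=True)

# Counting the labels 0..len(uniq)-1 gives the occurrences of each unique row.
occurrence = np.bincount(inverse, minlength=len(uniq))
print(occurrence.tolist())  # [2, 2, 1]

# The same counts, returned by np.unique directly.
_, counts = np.unique(b, return_counts=True)
print(counts.tolist())      # [2, 2, 1]
```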

Cheers,

Emmanuelle


On Thu, Jun 25, 2009 at 01:01:32PM +0200, Lorenzo Isella wrote:
> Dear All,
> I dug up an old post of mine to this list (the problem was mainly how to 
> get rid of multiple rows in a matrix while counting the multiple 
> occurrences of each row).
> Now, the problem is slightly more complex.

> The matrix is of the kind

> A = 1 2
>     2 3
>     9 9
>     4 4
>     1 2
>     3 2

> but this time, you consider the row with entries (2 3) equal to the one 
> with entries (3 2), i.e. this time the ordering of elements in a row 
> does not matter.
> How can I still calculate the repetitions of each row in the sense 
> explained above and obtain the 'repetition-free' matrix?

> Furthermore, suppose that you have the matrix

> B = 2 1 2 4
>     4 2 3 9
>     8 9 9 7
>     5 4 4 1
>     6 1 2 2
>     4 3 2 9

> Now, you have extra elements with respect to matrix A, but you consider 
> two rows equal if the first and fourth entries coincide and the 
> second and third entries are the same numbers or are swapped (as in the 
> case of matrix A). E.g. the second and last rows of matrix B would be 
> considered equal in this case. You still want the number of occurrences 
> of each row (with the new concept of equal rows) and the repetition-free 
> matrix.
> Any ideas about how this could be efficiently implemented?
> Many thanks

> Lorenzo
> > Date: Sun, 27 Jul 2008 15:46:29 -0400
> > From: "Warren Weckesser" <warren.weckesser@gmail.com>
> > Subject: Re: [SciPy-user] Ordering and Counting the Repetitions of the Rows of a Matrix
> > To: "SciPy Users List" <scipy-user@scipy.org>
> > Message-ID: <114880320807271246x1c922e7cg9539684fbad7bed9@mail.gmail.com>
> >
> > Lorenzo,
> >
> > Given a matrix A like you showed, here is one way to find (and count)
> > the unique rows:
> > ----------
> > d = {}
> > for r in A:
> >     t = tuple(r)
> >     d[t] = d.get(t, 0) + 1
> > # The dict d now has the counts of the unique rows of A.
> > # list(...) is needed on Python 3, where keys()/values() return views.
> > B = numpy.array(list(d.keys()))    # The unique rows of A
> > C = numpy.array(list(d.values()))  # The counts of the unique rows
> > ----------
> > For a large number of rows (e.g. 10000), this appears to be
> > significantly faster than the code that David Kaplan suggested in his
> > email earlier today.
> >
> > Regards,
> > Warren
> >
> > On Sun, Jul 27, 2008 at 12:17 PM, Lorenzo Isella
> > <lorenzo.isella@gmail.com> wrote:
> >> > Dear All,
> >> > Consider an Nx2 matrix of the kind:

> >> > A = 1  2
> >> >     3 13
> >> >     1  2
> >> >     6  8
> >> >     3 13
> >> >     2  9
> >> >     1  1


> >> > The first entry in each row is always smaller than or equal to the
> >> > second entry in the same row.
> >> > Now there are two things I would like to do with this A matrix:
> >> > (1) With something like n.unique1d (but I have not been very
> >> > successful yet), let each row of A appear only once (i.e. get rid of
> >> > the repetitions). Therefore one should obtain the matrix:
> >> > B = 1  2
> >> >     3 13
> >> >     6  8
> >> >     2  9
> >> >     1  1

> >> > (2) At the same time, efficiently count how many times each row of B
> >> > appeared in A. I would like to get a C vector counting them as:

> >> > C = 2
> >> >     2
> >> >     1
> >> >     1
> >> >     1


> >> > Any suggestions about an efficient way of achieving this?
> >> > Many thanks

> >> > Lorenzo

> _______________________________________________
> SciPy-user mailing list
> SciPy-user@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

