[SciPy-user] Ordering and Counting the Repetitions of the Rows of a Matrix
Emmanuelle Gouillart
emmanuelle.gouillart@normalesup....
Thu Jun 25 14:27:59 CDT 2009
Hi Lorenzo,
if you just sort each row of your array (A.sort(axis=1), which sorts along
the second axis, in place), then you're back to your former problem, right?
For your 4-column array you can sort only the two central columns.
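A minimal sketch of that idea, using the A and B matrices from Lorenzo's message quoted below. It assumes a recent NumPy in which np.unique accepts axis=0 and return_counts=True (neither appears in this thread), and the names A_canon and B_canon are only illustrative.

```python
import numpy as np

# Matrices A and B from Lorenzo's message.
A = np.array([[1, 2], [2, 3], [9, 9], [4, 4], [1, 2], [3, 2]])
B = np.array([[2, 1, 2, 4], [4, 2, 3, 9], [8, 9, 9, 7],
              [5, 4, 4, 1], [6, 1, 2, 2], [4, 3, 2, 9]])

# Case A: the order inside a row does not matter, so sort every row.
A_canon = np.sort(A, axis=1)

# Case B: only the two central columns may be swapped, so sort just those.
B_canon = B.copy()
B_canon[:, 1:3] = np.sort(B[:, 1:3], axis=1)

# Assumption: np.unique with axis=0 and return_counts=True (recent NumPy).
rows_A, counts_A = np.unique(A_canon, axis=0, return_counts=True)
rows_B, counts_B = np.unique(B_canon, axis=0, return_counts=True)
```

Note that the unique rows come back in lexicographic order, not in order of first appearance, and that a pair such as (3 2) is reported in its sorted form (2 3).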
BTW, I had a look at your previous question and here's a solution that
hadn't been proposed - if I read through the thread correctly.
>>> a = np.array([[1, 2], [2, 3], [1, 2], [3, 4], [2,3]])
>>> dt = a.dtype
>>> newdt = [('',dt)]*2
>>> b = a.view(newdt)
>>> b = b.ravel()
>>> c = np.unique1d(b)
>>> c
array([(1, 2), (2, 3), (3, 4)],
      dtype=[('f0', '<i4'), ('f1', '<i4')])
>>> c = c.view(dt)
>>> c
array([1, 2, 2, 3, 3, 4])
>>> c = np.c_[c[::2], c[1::2]]
>>> c
array([[1, 2],
       [2, 3],
       [3, 4]])
(not as short as Stefan's solution, though :D).
For the occurrence array, use the optional index arrays returned by
np.unique1d:
>>> c1, c2, c3 = np.unique1d(b, return_index=True, return_inverse=True)
>>> occurrence = np.histogram(c3, bins=np.arange(c1.shape[0] + 1))
>>> occurrence
(array([2, 2, 1]), array([0, 1, 2, 3]))
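As a possible alternative to the histogram, the inverse indices can be counted directly with np.bincount. The sketch below repeats the structured-view trick from the session above. It assumes a recent NumPy where np.unique takes the place of np.unique1d, and the names ncols, uniq, inverse, counts and rows are only illustrative.

```python
import numpy as np

a = np.array([[1, 2], [2, 3], [1, 2], [3, 4], [2, 3]])
dt = a.dtype
ncols = a.shape[1]

# View each row as one structured element, then flatten to 1-D.
b = a.view([('', dt)] * ncols).ravel()

# Assumption: np.unique replaces np.unique1d on recent NumPy.
uniq, inverse = np.unique(b, return_inverse=True)

# inverse[i] is the position in uniq of row i of a, so counting
# how often each position occurs gives the occurrences per unique row.
counts = np.bincount(inverse.ravel(), minlength=uniq.shape[0])

# Back to a plain 2-D integer array, one unique row per line.
rows = uniq.view(dt).reshape(-1, ncols)
```

Here reshape(-1, ncols) does the same job as the np.c_ step above and also works for more than two columns.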
Cheers,
Emmanuelle
On Thu, Jun 25, 2009 at 01:01:32PM +0200, Lorenzo Isella wrote:
> Dear All,
> I dug up an old post of mine to this list (the problem was mainly how to
> get rid of multiple rows in a matrix while counting the multiple
> occurrences of each row).
> Now, the problem is slightly more complex.
> The matrix is of the kind
> A= 1 2
>    2 3
>    9 9
>    4 4
>    1 2
>    3 2
> but this time, you consider the row with entries (2 3) equal to the one
> with entries (3 2), i.e. this time the ordering of elements in a row
> does not matter.
> How can I still calculate the repetitions of each row in the sense
> explained above and obtain the 'repetition-free' matrix?
> Furthermore, suppose that you have the matrix
> B= 2 1 2 4
>    4 2 3 9
>    8 9 9 7
>    5 4 4 1
>    6 1 2 2
>    4 3 2 9
> Now, you have extra elements with respect to matrix A, but you consider
> two rows equal if the first and fourth entries coincide and the
> second and third entry are the same numbers or are swapped (like in the
> case of matrix A). E.g. the second and last row of matrix B would be
> considered equal in this case. You still want the number of occurrences
> of each row (with the new concept of equal rows) and the repetition-free
> matrix.
> Any ideas about how this could be efficiently implemented?
> Many thanks
> Lorenzo
> > Date: Sun, 27 Jul 2008 15:46:29 -0400
> > From: "Warren Weckesser" <warren.weckesser@gmail.com>
> > Subject: Re: [SciPy-user] Ordering and Counting the Repetitions of
> > the Rows of a Matrix
> > To: "SciPy Users List" <scipy-user@scipy.org>
> > Message-ID:
> > <114880320807271246x1c922e7cg9539684fbad7bed9@mail.gmail.com>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Lorenzo,
> >
> > Given a matrix A like you showed, here is one way to find (and
> > count) the unique rows:
> >
> > ----------
> > d = {}
> > for r in A:
> >     t = tuple(r)
> >     d[t] = d.get(t, 0) + 1
> > # The dict d now has the counts of the unique rows of A.
> > # list() keeps the next two lines working on Python 3 as well.
> > B = numpy.array(list(d.keys()))    # The unique rows of A
> > C = numpy.array(list(d.values()))  # The counts of the unique rows
> > ----------
> >
> > For a large number of rows (e.g. 10000), this appears to be
> > significantly faster than the code that David Kaplan suggested in
> > his email earlier today.
> >
> > Regards,
> > Warren
> >
> > On Sun, Jul 27, 2008 at 12:17 PM, Lorenzo Isella
> > <lorenzo.isella@gmail.com> wrote:
> >> > Dear All,
> >> > Consider an Nx2 matrix of the kind:
> >> > A= 1 2
> >> >    3 13
> >> >    1 2
> >> >    6 8
> >> >    3 13
> >> >    2 9
> >> >    1 1
> >> > The first entry in each row is always smaller than or equal to the second
> >> > entry in the same row.
> >> > Now there are two things I would like to do with this A matrix:
> >> > (1) With a sort of n.unique1d (but have not been very successful yet),
> >> > let each row of A appear only once (i.e. get rid of the repetitions).
> >> > Therefore one should obtain the matrix:
> >> > B= 1 2
> >> >    3 13
> >> >    6 8
> >> >    2 9
> >> >    1 1
> >> > (2) At the same time, efficiently count how many times each row of B
> >> > appeared in A. I would like to get a C vector counting them as:
> >> > C= 2
> >> >    2
> >> >    1
> >> >    1
> >> >    1
> >> > Any suggestions about an efficient way of achieving this?
> >> > Many thanks
> >> > Lorenzo
> >> > ______________________
> _______________________________________________
> SciPy-user mailing list
> SciPy-user@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user