[Numpy-discussion] Determine slices in a sorted array

Laszlo Nagy gandalf@shopzeus....
Thu Jul 1 15:13:50 CDT 2010


Given an array with two axes, sorted by a column 'SLICE_BY', how can I 
extract slice indexes for rows with the same 'SLICE_BY' value?

Here is an example program, demonstrating the problem:

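import numpy as np

a = np.random.randint(0, 100, (20, 4))
SLICE_BY = 0  # Make slices of array 'a' by column SLICE_BY

# Note: a.sort(SLICE_BY) treats SLICE_BY as an axis number. With
# SLICE_BY = 0 every column is sorted independently, which leaves
# column SLICE_BY sorted; that is all this demonstration needs.
a.sort(SLICE_BY)
slices = []  # collected (value, start, stop) tuples
prev_val = None
sidx = -1
for rowidx, row in enumerate(a):
    val = row[SLICE_BY]
    if val != prev_val:
        if prev_val is not None:
            # Close the run that just ended.
            slices.append((prev_val, sidx, rowidx))
        sidx = rowidx
        prev_val = val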

# Close the final run; it is emitted even when it is a single row.
if a.shape[0] > 0:
    slices.append((val, sidx, a.shape[0]))

print(a)
print(slices)


This program would print, for example:

[[ 1  0  8  1]
 [ 4  5 17  9]
 [ 4 11 19 23]
 [11 12 24 23]
 [13 16 28 23]
 [14 26 29 36]
 [15 33 32 37]
 [20 38 38 40]
 [28 47 47 45]
 [33 50 50 57]
 [45 55 52 65]
 [47 67 60 65]
 [56 76 71 68]
 [61 76 71 78]
 [70 83 82 83]
 [89 83 84 85]
 [91 84 85 87]
 [95 96 86 88]
 [98 96 89 88]
 [99 98 92 88]]
[(1, 0, 1), (4, 1, 3), (11, 3, 4), (13, 4, 5), (14, 5, 6), (15, 6, 7),
 (20, 7, 8), (28, 8, 9), (33, 9, 10), (45, 10, 11), (47, 11, 12),
 (56, 12, 13), (61, 13, 14), (70, 14, 15), (89, 15, 16), (91, 16, 17),
 (95, 17, 18), (98, 18, 19), (99, 19, 20)]


Although my demonstration program is functionally correct, it is not 
efficient. I need to do this with 10 million rows. The number of slices 
is relatively small (10 to 10000).

Is it possible to construct my "slices" with pure numpy functions? That 
is, anything that does not involve a large number of Python bytecode 
instructions, constructing Python objects, referencing/dereferencing 10 
million times, etc.
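
To make the question concrete, here is a minimal sketch of the kind of 
vectorized formulation I have in mind. It assumes the SLICE_BY column is 
already sorted (or at least that equal values are contiguous) and 
compares each entry with its predecessor to find where a new run starts. 
The helper name find_slices is hypothetical, and I have not measured it 
on 10 million rows.

```python
import numpy as np

def find_slices(col):
    """Return (values, starts, stops) for runs of equal values.

    `col` is assumed to be a 1-D array in which equal values are
    contiguous, e.g. a[:, SLICE_BY] after sorting.
    """
    col = np.asarray(col)
    if col.size == 0:
        empty = np.empty(0, dtype=np.intp)
        return col[:0], empty, empty
    # Positions where the value differs from the previous row.
    change = np.flatnonzero(col[1:] != col[:-1]) + 1
    starts = np.concatenate(([0], change))
    stops = np.concatenate((change, [col.size]))
    return col[starts], starts, stops

col = np.array([1, 4, 4, 11, 13])
values, starts, stops = find_slices(col)
# Only one Python tuple per run is built, not one per row.
slices = list(zip(values.tolist(), starts.tolist(), stops.tolist()))
print(slices)
```

Every run is reported as (value, start, stop) with an exclusive stop, 
including a final run that is only one row long. The Python-level work is 
proportional to the number of slices rather than the number of rows.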

Thanks,

   Laszlo


