[Numpy-discussion] Selecting "column" from array of tuples

Ian Stokes-Rees ijstokes@crystal.harvard....
Thu Jul 9 12:18:44 CDT 2009

[take 6 on sending this -- I'm subscribed to numpy-discuss, but this 
post refuses to show up]

I have a large array consisting of 5-tuples. I'd like to select the 
first and second columns in order to produce a scatter plot.  Each tuple 
consists of mixed types (floats and strings).  The Matlab equivalent 
would be:


1. I cannot figure out how to specify "column" selection from an array 
of tuples; or

2. I cannot figure out how to use an array of lists instead of an array 
of tuples.

Some code should illustrate this.  What works:

  from numpy import zeros

  # one record per input line: two floats followed by three fixed-width strings
  dtype   = [("score", "f4"), ("rfac", "f4"), ("codefull", "a10"),
             ("code2", "a2"), ("subset", "a4")]
  results = zeros((len(lines),), dtype=dtype)
  idx = 0
  for line in lines:
      parts           = line.split()
      codefull        = parts[0]
      code2           = codefull[1:3]
      rfac            = float(parts[12])
      score           = float(parts[13])
      subset          = parts[14]
      # each record is assigned as a tuple
      results[idx]    = (score, rfac, codefull, code2, subset)
      idx += 1

What does not work:

  results = zeros((len(lines),len(dtype)), dtype=dtype)
      results[idx]    = [score, rfac, codefull, code2, subset]

or indexing into the array:

  results[:][0] # this works, but doesn't return the desired first column
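
For reference, a minimal sketch of what I am trying to achieve, assuming the fields of a structured array can be selected by name and that a list row can be converted with tuple() before assignment. The sample rows are made up for illustration, and "S10" is used as the byte-string type code in place of the "a10" spelling above:

```python
import numpy as np

# Same field layout as above; "S" is the byte-string type code.
dtype = [("score", "f4"), ("rfac", "f4"), ("codefull", "S10"),
         ("code2", "S2"), ("subset", "S4")]

# Made-up sample rows, held as lists to mirror the failing assignment.
rows = [
    [1.5, 0.25, b"1abc_A", b"ab", b"set1"],
    [2.5, 0.50, b"2xyz_B", b"xy", b"set2"],
    [3.5, 0.75, b"3def_C", b"de", b"set1"],
]

# 1-D array with one record per row; the five "columns" live in the dtype,
# so no second dimension is needed.
results = np.zeros((len(rows),), dtype=dtype)
for idx, row in enumerate(rows):
    # A record is assigned from a tuple, so convert the list first.
    results[idx] = tuple(row)

# Indexing by field name gives that field for every record as a 1-D array.
score = results["score"]
rfac = results["rfac"]

# results[:][0] is just the first record, the same as results[0].
first_record = results[:][0]

# score and rfac could then be handed to a plotting routine as x and y.
```

With this layout, results["score"] and results["rfac"] play the role of the first and second columns, while an integer index such as results[0] selects a whole record.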

Any suggestions greatly appreciated.


Ian Stokes-Rees                            W: http://sbgrid.org
ijstokes@crystal.harvard.edu               T: +1 617 432-5608 x75
SBGrid, Harvard Medical School             F: +1 617 432-5600
