[SciPy-User] problems with masked arrays

Pierre GM pgmdevlist@gmail....
Tue Oct 6 19:26:46 CDT 2009


[Rob, it's customary to give more info than "it doesn't work": please  
post an error message w/ the version of numpy you're running]


On Oct 6, 2009, at 7:17 PM, Rob Felty wrote:

> I am trying to create an array which contains a mixture of strings,  
> floats, and ints.

Do you create it by hand, or do you read the data from a file-like
object?
If the latter, could you try genfromtxt? This function should be able
to take care of potential missing values for you.
If the former, yes, you're going to run into problems, and numpy.ma
won't be able to help you. See, your missing entries are '', which is
interpreted as a string when you'd want some other type (e.g., int for
your 'id' field), and ndarray chokes on that. As numpy.ma.array calls
ndarray under the hood, there's nothing it can do.
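
To make the failure concrete, here is a minimal sketch of the by-hand case. The two-field layout and the sample rows are made up for illustration; only the 'id' field name comes from your message. It assumes a reasonably recent NumPy on Python 3.

```python
import numpy as np

# Hypothetical two-field layout; only the 'id' field name comes from the thread.
dtype = [("id", int), ("word", "U8")]
rows = [("", "3-D"), (4, "a")]  # '' stands in for a missing id

try:
    np.array(rows, dtype=dtype)
    error = None
except (ValueError, TypeError) as exc:
    # '' cannot be converted to an integer, so building the array fails
    error = exc

print("construction failed:", error is not None)
```

Since numpy.ma.array has to build the underlying ndarray first, it runs into the same conversion problem.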

Now, you should still be able to use genfromtxt. Using your test:
 >>> import StringIO
 >>> import numpy as np
 >>> # transform the initial list of tuples into a list of strings
 >>> data = [";".join(str(_) for _ in t) for t in test]
 >>> # call np.mafromtxt on a file-like object built from those strings
 >>> np.mafromtxt(StringIO.StringIO("\n".join(data)), delimiter=";",
 ...              dtype=None)
masked_array(data = [(--, '3-D', 7333, --, --, --, --, --, 'Tridi', --, --, 'GOOGLE', --)
 (4, 'a', 1267005, 3, 1, "'1", '[VV]', '[eI]', '@', 7.0, 7.0, 'HML', '@')],
             mask = [(True, False, False, True, True, True, True, True, False, True, True, False, True)
 (False, False, False, False, False, False, False, False, False, False, False, False, False)],
       fill_value = (999999, 'N/A', 999999, 999999, 999999, 'N/', 'N/A', 'N/A', 'N/A', 1e+20, 1e+20, 'N/A', 'N'),
            dtype = [('f0', '<i8'), ('f1', '|S3'), ('f2', '<i8'), ('f3', '<i8'), ('f4', '<i8'), ('f5', '|S2'), ('f6', '|S4'), ('f7', '|S4'), ('f8', '|S5'), ('f9', '<f8'), ('f10', '<f8'), ('f11', '|S6'), ('f12', '|S1')])

mafromtxt is a shortcut for genfromtxt that forces the output to be a
masked array (it is the same as calling genfromtxt with usemask=True).
It's a bit slower, but it's useful. Don't forget to pass dtype=None to
force genfromtxt to guess the type of each column from your input.
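
For anyone reading this on Python 3, here is a sketch of the same approach that calls genfromtxt directly with usemask=True. As far as I know, newer NumPy releases no longer ship mafromtxt, so this is the safer spelling. The rows below are a made-up stand-in for your `test` list, and the `encoding` argument assumes a NumPy version that supports it.

```python
import io
import numpy as np

# Hypothetical rows standing in for the poster's `test` list;
# '' marks a missing entry.
test = [("", "3-D", 7333, ""), (4, "a", 1267005, 7.0)]

# Turn each tuple into one ';'-separated line of text.
lines = [";".join(str(value) for value in row) for row in test]

# dtype=None lets genfromtxt guess a type per column;
# usemask=True returns a masked array with the empty entries masked.
arr = np.genfromtxt(
    io.StringIO("\n".join(lines)),
    delimiter=";",
    dtype=None,
    usemask=True,
    encoding="utf-8",
)

print(arr.dtype.names)
print(arr.mask["f0"])
```

The fields get default names (f0, f1, ...) because no names were supplied; passing `names=` to genfromtxt would let you choose your own.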

Another possibility is to write a little function that would preprocess
your array with missing values, making sure that the types are
properly converted.
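
One way such a preprocessing function might look is sketched below. The name `to_masked`, the field names other than 'id', and the sample rows are all hypothetical, and the function treats every '' as missing, which may not suit every string column.

```python
import numpy as np

def to_masked(rows, dtype, missing=""):
    """Swap missing entries for a typed placeholder and mask them."""
    dtype = np.dtype(dtype)
    # A zero-filled record provides one placeholder of the right type per field.
    placeholders = np.zeros((), dtype=dtype).item()
    data, mask = [], []
    for row in rows:
        # True wherever the entry equals the missing marker
        flags = tuple(value == missing for value in row)
        data.append(tuple(p if m else v
                          for v, p, m in zip(row, placeholders, flags)))
        mask.append(flags)
    return np.ma.array(np.array(data, dtype=dtype), mask=mask)

# Hypothetical sample: only the 'id' field name comes from the thread.
rows = [("", "3-D", 7333), (4, "a", 1267005)]
arr = to_masked(rows, [("id", int), ("word", "U8"), ("count", int)])
print(arr["id"])
```

The placeholder values never show up in computations on the masked array, since the corresponding entries are masked.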

Let me know if it helps


