[Numpy-discussion] How to create a boolean sub-array from a larger string array?
Andriy Basilisk
basilisk96@gmail....
Sat Jun 23 00:08:58 CDT 2007
Hello all,
My challenge is this:
I'm working on an application that parses numerical data from a text
report using regular expressions, and then places the results in Numpy
matrices for processing. The data contains integers, floats, and
boolean values. The boolean values are represented in the text file
by either an empty string '', or by a star '*'. The regex parser
creates a sequence of nested lists that is readily converted to an MxN
string-type matrix. Then, the relevant rows of that matrix are
sliced to create the new sub-matrices.
Here is a simplified sample of my solution so far:
import numpy as _N
data = [['1', '5.30', '', '3.44', '*'],
        ['2', '-4.12', '*', '-1.24', ''],
        ['3', '0.45', '', '3.22', '*']]
mdat = _N.mat(data).T  # mdat.shape is now (5,3)
ids = mdat[0,].astype(_N.int)  # this works for str->int
noms = mdat[(1,3),].astype(_N.float64)  # same idea also works for str->float64
## The following technique would be nice, but
## it causes a ValueError: invalid literal for int() with base 10: ''
outs = mdat[(2,4),].astype(_N.bool)
## Instead, I have to convert the strings to '0' or '1'
## explicitly, then cast them to a bool matrix:
for i, b in enumerate(mdat[(2,4),].T):
    mdat[2, i] = 1 if mdat[2, i] else 0
    mdat[4, i] = 1 if mdat[4, i] else 0
outs = mdat[(2,4),].astype(_N.bool)
I was expecting the above to behave similarly to the Python bool()
function on strings:
>>> bool(''), bool('*')
(False, True)
but it doesn't work that way.
Can anyone enlighten me as to why slices of my string matrix cannot be
cast to boolean matrices? I'd rather not have to resort to the 'for'
loop if there is a smarter way to do this. If an intermediate
numpy.array is required instead of numpy.matrix as I have shown here,
it's acceptable. I am using the matrix class in this case because the
application thrives on it.
I'm using Python 2.5 and NumPy 1.0.1 on WinXP.
Any help and useful comments will be appreciated,
-Basilisk96
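One loop-free possibility, sketched here under assumptions rather than as a confirmed answer: compare the string rows against the empty string, which yields a boolean array elementwise and sidesteps the str->bool cast entirely. The sketch uses a plain numpy.array (which the post says is acceptable) and a recent NumPy; the names sdat and flags are illustrative, and the behavior has not been checked on NumPy 1.0.1.

```python
import numpy as np

data = [['1', '5.30', '', '3.44', '*'],
        ['2', '-4.12', '*', '-1.24', ''],
        ['3', '0.45', '', '3.22', '*']]

# Plain string ndarray, transposed so each field is a row: shape (5, 3)
sdat = np.array(data).T

# Numeric rows: cast the string slices directly
ids = sdat[0].astype(int)
noms = sdat[[1, 3]].astype(float)

# Boolean rows: non-empty string -> True, empty string -> False.
# The comparison is elementwise, so no explicit loop is needed.
flags = sdat[[2, 4]] != ''

print(flags)
```

If the rest of the application needs the matrix class, the result could presumably be wrapped afterwards with np.asmatrix(flags).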