# [Numpy-discussion] String manipulation

Christopher Barker Chris.Barker@noaa....
Mon Jul 20 14:44:23 CDT 2009

```
Nils Wagner wrote:
> How can I split the second line in such a way that I get
>
> ['-1.000000E+00', '-1.000000E+00', '-1.000000E+00',
> '-1.000000E+00', '1.250000E+00', '1.250000E+00']
>
> instead of
>
> ['-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00',
> '1.250000E+00', '1.250000E+00']

It looks like you have fixed-length fields. The naive way to do this is
simple string slicing:

def line2array1(line, field_len=10):
    nums = []
    i = 0
    while i < len(line):
        nums.append(float(line[i:i+field_len]))
        i += field_len
    return np.array(nums)

Then I saw the nifty list comprehension posted by Alan(?), which led me
to the one (long) liner:

def line2array2(line, field_len=10):
    return np.array(list(map(float, [line[i*field_len:(i+1)*field_len]
                                     for i in range(len(line)//field_len)])))

But it seems I should be able to do this using numpy arrays manipulating
the data as characters. However, I had a little trouble getting a string
into a numpy array as characters. This didn't work:

In [55]: s
Out[55]: '-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00 1.250000E+00 1.250000E+00'

In [57]: np.array(s, 'S13')
Out[57]:
array('-1.000000E+00',
      dtype='|S13')

so I tried single characters:

In [56]: np.array(s, 'S1')
Out[56]:
array('-',
      dtype='|S1')

I still only got the first one.

Passing a tuple of characters got closer, but not quite:

In [61]: np.array(tuple(s), 'S13')
Out[61]:
array(['-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
       '-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
       '-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
       '-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
       ' ', '1', '.', '2', '5', '0', '0', '0', '0', 'E', '+', '0', '0',
       ' ', '1', '.', '2', '5', '0', '0', '0', '0', 'E', '+', '0', '0'],
      dtype='|S13')

So I ended up with this:
s_array = np.array(tuple(line), dtype='S1').view(dtype='S%i'%field_len)

which seems uglier than it should be, but did lead to this one-liner:

np.array(tuple(line), dtype='S1').view(dtype='S%i'%field_len).astype(float)

Is there a cleaner way to do this?

(test code attached)

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.py
Type: application/x-python
Size: 879 bytes
Desc: not available
Url : http://mail.scipy.org/pipermail/numpy-discussion/attachments/20090720/6cb67dbd/attachment.bin
```
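
One possible answer to the closing question, offered as a sketch rather than as anything proposed in the thread: `np.frombuffer` can read the encoded bytes of the line directly as fixed-width byte-string records, which avoids building a tuple of single characters first. The function name `fixed_width_to_floats`, the default width of 13 (the width of the fields in the sample line), and the choice to drop a trailing partial field are all assumptions made for illustration.

```python
import numpy as np


def fixed_width_to_floats(line, field_len=13):
    """Parse a string of fixed-width numeric fields into a float array.

    Hypothetical helper: the name and the default width are illustrative.
    """
    # frombuffer needs a byte count that is a multiple of the record size,
    # so any trailing partial field is dropped (an assumption of this sketch).
    usable = len(line) - len(line) % field_len
    data = line[:usable].encode("ascii")
    # View the raw bytes as records of field_len characters each.
    fields = np.frombuffer(data, dtype="S%d" % field_len)
    # Casting a byte-string array to float parses each field.
    return fields.astype(float)


line = ("-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00"
        " 1.250000E+00 1.250000E+00")
values = fixed_width_to_floats(line)
print(values)
```

The plain slicing loop from the message gives the same numbers; the difference is only that the splitting here happens in a single reinterpretation of the buffer instead of a Python-level loop.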
