# [Numpy-discussion] String manipulation

Nils Wagner nwagner@iam.uni-stuttgart...
Tue Jul 21 01:42:32 CDT 2009

On Mon, 20 Jul 2009 12:44:23 -0700
Christopher Barker <Chris.Barker@noaa.gov> wrote:
> Nils Wagner wrote:
>> How can I split the second line in such a way that I get
>>
>> ['-1.000000E+00', '-1.000000E+00', '-1.000000E+00', '-1.000000E+00', '1.250000E+00', '1.250000E+00']
>>
>> instead of
>>
>> ['-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00', '1.250000E+00', '1.250000E+00']
>
> It looks like you have fixed-length fields.

Yes. See
http://www.sdrl.uc.edu/universal-file-formats-for-modal-analysis-testing-1/file-format-storehouse/unv_0734.htm/

> The naive way to do this is simple string slicing:
>
> ```python
> import numpy as np
>
> def line2array1(line, field_len=10):
>     nums = []
>     i = 0
>     while i < len(line):
>         nums.append(float(line[i:i+field_len]))
>         i += field_len
>     return np.array(nums)
> ```
>
> Then I saw the nifty list comprehension posted by Alan(?), which led me
> to the one (long) liner:
>
> ```python
> def line2array2(line, field_len=10):
>     return np.array(list(map(float,
>                              [line[i*field_len:(i+1)*field_len]
>                               for i in range(len(line) // field_len)])))
> ```
>
> But it seems I should be able to do this using numpy arrays, manipulating
> the data as characters. However, I had a little trouble getting a string
> into a numpy array as characters. This didn't work:
>
> ```
> In [55]: s
> Out[55]:
> '-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00 1.250000E+00 1.250000E+00'
>
> In [57]: np.array(s, 'S13')
> Out[57]:
> array('-1.000000E+00',
>       dtype='|S13')
> ```
>
> so I tried single characters:
>
> ```
> In [56]: np.array(s, 'S1')
> Out[56]:
> array('-',
>       dtype='|S1')
> ```
>
> I still only got the first one.
>
> closer, but not quite:
>
> ```
> In [61]: np.array(tuple(s), 'S13')
> Out[61]:
> array(['-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
>        '-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
>        '-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
>        '-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
>        ' ', '1', '.', '2', '5', '0', '0', '0', '0', 'E', '+', '0', '0',
>        ' ', '1', '.', '2', '5', '0', '0', '0', '0', 'E', '+', '0', '0'],
>       dtype='|S13')
> ```
>
> So I ended up with this:
>
> ```python
> s_array = np.array(tuple(line), dtype='S1').view(dtype='S%i' % field_len)
> ```
>
> which seems uglier than it should be, but did lead to this one-liner:
>
> ```python
> np.array(tuple(line), dtype='S1').view(dtype='S%i' % field_len).astype(float)
> ```
>
>
> Is there a cleaner way to do this?
>
> (test code attached)
>
> -Chris
>
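One possibly cleaner route, not suggested in the thread: encode the line to bytes and let `np.frombuffer` reinterpret that buffer directly as fixed-width byte strings, which skips the round trip through `tuple`, `'S1'` and `view`. This is only a sketch under stated assumptions: the line is pure ASCII, its length after dropping the line ending is an exact multiple of the field width (13 characters in the sample line), and the helper names `split_fields` and `line2array3` are hypothetical, not NumPy functions.

```python
import numpy as np


def split_fields(line, field_len=13):
    """Return the fixed-width fields of `line` as an 'S<field_len>' array.

    Hypothetical helper: assumes an ASCII line whose length (without the
    line ending) is an exact multiple of field_len.
    """
    raw = line.rstrip("\r\n").encode("ascii")
    if len(raw) % field_len != 0:
        raise ValueError("line length is not a multiple of field_len")
    # Reinterpret the raw bytes as consecutive field_len-byte strings.
    return np.frombuffer(raw, dtype="S%d" % field_len)


def line2array3(line, field_len=13):
    """Parse every fixed-width field of `line` as a float (hypothetical)."""
    # Strip the padding blanks first, then let NumPy parse the byte strings.
    return np.char.strip(split_fields(line, field_len)).astype(float)


line = ("-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00"
        " 1.250000E+00 1.250000E+00")

# The list of strings asked for at the top of the thread:
print([f.decode().strip() for f in split_fields(line)])
# The same fields as floats:
print(line2array3(line))
```

`np.frombuffer` reinterprets the existing bytes without copying them, so the split itself needs no Python-level loop; the list comprehension in the usage lines only converts the fields to plain `str` for display.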

Fixed-length fields are quite common, e.g. in finite element
pre- and postprocessing. Therefore it would be nice to have
a function like line2array in numpy.
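As for a ready-made function: `np.genfromtxt` already accepts fixed field widths, since an integer (or a sequence of integers) passed as `delimiter` is taken as the width of each field rather than as a separator. A minimal sketch, assuming a block in which every line consists solely of 13-character numeric fields; the second data line is made up for illustration, and files mixing other record types would need extra handling that is not shown here.

```python
import io

import numpy as np

# Two data lines made of 13-character fields: the first is the sample
# line from the thread, the second is hypothetical.
text = (
    "-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00"
    " 1.250000E+00 1.250000E+00\n"
    " 2.000000E+00 3.000000E+00-4.500000E+00 0.000000E+00"
    " 1.000000E+01-1.000000E-01\n"
)

# An integer delimiter is interpreted as a fixed field width,
# so each line is cut into consecutive 13-character slices.
data = np.genfromtxt(io.StringIO(text), delimiter=13)
print(data.shape)
print(data)
```

For records whose fields differ in width, `delimiter` can instead be a sequence with one width per field.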