[Numpy-discussion] String manipulation

Nils Wagner nwagner@iam.uni-stuttgart...
Tue Jul 21 01:42:32 CDT 2009


On Mon, 20 Jul 2009 12:44:23 -0700
  Christopher Barker <Chris.Barker@noaa.gov> wrote:
> Nils Wagner wrote:
>> How can I split the second line in such a way that I get
>> 
>> ['-1.000000E+00', '-1.000000E+00', '-1.000000E+00', 
>> '-1.000000E+00', '1.250000E+00', '1.250000E+00']
>> 
>> instead of
>> 
>> ['-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00', 
>> '1.250000E+00', '1.250000E+00']
> 
> It looks like you have fixed-length fields. 

Yes. See
http://www.sdrl.uc.edu/universal-file-formats-for-modal-analysis-testing-1/file-format-storehouse/unv_0734.htm/

> The naive way to do this is simple string slicing:
> 
> def line2array1(line, field_len=10):
>     nums = []
>     i = 0
>     while i < len(line):
>         nums.append(float(line[i:i+field_len]))
>         i += field_len
>     return np.array(nums)
> 
> Then I saw the nifty list comprehension posted by Alan(?),
> which led me to the one (long) liner:
> 
> def line2array2(line, field_len=10):
>     return np.array(list(map(float,
>         [line[i*field_len:(i+1)*field_len]
>          for i in range(len(line) // field_len)])))
> 
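
For the record, here is a minimal sketch of the slicing approach applied to
the line from my question. The function name and the default width of 13 are
my assumptions: the fields in that line are 13 characters wide, so the
default of 10 quoted above would not split it correctly. It also assumes the
trailing newline has already been stripped.

```python
import numpy as np

def line2array(line, field_len=13):
    """Split a fixed-width record into floats (hypothetical helper)."""
    # Step through the line one field at a time. float() ignores the
    # blank padding in front of positive numbers.
    fields = [line[i:i + field_len] for i in range(0, len(line), field_len)]
    return np.array([float(f) for f in fields])

line = ('-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00'
        ' 1.250000E+00 1.250000E+00')
values = line2array(line)
print(values)
```
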
> But it seems I should be able to do this using numpy arrays,
> manipulating the data as characters. However, I had a little
> trouble getting a string into a numpy array as characters.
> This didn't work:
> 
> In [55]: s
> Out[55]: 
> '-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00 1.250000E+00 1.250000E+00'
> 
> In [57]: np.array(s, 'S13')
> Out[57]:
> array('-1.000000E+00',
>       dtype='|S13')
> 
> so I tried single characters:
> 
> In [56]: np.array(s, 'S1')
> Out[56]:
> array('-',
>       dtype='|S1')
> 
> I still only got the first one.
> 
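
As far as I can tell, both attempts return a single element because a bare
string is treated as one scalar: the result is a 0-d array holding the string
truncated to the width of the dtype. A small sketch to illustrate this (on
Python 3 the item comes back as bytes):

```python
import numpy as np

s = ('-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00'
     ' 1.250000E+00 1.250000E+00')

# A bare str is a single scalar element, so the array is 0-dimensional ...
a = np.array(s, 'S13')
# ... and its only item is the string cut off after 13 bytes.
print(a.shape, a.item())
```
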
> Closer, but not quite:
> 
> In [61]: np.array(tuple(s), 'S13')
> Out[61]:
> array(['-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
>        '-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
>        '-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
>        '-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
>        ' ', '1', '.', '2', '5', '0', '0', '0', '0', 'E', '+', '0', '0',
>        ' ', '1', '.', '2', '5', '0', '0', '0', '0', 'E', '+', '0', '0'],
>       dtype='|S13')
> 
> So I ended up with this:
> s_array = np.array(tuple(line), dtype='S1').view(dtype='S%i'%field_len)
> 
> which seems uglier than it should be, but did lead to this one-liner:
> 
> np.array(tuple(line),dtype='S1').view(dtype='S%i'%field_len).astype(float)
> 
> 
> Is there a cleaner way to do this?
> 
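
One possibly cleaner route, offered only as a sketch: np.frombuffer can
reinterpret the raw bytes of the line directly as fixed-width byte strings,
which avoids the tuple-of-characters detour. The function name and the width
of 13 are my assumptions, and the line is assumed to be plain ASCII.

```python
import numpy as np

def line2array_buffer(line, field_len=13):
    """Reinterpret the bytes of a fixed-width record as fields (hypothetical helper)."""
    raw = line.encode('ascii')
    # frombuffer needs a length that is a multiple of the item size,
    # so drop any trailing partial field (e.g. a newline).
    usable = len(raw) - len(raw) % field_len
    fields = np.frombuffer(raw[:usable], dtype='S%d' % field_len)
    # Parse each byte string as a float.
    return fields.astype(float)

line = ('-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00'
        ' 1.250000E+00 1.250000E+00')
values = line2array_buffer(line)
print(values)
```
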
> (test code attached)
> 
> -Chris
> 

Fixed-length fields are quite common, e.g. in finite element
pre- and postprocessing.
Therefore it would be nice to have a function like
line2array in numpy.
Comments?
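
For comparison, np.genfromtxt already accepts an integer delimiter, which it
interprets as a fixed field width. A minimal sketch, assuming the 13-character
fields of the example line (I have not tried it on a complete universal file):

```python
import io
import numpy as np

line = ('-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00'
        ' 1.250000E+00 1.250000E+00')

# An integer delimiter is taken as the width of each field in characters.
values = np.genfromtxt(io.StringIO(line), delimiter=13)
print(values)
```

This reads whole files, so it only helps when every record in the block being
read uses the same field width.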

Nils
  

