[Numpy-discussion] odd ascii format and genfromtxt
Warren Weckesser
warren.weckesser@enthought....
Fri Feb 26 02:29:26 CST 2010
Ralf Gommers wrote:
> Hi all,
>
> I'm trying to read in data from text files with genfromtxt, and have
> some trouble figuring out the right combination of keywords. The
> format is:
>
> ['0\t\t4.000000000000000e+007,0.000000000000000e+000\n',
> '\t9.860280631554179e-001,-1.902586503306264e-002\n',
> '\t9.860280631554179e-001,-1.902586503306264e-002']
>
> Note that there are two delimiters, tab and comma. Also, the first
> line has an extra integer plus tab (this is a repeating pattern). The
> files are large, there's a lot of them, and they're generated by a
> binary I can't modify.
>
> Here are some things I've tried:
>
> In [216]: np.genfromtxt('ascii2test.raw', invalid_raise=False)
> Out[216]: array([ 0., NaN])
>
> In [217]: np.genfromtxt('ascii2test.raw', invalid_raise=False,
> delimiter=['\t', ','])
> TypeError: cannot perform accumulate with flexible type
>
> In [228]: np.genfromtxt('ascii2test.raw', delimiter=['\t', ','],
> dtype=[('intvar', '<i8'), ('fltvar', '<f8'), ('fltvar2', '<f8')])
> TypeError: cannot perform accumulate with flexible type
>
>
> Any suggestions?
The 'delimiter' keyword does not accept a list of strings. It takes a
single string, or an integer or a list of integers giving fixed field
widths. In your case, that won't work.
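For reference, here is a minimal sketch of what a list of integers does
for 'delimiter'. The sample text and its column widths (4, 6 and 5
characters) are made up for illustration and are not taken from your
files:

```python
import io
import numpy as np

# Made-up fixed-width sample: the columns are 4, 6 and 5 characters wide.
text = "   1  2.50  3.0\n  12 10.25 -1.5\n"

# A list of integers is read as field widths, not as separator characters.
data = np.genfromtxt(io.StringIO(text), delimiter=[4, 6, 5])
```

Each line is cut at fixed character offsets, so this only helps when
every field occupies the same columns on every line.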
You could try fromregex:
-----
In [1]: import numpy as np
In [2]: cat sample.raw
0 4.000e+007,0.00000e+000
9.8602806e-001,-1.9025e-002
9.8602806e-001,-1.9025e-002
123 5.0e6,100.0
10.1,-2.0e-3
10.2,-2.1e-3
In [3]: a = np.fromregex('sample.raw', '(.*?)\t+(.*),(.*)',
np.dtype([('extra', 'S8'), ('x', float), ('y', float)]))
In [4]: a
Out[4]:
array([('0', 40000000.0, 0.0), ('', 0.98602805999999998, -0.019025),
('', 0.98602805999999998, -0.019025), ('123', 5000000.0, 100.0),
('', 10.1, -0.002), ('', 10.199999999999999,
-0.0020999999999999999)],
dtype=[('extra', '|S8'), ('x', '<f8'), ('y', '<f8')])
In [5]: a[0]
Out[5]: ('0', 40000000.0, 0.0)
In [6]: a[1]
Out[6]: ('', 0.98602805999999998, -0.019025)
In [7]: a['extra']
Out[7]:
array(['0', '', '', '123', '', ''],
dtype='|S8')
In [8]: a['y']
Out[8]:
array([ 0.00000000e+00, -1.90250000e-02, -1.90250000e-02,
1.00000000e+02, -2.00000000e-03, -2.10000000e-03])
-----
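For anyone who wants to run this without creating a file, the following
is a rough standalone sketch of the same idea. It assumes Python 3 and a
NumPy version whose fromregex accepts a file-like object. The sample
text is abbreviated from the format quoted above, and the pattern is a
slightly stricter variant, not the one used in the session:

```python
import io
import numpy as np

# Abbreviated sample in the quoted format: an optional integer, one or
# more tabs, then two comma-separated floats.
sample = ("0\t\t4.000e+007,0.00000e+000\n"
          "\t9.8602806e-001,-1.9025e-002\n"
          "123\t\t5.0e6,100.0\n"
          "\t10.1,-2.0e-3\n")

# Excluding tab, comma and newline from the groups keeps each match
# confined to a single line.
pattern = r"([^\t\n]*)\t+([^,\n]*),([^\n]*)"
dt = np.dtype([('extra', 'S8'), ('x', float), ('y', float)])

a = np.fromregex(io.StringIO(sample), pattern, dt)
```

Under Python 3 the 'S8' field holds byte strings, so the empty marker is
b'' rather than ''.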
Note that the first field of the array is a string, not an integer. The
string will be empty in rows that did not have the initial integer. I
don't know if that will work for you.
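If integers are needed, one possible follow-up is to carry the last seen
integer forward onto the rows that lack one. This is only a sketch: the
array below is a hand-built stand-in for the fromregex result (byte
strings, as Python 3 would give), the name 'block' is hypothetical, and
it assumes the first row always carries an integer:

```python
import numpy as np

# Hand-built stand-in for the fromregex result; 'extra' is empty on rows
# that had no leading integer.
a = np.array(
    [(b'0', 4.0e7, 0.0), (b'', 0.986, -0.019), (b'', 0.986, -0.019),
     (b'123', 5.0e6, 100.0), (b'', 10.1, -2.0e-3), (b'', 10.2, -2.1e-3)],
    dtype=[('extra', 'S8'), ('x', float), ('y', float)])

has_int = a['extra'] != b''

# For every row, the index of the most recent row that carried an integer.
# Assumes row 0 has one; otherwise the conversion below raises ValueError.
last = np.maximum.accumulate(np.where(has_int, np.arange(len(a)), 0))

# Integer label for each row, forward-filled from the row that started it.
block = a['extra'][last].astype(int)
```

The x and y columns can then be grouped by 'block' if the integer marks
the start of a repeating section.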
Warren
>
> Thanks,
> Ralf