[Numpy-discussion] data type specification when using numpy.genfromtxt

Chao YUE chaoyuejoy@gmail....
Sun Jun 26 15:28:56 CDT 2011


Hi Derek,

Thanks very much for your quick reply. Here is a short summary of what I've
tried. The ['S10'] + [ float for n in range(48) ] dtype only works when I
explicitly specify the columns to be read with usecols, and genfromtxt does
not determine the types automatically if dtype is not specified.

I also have a problem with missing values, which I describe at the end of
this mail. Sorry for the very long example.

Thanks again,

In [164]: b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,usecols=tuple(range(49)),dtype=['S10'] + [ float for n in range(48)])

In [165]: b
Out[165]:
array([('01/01/2003', -999.0, -1.028, -999.0, -999.0, -999.0, -999.0, -999.0,
        -25.368400000000001, 0.75920799999999999, -25.425699999999999,
        0.77632199999999996, -25.220500000000001, 0.77561899999999995,
        0.20000000000000001, 280.089, 0.57458299999999995, 0.417018,
        -0.042441800000000002, 0.0428254, -0.18517600000000001,
        -0.056775800000000001, 93.721299999999999, -8.1318099999999998,
        -9.5244, -9.9323200000000007, -10.2728, -20.945499999999999,
        -8.4939999999999998, -9.5678199999999993, -9.9175500000000003,
        -9.7835400000000003, -10.4445, -999.0, -999.0, -999.0, -999.0, -999.0,
        -2.80863, -6.7711100000000002, -999.0, -999.0, -999.0, 0.109,
        0.075999999999999998, 0.10000000000000001, 0.074999999999999997, 0.0,
        -999.0),
       ('01/01/2003', -999.0, -0.40899999999999997, -999.0, -999.0, -999.0,
        -999.0, -999.0, -25.3233, 0.75929800000000003, -25.368600000000001,
        0.77451599999999998, -25.118400000000001, 0.77264200000000005,
        0.20499999999999999, 267.80599999999998, 0.59291700000000003,
        0.42051699999999997, -0.037141399999999998, 0.040433200000000002,
        -0.16375999999999999, -0.029456400000000001, 93.749099999999999,
        -8.1292799999999996, -9.5213800000000006, -9.9336199999999995,
        -10.274900000000001, -21.1402, -8.4918899999999997,
        -9.5663699999999992, -9.9207000000000001, -9.7896099999999997,
        -10.4514, -999.0, -999.0, -999.0, -999.0, -999.0, -2.8468,
        -6.7986899999999997, -999.0, -999.0, -999.0, 0.109,
        0.075999999999999998, 0.10000000000000001, 0.074999999999999997, 0.0,
        -999.0),
....

      dtype=[('TIMESTAMP', '|S10'), ('CO2_flux', '<f8'),
             ('Net_radiation', '<f8'), ('Sensible_heat_flux', '<f8'),
             ('Latent_heat_flux', '<f8'), ('u', '<f8'),
             ('Water_vapor_density_by_LiCor_7500', '<f8'),
             ('CO2_concentration', '<f8'), ('Air_temperature_High', '<f8'),
             ('HMP45C', '<f8'), ('Relative_humidity_High', '<f8'),
             ('HMP45C_1', '<f8'), ('Air_temperature_Middle', '<f8'),
             ('HMP45C_2', '<f8'), ('Relative_humidity_Middle', '<f8'),
             ('HMP45C_3', '<f8'), ('Air_temperature_Low', '<f8'),
             ('HMP45C_4', '<f8'), ('Relative_humidity_Low', '<f8'),
             ('HMP45C_5', '<f8'), ('Wind_speed_High', '<f8'),
             ('Wind_direction_High', '<f8'), ('Wind_speed_Low', '<f8'),
             ('PAR_High', '<f8'), ('PAR_Low', '<f8'),
             ('Incoming_shortwave_radiation_LI200X', '<f8'),
             ('Incoming_shortwave_radiation_Eppley', '<f8'),
             ('Outgoing_shortwave_radiation_Eppley', '<f8'),
             ('Pressure', '<f8'), ('Soil_temp_1_20_cm', '<f8'),
             ('Soil_temp_1_10_cm', '<f8'), ('Soil_temp_1_5_cm', '<f8'),
             ('Soil_temp_1_25_cm', '<f8'), ('Soil_temp_1_0_cm', '<f8'),
             ('Soil_temp_2_20_cm', '<f8'), ('Soil_temp_2_10_cm', '<f8'),
             ('Soil_temp_2_5_cm', '<f8'), ('Soil_temp_2_25_cm', '<f8'),
             ('Soil_temp_2_0_cm', '<f8'), ('Soil_temp_3_20cm', '<f8'),
             ('Soil_temp_3_10_cm', '<f8'), ('Soil_temp_3_5_cm', '<f8'),
             ('Soil_temp_3_25_cm', '<f8'), ('Soil_temp_3_0_cm', '<f8'),
             ('Soil_heat_flux_1', '<f8'), ('Soil_heat_flux_2', '<f8'),
             ('Soil_heat_flux_3', '<f8'), ('soil_water_T1', '<f8'),
             ('soil_water_T2', '<f8')])


But if I use the same call without usecols, it raises an error:

In [171]: b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,dtype=['S10'] + [ float for n in range(48)])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

D:\data\LaThuile_ancillary\Jim_Randerson_data\<ipython console> in <module>()

C:\Python26\lib\site-packages\numpy\lib\npyio.pyc in genfromtxt(fname, dtype,
comments, delimiter, skiprows, skip_header, skip_footer, converters, missing,
missing_values, filling_values, usecols, names, excludelist, deletechars,
replace_space, autostrip, case_sensitive, defaultfmt, unpack, usemask, loose,
invalid_raise)
   1449             # Raise an exception ?
   1450             if invalid_raise:
-> 1451                 raise ValueError(errmsg)
   1452             # Issue a warning ?
   1453             else:

ValueError

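The traceback ends in the invalid_raise branch of genfromtxt, which appears to be the check for rows whose number of columns differs from what was expected. One thing worth checking (an assumption, since the message text is not shown above) is whether every line of the file really splits into 49 fields. A minimal sketch in plain Python; the helper name field_counts and the sample lines are hypothetical and not taken from 99Burn2003all.csv:

```python
from collections import Counter


def field_counts(lines, delimiter=';'):
    """Count how many non-blank lines have each number of delimited fields."""
    return Counter(
        len(line.rstrip('\r\n').split(delimiter))
        for line in lines
        if line.strip()
    )


# Hypothetical sample: the last row carries a trailing delimiter,
# so it splits into one more field than the header row.
sample = [
    "TIMESTAMP;CO2_flux;Net_radiation\n",
    "01/01/2003 00:00;-999.0;-1.028\n",
    "01/01/2003 00:30;-999.0;-0.409;\n",
]
counts = field_counts(sample)
print(counts)
```

For a real file, an open file object can be passed in place of the list, for example field_counts(open('99Burn2003all.csv')). If more than one field count shows up, or the count is not 49, that would explain why the 49-entry dtype only works together with usecols.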

If I don't specify the dtype at all, the first column is not recognized as a
string; it is read as float and displays as nan:

In [172]: b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,usecols=(0,1,2))

In [173]: b
Out[173]:
array([(nan, -999.0, -1.028), (nan, -999.0, -0.40899999999999997),
       (nan, -999.0, 0.16700000000000001), ..., (nan, -999.0, -999.0),
       (nan, -999.0, -999.0), (nan, -999.0, -999.0)],
      dtype=[('TIMESTAMP', '<f8'), ('CO2_flux', '<f8'), ('Net_radiation', '<f8')])
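
The nan values follow from genfromtxt's default dtype of float: the timestamp text cannot be parsed as a number. A minimal sketch of the difference between the default and dtype=None, using a small in-memory sample (made up for illustration, not taken from 99Burn2003all.csv). It passes encoding='utf-8', which assumes a NumPy version that supports the encoding argument:

```python
import io

import numpy as np

# Hypothetical three-column sample in the same layout as the real file.
sample = (
    "TIMESTAMP;CO2_flux;Net_radiation\n"
    "01/01/2003 00:00;-999.0;-1.028\n"
    "01/01/2003 00:30;-999.0;-0.409\n"
)

# Default dtype (float): the text column cannot be converted and becomes nan.
as_float = np.genfromtxt(io.StringIO(sample), delimiter=';', names=True)

# dtype=None: the type of each column is inferred from its contents.
inferred = np.genfromtxt(io.StringIO(sample), delimiter=';', names=True,
                         dtype=None, encoding='utf-8')

print(as_float.dtype)
print(inferred.dtype)
```

With dtype=None the first field should come back as a string type and the other two as floats, so no explicit 49-entry dtype is needed as long as automatic detection picks the types you want.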


My final question: the '-999.0' entries in the data actually mark missing
values, but I cannot get them displayed as 'nan' by specifying missing_values.
Neither setting missing_values to -999.0 nor passing a dictionary works:

In [178]: b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,usecols=(0,1,2),dtype="|S18,float,float",missing_values=-999.0)

In [179]: b
Out[179]:
array([('01/01/2003 00:00', -999.0, -1.028),
       ('01/01/2003 00:30', -999.0, -0.40899999999999997),
       ('01/01/2003 01:00', -999.0, 0.16700000000000001), ...,
       ('31/12/2003 22:30', -999.0, -999.0),
       ('31/12/2003 23:00', -999.0, -999.0),
       ('31/12/2003 23:30', -999.0, -999.0)],
      dtype=[('TIMESTAMP', '|S18'), ('CO2_flux', '<f8'), ('Net_radiation', '<f8')])

In [180]: b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,usecols=(0,1,2),dtype="|S18,float,float",missing_values={1:'-999.0'})

In [181]:

In [182]: b
Out[182]:
array([('01/01/2003 00:00', -999.0, -1.028),
       ('01/01/2003 00:30', -999.0, -0.40899999999999997),
       ('01/01/2003 01:00', -999.0, 0.16700000000000001), ...,
       ('31/12/2003 22:30', -999.0, -999.0),
       ('31/12/2003 23:00', -999.0, -999.0),
       ('31/12/2003 23:30', -999.0, -999.0)],
      dtype=[('TIMESTAMP', '|S18'), ('CO2_flux', '<f8'), ('Net_radiation', '<f8')])


The value stored in the array really is -999.0:
In [183]: b['CO2_flux'][1]==-999.0
Out[183]: True


Even this small example doesn't work (suppose 2 is our missing value):
In [184]: data = "1, 2, 3\n4, 5, 6"

In [185]: np.genfromtxt(StringIO(data), delimiter=",",dtype="int,int,int",missing_values=2)
Out[185]:
array([(1, 2, 3), (4, 5, 6)],
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])

In [186]: np.genfromtxt(StringIO(data), delimiter=",",dtype="int,int,int",names="a,b,c",missing_values={'b':2},filling_values=nan)
Out[186]:
array([(1, 2, 3), (4, 5, 6)],
      dtype=[('a', '<i4'), ('b', '<i4'), ('c', '<i4')])
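
One workaround that sidesteps missing_values entirely is to load the sentinel as an ordinary number and replace it afterwards. This is a different technique from the missing_values/filling_values mechanism tried above, not a fix for it. A minimal sketch on a made-up in-memory sample (not the real file), again assuming a NumPy version that accepts the encoding argument:

```python
import io

import numpy as np

# Hypothetical sample in the same layout as the real file.
sample = (
    "TIMESTAMP;CO2_flux;Net_radiation\n"
    "01/01/2003 00:00;-999.0;-1.028\n"
    "01/01/2003 00:30;0.5;-999.0\n"
)

b = np.genfromtxt(io.StringIO(sample), delimiter=';', names=True,
                  dtype="U16,float,float", encoding='utf-8')

# Replace the -999.0 sentinel with nan in every float column.
# b[name] is a view into b, so the assignment modifies b in place.
for name in b.dtype.names[1:]:
    column = b[name]
    column[column == -999.0] = np.nan

print(b)
print(np.nanmean(b['CO2_flux']))
```

This only applies to float columns. Integer columns such as those in the small 1, 2, 3 example cannot hold nan, so they would need a masked array (for example via np.ma.masked_values) or a conversion to float first.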


Can you give me some suggestions? Thanks in advance.

Chao


2011/6/26 Derek Homeier <derek@astro.physik.uni-goettingen.de>

> On 26.06.2011, at 8:48PM, Chao YUE wrote:
>
> > I want to read a csv file with many (49) columns; the first column is a
> > string and the remaining ones can be float.
> > How can I avoid typing something like
> >
> > data=numpy.genfromtxt('data.csv',delimiter=';',names=True, dtype=(S10,
> > float, float, ......))
> >
> > Can I just specify that the type of the first column is string and the
> > remaining ones are float? How can I do that?
>
> Simply use 'dtype=None' to let genfromtxt automatically determine the type
> (it is perhaps a bit confusing that this is not the default - maybe it
> should be repeated in the docstring for clarity that the default for
> dtype is 'float'...).
> Also, a shorter way of typing the dtype above (e.g. in case some columns
> would be auto-detected as int) would be
> ['S10'] + [ float for n in range(48) ]
>
> HTH,
>                                                        Derek



-- 
***********************************************************************************
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 77 30; Fax:01.69.08.77.16
************************************************************************************