[SciPy-user] Record array help

Bruce Southey bsouthey@gmail....
Mon May 19 10:20:36 CDT 2008


Stéfan van der Walt wrote:
> Hi Johann
>
> 2008/5/19 Johann Rohwer <jr@sun.ac.za>:
>   
>> Is there any extended documentation/tutorial on record arrays?
>>     
>
> There is an introduction here:
>
> http://www.scipy.org/RecordArrays
>
>   
>> 1. Is it possible to change the dtype of a field after the record array has
>> been created?
>>     
>
> It can be done, but often it is not very useful:
>
> In [3]: dt = np.dtype([('x',np.uint8),('y',np.uint8)])
>
> In [4]: np.array([(1,2),(3,4)],dtype=dt)
> Out[4]:
> array([(1, 2), (3, 4)],
>       dtype=[('x', '|u1'), ('y', '|u1')])
>
> In [5]: _.view(np.uint16)
> Out[5]: array([ 513, 1027], dtype=uint16)
>
> I suspect what you want to do is to change one 'column' from, say, int
> to float, and reinterpret the data.  For that, you'll need to make a
> copy.
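
A minimal sketch of such a copy, reusing the dtype from the example above. The target dtype `new_dt` and the names `a` and `b` are only illustrative; the idea is to allocate a new array with the desired field types and copy each field across, so the values are converted and not reinterpreted.

```python
import numpy as np

dt = np.dtype([('x', np.uint8), ('y', np.uint8)])
a = np.array([(1, 2), (3, 4)], dtype=dt)

# Illustrative target dtype: same field names, but 'y' is now float64.
new_dt = np.dtype([('x', np.uint8), ('y', np.float64)])

# Allocate a new array and copy field by field. The values are converted
# on assignment and the original array is left untouched.
b = np.empty(a.shape, dtype=new_dt)
for name in a.dtype.names:
    b[name] = a[name]
```
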
>
>   
>> 2. The CSV file has missing data points - how do I turn these into python
>> 'None' elements in the record array? (If I leave that element empty in the
>> CSV file, then csv2rec complains about not being able to handle the import;
>> if I put 'None' in the CSV file (without quotes), then the whole field
>> including the 'None' and all the other float data is converted into a string
>> dtype, rendering the numerical data useless).
>>     
>
> Maybe `numpy.loadtxt` could be of some use.
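
One way this might look, as a sketch only: it assumes the missing entries are empty fields and that NaN is an acceptable stand-in for None. The sample text, the column index and the helper `float_or_nan` are made up for illustration.

```python
import io
import numpy as np

# Made-up CSV text; the second data row has an empty value in column 1.
text = "t,a,b\n0.0,1.5,2.5\n1.0,,3.5\n2.0,4.5,5.5\n"

def float_or_nan(field):
    # An empty (or whitespace-only) field becomes NaN instead of raising.
    field = field.strip()
    return float(field) if field else np.nan

# skiprows=1 skips the header line; the converter is applied to column 1 only.
arr = np.loadtxt(io.StringIO(text), delimiter=',', skiprows=1,
                 converters={1: float_or_nan})
```

The result is an ordinary float array, so the column stays numeric and the gap can be located afterwards with `np.isnan`.
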
>
>   
>> 3. Is it possible to obtain a subset of the original data (corresponding to
>> two or more columns of the CSV file) as a conventional 2D numpy array, or
>> can I access the data only individually by column (i.e. field in the record
>> array)?
>>     
>
> I hope someone comes up with an elegant solution, otherwise you can make a copy:
>
> numpy.array([data['field1'], data['field2']]).T
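
A small sketch of that copy on made-up data, assuming both fields hold the same numeric type. The field name 'label' and the values are hypothetical; the point is that the 2D result is independent of the record array.

```python
import numpy as np

# Made-up record data with two float fields and one string field.
data = np.array([(1.0, 10.0, 'a'), (2.0, 20.0, 'b'), (3.0, 30.0, 'c')],
                dtype=[('field1', float), ('field2', float), ('label', 'U1')])

# Stack the two fields as rows, then transpose so each field is a column.
sub = np.array([data['field1'], data['field2']]).T

# Because sub is a copy, writing to it does not change the record array.
sub[0, 0] = 99.0
```
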
>
> Regards
> Stéfan
>   
Hi,
You might also want to check out Andrew Straw's DataFrame class:
 http://www.scipy.org/Cookbook/DataFrame

For missing values, however, you should probably look at masked arrays
(numpy.ma). You should be able to modify the DataFrame code to handle
them.
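
As a rough illustration of the masked-array idea (the values are made up, and marking the missing point as NaN first is just one possible convention):

```python
import numpy as np

# Made-up column with one missing point, marked as NaN.
values = np.array([1.5, np.nan, 4.5])

# Mask every NaN/inf entry; masked entries are skipped in reductions.
masked = np.ma.masked_invalid(values)

mean = masked.mean()         # average over the two valid entries
n_valid = masked.count()     # number of unmasked entries
filled = masked.filled(0.0)  # plain ndarray with gaps replaced by 0.0
```

The column keeps its float dtype throughout, which avoids the fallback to a string dtype described in the original question.
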


Regards
Bruce

