[Numpy-discussion] Tabular data package

Dan Yamins dyamins@gmail....
Tue Oct 6 13:09:30 CDT 2009


>
> I didn't see any explicit nan handling. Are missing values allowed
> e.g. in the constructor?
>

No, and that's a valid point: we don't handle missing values as explicitly as
we should. Are you mostly talking about nan handling when loading from
delimited text files? (Or about something more general, like integration
with masked arrays?) When loading from delimited text files, you can use the
"linefixer" and "valuefixer" arguments. They are meant for more general
purposes and will get the job done, but slowly. We should add something more
specialized, and faster, for missing values.
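A minimal sketch of the per-value approach, assuming a "value fixer" is simply a function applied to each raw string field before type conversion. `nan_valuefixer`, `parse_float_rows` and the set of missing-value tokens are hypothetical and illustrative only, not Tabular's actual argument signature. The one Python-level call per field is why this route is slow.

```python
import numpy as np

# Tokens treated as "missing"; this set is an assumption, adjust to your data.
MISSING_TOKENS = {"", "NA", "N/A", "nan", "NaN", "-"}

def nan_valuefixer(value):
    """Hypothetical per-field fixer: map missing-value tokens to 'nan'.

    NOT Tabular's real valuefixer signature; it only illustrates rewriting
    each raw string field before it is converted to a number.
    """
    return "nan" if value.strip() in MISSING_TOKENS else value

def parse_float_rows(lines, delimiter=","):
    """Split delimited lines, fix every field, then convert to floats.

    One Python-level function call per field makes this approach slow.
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        rows.append([float(nan_valuefixer(v)) for v in fields])
    return np.array(rows, dtype=float)

data = parse_float_rows(["1,2,NA\n", "4,,6\n"])
print(data)
```

A specialized loader could instead recognize the missing tokens inside its own parsing loop, avoiding the extra call per field.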



> Are these functions supposed to work with arbitrary structured arrays?
>

Well, they're only really tested with strings, floats, and ints (though only
the int tests are included in the test module; we should expand that). It's
possible they work with more sophisticated dtypes, but I'm not sure.
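One way to tell whether a given structured array stays within the tested territory (flat string, float and int fields) is to inspect its dtype. The sketch below uses only standard NumPy dtype attributes; `untested_fields` is a hypothetical helper, and treating bools, complex numbers, nested records and subarray fields as "untested" simply follows the statement above.

```python
import numpy as np

# Kinds covered by the tests mentioned above: byte/unicode strings,
# floats, and signed or unsigned ints.
TESTED_KINDS = {"S", "U", "f", "i", "u"}

def untested_fields(dtype):
    """Return the names of fields whose type falls outside the tested kinds.

    Nested records and subarray fields are reported too, since they are
    not flat string/float/int columns.
    """
    dtype = np.dtype(dtype)
    if dtype.names is None:
        raise ValueError("expected a structured dtype")
    flagged = []
    for name in dtype.names:
        field_dtype = dtype.fields[name][0]
        if (field_dtype.names is not None            # nested record
                or field_dtype.subdtype is not None  # subarray field
                or field_dtype.kind not in TESTED_KINDS):
            flagged.append(name)
    return flagged

flagged = untested_fields(
    [("name", "U10"), ("x", "f8"), ("n", "i4"), ("ok", "?"), ("v", "f8", (3,))]
)
print(flagged)  # ['ok', 'v']
```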


>
> >>> arr = np.array([6,1,2,1e-13,0.5*1e-14,1,2e25,3,0,7]).view([('',float)]*2)
> >>> arr
> array([(6.0, 1.0), (2.0, 1e-013), (5e-015, 1.0),
>       (2.0000000000000002e+025, 3.0), (0.0, 7.0)],
>      dtype=[('f0', '<f8'), ('f1', '<f8')])
> >>> np.sort([str(l) for l in arr])
> array(['(0.0, 7.0)', '(2.0, 1e-013)', '(2.0000000000000002e+025, 3.0)',
>       '(5e-015, 1.0)', '(6.0, 1.0)'],
>      dtype='|S30')
>

Well, on this example (as in the tests that we did), fast.recarrayisin performed
as spec'd.   ...  But definitely write back if you think it's failing
somewhere.
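For comparison, here is a minimal sketch of a value-based row-membership test for structured arrays, assuming both arrays share the same field names. `rows_isin` is a hypothetical name and this is not how fast.recarrayisin is implemented; it compares whole records as Python tuples via a set, so float fields are matched by value rather than by their printed form, at the cost of speed.

```python
import numpy as np

def rows_isin(a, b):
    """Return a boolean mask: True where a record of `a` also occurs in `b`.

    Hypothetical helper, not Tabular's fast.recarrayisin. Records are
    compared as tuples of Python values, so floats are matched by value
    rather than by their string representation.
    """
    if a.dtype.names != b.dtype.names:
        raise ValueError("structured arrays must have the same field names")
    wanted = set(b.tolist())
    return np.array([row in wanted for row in a.tolist()], dtype=bool)

# Same array as in the quoted example above.
arr = np.array([6, 1, 2, 1e-13, 0.5 * 1e-14, 1, 2e25, 3, 0, 7]).view([("", float)] * 2)
mask = rows_isin(arr, arr[[0, 3]])
print(mask.tolist())  # [True, False, False, True, False]
```

Note that nan never compares equal to itself, so rows containing nan will generally not be reported as members by this sketch.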

In general, extending a number of the things in Tabular (e.g. loadSV and
saveSV) to arbitrary structured dtypes, rather than just the basic types,
would be great.

Dan