[SciPy-user] scipy.io.read_array: NaN in data file
Wed Mar 11 08:13:26 CDT 2009
In this particular case we know the cause:
It is either:
a) Overlapping files have been appended, i.e. file1 contains data from Jan 1 to Feb 1 and file2 contains data from Jan 1 to March 1. The overlap region has identical data.
b) The data comes from sequential deployments and there is a small overlap at the beginning of the second file, i.e. file1 has data from Jan 1 to Feb 1 and file2 contains data from Feb 1 to March 1. A few data points may overlap. The overlapping points in file2 are junk because the equipment was set up in the lab and took measurements in the air until it was swapped with the installed instrument in the water.
In both of these cases it is appropriate to take the first value. In the second case we really should be stripping the bad data before appending, but this is a work in progress. Right now we are developing a semi-automated QA/QC procedure to clean up data before posting it on the web. We presently use a mix of awk and shell scripts, but I'm trying to convert everything to Python. The goals are to make it easier to use and more maintainable, to get nicer plots than gnuplot, and to develop a GUI application to help us do this.
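For what it's worth, the "take the first value" rule could be sketched in plain Python along these lines. This is only an illustration, not our actual procedure: the function name keep_first and the (timestamp, value) tuples are made up for the example, and it assumes the files are concatenated in deployment order so that the first occurrence of a timestamp is the one to keep.

```python
def keep_first(records):
    """Return records sorted by timestamp, keeping the first value seen
    for each timestamp and discarding later duplicates."""
    first_seen = {}
    for timestamp, value in records:
        # setdefault stores the value only if this timestamp is new,
        # so later duplicates never overwrite the first one.
        first_seen.setdefault(timestamp, value)
    return sorted(first_seen.items())


# Hypothetical data: file2 is appended to file1 and overlaps at t = 3.
file1 = [(1, 10.0), (2, 11.0), (3, 12.0)]
file2 = [(3, 99.0), (4, 13.0)]
merged = keep_first(file1 + file2)
```

Here the junk reading 99.0 at t = 3 from file2 is dropped and the file1 value is kept. For large arrays, numpy.unique with return_index=True returns the index of the first occurrence of each unique value and could be used to do the same thing without a Python loop.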
>>> Timmie <email@example.com> 3/11/2009 4:35 AM >>>
> Well, because there's no standard way to do that: when you have
> duplicated dates, should you take the first one? The last one? Take
> some kind of average of the values?
Sometimes, there are inherent faults in the data set. Therefore, an automatic
treatment may introduce further errors.
It's only possible when these errors occur somewhat systematically.