[SciPy-dev] Common data sets for testing purposes
robert.kern at gmail.com
Mon Jul 10 16:53:17 CDT 2006
David Huard wrote:
> I think it would be useful if there were standard, common data sets
> included in the scipy distribution (as in matlab or R). They could be
> used to ease testing, the creation of demos or simply to give examples.
> Also, if the data sets are chosen wisely, they could serve to attract
> people from targeted discipline to scipy (IQ scores data won't attract
> the same crowd as neutrino counts or distributed sea surface temperatures).
Good idea. The first step would be collecting some datasets and writing one
scipy/matplotlib (dare I say Chaco?) example per dataset. As we write the
examples, the idioms we use to access the data should come to the surface, and
we can possibly settle on a common data format and some utilities in scipy to
make the demos accessible through a uniform interface (more or less; at the very
least the file structure should settle out quickly: a README, example01.py,
example02.py, plot01.png, data/*.dat, etc.).
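As a rough illustration of the kind of uniform access described above, here is a minimal sketch of a loader for one dataset directory. Everything in it is an assumption made for the example: the name load_dataset, the returned dictionary, and the guess that each data/*.dat file holds whitespace-separated numeric columns with '#' comments. Nothing here is an existing scipy utility, and the real idioms would only emerge from writing the examples.

```python
from pathlib import Path


def load_dataset(root):
    """Load one dataset directory laid out as README + data/*.dat.

    Hypothetical helper, not part of scipy. Assumes each .dat file holds
    whitespace-separated numeric columns and that '#' starts a comment.
    """
    root = Path(root)

    # The README is optional in this sketch; fall back to an empty string.
    readme_path = root / "README"
    readme = readme_path.read_text() if readme_path.exists() else ""

    # Parse every data/*.dat file into a list of rows of floats,
    # keyed by the file name without its extension.
    tables = {}
    for path in sorted((root / "data").glob("*.dat")):
        rows = []
        for line in path.read_text().splitlines():
            line = line.split("#", 1)[0].strip()  # drop comments and padding
            if line:
                rows.append([float(tok) for tok in line.split()])
        tables[path.stem] = rows

    return {"readme": readme, "tables": tables}
```

An example script such as example01.py could then call load_dataset on its own directory and hand the rows to whatever plotting code it uses. In practice numpy.loadtxt would likely replace the hand-rolled parsing; plain Python is used here only to keep the sketch dependency-free.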
I would prefer to keep the datasets out of the trunk and the distribution
tarballs, though. The current download burden is somewhat heavy as it is, and
some of the worthwhile datasets will probably be substantial in size. A few
might be absorbed into the scipy trunk for use in unit tests or the (very
lonely) tutorial. I suggest making a data/ directory in the repository, as a
sibling of branches/, tags/, and trunk/. I'll try to get around to it if no one
beats me to it.
If you would like to start a Wiki page on www.scipy.org to collect pointers to
useful datasets and example code, that would be great.
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco