[SciPy-dev] Dataset for examples and license

David Cournapeau david@ar.media.kyoto-u.ac...
Wed Apr 25 06:08:32 CDT 2007

Robert Kern wrote:
> David Cournapeau wrote:
>> I don't see any problem with that approach, and I am sure you know much 
>> better than me how to organize things for easy distribution. I think 
>> everybody agreeing on one file format is important (I have a preference 
>> for hdf5, since it is well supported under python through pytables, and 
>> has a full C api).
> I don't agree. My design goal was to be able to expose a single interface
> (load()) in front of any file format or data source. I imagined that many of the
> data sources would be from other packages that are out of our direct control and
> which we did not want to copy-and-paste into our own repository.
>> For really small datasets, CSV could be OK.
>> Would scipydata be in scipy? (I am asking again for license reasons :) ).
> No, it would be a separate namespace package. Each scipydata subpackage could
> specify its own license.
Ok, I set up something really trivial, so that we can start discussing 
details before releasing anything. This is available here (bzr 


This basically defines two subpackages of scipydata, iris and 
oldfaithful. Since each dataset is small, they are defined in Python files. 
In ipython, you can do:

 >> from scipydata import iris
 >> ?iris
Type:           module
Base Class:     <type 'module'>
String Form:    <module 'scipydata.iris' from 'scipydata/iris/__init__.pyc'>
Namespace:      Interactive
    This famous (Fisher's or Anderson's) iris data set gives the
    measurements in centimeters of the variables sepal length and width
    and petal length and width, respectively, for 50 flowers from each
    of 3 species of iris.
    The species are Iris setosa, versicolor, and virginica.
 >> data = iris.load()

Something which would be nice is that when you type iris, you get the 
docstring, e.g. through something like a __repr__ method, but for a module. 
I don't know if this is possible (it may have undesirable effects, too). 
Also, I don't know if it is worthwhile to have something like a DataInfo 
class which holds all the metadata, so that you can get everything at 
once (a la help(faithful) in R, for people who know R).
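
To make the two ideas a bit more concrete, here is a minimal sketch written for current Python 3 (an assumption; it is not what the branch contains). `DataInfo` and its fields, and the `DocModule` class, are hypothetical names chosen for illustration. The only library feature relied on is that `types.ModuleType` can be subclassed and given its own `__repr__`.

```python
import types


class DataInfo:
    """Hypothetical holder for a dataset's metadata."""

    def __init__(self, name, description, license="unspecified"):
        self.name = name
        self.description = description
        self.license = license

    def __repr__(self):
        # Show everything at once when the object is typed at the prompt.
        return "%s (license: %s)\n%s" % (self.name, self.license, self.description)


class DocModule(types.ModuleType):
    """Module subclass whose repr is its docstring, if it has one."""

    def __repr__(self):
        return self.__doc__ or super().__repr__()


# Build a throwaway module object to show the effect; a real package would
# need to arrange for its own module object to use this class.
demo = DocModule("demo_dataset", "Demo dataset docstring.")
info = DataInfo("iris", "Measurements for 50 flowers from each of 3 species of iris.")

print(repr(demo))
print(info)
```

Changing what a module prints may confuse tools that expect the usual `<module ...>` form, which is one of the possible undesirable effects; a separate `DataInfo` object avoids that at the cost of one more name to remember.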


