[SciPy-dev] Dataset for examples and license

Steven H. Rogers steve@shrogers....
Tue Apr 24 22:01:55 CDT 2007

Robert Kern wrote:
> David Cournapeau wrote:
>> Well, I guess once scipy is modularized and can be installed package by 
>> package, having a dataset package a la R would be nice. For now, I have a 
>> small Python script which converts those datasets to HDF5, so they can be 
>> read easily from Python, and if including them in scipy is OK 
>> license-wise, I can easily add the data as a package for distribution 
>> (the compressed, pickled, related data takes ~ 100 kb).
> I'm fiddling around with a convention for data packages. Let's suppose we have a
> namespace package scipydata. Each data package would be a subpackage under
> scipydata. It would provide some conventionally-named metadata to describe the
> dataset (`__doc__` to describe the dataset in prose, `source`, `copyright`,
> etc.) and a load() callable that would load the dataset and return a dictionary
> with its data. The load() callable could do whatever it needs to load the data.
> It might just return objects that are defined in code (e.g. numpy.array([...]))
> if they are small enough. Or it might read a CSV, NetCDF4, or HDF5 file that is
> included in the package. Or it might download something from a website or FTP site.
> The scipydata.util package would provide some utilities to help writing
> scipydata packages. In particular, it would provide utilities to read some
> kind of configuration file or environment variable which establishes a cache
> directory such that large datasets can be downloaded from a website once and
> loaded from disk thereafter.
> The scipydata packages could then be distributed extremely easily as eggs, and
> getting your dataset would be as simple as
>   $ easy_install scipydata.cournapeaus_data
> Does that sound good to you?

Yes, it does.
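
To make sure I understand the convention, here is a rough sketch of what one
data package and one cache helper might look like. Everything named here is my
own assumption for illustration, not an existing API: the package name
`example_heights`, the toy data, the `SCIPYDATA_CACHE` environment variable,
and the helpers `get_cache_dir()` and `cached_path()`. Plain lists stand in
for numpy arrays so the sketch needs only the standard library.

```python
import csv
import io
import os
import tempfile

# --- contents of a hypothetical scipydata/example_heights/__init__.py ---

# Conventionally named metadata describing the dataset.
__doc__ = "Toy dataset: heights (cm) and weights (kg) of four made-up subjects."
source = "Invented for illustration; not a real dataset."
copyright = "Public domain (hypothetical)."

# Small datasets can live in the package itself; here, as embedded CSV text.
_CSV_TEXT = """height,weight
150.0,52.0
160.0,58.5
170.0,65.0
180.0,77.5
"""


def load():
    """Load the dataset and return a dictionary mapping column name to values."""
    reader = csv.DictReader(io.StringIO(_CSV_TEXT))
    data = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            data[name].append(float(value))
    return data


# --- contents of a hypothetical scipydata.util helper module ---

def get_cache_dir(env_var="SCIPYDATA_CACHE"):
    """Return the cache directory, creating it if needed.

    The directory comes from the environment variable `env_var` (an assumed
    name) and falls back to a folder under the system temp directory.
    """
    default = os.path.join(tempfile.gettempdir(), "scipydata_cache")
    path = os.environ.get(env_var, default)
    os.makedirs(path, exist_ok=True)
    return path


def cached_path(filename, fetch):
    """Return the cached path for `filename`, calling `fetch(path)` only if missing.

    `fetch` is any callable that writes the file to the given path, for
    example by downloading it from a website or FTP site.
    """
    path = os.path.join(get_cache_dir(), filename)
    if not os.path.exists(path):
        fetch(path)
    return path
```

With something like this, a user would call `load()` on the data package and
get back a dictionary, and a package with a large dataset would call
`cached_path("big.csv", download_function)` inside its own `load()` so the
download happens once and later loads come from disk.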

# Steve
