[SciPy-user] A first proposal for dataset organization

David Huard david.huard@gmail....
Thu Sep 20 09:13:16 CDT 2007


2007/9/20, David Cournapeau <david@ar.media.kyoto-u.ac.jp>:
>
> Robert Kern wrote:
> > David Huard wrote:
> >> Hi Anne,
> >>
> >> 2007/9/19, Anne Archibald <peridot.faceted@gmail.com>:
> >>
> >>     On 18/09/2007, David Huard <david.huard@gmail.com> wrote:
> >>
> >>     > For large data sets, I'm not sure I understand what you mean.
> >>     > Do you intend to include netcdf or HDF5 files and provide an
> >>     > interface to access those data sets so users don't have to
> >>     > bother about the underlying engine?
> >>     > Do we really want to distribute a package weighing > 1GB?
> >>
> >>     One of the points of this project, as I understand it, is to make
> >>     it convenient for people to get and use real datasets. In
> >>     particular, one possibility is to not include the data in this
> >>     package, but instead only a script to download it from (say) the
> >>     HEASARC. Thus big datasets are not outrageous, and, more to the
> >>     point, we need to be able to deal with them in whatever form they
> >>     are in natively.
> >>
> >>
> >> My understanding was rather:
> >> "... to make it convenient for people to get and use real datasets for
> >> use in SciPy and NumPy examples, documentation and tutorials." This
> >> limits the scope of the dataset package, at least for starters. If some
> >> tutorial deals with larger-than-memory issues, then using a specialized
> >> binary format makes sense. However, I think that fairly basic datasets
> >> can illustrate the use of most SciPy and NumPy functions.
> >
> > That's an important use case, certainly, but I also had in mind use
> > cases like the one Anne gave when I suggested the parts of the design
> > that David implemented. The scope is still fairly broad.
> Yes, indeed, my sentence "to make it convenient for people to get and
> use real datasets for use in SciPy and NumPy examples, documentation and
> tutorials" was just a list of possible usages, not the only usages to
> take into account. I also realized that my proposal sounded as if I were
> the only one involved, which was not the case. I hope the people involved
> in previous discussions on the matter didn't take any offence.


OK. So here is my understanding of what has been said so far about the scope
of the package; please correct me if I'm wrong.

 * Provide data sets for testing, demos and tutorials of scipy and numpy
functions.
 * Propose a standard format to store data in text/binary files.
 * Propose a format to represent the data internally (dictionary, record
arrays, masked arrays, timeseries, etc).
 * Implement an API to store/retrieve the data to/from text or binary files
based on the standard.
 * Provide utilities to import data sets from web archives and convert them
to the proposed format.
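To make the "internal representation" and "store/retrieve API" points a little more concrete, here is a minimal sketch, assuming a dataset is held as a Python dictionary with a NumPy record array under a "data" key plus a free-text "descr", and assuming the "standard" text format is simply a header line of column names followed by comma-separated values. The helpers save_dataset and load_dataset are hypothetical names for illustration only, not an existing SciPy or NumPy API, and the toy values are made up.

```python
import io

import numpy as np


# Hypothetical helpers (not an existing SciPy/NumPy API). They assume a
# dataset is a dict with a NumPy record array under "data" and a free-text
# description under "descr", stored as CSV text with a header row of names.

def save_dataset(fh, dataset):
    """Write dataset["data"] to a text stream: header row, then CSV rows."""
    data = dataset["data"]
    header = ",".join(data.dtype.names)
    # "%.17g" keeps enough digits for float64 values to round-trip exactly;
    # comments="" stops savetxt from prefixing the header with "# ".
    np.savetxt(fh, data, delimiter=",", header=header, comments="", fmt="%.17g")


def load_dataset(fh, descr=""):
    """Read a stream written by save_dataset back into the dict layout."""
    # names=True takes the field names from the first (header) line.
    data = np.genfromtxt(fh, delimiter=",", names=True)
    return {"data": data, "descr": descr}


# Usage with a made-up two-column toy dataset and an in-memory text stream.
toy = {
    "data": np.array([(1.0, 2.5), (3.0, 4.5)],
                     dtype=[("x", float), ("y", float)]),
    "descr": "toy example",
}
buf = io.StringIO()
save_dataset(buf, toy)
buf.seek(0)
loaded = load_dataset(buf, descr=toy["descr"])
print(loaded["data"]["y"])
```

A plain-text layout like this keeps small demo datasets readable and easy to diff; for the larger-than-memory cases raised earlier in the thread, a binary format such as netCDF or HDF5 behind the same kind of load/save interface would likely be more appropriate.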

Regards,

David

