[SciPy-dev] Machine learning datasets (was Presentation of pymachine, a python package for machine learning)

David Cournapeau david@ar.media.kyoto-u.ac...
Sun Jun 3 21:46:36 CDT 2007

Peter Skomoroch wrote:
> The licensing of datasets is an interesting issue; it sounds like they
> will need to be tackled one by one unless explicitly released to the
> public domain.
> Check out the Wikipedia entry on "Open Data":
> http://en.wikipedia.org/wiki/Open_Data
> "Creators of data often do not consider the need to state the 
> conditions of ownership, licensing and re-use. For example, many 
> scientists do not regard the published data arising from their work to 
> be theirs to control and the act of publication in a journal is an 
> implicit release of the data into the commons. However the lack of a 
> license makes it difficult to determine the status of a data set
> and may restrict the use of
> data offered in an Open spirit. Because of this uncertainty it is also 
> possible for public or private organisations to aggregate such data, 
> protect it with copyright and then resell it."
> I remember a while back Leslie Kaelbling bought the Enron dataset
> http://www.cs.cmu.edu/~enron/ for
> use in machine learning.
> Maybe we can start a SciPy wiki page with a list/table of datasets
> along with license status, and check off the ones we find are
> not compatible so we can find replacements or get permission. Also,
> we might want to add a column for which modules use the data in SciPy
> tests, etc.
> Should I go ahead and create the page?
I started something here: http://www.scipy.org/DataSets. I tried to list
all the websites mentioned in this thread there, with license information
where available, plus R. Kern's comment on licensing (at least in the US).


