[SciPy-dev] Is "Creative Commons: Attribution" an acceptable license for datasets included in scipy ?

Bruce Southey bsouthey@gmail....
Tue Jan 12 09:52:39 CST 2010

On 01/12/2010 09:12 AM, josef.pktd@gmail.com wrote:
> On Tue, Jan 12, 2010 at 3:07 AM, Gael Varoquaux
> <gael.varoquaux@normalesup.org>  wrote:
>> On Tue, Jan 12, 2010 at 05:01:14PM +0900, David Cournapeau wrote:
>>> Hi,
>>>       Everything is in the title - I have some new IO code for
>>> scipy.sparse I would like to include in scipy, and the tests include
>>> some dataset under this license. Should I remove them before inclusion ?
>> I believe you must: the attribution clause is not free by OSI definition.
>> In addition, I am pretty sure that none of the CC licenses are DFSG-free
>> up to version 3.0 (don't ask me why).
> cc-by looks pretty innocent for bundling with a package, especially if
> it's only used for tests and for examples and not part of the main
> program (like icons or sound).
> Bundling doesn't look infectious and the user is free to make use of
> them or not. Attribution for bundling doesn't look more restrictive
> than including the copyright statement for BSD licensed code.
> In statsmodels, we have several datasets,  some public domain, some
> with authorization by the author, but sometimes it is not very clear
> whether a dataset is copyrightable or not.
> Although, I haven't seen any cc-by datasets in econometrics that I
> remember,  and cc-by-nc looks clearly inconsistent.
> Are there some guidelines somewhere what would be consistent with this
> kind of bundling of datasets (tests and examples)?
> US government data is nice because it's all public domain.
> But there are a lot of efforts to make data more widely available, e.g.
> http://www.ckan.net
> http://opendefinition.org/licenses
> Sorry, if this expands too much on the original question, but this has
> been bugging me for a while.
> Josef
A little off topic, but search Google for 'is data copyrightable'.
For example:

The important case that is referred to is Feist vs Rural:

The answer really depends on the country, what the data is ('facts' are
not copyrightable), how (and when) it was collected, and who collected it.

I agree with Robert with regard to data with tests. As for examples, it
depends on the point you want to make; I would suggest simulated data
or well-known datasets that are most likely in the public domain.

