[SciPy-dev] Is "Creative Commons: Attribution" an acceptable license for datasets included in scipy ?

Bruce Southey bsouthey@gmail....
Tue Jan 12 09:52:39 CST 2010


On 01/12/2010 09:12 AM, josef.pktd@gmail.com wrote:
> On Tue, Jan 12, 2010 at 3:07 AM, Gael Varoquaux
> <gael.varoquaux@normalesup.org>  wrote:
>    
>> On Tue, Jan 12, 2010 at 05:01:14PM +0900, David Cournapeau wrote:
>>      
>>> Hi,
>>>        
>>      
>>>       Everything is in the title - I have some new IO code for
>>> scipy.sparse I would like to include in scipy, and the tests include
>>> some dataset under this license. Should I remove them before inclusion ?
>>>        
>> I believe you must: the attribution clause is not free by OSI definition.
>> In addition, I am pretty sure that none of the CC licenses are DFSG-free
>> up to version 3.0 (don't ask me why).
>>      
> cc-by looks pretty innocent for bundling with a package, especially if
> it's only used for tests and for examples and not part of the main
> program (like icons or sound).
>
> Bundling doesn't look infectious and the user is free to make use of
> them or not. Attribution for bundling doesn't look more restrictive
> than including the copyright statement for BSD-licensed code.
>
> In statsmodels, we have several datasets, some public domain, some
> with authorization by the author, but sometimes it is not very clear
> whether a dataset is copyrightable or not.
>
> Although I haven't seen any cc-by datasets in econometrics that I
> remember, and cc-by-nc looks clearly inconsistent.
>
> Are there guidelines somewhere on what would be consistent with this
> kind of bundling of datasets (tests and examples)?
>
> US government data is nice because it's all public domain.
>
> But there are a lot of efforts to make data more widely available, e.g.
> http://www.ckan.net
> http://opendefinition.org/licenses
>
> Sorry if this expands too much on the original question, but this has
> been bugging me for a while.
>
> Josef
>
>    
A little off topic, but search Google for 'is data copyrightable'.
For example:
http://answers.google.com/answers/threadview/id/778789.html
http://scienceblogs.com/commonknowledge/2009/01/data_copyrights_and_slogans_oh.php
http://sciencecommons.org/resources/faq/databases#dbcopyright

The important case that is referred to is Feist vs Rural:
http://en.wikipedia.org/wiki/Feist_Publications_v._Rural_Telephone_Service

The answer really depends on the country, on what the data is ('facts'
are not copyrightable), on how (and when) it was collected, and on who
collected it.

I agree with Robert with regard to data used in tests. As for examples,
it depends on the point you want to make; I would suggest simulated data
or well-known datasets that are most likely in the public domain.

Bruce



