[SciPy-dev] Is "Creative Commons: Attribution" an acceptable license for datasets included in scipy ?
Tue Jan 12 09:52:39 CST 2010
On 01/12/2010 09:12 AM, firstname.lastname@example.org wrote:
> On Tue, Jan 12, 2010 at 3:07 AM, Gael Varoquaux
> <email@example.com> wrote:
>> On Tue, Jan 12, 2010 at 05:01:14PM +0900, David Cournapeau wrote:
>>> Everything is in the title - I have some new IO code for
>>> scipy.sparse I would like to include in scipy, and the tests include
>>> some datasets under this license. Should I remove them before inclusion?
>> I believe you must: the attribution clause is not free by OSI definition.
>> In addition, I am pretty sure that none of the CC licenses are DFSG-free
>> up to version 3.0 (don't ask me why).
> cc-by looks pretty innocent for bundling with a package, especially if
> it's only used for tests and for examples and not part of the main
> program (like icons or sound).
> Bundling doesn't look infectious and the user is free to make use of
> them or not. Attribution for bundling doesn't look more restrictive
> than including the copyright statement for BSD-licensed code.
> In statsmodels, we have several datasets, some public domain, some
> with authorization by the author, but sometimes it is not very clear
> whether a dataset is copyrightable or not.
> Although, I haven't seen any cc-by datasets in econometrics that I
> remember, and cc-by-nc looks clearly inconsistent.
> Are there guidelines somewhere on what would be consistent with this
> kind of bundling of datasets (tests and examples)?
> US government data is nice because it's all public domain.
> But there are a lot of efforts to make data more widely available, e.g.
> Sorry if this expands too much on the original question, but this has
> been bugging me for a while.
A little off topic, but search Google for 'is data copyrightable'.
The important case that is referred to is Feist vs Rural:
The answer really depends on what country, what the data is ('facts' are
not copyrightable), how (and when) it was collected and who collected it.
I agree with Robert with regard to data bundled with tests. As for examples,
it depends on the point you want to make; I would suggest simulated data
or well-known datasets that are most likely in the public domain.