Open Data Licenses

From P2P Foundation
Jump to navigation Jump to search

= A good definition of openness and the use of some form of licensing is crucial to a healthy future for the open data community

Status Report 2010

Jordan Hatcher:

"The Public Domain Dedication and License (PDDL), the Public Doman Dedication Certificate (PDDC) and Creative Commons Zero ... are all aimed at placing work into the public domain. The public domain has a very specific meaning in a legal context: It means that there are no copyright or other IP rights over the work. This is the most open/free approach as the aim is to eliminate any restrictions from an IP perspective.

There are some rights that can be hard to eliminate, and so of course patents may still be an issue depending on the context, (but perhaps that’s conversation for another time).

In addition to these tools, we’ve created two additional specific tools for openly licensing databases — the ODbL and the ODC-Attribution licences.

All three are tools to help increase the public domain and make it more known and accessible.

There’s some really exciting stuff going on with the public domain right now, including with PD calculators — tools to automatically determine whether a work is in the public domain. The great thing about work in the public domain is that it is completely legally interoperable, as it eliminates copyright restrictions." (


From Rufus Pollock of the Open Knowledge Foundation:

"Despite the sometimes heated discussion, there is, in fact, broad agreement: openness means freedom to use and reuse data in any way you wish. The only debate is over what, if any, conditions can be imposed when allowing use and reuse.

In particular, following the example of the software and content domains, the following two items have been proposed as permissible exceptions to the basic rule of ‘allow everything’:

  1. Requirement of attribution (in a non-burdensome manner)
  2. Requirement to share-alike (a reuser or share-alike material must, when making publicly available their own material, make it openly available under a similar share-alike license)"



"Everyone agrees that requiring attribution is OK. Furthermore, it also now generally accepted that having this requirement in a license is not be a problem.

(In the original Protocol for Implementing Open Access Data attribution was alleged to be problematic due to a potential for ‘attribution stacking’. However, these concerns appear to have been allayed. To my mind, it was never clear why data needed to be different: code and content both have plenty of examples of projects with many contributors, much reuse and an attribution requirement)."


"Share-alike provisions are more controversial. It has been argued that share-alike conditions are problematic because of the potential for incompatibility between two share-alike licenses (or community norms). At the same time share-alike may provide an important incentive for individuals and communities to make their data openly available since it provides some assurance that this data will remain open.

Thus, any evaluation comes down to the balance between:

  1. The costs, if any, of allowing share-alike in terms of e.g. complexity and compatibility.
  2. The benefits, if any, that share-alike provides by encouraging the creation of open data in the first place and in ensuring subsequent ’sharing back’ by those who build upon that data.

In my view the benefits are substantial while the costs are not. Incompatibility can largely be avoided by only ‘approving’ share-alike licenses that are compatible. At the same time, share-alike enshrines a principle that is important to many communities in the code and content spheres and same seems true of data (consider e.g. Open Street Map).

(Aside: it is important to emphasize that permitting share-alike does not mean it is must be used. In fact, a particular community could recommend against using share-alike as, for example, the Python community does for code hoping to make it into its standard library.)"


Why they are needed

"Why bother about openness and licensing for data? After all they don’t matter in themselves: what we really care about are things like the progress of human knowledge or the freedom to understand and share.

However, open data is crucial to progress on these more fundamental items. It’s crucial because open data is so much easier to break-up and recombine, to use and reuse. We therefore want people to have incentives to make their data open and for open data to be easily usable and reusable — i.e. for open data to form a ‘commons’.

A good definition of openness acts as a standard that ensures different open datasets are ‘interoperable’ and therefore do form a commons. Licensing is important because it reduces uncertainty. Without a license you don’t know where you, as a user, stand: when are you allowed to use this data? Are you allowed to give to others? To distribute your own changes, etc?

Together, a definition of openness, plus a set of conformant licenses deliver clarity and simplicity. Not only is interoperability ensured but people can know at a glance, and without having to go through a whole lot of legalese, what they are free to do. (For more see this article and this post).

Thus, licensing and definitions are important even though they are only a small part of the overall picture. If we get them wrong they will keep on getting in the way of everything else. If we get them right we can stop worrying about them and focus our full energies on other things." (

Why licenses are better than community norms

"Even if a basic license is used it can be argued that any ‘requirements’ for attribution or share-alike should not be in a license but in ‘community norms’. So which is best?

In my view, when making available data, licenses are much better than community norms. Why?

  1. A license is always needed even if you are taking a PD approach. So ‘norms’ don’t obviate the need to license.
  2. A license is able to encode ‘norms’ both formally and informally (for example, in a preamble — cf. the GPL).
  3. A license is likely to elicit at least as much, and almost certainly more, conformity with its provisions than community norms. This is especially true outside of the community. The future is likely to see a much more mixed data landscape whether in science or elsewhere with many ‘non-community’ (non-academic) business and among ordinary citizens."


More Information

  1. Open Data