Open Scientific Data

From P2P Foundation

Key thesis: The critical thing to realise is that Open Scientific Data is not Open Software nor Open Content. [1]

Proposed Panton Principles

Consensus after a discussion between Peter Murray-Rust, Cameron Neylon and Rufus Pollock.

What is needed from Open Scientific Data licences:


"1. A simple statement is required along the forms of “best practice in data publishing is to apply protocol X”. Not a broad selection of licenses with different effects, not a complex statement about what the options are, but “best practice is X”.

2. The purpose of publishing public scientific data and collections of data, whether in the form of a paper, a patent, data publication, or deposition to a database, is to enable re-use and re-purposing of that data. Non-commercial terms prevent this in an unpredictable and unhelpful way. Share-alike and copyleft provisions have the potential to do the same under some circumstances.

3. The scientific research community is governed by strong community norms, particularly with respect to attribution. If we could successfully expand these to include share-alike approaches as a community expectation that would obviate many concerns that people attempt to address via licensing.

4. Explicit statements of the status of data are required and we need effective technical and legal infrastructure to make this easy for researchers." (http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1939)


Discussion

Peter Murray-Rust:

"The critical thing to realise is that Open Scientific Data is not Open Software nor Open Content. It may sound arrogant but it can be difficult for a non-scientist to realise that it is different from maps, from Shakespeare, from photography, from government publications, from cricket scores. Scientists by default collect data, or calculate it, to justify their conclusions, to prove they have done the work, to allow others to repeat the work.

It should be free, as in air.

They expect others to use it, without their permission. This could be to prove the original ideas right, or to prove them wrong. It could be to mine the data for ideas the original scientists missed. No scientist likes being proved wrong, or having someone else find ideas that they have missed. But it’s a central part of science. A scientist who says “you can’t use my published data” has no credibility today.

That’s not to say some scientists don’t try to hold their data back and mine the maximum from it before publishing. But it is becoming increasingly required – by funders, by universities (in theses) and by some publishers – that the data justifying a publication should be “published” in some way at the time of article publication. And by default there should be no restrictions on copying, re-use, republishing for whatever purpose and by whomever. I may not like it if my data is used to make weapons, or that a commercial organisation republishes it for money. But that is the implied contract I make by being a scientist. If I don’t like weapons derived from science there are other ways I can make my views known other than by adding restrictions – and at times I have.

To summarize. Data itself must be completely free. The question is how to ensure that it is.

The Open Science and Open Knowledge community has been discussing this for about 2 years. We seem to be agreed that legal tools are counterproductive, and that moderation is best applied by the community. This is represented by Community Norms – agreed practices that cause severe disapproval and possibly action when broken.

Our current crisis in Britain illustrates this. Huge numbers of Members of Parliament have been fiddling their expenses. They’ve been spending taxpayers’ money on cleaning their castle moats, buying second homes, antique rugs and so on. Huge amounts. This is, apparently, within the parliamentary guidelines.

But it is against the court of public opinion. It violates our Community Norms. The defence that it is “within the rules” illustrates the futility of the rules.

And it is incredibly difficult to draft good rules. So we’ve decided not to try to use the standard tools of copyright or licences.

For us Data are born Open. The question is how to state that. The simplest way is just to add the OKF’s “Open Data” button to the data. That’s a statement of intent. It says “you can do whatever you like with this data without asking my permission.” In many cases I think that is adequate.

However the community has also investigated the legal aspect and sought to provide a formal means of stating this in legal terms. This isn’t easy but the two approaches – Public Domain Dedication and Licence (PDDL) and Creative Commons CC0 – are roughly equivalent. I hope it’s useful to say that PDDL comes out of an Open Knowledge philosophy and deals with collections and other non-scientific content, whereas CC0 springs more directly from science." (http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1939)


Cameron Neylon:

"The appropriate way to license published scientific data is an argument that has now been rolling on for some time. Broadly speaking the argument has devolved into two camps. Firstly those who have a belief in the value of share-alike or copyleft provisions of GPL and similar licenses. Many of these people come from an Open Source Software or Open Content background. The primary concern of this group is spreading the message and use of Open Content and to prevent “freeloaders” from being able to use Open material and not contribute back to the open community. A presumption in this view is that a license is a good, or at least acceptable, way of achieving both these goals. Also included here are those who think that it is important to allow people the freedom to address their concerns through copyleft approaches. I think it is fair to characterize Rufus as falling into this latter group.


On the other side are those, including myself, who are concerned more centrally with enabling re-use and re-purposing of data as far as is possible. Most of us are scientists of one sort or another and not programmers per se. We don’t tend to be concerned about freeloading (or in some cases welcome it as effective re-use). Another common characteristic is that we have been prevented from being able to make our own content as free as we would like due to copyleft provisions. I prefer to make all my content CC-BY (or CC0 where possible). I am frequently limited in my ability to do this by the wish to incorporate CC-BY-SA or GFDL material. We are deeply worried by the potential for licensing to make it harder to re-use and re-mix disparate sets of data and content into new digital objects. There is a sense amongst this group that “data is different” to other types of content, particularly in its diversity of types and re-uses. More generally there is the concern that anything that “smells of lawyers”, like something called a “license”, will have scientists running screaming in the opposite direction as they try to avoid any contact with their local administration and legal teams." (http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1939)

Open Scientific Data Licenses

Peter Murray-Rust:

"The two approaches – Public Domain Dedication and Licence (PDDL) and Creative Commons CC0 – are roughly equivalent. I hope it’s useful to say that PDDL comes out of an Open Knowledge philosophy and deals with collections and other non-scientific content, whereas CC0 springs more directly from science." (http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1939)

More Information

  1. Public Domain Dedication and License
  2. CC0
