Open Data in Science
Definition
Peter Murray-Rust of the Unilever Centre for Molecular Sciences Informatics at the University of Cambridge (UK):
“The emerging Open Data movement shares many goals with the Open Access and Open Source movements, but encompasses its own distinct issues that are in need of examination by the scientific community. Many advocates of Open Data believe that, although there are substantial potential benefits from sharing and reusing digital data upon which scientific advances are built, today much of it is being lost or underutilized because of legal, technological and other barriers.” (http://www.arl.org/sparc/announce/102405.html)
Characteristics
- Scientific data deemed to belong to the commons (e.g. the human genome)
- Infrastructural data essential for scientific endeavour (e.g. in geographic information systems)
- Data published in scientific articles that, being factual, are not copyrightable
- Open Notebook Science refers to the application of the Open Data concept to as much of the scientific process as possible, including failed experiments and raw experimental data.
Discussion
Requirements for Open Data in science
Quoted from http://www.windley.com/archives/2006/05/free_the_data.shtml
- Re-use structures including schemas and ontologies. It’s more important to use well-understood structures than to use any particular idiom.
- Re-use the licenses that have already been developed. Licensing meta-data (ala Creative Commons) is also important.
- Enable re-use of ideas (contrasted with the expression of the idea). We have to find the proper scope of ‘derivative works’ and re-examine the issue of database copyright. Shockingly, copying the bibliographic data from a work (for purposes of citation) can be seen as a violation of some licenses.
- Attach policy information that says how the information can be used. Some experimental data depends critically on personally identifying information. Anonymization is a hard task either not working well or being at odds with the underlying research purpose of the data.
- Use open standards
(Weitzner presentation at http://www.w3.org/2006/Talks/0525-web-data-publishing/#(3); quoted here [1])
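To make the licensing and policy requirements above more concrete, here is a minimal Python sketch of attaching a licence and a usage policy to a dataset description in machine-readable form. The field names (`license`, `usagePolicy`, `allowed`, `containsPersonalData`), the helper functions and the example dataset are hypothetical illustrations, not part of Creative Commons metadata or any other standard; the licence URL is assumed to point at the Open Data Commons page for the PDDL.

```python
import json

# Assumed URL of the Public Domain Dedication & Licence (PDDL) page.
PDDL_URL = "https://opendatacommons.org/licenses/pddl/"


def make_dataset_record(title, creator, license_url, usage_policy):
    """Bundle a dataset description with its licence and usage policy.

    The keys used here are hypothetical, chosen only for illustration.
    """
    return {
        "title": title,
        "creator": creator,
        "license": license_url,
        "usagePolicy": usage_policy,
    }


def to_machine_readable(record):
    """Serialize the record as JSON so software, not only people, can read the terms."""
    return json.dumps(record, indent=2, sort_keys=True)


def permits(record, action):
    """Return True if the action is explicitly listed as allowed in the usage policy."""
    return action in record["usagePolicy"].get("allowed", [])


# Hypothetical example dataset.
record = make_dataset_record(
    title="Example spectra collection",
    creator="Example Lab",
    license_url=PDDL_URL,
    usage_policy={
        "allowed": ["text-mining", "redistribution"],
        "containsPersonalData": False,
    },
)

serialized = to_machine_readable(record)
print(serialized)
# A consumer can parse the JSON and check the terms programmatically.
print(permits(json.loads(serialized), "text-mining"))  # True
```

Serializing the terms alongside the data lets a text-mining tool check them without a human reading the licence; in practice one would reuse an established vocabulary rather than invent field names, in line with the first requirement above.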
Open Data in Science must be machine-readable
Richard Poynder, writing about Murray-Rust:
"[T]raditional subscription publishers like the American Chemical Society and Wiley explicitly forbid text mining of papers they publish. At the same time these publishers insist that authors not only sign over the copyright in the paper, but also ownership of the supplemental data, despite the fact that factual data are not subject to copyright.
After failing to persuade Open Access advocates to hear his concerns, Murray-Rust began to direct his energies to what he calls the Open Data movement, for which he is now a leading advocate. While he remains an advocate for OA, he explains, he has come to believe that the issue of Open Data needs to be addressed separately. For where the Open Access movement is concerned only with ensuring that scholarly papers are human readable, the Open Data movement requires that they are also machine readable. And since Open Data implies reuse, it is vital that licences are provided that specifically permit this.
Fortunately, Science Commons stepped into the breach, and is proving a valuable ally, not least by developing the Open Data protocol and the recently-announced Public Domain Dedication & Licence (PDDL) — thereby providing the first component of the legal framework that Murray-Rust believes is needed to enable text mining, and helping in the creation of the chemical semantic web." (http://poynder.blogspot.com/2008/01/open-access-interviews-peter-murray.html)
Status Report: Access to Research Data in the OECD
From http://www.firstmonday.org/issues/issue12_6/wunsch/index.html:
"Throughout OECD Member countries, continuously growing quantities of data are collected by publicly funded researchers and research institutions. This rapidly expanding body of research data represents both a massive investment of public funds and a potential source of the knowledge needed to address the myriad challenges facing humanity.
To promote improved scientific and social return on the public investments in research data, OECD member countries have established a variety of laws, policies and practices concerning access to research data at the national level. In this context, it was recognized that international guidelines would be an important contribution to fostering the global exchange and use of research data.
At the outset, the third OECD Global Research Village Conference addressed policy implications of the use of Information and Communication Technologies (ICT) for the global science system in 2000 [7]. In particular, the conference discussed issues of access to publicly financed research related to ICT as for instance access to intellectual property and data resources. In 2001, the OECD’s Committee for Scientific and Technological Policy (CSTP) agreed to the establishment of a Working Group to draw up commonly agreed principles to guide access to publicly financed research. Access to and sharing of research data from public funding was chosen as the most appropriate focus for the activities of the Working Group [8]. Collaborations with similar working groups such as CODATA [9] were sought.
In 2004, OECD Science and Technology Ministers declared that fostering broader, open access to and wide use of research data will enhance the quality and productivity of science systems worldwide. Ministers adopted a Declaration on Access to Research Data from Public Funding, asking the OECD to take further steps towards proposing Principles and Guidelines on Access to Research Data from Public Funding, based on commonly agreed principles to facilitate optimal cost-effective access to digital research data from public funding, and taking into account possible restrictions related to security, property rights and privacy (Annex) [10]. It recognizes “that open access to, and unrestricted use of, data promotes scientific progress and facilitates the training of researchers” and “will maximize the value derived from public investments in data collection efforts”, and entrusted the OECD’s Committee for Scientific and Technological Policy (CSTP) to work towards the establishment of access regimes for digital research data from public funding. The Ministers asked for the guidelines to be endorsed by the OECD Council at a later stage.
An expert group was formed to support this objective of translating Minister’s goals into an OECD policy instrument. The objective of the Expert Group is to draft useful and relevant guidelines that can be used by national governments and a wide variety of research organizations to facilitate and improve the international sharing of, and access to, digital research data gathered with the assistance of public funding.
The nature of “public funding” of research varies significantly from one country to the next, as do existing data access policies and practices at the national, disciplinary and institutional levels. These differences call for a flexible approach to data access and recognition that one size does not fit all." (http://www.firstmonday.org/issues/issue12_6/wunsch/index.html)
More Information
- SPARC Open Data Email Discussion List, at http://www.arl.org/sparc/opendata/index.html
- Working Group on Open Data in Science
- 2004 OECD Ministerial Declaration on Access to Digital Research Data from Public Funding