Open Data
Revision as of 06:02, 1 September 2007
Open Data Organizations
- CODATA
- Science Commons
- Free Our Data, The Guardian technology section (http://www.freeourdata.org.uk/index.php)
- The Open Knowledge Foundation (http://www.okfn.org/)
- Talis (http://www.talis.com/)
- Web2Express.org, open data on the Semantic Web (http://www.web2express.org/)
- Linking Open Data on the Semantic Web (http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData)
Definition
From Wikipedia, at http://en.wikipedia.org/wiki/Open_Data:
"Open Data is a philosophy and practice requiring that certain data are freely available to everyone, without restrictions from copyright, patents or other mechanisms of control. It has a similar ethos to a number of other "Open" movements and communities such as Open Source and Open access.
Open Data is often focussed on non-textual material such as maps, genomes, chemical compounds, mathematical and scientific formulae, medical data and practice, bioscience and biodiversity. Problems often arise because these are commercially valuable or can be aggregated into works of value. Access to, or re-use of, the data are controlled by organisations, both public and private. Control may be through access restrictions, licenses, copyright, patents and charges for access or re-use. Advocates of Open Data argue that these restrictions are against the communal good and that these data should be made available without restriction or fee. In addition, it is important that the data are re-usable without requiring further permission, though the types of re-use (such as the creation of derivative works) may be controlled by license."
Characteristics
Fundamental Open Data Rights:
"Arguments made on behalf of Open Data include:
- "Data belong to the human race". Typical examples are genomes, data on organisms, medical science, environmental data.
- Public money was used to fund the work and so it should be universally available.
- It was created by or at a government institution (this is common in US National Laboratories and government agencies)
- Facts cannot legally be copyrighted.
- Sponsors of research do not get full value unless the resulting data are freely available
- Restrictions on data re-use create an anticommons
- Data are required for the smooth process of running communal human activities (map data, public institutions)
- In scientific research, the rate of discovery is accelerated by better access to data."
(http://en.wikipedia.org/wiki/Open_Data)
Relation to other open activities:
"There are a number of other "Open" philosophies which are similar to, but not synonymous with Open Data but which may overlap, be supersets, or subsets. Here they are briefly listed and compared.
- Open Source Software is concerned with the licenses under which computer programs can be distributed and is not normally concerned primarily with data.
- Open Content has similarities to Open Data and may be seen as a superset but differs in that it emphasizes creative works while Open Data is more oriented towards factual data and the output of the scientific research process.
- Open Knowledge. The Open Knowledge Foundation argues for Openness in a range of issues including, but not limited to, those of Open Data. It covers (a) scientific, historical, geographic or otherwise (b) Content such as music, films, books (c) Government and other administrative information."
(http://en.wikipedia.org/wiki/Open_Data)
Open Data are opposed by Closed Data:
"Several intentional or unintentional mechanisms exist for restricting access to or re-use of data. They include:
- compilation in databases or websites to which only registered members or customers can have access.
- use of a proprietary or closed technology or encryption which creates a barrier for access.
- copyright forbidding (or obfuscating) re-use of the data.
- license forbidding (or obfuscating) re-use of the data
- patent forbidding re-use of the data (for example the 3-dimensional coordinates of some experimental protein structures have been patented)
- restriction of robots to websites, with preference to certain search engines
- aggregating factual data into "databases" which may be covered by "database rights" or "database directives" (e.g. Directive on the legal protection of databases)
- time-limited access to resources such as e-journals (which on traditional print were available to the purchaser indefinitely)
- political, commercial or legal pressure on the activity of organisations providing Open Data (for example the American Chemical Society lobbied the US Congress to limit funding to the National Institutes of Health for its Open PubChem data)."
(http://en.wikipedia.org/wiki/Open_Data)
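The robots-restriction mechanism in the list above can be seen in a few lines. The sketch below feeds a hypothetical robots.txt to Python's standard `urllib.robotparser` module; the rules, the `example.org` URL and the crawler name `ResearchCrawler` are assumptions made for illustration. The file admits one named search-engine crawler to a data directory while excluding every other robot, so the same public files are open to one party and closed to the rest.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler may fetch the data directory,
# every other crawler ("*") is excluded from it.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /data/

User-agent: *
Disallow: /data/
"""


def allowed_agents(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, for each user-agent name, whether robots.txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory rules instead of fetching
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    url = "https://example.org/data/survey.csv"  # hypothetical dataset URL
    for agent, ok in allowed_agents(ROBOTS_TXT, url, ["Googlebot", "ResearchCrawler"]).items():
        print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

Running the script prints `Googlebot: allowed` and `ResearchCrawler: blocked`. robots.txt is advisory: it only restrains crawlers that choose to honour it, but search engines and research crawlers that do honour it are shut out in exactly this way.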
Open Data Domains
The concept of Open Data is used in different contexts, mostly either as the availability of scientific raw data or as open access to publicly funded 'government' information.
(There is of course an obvious overlap when the scientific data are produced by public funding or government institutions.)
Open Access to Government Information
This refers to the campaign for the openness of data collected by governments, and against company-centric licensing regimes that withhold publicly funded data from the public at large.
Description
From the key essay by Peter Weiss, "Borders in Cyberspace":
"Many nations are embracing the concept of open and unrestricted access to public sector information -- particularly scientific, environmental, and statistical information of great public benefit. Federal information policy in the US is based on the premise that government information is a valuable national resource and that the economic benefits to society are maximized when taxpayer funded information is made available inexpensively and as widely as possible. This policy is expressed in the Paperwork Reduction Act of 1995 and in Office of Management and Budget Circular No. A-130, “Management of Federal Information Resources.”[1] This policy actively encourages the development of a robust private sector, offering to provide publishers with the raw content from which new information services may be created, at no more than the cost of dissemination and without copyright or other restrictions.
In other countries, particularly in Europe, publicly funded government agencies treat their information holdings as a commodity used to generate short-term revenue. They assert monopoly control on certain categories of information to recover the costs of its collection or creation. Such arrangements tend to preclude other entities from developing markets for the information or otherwise disseminating the information in the public interest.
In the US, open and unrestricted access to public sector information has resulted in the rapid growth of information intensive industries particularly in the geographic information and environmental services sectors. Similar growth has not occurred in Europe due to restrictive government information practices. As a convenient shorthand, one might label the American and European approaches as ‘open access’ and ‘cost recovery’, respectively. The cost recovery model is now being challenged on a variety of grounds." (http://www.primet.org/documents/weiss%20-%20borders%20in%20cyberspace.htm)
OECD Public Sector Information definition
From http://www.firstmonday.org/issues/issue12_6/wunsch/index.html:
"Public sector information which often has characteristics of being: dynamic and continually generated, directly generated by the public sector, associated with the functioning of the public sector (for example, meteorological data, business statistics), and readily useable in commercial applications; and,
Public content which often has characteristics of being: static (i.e. it is an established record), held by the public sector rather than being directly generated by it (cultural archives, artistic works where third–party rights may be important), not directly associated with the functioning of government, and not necessarily associated with commercial uses but having other public good purposes (culture, education).
The first category comprises public sector “knowledge” which may be the basis for information–intensive industries; these employ the raw data to produce increasingly sophisticated products. The second refers to cultural, educational and scientific public knowledge where wide public diffusion and long–term preservation (e.g. via museums, libraries, schools) are major governmental objectives." (http://www.firstmonday.org/issues/issue12_6/wunsch/index.html)
2006 Open Data Movement Status
By Peter Suber at http://www.earlham.edu/~peters/fos/newsletter/01-02-07.htm
" 2006 was another big year for Open Access to data. China's Ministry of Science and Technology mandated OA to about 80% of the data generated by publicly-funded research. The Canadian Institutes of Health Research wrote a draft OA policy that would not only mandate OA to research articles but also some of the data files resulting from CIHR-funded research. The Gates Foundation required data sharing for its HIV/AIDS research. The Global Initiative on Sharing Avian Influenza Data was one of several initiatives to encourage OA to avian flu data, breaking the previous, widespread national practices of hoarding it to head off agricultural boycotts or help local scientists scoop foreigners. The US National Science Foundation's Cyberinfrastructure Vision For 21st Century Discovery endorsed open access to data. The Governing Board of the Global Biodiversity Information Facility adopted a Recommendation On Open Access To Biodiversity Data, reaffirming and extending its OA statement from last year. The Conference of the Parties to the Convention on Biological Diversity endorsed OA for biodiversity data. The NIH's OA data repository for biochemistry, PubChem, prevailed against the attempt by the American Chemical Society to defund it or scale it back, and began attracting content from commercial players like Thomson Scientific. The ALPSP and STM, which resist the growth of OA archiving, called for OA to raw data, especially data underlying published journal articles. The Guardian launched the Free Our Data campaign and pressed the UK government to provide OA to publicly-funded data, especially geospatial data. The UK Office of Fair Trading estimated that lack of OA to public data costs the country £500 million/year. The Public Geo Data launched an online petition calling for OA to EU-collected geospatial data. The Commission to the European Parliament published recommended OA to publicly-funded EU geodata. 
The European Parliament reached a compromise on the INSPIRE Directive (Infrastructure for Spatial Information in Europe), providing OA to some and providing other data on a cost-recovery basis. The Universal Protein Resource became the first database to use a Creative Commons license to encourage re-use, and Science Commons wrote an FAQ on using CC licenses for databases. The SPARC discussion list on Open Data, moderated by Peter Murray-Rust, though launched in late 2005, came to life in 2006. At least two powerful tools, FortiusOne and Swivel, launched to host and analyze OA data." (http://www.earlham.edu/~peters/fos/newsletter/01-02-07.htm)
More Information
More info at http://www.re-public.gr/en/?p=98. This article specifically focuses on geographic datasets in the UK.
See the sites of UK-based organizations such as Free Our Data and Public Geodata.
Open Data in Science
1. scientific data deemed to belong to the commons (e.g. the human genome)
2. infrastructural data essential for scientific endeavour (e.g. in geographic information systems)
3. data published in scientific articles which are factual and therefore not copyrightable
- Open Notebook Science refers to the application of the Open Data concept to as much of the scientific process as possible, including failed experiments and raw experimental data.
Definition
Peter Murray-Rust of the Unilever Centre for Molecular Sciences Informatics at the University of Cambridge (UK):
"The emerging Open Data movement shares many goals with the Open Access and Open Source movements, but encompasses its own distinct issues that are in need of examination by the scientific community. Many advocates of Open Data believe that, although there are substantial potential benefits from sharing and reusing digital data upon which scientific advances are built, today much of it is being lost or underutilized because of legal, technological and other barriers." (http://www.arl.org/sparc/announce/102405.html)
Requirements for Open Data in science
Quoted from http://www.windley.com/archives/2006/05/free_the_data.shtml
- Re-use structures including schemas and ontologies. It’s more important to use well-understood structures than to use any particular idiom.
- Re-use the licenses that have already been developed. Licensing meta-data (ala Creative Commons) is also important.
- Enable re-use of ideas (contrasted with the expression of the idea). We have to find the proper scope of ‘derivative works’ and re-examine the issue of database copyright. Shockingly, copying the bibliographic data from a work (for purposes of citation) can be seen as a violation of some licenses.
- Attach policy information that says how the information can be used. Some experimental data depends critically on personally identifying information. Anonymization is a hard task either not working well or being at odds with the underlying research purpose of the data.
- Use open standards
(Weitzner presentation at http://www.w3.org/2006/Talks/0525-web-data-publishing/#(3); quoted here [1])
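The requirements above call for licensing metadata in the Creative Commons style, for re-use of well-understood structures, and for policy information attached to the data itself. A minimal sketch of how these could travel together with a dataset, assuming a hypothetical record layout: the field names `title`, `creator`, `license` and `rights` are borrowed from Dublin Core terms, while `usage_policy`, the `permits` helper and all sample values are hypothetical. None of this is a schema prescribed by the quoted presentation.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record published alongside a dataset."""
    title: str
    creator: str
    license: str  # URL of the license, e.g. a Creative Commons license
    rights: str   # human-readable rights statement
    # Hypothetical policy map: name of a use -> whether that use is permitted.
    usage_policy: dict[str, bool] = field(default_factory=dict)

    def permits(self, use: str) -> bool:
        """Return True only if the attached policy explicitly allows this use."""
        return self.usage_policy.get(use, False)

    def to_json(self) -> str:
        """Serialize the record so license and policy ship with the data."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


record = DatasetRecord(
    title="Example protein structure coordinates",  # hypothetical sample values
    creator="Example Lab",
    license="https://creativecommons.org/licenses/by/4.0/",
    rights="Free to re-use with attribution.",
    usage_policy={"redistribution": True, "derivative_works": True, "re_identification": False},
)

if __name__ == "__main__":
    print(record.to_json())
    print(record.permits("derivative_works"))
```

Unlisted uses default to not permitted here, a conservative choice so that a consumer never assumes a permission the publisher did not state; a publisher aiming at maximal re-use could list every permitted use explicitly or invert that default.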
Status Report: Access to Research Data in the OECD
From http://www.firstmonday.org/issues/issue12_6/wunsch/index.html:
"Throughout OECD Member countries, continuously growing quantities of data are collected by publicly funded researchers and research institutions. This rapidly expanding body of research data represents both a massive investment of public funds and a potential source of the knowledge needed to address the myriad challenges facing humanity.
To promote improved scientific and social return on the public investments in research data, OECD member countries have established a variety of laws, policies and practices concerning access to research data at the national level. In this context, it was recognized that international guidelines would be an important contribution to fostering the global exchange and use of research data.
At the outset, the third OECD Global Research Village Conference addressed policy implications of the use of Information and Communication Technologies (ICT) for the global science system in 2000 [7]. In particular, the conference discussed issues of access to publicly financed research related to ICT as for instance access to intellectual property and data resources. In 2001, the OECD’s Committee for Scientific and Technological Policy (CSTP) agreed to the establishment of a Working Group to draw up commonly agreed principles to guide access to publicly financed research. Access to and sharing of research data from public funding was chosen as the most appropriate focus for the activities of the Working Group [8]. Collaborations with similar working groups such as CODATA [9] were sought.
In 2004, OECD Science and Technology Ministers declared that fostering broader, open access to and wide use of research data will enhance the quality and productivity of science systems worldwide. Ministers adopted a Declaration on Access to Research Data from Public Funding, asking the OECD to take further steps towards proposing Principles and Guidelines on Access to Research Data from Public Funding, based on commonly agreed principles to facilitate optimal cost-effective access to digital research data from public funding, and taking into account possible restrictions related to security, property rights and privacy (Annex) [10]. It recognizes “that open access to, and unrestricted use of, data promotes scientific progress and facilitates the training of researchers” and “will maximize the value derived from public investments in data collection efforts”, and entrusted the OECD’s Committee for Scientific and Technological Policy (CSTP) to work towards the establishment of access regimes for digital research data from public funding. The Ministers asked for the guidelines to be endorsed by the OECD Council at a later stage.
An expert group was formed to support this objective of translating Minister’s goals into an OECD policy instrument. The objective of the Expert Group is to draft useful and relevant guidelines that can be used by national governments and a wide variety of research organizations to facilitate and improve the international sharing of, and access to, digital research data gathered with the assistance of public funding.
The nature of “public funding” of research varies significantly from one country to the next, as do existing data access policies and practices at the national, disciplinary and institutional levels. These differences call for a flexible approach to data access and recognition that one size does not fit all." (http://www.firstmonday.org/issues/issue12_6/wunsch/index.html)
More Information
SPARC Open Data Email Discussion List, at http://www.arl.org/sparc/opendata/index.html
2004 OECD Ministerial Declaration on Access to Digital Research Data from Public Funding