Open Data
Revision as of 13:24, 4 March 2009
Definition
From Wikipedia (http://en.wikipedia.org/wiki/Open_Data):
"Open Data is a philosophy and practice requiring that certain data are freely available to everyone, without restrictions from copyright, patents or other mechanisms of control. It has a similar ethos to a number of other "Open" movements and communities such as Open Source and Open access.
Open Data is often focussed on non-textual material such as maps, genomes, chemical compounds, mathematical and scientific formulae, medical data and practice, bioscience and biodiversity. Problems often arise because these are commercially valuable or can be aggregated into works of value. Access to, or re-use of, the data are controlled by organisations, both public and private. Control may be through access restrictions, licenses, copyright, patents and charges for access or re-use. Advocates of Open Data argue that these restrictions are against the communal good and that these data should be made available without restriction or fee. In addition, it is important that the data are re-usable without requiring further permission, though the types of re-use (such as the creation of derivative works) may be controlled by license."
Characteristics
Fundamental Open Data Rights:
"Arguments made on behalf of Open Data include:
- "Data belong to the human race". Typical examples are genomes, data on organisms, medical science, environmental data.
- Public money was used to fund the work and so it should be universally available.
- It was created by or at a government institution (this is common in US National Laboratories and government agencies)
- Facts cannot legally be copyrighted.
- Sponsors of research do not get full value unless the resulting data are freely available
- Restrictions on data re-use create an anticommons
- Data are required for the smooth process of running communal human activities (map data, public institutions)
- In scientific research, the rate of discovery is accelerated by better access to data."
(http://en.wikipedia.org/wiki/Open_Data)
Relation to other open activities:
"There are a number of other "Open" philosophies which are similar to, but not synonymous with Open Data but which may overlap, be supersets, or subsets. Here they are briefly listed and compared.
- Open Source Software is concerned with the licenses under which computer programs can be distributed and is not normally concerned primarily with data.
- Open Content has similarities to Open Data and may be seen as a superset but differs in that it emphasizes creative works while Open Data is more oriented towards factual data and the output of the scientific research process.
- Open Knowledge. The Open Knowledge Foundation argues for Openness in a range of issues including, but not limited to, those of Open Data. It covers (a) scientific, historical, geographic or otherwise; (b) Content such as music, films, books; (c) Government and other administrative information."
(http://en.wikipedia.org/wiki/Open_Data)
The opposite of Open Data is Closed Data:
"Several intentional or unintentional mechanisms exist for restricting access to or re-use of data. They include:
- compilation in databases or websites to which only registered members or customers can have access.
- use of a proprietary or closed technology or encryption which creates a barrier for access.
- copyright forbidding (or obfuscating) re-use of the data.
- license forbidding (or obfuscating) re-use of the data
- patent forbidding re-use of the data (for example the 3-dimensional coordinates of some experimental protein structures have been patented)
- restriction of robots to websites, with preference to certain search engines
- aggregating factual data into "databases" which may be covered by "database rights" or "database directives" (e.g. Directive on the legal protection of databases)
- time-limited access to resources such as e-journals (which on traditional print were available to the purchaser indefinitely)
- political, commercial or legal pressure on the activity of organisations providing Open Data (for example the American Chemical Society lobbied the US Congress to limit funding to the National Institutes of Health for its Open PubChem data)."
(http://en.wikipedia.org/wiki/Open_Data)
Open Data Policies
RECOMMENDATIONS from the U.S. Public Policy Committee of the ACM (USACM):
- Data published by the government should be in formats and approaches that promote analysis and reuse of that data.
- Data republished by the government that has been received or stored in a machine-readable format (such as online regulatory filings) should preserve the machine-readability of that data.
- Information should be posted so as to also be accessible to citizens with limitations and disabilities.
- Citizens should be able to download complete datasets of regulatory, legislative or other information, or appropriately chosen subsets of that information, when it is published by government.
- Citizens should be able to directly access government-published datasets using standard methods such as queries via an API (Application Programming Interface).
- Government bodies publishing data online should always seek to publish using data formats that do not include executable content.
- Published content should be digitally signed or include attestation of publication/creation date, authenticity, and integrity.
(http://www.acm.org/public-policy/open-government)
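Several of the USACM recommendations above (machine-readable formats without executable content, downloadable complete datasets or subsets, and attestation of integrity) can be illustrated with a short sketch. The Python below is a minimal example under stated assumptions, not anything prescribed by USACM: the sample records and the function names (to_csv_bytes, sha256_digest, verify, select_subset) are hypothetical. A published SHA-256 digest only attests integrity if readers obtain it through a trusted channel. It is not a digital signature, which would additionally require the publisher's private key and a signing library not shown here.

```python
import csv
import hashlib
import io

# Hypothetical sample records for illustration only; not real government data.
RECORDS = [
    {"agency": "DOE", "year": "2007", "filings": "12"},
    {"agency": "NASA", "year": "2007", "filings": "8"},
    {"agency": "NSF", "year": "2008", "filings": "15"},
]


def to_csv_bytes(records):
    """Serialize records as plain CSV: machine-readable, no executable content."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue().encode("utf-8")


def sha256_digest(data):
    """Hex SHA-256 digest a publisher could post alongside the file (assumed scheme)."""
    return hashlib.sha256(data).hexdigest()


def verify(data, published_digest):
    """Check a downloaded copy against the digest the publisher posted."""
    return hashlib.sha256(data).hexdigest() == published_digest


def select_subset(data, **criteria):
    """Return rows matching every column=value pair, standing in for an API query."""
    reader = csv.DictReader(io.StringIO(data.decode("utf-8")))
    return [row for row in reader if all(row.get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    payload = to_csv_bytes(RECORDS)
    digest = sha256_digest(payload)
    print(verify(payload, digest))         # True: untouched copy
    print(verify(payload + b"x", digest))  # False: altered copy
    print([r["agency"] for r in select_subset(payload, year="2007")])  # ['DOE', 'NASA']
```

Plain CSV was chosen because it is machine-readable and carries no executable content; the subset filter is only a stand-in for whatever query API a publisher might actually offer.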
Open Data Domains
The concept of Open Data is used in different contexts, mostly in one of two senses: either the availability of raw scientific data, or open access to publicly funded 'government' information.
(There is of course an obvious overlap when the scientific data are produced by public funding or government institutions.)
Open Access to Government Information
See Open Access to Government Information, as well as Open Government Data and Open Public Data
Open Data in Science
See: Open Data in Science
Open Data Organizations
- CODATA
- Science Commons
- Free Our Data (The Guardian technology section), http://www.freeourdata.org.uk/index.php
- The Open Knowledge Foundation
- Talis
- Web2Express.org, open data on the Semantic Web
- Linking Open Data on the Semantic Web
Open Data Companies
“Open data is to media what open source is to technology. Open data is an approach to content creation that explicitly recognizes the value of implicit user data. The internet is the first medium to give a voice to the attention that people pay to it. Successful open data companies listen for and amplify the rich data that their audiences produce.” (http://www.attentiontrust.org/node/430)
- Adaptive Blue: Extended browsing
- Aggregate Knowledge: Outsourced recommendations
- Atten.TV: Attention media
- Buzzlogic: Tracking influence
- ClearForest: Text analytics
- Daylife: Hi-touch algorithmic news
- Feedburner: RSS content management
- Lijit Networks: Ranking people
- Majestic Research: Online behavior for investors
- Meetup: America offline
- MyBlogLog: Reader communities
- Omnidrive: Open data storage
- Right Media: Transparent ad network
- Stumbleupon: The "forward" button
(http://www.attentiontrust.org/node/428)
Open Data Repositories
Open data sets available on the Web.
Examples include Wikipedia, Geonames and MusicBrainz.
See also: PubChem
Status Report 2007
Peter Suber:
"With or without mandates, more governments committed themselves to OA for publicly funded data. Norway adopted an OA mandate for public geodata. Canada, Ireland, and Australia began providing OA to publicly funded digital mapping data, without a mandate. After long resistance, the UK Ordnance Survey began to do the same, at least experimentally. (Earlier in the year, a legal analysis by Charlotte Waelde, an expert on intellectual property at the University of Edinburgh, concluded that the data are not protected by copyright but at most, only by the database right; a JISC report recommended a general UK policy of OA for research data; and the new UK Prime Minister Gordon Brown endorsed the principle of public access to public data.) The Committee of Ministers of the Council of Europe recommended "wide public access to research results to which no copyright restrictions apply" (i.e. data). Publishing consultant Eve Gray reported that the South African government was moving toward a policy of OA for publicly funded research data. The Australian government proposed an Australian National Data Service to promote OA and re-use of publicly funded research data. The Organisation for Economic Co-operation and Development (OECD) issued principles and guidelines to implement its 2004 Declaration on Access to Research Data from Public Funding. California is about to adopt the strongest and broadest OA mandate for greenhouse gas data in the US, and Pennsylvania is about to join the other 49 states in mandating OA for state statutes. And the UN Convention on Long-range Transboundary Air Pollution (LRTAP) adopted an OA mandate for most kinds of data covered by the convention.
The US Government Accountability Office called on four major federal funding agencies (DOE, NASA, NOAA, and NSF) to enforce their existing policies on data sharing. Twenty-two US federal government agencies formed an Interagency Working Group on Digital Data (IWGDD), plan to deposit the data generated by their research grantees in a network of OA repositories, and are considering an OA mandate. The US National Archives joined the OA web portal Geospatial One Stop. The NSF Office of Cyberinfrastructure launched a data interoperability project (INTEROP). Google created a Public Sector Initiative to improve its crawling of OA databases hosted by federal, state, and local government agencies in the US. A group of open government activists convened by O'Reilly Media and Public.Resource.Org drafted principles for open government data. For the first time the US made progress toward OA for its three most notorious non-OA government resources: PACER (Public Access to Court Electronic Records), the database of federal court docket information; NTIS (National Technical Information Service), the online databases of research and business data; and CRS Reports, the highly regarded reports from the Congressional Research Service. The first two began offering OA to selected portions of their content, previously TA, and the third is the subject of a new bill in the Senate to mandate OA.
Nature editorialized in favor of e-notebook science and data sharing, and Nature Biotech recommended "that raw data from proteomics and molecular-interaction experiments be deposited in a public [OA] database before manuscript submission." Maxine Clarke, Publishing Executive Editor at Nature, said that the journal would consider requiring and not merely recommending OA for multimedia data if there were a suitable OA repository supporting annotation and long-term preservation. Wiley threatened legal action when Shelley Batts, a graduate student at the University of Michigan, posted a chart from a Wiley article from the Journal of the Science of Food and Agriculture on her blog; when she replaced it with her own chart of the same data and blogged Wiley's threat, the blogosphere exploded and Wiley said it was all a misunderstanding.
Data-sharing policies were adopted by the UK Medical Research Council, the Ethics Committee of France's Centre National de la Recherche Scientifique (CNRS), the Audiovisual Communications Laboratory at Switzerland's Ecole Polytechnique Fédérale de Lausanne, and the International Telecommunications Union. The NIH launched a new data-sharing program for its neuroscience research. There are too many new OA databases to name separately, but since I've mentioned the NIH, I should add that it launched the Database of Genotype and Phenotype (dbGaP) and SHARe (SNP Health Association Resource). It described SHARe as "one of the most extensive collections of genetic and clinical data ever made freely available to researchers worldwide."
Google began helping researchers exchange datasets up to 120 terabytes in size, too large for ordinary online uploads and downloads. At no charge to the researchers, it will ship a brick-sized box of hard drives from one research team to another, provided that the data have no copyright or licensing restrictions and the bricks stop first at Google headquarters for copying and offline storage. In time, Google hopes to make the datasets OA. The company also began sharing files of its own data with researchers on the condition that they make the results of their research OA.
The year 2007 saw a wave of general OA data repositories spring up, many with built-in features for graphics and analysis: for example, Dabble, Data360, Freebase, Many Eyes, Open Economics, StatCrunch, Swivel, and WikiProteins. At the same time, several projects worked to facilitate the deposit of data in OA repositories, such as EDINA's DataShare and JISC's SPECTRa (Submission, Preservation and Exposure of Chemistry Teaching and Research Data), or to enhance the interface between data repositories and literature repositories, such as JISC's StORe (Source-to-Output Repositories).
By my informal estimate, the fields with the largest advances in OA data during 2007 were archaeology, astronomy, chemistry, the environment (including climate change), geography (including mapping), and medicine (especially, genomics and clinical drug trials)." (http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0011.110)