Genomic Data Commons

From P2P Foundation

Directory of Genomic Data Commons

Human Genome Project

Via [1]:

  • Products: Data and Tools. Genome sequence is publicly available
  • Governance: Funded through the NIH
  • Comment: Another interesting instance of the commons. The government used the power of funding to mandate open access from the organizations which participated.


Via [2]:

  • Products: Data. Coordination between researchers in Canada, China, Japan, Nigeria, the United Kingdom and the United States to identify disease-causing genes. Data released into the public domain
  • Governance: Combination of both public and private organizations
  • Comment: Another good instance of commons-based production


Via [3]:

  • Products: Data. Open consortium to identify all functional elements of the human genome. Data is made publicly available
  • Governance: Part of the NIH
  • Comment: Perfect instance of commons-based production.

Directory of Observational Genomic Data Commons

Gene Expression Omnibus

Via [4]:

"The birth of the GEO project in the environment we briefly analyzed above allowed the emergence of a new kind of Commons, the "Partially Open Commons", where everybody that has the money and the tools to run the experiment is allowed. This pattern may be similar to the Open Commons, but it is different from the “Limited Commons” that in general we observed as a pattern in the Foundational Data projects, where just the “chosen” ones could contribute."

Proteomic Data Commons

Via [5]:

"In terms of data and narrative outputs, proteomics is very similar. There is fundamental and observational data, though there is no “human proteome project” like the HGP to serve as an aggregating actor for commons based efforts. There are many smaller efforts that we can study including databases in structural genomics and protein data.

For protein tools, antibodies are the biggest category. We can classify all sorts of antibodies for specific study like cytokines, neurotrophins, etc. These are studyable both from the perspective of companies that provide as well as labs. There is also a growing system for protein expression like gene expression that depends on antibodies, but also now can use all sorts of genomic tools. So the genomic tools are now becoming proteomic tools as well. Also, access to the same stem cells and mice is essential if the research is going to translate to cures. It will be interesting to look and see if the same desire for treating the outputs of research as inputs to new research we saw in the fundamental genomic data space apply here.

Some other kinds of protein tech would include high throughput screening array technology (the robots that test drugs against proteins) and software tools: structure prediction, identification, properties, alignment. Proteomics research is very intensive in terms of computation and software (much more complex than genomics – more similar in some ways to climate change and weather modeling in terms of complexity).

We should probably expect to discuss the impact of patents as biomarker / diagnostic marker. Gene patents haven't had the expected impact of anticommons, but protein patents are extremely valuable and frequently enforced."

Directory of Tools and Methodologies Commons in Biotechnology

Via [6]:


      o Output: Tools (e.g. new databases) and Narratives (studies and papers)
      o Governance: Non-profit NGO. Funding through the Norwegian Government, Horticulture Australia, and the Lemelson Foundation
      o Comment: Should definitely take a look at the BioForge project, which aims to encourage collaboration between research groups in the life sciences

      o Products: Data, Narratives and Tools. A coalition of organizations that aims to share data under a common set of terms and conditions
      o Governance: Led by Science Commons, a 501(c)(3) organization

  • Ensembl Genome Browser

      o Output: Data. Aims to automatically annotate the genome, integrate that annotation with other databases, and share the product freely on the web
      o Governance: Collaboration between the European Bioinformatics Institute and the Wellcome Trust Sanger Institute
      o Comment: Interesting case. It seems to use data that is in the commons, managed by private organizations, to produce a new product that is also in the commons
      o Summary/Notes: Ensembl is a joint project between EMBL-EBI and the Wellcome Trust Sanger Institute to develop a software system which produces and maintains automatic annotation on selected eukaryotic genomes.

  • BIODAS: Distributed Annotation System

      o Products: Data, Narratives and Tools. Aims to create a standard protocol for exchanging genomic annotations
      o Governance: Distributed, though with self-appointed leaders
      o Comment: This falls into the gray area of the definition of 'commons'. It is much closer to Lessig's definition, where something like TCP/IP could be considered a commons.

  • National Center for Biotechnology Information

      o Output: Data. Creates publicly accessible data and analysis systems for biochemistry and genetics
      o Governance: Division of the National Library of Medicine and the National Institutes of Health
      o Comment: Probably does not count as a commons-based system. The tools, while publicly available, do not appear to be publicly editable. It might be more useful to see what collaborative enterprises, if any, develop from this work

  • Open Biological Ontologies

      o Primarily products: Data, Narratives and Tools. Aims to support a community of people developing biomedical ontologies
      o Governance: Coordinating editors from the Berkeley Bioinformatics Open-Source Projects. There does not seem to be a system of elections

  • Open Wet Ware

      o Primarily products: Data, Narratives and Tools. Sharing best practices in biological engineering
      o Governance: Elected officers, funded through the NSF
      o Summary/Notes: OpenWetWare is an effort to promote the sharing of information, know-how, and wisdom among researchers and groups who are working in biology & biological engineering.

      o Problem it aims to address: Open source drug development has not been successful because it lacks a critical mass of publicly available data. Focus on tropical diseases
      o Methodology: Develop a computational pipeline (e.g. open source data sets) for structure modeling of target proteins, prediction of ligand binding locations, and prediction of structures of protein sequences
      o Public-private partnerships and private foundations
      o Products created from using the software do not seem to be required to be put in the public domain
      o Uses the Science Commons protocol for implementing Open Access Data
      o Open Source Biotechnology Project, Open source biotechnology?