Human Genome Project

From P2P Foundation


Characteristics

Via [1]:

- Products: Data and Tools. Genome sequence available publicly.
- Governance: funded through the NIH.
- Comment: Another interesting instance of the commons: the government used the power of funding to mandate open access requirements for the organizations that participated.


Discussion

1. Via [2]:


"The Human Genome Project was the mapping of the entire human genome using an open approach. Funded primarily by US government funding emerging initially from the Department of Energy (representing the roots of genomic research in the push to understand mutation emerging from radiation exposure), the HGP was a classic “big science” project. An enormous amount of money was committed, a small number of centers were chosen to receive that money, and there was an expectation that the data resulting would be an open product. The primary regulation in our nomenclature was normative and not legal – the data was in the public domain, but there were some expectations of scientific behavior and the right to punish violators was reserved, but the punishment would be in the discipline via peer review and grantmaking review and not via the courts.

The norms that emerged from the Human Genome Project served as the basis for setting norms for the development of commons‐based practices in the genomics field and also for the understanding of the legal rules related to database protection. For instance, on the in‐take side, with the HGP it was understood that only a limited group of people could contribute since there was a lack of capacity and infrastructure. Not many had the scientists, the labs or the machines to develop the study – a marked point of difference when you compare the HGP with Open Source projects, where there is a democratization of means via ubiquitous cheap desktop computing and ICTs.

However, some time into the project's development, the sponsors of the project – the government – realized that the members of the project’s team were not posting the data they were producing, and that the competitor Celera was rapidly creating a private version of the genome via new technology (itself developed at least partially with HGP funding). This was the origin of the codified and formalized norms known as the Bermuda Rules, further developed during the Fort Lauderdale meeting. The Rules were simple and clear: all data was in the public domain, and it would be posted online within 24 hours of coming off the machines. However, scientists using the data were expected to check whether the data had been “published” yet (the fuzzy part), and if it was unpublished they were expected to honor some norms about the data.

The norms that emerged from the HGP were the inspiration for a norm‐setting process in the HapMap project. However, when the HapMap came to life, the Open Source Movement was already a well developed and studied movement. The FLOSS movement inspired the HapMap to adopt, in its beginning, a more regulated approach, through the institution of a “click wrap” contract among the HapMap participants during its in‐take and out‐take process. The sharing norms instituted by the HapMap contract highly regulated the publication process and also tried to interfere in the exploitation (more precisely – the abuse) of patents that may have emerged from the HapMap out‐puts."


Knowledge Governance in the HGP

"Completed in 2003, the Human Genome Project (HGP) was a 13-year, $3,000,000,000 project coordinated by the U.S. Department of Energy and the National Institutes of Health. During the early years of the HGP, the Wellcome Trust (U.K.) became a major partner; additional contributions came from Japan, France, Germany, China, and others. Project goals were to identify all the approximately 20,000-25,000 genes in human DNA, determine the sequences of the 3 billion chemical base pairs that make up human DNA, store this information in databases, improve tools for data analysis, transfer related technologies to the private sector, and address the ethical, legal, and social issues (ELSI) that may arise from the project.

While the entire project raised issues of knowledge governance, first we will examine the issues related to the datasets and databases created in the HGP, because the data governance regimes that emerged from the Human Genome Project served as the basis for setting norms for the development of commons-based practices in the genomics field that last far beyond the HGP itself.

As a global distributed project, the HGP was forced early on to grapple with the issues of legal and technical interoperability, data acquisition and distribution, and scientific traditions of priority, publication, and citation. It was also a deeply asymmetric project - it was understood that a limited group of people could contribute since there was a lack of capacity and infrastructure. This limited funding to large sequence centers at major, well known, and powerful universities. Not many had the scientists, the labs or the machines to develop the study – a marked characteristic of differentiation when you compare the HGP with Open Source projects, where there is a democratization of means. Knowledge governance leading data from the few to the many would be required to facilitate an open system’s later emergence.

However, after some time into the project development, the sponsors of the project – the governments and private funders – realized that the public data deposits were falling behind the rate of publicly funded data production. Worse, a private competitor (Celera) was rapidly accelerating the creation of a closed whole-genome sequence. The government and funder reaction was to send the key scientific leadership away to Bermuda (later Fort Lauderdale, FL) to work out the problem amongst themselves.


The basis was scientific, not legal, and deeply tied to the innate asymmetry of funding and the knowledge governance obligations it created:

- “if genome centres restrict their data and get preferential access to it, then some members of the community will no longer support monopolistic funding models (in which large centres sequence one genome after another without peer review of each project). Instead, they will demand the right to compete with these empires, especially for the most scientifically desirable genomes. Other scientists, especially bioinformaticians, will seek to relocate to the centres to gain the advantage of early data access. Data restrictions will therefore promote factionalization where we should be seeking efficiencies of scale, and centralization where we should be promoting diversity”

The resulting 1996 agreement is widely known as the Bermuda Rules. This landmark agreement is not a legal construct, contract, license, or otherwise binding in a court of law. It simply represented the norms of the HGP sequencing community. And the rules are simple. First, take care of the backlog by releasing, immediately, all DNA strands longer than 1000 units; second, all new data goes on the web and into the public domain within 24 hours of coming off the machine.

Within this open governance regime the sequencing centers developed a strong competitive streak, which drove more and more data into the public domain, faster. One key requirement in the success of the Bermuda Rules was, in exchange for access, the application of a scientific publication norm: the depositing centers retained certain rights of first publication. But this again was a norm, not a legal requirement. Violations were dealt with in the realm of scientific publication and community judgment, not the courts.

There was great expectation that the impact of the release of the genome data would be that genomics companies would dominate the new face of drug discovery and development, which faded as it became clear that data on its own was not sufficient to provide the knowledge required to understand diseases or discover drugs. The publication of the complete human genome in the public domain also had a significant governance impact on companies whose business was to use trade secrets to protect their data products, and companies such as Celera were unable to continue a business model based on expensive subscriptions for data that was available on the web in the public domain.

Another key aspect of the HGP was the systematic investment in the technical infrastructure to distribute the sequences, a combined effort of the various governments involved that included nightly sequence harmonization across the various data repositories. The emergence of the U.S. National Center for Biotechnology Information (NCBI) was essential to the success of the project, demonstrating that technical accessibility is part of knowledge governance as well, and also developing some of the early integration of technology access with policy access – the NCBI not only clearly marks government data as public domain, but will not accept data whose depositors request controls based on intellectual property into its molecular databases. Knowledge in this context must therefore also be studied with an eye towards technical architectures and their impact on governance.

The norms that emerged from the HGP were the inspiration for the norm-setting process in its successor, the “HapMap” project of human genetic variation, and many of the same technical infrastructures were expanded to include the variation data alongside the genome. However, unlike at the origins of the HGP, when the HapMap was born, the Open Source Movement was a well developed and studied movement. The founders were inspired by Open Source to adopt an “open click-wrap” data license that tried to regulate publication processes and intervene in the exploitation (more precisely – the abuse) of patents that may have emerged from the HapMap outputs.


This licensing approach was abandoned for an unregulated environment running under the Bermuda Rules after a few years of operations. The fear of patent enclosure had been ameliorated by the dedication of so much data to the public domain, so the patent aspects of the contract were felt unnecessary, and the unintended knowledge governance impact of the contract was that the HapMap data could not be integrated into the HGP data without creating legal contamination. Thus, the contract was lifted and the norms moved into place instead.

The HGP is in many ways a paradigmatic case of the shift in knowledge governance from privacy and withholding to open sharing. The perfect mixture of policy, funding, norms, law, and technical infrastructure came together to open up the genome to all – when an alternate outcome could have easily occurred. The genome is now fundamentally open data, legally and technically. Its entirety can be downloaded without registration, and redistributed. It can be annotated, visualized, and built upon. It leverages standard data formats, data repositories, software tools, and more. And we see the long-term environmental impact of “conserving” the genome in a realm where the knowledge governance of genome data was run by the scientists, not the courts, in the explosion of downstream knowledge products emerging from the HGP and its successors.

From the genome as a foundational base we see many distributed efforts emerging, as we might expect from an open source or wiki approach. Distributed annotation systems emerged to mark up the genes on the genome, and an entire new field - synthetic biology - erupted, using the information gleaned in the genome sequence to create standard biological “parts” to be used in biological programming systems. Yet to achieve this, the HGP represented a years-long investment in fundamental data generation with an unclear outcome, and required the creation of significant funding streams to support technical distribution systems. Scientists had to come together and develop new norms and ethics for knowledge governance and distribution, not to mention standards for annotation and reuse as well as software tools and systems that made the genome useful, accessible and interoperable." (http://cyber.law.harvard.edu/commonsbasedresearch/sites/commonsbasedresearch/images/Genomics_Knowledge_Governance.pdf)