Wiki Proteins

From P2P Foundation

= a place where biologists can collectively annotate an enormous database of proteins, a database culled from the best open science journals in the field.


Description

"Today sees the launch of a new collaborative website initially focusing on proteins and their role in biology and medicine. The WikiProfessional technology underlying the site has been developed based upon the collaborative Wikipedia approach. Described in BioMed Central’s open access journal Genome Biology, WikiProteins provides a method for community annotation on a huge scale.

The article is written by Barend Mons of the Erasmus Medical Center in Rotterdam, and the Leiden University Medical Center, The Netherlands, and his co-authors from Brazil, The Netherlands, Switzerland, the UK and the USA. They include Amos Bairoch of UniProt, Michael Ashburner of GO and Jimmy Wales, the founder of Wikipedia.

The source material for WikiProteins comes from a mixture of existing authoritative databases (such as the Unified Medical Language System, UniProtKB/Swiss-Prot, IntAct and GO), supplemented by concepts mined from scientific papers published in public literature databases. The automated data mining identifies ‘facts’ in these available resources, such as protein functions or protein-disease relationships. This process created over one million biomedical concept clouds – called ‘Knowlets’ – around each individual concept. The developers of the site now hope that many researchers will follow their call to annotate, via WikiProteins, the Knowlets for which they are leading experts. The method enables researchers to add data even from sources that are not openly available, such as from journals only accessible via publishers’ databases, immensely enhancing the potential for comprehensive coverage. Each page of text called up via the system is automatically indexed and concepts are connected to the WikiSpace, so that their definition comes up and the information can be edited directly from the page.

The resulting data in the Wiki is fully and freely accessible to the public, and entries can be annotated by any registered user. Mons said: “We here call on a million minds to annotate a million concepts and collect new facts from full-text literature with the immediate reward of collaborative knowledge discovery and recognition of Wiki-contributions to the scientific community.”" (http://www.eurekalert.org/pub_releases/2008-05/bc-lcp052308.php)


Discussion

The paper is discussed by Ars Technica at http://arstechnica.com/news.ars/post/20080527-first-wikiprofessional-project-wikiproteins-ready-for-beta.html

"The new paper describes a major advantage to this approach. Traditionally, biological information has been divided between two approaches: data mining, which involves parsing existing information to identify semantic content and connections within it, and curating, which involves expert, manual analysis of data. By importing information from both types of sources, WikiProteins should theoretically contain the best properties of both types of data: reliable information supplied by experts and potential connections among data that haven't previously been explored.

The paper provides a number of measures of the success of this approach. For one, the import process has identified over a million individual authors, and a similar number of concepts that connect them and the other items stored in the database. The different data sources also seem to have paid off, as the authors determined that well over half of the protein-protein interactions brought in from curated databases could not have been identified by data-mining PubMed abstracts.

In calling for biologists to get involved in the beta process, the people who generated WikiProteins have a number of roles in mind. For starters, they expect that the data mining process has generated a significant number of spurious connections, and hope that the community will help in pruning those. For example, they noted that the gene abbreviation "CLB2" mapped to at least five different genes (depending on the organism), as well as a material used in dentistry, Clearfil Liner Bond 2; manual intervention may be needed to sort these out. They're also hoping that contributors will simply dump sentences from the literature into WikiProteins in order for them to be indexed and further connections mined." (http://arstechnica.com/news.ars/post/20080527-first-wikiprofessional-project-wikiproteins-ready-for-beta.html)
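The "CLB2" problem described above can be illustrated with a minimal sketch. This is not the WikiProteins implementation: the mapping records, the organism labels, and the function name find_ambiguous_terms are all hypothetical, chosen only to show how an abbreviation that resolves to several concepts could be flagged for manual review.

```python
from collections import defaultdict

# Hypothetical mined (term, concept) pairs. These are illustrative placeholders,
# not real WikiProteins records; organism names are deliberately left generic.
MINED_MAPPINGS = [
    ("CLB2", "gene:CLB2 (organism A)"),
    ("CLB2", "gene:CLB2 (organism B)"),
    ("CLB2", "material:Clearfil Liner Bond 2"),
    ("TERM_X", "gene:TERM_X (organism A)"),
]


def find_ambiguous_terms(mappings):
    """Return {term: sorted concepts} for terms that map to more than one concept."""
    concepts_by_term = defaultdict(set)
    for term, concept in mappings:
        concepts_by_term[term].add(concept)
    return {
        term: sorted(concepts)
        for term, concepts in concepts_by_term.items()
        if len(concepts) > 1
    }


if __name__ == "__main__":
    # Each flagged term is a candidate for expert disambiguation.
    for term, concepts in find_ambiguous_terms(MINED_MAPPINGS).items():
        print(f"{term}: needs manual review ({len(concepts)} candidate concepts)")
```

Terms with a single candidate concept pass through untouched, so expert attention is spent only where automated mining cannot decide on its own.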


More Information

A preview of the WikiProteins technology is available at: http://conceptweblinker.wikiprofessional.org/default.py?url=nph-proxy.cgi/010000A/http/genomebiology.wikiprofessional.org/monsarticle.htm