Peter Murray-Rust on Open Data in Science 2.0

From P2P Foundation

Interview by Richard Poynder. Peter Murray-Rust is an Open Data advocate in the field of science.

Full version in pdf here at


Richard Poynder:

"Peter Murray-Rust is a committed advocate of Open Access (OA). He is, however, a disappointed one. He is disappointed not because so few researchers are willing to self-archive their scholarly papers on the Web, not because it is proving so hard to persuade funders and research institutions to introduce Open Access mandates, but because of a failing he sees within the movement itself. Out of his disappointment, however, has come a new movement: the Open Data movement.

As a Reader in molecular informatics at the University of Cambridge, Murray-Rust is interested in scholarly papers less for their textual content and more for the raw data contained within them: the graphs and tables, the molecular structures, the spectral and crystallography data, the photographs of proteins, and all the other factual information that litters science papers.

As such, much of Murray-Rust's time is spent not on reading the scholarly literature, but mining it — using various software tools to automatically extract the "embedded data" contained in the tables, the charts, and the images in science papers, and capturing the "supplemental information" that invariably accompanies the papers. After aggregating all these data Murray-Rust will compare them, input them into programs, use them to create predictive models, and reuse them for a variety of different purposes.

In short, Murray-Rust is working at the frontline of what has been dubbed Science 2.0, an online interactive environment where a great deal of the information used is more likely to have been discovered, aggregated and distributed by software and machines than it is by humans; an environment where data are constantly used and reused — pumped through new tools like RSS feeds, and displayed in mashups, wikis, and the various other tools developing around Open Notebook Science.

Murray-Rust's ultimate goal is to create and exploit what he calls the chemical semantic web — a web that would assume most scientific information was unencumbered by proprietary interests, and able to be freely shared and exchanged.

In practice, however, mining the scholarly literature remains a difficult and risky activity, explains Murray-Rust — not so much because the technology is still in its infancy, but because scholarly publishers routinely appropriate the content of research papers, and then lock it up behind financial firewalls and prohibit its reuse.

Assuming that the Open Access movement was committed to removing these barriers, Murray-Rust became an OA advocate. After all, as leading OA advocate Peter Suber puts it, Open Access implies scholarly literature that is "digital, online, free of charge, and free of most copyright and licensing restrictions". That, says Murray-Rust, is what is needed to build the semantic web.

But while the definition of Open Access agreed at the launch of the 2001 Budapest Open Access Initiative (BOAI) states that any paper made Open Access must be free of copyright and licensing restrictions, Murray-Rust discovered that in most cases publishers and authors still fail to provide the necessary permissions when making papers Open Access. Where a paper is flagged as being Open Access, reuse is often prohibited. And even where there is no specific prohibition, usage conditions are frequently not specified, effectively placing the paper into licensing limbo.

In many cases, says Murray-Rust, Open Access publishers don't even articulate to themselves under what conditions they are making their papers available on the Web, let alone provide an appropriate licence. As a result, third parties cannot know what usage is permitted. And where publishers do think it through, and attach a licence, the usage conditions are in any case often non-conformant with the BOAI definition.

The legal status of papers that researchers themselves self-archive on the Web, or in their institutional repositories, is equally uncertain, and sometimes reuse is expressly forbidden.

What frustrates him, says Murray-Rust, is that this confusion could have been avoided had the Open Access movement emulated the Open Source Initiative (OSI) and developed customised OA licences. And having done so, he adds, the movement (again like the OSI) could have policed the use of the term Open Access, and publicised and sanctioned publishers who fail to use the licences, or who make false claims about Open Access. It should also have better educated researchers about licensing.

Further limiting what he can do, adds Murray-Rust, traditional subscription publishers like the American Chemical Society and Wiley explicitly forbid text mining of papers they publish. At the same time these publishers insist that authors not only sign over the copyright in the paper, but also ownership of the supplemental data, despite the fact that factual data are not subject to copyright.

After failing to persuade Open Access advocates to hear his concerns, Murray-Rust began to direct his energies to what he calls the Open Data movement, for which he is now a leading advocate. While he remains an advocate for OA, he explains, he has come to believe that the issue of Open Data needs to be addressed separately. For where the Open Access movement is concerned only with ensuring that scholarly papers are human readable, the Open Data movement requires that they are also machine readable. And since Open Data implies reuse, it is vital that licences are provided that specifically permit this.

Fortunately, Science Commons stepped into the breach, and is proving a valuable ally, not least by developing the Open Data protocol and the recently-announced Public Domain Dedication & Licence (PDDL) — thereby providing the first component of the legal framework that Murray-Rust believes is needed to enable text mining, and helping in the creation of the chemical semantic web.

I had been keen to speak with Murray-Rust for some time, so I was pleased recently to be able to hook up with him on the telephone. I found his ebullient style, rapid delivery, and quick-fire mind both challenging and fascinating. Above all, the conversation offered me an interesting new perspective on Open Access, and confirmed suspicions I have long harboured that the Open Access movement would truly benefit from having an official body to represent its interests.

Murray-Rust is a vivid and rumbustious person who does not pull his punches. When I emailed the draft text of the interview to him, however, he asked that I stress the positive rather than the negative in this introduction. "Yes, I am angry, but not completely," he wrote. "I believe in the power of the bottom-up to change things and I am optimistic that we shall get change."

He also asked me to underline his appreciation for all that the Open Access movement has achieved, and requested I append this paragraph: "Although this interview highlights some of the shortcomings of Open Access movement I want to pay tribute to the many activists who have devoted and often courageously worked to make scholarly knowledge free for everyone. I'd particularly like to say something very appreciative about Peter Suber, and I'd like also to mention the Scholarly Publishing & Academic Resources Coalition (SPARC) and the Wellcome Trust, who in my opinion have probably been the largest force for change recently."