Open Knowledge

From P2P Foundation

A work is open if it is accessible, reproducible and re-usable without legal, social or technological restriction.

For Open Knowledge International's full definition of "open", see the Open Definition


An interesting definition of openness by the Open Knowledge Foundation:

We take open to have three distinct senses: legal, social and technological.

Legally Open

Knowledge is legally open if it is free of most of the standard legal restrictions and requirements. In particular it should be accessible without restriction, reproducible freely (at least for non-commercial purposes), and reusable - that is, freely incorporatable in derivative works. In short, it should fall within the bounds of one of the Creative Commons licenses.

Socially Open

Social openness consists of ensuring that a work is made available and not kept secret or mouldering on a CD at the back of the drawer. It means supporting sharing and reuse as well as collaborative working processes.

But most importantly it means an 'open source' approach to knowledge. That is, knowledge should be made available so that access is given to the raw, underlying data and not simply through a particular, usually limiting, interface (such as a human-only-usable web form).

This parallels the distinction with software programs, emphasized by the term open source, between access to the underlying source code and access simply to the compiled version. Thus Open Knowledge in this sense can stand for access to the underlying 'source' rather than purely access to the 'compiled' end product. To illustrate, consider the following examples.

For data in a database the 'source' form means the raw data and the 'compiled' form is any of the multitude of interfaces such as web query pages that can wrap that data. Providing access to the source data would be a major change - even open databases that are freely searchable rarely provide their data in source form - the only form in which it is any use to a computer.

Another example is provided by the common practice of providing a PDF version of a document rather than the original text file. This, perhaps intentionally, hinders access to the underlying text and inhibits activities such as annotation or indexing.
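The database example above can be made concrete with a small sketch. It assumes a hypothetical export of three records in CSV form (the titles, years and licenses are invented for illustration) and shows that, once the 'source' data is available, a program can parse and aggregate it directly with standard tools, with no need to go through a web query page.

```python
import csv
import io

# Hypothetical raw "source" data, as an open database might export it.
# The records are invented purely for illustration.
RAW_CSV = """title,year,license
Work A,2005,CC-BY
Work B,2007,CC-BY-SA
Work C,2007,CC-BY
"""


def load_records(raw_text):
    """Parse CSV text into a list of dicts, one dict per record."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def count_by(records, field):
    """Count how many records share each value of the given field."""
    counts = {}
    for record in records:
        value = record[field]
        counts[value] = counts.get(value, 0) + 1
    return counts


records = load_records(RAW_CSV)
print(count_by(records, "year"))
print(count_by(records, "license"))
```

The same records offered only through a search form or a rendered page would first have to be scraped and reverse-engineered before any such counting could be done, which is the limitation the passage describes.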

Technologically Open

Technological openness requires that knowledge is provided in a form and format that does not unnecessarily hinder access by humans or machines. This can be achieved by utilizing data formats and tools that are open - meaning that a full specification is publicly available and unencumbered by legal restraints, and that access and use of the formats will not require proprietary tools or products (for more information on 'openness' of formats see the Information Accessibility Initiative).

It also means providing the necessary documentation, structuring and presentation of data so as to ensure comprehensibility and usability. One should aim to achieve these ends not just for humans but also for computers - something that is increasingly essential in an information age.


Differences between open/free access to code, to text, to data

This is a contribution by open access publishing expert Stevan Harnad:

"It would be a *great* conceptual and strategic mistake for the movement dedicated to open access to peer-reviewed research (BOAI) to conflate its sense of "free" vs. open" with the sense of "free vs. open" as it is used in the free/open-source software movements. The two senses are not at all the same, and importing the software-movements' distinction just adds to the still widespread confusion and misunderstanding that there is in the research community about toll-free access.

I will try to state it in the simplest and most direct terms possible: Software is code that you use to *do* things. It may not be enough to let you use the code for free to do things, because one of the things you may want to do is to modify the code so it will do *other* things. Hence you may need not only free use of the code, but the code itself has to be open, so you can see and modify it.

There is simply *no counterpart* to this in peer-reviewed research article use. None. Researchers, in using one another's articles, are using and re-using the *content* (what the articles are reporting), and not the *code* (i.e., the actual words in the text). Yes, they read the text. Yes (within limits) they may quote it. Yes, it is helpful to be able to navigate the code by character-string and boolean searching. But what researchers are fundamentally *not* doing in writing their own articles (which build on the articles they have read) is anything faintly analogous to modifying the code for the original article!

I hope that that is now transparent, having been pointed out and written in longhand like this. So if it is obvious that what researchers do with the articles they read is not to modify the text in order to generate a new text, as programmers may modify a program to generate a new program, then where on earth did this open/free source/access conflation come from?

And there is a second conflation inherent in it, namely, a conflation between research publishing (i.e., peer-reviewed journal articles) and public data-archiving (scientific and scholarly databases consisting of the raw and processed data on which the research reports are based).

Digital data archiving (e.g., the various genome databases, astrophysical databases, etc.) is relatively new, and it is a powerful *supplement* to peer-reviewed article publishing. In general, the data are not *in* the published article, they are *associated with* it. In paper days, there was not the page-quota or the money to publish all the data. And even in digital days, there is no standardized practice yet of making the raw data as public as the research findings themselves; but there is definite movement in that direction, because of its obvious power and utility.

The point, however, is this: As of today, articles and data are not the same thing. The 2,000,000 new articles appearing every year in the planet's 20,000 peer-reviewed journals (the full-text literature that -- as we cannot keep reminding ourselves often enough, apparently -- the open/free access movement is dedicated to freeing from access-tolls) consists of articles only, *not* the research data on which the articles are based.

Hence, today, the access problem concerns toll-access to the full-texts of 2,000,000 articles published yearly, not access to the data on which they are based (most of which are not yet archived online, let alone published; and, when they *are* archived online, they are often already publicly accessible toll-free!).

No doubt research practices will evolve toward making all data accessible to would-be users, along with the articles reporting the research findings. This is quite natural, and in line with researchers' desire to maximize the use and hence the impact of their research. What may happen is that journals will eventually include some or all the underlying data as part of the peer-reviewed publication itself (there may even be "peer-reviewed data"), but in an online digital supplement only, rather than in the paper edition.

(What is *dead-certain*, though, is that, as this happens, authors will not be idiotic enough to sign over copyright for their research data to their publishers, the same way they have been signing over copyright for the texts of their research reports! So let's not even waste time on that implausible hypothetical contingency. The research community may be slow off the mark in reaching for the free-access that is already within its grasp in the online era, but they have not altogether taken leave of their senses!)

But that bridge (digital data supplements), if it ever comes, can be crossed if/when we get to it. Right now, when we are talking about the peer-reviewed literature to which we are trying to free access we are talking about *articles* and not about *data*. Hence, exactly as in the conflation of text with software in the invalid and misleading open/free source analogy, the conflation of open/free full-text access to the refereed literature with hypothetical questions about data-access and data re-use and re-analysis capability is likewise invalid and misleading. Article-access and data-access are different, and it is only the first that is at issue today."

Openness and Attention

This is from an article by Michael Goldhaber in First Monday, which discusses openness in the context of an Attention Economy, from the point of view of the interests of an individual:

"Varieties of openness to strive for

0) Dissemination of your thoughts and other expression as widely as possible. This is the basic openness the Internet so well permits.

1) Dissemination with some possibility of audience feedback.

2) Open access — having one’s thoughts, expressions, etc., as available and accessible as possible, with as few barriers as possible. Obviously, charging money for access is one such highly limiting barrier. If such a barrier were easily enforceable, it would be even worse in limiting the attention one can get. (At times, though, it works to employ the tactic of temporarily concealing something, so as to create wide suspense and — hopefully — surprise and éclat when one finally lowers the veil. However, through overuse, the stratagem risks easy staleness — whereupon it can turn away attention as successfully as any other barrier.)

3) Self–revealing — the more aspects of oneself that express who one is, the more opportunities exist for people to align their minds to one, and the richer the accumulation of attention one may get. (Let me add it possibly may not pay a software programmer to also put sex videos on the Internet, especially if it turns out the programmers are especially squeamish about such videos or even worse, totally oblivious. It may also be too personally painful, of course, to contemplate doing that.)

4) Claiming priority by putting out one’s thoughts in their most preliminary forms. Since “waiting for” is one form that attention certainly takes, early hints can be successful and tantalizing, as long as they are not simply tricky come–ons as described above. More importantly, by putting out ideas as soon as one has them, one increases the chances that further development in the field will be understood according to one’s own thoughts, and the chances that minds will align to yours. Even if someone else, shortly later, goes faster or further than you have or can, if your thoughts gain any notice, you remain in the important role of founder and mover.

5) One step beyond simply putting out an idea is defining a project, and possibly making you the prime anchor point for its further articulation and development. You might list what you see as the next steps, and then choose the best versions offered, etc.

6) Encouraging an entourage to form around you, or becoming openly part of one if you are not a star.

7) Building semi–independent fan–bases for acknowledged or new stars of any sort, and revealing oneself as a fan. (See, e.g., MySpace in this regard.)

8) If you are a star, offering ways for your fans to commune with each other, and to some degree with you.

9) Striving at least to create more or less purely equal communing communities"

More Information

The Open Knowledge tag at Delicious, maintained by Samuel Rose.