Too Big to Know

From P2P Foundation

* Book: Too Big to Know. David Weinberger.


How the Internet is changing knowledge, and, in turn, how it is changing us.


Interview conducted by Rebecca J. Rosen:

"In your book, you argue that we are in a new age of "networked knowledge," meaning that knowledge -- ideas, information, wisdom even -- has broken out of its physical confines (the pages of a book or the mind of a person) and now exists in a hyperconnected online state. You say that this new structure "feels more natural because the old ideals of knowledge were never realistic." In what ways does it feel more natural? What were these old ideals of knowledge and in what ways were they unnatural?

We've known for a long time that there was more going on in the world than our libraries could contain or our media could show us. We've known that experts are not as reliable as they often were made out to be. We've known that the world is less ready and able to come to rational agreement than we'd been promised. We've known that much of our codified knowledge is less than perfectly reliable. We've known that the topical domains into which we divide knowledge so we can master them are not nearly as separate as their shelves in the library indicate. We've known that we're little creatures in a universe vast beyond our ability to exaggerate.

Yet the short version of the history of knowledge goes something like: Plato defines knowledge as justified true belief. We then gradually increase the criteria of justification until knowledge has to pass a very high bar indeed. Knowledge comes to be that which we can know with certainty, what is settled and beyond reasonable dispute. Yet, there is one basic fact about us human beings: We are profoundly fallible. We've known since the dawn of civilization that we basically get everything wrong and then die. The demand for certainty and clarity placed on creatures who recognize their own uncertainty is, in some sense, unnatural.

I think the Net generation is beginning to see knowledge in a way that is closer to the truth about knowledge -- a truth we've long known but couldn't instantiate. My generation, and the many generations before mine, have thought about knowledge as being the collected set of trusted content, typically expressed in libraries full of books. Our tradition has taken the trans-generational project of building this Library of Knowledge book by book as our God-given task as humans. Yet, for the coming generation, knowing looks less like capturing truths in books than engaging in never-settled networks of discussion and argument. That social activity -- collaborative and contentious, often at the same time -- is a more accurate reflection of our condition as imperfect social creatures trying to understand a world that is too big and too complex for even the biggest-headed expert.

This new topology of knowledge reflects the topology of the Net. The Net (and especially the Web) is constructed quite literally out of links, each of which expresses some human interest. If I link to a site, it's because I think it matters in some way, and I want it to matter that way to you. The result is a World Wide Web with billions of pages and probably trillions of links that is a direct reflection of what matters to us humans, for better or worse. The knowledge networks that live in this new ecosystem share in that property; they are built out of, and reflect, human interest. Like our collective interests, the Web and the knowledge that resides there is at odds and linked in conversation. That's why the Internet, for all its weirdness, feels so familiar and comfortable to so many of us. And that's the sense in which I think networked knowledge is more "natural."

One of the central metaphors in your book is that the smartest person in the room is no longer a person but the room itself. But, you caution, this also means that if the room -- the network -- is stupid, we ourselves will be made more stupid. You write that "our task is to learn how to build smart rooms." What features would a smart room have? What are features of the network as it is now that you find worrisome?

I'll start with the negative.

The big worry is that when we're given lots of choices of what to read (or view, etc.), we'll tend to read that with which we already agree. This further confirms our current beliefs, and perhaps results in our moving to further extremes. This is called the "echo chamber" argument, and it is most famously associated with Cass Sunstein, a Harvard law professor currently in the Obama White House.

Assessing the danger posed by echo chambers is very difficult. All I'll say here is that we should assume it's a real danger, and we should work against it as parents, teachers, citizens, and participants on the Web.

But, I think we should not let the very real dangers posed by echo chambers blind us to the degree to which we need sameness in order just to have a conversation that advances our thinking. The speakers need to share a language, have a deep set of assumptions and norms in common, have the same goal for the conversation -- are you passing time, trying to make a friend, trying to make a deal, etc. -- and have a topic that they're both interested in. While too much sameness can lead to an echo chamber, a conversation cannot happen without a Costco-size shopping cart of samenesses.

So, to make a smart room -- a knowledge network -- you have to have just enough diversity. And it has to be the right type of diversity. Scott Page in The Difference says that a group needs a diversity of perspectives and skill sets if it is going to be smarter than the smartest person in it. It also clearly needs a set of coping skills, norms, and procedures that enable it to deal with diversity productively. For example, let's say you're on a mailing list that's talking about how to bake the perfect cheesecake, and someone enters who wants to talk about how cheesecake will clog your arteries, how it diverts precious resources from those in need, and how it relies upon agricultural techniques that are killing the planet. Those are three reasonable objections to making cheesecake, and your list may want to pursue them. But it may not. It may want to stick with figuring out how to make tastier cheesecakes. It will therefore need some norms that say how off-topic a thread can become and what happens to offenders. It may also adopt a forking technique that is very helpful online: those who want to talk about the morality of cheesecake have plenty of space on the Net where they can have that discussion while the cheesecake recipe thread continues. Many such environments benefit from having moderators. Many use some form of peer filtering to vote comments up, down, or away. Whatever the techniques, if a knowledge network is to be smarter than its members, it needs to incorporate enough diversity and the right types of diversity, and it needs ways to deal sensitively when that diversity threatens to disrupt it.
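The peer-filtering mechanism mentioned above -- voting comments up, down, or away -- can be sketched in a few lines. This is a hypothetical minimal model (the `Comment` class, the scoring rule, and the visibility threshold are all assumptions for illustration), not any particular site's algorithm.

```python
from dataclasses import dataclass

@dataclass
class Comment:
    """A comment with community votes; the scoring rule is illustrative."""
    text: str
    up: int = 0
    down: int = 0

    def score(self) -> int:
        return self.up - self.down

def visible_comments(comments, threshold=-2):
    """Hide comments voted 'away' (score below threshold); rank the rest by score."""
    kept = [c for c in comments if c.score() >= threshold]
    return sorted(kept, key=lambda c: c.score(), reverse=True)

comments = [
    Comment("Try a water bath for the cheesecake.", up=5, down=0),
    Comment("Cheesecake is destroying the planet!", up=1, down=6),
    Comment("Lower the oven temperature.", up=3, down=1),
]
for c in visible_comments(comments):
    print(c.score(), c.text)
```

The off-topic objection is voted down past the threshold and disappears from view, while the on-topic advice is ranked by community approval -- a crude but common way a group enforces its norms without a moderator.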

The really hard part is that there is no good way for a knowledge network to be sure that a disruption isn't a breakthrough. That is an eternal human predicament.

You write that objectivity, as a goal or even a possibility, has "fallen out of favor in our culture," so much so "that in 1996 the Society of Professional Journalists' Code of Ethics dropped it as an official value." You go on to say that our disillusionment with objectivity began long before the arrival of the Internet and, with it, networked knowledge. What have been the effects of the network on how we think about objectivity and facts? What values are taking its place?

Our distrust of objectivity predates the Internet. Indeed, much of what's happening to knowledge was prefigured by the postmodernists (just about none of whom accept that label). But it makes a huge difference that now the dominant (or soon to be dominant) medium is free of the old limitations. And the postmodernists could not have predicted the linked and open nature of the Net.

At its most stringent, objectivity is, as the press critic Jay Rosen calls it, borrowing a phrase from the philosopher Thomas Nagel, "the view from nowhere." Seeing the world as if a person with no point of view were looking at it is a weird and unnatural idea. We humans can only see things from a point of view, and we can only understand things by appropriating them into our already-existing context. (I am told there's the possibility through Eastern disciplines of seeing things without starting from a situated self, but I don't know enough about that to have an opinion.)

In fact, the idea of objectivity arose in response to the limitations of paper, as did so much of our traditional Western idea of knowledge. Paper is a disconnected medium. So, when you write a news story, you have to encapsulate something quite complex in just a relatively small rectangle of print. You know that the reader has no easy way to check what you're saying, or to explore further on her own; to do so, she'll have to put down the paper, go to a local library, and start combing through texts that are less current than the newspaper in which your article appears. The reporter was the one mediator of the world the reader would encounter, so the report had to avoid the mediator's point of view and try to reflect all sides of contentious issues. Objectivity arose to address the disconnected nature of paper.

Our new medium is, of course, wildly connective. Now we can explore beyond the news rectangle just by clicking. There is no longer an imperative to squeeze the world into small, self-contained boxes. Hyperlinks remove the limitations that objectivity was invented to address.

Hyperlinks also enable readers to understand -- and thus perhaps discount -- the writer's point of view, which is often a better way of getting past the writer's prejudices than asking the writer to write as if she or he had none. This, of course, inverts the old model that assumed that if we knew about the journalist's personal opinions, her or his work would be less credible. Now we often think that the work becomes more credible if the author is straightforward about his or her standpoint. That's the sense in which transparency is the new objectivity.

There is still value in trying to recognize how one's own standpoint and assumptions distort one's vision of the world; emotional and conceptual empathy are of continuing importance because they are how we embody the truth that we share a world with others to whom that world matters differently. But we are coming to accept that we can't really get a view from nowhere, and if we could, we would have no idea what we're looking at."


On the impact of Big Data

David Weinberger:

"In this excerpt from my new book, Too Big To Know, we'll look at a key property of the networking of knowledge: hugeness.

In 1963, Bernard K. Forscher of the Mayo Clinic complained in a now famous letter printed in the prestigious journal Science that scientists were generating too many facts. Titled "Chaos in the Brickyard," the letter warned that the new generation of scientists was too busy churning out bricks -- facts -- without regard to how they go together. Brickmaking, Forscher feared, had become an end in itself. "And so it happened that the land became flooded with bricks. ... It became difficult to find the proper bricks for a task because one had to hunt among so many. ... It became difficult to complete a useful edifice because, as soon as the foundations were discernible, they were buried under an avalanche of random bricks."

If science looked like a chaotic brickyard in 1963, Dr. Forscher would have sat down and wailed if he were shown the Global Biodiversity Information Facility (GBIF). Over the past few years, GBIF has collected thousands of collections of fact-bricks about the distribution of life over our planet, from the bacteria collection of the Polish National Institute of Public Health to the Weddell Seal Census of the Vestfold Hills of Antarctica. GBIF is designed to be just the sort of brickyard Dr. Forscher deplored -- information presented without hypothesis, theory, or edifice -- except far larger, because the good doctor could not have foreseen the networking of brickyards.

Indeed, networked fact-based brickyards are a growth industry. For example, Proteome Commons, an independent project created by a graduate student, offers information about the proteins specific to various organisms, making available almost 13 million data files, for a total of 12.6 terabytes of information. The data come from scientists from around the world, and are made available to everyone, for free. The Sloan Digital Sky Survey -- under the modest tag line "Mapping the Universe" -- has been gathering and releasing maps of the skies gathered from 25 institutions around the world. Its initial survey, completed in 2008 after eight years of work, published information about 230 million celestial objects, including 930,000 galaxies; each galaxy contains millions of stars, so this brickyard may grow to a size where we have trouble naming the number. The best known of the new data brickyards, the Human Genome Project, in 2001 completed mapping the entire genetic blueprint of the human species; it has been surpassed in terms of quantity by the International Nucleotide Sequence Database Collaboration, which as of May 2009 had gathered 250 billion pieces of genetic data.

There are three basic reasons scientific data has increased to the point that the brickyard metaphor now looks 19th century. First, the economics of deletion have changed. We used to throw out most of the photos we took with our pathetic old film cameras because, even though they were far more expensive to create than today's digital images, photo albums were expensive, took up space, and required us to invest considerable time in deciding which photos would make the cut. Now, it's often less expensive to store them all on our hard drive (or at some website) than it is to weed through them.

Second, the economics of sharing have changed. The Library of Congress has tens of millions of items in storage because physics makes it hard to display and preserve, much less to share, physical objects. The Internet makes it far easier to share what's in our digital basements. When the datasets are so large that they become unwieldy even for the Internet, innovators are spurred to invent new forms of sharing. For example, Tranche, the system behind ProteomeCommons, created its own technical protocol for sharing terabytes of data over the Net, so that a single source isn't responsible for pumping out all the information; the process of sharing is itself shared across the network. And the new Linked Data format makes it easier than ever to package data into small chunks that can be found and reused. The ability to access and share over the Net further enhances the new economics of deletion; data that otherwise would not have been worth storing have new potential value because people can find and share them.

Third, computers have become exponentially smarter. John Wilbanks, vice president for Science at Creative Commons (formerly called Science Commons), notes that "[i]t used to take a year to map a gene. Now you can do thirty thousand on your desktop computer in a day. A $2,000 machine -- a microarray -- now lets you look at the human genome reacting over time." Within days of the first human being diagnosed with the H1N1 swine flu virus, the H1 sequence of 1,699 bases had been analyzed and submitted to a global repository. The processing power available even on desktops adds yet more potential value to the data being stored and shared.

The brickyard has grown to galactic size, but the news gets even worse for Dr. Forscher. It's not simply that there are too many brick-facts and not enough edifice-theories. Rather, the creation of data galaxies has led us to science that sometimes is too rich and complex for reduction into theories. As science has gotten too big to know, we've adopted different ideas about what it means to know at all.

For example, the biological system of an organism is complex beyond imagining. Even the simplest element of life, a cell, is itself a system. A new science called systems biology studies the ways in which external stimuli send signals across the cell membrane. Some stimuli provoke relatively simple responses, but others cause cascades of reactions. These signals cannot be understood in isolation from one another. The overall picture of interactions even of a single cell is more than a human being made out of those cells can understand. In 2002, when Hiroaki Kitano wrote a cover story on systems biology for Science magazine -- a formal recognition of the growing importance of this young field -- he said: "The major reason it is gaining renewed interest today is that progress in molecular biology ... enables us to collect comprehensive datasets on system performance and gain information on the underlying molecules." Of course, the only reason we're able to collect comprehensive datasets is that computers have gotten so big and powerful. Systems biology simply was not possible in the Age of Books.

The result of having access to all this data is a new science that is able to study not just "the characteristics of isolated parts of a cell or organism" (to quote Kitano) but properties that don't show up at the parts level. For example, one of the most remarkable characteristics of living organisms is that we're robust -- our bodies bounce back time and time again, until, of course, they don't. Robustness is a property of a system, not of its individual elements, some of which may be nonrobust and, like ants protecting their queen, may "sacrifice themselves" so that the system overall can survive. In fact, life itself is a property of a system.

The problem -- or at least the change -- is that we humans cannot understand systems even as complex as that of a simple cell. It's not that we're awaiting some elegant theory that will snap all the details into place. The theory is well established already: Cellular systems consist of a set of detailed interactions that can be thought of as signals and responses. But those interactions surpass in quantity and complexity the human brain's ability to comprehend them. The science of such systems requires computers to store all the details and to see how they interact. Systems biologists build computer models that replicate in software what happens when the millions of pieces interact. It's a bit like predicting the weather, but with far more dependency on particular events and fewer general principles.

Models this complex -- whether of cellular biology, the weather, the economy, even highway traffic -- often fail us, because the world is more complex than our models can capture. But sometimes they can predict accurately how the system will behave. At their most complex these are sciences of emergence and complexity, studying properties of systems that cannot be seen by looking only at the parts, and cannot be well predicted except by looking at what happens.

This marks quite a turn in science's path. For Sir Francis Bacon 400 years ago, for Darwin 150 years ago, for Bernard Forscher 50 years ago, the aim of science was to construct theories that are both supported by and explain the facts. Facts are about particular things, whereas knowledge (it was thought) should be of universals. Every advance of knowledge of universals brought us closer to fulfilling the destiny our Creator set for us.

This strategy also had a practical side, of course. There are many fewer universals than particulars, and you can often figure out the particulars if you know the universals: If you know the universal theorems that explain the orbits of planets, you can figure out where Mars will be in the sky on any particular day on Earth. Aiming at universals is a simplifying tactic within our broader traditional strategy for dealing with a world that is too big to know by reducing knowledge to what our brains and our technology enable us to deal with.

We therefore stared at tables of numbers until their simple patterns became obvious to us. Johannes Kepler examined the star charts carefully constructed by his boss, Tycho Brahe, until he realized in 1605 that if the planets orbit the Sun in ellipses rather than perfect circles, it all makes simple sense. Three hundred fifty years later, James Watson and Francis Crick stared at x-rays of DNA until they realized that if the molecule were a double helix, the data about the distances among its atoms made simple sense. With these discoveries, the data went from being confoundingly random to revealing an order that we understand: Oh, the orbits are elliptical! Oh, the molecule is a double helix!

With the new database-based science, there is often no moment when the complex becomes simple enough for us to understand it. The model does not reduce to an equation that lets us then throw away the model. You have to run the simulation to see what emerges. For example, a computer model of the movement of people within a confined space who are fleeing from a threat--they are in a panic--shows that putting a column about one meter in front of an exit door, slightly to either side, actually increases the flow of people out the door. Why? There may be a theory or it may simply be an emergent property. We can climb the ladder of complexity from party games to humans with the single intent of getting outside of a burning building, to phenomena with many more people with much more diverse and changing motivations, such as markets. We can model these and perhaps know how they work without understanding them. They are so complex that only our artificial brains can manage the amount of data and the number of interactions involved.
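An agent-based egress model of the kind described can be sketched as a toy cellular automaton. This sketch only sets up the basic dynamics -- agents greedily stepping toward a single exit, with congestion arising because each cell holds one agent -- and makes no attempt to reproduce the column result; the grid size, agent count, and movement rules are all illustrative assumptions.

```python
import random

def simulate_egress(n_agents=50, width=20, height=20, exit_x=10, steps=200, seed=0):
    """Toy cellular egress model: agents on a grid walk toward an exit cell at
    (exit_x, 0); each cell holds at most one agent, so a jam forms at the door."""
    rng = random.Random(seed)
    agents = set()
    while len(agents) < n_agents:  # scatter agents at distinct cells above the wall
        agents.add((rng.randrange(width), rng.randrange(1, height)))
    escaped = 0
    for _ in range(steps):
        for pos in sorted(agents):  # snapshot gives a deterministic update order
            x, y = pos
            step_x = 1 if x < exit_x else (-1 if x > exit_x else 0)
            # Prefer stepping toward the exit wall; otherwise sidestep toward exit_x.
            for nx, ny in [(x, y - 1), (x + step_x, y)]:
                if ny == 0 and nx == exit_x:  # reached the door: leave the room
                    agents.discard(pos)
                    escaped += 1
                    break
                # Row y=0 is wall except the door; cells must be in-bounds and empty.
                if ny >= 1 and 0 <= nx < width and (nx, ny) != pos and (nx, ny) not in agents:
                    agents.discard(pos)
                    agents.add((nx, ny))
                    break
    return escaped

print(simulate_egress())
```

Even this crude model shows the essential point: the flow out of the door is an emergent property of many local interactions, observable only by running the simulation, not by inspecting any single agent's rule.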

The same holds true for models of purely physical interactions, whether they're of cells, weather patterns, or dust motes. For example, Hod Lipson and Michael Schmidt at Cornell University designed the Eureqa computer program to find equations that make sense of large quantities of data that have stumped mere humans, including cellular signaling and the effect of cocaine on white blood cells. Eureqa looks for possible equations that explain the relation of some likely pieces of data, and then tweaks and tests those equations to see if the results more accurately fit the data. It keeps iterating until it has an equation that works.
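The iterate-and-test loop described above can be illustrated with a deliberately tiny sketch: enumerate a handful of candidate expression forms, fit a constant to each, and keep whichever best explains the data. Eureqa itself uses evolutionary search over a far richer expression space; the candidate list and data here are purely illustrative assumptions.

```python
# Toy symbolic regression: enumerate small candidate expressions in x with one
# integer constant, score each against the data by squared error, keep the best.
def symbolic_search(xs, ys):
    candidates = [
        ("x", lambda x, a: x),
        ("a", lambda x, a: a),
        ("x+a", lambda x, a: x + a),
        ("x*a", lambda x, a: x * a),
        ("x*x", lambda x, a: x * x),
        ("x*x+a", lambda x, a: x * x + a),
    ]
    best = None
    for name, f in candidates:
        for a in range(-5, 6):  # the "tweak" step: try each constant value
            err = sum((f(x, a) - y) ** 2 for x, y in zip(xs, ys))
            if best is None or err < best[0]:
                best = (err, name, a)
    return best  # (error, expression form, fitted constant)

xs = list(range(-5, 6))
ys = [x * x + 3 for x in xs]  # the hidden law the search must recover
print(symbolic_search(xs, ys))  # → (0, 'x*x+a', 3)
```

The search recovers the generating equation with zero error, but nothing in the loop "understands" why that form is right -- which is exactly the gap between a fitted equation and an explanation that the excerpt goes on to describe.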

Dr. Gurol Suel at the University of Texas Southwestern Medical Center used Eureqa to try to figure out what causes fluctuations among all of the thousands of different elements of a single bacterium. After chewing over the brickyard of data that Suel had given it, Eureqa came out with two equations that expressed constants within the cell. Suel had his answer. He just doesn't understand it and doesn't think any person could. It's a bit as if Einstein dreamed E = mc², and we confirmed that it worked, but no one could figure out what the c stands for.

No one says that having an answer that humans cannot understand is very satisfying. We want Eureka and not just Eureqa. In some instances we'll undoubtedly come to understand the oracular equations our software produces. On the other hand, one of the scientists using Eureqa, biophysicist John Wikswo, told a reporter for Wired: "Biology is complicated beyond belief, too complicated for people to comprehend the solutions to its complexity. And the solution to this problem is the Eureqa project." The world's complexity may simply outrun our brains' capacity to understand it.

Model-based knowing has many well-documented difficulties, especially when we are attempting to predict real-world events subject to the vagaries of history; a Cretaceous-era model of that era's ecology would not have included the arrival of a giant asteroid in its data, and no one expects a black swan. Nevertheless, models can have the predictive power demanded of scientific hypotheses. We have a new form of knowing.

This new knowledge requires not just giant computers but a network to connect them, to feed them, and to make their work accessible. It exists at the network level, not in the heads of individual human beings."