Everything is Miscellaneous
Book: David Weinberger. Everything is Miscellaneous: The Power of the New Digital Disorder. Harvard University Press, 2007 (paperback: Times Books)
“To get as good at browsing as we are at finding — and to take full advantage of the digital opportunity — we have to get rid of the idea that there’s a best way of organizing the world.” (http://www.radioopensource.org/weinbergers-miscellany/)
The following is an edited transcript of David Weinberger’s new book talk at Harvard.
By Colin Rhinesmith at http://colinrhinesmith.com
For the full audio version including Q&A please visit http://tinyurl.com/yulfzd
The book is called Everything is Miscellaneous. The overall idea is that we are a really well organized species. We like to organize things really neatly. But whether it's a kitchen or a library, we almost always have stuff that doesn't fit. So we create a category called miscellaneous. And if it gets too big, then your organization has failed. What the book suggests is that as we digitize everything, the miscellaneous category is going to eat the entire chart, and that's a good thing. It's good for business, it's good for science, it's good for education, it's good for politics . . . In general that's a good thing, although it's quite counterintuitive.
When one person gets to organize his or her way of thinking, one person, one way, it's incredibly limited. It seems like the very nature and purpose of reality is to keep things apart. We’ve been organizing stuff for thousands of years by obeying the two basic principles of reality. The first is that you cannot have two things in the same spot at the same time, no matter how hard you try.
The second rule is that everything has to be in a place. In every domain, whether it's the commercial realm, education, or science, in how we organize species or physical objects in a museum, everything has to go somewhere. So two basic principles are baked into reality, and they have some political consequences. They result instantly in authority coming forward, because someone has to decide what’s going to make it onto the front page and what the order will be.
Generally we make good decisions about this, but nevertheless it's a single group's decision (for example, an editorial board that's supposed to reflect our best interests). That's true of anything physical, whether it's a newspaper or an encyclopedia that tries to fit all of our knowledge into 32 volumes and 65,000 topics, as in the Britannica, where some set of people decides what's important and how to organize it in a single way.
Nature says that there is a single order. We have had this assumption for a long time and we still believe this with some depth and fervor. We do this because we want there to be order. We categorize. But it turns out that categorizing is just bringing things together that are alike, putting them next to each other whether physically or mentally. But we get to choose the things that make them alike. This works because the universe consists of things that tend to cluster.
We like to know where things are in their relation to everything else. But this way of organizing is limited by the physical. When we put away our clean laundry we make piles, we lump and we split. As a result, if you map this, you end up with a tree. You start with a big lump of laundry, but in the end, by making binary decisions, you create a tree. And the trees that we create (these guides) have been the pinnacle of how the world is ordered. And how we organize our ideas is constrained by the same limits that we had when sorting our laundry. That's a constraint that we no longer need.
Now we're digitizing everything and that changes everything. So it's useful to think about three orders of order:
1. You organize the physical things themselves. You put them on shelves, put them in folders. And you come up with some order of doing things. The Dewey decimal system is, for example, one very good way to organize books.
2. You separate the metadata about the things and you arrange it separately, which has tremendous advantages. By making a file card you greatly reduce the information so it fits on a 3x5 card. Because the cards are so small (which is a physical limitation) you can organize them in maybe 3 different ways (for example: author, title, subject). This is much more convenient for finding things.
3. Everything is digital and online. The content and the information about that content. And that changes the basic principles that we've had for organizing physical things for thousands and thousands of years. The assumption has been in the physical world that a leaf can only go on one branch. Online, if you have a digital store, you are going to put a camera into as many categories as possible so that people will find it. Amazon is a master of this. You can go to an Amazon page and just count the different ways that they've organized it.
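The difference between a leaf that sits on one branch and a camera that sits in many categories can be sketched in a few lines of Python. This is only an illustration: the `Catalog` class, the item names, and the category names are hypothetical, and nothing here describes how Amazon or any real store works.

```python
from collections import defaultdict


class Catalog:
    """Toy third-order catalog: one item may hang from many categories."""

    def __init__(self):
        # category name -> set of item names (hypothetical structure)
        self._by_category = defaultdict(set)

    def add(self, item, categories):
        # The same item is filed under every category given.
        for category in categories:
            self._by_category[category].add(item)

    def browse(self, category):
        # .get avoids creating empty categories as a side effect.
        return sorted(self._by_category.get(category, set()))

    def categories_of(self, item):
        return sorted(c for c, items in self._by_category.items() if item in items)


catalog = Catalog()
catalog.add("digital camera", ["cameras", "travel gear", "graduation gifts"])
catalog.add("tripod", ["cameras", "travel gear"])

print(catalog.browse("graduation gifts"))
print(catalog.categories_of("digital camera"))
```

On a physical shelf the camera would have to be in exactly one of those places. Here each extra category is just another entry pointing at the same item, so adding categories costs almost nothing and removes nothing from the others.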
Physically you want a nice neat arrangement because otherwise you have entropy going on, wasted effort, you don't know where to find things. Online you want as much messiness as possible. That's because you can organize on top of the mess. You’re organizing the metadata. You don't have to actually touch the stuff itself. So if you have a web post that's got so many links that you can't even follow them, that's a huge success. It's enriched by all of this messiness.
In the real world we're really used to thinking about the content and all the information about it as two different things. When it's online that difference disappears. When everything is online you can say, “I know the first line of a book, but I don't know who wrote it”. There is no difference between data and metadata anymore when everything is online. The only difference is that metadata is the thing you know and data is the thing that you don't know, but you're trying to find out. This is important because we use metadata as a lever to pry up what we don't know based upon what we do. And if everything is now a lever, then we are way smarter than we were before everything went online. We have so many more ways of finding what we don't know based upon the little that we do.
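A small sketch can show what it means for any piece of a book to act as a lever. The code below indexes every word of the author, the title, and the text in the same way, so a query can start from whichever of them you happen to know. The two-book collection, the function names, and the tokenizing rule are assumptions made for illustration, not a description of any real search engine.

```python
import re
from collections import defaultdict

# Hypothetical two-book collection; only the opening or a famous line is stored.
BOOKS = [
    {"author": "William Shakespeare", "title": "King Lear",
     "text": "How sharper than a serpent's tooth it is to have a thankless child!"},
    {"author": "Herman Melville", "title": "Moby-Dick",
     "text": "Call me Ishmael."},
]


def tokens(s):
    # Lowercase words, keeping apostrophes so "serpent's" stays one token.
    return re.findall(r"[a-z']+", s.lower())


def build_index(books):
    # Every field is indexed identically: there is no separate "metadata" index.
    index = defaultdict(set)
    for i, book in enumerate(books):
        for field in ("author", "title", "text"):
            for word in tokens(book[field]):
                index[word].add(i)
    return index


def search(index, books, query):
    words = tokens(query)
    if not words:
        return []
    # A book matches when it contains every word of the query in any field.
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return [books[i] for i in sorted(hits)]


index = build_index(BOOKS)
# Know the first line, find the author.
print(search(index, BOOKS, "call me ishmael")[0]["author"])
# Know the author, find the text.
print(search(index, BOOKS, "shakespeare")[0]["text"])
```

The same index answers both questions. Which field counts as "metadata" depends only on what the person searching already knows.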
It used to be that the people who owned the stuff also owned the organization of it. Now that's not true anymore. The people who own the stuff don't own the organization. We own the organization. Which means that now we are making up ways that we can sort through all this stuff and find it.
So another way that we are trying to pull ourselves together, once we lose the classification and organization that is given to us by single individuals in authority, is by tagging (for example, del.icio.us). But because we are an insanely social species, we'll also most likely notice that some of the people are finding really interesting stuff. So tags are intensely practical. They are being taken up by corporations to share the stuff that people are finding. But there is something more going on, as well.
There is a lot of joy in tagging because, in part, it is a way of sticking it to the man. It's a way of saying, “We will classify. We are in charge of what's interesting”. So, doesn't this create chaos?
Well, it turns out when you have enough tags (for example, Flickr) there is so much data there, just in the tag set itself, that they are able to cluster photos without knowing anything about the photos except the tags they are using. And it's remarkably precise. So when you have enough tags (despite what common sense would say) you don't necessarily end up with chaos. You may end up with actually more meaning and quite precise meaning.
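How tags alone can produce clusters can be sketched with a toy example. The code below groups photos whose tag sets overlap strongly, using Jaccard similarity and a simple union-find. This is not Flickr's actual method, which the talk does not describe; the photo ids, the tags, and the 0.3 threshold are all hypothetical.

```python
def jaccard(a, b):
    # Share of tags two photos have in common, between 0.0 and 1.0.
    return len(a & b) / len(a | b) if (a or b) else 0.0


def cluster_by_tags(photos, threshold=0.3):
    """Group photo ids whose tag sets are linked by similarity >= threshold."""
    ids = list(photos)
    parent = {p: p for p in ids}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if jaccard(photos[a], photos[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for p in ids:
        clusters.setdefault(find(p), []).append(p)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical photos: the tag "jaguar" is ambiguous on its own.
PHOTOS = {
    "p1": {"jaguar", "car", "classic"},
    "p2": {"jaguar", "car", "road"},
    "p3": {"jaguar", "zoo", "cat"},
    "p4": {"jaguar", "cat", "jungle"},
}

print(cluster_by_tags(PHOTOS))
```

The code never looks at the images. The car photos and the animal photos separate only because of which other tags appear next to "jaguar", which is the sense in which more tags can add meaning instead of chaos.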
In the old way of classifying there was value in winnowing. In the digital world you want to include everything because we have alternative ways of sorting through it. The thing that looks like trash to you, in five years there's going to be a graduate student who will be studying it. So include everything. And instead of structuring everything ahead of time into neat categories, postpone that moment until the user needs it. Because we'll sort through it based on our interest at the moment. Categorization always reflects interests. Our interests change. We can now have that dynamically presented.
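Postponing categorization until the user needs it can be sketched as follows. Everything is accepted into the archive, and a filter is applied only when someone asks a question. The `Archive` class, the sample items, and the `tagged` helper are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """Toy archive that includes everything and filters only on the way out."""
    items: list = field(default_factory=list)

    def publish(self, item):
        # No gatekeeping: nothing is rejected on the way in.
        self.items.append(item)

    def view(self, interest):
        # The reader supplies the filter at the moment of need.
        return [item for item in self.items if interest(item)]


def tagged(tag):
    # Builds a filter that keeps items carrying the given tag.
    return lambda item: tag in item["tags"]


archive = Archive()
archive.publish({"title": "Quarterly strategy essay", "tags": {"business"}})
archive.publish({"title": "Sonnet about spreadsheets", "tags": {"poetry", "business"}})
archive.publish({"title": "Holiday snapshots", "tags": {"photos"}})

print([item["title"] for item in archive.view(tagged("poetry"))])
print([item["title"] for item in archive.view(tagged("business"))])
```

Two readers with different interests get two different views of the same pile, and an item that nobody wants today is still there for whoever wants it later.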
Give us the tools and we will sort through things the way we want. The more, the better. This is radically different than the job that our knowledge workers and editors have had for thousands of years. The people who create the almanac want to get in as much as will fit within a thousand pages. But their value is in keeping stuff out.
Three types of implications follow from all of this:
1. We are in a process of making the world more complex after having to keep it simple in order to organize it. We don't have to keep it simple anymore. And it's an enormous relief not to have to keep it simple anymore. Complexity makes us smarter.
2. The world's greatest expert doesn't matter because he refuses to engage in a public negotiation of knowledge (for example, Wikipedia). Which is what happens when the authority vanishes and we are only left with each other and we engage with one another. This is how we get to the best truth we can manage. This is through the public negotiation of knowledge.
3. Something really important is going on. Human beings seem to advance by externalizing functions of consciousness. What we're doing now, maybe, is externalizing meaning (in a Heideggerian sense). The connection of things enriches them and lets them have the context in which they are what they are. They are there for the next generations to make sense of, to see if there are connections between two things that are tagged the same way.
The semantic web is adding meaning to this collection of chaotic pieces that we have. Every link we make adds semantics, adds meaning to things in piles that we are able to mine and make sense of. And the amazing thing is, it's all ours. This is not done by someone else no matter how wise or smart they are. They can do this too. They can add into this and it becomes ours. It becomes our way of understanding the world. We've never had that ever before and now we do.
The Four Characteristics of Traditional Knowledge Organization
David Weinberger, in his book Everything is Miscellaneous:
"NEW PROPERTIES, NEW STRATEGIES, A NEW SHAPE OF KNOWLEDGE
(From Chapter 5, pp. 100-106. The first four chapters have tried to convince the reader that how we order and classify our world has a history, is always the result of our culture and interests, confers power on those who get to do the classifying, and is complex and messy. I've also introduced the idea that there are three "orders of order": (1) Organizing the things themselves (books, photos...not Dinge an sich!), (2) physically separating the metadata and organizing them (e.g., catalog cards), and (3) digitizing both the content and the metadata. The third order requires us to invent new principles of organization.)
College students' silverware drawers, Delicious, Flickr, the BBC and Wikipedia are miscellaneous in different ways, except for one thing: How their content is actually arranged does not determine how that content can and will be arranged by their users. In some cases - Wikipedia, for example - no one even knows exactly where the raw contents are. These examples are miscellaneous _because_ users don't need to know the inner organization, _because_ that inner order doesn't result in a preferred order of use, and _because_ users have wide flexibility to order the pieces as they want, even and especially in unanticipated ways. This means that the miscellaneous enables _all_ of the information contained in the set to be discovered over time.
But this also means the miscellaneous doesn't much resemble our traditional view of knowledge. Knowledge, we've thought, has four characteristics, two of them modeled on properties of reality and two on properties of political regimes.
As we've seen, the first characteristic of traditional knowledge is that just as there is one reality, there is one knowledge, the same for all. If two people have contradictory ideas about something factual, we think they can't both be right. This is because we've assumed knowledge is an accurate representation of reality, and the real world cannot be self-contradictory. We treat ideas that dispute this view of knowledge with disdain. We label them "relativism" and imagine them to be the devil's work, we sneer at them as "postmodern" and assume that it's just a bunch of French pseudo-intellectual gibberish, or we say "whatever" as a license to stop thinking.
Second, we've assumed that just as reality is not ambiguous, neither is knowledge. If something isn't clear to us, then we haven't understood it. We may not be 100% certain whether the Nile or the Amazon is the longest river, but we're confident one is. Conversely, if there's no possibility of certainty - "Which tastes better, beets or radishes?" - we say it isn't a matter of knowledge at all.
Third, because knowledge is as big as reality, no one person can comprehend it. So we need people who will act as filters, based on education, experience and clear thinking. We call them experts and we give them clipboards. They keep bad information away from us and provide us with the very best information.
Fourth, experts achieve their position by working their way up through social institutions. The people in these institutions are doing their best to be honest and helpful, but, until humans achieve divinity, our organizations will inevitably be subject to corrupting influences. Which groups get funded can determine what a society believes, and funding is often granted by people who know less than the experts: The fate of a DNA research center may rest with Congresspeople who couldn't tell a ribosome from a trombone.
The way we've organized knowledge has been largely determined by these four properties of knowledge. We've tried to settle on a single, comprehensive framework for knowledge, with categories so clear and comprehensive that experts can put each thing in its proper place. Institutions grew to maintain the knowledge framework. Their ability to certify experts and to vouch for knowledge made them powerful and sometimes rich. So, when the miscellaneous shakes our certainty in the nature of knowledge, more than the future of the card catalog is at stake. Because a third order miscellany is digital, not physical, we no longer have to agree on a single framework. Things have their _places_, not a single place. We get to create our own categories, ones that suit our way of thinking. Experts can be helpful, but in the age of the miscellaneous they and their institutions are no longer in charge of our ideas.
Changes in the "Third Order" Digital Era
David Weinberger continues his explanation, focusing on the present changes:
These are big changes, but perhaps the most urgent one is this: Over the course of the millennia, we've developed sophisticated methods and processes for developing, communicating and preserving knowledge. We have major institutions - serious contributors to our culture and our economy - devoted to those tasks. We're good at it. Now we have to invent new ways appropriate to the new shape of knowledge. We are doing so at a pace unparalleled in our history.
Four new strategic principles are emerging, severing the ties between the way we organize physical objects and ideas.
FILTER ON THE WAY OUT, NOT ON THE WAY IN. A friend of mine who worked at the Harvard Business Review tells amusing stories about the "slush pile," the unsolicited manuscripts that arrive every day. Harvard Business Review is a sober journal of research and ideas, yet people submit poetry, short stories, and arty photographs. My friend's job was to go through the slush pile to see what, if anything, was worth passing along for serious consideration. She was a gatekeeper, a filterer, a job that makes sense when the economics and physics of paper force us to make decisions about what knowledge we will publish and thus preserve. We rely on experts such as my friend to spare us from having to wade through the slush pile on our own.
But, when anyone can publish at the press of a button, the social role of gatekeepers changes. For example, from the outside, the "blogosphere" looks like a self-indulgent pool of slush that wouldn't get past the usual publishing filters. While the economics of publishing ensure that most blogs indeed wouldn't be let through the gates, the aggregate value of all the blogs in the "long tail" (to use the term Chris Anderson made popular in his book of that name) - each perhaps of interest only to a few people - is incalculable. This is an inversion of the old model. In a world of parsimonious access to paper, filters increase the value of what's available by excluding the slush. But in the third order, where there's an abundance of access to an abundance of resources, filtering on the way in _decreases_ the value of that abundance by ruling out items that might be of great value to a few people. Filtering on the way out, on the other hand, increases the value of the abundance by locating what's of value to a particular person at a particular moment. For example, a young physics professor at McGill University, Bob Rutledge, started an electronic bulletin board that posts new findings for any research as soon as it can be summarized. Rutledge doesn't apply criteria to decide for the reader whether the research is important enough to be included (though only active, professional astronomers can register to post to the site). It's up to each reader to be the filterer. Similarly, the Public Library of Science's biology journal, a peer-reviewed but free online resource, started PLoS One in November 2006. "The idea is to take the editorializing out of the peer review process," says Hemai Parthasarathy, the managing editor. So long as a paper is "sound," it will be published. If it's good science, _someone_ may find it useful. 
So long as the user has good tools for finding what she needs - and this is a task many are working on - filtering on the way out vastly increases our shared potential for knowledge.
PUT EACH LEAF ON AS MANY BRANCHES AS POSSIBLE. In the real world, a leaf can only hang from one branch. In the first order of organization, there's no way around that limitation. In the second order, most cataloging systems have provisions for listing books under more than one heading, but the physicality of the second order still usually demands that one branch be picked as the primary one and there is a limit on the number of secondary listings.
In the third order, however, it's to our advantage to hang information from as many branches as possible. If you get a new Casio digital camera to sell in your online store, you'll want to list it under as many categories as you can think of, including cameras, travel gear, Casio products, graduation gifts, new items, sale items, and perhaps even sports equipment. Hanging a leaf on multiple branches makes it more findable by customers. Unlike in the second order, this doesn't make your e-store disorganized or messy. It makes it more usable…and more profitable.
EVERYTHING IS METADATA AND EVERYTHING CAN BE A LABEL. In a store, it's easy to tell the labels from the goods they label, and in a library the books and their metadata are kept in separate rooms. But it's not so clear online. If you can't remember the name of one of Shakespeare's plays, go to the search box at Google Book, type "Shakespeare tragedy," and you'll see a list of all of them. Click on, say, _King Lear_ and you can read the full text, including the famous line, "How sharper than a serpent's tooth it is to have a thankless child!" Now suppose you want to know where the quotation "How sharper than a serpent's tooth" comes from. Type the phrase into the search box and Google will list _King Lear_. Simple, but in the first case you used Shakespeare's name as metadata to find the contents of a book and in the second you used some of the contents of the book as metadata to find the author and title. In the miscellaneous order, the only distinction between metadata and data is that metadata is what you already know and data is what you're trying to find out.
In the first two orders of order, we've had to think carefully about which metadata we'll capture because the physical world limits the amount of metadata we can make available: A book's catalog card has to hold far less information than does the book itself. In the third order, not only can every word in a book count as metadata, so can any of the sources that link to the book. If we want to help our customers or users find information, we'll try to make as much of it usable as metadata as we can.
This not only makes sites easier to use, it vastly increases the leverage of knowledge. Think of what we can do with just the few words that fit on a second-order card or label. Now that everything in the connected world can serve as metadata, knowledge is empowered beyond fathoming. We not only can find what we need based on whatever slight traces we have in our hand, we can see connections that would have escaped notice in the first two orders. The power of the miscellaneous comes directly from the fact that in the third order, everything is connected and therefore everything is metadata.
GIVE UP CONTROL. Build a tree and you surface information that might otherwise be hidden, just as Lamarck exposed information left hidden in Linnaeus' miscellaneous category of worms. But, a big pile of miscellaneous information contains relationships beyond reckoning. No one person or group is going to be able to organize it in all the useful ways, hanging all the leaves on all the branches where they might be hung. For example, iTunes shows users a branch that pulls together albums by a particular artist, but the millions of playlists that users have made there find relationships that the organizers of iTunes could not possibly have foreseen, from techno versions of children's songs to tracks played at someone's third wedding. iTunes simply cannot predict what people are going to be interested in, what a song is going to mean to them, and what connections they're going to see. Some of the combinations will be of passing value only to one person, but other people may find their world changed by how a stranger has pulled together a set of songs to express a mood, an outlook, or an idea.
That's why it's so powerful to let users mix it up for themselves. Go into a real world clothing store and try pulling everything in your size off the racks and into a shopping cart so you can go through it in an orderly fashion. After all, that's the rational way to proceed. Everything that's not your size is just noise, a distraction. Yet, within ninety seconds you'll be thrown out of the store and firmly asked not to return. Online, on the other hand, we just naturally expect to organize digital information our way, through tags, bookmarks, playlists, and weblogs. And then we add to the information a site provides us by disagreeing with it in our own reviews. Users are now in charge of the organization of the information they browse. Of course, the owners of that information may still want to offer a prebuilt categorization, but that is no longer the only - or best - one available. Put simply, the owners of information no longer own the organization of that information.
Control has already changed hands. The new rules of the information jungle are in effect, transforming the landscape in which we work, buy, learn, vote and play."
By Ryan Shaw:
"I think Weinberger identifies some important issues: specifically that digitalization frees information architecture from the constraints of physical location, and that multiple organizational schemes can overlap and possibly complement one another. Insofar as he brings these insights to a popular audience, his book is good.
Unfortunately, his book also typifies uncritical cheerleading of technological trends. Instead of closely examining the rhetoric surrounding "bottom-up" organization to help people decide whether, e.g. Amazon is really more "open" or "free" than your local library, he just echoes that rhetoric. As a result the book comes across like a religious tract. And in many cases he is clearly uninformed about how the systems he discusses actually work (as Prelinger notes in the thread on iDC).
Here are some specific examples of the problems I have with Everything is Miscellaneous (quotes are taken from Weinberger's recent presentation at Harvard):
"...it turns out when you have enough tags (for example, Flickr) there is so much data there, just in the tag set itself, that they are able to cluster photos without knowing anything about the photos except the tags they are using. And it's remarkably precise. So when you have enough tags (despite what common sense would say) you don't necessarily end up with chaos. You may end up with actually more meaning and quite precise meaning."
Flickr has decent (not great) precision and extremely low recall (only a small percentage of photos are tagged). Weinberger focuses on the former and ignores the latter. Low recall is not a problem for Flickr, because photo enthusiasts rarely need to see every picture taken at a certain place or with a certain subject. But low recall is a huge problem for scholars, researchers, patent attorneys, and investigative journalists. Creating information systems that don't serve these needs is not "sticking it to the man"--it's screwing ourselves over.
"The world's greatest expert doesn't matter because he refuses to engage in a public negotiation of knowledge (for example, Wikipedia). Which is what happens when the authority vanishes and we are only left with each other and we engage with one another. This is how we get to the best truth we can manage. This is through the public negotiation of knowledge."
Weinberger's "us vs. the man" rhetoric blinds him to the very real concentrations of power and authority that exist in the new digital order. Believing that authority has somehow "vanished" is stupid and dangerous. Just look at Weinberger's beloved tagging systems: it's become clear that different design choices (tagging interfaces, algorithms for indexing, suggesting, and aggregating tags) can radically affect how tags are used. Those design decisions are not reached through public negotiation--they are imposed by people in positions of authority. And Wikipedia hasn't made authority disappear, either--it has simply divorced authority from institutions and the credentials they provide."
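Shaw's distinction between precision and recall can be made concrete with a small worked example. Precision is the share of returned results that are relevant; recall is the share of all relevant items that are returned. The numbers below are invented for illustration and are not measurements of Flickr.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of results against the true relevant set."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical case: 10 photos were really taken at a place,
# but only 3 of them carry the tag, and 1 unrelated photo carries it too.
relevant = {f"photo{i}" for i in range(1, 11)}
retrieved = {"photo1", "photo2", "photo3", "other1"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case most of what the tag search returns is correct, yet seven of the ten relevant photos are never found. That gap matters little to someone browsing for a nice picture and a great deal to someone who needs every relevant item.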
- Cory Doctorow (BoingBoing) at http://www.boingboing.net/2007/05/02/everything_is_miscel.html
- Karen Schneider at http://www.techsource.ala.org/blog/2007/05/weinbergers-well-ordered-miscellany.html
- Ethan Zuckerman at http://www.ethanzuckerman.com/blog/?p=1413
- Peter Morville at http://semanticstudios.com/publications/semantics/000167.php
Another podcast at http://www.radioopensource.org/weinbergers-miscellany/