Robert Kaye on the MusicBrainz Project
The Open Knowledge Foundation spoke with Robert Kaye of the MetaBrainz Foundation about the history of the MusicBrainz project, about the role of community, about the way open licensing helps MusicBrainz work, and about the future he sees for music metadata.
MusicBrainz is a user-maintained community music metadatabase. The MusicBrainz community collects and maintains data about recorded music releases such as artist name, release title and track listing. That data is re-used by music services across the web, including Amazon and Last.fm, as well as in Free and Open Source Software applications.
Robert Kaye is Executive Director of the MetaBrainz Foundation, the non-profit group which operates MusicBrainz. The Open Knowledge Foundation spoke with Robert about the history of the MusicBrainz project, about the role of community, about the way open licensing helps MusicBrainz work, and about the future he sees for music metadata.
OKF: How big is the community that contributes to MusicBrainz, and what do you think motivates people to contribute?
RK: We have 465,897 registered users and of those 1,385 were active in the last week. The MusicBrainz community fits into roughly three categories:
· The core people: These are the people who are hacking on MusicBrainz or editing profusely. MusicBrainz is their hobby, job or resume builder.
· Regular editors: People who love music more than the average person and want to make sure the data for their artists is clean and that their music collection is sparkling clean as well.
· Tagger users: People who use one of the tagging applications to clean up their music collection. These people tend to be a very transient group. They come, clean up their collection and leave. They may come back in a while to clean up new data in their collection. To them MusicBrainz is a means to clean up their collection.
OKF: What is the role of the community inside MusicBrainz?
RK: The community is critical for MusicBrainz. If people stopped editing the data in MusicBrainz, the data would stop changing and the business model would instantly vaporise. The software that powers MusicBrainz is worthless without the data. The data is worthless without the people behind it. Given that, we need to make sure that we don’t alienate our contributors – we can’t afford any missteps that would cause the community to lose faith in MusicBrainz.
OKF: Why did you decide to use an open licence? What are the advantages of using an open licence from your point of view?
RK: Mainly I was upset that CDDB, which used to be freely downloadable, was taken private by Escient (now (dis)GraceNote). I typed in several hundred CDs and now someone else was making money off my work. I was pissed. At the time I was getting into open source and I saw that open data would be a critical play in the future – a future I perceived to be off in a number of months – I wasn’t ready to wait a decade for it to be really ready.
The vision I saw included a well-linked data set with stable identifiers that didn’t change so that the data set could be cross-linked in a stable manner. What I saw was the “Semantic Web” or what we’re now calling “linked data” and it was clear to me that in order to play in this field you couldn’t make a walled garden around your data. If you ever hoped that others would link to your data, it was clear to me that I had to bend over backwards in order to make this data available to everyone. I also saw Linux growing steadily and slowly making inroads against Microsoft – how can Microsoft compete with free AND high quality? It would be hard. We’re seeing the same happening with Wikipedia and classic encyclopaedias – Microsoft recently shelved Encarta, a sign that Wikipedia is edging out some of the smaller players.
This vision was the easy part. Then the hard work started – what licence should I use? The only licence out there was the Open Content licence, which was largely unproven. And it didn’t address the issues that faced data very well. In an email conversation with Richard Stallman he suggested that I use the GFDL… Compared to the GPL the GFDL is a horrid abomination! (I’m still trying to find the front matter and the appendix in my database tables!!) Mr. Stallman also brusquely informed me that the text of the GPL was *NOT* available under the GPL or any licence for that matter. He specifically forbade me from using his text to create a better, more data-oriented licence. Not surprisingly, I stopped being a fan of RMS from that point on.
I ended up having many conversations about licences and was quite frustrated… then I got a call from the Creative Commons! They were about to launch and were looking for projects that would adopt their licences before they went public. I read the licences and was immediately jazzed about them. I had already been educating myself about the Public Domain and the Feist v. Rural Telephone Service case and thought that my core data needed to be in the Public Domain. Now the CC provided a nice and clean method for doing this – I adopted the licences clear across the board.
The non-commercial licence was actually the magic that enabled me to found the MetaBrainz Foundation! I was convinced to NOT create a legal entity for MusicBrainz until I could see a business model emerge that didn’t hinge on begging. My concept was to allow free access to the core data, but play gatekeeper on the data and control how quickly and how conveniently someone could get access to the data. By allowing the public unfettered non-commercial access to the data, I would win over the Open Source communities, which we have. But by taking money for timely and convenient access, I could fund the foundation and in turn fund my own paycheque. This has been working well so far – while we’re not making oodles of money (especially in this economic climate), we’ve been in the black year over year since inception. I never resort to begging and yet I can license public domain data to make ends meet.
What’s even more trippy about this is that I may have created the first 100% profit non-profit business model. Since the operations of the project are for the public at large, we make this as cheap as possible. And making the data available to the public is part of that deal – it is written into our IRS charter. When a commercial customer comes along, they tap into our live data feed, which they pick up from our FTP site, which is actually operated by Oregon State University with support from Google. In other words, the incremental cost for adding a new customer is ZERO. After I sign the contract, I do nothing but cash the cheques. It’s a rather odd arrangement, but the IRS hasn’t given me grief and my community and customers are happy.
OKF: Where has MusicBrainz been re-used?
RK: A roster of our paying customers is here: