Open Government Data
Principles
Government data shall be considered open if it is made public in a way that complies with the principles below:
1. Complete
All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.
2. Primary
Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
3. Timely
Data is made available as quickly as necessary to preserve the value of the data.
4. Accessible
Data is available to the widest range of users for the widest range of purposes.
5. Machine processable
Data is reasonably structured to allow automated processing.
6. Non-discriminatory
Data is available to anyone, with no requirement of registration.
7. Non-proprietary
Data is available in a format over which no entity has exclusive control.
8. License-free
Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.
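The eight principles above can be read as a checklist applied to a single dataset. The following is only a minimal sketch of that reading: the class name OpenDataCheck and the helper functions are hypothetical, and deciding whether a dataset actually meets each principle remains a human judgment that the code merely records.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class OpenDataCheck:
    """One yes/no judgment per principle (hypothetical structure)."""
    complete: bool
    primary: bool
    timely: bool
    accessible: bool
    machine_processable: bool
    non_discriminatory: bool
    non_proprietary: bool
    license_free: bool


def failed_principles(check: OpenDataCheck) -> List[str]:
    """Return the names of the principles the dataset does not meet."""
    return [f.name for f in fields(check) if not getattr(check, f.name)]


def is_open(check: OpenDataCheck) -> bool:
    """A dataset counts as open only if it meets all eight principles."""
    return not failed_principles(check)


# Illustrative case: a scanned budget report that requires registration.
scanned_budget = OpenDataCheck(
    complete=True,
    primary=True,
    timely=True,
    accessible=True,
    machine_processable=False,  # a scanned image is not structured data
    non_discriminatory=False,   # registration is required to download it
    non_proprietary=True,
    license_free=True,
)

print(failed_principles(scanned_budget))
print(is_open(scanned_budget))
```

Treating the principles as all-or-nothing follows the wording "complies with the principles below"; a scoring scheme that weights them would be a different design choice.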
Discussion
Alex Steffen [1]:
"All those qualifications were the subject of substantial discussion, some of which is ongoing on a wiki, which you’re welcome to contribute towards. It was a much faster process to draft an introduction - a mini-manifesto of sorts - which reads in part:
The Internet is the public space of the modern world, and through it governments now have the opportunity to better understand the needs of their citizens and citizens may participate more fully in their government. Information becomes more valuable as it is shared, less valuable as it is hoarded. Open data promotes increased civil discourse, improved public welfare, and a more efficient use of public resources.
The definition will surely evolve, especially as we get input from people who make government policy decisions on matters of data access and security. And there are a couple of questions that couldn’t be addressed in the course of a weekend meeting.
One concerns how broad the definition should be of “government data”. If it includes all data paid for by public funds, then a call for open data has substantial overlap with the Open Access Movement, which seeks to unlock scholarly materials published in licensed journals and make those materials available under less arduous licenses, trying to share scholarly research with people in developing nations. (Much of the scholarship Open Access seeks to unlock is produced with government funding - OA advocates argue that research paid for by public funds needs to be broadly available to the public.) While it would be exciting to see solidarity between these movements, that definition is probably broader than what most of the people in the room were considering when they thought about government data.
A second concern regards non-digital data. The principles above apply to data that’s available in a digital form - they don’t apply to the vast stacks of paper records most governments have accumulated, or obsolete media like inaccessible computer tapes or disks. Ideally, governments will begin to make this material available, but there are unanswered questions of costs incurred during digitization and the priority of bringing old records online. There’s a danger that keeping records in analog format will become a way to avoid digital scrutiny. Before dismissing this as absurd, keep in mind that the current US administration evidently does not use the White House email system for fear of subpoena, and uses laptops issued by the RNC to keep their proceedings from public scrutiny. At some point, a statement of open data principles will need to address the desirability of ensuring that government data becomes digital as soon as reasonably feasible." (http://www.worldchanging.com/archives/007689.html)
Examples
Ethan Zuckerman reviews open government initiatives [2]:
"Adrian Holovaty is one of the superstars in this field, known for creating digital journalism tools like the Chicago Crime mashup and the Django web development framework. He shows off a tool created to help him co-author a book on Django. Rather than putting the text of the book into a wiki and allowing anyone to edit it, the system allows fine-grained commenting on a fixed text. While the book isn’t currently open for commenting, you can see the comments placed on each paragraph of text, often suggesting very specific refinements to the book.
There’s the interesting potential for this model for document annotation to start discussions around political documents. It probably doesn’t make sense to put the text of a political speech in a wiki - the speech was delivered and the discussion is around interpretation of the words of that speech. There’s the exciting possibility that document annotation could become a new form of community interaction. Tom Steinberg of MySociety pointed out that the Free Software Foundation is trying an annotation method to allow group discussion of the new GNU Public Licenses which shows lines that are uncontroversial or more controversial based on the number of comments they’ve received. There’s a sense in which tools for allowing group development of software - versioning systems, repositories - might be applied to group authorship of text as well.
Michael Dale from Metavid has created a remarkable tool for annotating video through a wiki model. It’s a bit like Democracy Player/Miro, DotSub and MediaWiki colliding at high speed. The current MediaWiki site hints at what the future will look like - it currently provides video from CSPAN correlated with transcripts, with the transcript and video embeddable within other publishing platforms. The forthcoming version allows users to improve these captions in wiki form, to search video via captioning, and to edit and package video for export. It looks like it’s going to be an amazing and powerful system when it’s released.
Greg Elin with Sunlight Labs is a master of meshing sets of political data. He talks about Sunlight’s holy grail - one click disclosure - integrating data from GovTrack.us, Open Congress, Center for Responsive Politics, GovernmentDocs.org and others. Sunlight has taken steps to ensure that these sites are cross-referenced and integrated, so you can view portraits of US politicians that include information on fundraising, contributions from lobbyists, voting on earmarks, etc. In the long term, Sunlight is looking into doing real-time analysis of newsfeeds from sources like AP, feeding the data through “data chewers” that monitor the articles for information on politicians and link the references to detailed profiles on the individuals in question. Elin points out that most newspapers don’t have the technical capacity to integrate this sort of data into online stories - his goal is to create a “journalists’ desktop” that puts this information at the hands of every reporter, and makes it as easy as possible for a paper to integrate this information into their coverage.
Tom Steinberg of MySociety is responsible for some of the most innovative projects in UK politics and online organizing. (Tom was very careful to correct me, reminding me that he’s not a programmer and that MySociety projects are put together by a team of paid and volunteer programmers and designers who work with him - he gives that team the credit for these remarkable projects.) He explains that They Work For You, a site he’s largely responsible for, began as a project to make the Hansard (the record of parliamentary proceedings) accessible, annotatable, and linkable. In the process, TWFY created profile pages for each UK parliamentarian, which include information characterizing how they’ve voted, how many questions they ask in session and how well they respond to constituent questions.
These pages are often the best linked pages for UK parliamentarians, and they’re generated automatically, based on information reported by the UK government… and from Tom’s scripts as well. One of his sites invites constituents to ask questions of their parliamentarians, and surveys them two weeks later to see whether questions have been answered. This information is included in the profiles of MPs, which gives them a strong incentive to be responsive to constituent questions. (Steinberg has seen evidence that TWFY is so effective that some politicians have resorted to “spam speeches”, attempting to goose their numbers on TWFY to improve their electability.)
Other projects from MySociety focus on more personal aspects of politics. Fix My Street invites citizens to document problems in their local areas, including photos and geolocation information, so that local officials can see problems under their jurisdiction. The site has a comprehensive set of rules for routing email reporting problems to the proper authorities and has registered over 10,000 reported issues thus far. The Travel Time Maps project appeals directly to the heart of many Britons, showing the average commute time per neighborhood for areas across the nation. The isochrone maps make very clear what neighborhoods are and are not well served by public transit." (http://www.worldchanging.com/archives/007680.html)
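The routing of geolocated problem reports to the proper authority, as described for Fix My Street above, might be sketched roughly as follows. This is not how the site itself is implemented: the Area and Report classes, the rectangular boundaries, the category names and the example.org addresses are all assumptions made for illustration, and a real service would need actual administrative boundaries rather than bounding boxes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Area:
    """A hypothetical local authority, approximated by a bounding box."""
    name: str
    south: float
    west: float
    north: float
    east: float
    default_contact: str
    # Category-specific contacts, e.g. {"pothole": "highways@example.org"}.
    contacts: Dict[str, str] = field(default_factory=dict)

    def contains(self, latitude: float, longitude: float) -> bool:
        return (self.south <= latitude <= self.north
                and self.west <= longitude <= self.east)


@dataclass
class Report:
    """A citizen's problem report with its location."""
    category: str
    latitude: float
    longitude: float
    description: str


def route(report: Report, areas: List[Area]) -> Optional[str]:
    """Return the email address for the first area containing the report.

    Falls back to the area's default contact when the category has no
    dedicated address, and returns None when no area covers the location.
    """
    for area in areas:
        if area.contains(report.latitude, report.longitude):
            return area.contacts.get(report.category, area.default_contact)
    return None


# Illustrative data only; the borough and its addresses are made up.
areas = [
    Area(
        name="Example Borough",
        south=51.0, west=-0.5, north=51.5, east=0.0,
        default_contact="council@example.org",
        contacts={"pothole": "highways@example.org"},
    )
]

pothole = Report("pothole", 51.2, -0.2, "Deep pothole outside the library")
print(route(pothole, areas))
```

Keeping the rules as data (areas and their contact tables) rather than as branching code means a new authority or category can be added without touching the routing function.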