ReCaptcha

From P2P Foundation


A project that aims to put the human effort spent solving CAPTCHAs to better use, for example by checking the digitization of scanned books.


Citation

From Forbes at http://www.forbes.com/feeds/ap/2007/05/24/ap3756168.html?


"Instead of wasting time typing in random letters and numbers, Carnegie Mellon researchers have come up with a way for people to type in snippets of books to put their time to good use, confirm they're not machines and help speed up the process of getting searchable texts online.

"Humanity is wasting 150,000 hours every day on these," said Luis von Ahn, an assistant professor of computer science at Carnegie Mellon. He helped develop the CAPTCHAs about seven years ago. "Is there any way in which we can use this human time for something good for humanity, do 10 seconds of useful work for humanity?"

Many large projects are under way now to digitize books and put them online, and that's mostly being done by scanning pages of books so that people can "page through" the books online. In some cases, optical character recognition, or OCR, is being used to digitize books to make the texts searchable.

But von Ahn said OCR doesn't always work on text that is older, faded or distorted. In those cases, often the only way to digitize the works is to manually type them into a computer.

Von Ahn is working with the Internet Archive, which runs several book-scanning projects, to use CAPTCHAs for this instead. Internet Archive scans 12,000 books a month and sends von Ahn hundreds of thousands of files that are images that the computer doesn't recognize. Those files are downloaded onto von Ahn's server and split up into single words that can be used as CAPTCHAs at sites all over the Internet.

If enough users decipher the CAPTCHAs in the same way, the computer will recognize that as the correct answer.

"If we can correct these books so that they are really in good shape, then you can go and use these books in other type devices more easily" such as handheld computers or in programs for reading to the blind, said Brewster Kahle, co-founder of the Internet Archive.

Von Ahn approached the Internet Archive to get help in developing the new system, but it has not been put into use yet. Theoretically, von Ahn said the new book-based CAPTCHAs could be used in place of any CAPTCHA currently on the Web." (http://www.forbes.com/feeds/ap/2007/05/24/ap3756168.html?)
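The article's consensus rule, that a word is accepted once enough users decipher it the same way, can be illustrated with a simple vote count. This is a minimal, hypothetical sketch: the function name `consensus_transcription`, the default agreement threshold of 3, the case-insensitive comparison and the tie handling are all assumptions for illustration, not details taken from the article or from the actual reCAPTCHA system.

```python
from collections import Counter
from typing import Iterable, Optional


def consensus_transcription(answers: Iterable[str], min_agreement: int = 3) -> Optional[str]:
    """Accept a word once at least `min_agreement` users typed it identically.

    Hypothetical sketch: the threshold and normalization are assumptions.
    Returns None while there is no clear consensus yet.
    """
    # Assumed normalization: ignore surrounding whitespace and letter case,
    # and skip empty answers.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if not normalized:
        return None

    # The two most frequent readings, e.g. [("morning", 3), ("mornlng", 1)].
    top = Counter(normalized).most_common(2)
    best, best_count = top[0]

    # Not enough users agree yet: keep showing the word to more people.
    if best_count < min_agreement:
        return None
    # Two readings tied at the top: treat as unresolved rather than guess.
    if len(top) > 1 and top[1][1] == best_count:
        return None
    return best
```

For example, `consensus_transcription(["Morning", "morning", "mornlng", " morning "])` accepts `"morning"`, because three users typed it the same way, while `consensus_transcription(["dog", "d0g"])` returns `None` until more answers arrive. Requiring several matching answers, rather than trusting the first one, is what lets individual typos or OCR-like misreadings be outvoted.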