Faroo
= a web search engine based on peer-to-peer technology.
Description
From an interview with Wolf Garbe of FAROO at http://altsearchengines.com/2007/10/02/great-debate-peer-to-peer-p2p-search-part-i/
1) Architecture: How is your search engine different from today’s general search engines? Briefly, what does the architecture of your search engine look like?
FAROO: FAROO is a web search engine based on peer-to-peer technology.
Users connect their computers, building a worldwide, distributed P2P web search engine. A centralized index and crawler are no longer required. Every web page visited is automatically included in the distributed index of the search engine. By installing our software, you immediately become part of the distributed search engine. FAROO’s distributed core architecture is fundamentally different from the centralized approach of today’s search engines.
2) Distribution/P2P: In what aspect is the architecture distributed? What are the benefits of this?
FAROO: FAROO is using a fully distributed architecture: distributed index, distributed crawler, distributed ranking, and distributed search.
Search, as the most frequently used Internet application, will be distributed, and thus follows a principle on which the whole Internet is successfully based. The distributed architecture provides cost advantages, better scaling, less intrusive crawling, democratic ranking and improved privacy protection.
- Each of the major search engines requires hundreds of thousands of servers. We don’t need any hardware at all. This means huge savings in infrastructure costs, allowing us to share revenues with our users.
- The Internet is growing steadily, and so is the amount of hardware required to index all these new web pages and to serve the new users. In FAROO’s distributed architecture the users become part of the solution to this problem. Therefore FAROO scales with the growth of the Internet.
- FAROO indexes web pages without a dedicated crawler, therefore additional traffic for users and web servers is avoided.
3) Crawler: How does your distributed crawler work?
FAROO: We changed the way a crawler works. There is no traditional crawler at all. Every web page visited by one of our users is automatically included into our distributed index, and instantly searchable for all other users.
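The idea of indexing a page at the moment it is visited can be illustrated with a minimal sketch. The interview does not say how FAROO partitions its index, so the hash-based assignment of terms to peers below is an assumption, and the names `Peer`, `Network`, `peer_for`, `on_page_visited` and `search` are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


class Peer:
    """One participating computer holding a slice of the inverted index."""

    def __init__(self, name):
        self.name = name
        self.postings = defaultdict(set)  # term -> set of URLs


class Network:
    """Hypothetical network that assigns each term to one peer by hashing.

    This partitioning scheme is an assumption for illustration only;
    the interview does not describe FAROO's actual index layout.
    """

    def __init__(self, peer_names):
        self.peers = [Peer(n) for n in peer_names]

    def peer_for(self, term):
        # Deterministically map a term to one of the peers.
        digest = hashlib.sha256(term.encode("utf-8")).digest()
        return self.peers[int.from_bytes(digest[:8], "big") % len(self.peers)]

    def on_page_visited(self, url, text):
        """Index a page when a user visits it; no separate crawl happens."""
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            self.peer_for(term).postings[term].add(url)

    def search(self, query):
        """Return the URLs that contain every term of the query."""
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return set()
        results = [self.peer_for(t).postings.get(t, set()) for t in terms]
        return set.intersection(*results)
```

In this sketch a page becomes searchable as soon as `on_page_visited` returns, which mirrors the "instantly searchable" claim; a real network would also have to handle peers joining and leaving, which is left out here.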
4) Ranking: How does your ranking algorithm work?
FAROO: FAROO is using an attention based ranking. If users spend a long time on a page, visit it often, bookmark it or print it out, this page goes up in ranking. For the first time the ranking of the web pages is automatically done by the target audience itself. This leads to a more democratic, user centric ranking that is also resistant to rank manipulation. Additional ranking parameters ensure a proper ranking also during the start with relatively few users and for freshly indexed pages.
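A minimal sketch of how the four signals named above (time on page, visits, bookmarks, prints) might be combined into one score. The interview gives no formula, so the weights, the logarithmic damping and the names `Attention`, `WEIGHTS`, `attention_score` and `rank` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Attention:
    """Aggregated attention signals for one page (field names are illustrative)."""
    dwell_seconds: float = 0.0
    visits: int = 0
    bookmarks: int = 0
    prints: int = 0


# Hypothetical weights; the interview states no numbers.
WEIGHTS = {"dwell": 1.0, "visits": 1.0, "bookmarks": 3.0, "prints": 2.0}


def attention_score(a: Attention) -> float:
    """Weighted sum of the signals; log1p dampens very large counts."""
    return (
        WEIGHTS["dwell"] * math.log1p(a.dwell_seconds)
        + WEIGHTS["visits"] * math.log1p(a.visits)
        + WEIGHTS["bookmarks"] * math.log1p(a.bookmarks)
        + WEIGHTS["prints"] * math.log1p(a.prints)
    )


def rank(pages: dict) -> list:
    """Return URLs ordered from highest to lowest attention score."""
    return sorted(pages, key=lambda url: attention_score(pages[url]), reverse=True)
```

The damping is one possible way to keep a single heavy user from dominating a page's score; the "additional ranking parameters" mentioned for new pages and small user bases are not modeled here.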
5) Do you use the “wisdom of the crowds”? If so, how?
FAROO: When it comes to understanding, valuating, and rating of content, the human mind is still unsurpassed. Therefore FAROO uses “wisdom of the crowds” in two ways, for ranking and for crawling. An algorithm may distinguish between original content, trivial content and spam. But when it comes to more subtle distinctions, we are better off trusting our own species! And, it’s no surprise that even the well known PageRank uses indirect human judgment, as it is based on the popularity of a page among webmasters.
FAROO’s user generated ranking goes a step further, as it is based on the popularity of a page amongst all users. And this is done automatically. In this way many more people get involved than with current ranking methods, where either only webmasters are entitled to vote or a manual voting is required.
FAROO also uses user powered crawling. Pages which are changing often like, for example, news, are visited frequently by users. And with FAROO they are therefore also re-indexed more often. So the FAROO users implicitly control the distributed crawler in a way that frequently changing pages are kept fresh in the distributed index, while preventing unnecessary traffic on rather static pages. (http://altsearchengines.com/2007/10/02/great-debate-peer-to-peer-p2p-search-part-i/)
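The freshness effect of user powered crawling can be sketched as follows. Each visit is a chance to refresh the index entry, so pages that are visited often are checked often. Skipping the re-index when the content is unchanged, by comparing a content hash, is an assumption added for illustration, as are the names `VisitDrivenIndex` and `on_visit`.

```python
import hashlib


class VisitDrivenIndex:
    """Hypothetical index that refreshes a page only when a user visits it."""

    def __init__(self):
        self.fingerprints = {}   # url -> hash of the last indexed content
        self.reindex_count = {}  # url -> number of times the page was (re)indexed

    def on_visit(self, url, content):
        """Re-index if the content differs from the last indexed copy.

        Returns True when the page was (re)indexed, False when it was skipped.
        """
        fp = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self.fingerprints.get(url) == fp:
            return False  # unchanged page: no index update
        self.fingerprints[url] = fp
        self.reindex_count[url] = self.reindex_count.get(url, 0) + 1
        return True
```

A news page that changes between visits is re-indexed on every visit, while a static page is indexed once and then skipped, and no request is made beyond the ones users already generate by browsing.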