Distributed Search Engines

From P2P Foundation

URL = http://en.wikipedia.org/wiki/Distributed_search_engine

Additional info from: Disintermedia (this info now actively maintained on this P2PF page)

Definition

From Wikipedia:

"A distributed search engine is a search engine where there is no central server. Unlike traditional centralized search engines, work such as crawling, data mining, indexing, and query processing is distributed among several peers in decentralized manner where there is no single point of control." (http://en.wikipedia.org/wiki/Distributed_search_engine)

Emre Sokullu sketched out the basic concepts and advantages of an open p2p search platform in 'How To Build an Open Source Google', a 2007 article on ReadWriteWeb. SearchTools.com offers a useful introduction to the concept of applying peer-to-peer to search, and a few references to projects current at the time, but the page states that it has not been updated since 2012.
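To make the definition above concrete, here is a minimal sketch of how indexing and query processing might be split among peers so that no single node holds the whole index. It is an illustration only, not the design of any project listed on this page: the class names (Peer, ToyDistributedIndex) are hypothetical, the peers live in one process rather than on a network, and the simple hash-modulo placement stands in for the distributed hash tables that real peer-to-peer systems tend to use.

```python
import hashlib
from collections import defaultdict


class Peer:
    """One node holding a slice of the inverted index (hypothetical name)."""

    def __init__(self, name):
        self.name = name
        self.postings = defaultdict(set)  # term -> set of URLs containing it


class ToyDistributedIndex:
    """Assumed toy model: each term is stored on exactly one peer."""

    def __init__(self, peer_names):
        self.peers = [Peer(name) for name in peer_names]

    def _owner(self, term):
        # Hash the term so every participant agrees on which peer stores it.
        digest = hashlib.sha1(term.encode("utf-8")).hexdigest()
        return self.peers[int(digest, 16) % len(self.peers)]

    def index_page(self, url, text):
        # Indexing work is spread out: each term goes to its owning peer.
        for term in set(text.lower().split()):
            self._owner(term).postings[term].add(url)

    def search(self, query):
        # Ask the owning peer of each term, then intersect the answers.
        terms = query.lower().split()
        if not terms:
            return set()
        results = [self._owner(t).postings.get(t, set()) for t in terms]
        return set.intersection(*results)


index = ToyDistributedIndex(["peer-a", "peer-b", "peer-c"])
index.index_page("http://example.org/1", "peer to peer search")
index.index_page("http://example.org/2", "centralized search engine")
print(index.search("peer search"))
```

A query only touches the peers that own its terms, which is the sense in which there is "no single point of control". The sketch ignores peers joining and leaving, replication, and ranking, all of which a real system would need to handle.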

Directory

Via Wikipedia (since modified on this wiki):

Grub (GPLv2) (defunct)

Kord Campbell, Igor Stojanovski, and Ledio Ago started Grub as an open source project in January 2000. LookSmart acquired Grub in 2003, and in July 2007, sold it to the short-lived Wikia Search project.

InfraSearch (defunct)

In April 2000 three programmers built a prototype P2P web search engine based on Gnutella called InfraSearch. It was meant to run inside the participating websites' databases, creating a P2P network that could be accessed through the InfraSearch website. InfraSearch was acquired by Sun Microsystems in 2001 to become part of the search system for the open source JXTA project (https://java.net/projects/jxta/). Sun, along with most of its various subsidiaries and open source projects, was acquired by Oracle in a deal announced in 2009. Sadly, InfraSearch founder Gene Kan passed away on June 29th, 2002.

Opencola (founding company defunct)

On May 31, 2000 Steelbridge Inc. announced development of OpenCOLA, a collaborative, distributed open source search engine. It runs on the user's computer, crawls the web pages and links the user puts in their opencola folder, and shares the resulting index over its P2P network. Steelbridge was renamed Opencola, and the Opencola company was sold to the Open Text Company in 2003, after releasing Swarmcast under the GNU GPL.

FAROO (Proprietary freeware)

In February 2001 Wolf Garbe published the idea for a peer-to-peer search engine. He started the Faroo prototype in 2004 and released it in 2005.

YaCy (GNU GPLv2+, GNU LGPLv2+ for Cora library)

On December 15, 2003 Michael Christen announced development of a P2P-based search engine, eventually named YaCy, on the heise online forums. As of January 2016, the latest stable version was 1.8, released September 16, 2014.

Nutch (Apache 2.0)

In June 2003, Doug Cutting and Mike Cafarella completed a 100-million-page test system, and in 2005, the project was accepted into the Apache Foundation Incubator program. Later that year it became a sub-project of the Lucene project, and in 2010, it became an independent Apache project. Hadoop began as a MapReduce system and distributed file system that were implemented in the early years of the Nutch project. Current development of Nutch is not orientated towards p2p search so much as distributed search across supercomputer clusters.

Seeks (GNU AGPL) (defunct)

Unless it has moved off SourceForge, Seeks never got out of beta. The latest version on SourceForge (0.4.1) was released in 2013.

Wowd (defunct)

Some time in 2006 Borislav Agapiev started thinking about a distributed search engine, and on October 20, 2009 he publicly launched Wowd. In 2011, having failed to get traction, Wowd was split up and acquired by three different companies: Facebook, a start-up called Jildy, and an anonymous third company which bought the Wowd software patents.

Open Search (defunct)

In 2007 a group funded by the Digital Pioneers Stimulation Fund set up an Open Search Foundation, but it went into hibernation once its six months of funding ran out, having produced only one beta release.

Discussion

Danyl Strype:

It seems clear from all the defunct projects that litter the pathway towards distributed search, including the companies above and the various defunct or moribund efforts to create user-edited directories (DMoz, Wikia Search), that p2p search is a difficult technical problem. Of those projects that have survived, with the exception of Faroo, all are free code projects with open source communities behind them. However, just adopting a free code license is not enough to save a project without a stable stewardship organisation to keep development moving.

The free code Dooble browser has integrated YaCy into its desktop software, and if a more popular browser such as Mozilla Firefox took the same approach - a more likely strategy now they're no longer receiving funding from Google - this may be the only way a p2p search engine can get the critical mass to become useful. If the p2p search project chosen followed the DuckDuckGo example of integrating metasearch of popular sites such as Wikipedia, YouTube, IMDB, and so on, this could also help to populate the index with enough information to make it useful, and give it a starting point for ongoing web crawling.

More Information