Common Crawl

From P2P Foundation

= Common Crawl builds and maintains an open crawl of the web accessible to everyone. Our vision is of a truly open web that allows universal open access to information and enables greater innovation in research, business and education.

URL = http://commoncrawl.org/


Description

"Common Crawl Foundation is a California 501(c)3 registered non-profit founded by Gil Elbaz with the goal of democratizing access to web information by producing and maintaining an open repository of web crawl data that is universally accessible.

As the largest and most diverse collection of information in human history, the web grants us tremendous insight if we can only understand it better. For example, web crawl data can be used to spot trends and identify patterns in politics, economics, health, popular culture and many other aspects of life. It provides an immensely rich corpus for scientific research, technological advancement, and innovative new businesses. It is crucial for our information-based society that the web be openly accessible to anyone who desires to utilize it.

We strive to be transparent in all of our operations and we support nofollow and robots.txt." (http://commoncrawl.org/our-work/)