Cyberinfrastructure

From P2P Foundation


Characteristics

Source: Cyberinfrastructure Inside Out. Definition and Influences Shaping Its Emergence, Development, and Implementation in the Early 21st Century. By Kerk Kee, Lucy Cradduck, Bridget Blodgett & Rami Olwan. Chapter Eight of the book: Nexus. New Intersections of Internet Research. 2010.


"Cyberinfrastructure can be characterized as data-intensive, computationally powerful, distributed, hierarchical, interoperable, and with second-order growth (i.e., generation of data about data and metadata). The first characteristic of data-intensiveness refers to cyberinfrastructure’s capacity to hold a large amount of data in various forms, including numbers, text, multimedia, acoustic, and nonverbal data (Poole, 2009). The goal of combining data sets among groups of researchers was a key driver of initial cyberinfrastructure development. Traditionally, science was limited by regional data, human resources, and the technological capacity of small groups of independent researchers at various locations. With cyberinfrastructure, researchers can combine multiple datasets into one that exceeds what a traditional small group of researchers can collect and analyze. Consequently, researchers can undertake research at a scale otherwise not possible.


Cyberinfrastructure is computationally powerful and has the capacity to analyze large volumes of data (Friendlander, 2008) through parallel and distributed computing processes. Traditionally, researchers executed computer analyses of scientific data in local laboratories, and research studies were limited by the processing speed and power of individual (or a small network of) commercial personal computers (PCs), or of local supercomputers if these were available at affiliated institutions. A supercomputer is a large network of powerful modular servers and commercial PCs running parallel and distributed computing algorithms. With this technique, a data-intensive job can be divided into small chunks and fed into individual servers and/or PCs (within a supercomputer architecture) concurrently and recursively. The results are aggregated at the end of the computational process, thus increasing processing speed and capacity. Cyberinfrastructure links such supercomputers across the country; TeraGrid, for example, connects 11 supercomputers across the United States. Because of this combined computational power, cyberinfrastructure provides the fastest computational resources available, enabling science at a speed otherwise not possible.
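The divide, process concurrently, and aggregate pattern described above can be sketched in Python. This is a minimal illustration, not how TeraGrid or any particular supercomputer actually schedules work: the function names are hypothetical, the "analysis" is a stand-in sum of squares, and a thread pool is used for portability, so the sketch shows the structure of the technique rather than real CPU parallelism (which would need separate processes or cluster nodes).

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(data, n_chunks):
    """Divide a job's data into n_chunks contiguous, nearly equal slices."""
    size, remainder = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `remainder` chunks take one extra item each.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def analyze_chunk(chunk):
    """Hypothetical per-worker analysis: the sum of squares of one chunk."""
    return sum(x * x for x in chunk)

def run_parallel_job(data, n_workers=4):
    """Scatter chunks to workers concurrently, then aggregate partial results."""
    chunks = split_into_chunks(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(analyze_chunk, chunks))
    # Aggregation step: combine the workers' partial results into one answer.
    return sum(partial_results)

if __name__ == "__main__":
    data = list(range(100_000))
    # The aggregated result matches a single serial pass over the data.
    assert run_parallel_job(data) == analyze_chunk(data)
    print(run_parallel_job(data))
```

Because each chunk is analyzed independently, adding workers changes only how the data is split, not the final aggregated answer.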

As alluded to in the discussion of computational power, cyberinfrastructure is a distributed platform. Via cyberinfrastructure, a group of researchers can submit a computationally and data-intensive job from a remote location, have the job processed at multiple supercomputers, and have the combined results returned to the initiating location. The researchers need only have access to cyberinfrastructure through their local institution. The distributed character of cyberinfrastructure takes research beyond local constraints to virtual computational resources at impressive speed.

Given its complexity, it is logical to describe cyberinfrastructure as hierarchical. Cyberinfrastructure involves a range of large and small components, from a cable modem that can be picked up by a child to a supercomputer the size of a building basement. However, it is important to note that because cyberinfrastructure is a network, it cannot function properly without its smallest component (Friendlander, 2008). Therefore, the smallest component also holds the entire infrastructure together, even though the infrastructure may be hierarchical in a physical sense.
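As a toy illustration of why the smallest component matters, the sketch below models a hypothetical chain of components, from a researcher's laptop and cable modem up to a supercomputer, as a graph, and uses breadth-first search to check whether a job can still reach the supercomputer when one component fails. The component names and links are assumptions chosen for illustration, not a description of any real network.

```python
from collections import deque

# Hypothetical component graph, ordered from the smallest component to the
# largest; each entry lists the components it is physically linked to.
LINKS = {
    "laptop": ["cable_modem"],
    "cable_modem": ["laptop", "campus_router"],
    "campus_router": ["cable_modem", "regional_network"],
    "regional_network": ["campus_router", "supercomputer"],
    "supercomputer": ["regional_network"],
}

def reachable(links, source, target, failed=frozenset()):
    """Breadth-first search from source to target, skipping failed components."""
    if source in failed or target in failed:
        return False
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbor in links.get(node, []):
            if neighbor not in seen and neighbor not in failed:
                seen.add(neighbor)
                queue.append(neighbor)
    return False

print(reachable(LINKS, "laptop", "supercomputer"))                          # True
print(reachable(LINKS, "laptop", "supercomputer", failed={"cable_modem"}))  # False
```

Removing the cheapest, smallest link cuts the researcher off from the largest resource just as surely as removing a large one would.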

In order for cyberinfrastructure to operate and function as a coherent whole, the scientific data, computational resources, technological systems, and human organizations must be interoperable. Interoperability (ACLS, 2009; Baker, Ribes, Millerand, & Bowker, 2005) refers to a property of cyberinfrastructure wherein a range of diverse data, resources, systems, and organizations interoperate and work seamlessly together. Without interoperability, the four characteristics reviewed above have no value: data sets, computing resources, computational jobs, and technological components would remain local, separate, and small-scale.

Given the aforementioned characteristics, cyberinfrastructure data and their subsequent analyses grow over time, leading to second-order growth. The aggregated scientific data and the human activities recorded on cyberinfrastructure are first-order data that researchers can analyze. Careful coding, qualitative observations, statistical analyses, and network visualizations by these researchers yield second-order data (Poole, 2009), or metadata, which are added to the existing cyberinfrastructure data repositories. This unique characteristic facilitates longitudinal research in a wide range of disciplines at a scale and in a fashion never possible before. Cyberinfrastructure's potential for new discoveries grows through complex cross-referencing (Poole, 2009) that can be used to explore large-scale global challenges.
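The distinction between first-order and second-order data can be sketched as follows, assuming a repository is simply a dictionary holding raw records alongside derived summaries. The record fields, site labels, and the `derive_second_order` helper are hypothetical; the point is only that an analysis produces data about the data, which is then stored back in the repository and makes it grow.

```python
import statistics
from datetime import datetime, timezone

# First-order data: hypothetical observations contributed by two sites.
first_order = [
    {"site": "A", "value": 12.0},
    {"site": "A", "value": 15.0},
    {"site": "B", "value": 9.0},
    {"site": "B", "value": 11.0},
]

def derive_second_order(records, analyst):
    """Summarize first-order records into metadata (second-order data)."""
    by_site = {}
    for rec in records:
        by_site.setdefault(rec["site"], []).append(rec["value"])
    return {
        "derived_by": analyst,
        "derived_at": datetime.now(timezone.utc).isoformat(),
        "record_count": len(records),
        "site_means": {site: statistics.mean(vals) for site, vals in by_site.items()},
    }

# The repository holds both layers; each new analysis appends to the second.
repository = {"first_order": first_order, "second_order": []}
repository["second_order"].append(derive_second_order(first_order, analyst="group_1"))
print(repository["second_order"][0]["site_means"])  # {'A': 13.5, 'B': 10.0}
```

Every further analysis appends another second-order record, so the repository grows even when no new raw observations arrive.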

In sum, cyberinfrastructure possesses the characteristics of data-intensiveness, computational power, distribution, hierarchy, interoperability, and second-order growth. Its complex make-up requires a careful explication of its different layers.



Layers

The second dimension of cyberinfrastructure is constituted by its four layers of hardware, software, agents, and interactions. The hardware layer can be further divided into specialized/niche and general/commercial layers. Based on the discussion thus far, it is apparent that a key piece of the cyberinfrastructure puzzle is a network of supercomputers. Supercomputers are mainly used for niche research analyses, as described earlier, and for specialized commercial applications, such as airplane design, automotive crash tests, and oil reservoir discovery, that are too large in scale and too expensive for frequent trial and error.

The specialized/niche layer of cyberinfrastructure also includes advanced instruments (Stewart, 2007), digitally-enabled sensors, observatories and experimental facilities (NSF, 2007), and large-scale data storage systems/repositories (Atkins et al., 2003). These are examples of a range of physical hardware for specialized purposes and niche usage. Many of them, such as observatories, were built with specific, unique uses in mind. In other words, these instruments are hardware that cannot simply be bought “off-the-shelf.”

Beyond its specialized/niche hardware, cyberinfrastructure is also made up of a general/commercial layer of distributed personal computers (Atkins et al., 2003), desktops (Friendlander, 2008), and portals (Poole, 2009). In addition to commercial PCs, cyberinfrastructure also includes phone devices (landline, mobile, and smart phones), fax machines, printers, modems, and other off-the-shelf electronic devices that researchers also use for nonresearch purposes. These hardware components, both specialized/niche and general/commercial, are tied together through a range of software applications.

The software layer mirrors the hardware layer in terms of its specialized and general applications. To process large-scale data and run specialized analyses on supercomputers, researchers need appropriate analytic tools (Poole, 2009) and high-performance computing (HPC) applications. HPC applications are used by highly trained researchers. Loosely generalized, HPC applications enable parallel and distributed processing of large-scale data on supercomputers to generate scientific results. However, these results need to be shared with collaborating researchers and interested colleagues. This is where the next category of software applications comes in.

A range of information and communication technologies (ICTs) supported by telecommunication systems, the Internet, and the World Wide Web make up another critical layer of cyberinfrastructure. Specific examples include email applications, online meetings, personal and organizational web pages, digital libraries, search engines such as Google (Hai, 2004), and web 2.0 technologies such as blogs (Poole, 2009). Researchers can use these ICTs for interpersonal, group, and organizational communication across both their scientific and nonscientific work.

We suggest that individual or collective human agents are key to an active cyberinfrastructure. Without users, something that is called “a cyberinfrastructure” is not a real cyberinfrastructure (Ribes & Finholt, 2009), as it is static rather than active. The notions of people (Stewart, 2007), groups and organizations (Lee, Dourish, & Mark, 2006), and personnel and institutions (Atkins et al., 2003) have consistently been mentioned in cyberinfrastructure literature. Human agents are usually assumed to be independent actors in the context of cyberinfrastructure. That is, they represent “nodes” in a network, as understood in traditional social network literature. Alongside human agents, nonhuman agents are also important in cyberinfrastructure. Nonhuman agents refer to documents, concepts, key words, data sets, etc. (Contractor, 2009). They are discrete entities and resources (Friendlander, 2008) in the context of cyberinfrastructure. However, they are labeled as “agents” because they appear to do things, or to have an impact on other “nodes” in the network. The notion of nonhuman agents departs from traditional social network literature and draws upon actor network theory (Latour, 2005). However, human and nonhuman agents as nodes have no impact on each other or on the overall cyberinfrastructure if they do not interact.

Human and nonhuman agents interact and are tied together through multidimensional networks. The notion of networks concerns not only high-performance grid networks (Stewart, 2007) and the Internet in the physical sense, but also relationships and ties as commonly defined in social network literature. Furthermore, the notion of networks is “multidimensional,” because nodes in cyberinfrastructure are both people and “nonhuman agents” (Contractor, 2009). Networks therefore represent complex physical connections and relational ties among human and nonhuman agents in cyberinfrastructure.
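A minimal way to represent such a multidimensional network, assuming each node is tagged as human or nonhuman and each tie carries a relation type, is sketched below. The class, node names, and relation labels are hypothetical illustrations, not a schema taken from the cyberinfrastructure literature.

```python
from collections import defaultdict

class MultidimensionalNetwork:
    """Nodes carry a kind (human or nonhuman); ties carry a relation type."""

    def __init__(self):
        self.kind = {}                 # node -> "human" or "nonhuman"
        self.edges = defaultdict(set)  # relation -> set of (source, target)

    def add_node(self, node, kind):
        self.kind[node] = kind

    def add_tie(self, source, target, relation):
        self.edges[relation].add((source, target))

    def ties_of(self, node):
        """Sorted (relation, other node) pairs for every tie touching node."""
        return sorted(
            (relation, b if a == node else a)
            for relation, pairs in self.edges.items()
            for a, b in pairs
            if node in (a, b)
        )

# Hypothetical example: two researchers, a data set, and a document.
net = MultidimensionalNetwork()
net.add_node("researcher_1", "human")
net.add_node("researcher_2", "human")
net.add_node("climate_dataset", "nonhuman")
net.add_node("workshop_report", "nonhuman")
net.add_tie("researcher_1", "researcher_2", "collaborates_with")
net.add_tie("researcher_1", "climate_dataset", "analyzes")
net.add_tie("researcher_2", "workshop_report", "authored")
print(net.ties_of("researcher_1"))
# [('analyzes', 'climate_dataset'), ('collaborates_with', 'researcher_2')]
```

Keeping the relation type on each tie lets the same node participate in several kinds of networks at once, which is what makes the structure "multidimensional" rather than a single social graph.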

Middleware is a specific type of multidimensional network (NSF, 2007). It is composed of computer software that ties multiple software applications together and allows them to interact in a parallel, distributed, and interoperable environment. We highlight middleware because it plays a significant role in creating key processes in cyberinfrastructure, which will be discussed next. Cyberinfrastructure consists of hardware, software, agents, and interactions. When the four layers are in action, they create cyberinfrastructure processes.
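The role of middleware, tying heterogeneous applications together behind a common interface, can be sketched with the adapter pattern. Everything here is hypothetical: `ClusterA` and `ClusterB` stand in for resources with incompatible native APIs, the adapters give them a shared `submit` method, and the `Middleware` class splits a job across them and combines the results. Real grid middleware also handles authentication, scheduling, and data movement, which this sketch omits.

```python
from __future__ import annotations

from typing import Protocol


class ComputeResource(Protocol):
    """The common interface the middleware expects from every resource."""

    def submit(self, job: list[float]) -> float: ...


class ClusterA:
    """Hypothetical resource with its own native API."""

    def run_batch(self, values):
        return float(sum(values))


class ClusterB:
    """Hypothetical resource expecting a differently shaped request."""

    def execute(self, payload):
        return float(sum(payload["data"]))


class ClusterAAdapter:
    def __init__(self, backend: ClusterA):
        self.backend = backend

    def submit(self, job: list[float]) -> float:
        return self.backend.run_batch(job)


class ClusterBAdapter:
    def __init__(self, backend: ClusterB):
        self.backend = backend

    def submit(self, job: list[float]) -> float:
        # Translate the common call into ClusterB's request format.
        return self.backend.execute({"data": job})


class Middleware:
    """Splits a job across registered resources and combines the results."""

    def __init__(self, resources: list[ComputeResource]):
        self.resources = resources

    def run(self, job: list[float]) -> float:
        n = len(self.resources)
        parts = [job[i::n] for i in range(n)]  # round-robin split of the job
        return sum(r.submit(p) for r, p in zip(self.resources, parts))


mw = Middleware([ClusterAAdapter(ClusterA()), ClusterBAdapter(ClusterB())])
print(mw.run([1.0, 2.0, 3.0, 4.0, 5.0]))  # 15.0
```

The design choice is that applications talk only to `Middleware.run`; adding a new resource means writing one more adapter rather than changing every application that uses it.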


Processes

There are two key processes of cyberinfrastructure: the virtual environment and the virtual organization. The first key cyberinfrastructure process is the technologically generated virtual environment (VE) (ACLS, 2009; Poole, 2009; Schroeder & Axelsson, 2006), the continuously generated virtual space in which researchers interact with data and with each other. In its present form, the virtual environment consists of visualizations (Stewart, 2007), simulations (Leonardi, 2009), and models (Monteiro & Keating, 2009). By running HPC applications on large-scale data, researchers can use visualization techniques, interactive simulations, and computer modeling to analyze and predict complex scientific phenomena with significant societal and global implications. One example is a real-time simulation on combined data from nearby locations that affect the development and expansion of a hurricane threatening a local community.

The second key cyberinfrastructure process is the socially generated virtual organization (VO). A VO brings a group of distributed researchers together for a common purpose and allows them to interact with each other. A VO is dispersed, diverse, and flexible while also remaining coordinated, coherent, and secure (Bird, Jones, & Kee, 2009). For instance, the VO for the Large Hadron Collider project brings about 2,000 researchers together across multiple countries. Embedded in the notion of a VO are interdisciplinary collaboration (Monteiro & Keating, 2009) and community-building (Poole, 2009). So far, cyberinfrastructure can be defined by its unique characteristics, multiple layers, and key processes; however, it is most importantly defined by its intended outcomes.


Outcomes

Cyberinfrastructure emerged to promote three specific outcomes. The first outcome of cyberinfrastructure is that it increases productivity (Stewart, 2007). Productivity can be understood as the ability to do more in less time. Because of its data capacity and computational power, cyberinfrastructure can process larger datasets at higher speeds than traditional personal computers and locally networked machines. If nothing else, cyberinfrastructure increases the productivity of researchers.


The second outcome of cyberinfrastructure is innovation (Atkins et al., 2003). Innovation refers to the ability to produce novel outcomes. Because of its data capacity and computational power, cyberinfrastructure enables research at a scale and speed never before possible, facilitating the exploration of complex phenomena at the edge of scientific frontiers. As a result, cyberinfrastructure enables and supports innovations in research.

The third outcome of cyberinfrastructure is revolution (Atkins et al., 2003; Stewart, 2007). Revolution can be defined as the ability to cause a paradigm shift. With increased productivity and a stream of innovations, cyberinfrastructure stimulates a revolution in science, causing researchers to think of science differently, explore big questions, and work in new ways. Cyberinfrastructure also generates a new set of scientific practices (Monteiro & Keating, 2009). Once they have made this transition, researchers cannot return to their previous paradigm of doing science, thus effecting an intellectual and practical revolution."