Cloud Computing

From P2P Foundation

Definition

From the Wikipedia article on cloud computing:

"Cloud computing is a computing paradigm shift where computing is moved away from personal computers or an individual server to a “cloud” of computers. Users of the cloud only need to be concerned with the computing service being asked for, as the underlying details of how it’s achieved are hidden. This method of distributed computing is done through pooling all computer resources together and being managed by software rather than a human."

http://en.wikipedia.org/wiki/Cloud_computing



Characteristics

A nice overview of what a Cloud should be (as described by John M Willis):

  • Primary Characteristics
* It uses commodity-based hardware as its base. The hardware can be replaced anytime without affecting the cloud.
* It uses a commodity-based software container system. For example, a service should be able to be pulled from one cloud provider to any other cloud provider with no effect on the service.
  • Secondary Characteristics
* Virtualization engine
* Abstraction layer for the hardware, software, and configuration of systems
* Pay as you go with no lock-in
* Support
* Dynamic horizontal (global) and vertical (resources) scaling
* Autonomic computing (automated restarts, automated resource expansion and contraction).
* Flexible migration and restart capabilities


Typology

By Primavera De Filippi and Miguel Said Vieira:

"Cloud computing can be subdivided into three distinct categories that distinguish themselves according to the type of resources involved: Infrastructure as a Service [IaaS], Platform as a Service [PaaS], and Software as a Service [SaaS]. For the purpose of this paper, we will focus mainly on the latter—as the one most likely to affect information commons. In the context of cloud computing, SaaS refers to a new way of delivering software functionalities by providing a variety of online applications that can be accessed directly from a web-browser, without the need for users to download any application onto their own devices. The key idea is to separate the ownership and possession of software from its actual use (Turner, Budgen, and Brereton 2003).

In spite of the increasing complexity of underlying software, users only interact with the application through the user-interface provided by the cloud provider, without any knowledge as regards the technical implementation of the applications they are running; most or all of the back-end processing and storage is made in the cloud infrastructure, and not in the user’s own devices. Cloud providers can thus modify their software at any time, or diversify the operators that contribute to providing the underlying services without the need for any kind of intervention from users, who are often unaware of any changes made in the back-end infrastructure of the cloud." (http://biogov.uclouvain.be/iasc/doc/full%20papers/De%20Filippi%20-%20Said%20Vieira.pdf)


Business/consumption models based on Cloud Computing

Cloud computing enables on-demand resource allocation, and thus scaling. The different players adopt different approaches, sometimes leading to further layering and atomization in the service stack.

Models

Platform as a service

Cloud computing resource reselling can first be seen as pay-as-you-use platform reselling. This enables a DIY data center model for individuals and organizations.

Amazon AWS example

The Amazon AWS service provider lets customers (mostly start-ups) consume on-demand storage space, CPU power (using virtualization), and, more recently, a simple database and storage volumes. This makes it easy to perform compute-intensive tasks or host high-demand web services.

Clients are billed monthly based on service usage (mostly data transfer and CPU time), and can even benefit from a reseller-like status (using the DevPay service).
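The usage-based billing described above can be sketched in a few lines. This is only an illustration of the principle, not Amazon's actual pricing: the unit prices, the `UsageRecord` type and the `monthly_bill` function are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical unit prices, chosen only for illustration.
PRICE_PER_CPU_HOUR = 0.10      # currency units per instance-hour
PRICE_PER_GB_TRANSFER = 0.15   # currency units per GB transferred


@dataclass
class UsageRecord:
    """Metered usage for one resource over the billing period (hypothetical)."""
    cpu_hours: float
    gb_transferred: float


def monthly_bill(records):
    """Sum the metered usage and price it per unit consumed."""
    cpu = sum(r.cpu_hours for r in records)
    transfer = sum(r.gb_transferred for r in records)
    return round(cpu * PRICE_PER_CPU_HOUR + transfer * PRICE_PER_GB_TRANSFER, 2)


usage = [UsageRecord(cpu_hours=720, gb_transferred=50),
         UsageRecord(cpu_hours=100, gb_transferred=10)]
print(monthly_bill(usage))
```

The point of the sketch is that nothing is paid up front: the bill is a pure function of what was metered during the month.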

Softlayer

The Softlayer service exposes its platform's features through an XML-RPC API (and is thus highly integrable). Here, network devices (firewalls, load balancers, servers, storage servers) can be driven through this API down to the hardware layer (e.g. hardware reboot).
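To give an idea of what driving infrastructure through an XML-RPC API looks like on the wire, here is a small sketch using Python's standard `xmlrpc.client` module. The method name `hardware.reboot` and the server identifier are hypothetical and are not taken from Softlayer's documentation. The example only builds and parses the request locally, without any network call.

```python
import xmlrpc.client

# Hypothetical method name and argument: NOT taken from Softlayer's API docs.
METHOD = "hardware.reboot"
SERVER_ID = 1234

# Serialize the call into the XML document an XML-RPC client would POST.
payload = xmlrpc.client.dumps((SERVER_ID,), methodname=METHOD)
print(payload)

# Parse it back, as a server would, to recover the parameters and method name.
params, method = xmlrpc.client.loads(payload)
print(method, params)
```

Against a real endpoint, `xmlrpc.client.ServerProxy` would send such a payload over HTTP; the URL, method names and authentication scheme would have to come from the provider's own documentation.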

Software as a service

Because the providers described above offer only basic "building blocks" (although some internal Amazon services rely on them as well), innovation is left to the users.

The Mogulus.com streaming provider is an early example of a professionally oriented, value-added web service built on cloud computing: it lets anyone create their own web television channel, with professional features.

Managed, scalable hosting

Another approach is to manage scaling from end to end, and "only" offer simple, generic web hosting. Users (developers) only upload application code, without having to worry about scaling. This approach, however, tends to lack flexibility and to tie the application to its hosting platform.

Google recently unveiled Google AppEngine, which relies on and integrates with comparable underlying services (storage, database, and existing Google services such as Gmail).

New business models

For the first time, anyone can use what are almost public data centers. Start-ups can explore new ways of building businesses on top of these quasi-unlimited resources.

Licensing models

Licensing models can then switch from a per-user or per-machine model to a per-usage model. In short, a licensing fee can be charged as a surcharge on basic resources (e.g. data transfers). Early examples are the Wowza Streaming Server and Red Hat's paid Amazon EC2 virtual images. Note that these models are compatible with Open Source software, where it is the integration work that is billed.

Billing metrics transformation

Billing-wise, the "transformation" also extends to the pricing model and the usage metric, which can then be better suited to the intended service (for a streaming service, presumably viewer hours), for both the provider and the consumer.
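As a rough sketch of this transformation, the following converts a raw resource price (per gigabyte transferred) plus a per-usage licensing surcharge into a price per viewer hour for a streaming service. The function names, the bitrate and all prices are assumptions made up for the example.

```python
def gb_per_viewer_hour(bitrate_kbps):
    """Data volume one viewer consumes in an hour at the given bitrate."""
    bits = bitrate_kbps * 1000 * 3600   # bits streamed in one hour
    return bits / 8 / 1e9               # bits -> bytes -> gigabytes (decimal)


def price_per_viewer_hour(bitrate_kbps, transfer_price_per_gb, license_fee_per_gb):
    """Fold the raw transfer cost and a per-usage licence fee into one metric."""
    gb = gb_per_viewer_hour(bitrate_kbps)
    return gb * (transfer_price_per_gb + license_fee_per_gb)


# Hypothetical figures: 500 kbit/s stream, 0.15/GB transfer, 0.05/GB licence fee.
print(round(price_per_viewer_hour(500, 0.15, 0.05), 4))
```

The provider still pays for transfer, but the customer sees a single metric that matches how the service is actually consumed.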


Standards and Technical Aspects

Technical Aspects: The art of scaling

Cloud computing only provides the basic means. It is up to the user to use and allocate them efficiently in proportion to demand, which is called scaling.

Substantial research on allocation strategies, error handling, and availability clustering is ongoing.

Scalability is a desirable property of a system, a network, or a process, which indicates its ability to either handle growing amounts of work in a graceful manner, or to be readily enlarged.
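Allocating resources in proportion to demand can be illustrated with a very simple rule: provision enough instances to cover the current load, within fixed bounds. This is a minimal sketch under stated assumptions, not a production allocation strategy; `desired_instances`, its parameters and the load figures are hypothetical.

```python
import math


def desired_instances(requests_per_sec, capacity_per_instance,
                      min_instances=1, max_instances=20):
    """Return how many instances are needed to serve the current demand."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    # Clamp so the pool never drops to zero or grows without bound.
    return max(min_instances, min(max_instances, needed))


# Demand rises, peaks, then falls; the pool expands and contracts with it.
for load in (0, 90, 450, 5000, 120):
    print(load, desired_instances(load, capacity_per_instance=100))
```

Real systems add smoothing, cool-down periods and failure handling on top of such a rule, which is where much of the research effort mentioned above goes.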


Freedom Conditions for Cloud Computing

Georg Greve:

"It’s come to the point that I was asked to explain what I consider necessary prerequisites for an open, free, sustainable approach towards what is often called “The Cloud” or also “Software as a Service” (SaaS).

So what do I think constitutes a socially acceptable and sustainable approach to “Cloud Computing” or “SaaS”?

I think it may be simpler than what I initially thought. There are two primary points that now seem most relevant to me:


  • Right to restrict

Users must be able to restrict access to their own data, especially by their service provider. Participating in social networks, or enjoying the convenience of having your data available at all times should never have to come at the price of giving up privacy. So users must be given a choice to restrict access to their data as much as they consider necessary or desirable, from fellow users, and their provider. Similarly, they should never lose the right in their data simply because they use a certain service.


  • Freedom to leave, but not lose

Users must be able to switch between providers, or even to host their own data, if they so choose. And they must be able to do so without losing their network.

They should still enjoy the same level of interconnectivity and not be penalized for having switched providers in the form of having to convince all their contacts and friends to switch, as well.

Software such as StatusNet which is powering Identi.ca allows to set up your own instance – this is a step in the right direction.

From these follow a couple of necessary conclusions to get to this point:


  • Free Software necessary, but not sufficient

Free Software is a necessary, but not a sufficient condition. Without the software being Free Software, the Freedom to leave, but not lose is exceedingly hard to implement. So in my view the GNU Affero General Public License (AGPL) is strongly preferred, followed by the GNU General Public License (GPL) Version 3, but ultimately any Free Software license will do. Implicitly therefore I am also not adverse to allowing companies to differentiate themselves to some level on code, as long as that does not violate the principles above.


  • Decentralized & Federated

In order to allow switching without losing the network, any software in this context should be designed federated and decentralized, based on protocols that allow such interconnectivity as well as re-discovering users that have moved.


  • Open Standards

In order to facilitate the connection of services and providers, as well as allow for innovation and differentiation, a certain level of freedom to experiment is necessary. So software and services should provide truly Open Standards with ongoing interoperability work through plug-fests and automated test suites which give some indication on how well which services actually interoperate.


  • Transparent Privacy Policies

In order to have control over data, users first need to understand what they are (or are not) allowing the provider to do, which is typically not the case. Most users have never read the 20 page privacy statements which are written in ways that make telephone books seem an entertaining read. So we need a way to simplify this.

A set of standardized privacy policies, maybe with a simple visualization approach similar to what Creative Commons came up with, would be a very useful step forward here.


  • No change of policy without explicit consent

And naturally it should be illegal to change privacy policies on users without their explicit consent. They need to know what is changing, and how, and what will be the resulting level of privacy they enjoy – in the same clear, transparent and understandable manner." (http://blogs.fsfe.org/greve/?p=452)


Cloud Computing Standards

Dion Hinchcliffe:

"One of the most encouraging and consistently developing stories today is the standards work being done on the Open Cloud Computing Interface, which is creating a set of REST-based interfaces for the management of cloud resources including computing, storage, and bandwidth. One of my favorite aspects of OCCI is that it tries hard to be a minimal specification that is simple and straightforward, there's little or none of the WS-I megastandards here.

OCCI has been undergoing frequent and steady revision (read the latest iteration, version 5, released at the end of last month) and is coming together as a capable standard that is actively supported by Cisco, Sun Microsystems, Eucalyptus, Rackspace, GoGrid, and many other members. OCCI currently has my vote as the first major cloud computing standard that you're most likely going to see in a real-world cloud service near you in the future. What's missing? Support from the major vendors such as Amazon, Google, Microsoft, and Salesforce.

But OCCI is just one of many cloud computing standards. You can view a larger list of the current standards in development at Cloud Standards.org that includes efforts (or some cases, just involvement) from many of the usual suspects including DMTF, ETSI, NIST, OMG, SNIA, OASIS, The Open Group, and the Open Cloud Consortium.

When you combine the Open Virtualization Format along with OCCI you start to get a complete way to describe, deploy, and manage a cloud computing environment and begin to make it easier and practical to switch between providers that support enough of the base set of standards.

In an upcoming post, I'll take a look at the two key questions that will drive the interoperability and openness questions for the near future. These questions are 1) what is the absolute minimum set of standards required to have full open cloud computing portability and 2) what kind of cloud management efforts are emerging, either standards, products, or just practical techniques, that enable cloud interoperability for enterprises today." (http://www.ebizq.net/blogs/enterprise/2009/10/as_cloud_computing_grows_where.php)



Discussion

Kevin Kelly on the Cultural Effects of Cloud Computing

Kevin Kelly's discussion of the cultural effects of Cloud Computing is also of interest:

"What about us? What is the culture of cloudiness? My hunch (which I cannot prove yet) is that the consequences of going from the web to the cloud will exceed the changes we saw going onto the web originally. I've teased out some cultural dynamics I think will prevail in a cloudy world:


Always On. Constant connection makes the "on" invisible. We do nothing to connect since it is now the default. It is like air. As behavior economists have shown, defaults make huge differences. The on default biases us toward connection and sharing. The always on default biases us toward expecting everything to be connected and always on. We expect all agents should always be on. All services should always be available. The drive toward 24/7 availability for everything continues. Not being always on is a disadvantage (with some exceptions). Always on also means more of our lives are captured, analyzed, digested, and "on". The more the cloud is always on, the more of our self is moved into the cloud.

Omnigenous. The distinction between being on the cloud and off disappears as more of the world is included. In the beginning the cloud is the cloud of servers, then it becomes the cloud of servers and all our laptops, and then it includes all those plus all our mobile phones and then all our TV screens as well. As the cloud keeps improving "network effects" kick in and those improvements draw in more devices, more sensors, more chips, making it even more attractive, until the cloud is omnigenous and includes every kind of thing. Cameras, microphones -- anything producing data will shift toward the cloud. So the cloud is the first place we go to for whatever we want. We may not always find it there, but it will always be the place we begin.


More Smarter. Clouds don't have to be smarter than the web we have now, but they are likely to be. The web can be thought of hyperlinked documents. The clouds can be thought of as hyper-linked data. Ultimately the chief reason to put things onto the cloud is to share their data deeply. Not just to have a convenient backup, or to have always on access, which the cloud WILL give, but to be able to weave together the data and interactivity of the parts, and thereby make all the pieces much smarter and more powerful than they could possibly be alone. It is not too much of an exaggeration to think of the cloud as the tool which allows us to share the elemental aspects of our data and activities in a way makes them smarter. The cloud is sort of a hivemind tool.


Inseparable Dependence. "Always on" plus superior performance will lead to supreme dependence on our part. There is the curious paradox that as the hard-lifting computation leaves the devices near our bodies and takes place in the invisible cloud it psychologically moves the device closer to us. As devices get smarter they get more intimate. A friend of mine had to ground their teenager for a serious infraction. They took her cell phone away. They were horrified when she became physically ill. It was almost as if she had an amputation. And she had in one sense. I was reminded of the book/movie The Golden Compass wherein the children in that world have spiritual guardian animals, called demons. These intangible animals sit on their shoulders or hover nearby and advise and comfort them. The most horrible torture in this world is to be separated from your demon. In the future, the cloud and cloud intelligence will be our Golden Compass demons. Separation from the advice and comfort afforded by the cloud will be horrendous and unbearable.


Extreme Reliability. No machine (or body) is perfect, but clouds will be more reliable than your standalone computer. The number of outage incidents recorded for clouds is fairly small given the total number of access-hours they provide. According to the Cloud Computing Incident Database there have been 11 reported incidents in 2008. My very stable Mac has frozen more times than that this year. The reliability index for the cloud will mean it will increasingly be seen as the Backup. Our life's backup. No matter how many copies of something important you have offline, it won't feel safe until you put it online, on the cloud. We may also feel that if it is only on the cloud, it is not safe, but the reliability of the cloud will likely trump our own reliability. The consensus reliability of Wikipedia is changing our attitudes about where trust lies. In cloud life we may come to trust the aggregation of all sources over any single source.


The Extended Self. Where is my stuff? If I google my own mail to find out what I said, or rely on the cloud for my memory, where do "I" end and it starts? If all the images of my life, and all the snippets of interest, and all my notes, and all my chitchat with friends, and all my choices, and all my recommendations, and all my thoughts, and all my wishes -- if all this is sitting somewhere -- but nowhere in particular -- it changes how I think of myself. What happens if it were to go away? A very distributed aspect of me would go away. If McLuhan is right that tools are extensions of our selves -- a wheel an extended leg, a camera an extended eye -- than the cloud is our extended soul. Or, if you prefer, our extended self.


Legal Conflict. The war over copyright will seem tame compared to the legal battles that the life in the cloud will hatch. Who's laws will prevail? The laws of your domicile, the laws of your server's domicile, or the laws of international exchange? Who gets your taxes if all the work is being done in the cloud? The transparent discontinuity between legal regimes will be a threat to the expansion of the cloud. This friction will also force the growth of multiple clouds. Clouds with varying legal frameworks will compete at the global level, although within many geographical regions, there may be little choice. But the legal issues are not merely international. Who owns the data, you or the cloud? If all your email and voice calls go through the cloud, who is responsible for what it says? In the new intimacy of the cloud, when you have half-baked thoughts, weird daydreams, should they not be treated differently than what you really believe? What are the rights (and duties) of government's attempt at justice and fairness in an always on, omni cloud.


SharePrivacy. Privacy is over. Or more precisely, privacy as we imagined it is over. The extended self requires a different finesse for grappling with the levels of intimacy humans need. The binary functions of public/private, or even friend/not friend have to yield to more nuanced, more complex ways to describe our relationships. The Chinese have a unique name for every type of cousin (younger than you, older than you, your mom's brother, your dad's sister's son, etc.); the cloud will breed distinct ways of relating to agents we know, agents we once knew, agents we know we don't know, and so on. Sharing is the foundational action on the cloud. Some types of sharing will come to resemble what we used to call privacy. It is impossible to share the same cloud to do everything and not evolve our notions and powers of sharing.


Socialism 2.0. The cloud is a collective. Social media is a type of socialism. Open source software projects are kinds of communitarian schemes. When people share their medical records (Patients Like Me), or personal genomes (23andme), or their family photo albums -- they are feeding a collective because by sharing them, their goods increase in value. The success of Wikipedia, Linux, and the web in general is priming a generation to be open to the power of the group. But unlike the old socialism models of old, the top-down social media of communism, the individuals are not forced to homogenize. Instead in this emerging Socialism 2.0, individuals (anyone can edit the encyclopedia!) are liberated via the power of the group. We don't have a very good vocabulary for this dynamic right now, so we are stuck using words like socialism which carry a very heavy cultural baggage. Nonetheless, living in the collective cloud will enhance the status of group power. " (http://www.kk.org/thetechnium/archives/2008/10/cloud_culture.php)


Eben Moglen on the Civil Rights and Privacy Implications of Cloud Computing

From an interview by Glyn Moody:

"Eben Moglen: We have a kind of social dilemma which comes from architectural creep. We had an Internet that was designed around the notion of peerage - machines with no hierarchical relationship to one another, and no guarantee about their internal architectures or behaviours, communicating through a series of rules which allowed disparate, heterogeneous networks to be networked together around the assumption that everybody's equal.

In the Web the social harm done by the client-server model arises from the fact that logs of Web servers become the trails left by all of the activities of human beings, and the logs can be centralised in servers under hierarchical control. Web logs become power. With the exception of search, which is a service that nobody knows how to decentralise efficiently, most of these services do not actually rely upon a hierarchical model. They really rely upon the Web - that is, the non-hierarchical peerage model created by Tim Berners-Lee, and which is now the dominant data structure in our world.

The services are centralised for commercial purposes. The power that the Web log holds is monetisable, because it provides a form of surveillance which is attractive to both commercial and governmental social control. So the Web, with services equipped in a basically client-server architecture, becomes a device for surveillance as well as providing additional services. And surveillance becomes the hidden service wrapped inside everything we get for free.

The cloud is a vernacular name which we give to a significant improvement in the server-side of the web - the server, decentralised. It becomes, instead of a lump of iron, a digital appliance, which can be running anywhere. This means that for all practical purposes servers cease to be subject to significant legal control. They no longer operate in a policy-directed manner, because they are no longer iron, subject to territorial orientation of law. In a world of virtualised service provision, the server which provides the service, and therefore the log which is the result of the hidden service of surveillance, can be projected into any domain at any moment and can be stripped of any legal obligation pretty much equally freely.


GM: So what's the solution you are proposing?

My proposal is this: if we could disaggregate the logs, while providing the people all of the same features, we would have a Pareto-superior outcome. Everybody – well, except Mr Zuckenberg - would be better off, and nobody would be worse off. And we can do that using existing stuff." (http://www.h-online.com/open/features/Interview-Eben-Moglen-Freedom-vs-the-Cloud-Log-955421.html)

...

"What I am proposing is that we build a social networking stack based around the existing free software we have, which is pretty much the same existing free software the server-side social networking stacks are built on; and we provide ourselves with an appliance which contains a free distribution everybody can make as much of as they want, and cheap hardware of a type which is going to take over the world whether we do it or we don't, because it's so attractive a form factor and function, at the price.

We take those two elements, we put them together, and we also provide some other things which are very good for the world. Like automatically VPNing everybody's little home network place with my laptop wherever I am, which provides me with encrypted proxies so my web searching, wherever I am, is not going to be spied on. It means that we have a zillion computers available to the people who live in China and other places where there's bad behaviour. So we can massively increase the availability of free browsing to other people in the world. If we want to offer people the option to run onion routeing, that's where we'll put it, so that there will be a credible possibility that people will actually be able to get decent performance on onion routeing networks.

And we will of course provide convenient encrypted email for people - including putting their email not in a Google box, but in their house, where it is encrypted, backed up to all their friends and other stuff. Where in the long purpose of time we can begin to return email to a condition - if not being a private mode of communication - at least not being postcards to the secret police every day.

So we would also be striking a blow for electronic civil liberties in a way that is important, which is very difficult to conceive of doing in a non-technical way." (http://www.h-online.com/open/features/What-s-the-solution-cont-956215.html)


Paul Buchheit on the Cloud Operating System

See the entry on the Cloud OS

References

  1. List of cloud platforms providers and enablers
  2. Useful cloud computing blogs

Wikipedia articles

  1. Virtualization
  2. Utility Computing

Press

  1. The High Scalability blog is a technical resource about Web 2.0 scaling and contains analyses of popular services' [past?] scaling issues (e.g. the YouTube case)
  2. Elasticvapor is Enomaly's CTO blog
  3. Google and the Wisdom of Clouds
  4. Consumer Cloud Computing
  5. Cloud Computing. Available at Amazon.com Today (Wired.com)
  6. Amazon AWS services now consume more bandwidth than the Amazon stores

Cloud Computing providers

  1. Gandi
  2. Amazon AWS
  3. Microsoft also announced plans for a cloud service, named Live Mesh, oriented towards synchronisation across devices (computers, mobile and gaming devices).
  4. bodhost cloud hosting

Managed and scalable hosting providers

  1. Microsoft Cloud Solutions
  2. Google AppEngine
  3. Joyent offers so-called "accelerators", which are hosting services similar to Google AppEngine. Since scaling problems mostly come from social networks, one can find free Facebook and OpenSocial accelerators.

Behind the scenes: distributed P2P technology

Unsurprisingly, it is often P2P technology that powers the underlying resource allocation mechanisms.

  • Amazon S3 (the storage service) is powered by the Dynamo distributed data store.
  • The Hadoop project (see also http://en.wikipedia.org/wiki/Hadoop) is a Free Java software framework that supports data-intensive distributed applications running on large clusters of commodity computers. It enables applications to easily scale out to thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
  • Tahoe, the software that powers Allmydata.com's storage service, is an Open Source, secure, decentralized, fault-tolerant filesystem.
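The MapReduce programming model that inspired Hadoop can be illustrated with the classic word count. The sketch below runs in a single Python process and only shows the three phases (map, shuffle, reduce); it does not use Hadoop's Java API, and the function names are chosen for this example. In a real cluster, each input split would be mapped on a different node.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


splits = ["the cloud is a cloud of computers", "the cloud scales"]
pairs = [pair for doc in splits for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["cloud"], counts["the"])
```

Because each map call depends only on its own split, and each reduce call only on its own key, both phases can be spread over as many commodity machines as are available.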

Storage providers

  1. AllmyData.com

Scaling consulting firms

Some consulting companies specialize in the art of scaling, and tend to adopt Open Source business models:

  1. Enomaly has released Open Source tools
  2. Intridea open-sourced Scalr, its "fully redundant, self-curing and self-scaling hosting environment utilizing Amazon's EC2"
  3. RightScale offers an automated cloud computing management system for creating scalable web applications that run on Amazon's Elastic Compute Cloud. According to the company, its auto-scaling and load balancing features ensure a site's uptime and reliability, and the RightScale Dashboard makes it easy to set up, launch, and monitor EC2 and AWS activities.