Cloud Technology Best Practices From the Field

In Part I of this series, I shared a few best practices that I have collected from my work with cloud leaders in the field. They are drawn from real-world case studies of companies that are actively implementing cloud projects within their organizations. These practices and lessons learned offer guidelines that may apply to your situation and can help mitigate the many risks and challenges associated with adopting cloud in your enterprise.

Architect for a “virtual image” mindset and not for a “patch-and-upgrade” mindset

Most large enterprises that are leaning toward private clouds are first building the foundational capability of highly virtualized infrastructures. What began many years ago as a data center consolidation and virtualization process to achieve improved efficiencies has now become sound preparation for an internal cloud computing strategy. As private cloud technologies mature, we will start to see enterprises moving directly to the private cloud without going through the intermediate virtualization step.

A virtualized infrastructure enables the construction of replicable, general-purpose, commodity compute and storage on the cloud, instead of handcrafting individual physical boxes for each application or workload. Because the physical assets themselves are also standardized, it is possible to rapidly provision virtual resources through automation. Application developers get on-demand capability (take what you need when you need it) that they can tap into as their workload scales, so they don’t have to overprovision in anticipation of a future maximum workload scenario. You get breakthrough efficiencies from your developers because you can provide resources in a few hours instead of several weeks or months; you can further provide capability for self-service and self-provisioning. The notion of internal clouds begins to open up a two-way communication channel between infrastructure teams and application developers, replacing the old, one-way system of developers asking for resources and IT providing those resources. One caveat, though: an executive I spoke to recently pointed out that the old mentality of “hoarding” is still quite prevalent in the enterprise, and despite the fact that these virtualized resources are available on demand, they are not returned after use!

The reduction and standardization of physical hardware assets has one major benefit for the business. The number of platform choices can be reduced dramatically. One CIO mentioned that his company is in the process of reducing its platform choices from 120 to five. This, in turn, means much faster procurement cycles, with dramatic time savings in receiving vendor quotes, placing purchase orders, and racking, stacking, networking, and configuring systems.

“The key for us is to architect our infrastructure and to adhere to standard containers for virtual instances so that they are mobile and they can move from one data center to another and that will enable us to maintain a competition between hungry IaaS and hosting providers and our traditional vendors.” – Chief architect, petroleum company

If you look at a typical application workload instance in a large enterprise datacenter, it has been up and running for maybe five years; the application has been patched fifty times and been installed and uninstalled 500 times. What you end up with is an application system that is showing its wear and tear in terms of software configuration. On top of that, you have staff who must patch thousands of servers each time there is a change. In a cloud environment, you can instantiate a clean instance of the application. If you want to upgrade your application, you create a brand new virtual server (because it is so much easier and faster to create a new one than to patch an old one), apply your patch to this new server, and focus your efforts on migrating the data state. If the new instance works, you return the old instance and it gets recycled. If it doesn’t work, diagnose it, ask for a third one, and return the other two. The thousandth instance is exactly the same as the first instance.
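To make the upgrade flow above concrete, here is a minimal Python sketch of the image mindset. It is only an illustration under stated assumptions: the names `Image`, `Instance`, `upgrade`, and `health_check` are hypothetical and do not correspond to any particular cloud provider's API, and the "data state" is reduced to a plain dictionary.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # hypothetical instance-id generator


@dataclass(frozen=True)
class Image:
    """An immutable template: application version plus applied patches."""
    app_version: str
    patches: tuple = ()

    def with_patch(self, patch: str) -> "Image":
        # Patching produces a new image; the old one is never modified.
        return Image(self.app_version, self.patches + (patch,))


@dataclass
class Instance:
    image: Image
    data: dict = field(default_factory=dict)
    instance_id: int = field(default_factory=lambda: next(_ids))


def upgrade(old: Instance, patch: str, health_check) -> Instance:
    """Build a fresh instance from a patched image instead of patching `old`."""
    candidate = Instance(image=old.image.with_patch(patch))
    candidate.data = dict(old.data)  # migrate the data state
    if health_check(candidate):
        return candidate  # the caller can now return `old` for recycling
    return old  # keep serving from the old instance; discard the candidate


v1 = Instance(Image("1.0"), data={"orders": 42})
v2 = upgrade(v1, "security-fix-1", health_check=lambda inst: True)
```

The design point is that the old instance is never touched: if the health check fails, nothing needs to be rolled back, because the original is still running from its original image.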

“Think of all the people who are trying to diagnose a problem at 2:00am, trying to figure out “What’s changed?” or “This one’s working over here and that one over there is not working, what’s the difference?” We get away from a lot of that kind of stuff. It’s an “image” mindset and not a “patch and upgrade” mindset. If you can get an image that does exactly what you want, then you can cut it any number of times you like with very little effort.” – CIO, financial services company

However, only a limited part of the value of the cloud is captured through consolidation, virtualization, and standardization. That goes only so far. The real value lies in asking how you scale up (and down) rapidly, and in doing so with a high degree of cost efficiency. For example, how does Wikipedia work with 350 million users and a staff of 30? How do Facebook and Skype serve millions of users with only 600 staff members? Amazon is similarly “scarily” lean and mean. The nature of the services that you overlay and how you provide them, the business processes, and the business models involved are what matter.

You can’t blame it on the cloud

Ultimately, you are responsible for guaranteed services to your customers. You can’t say to a complaining customer, “I’m sorry, but we lost access to one of our cloud providers, it’s not my fault.” When you get to a point where a significant portion of your business-critical systems and services sits with various external cloud vendors, you essentially have no control over each segment, and yet you are required to provide guaranteed service levels and reliability to your customers. Therefore, several executives in our study stressed the importance of building strong and cooperative relationships with their cloud vendors. Contracts and SLAs are important, but they only go so far. A good relationship and open communication will come in really handy in a crisis situation.

But what you need is an architecture that is resilient and secure. Some tips to consider:

Keep the number of moving parts low and reduce the number of failure points.
Reduce the types of failures by putting in the right forms of resilience and failover for compute, storage, and communication infrastructure: deploy redundant and/or standby systems for high-probability failure points.
Manage your capacity with a view of both the service level you are contracted to and the premiums that will be paid to guarantee it.
Create a base of information about failure points, failure modes, and performance characteristics, collected over a long enough period of time and across a wide enough set of cloud vendors and services, so you can make better decisions.
Encrypt everything: data in your database, in memory, in file systems, and in backups. Encrypt all data network communications.
Importantly, before you sign up with a cloud vendor, test everything. Most cloud vendors will deliver exactly what they say and nothing more. Be prepared to provide your own support unless you are willing to go to another source in the ecosystem that can.
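Two of the tips above, standby systems and a running record of failures, can be sketched together in a few lines of Python. This is a toy illustration rather than a production pattern: `ProviderError`, `call_with_failover`, and the two handler functions are hypothetical names, and the "outage" is simulated.

```python
from collections import Counter


class ProviderError(Exception):
    """Raised by a (hypothetical) provider handler when a call fails."""


failure_log = Counter()  # provider name -> number of observed failures


def call_with_failover(providers, request):
    """Try each (name, handler) pair in order; record failures as we go."""
    for name, handler in providers:
        try:
            return name, handler(request)
        except ProviderError:
            failure_log[name] += 1  # feeds the long-term failure record
    raise ProviderError("all providers failed")


def primary(request):
    raise ProviderError("primary unavailable")  # simulated outage


def standby(request):
    return f"stored:{request}"


served_by, result = call_with_failover(
    [("primary", primary), ("standby", standby)], "invoice-17"
)
```

Here the request is served by the standby, and the failure counter for the primary goes up by one. Collected over months and across vendors, that kind of record is what lets you compare providers on evidence rather than on their marketing.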

Build Community Clouds

If you are a large enterprise or a government agency and are discovering that the current public cloud offerings do not meet your specific needs around security, compliance, performance and so on, you may be tempted to build your own private cloud within a specific department or geographical location or even within one of your supply-chains of suppliers and consumers. However, in cases where you need to smooth out and aggregate workloads across multiple departments or geographical locations, the private cloud may not be the answer. A community cloud where multiple organizations/departments share resources might be the answer.

Take, for example, the NASA Ames Nebula cloud, the Department of Interior’s federal business center cloud, and the DISA cloud. All of these clouds were originally created for a specific purpose, but they were very quickly opened up to other entities within a trusted environment.

A community cloud has huge benefits because it allows large enterprises to pool resources from different clouds with the same technology stacks and gain some scalability benefits from the automated provisioning and migration of workloads. You also get the benefits of availability and disaster proofing to some extent. If you have just one private cloud isolated to one department, or if it is all housed in one container in one location, it is a potential single point of failure. Each of your private clouds could already have massive capabilities, but if multiple departments have three of those in three different locations and are collaborating and sharing those resources, they not only get the ability to scale rapidly as needed, but also get disaster recovery, redundancy, and greater cost efficiencies through better utilization of resources at a global scale.
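A little back-of-the-envelope arithmetic shows why pooling helps. The Python sketch below uses invented numbers purely for illustration: the hourly demand figures and the 99% per-site availability are assumptions, not data from the study, and the availability formula assumes that site failures are independent, which real sites only approximate.

```python
def combined_availability(site_availability: float, sites: int) -> float:
    """Probability that at least one of `sites` independent sites is up."""
    return 1 - (1 - site_availability) ** sites


# Hypothetical hourly demand (in servers) for three departments.
demand = {
    "dept_a": [10, 80, 20],
    "dept_b": [70, 15, 25],
    "dept_c": [20, 10, 75],
}

# Isolated private clouds: each department sizes for its own peak.
isolated_capacity = sum(max(loads) for loads in demand.values())

# Community cloud: size for the peak of the combined demand.
combined = [sum(hour) for hour in zip(*demand.values())]
shared_capacity = max(combined)

one_site = combined_availability(0.99, 1)
three_sites = combined_availability(0.99, 3)
```

With these made-up figures, three isolated clouds need 225 servers in total, while a shared pool needs 120, because the departments peak at different hours. Under the independence assumption, three sites at 99% each give a combined availability of roughly 99.9999%.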

In general, large enterprises and government agencies will move toward a private cloud and community cloud model. There are still challenges with certain aspects of the model, such as the procurement processes and the pay-per-use portion, which make it difficult to implement the community model effectively.

One major hurdle preventing community cloud adoption is procurement and incentives. Many departments of very large enterprises and government agencies do not have a mandate to provide their services to the rest of the organization. They are focused entirely on their individual departments. When departments or agencies want to share their private cloud with others, they run into a cost recovery problem. Due to the nature of budgeting and procurement policies, they may not have the mechanisms to accept money from other departments, yet they do need to share costs with them. That is certainly not something any individual department will be willing to do without a restructuring of the procurement, charge-back, and cost-sharing policies.
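The arithmetic of charge-back is the easy part; the policy is what is hard. As a minimal sketch, here is one possible cost-sharing rule in Python, proportional to metered usage. The function name `allocate_costs`, the department names, and all figures are hypothetical, and a real policy might equally use fixed shares, tiers, or reserved capacity.

```python
def allocate_costs(total_cost: float, usage_hours: dict) -> dict:
    """Split `total_cost` across departments in proportion to usage."""
    total_hours = sum(usage_hours.values())
    if total_hours == 0:
        raise ValueError("no usage recorded; nothing to allocate against")
    return {
        dept: total_cost * hours / total_hours
        for dept, hours in usage_hours.items()
    }


# Hypothetical yearly cost of the shared cloud and metered usage per department.
shares = allocate_costs(
    120_000, {"host_dept": 500, "dept_b": 300, "dept_c": 200}
)
```

With these invented figures, the hosting department would carry 60,000, and the other two would owe 36,000 and 24,000. The point of the section stands: without a mechanism to actually move that money between budgets, the calculation is academic.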

To hear the summary of the research and other related topics, join us at the “Leaders in the Cloud” Webcast (co-hosted by Ness Technologies) on September 16th, 2010 at 11:00 am EST.

I’m participating in the panel discussion about Stages of On-Demand Success Stories at A Virtual Summit on SaaS for Growing Enterprises. Join us on 9/28 at 11 am ET; registration is free. Twitter hashtag: #CloudSummit


Kamesh Pemmaraju heads cloud research at Sand Hill Group, where he helps companies (enterprises and technology vendors) accelerate their transition to the cloud. He welcomes your comments, opinions, and questions. Drop him a line at kamesh@sandhill.com. For updates on news, views, interviews, webcasts, events, and blog posts, follow him on Twitter @kpemmaraju.
