The structure, culture, and skills of an organization are closely tied to its enterprise architecture and choice of technologies, whether they are internal or cloud-based. As pointed out in our Leaders in the Cloud research study, corporations will migrate to the cloud incrementally. The migration path to a successful cloud adoption is strewn with many technical challenges and pitfalls. In this series, I will share with you best practices that I have collected from my work with cloud leaders in the field. These best practices will mitigate the many risks and challenges associated with adoption of cloud in your enterprise.
Don’t underestimate the value of your own internal IT team
I pointed out in my earlier blog post The Cloud’s Impact on IT that some IT positions will become redundant in the cloud and that some IT skills will morph to adapt to the new world.
However, the importance of retaining internal experts must be balanced against cost savings. If you use IaaS solutions from popular cloud vendors, you will be offered a default, insecure OS build with no tools and no services on top of it. You don’t get 24/7 system administration support or the other things you would normally expect from internal IT infrastructure and system administration teams.
“Maintaining an internal system administrator has critical business knowledge of an important application which would not be available outside. What’s the process name? How do we restart it?” – CIO, Financial Services Company
The CIO of a financial services company listed the benefits of maintaining an internal infrastructure team, going so far as to say that the internal team can sell these services more cheaply:
- An authentication environment that is aligned with internal mechanisms
- Access to NFS mount points
- Automated backup
- 24/7 system administration support and service
- Change management and replication
- Monitoring and Alerting
- Governance and Security
- Guaranteed service levels and performance
That said, a number of common commodity tasks can be automated and/or outsourced to reduce costs.
Think differently about application architecture
To begin, applications must be built on modern technology stacks using modern application architecture concepts. When you move to the cloud, however, the development paradigm has to shift. A typical IaaS provider would say, “Build your application so that it is resilient to failure.” So you should build in the capability to recover by other means if the system goes down. It is easy to spin up another instance using the interfaces that vendors provide into their environment.
Easy recovery comes with a hidden caveat, though. You need to modify how your applications preserve state because the data you store in your application is lost when that instance goes down and you bring up a brand new instance. Your application architecture should treat permanent storage mechanisms and the database itself as the authority of state in your system.
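To make the point about state concrete, here is a minimal sketch in Python. It assumes a hypothetical `OrderCounter` service and uses a local SQLite file purely as a stand-in for whatever permanent storage your cloud deployment provides; the class, table, and method names are illustrative only.

```python
import sqlite3


class OrderCounter:
    """Hypothetical service whose state lives in a database, not in instance memory."""

    def __init__(self, db_path: str) -> None:
        # db_path stands in for an external, durable data store (an assumption).
        self.conn = sqlite3.connect(db_path)
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS counters ("
                "name TEXT PRIMARY KEY, value INTEGER NOT NULL)"
            )

    def record_order(self, name: str = "orders") -> int:
        # Every update is committed to the store before the call returns.
        with self.conn:
            self.conn.execute(
                "INSERT OR IGNORE INTO counters (name, value) VALUES (?, 0)",
                (name,),
            )
            self.conn.execute(
                "UPDATE counters SET value = value + 1 WHERE name = ?",
                (name,),
            )
        row = self.conn.execute(
            "SELECT value FROM counters WHERE name = ?", (name,)
        ).fetchone()
        return row[0]

    def shutdown(self) -> None:
        # Simulates the instance going away; nothing of value is kept in memory.
        self.conn.close()
```

If one `OrderCounter` instance is shut down and a fresh one is started against the same store, the count simply continues where it left off, which is the behavior you want from a replacement cloud instance.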
“Through load balancing and other mechanisms, if things start failing, you start up other instances (within a very short time—very difficult to do internally). As long as applications preserves state, you are back up and running without any loss. But our model was that we were using commercial off-the-shelf software without redesigning anything. How you recover (exception processing) is really a different process in [cloud vendor] than what you would do internally.” – Enterprise architect, petroleum company
Another key architectural consideration is to ensure that your transactional logic and integrity are preserved in a multi-server cloud environment so that you can seamlessly take advantage of the cloud’s ability to scale dynamically with demand variations.
Recently, Newsweek moved their entire mission-critical infrastructure to Amazon. By being smarter about their architecture, Newsweek was not only able to take advantage of the cloud’s elasticity for managing unpredictable and peak traffic demands, but was also able to build a fault-resilient architecture to achieve high levels of reliability. The CIO at Newsweek had this to say about their architecture:
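One way to keep transactional integrity when any number of application servers may handle the same request is to let the database enforce the rule in a single atomic statement. The sketch below is only an illustration under that assumption: the `inventory` table, the `reserve_item` function, and the use of SQLite are all hypothetical stand-ins for your own schema and database.

```python
import sqlite3


def create_inventory(conn: sqlite3.Connection, sku: str, stock: int) -> None:
    """Set up a hypothetical inventory table with one item."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS inventory ("
            "sku TEXT PRIMARY KEY, stock INTEGER NOT NULL)"
        )
        conn.execute(
            "INSERT INTO inventory (sku, stock) VALUES (?, ?)", (sku, stock)
        )


def reserve_item(conn: sqlite3.Connection, sku: str, qty: int) -> bool:
    """Reserve qty units only if enough stock remains.

    The check and the decrement happen in one UPDATE statement, so the
    outcome does not depend on which application server runs it or on any
    value cached in that server's memory.
    """
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "UPDATE inventory SET stock = stock - ? "
            "WHERE sku = ? AND stock >= ?",
            (qty, sku, qty),
        )
        return cur.rowcount == 1


def current_stock(conn: sqlite3.Connection, sku: str) -> int:
    row = conn.execute(
        "SELECT stock FROM inventory WHERE sku = ?", (sku,)
    ).fetchone()
    return row[0]
```

In a real deployment each server would hold its own connection to a shared database; the design choice illustrated here is simply that the decision "is there enough stock?" is never made in application memory.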
“We took an in-depth look at how Amazon architected their infrastructure. We discovered that if we engineered our own fault-tolerant and resilient layer on top of Amazon, we can get to whatever availability we need. It’s definitely surmountable by just being smarter about it. Our architecture and Amazon’s infrastructure is holding up nicely without any noticeable hiccups. Every week we are learning and getting smarter about building-in more fault-tolerance and resilience into our architecture. Our goal is to be really bulletproof by the end of the summer.”
See my blog post In Conversation with Newsweek CIO for more details.
Analyze your existing workloads to make sure they are suitable for cloud deployments
First, you need to identify candidate applications based on your business needs and on practical and technical considerations (see Do You Need a Cloud Strategy? for a classification scheme). Are these applications virtualized in your data center? The economics of a move to the cloud only make sense when you compare the costs in the cloud with the costs of running your workloads in a virtualized internal environment. However, this analysis is not as straightforward as it seems. You will need to perform a more rigorous analysis of the workloads and examine factors such as SLA requirements, latencies, regulatory requirements, access to sensitive data, the size of the data being processed, operational windows, and utilization patterns to fully understand the cloud alternatives.
When it comes to workloads processing large data sets, you need to consider carefully whether the cloud model is the most cost-efficient and performance-efficient option. In the past, you centralized large data for data management; the compute services, or the servers themselves, sat at the edge of that internal data “cloud,” and a SAN connected them together. For a while, that was the right model for dealing with large data sets. Then you ran into the limits of data bandwidth: you don’t want to deal with 2 GB/s speeds now, you want 30 Gbit/s, the kind of speed you get on the PCI buses inside those powerful servers and simply don’t get with a SAN. When you are dealing with data at petabyte scale, you have to stop thinking of data as dynamic, flowing through the servers, and instead treat the data as static and let the compute resource flow to the data, just the other way around. You then have to work out how much of your data is static by that definition. That data has to stay anchored to a datacenter, simply because handling those petabytes of data externally is not practical.
Second, when you are analyzing your internal workloads, you may need to consider hardware constraints even in a virtualized environment. For example, you may not want two virtualized workloads to run on the same physical hardware for reasons of redundancy or security. Or you may want those workloads running in different locations for disaster recovery.
Third, you need to thoroughly understand the utilization patterns of your workloads. The promised pay-as-you-go paradigm can be deceiving because you are paying for allocated capacity, not for actual use at a granular level. Most cloud environments require that you pay for an entire instance (compute or storage) even if you are using only a fraction of it. If your workload has a lot of unused whitespace, you may end up overpaying for it in the cloud. If you analyze your application utilization patterns properly, you could pool this whitespace in your datacenter and run many workloads on fewer servers.
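A quick back-of-the-envelope calculation shows why petabyte-scale data tends to stay where it is. The sketch below assumes decimal units (1 PB = 10^15 bytes), a fully utilized link, and no protocol overhead; the 1 Gbit/s figure for an external link is an illustrative assumption, not a number from any vendor.

```python
SECONDS_PER_DAY = 86_400
PETABYTE_BYTES = 1e15  # decimal petabyte (an assumption about units)


def transfer_days(data_bytes: float, link_gbit_per_s: float) -> float:
    """Days needed to move data_bytes over a link of the given speed.

    Assumes the link is fully utilized with no protocol overhead, so real
    transfers would take longer than this estimate.
    """
    bits = data_bytes * 8
    seconds = bits / (link_gbit_per_s * 1e9)
    return seconds / SECONDS_PER_DAY


# Internal bus-class speed mentioned above versus a hypothetical 1 Gbit/s
# link to an external provider.
internal = transfer_days(PETABYTE_BYTES, 30)
external = transfer_days(PETABYTE_BYTES, 1)
print(f"1 PB at 30 Gbit/s: {internal:.1f} days")
print(f"1 PB at 1 Gbit/s:  {external:.1f} days")
```

Even at 30 Gbit/s a single petabyte takes about three days to move, and at the assumed 1 Gbit/s it takes roughly three months, which is why the compute has to go to the data.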
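A constraint of this kind is easy to check mechanically once you have an inventory of where each workload runs. The following is a minimal sketch; the placement map, the workload and host names, and the `find_violations` function are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def find_violations(
    placement: Dict[str, str],
    separation_groups: List[Set[str]],
) -> List[Tuple[str, Tuple[str, ...]]]:
    """Return (host, workloads) pairs where workloads that must be kept
    apart ended up on the same physical host.

    placement maps workload name -> physical host name.
    separation_groups lists sets of workloads that must not share a host.
    """
    violations = []
    for group in separation_groups:
        by_host = defaultdict(list)
        for workload in sorted(group):
            host = placement.get(workload)
            if host is not None:  # ignore workloads that are not placed yet
                by_host[host].append(workload)
        for host, workloads in sorted(by_host.items()):
            if len(workloads) > 1:
                violations.append((host, tuple(workloads)))
    return violations


# Hypothetical inventory: the database replica shares hardware with its primary.
placement = {"db-primary": "host-1", "db-replica": "host-1", "web": "host-2"}
groups = [{"db-primary", "db-replica"}]
print(find_violations(placement, groups))
```

The same check works for location constraints if you map workloads to sites instead of hosts.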
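Pooling whitespace is essentially a packing problem. As a rough illustration, the sketch below uses the first-fit decreasing heuristic (a technique chosen here for simplicity, not one prescribed by the research) to place workloads on as few servers as it can. The utilization figures are made-up percentages of one server's capacity, and the sketch ignores memory, I/O, and placement constraints.

```python
from typing import List


def pack_workloads(peak_percent: List[int], capacity: int = 100) -> List[List[int]]:
    """Assign workloads to servers using first-fit decreasing.

    peak_percent holds each workload's peak utilization as a percentage of
    one server. Packing by peak rather than average is a conservative,
    simplifying assumption.
    """
    servers: List[List[int]] = []
    for load in sorted(peak_percent, reverse=True):
        if load > capacity:
            raise ValueError(f"workload of {load}% exceeds server capacity")
        for server in servers:
            if sum(server) + load <= capacity:
                server.append(load)  # fits into existing whitespace
                break
        else:
            servers.append([load])  # no room anywhere: start a new server
    return servers


# Six hypothetical workloads that would each occupy a full cloud instance.
loads = [20, 50, 30, 40, 10, 25]
servers = pack_workloads(loads)
print(f"{len(loads)} workloads fit on {len(servers)} servers: {servers}")
```

With these illustrative numbers, six workloads that would each be billed as a whole instance fit on two internal servers, which is the kind of comparison the utilization analysis should surface.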
Kamesh Pemmaraju heads cloud research at Sand Hill Group, where he helps companies (both enterprises and technology vendors) accelerate their transition to the cloud. He welcomes your comments, opinions, and questions. Drop a line to firstname.lastname@example.org. For updates on news, views, interviews, webcasts, events, and blog posts, follow him on Twitter @kpemmaraju.