Aaron Cordova's Blog

Architecting for the Cloud


Amazon just released a new whitepaper outlining best practices for getting the most out of cloud computing. Amazon's on-demand infrastructure services enable organizations to gain access to new computing resources instantly and cheaply. But simply moving existing servers and apps to the cloud leaves most of the potential power untapped, because most applications are designed to run on a single machine where all resources are shared. In the whitepaper, author Jinesh Varia explains that some refactoring is necessary to truly scale an application to meet demand.

"... you cannot leverage all that scalability in infrastructure if your architecture is not scalable. Both have to work together. You will have to identify the monolithic components and bottlenecks in your architecture, identify the areas where you cannot leverage the on-demand provisioning capabilities in your architecture and work to refactor your application in order to leverage the scalable infrastructure and take advantage of the cloud."

In many cases, this means moving away from traditional databases and servers to more distributed applications. As Jinesh notes:

"For example, if the cloud does not provide you with exact or greater amount of RAM in a server, try using a distributed cache like memcached or partitioning your data across multiple servers. If your databases need more IOPS and it does not directly map to that of the cloud, there are several recommendations that you can choose from depending on your type of data and use case. If it is a read-heavy application, you can distribute the read load across a fleet of synchronized slaves. Alternatively, you can use a sharding algorithm that routes the data where it needs to be or you can use various database clustering solutions."
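The sharding approach Jinesh describes can be surprisingly simple at its core. Here is a minimal sketch of a hash-based routing function in Python; the shard hostnames are made up for illustration, and a real deployment would layer connection pooling, rebalancing, and failover on top of this.

```python
import hashlib

# Hypothetical fleet of database servers holding the partitioned data.
SHARDS = ["db-0.example.com", "db-1.example.com", "db-2.example.com"]

def shard_for(key: str) -> str:
    """Route a record to a shard by hashing its key.

    A stable hash (rather than Python's built-in hash(), which is
    randomized per process) guarantees every application server
    computes the same shard for the same key.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

Because every reader and writer computes the same mapping, all data for a given key lives on exactly one server, and the fleet's total capacity grows with the number of shards. The weakness of this modulo scheme is that changing the shard count remaps almost every key, which is why production systems often prefer consistent hashing.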

This may be beyond the capability of many IT departments, as refactoring applications to be highly distributed and scalable can be a daunting task, requiring deep insight into algorithms and data structures. Fortunately, several projects are emerging to fill this need. These include a host of new databases that fall under the moniker "NoSQL", including HBase, Voldemort, and MongoDB; memcached, a distributed caching layer; and Hadoop, a distributed file system and processing framework. These are designed to be highly scalable out of the box and support many of the functions required of today's web applications.
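One technique that shows up across these systems is consistent hashing, the partitioning scheme used by memcached client libraries and by Voldemort to spread keys over a fleet while keeping disruption small when the fleet changes. The sketch below illustrates the idea in Python; the node names are illustrative, and real implementations tune details like the hash function and virtual-node count.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hashing sketch.

    Each node is mapped to many points ("virtual nodes") on a hash
    ring; a key belongs to the first node clockwise from its hash.
    Adding or removing a node only remaps the keys between it and
    its neighbors, instead of reshuffling nearly everything as a
    plain modulo scheme does.
    """

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = {}       # hash point -> node name
        self._points = []     # sorted hash points
        for node in nodes:
            self.add(node)

    def _hash(self, value: str) -> int:
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            point = self._hash(f"{node}:{i}")
            self._ring[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        # First ring point at or after the key's hash, wrapping around.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[self._points[idx]]
```

A caching client would build one ring over its memcached servers and call `node_for(key)` before every get or set; because all clients agree on the mapping, each key is cached on exactly one server.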

Moving to a highly scalable architecture is easier than ever with the proliferation of infrastructure services and highly scalable, often open source, applications. The transition still requires a hard look at existing applications, and not all will be easy to modify for the cloud; some will require a full rewrite. As new application requirements emerge, system architects should consider a highly scalable platform or software framework as the foundation, which provides a much easier path to meeting fluctuations and increases in demand. Until applications are built to be distributed, organizations will have a tough time realizing the full potential of the cloud.