In the past couple of years, academic and corporate researchers have begun to see the value of the cloud. Faced with a need to run demanding jobs and to deliver meaningful results as quickly as possible while keeping costs under control, they are now using AWS to run a wide variety of compute-intensive, highly parallel workloads.
Instead of fighting for time on a cluster that must be shared with other researchers, they accelerate their work by launching clusters on demand, running their jobs, and then shutting the cluster down shortly thereafter, paying only for the resources that they consume. They replace tedious RFPs, procurement, hardware builds and acceptance testing with cloud resources that they can launch in minutes. As their needs grow, they can scale the existing cluster or launch a new one.
This self-serve, cloud-based approach favors science over servers and accelerates the pace of research and innovation. Access to shared, cloud-based resources can be granted to colleagues located on the same campus or halfway around the world, without having to worry about potential issues at organizational or network boundaries.
Alces Flight in AWS Marketplace
Today we are making Alces Flight available in AWS Marketplace. This is a fully-featured HPC environment that you can launch in a matter of minutes. It can make use of On-Demand or Spot Instances and comes complete with a job scheduler and hundreds of HPC applications that are all set up and ready to run. Some of the applications include built-in collaborative features such as shared graphical views. For example, here's the Integrative Genomics Viewer (IGV):
Each cluster is launched into a Virtual Private Cloud (VPC) with SSH and graphical desktop connectivity. Clusters can be of fixed size, or can be Auto Scaled in order to meet changes in demand. Once launched, the cluster looks and behaves just like a traditional Linux-powered HPC cluster, with shared NFS storage and passwordless SSH access to the compute nodes. It includes access to HPC applications, libraries, tools, and MPI suites.
You can launch a small cluster (up to 8 nodes) for evaluation and testing, or a larger cluster for research.
If you subscribe to the product, you can download the AWS CloudFormation template from the Alces site. This template powers all of the products, and is used to quickly launch all of the AWS resources needed to create the cluster.
EC2 Spot Instances give you access to spare AWS capacity at up to a 90% discount from On-Demand pricing and can significantly reduce your cost per core. You simply enter the maximum bid price that you are willing to pay for a single compute node; AWS will manage your bid, running the nodes when capacity is available at the desired price point.
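To get a feel for how a maximum bid translates into run time and cost, here is a minimal sketch in Python. It assumes a simplified model in which a node runs during any hour where the Spot price is at or below your bid and is charged the Spot price for that hour. The function names and all of the prices are hypothetical and chosen only for illustration; they are not real AWS pricing.

```python
from typing import Sequence, Tuple


def simulate_spot_node(spot_prices: Sequence[float], max_bid: float) -> Tuple[int, float]:
    """Return (hours_run, total_cost) for one node over hourly Spot prices.

    Simplified model (an assumption): the node runs in any hour where the
    Spot price is at or below the bid, and pays the Spot price for that hour.
    """
    hours_run = 0
    total_cost = 0.0
    for price in spot_prices:
        if price <= max_bid:
            hours_run += 1
            total_cost += price
    return hours_run, total_cost


def savings_vs_on_demand(total_cost: float, hours_run: int, on_demand_price: float) -> float:
    """Fraction saved relative to running the same hours at the On-Demand price."""
    if hours_run == 0:
        return 0.0
    return 1.0 - total_cost / (hours_run * on_demand_price)


if __name__ == "__main__":
    # Hypothetical hourly prices in USD; not real AWS pricing.
    prices = [0.10, 0.12, 0.30, 0.11]
    hours, cost = simulate_spot_node(prices, max_bid=0.15)
    saved = savings_vs_on_demand(cost, hours, on_demand_price=0.50)
    print(f"ran {hours} of {len(prices)} hours, cost ${cost:.2f}, saved {saved:.0%}")
```

With these made-up numbers, the node sits out the one hour in which the price rises above the bid and runs for the rest.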
Running Alces Flight
In order to get some first-hand experience with Alces Flight, I launched a cluster of my own. Here are the settings that I used:
I set a tag for all of the resources in the stack as follows:
I confirmed my choices and gave CloudFormation the go-ahead to create my cluster. As expected, the cluster was all set up and ready to go within 5 minutes. Here are some of the events that were logged along the way:
Then I SSH'ed in to the login node and saw the greeting, all as expected:
After I launched my cluster I realized that this post would be more interesting if I had more compute nodes in my cluster. Instead of starting over, I simply modified my CloudFormation stack to have 4 nodes instead of 1, applied the change, and watched as the new nodes came online. Since I specified the use of Spot Instances when I launched the cluster, Auto Scaling placed bids automatically. Once the nodes were online I was able to locate them from within my PuTTY session:
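I made the change through the console, but the same kind of stack update could be scripted. Here is a rough sketch using boto3's CloudFormation client. The name of the template parameter that controls the node count is an assumption on my part (I pass it in as `count_key` rather than guessing), and whether the stack needs `CAPABILITY_IAM` depends on the template, so treat this as a starting point and not as the Alces-recommended procedure.

```python
from typing import Dict, Iterable, List


def build_update_parameters(existing_keys: Iterable[str], overrides: Dict[str, str]) -> List[dict]:
    """Build a CloudFormation Parameters list that changes only `overrides`.

    Every other parameter is passed with UsePreviousValue so the stack
    keeps the value it was launched with.
    """
    keys = list(existing_keys)
    unknown = set(overrides) - set(keys)
    if unknown:
        raise KeyError(f"not a parameter of this stack: {sorted(unknown)}")
    params = []
    for key in keys:
        if key in overrides:
            params.append({"ParameterKey": key, "ParameterValue": str(overrides[key])})
        else:
            params.append({"ParameterKey": key, "UsePreviousValue": True})
    return params


def resize_cluster(stack_name: str, count_key: str, node_count: int) -> None:
    """Apply the update with boto3 (needs AWS credentials; nothing runs on import).

    `count_key` is whatever the template calls its node-count parameter;
    this sketch does not assume a particular name.
    """
    import boto3  # imported lazily so the helper above works without it

    cfn = boto3.client("cloudformation")
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    keys = [p["ParameterKey"] for p in stack.get("Parameters", [])]
    cfn.update_stack(
        StackName=stack_name,
        UsePreviousTemplate=True,  # keep the template, change only parameters
        Parameters=build_update_parameters(keys, {count_key: str(node_count)}),
        Capabilities=["CAPABILITY_IAM"],  # assumption: the template creates IAM resources
    )
```

Passing `UsePreviousValue` for the untouched parameters keeps the update limited to the node count, which mirrors what I did by hand.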
Then I used the pdsh (Parallel Distributed Shell) command to check on the uptime of each compute node:
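If you want to collect the same information from a script, here is a small sketch that wraps pdsh from Python. It relies on two things pdsh does: `-w` accepts a comma-separated list of target hosts, and each line of output is prefixed with the host name followed by a colon. The host names you pass in are up to you, and the helper names are mine and purely illustrative.

```python
import subprocess
from typing import Dict, List


def pdsh_command(hosts: List[str], remote_cmd: str) -> List[str]:
    """Build the argument list for running `remote_cmd` on every host."""
    return ["pdsh", "-w", ",".join(hosts), remote_cmd]


def parse_pdsh_output(text: str) -> Dict[str, List[str]]:
    """Group pdsh output lines by the 'host: ' prefix that pdsh adds."""
    results: Dict[str, List[str]] = {}
    for line in text.splitlines():
        host, sep, rest = line.partition(": ")
        if sep:  # skip lines that do not carry a host prefix
            results.setdefault(host, []).append(rest.strip())
    return results


def uptime_by_host(hosts: List[str]) -> Dict[str, List[str]]:
    """Run `uptime` on each host via pdsh and return the output per host."""
    completed = subprocess.run(
        pdsh_command(hosts, "uptime"), capture_output=True, text=True
    )
    return parse_pdsh_output(completed.stdout)
```

Calling `uptime_by_host` with the names of your compute nodes from the login node should return one entry per node that responded.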
Learn More
This barely counts as scratching the surface; read Getting Started as Quickly as Possible to learn a lot more about what you can do! You should also watch one or more of the Alces videos to see this cool new product in action.
If you are building and running data-intensive HPC applications on AWS, you may also be interested in another Marketplace offering. The BeeGFS (self-supported or support included) parallel file system runs across multiple EC2 instances, aggregating the processing power into a single namespace, with all data stored on EBS volumes. The self-supported product is also available on a 14-day free trial. You can create a cluster file system using BeeGFS and then use it as part of your Alces cluster.
-Jeff;