Thursday, December 31, 2015

AWS Podcasts – Banter, Predicsis, Peak, Urban Massage, Contentful, Babbel, and Intel IoT

For my final batch of 2015 podcasts, I spoke with representatives of Banter, Predicsis, Peak, Urban Massage, Contentful, Babbel, and Intel IoT. As always, the “Episode” links go directly to the audio files. You can also find several subscription options on the AWS Podcast page.

Episode 127
For Episode 127, I interviewed Diego Villareal (CEO and Co-founder) of Banter!, a discovery platform app that provides real-time updates on nightlife options. In the podcast we discussed how to differentiate from competitors, strategies for building your app’s audience, and how Banter! has evolved since it launched on AWS.

Episode 128
For Episode 128, I interviewed Jean Louis Fuccellaro (CEO) and Bastien Murzeau (CTO) of Predicsis, a French startup. Predicsis offers businesses the opportunity to enhance customer performance through machine learning and big data. Listen as they discuss how they decided to tackle the problem of customer churn with machine learning and AWS.

Episode 129
For Episode 129, I interviewed Bertrand Lamarque (Director of Engineering) and Itamar Lesuisse (CEO) of Peak, a London-based startup bringing the latest advancements in brain training to mobile. Learn how Peak has changed since they started, and how they use data to make decisions about their product.

Episode 130
For Episode 130, I interviewed Giles Williams, co-founder and CTO of Urban Massage, a UK-based on-demand massage service app. We discussed how Urban Massage is changing the online massage booking space, the company’s origins, how the business works, and how AWS powers their application behind the scenes.

Episode 131
For Episode 131, I interviewed Paolo Negri, co-founder and CTO of Contentful, a Berlin-based startup that runs its API-first content management system on AWS. Listen in as we discuss the API-first development model, what’s next for Contentful, and advice for aspiring entrepreneurs.

Episode 132
For Episode 132, I interviewed Boris Diebold (EVP Engineering) and Christian Hillemeyer (Director PR) of Babbel, a language learning company running on AWS. Listen as they discuss how Babbel started, the pivot from their original idea, and how their company makes learning a new language fun, easy, and effective.

Episode 133
For Episode 133, I interviewed Rose Schooler, vice president of the Internet of Things Group and general manager of IoT Strategy and Technology Office at Intel Corporation. We discussed the current state of IoT, practical applications, and business benefits. Learn how IoT is solving real, tangible business problems, with specific use cases such as an IoT-connected rice farm! You can hear how Intel’s Internet of Things Group offers products and solutions to help IoT become a reality for customers in three areas: things, network, and cloud.

Thanks Again
Wrapping up 2015, I would like to thank all of my guests, and all of my colleagues on the AWS Podcast team for their patience, hard work, support, and enthusiasm! We are working on our plans for 2016, so (as always) stay tuned for more!

Jeff;

Monday, December 28, 2015

AWS Podcasts – Aerobatic, Aire, Prairie Cloud, and Osper

After some experimentation, I am now somewhat proficient at recording podcasts remotely! In fact, three of the four podcasts below were recorded over Skype with the aid of Zencastr.

This time around, I spoke with representatives of Aerobatic, Aire, Prairie Cloud, and Osper. As usual, the “Episode” links go directly to the audio files.

Episode 123
For Episode 123, I spoke with David Von Lehman and Jason Gowans, co-founders of Seattle-based startup Aerobatic. Aerobatic is a static-hosting platform for Bitbucket that delivers easy git push deployment for static HTML websites. David and Jason share their best practices for using S3, Lambda, Activate, and other AWS services to run their company successfully. Finally, we chat about their expansion plans and how they plan to scale in the years to come. The guys participated in the AWS Activate Startup Pitch Event and Networking Mixer in Seattle earlier this month – don’t miss their promo code at the end of the podcast for a free first month of usage!

Episode 124
For Episode 124, I interviewed Tim Kimball, the VP of Engineering at UK startup Aire. Aire’s mission is to bring people into the financial system who would otherwise have a difficult time entering it. Aire lowers the barrier to entry to financial services by enhancing credit scores and ensuring accuracy in credit modeling, thus helping to solve the financial inclusion problem. Tim talks about how he and his team got into the space, their experience with the FinTech accelerator, and how they hope to make significant change in the financial industry. After talking a bit about the business of being a financial services startup, we chat about the technical systems and processes that make Aire secure and successful.

Episode 125
For Episode 125, I interviewed Doug Parr, CRO of innovative payment startup Prairie Cloudware. Prairie Cloudware enables banks and credit unions to offer their own digital payments (i.e., mobile wallets) rather than rely on third-party partners like Apple Pay or Samsung Pay. The platform is designed to give financial institutions the ability to deliver secure, customer-controlled digital payment services to their customers. Doug and I talk about the appeal of mobile-driven services, entering the startup market, and how his team got started in the payment industry. We also discuss how AWS helps deliver a more secure service to ensure customer satisfaction, and the various features that allow Prairie Cloudware to deliver on its security promise.

Episode 126
For Episode 126, I spoke with Nico Esteves, VP of Engineering at mobile banking startup Osper. Osper is mobile banking built just for families, with tools that help young people learn how to manage their finances. Parents can set up allowances for their kids, control online spending, and get notifications for failed transactions or insufficient funds. Children can check their balance, learn how to save responsibly, and transfer money to family and friends. Nico shares his insights on the future of mobile banking, how he got into the space, and the security services that allow Osper to grow and succeed.

Special Thanks
Once again, thanks are due to my awesome colleagues on the AWS Podcast team, who brought these episodes to you.

Jeff;

Tuesday, December 22, 2015

New – Launch Amazon EMR Clusters in Private Subnets

My colleague Jon Fritz wrote the guest post below to introduce you to an important new feature for Amazon EMR.

— Jeff;


Today we are announcing that Amazon EMR now supports launching clusters in Amazon Virtual Private Cloud (VPC) private subnets, allowing you to quickly, cost-effectively, and securely create fully configured clusters with Hadoop ecosystem applications, Spark, and Presto in the subnet of your choice. With Amazon EMR release 4.2.0 and later, you can launch your clusters in a private subnet with no public IP addresses or attached Internet gateway. You can create a private endpoint for Amazon S3 in your subnet to give your Amazon EMR cluster direct access to data in S3, and optionally create a Network Address Translation (NAT) instance for your cluster to interact with other AWS services, like Amazon DynamoDB and AWS Key Management Service (KMS). For more information on Amazon EMR in VPC, visit the Amazon EMR documentation.

Network Topology for Amazon EMR in a VPC Private Subnet
Before launching an Amazon EMR cluster in a VPC private subnet, please make sure that you have the required permissions in your EMR service role and EC2 instance profile, and that you have a route from your subnet to the S3 buckets required for your cluster’s initialization (either through an S3 endpoint in your VPC or through a NAT/proxy instance). Click here for more information about configuring your subnet.

You can use the new VPC Subnets page in the EMR Console to view the VPC subnets available for your clusters, and configure them by adding S3 endpoints and NAT instances:

Also, here is a sample network topology for an Amazon EMR cluster in a VPC private subnet with an S3 endpoint and a NAT instance. If you do not need to use your cluster with AWS services other than S3, you do not need a NAT instance to provide a route to those public endpoints:
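For reference, here is a minimal AWS CLI sketch of this setup. The VPC, route table, subnet, and key pair identifiers are placeholders, and a real cluster will likely need additional applications and instance settings:

    # Create an S3 endpoint in the VPC so that the cluster can reach S3 directly
    aws ec2 create-vpc-endpoint --vpc-id vpc-1a2b3c4d \
        --service-name com.amazonaws.us-east-1.s3 \
        --route-table-ids rtb-11aa22bb

    # Launch an EMR 4.2.0 cluster into the private subnet (default EMR roles assumed)
    aws emr create-cluster --name "private-subnet-cluster" \
        --release-label emr-4.2.0 \
        --applications Name=Hadoop Name=Spark \
        --instance-type m3.xlarge --instance-count 3 \
        --use-default-roles \
        --ec2-attributes SubnetId=subnet-1a2b3c4d,KeyName=my-key-pair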

Encryption at Rest for Amazon S3 (with EMRFS), HDFS, and Local Filesystem
A typical Hadoop or Spark workload on Amazon EMR utilizes Amazon S3 (using the EMR Filesystem – EMRFS) for input datasets/output results and two filesystems located on your cluster: the Hadoop Distributed Filesystem (HDFS) distributed across your cluster and the Local Filesystem on each instance. Amazon EMR makes it easy to enable encryption for each filesystem, and there are a variety of options depending on your requirements:

  1. Amazon S3 Using the EMR Filesystem (EMRFS) – EMRFS supports several Amazon S3 encryption options (using AES-256 encryption), allowing Hadoop and Spark on your cluster to transparently and efficiently process encrypted data in S3 (see the sample command after this list). EMRFS works seamlessly with objects encrypted by S3 server-side encryption or S3 client-side encryption. When using S3 client-side encryption, you can use encryption keys stored in the AWS Key Management Service or in a custom key management system in AWS or on-premises.
  2. HDFS Transparent Encryption with Hadoop KMS – The Hadoop Key Management Server (KMS) can supply keys for HDFS Transparent Encryption, and it is installed on the master node of your EMR cluster with HDFS. Because encryption and decryption activities are carried out in the client, data is also encrypted in-transit in HDFS. Click here for more information.
  3. Local Filesystem on Each Node – The Hadoop MapReduce and Spark frameworks utilize the Local Filesystem on each slave instance for intermediate data throughout a workload. You can use a bootstrap action to encrypt the directories used for these intermediates on each node using LUKS.
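To illustrate the first option above, here is a hedged sketch that launches a cluster with EMRFS client-side encryption backed by a KMS key. The key ID and subnet are placeholders, and the --emrfs shorthand should be checked against the current EMR CLI documentation:

    # Cluster whose EMRFS reads/writes use client-side encryption with a KMS key
    aws emr create-cluster --name "encrypted-emrfs-cluster" \
        --release-label emr-4.2.0 \
        --applications Name=Hadoop Name=Spark \
        --instance-type m3.xlarge --instance-count 3 \
        --use-default-roles \
        --ec2-attributes SubnetId=subnet-1a2b3c4d \
        --emrfs Encryption=ClientSide,ProviderType=KMS,KMSKeyId=your-kms-key-id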

Encryption in Transit for Hadoop MapReduce and Spark
Hadoop ecosystem applications installed on your Amazon EMR cluster typically have different mechanisms to encrypt data in transit:

  1. Hadoop MapReduce Shuffle – In a Hadoop MapReduce job, Hadoop will send data between nodes in your cluster in the shuffle phase, which occurs before the reduce phase of the job. You can use SSL to encrypt this process by enabling the Hadoop settings for Encrypted Shuffle and providing the required SSL certificates to each node.
  2. HDFS Rebalancing – HDFS rebalances by sending blocks between DataNode processes. However, if you use HDFS Transparent Encryption (see above), HDFS never holds unencrypted blocks and the blocks remain encrypted when moved between nodes.
  3. Spark Shuffle – Spark, like Hadoop MapReduce, also shuffles data between nodes at certain points during a job. Starting with Spark 1.4.0, you can encrypt data in this stage using SASL encryption (see the configuration sketch after this list).
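As one example of the Spark setting above, here is a hedged sketch that enables SASL encryption for the Spark shuffle through an EMR configuration classification; the property names are standard Spark 1.4+ settings, but verify them against the Spark version on your cluster:

    # Enable authentication and SASL encryption for Spark block transfers
    aws emr create-cluster --name "spark-sasl-cluster" \
        --release-label emr-4.2.0 --applications Name=Spark \
        --instance-type m3.xlarge --instance-count 3 --use-default-roles \
        --configurations '[{
            "Classification": "spark-defaults",
            "Properties": {
              "spark.authenticate": "true",
              "spark.authenticate.enableSaslEncryption": "true"
            }
        }]'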

IAM Users and Roles, and Auditing with AWS CloudTrail
You can use Identity and Access Management (IAM) users or federated users to call the Amazon EMR APIs, and limit the API calls that each user can make. Additionally, Amazon EMR requires clusters to be created with two IAM roles, an EMR service role and an EC2 instance profile, to limit the permissions of the EMR service and the EC2 instances in your cluster, respectively. EMR provides default roles that use EMR Named Policies for automatic updates; however, you can also provide custom IAM roles for your cluster. Finally, you can audit the calls your account has made to the Amazon EMR API using AWS CloudTrail.

EC2 Security Groups and Optional SSH Access
Amazon EMR uses two security groups, one for the Master Instance Group and one for slave instance groups (Core and Task Instance Groups), to limit ingress and egress to the instances in your cluster. EMR provides two default security groups, but you can provide your own (assuming they have the necessary ports open for communication between the EMR service and the cluster) or add additional security groups to your cluster. In a private subnet, you can also specify the security group added to the ENI used by the EMR service to communicate with your cluster.

Also, you can optionally add an EC2 key pair to the Master Node of your cluster if you would like to SSH to that node. This allows you to directly interact with the Hadoop applications installed on your cluster, or to access web UIs for applications through a proxy without opening up ports in your Master Security Group.
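For example, once a key pair has been added, a standard SSH dynamic port forwarding tunnel (plus a browser proxy extension such as FoxyProxy) is enough to reach the web UIs on the master node. The key file and private DNS name below are placeholders:

    # Open a SOCKS proxy on local port 8157 through the master node
    ssh -i ~/my-key-pair.pem -N -D 8157 hadoop@ip-10-0-1-123.ec2.internal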

Hadoop and Spark Authentication and Authorization
Because Amazon EMR installs open source Hadoop ecosystem applications on your cluster, you can also leverage existing security features in these products. You can enable Kerberos authentication for YARN, which will give user-level authentication for applications running on YARN (like Hadoop MapReduce and Spark). Also, you can enable table and SQL-level authorization for Hive using HiveServer2 features, and use LDAP integration to create and authenticate users in Hue.

Run your workloads securely on Amazon EMR
Earlier this year, Amazon EMR was added to the AWS Business Associate Agreement (BAA) for running workloads that process personally identifiable information (including eligibility for HIPAA workloads). Amazon EMR is also certified for PCI DSS Level 1, ISO 9001, ISO 27001, and ISO 27018.

Security is a top priority for us and our customers. We are continuously adding new security-related functionality and third-party compliance certifications to Amazon EMR in order to make it even easier to run secure workloads and configure security features in Hadoop, Spark, and Presto.

Jon Fritz, Senior Product Manager, Amazon EMR

PS – To learn more, read Securely Access Web Interfaces on Amazon EMR Launched in a Private Subnet on the AWS Big Data Blog.

Monday, December 21, 2015

EC2 Container Registry – Now Generally Available

My colleague Andrew Thomas wrote the guest post below to introduce you to the new EC2 Container Registry!

— Jeff;


I am happy to announce that Amazon EC2 Container Registry (ECR) is now generally available!

Amazon ECR is a fully-managed Docker container registry that makes it easy for developers to store, manage, and deploy Docker container images. We pre-announced the service at AWS re:Invent and have been receiving a lot of interest and enthusiasm from developers ever since.

We built Amazon ECR because many of you told us that running your own private Docker image registry presented challenges such as managing the infrastructure and handling large-scale deployments that involve pulling hundreds of images at once. Self-hosted solutions, you said, are especially hard when deploying container images to clusters that span two or more AWS regions. Additionally, you told us that you needed fine-grained access control to repositories and images without having to manage certificates or credentials.

Amazon ECR was designed to meet all of these needs and more. You do not need to install, operate, or scale your own container registry infrastructure. Amazon ECR hosts your images in a highly available and scalable architecture, allowing you to reliably deploy containers for your applications. Amazon ECR is also highly secure. Your images are transferred to the registry over HTTPS and automatically encrypted at rest in S3. You can configure policies to manage permissions and control access to your images using AWS Identity and Access Management (IAM) users and roles without having to manage credentials directly on your EC2 instances. This enables you to share images with specific users or even AWS accounts.

Amazon EC2 Container Registry also integrates with Amazon ECS and the Docker CLI, allowing you to simplify your development and production workflows. You can easily push your container images to Amazon ECR using the Docker CLI from your development machine, and Amazon ECS can pull them directly for production deployments.

Let’s take a look at how easy it is to store, manage, and deploy Docker containers with Amazon ECR and Amazon ECS.

Amazon ECR Console
The Amazon ECR Console simplifies the process of managing images and setting permissions on repositories. To access the console, simply navigate to the “Repositories” section in the Amazon ECS console. In this example I will push a simple PHP container image to Amazon ECR, configure permissions, and deploy the image to an Amazon ECS cluster.

After navigating to the Amazon ECR Console and selecting “Get Started”, I am presented with a simple wizard to create and configure my repository.

After entering the repository name, I see the repository endpoint URL that I will use to access Amazon ECR. By default I have access to this repository, so I don’t have to worry about permissions now and can set them later in the ECR console.

When I click Next step, I see the commands I need to run in my terminal to build my Docker image and push it to the repository I just created. I am using the Dockerfile from the ECS Docker basics tutorial. The commands that appear in the console require that I have the AWS Command Line Interface (CLI) and the Docker CLI installed on my development machine. Next, I copy and run each command to log in, tag the image with the ECR URI, and push the image to my repository.
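The exact commands shown in the console include your own account ID, region, and repository name; with placeholders, they look roughly like this:

    # Retrieve a docker login command for the registry and run it
    $(aws ecr get-login --region us-east-1)

    # Build, tag, and push the image
    docker build -t my-php-app .
    docker tag my-php-app:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-php-app:latest
    docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-php-app:latest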

After completing these steps, I click Done to navigate to the repository where I can manage my images.

Setting Permissions
Amazon ECR uses AWS Identity and Access Management to control and monitor who and what (e.g., EC2 instances) can access your container images. We built a permissions tool in the Amazon ECR Console to make it easier to create resource-based policies for your repositories.

To use the tool I click on the Permissions tab in the repository and select Add. I now see that the fields in the form correspond to an IAM statement within a policy document. After adding the statement ID, I select whether this policy should explicitly deny or allow access. Next I can set who this statement should apply to by either entering another AWS account number or selecting users and roles in the entities table.

After selecting the desired entities, I can then configure the actions that should apply to the statement. For convenience, I can use the toggles on the left to easily select the actions required for pull, push/pull, and administrative capabilities.
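If you prefer the CLI, the same kind of resource-based policy can be applied with set-repository-policy. The account ID, repository name, and statement below are placeholders for illustration:

    # Allow a second AWS account to pull images from the repository
    aws ecr set-repository-policy --repository-name my-php-app \
        --policy-text '{
          "Version": "2008-10-17",
          "Statement": [{
            "Sid": "AllowCrossAccountPull",
            "Effect": "Allow",
            "Principal": { "AWS": "arn:aws:iam::210987654321:root" },
            "Action": [
              "ecr:GetDownloadUrlForLayer",
              "ecr:BatchGetImage",
              "ecr:BatchCheckLayerAvailability"
            ]
          }]
        }'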

Integration With Amazon ECS
Once I’ve created the repository, pushed the image, and set permissions, I am ready to deploy the image to ECS.

Navigating to the Task Definitions section of the ECS console, I create a new Task Definition and specify the Amazon ECR repository in the Image field. Once I’ve configured the Task Definition, I can go to the Clusters section of the console and create a new service for my Task Definition. After creating the service, the ECS Agent will automatically pull down the image from ECR and start running it on an ECS cluster.
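The same two steps can also be scripted; the family name, image URI, and resource sizes below are placeholders:

    # Register a Task Definition that points at the image in ECR
    aws ecs register-task-definition --family my-php-app \
        --container-definitions '[{
          "name": "web",
          "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-php-app:latest",
          "cpu": 128,
          "memory": 256,
          "essential": true,
          "portMappings": [{ "containerPort": 80, "hostPort": 80 }]
        }]'

    # Run one copy of the task as a service on the default cluster
    aws ecs create-service --cluster default --service-name my-php-service \
        --task-definition my-php-app --desired-count 1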

Updated First-Run
We have also updated our Amazon ECS Getting Started Wizard to include the ability to push an image to Amazon ECR and deploy that image to ECS:

Partner Support for ECS
At re:Invent we announced partnerships with a number of CI/CD providers to help automate deploying containers on ECS. We are excited to announce today that our partners have added support for Amazon ECR, making it easy for developers to create and orchestrate a full, end-to-end container pipeline to automatically build, store, and deploy images on AWS. To get started, check out the solutions from our launch partners, who include Shippable, Codeship, Solano Labs, CloudBees, and CircleCI.

We are also excited to announce a partnership with TwistLock to provide vulnerability scanning of images stored within ECR. This makes it even easier for developers to evaluate potential security threats before pushing to Amazon ECR and allows developers to monitor their containers running in production. See the Container Partners Page for more information about our partnerships.

Launch Region
Effective today, Amazon ECR is available in US East (Northern Virginia) with more regions on the way soon!

Pricing
With Amazon ECR you only pay for the storage used by your images and data transfer from Amazon ECR to the internet or other regions. See the ECR Pricing page for more details.

Get Started Today
Check out our Getting Started with EC2 Container Registry page to start using Amazon ECR today!

Andrew Thomas, Senior Product Manager

Friday, December 18, 2015

New – Enhanced Monitoring for Amazon RDS (MySQL 5.6, MariaDB, and Aurora)

Amazon Relational Database Service (RDS) makes it easy for you to set up, run, scale, and maintain a relational database. As is often the case with the high-level AWS model, we take care of all of the details in order to give you the time to focus on your application and your business.

Enhanced Monitoring
Advanced RDS users have asked us for more insight into the inner workings of the service and we are happy to oblige with a new Enhanced Monitoring feature!

After you enable this feature for a database instance, you get access to over 50 new CPU, memory, file system, and disk I/O metrics. You can enable the feature on a per-instance basis and choose the granularity (all the way down to 1 second). Here is the list of available metrics:

And here are some of the metrics for one of my database instances:

You can enable this feature for an existing database instance by selecting the instance in the RDS Console and then choosing Modify from the Instance Options menu:

Turn the feature on, pick an IAM role, select the desired granularity, check Apply Immediately, and then click on Continue.
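The same change can be made from the AWS CLI. The instance identifier and monitoring role ARN are placeholders; the role must allow RDS to publish to CloudWatch Logs:

    aws rds modify-db-instance --db-instance-identifier my-database \
        --monitoring-role-arn arn:aws:iam::123456789012:role/rds-monitoring-role \
        --monitoring-interval 1 \
        --apply-immediately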

The Enhanced Metrics are ingested into CloudWatch Logs and can be published to Amazon CloudWatch. To do this you will need to set up a metrics extraction filter; read about Monitoring Logs to learn more. Once the metrics are stored in CloudWatch Logs, they can also be processed by third-party analysis and monitoring tools.
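As a hedged sketch (the log group name and JSON field names should be confirmed against the metrics your instances actually emit), a metric filter might look like this:

    # Publish total CPU utilization from the enhanced monitoring log group as a metric
    aws logs put-metric-filter --log-group-name RDSOSMetrics \
        --filter-name rds-cpu-total \
        --filter-pattern '{ $.cpuUtilization.total > 0 }' \
        --metric-transformations metricName=CPUTotal,metricNamespace=RDSEnhanced,metricValue='$.cpuUtilization.total'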

Available Now
The new Enhanced Metrics feature is available today in the US East (Northern Virginia), US West (Northern California), US West (Oregon), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Tokyo) regions. It works for MySQL 5.6, MariaDB, and Amazon Aurora, on all instance types except t1.micro and m1.small.

You will pay the usual ingestion and data transfer charges for CloudWatch Logs (see the CloudWatch Logs Pricing page for more info).

Jeff;

Thursday, December 17, 2015

New – AWS Cost and Usage Reports for Comprehensive and Customizable Reporting

Many of our customers have been asking us for data and tools to allow them to better understand and manage their AWS costs.

New Reports
Today we are introducing a set of new AWS Cost and Usage Reports that provide you with comprehensive data about products, pricing, and usage. The reports allow you to understand individual costs and to analyze them in greater detail. For example, you can view your EC2 costs by instance type and then drill down to understand usage by operating system, instance type, and purchase option (On-Demand, Reserved, or Spot).

The new reports are generated in CSV form and can be customized. You can select the data included in each report, decide whether you want it aggregated by the hour or by the day, and then request delivery to one of your S3 buckets, with your choice of ZIP or GZIP compression. The data format is normalized so that each discrete cost component appears in its own column.

You can easily upload the reports to Amazon Redshift and then run queries against the data using business intelligence and data visualization tools including Amazon QuickSight.

Creating a Report
To create a report, head on over to the AWS Management Console, and choose Billing & Cost Management from the menu in the top-right:

Then click on Reports in the left navigation:

Click on Create report to create your first report:

Enter a name for your report, pick a time unit, and decide whether you want to include Resource IDs (more detail and a bigger file) or not:

Now choose your delivery options: pick an S3 bucket (you’ll need to set the permissions per the sample policy), set a prefix if you’d like, and select the desired compression (GZIP or ZIP):

Click on Next, review your choices, and then create your report. It will become visible on the AWS Cost and Usage Reports page:

A fresh report will be delivered to the bucket within 24 hours. Additional reports will be provided every 24 hours (or less) thereafter.

From there you can transfer them to Redshift using an AWS Data Pipeline job or some code triggered by an AWS Lambda function, and then analyze them using the BI or data visualization tool of your choice.
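For example, here is a simple sketch of pulling the delivered files down for loading; the bucket name and prefix are whatever you chose when you created the report:

    # Copy the delivered report files locally (they arrive as compressed CSVs)
    aws s3 cp s3://my-billing-bucket/reports/ ./cost-and-usage/ --recursive

    # Decompress them before loading into your analysis tool of choice
    gunzip ./cost-and-usage/*.gz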

Visualizing the Data
Here are some sample visualizations, courtesy of Amazon QuickSight. Looking at our EC2 spend by instance type gives an overall picture of our spending:

Viewing it over time shows that spending varies considerably from day to day:

Learn More
To learn more, read about Understanding Your Usage with Billing Reports.

Jeff;

Wednesday, December 16, 2015

InfoWorld Review – Amazon Aurora Rocks MySQL

Back when I was young, InfoWorld was a tabloid-sized journal that chronicled the growth of the PC industry. Every week I would await the newest issue and read it cover to cover, eager to learn all about the latest and greatest hardware and software. I always enjoyed and appreciated the reviews — they were unfailingly deep, objective, and helpful.

With this as background, I am really happy to be able to let you know that the team at InfoWorld recently put Amazon Aurora through its paces, wrote a detailed review, and named it an Editor’s Choice. They succinctly and accurately summarized the architecture, shared customer feedback from AWS re:Invent, and ran an intensive benchmark, concluding that:

This level of performance is far beyond any I’ve seen from other open source SQL databases, and it was achieved at far lower cost than you would pay for an Oracle database of similar power.

We’re very proud of Amazon Aurora and I think you’ll understand why after you read this review.

Jeff;

Tuesday, December 15, 2015

EC2 Run Command Update – Now Available for Linux Instances

When we launched EC2 Run Command seven weeks ago (see my post, New EC2 Run Command – Remote Instance Management at Scale to learn more), I promised similar functionality for instances that run Linux. I am happy to be able to report that this functionality is available now and that you can start using it today.

Run Command for Linux
Like its Windows counterpart, this feature is designed to help you to administer your EC2 instances in an easy and secure fashion, regardless of how many you are running. You can install patches, alter configuration files, and more. To recap, we built this feature to serve the following management needs:

  • A need to implement configuration changes across your instances on a consistent yet ad hoc basis.
  • A need for reliable and consistent results across multiple instances.
  • Control over who can perform changes and what can be done.
  • A clear audit trail of the actions that were taken.
  • The ability to do all of the above without unfettered SSH access.

This new feature makes command execution secure, reliable, convenient, and scalable. You can create your own commands and exercise fine-grained control over execution privileges using AWS Identity and Access Management (IAM). All of the commands are centrally logged to AWS CloudTrail for easy auditing.

Run Command Benefits
The Run Command feature was designed to provide you with the following benefits (these apply to both Linux and Windows):

Control / Security – You can use IAM policies and roles to regulate access to commands and to instances. This allows you to reduce the number of users who have direct access to the instances.

Reliability – You can increase the reliability of your system by creating templates for your configuration changes. This will give you more control while also increasing predictability and reducing configuration drift over time.

Visibility – You will have more visibility into configuration changes because Run Command supports command tracking and is also integrated with CloudTrail.

Ease of Use – You can choose from a set of predefined commands, run them, and then track their progress using the Console, CLI, or API.

Customizability – You can create custom commands to tailor Run Command to the needs of your organization.

Using Run Command on Linux
Run Command makes use of an agent (amazon-ssm-agent) that runs on each instance. It is available for the following Linux distributions:

  • Amazon Linux AMI (64 bit) – 2015.09, 2015.03, 2014.09, and 2014.03.
  • Ubuntu Server (64 bit) – 14.04 LTS, 12.04 LTS
  • Red Hat Enterprise Linux (64 bit) – 7.x

Here are some of the things that you can do with Run Command:

  • Run shell commands or scripts
  • Add users or groups
  • Configure user or group permissions
  • View all running services
  • Start or stop services
  • View system resources
  • View log files
  • Install or uninstall applications
  • Update a scheduled (cron) task

You can launch new Linux instances and bootstrap the agent by including a few lines in the UserData like this (to learn more, read Configure the SSM Agent in the EC2 Documentation):
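The original post showed a short UserData script at this point; a minimal sketch for Amazon Linux looks something like the following. The download URL is region-specific and shown here purely for illustration, so check the SSM agent documentation for the exact location for your region:

    #!/bin/bash
    # Download and install the SSM agent (us-east-1 shown; URL is illustrative)
    cd /tmp
    curl -o amazon-ssm-agent.rpm https://amazon-ssm-us-east-1.s3.amazonaws.com/latest/linux_amd64/amazon-ssm-agent.rpm
    yum install -y amazon-ssm-agent.rpm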

Here’s how I choose a command document (separate command documents are available for Linux and for Windows):

And here’s how I select the target instances and enter a command or a set of commands to run:

Here’s the output from the command:

Here’s how I review the output from commands that I have already run:
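If you prefer the CLI, the same flow is available there as well; the instance ID below is a placeholder, and the command ID comes from the output of send-command:

    # Run a shell command on a target instance
    aws ssm send-command --document-name "AWS-RunShellScript" \
        --instance-ids i-1a2b3c4d \
        --parameters commands="ifconfig" \
        --comment "Check network configuration"

    # Review the status and output of commands that have already run
    aws ssm list-commands
    aws ssm list-command-invocations --command-id "your-command-id" --details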

Run a Command Today
This feature is available now and you can start using it today in the US East (Northern Virginia), US West (Oregon), and Europe (Ireland) regions. There’s no charge for the command, but you will be billed for other AWS resources that you consume.

To learn more, visit the Run Command page.

Jeff;