Artificial intelligence and machine learning (AI and ML) are key technologies that help organizations develop new ways to increase sales, reduce costs, streamline business processes, and better understand their customers. AWS helps customers accelerate their AI / ML adoption by providing powerful computing, high-speed networking, and scalable high-performance storage options tailored to the needs of any machine learning project. This reduces the barriers to entry for companies seeking to use the cloud to scale their ML applications.
Developers and data scientists are pushing the boundaries of technology and increasingly adopting deep learning, a type of machine learning based on neural network algorithms. These deep learning models are larger and more sophisticated, which increases the cost of running the underlying infrastructure needed to train and deploy them.
To enable customers to accelerate their AI / ML transformation, AWS is building high-performance and low-cost machine learning chips. AWS Inferentia is the first machine learning chip built from the ground up by AWS for the lowest-cost machine learning inference in the cloud. In fact, Amazon EC2 Inf1 instances, powered by Inferentia, offer 2.3x higher performance and up to 70% lower cost for machine learning inference than current-generation GPU-based EC2 instances. AWS Trainium is AWS's second machine learning chip, designed for training deep learning models, and will be available by the end of 2021.
Customers across industries have deployed their ML applications in production on Inferentia and have seen significant performance improvements and cost savings. For example, Airbnb's customer support platform provides intelligent, scalable, and exceptional service experiences to its community of millions of hosts and guests around the world. It used Inferentia-based EC2 Inf1 instances to deploy the natural language processing (NLP) models that support its chatbots. This led to a 2x improvement in out-of-the-box performance compared to GPU-based instances.
With these innovations in silicon, AWS is enabling customers to easily train and run their deep learning models in production with high performance and throughput at significantly lower cost.
Machine learning challenges speed the shift to cloud-based infrastructure
Machine learning is an iterative process that requires teams to build, train, and deploy applications quickly, as well as to train, retrain, and test models frequently to increase prediction accuracy. When deploying trained models into their business applications, companies also need to scale those applications to serve new users around the world. To ensure a superior user experience, they need to be able to serve many simultaneous requests with near real-time latency.
Emerging use cases such as object detection, natural language processing (NLP), image classification, conversational AI, and time series analysis rely on deep learning technology. Deep learning models are rapidly increasing in size and complexity, going from millions of parameters to billions in a few years.
Training and deploying these complex and sophisticated models translates into significant infrastructure costs. Costs can quickly snowball as companies scale their applications to provide a near real-time experience for their users and customers.
This is where cloud-based machine learning infrastructure services can help. The cloud provides on-demand access to compute, high-performance networking, and large-scale data storage, integrated seamlessly with ML operations and higher-level AI services, enabling organizations to quickly start and scale their AI / ML initiatives.
How AWS is helping customers accelerate their AI / ML transformation
The goal of AWS Inferentia and AWS Trainium is to democratize machine learning and make it accessible to developers regardless of experience and organization size. Inferentia's design is optimized for high performance, high throughput, and low latency, which makes it ideal for deploying ML inference at scale.
Each AWS Inferentia chip has four NeuronCores that implement a high-performance systolic array matrix multiply engine, which greatly speeds up typical deep learning operations such as convolutions and transformers. NeuronCores are also equipped with a large on-chip cache, which helps cut down on external memory accesses, reducing latency and increasing throughput.
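To make the idea of a systolic array matrix multiply engine more concrete, the following is a minimal software sketch in Python. It is a generic, textbook-style "output-stationary" systolic array and is not a description of how NeuronCores are actually built; the function name `systolic_matmul` and the dataflow chosen here are assumptions made purely for illustration. Each cell of the grid holds one running sum, operands enter at the left and top edges in a staggered order, and on every cycle each cell passes its operands to its right and lower neighbors while multiplying and accumulating.

```python
def systolic_matmul(a, b):
    """Multiply a (n x k) by b (k x m) with a simulated n x m systolic array.

    Illustrative sketch only: a generic output-stationary dataflow,
    not the actual NeuronCore design.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]    # one accumulator per cell
    a_reg = [[0] * m for _ in range(n)]  # operand moving left to right
    b_reg = [[0] * m for _ in range(n)]  # operand moving top to bottom

    # The last useful operand pair reaches the bottom-right cell
    # after n + m + k - 2 cycles.
    for t in range(n + m + k - 2):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    # Row i of a enters at the left edge, delayed by i cycles.
                    s = t - i
                    new_a[i][j] = a[i][s] if 0 <= s < k else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:
                    # Column j of b enters at the top edge, delayed by j cycles.
                    s = t - j
                    new_b[i][j] = b[s][j] if 0 <= s < k else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b

        # Every cell multiplies the two operands it currently holds.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In hardware, all cells perform their multiply-accumulate step in parallel on each cycle, and operands are reused as they move through the grid rather than being fetched from memory again. That reuse is the general reason this kind of engine suits the dense matrix arithmetic at the heart of deep learning.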
AWS Neuron, the software development kit for Inferentia, natively supports leading ML frameworks such as TensorFlow and PyTorch. Developers can continue to use the same frameworks and lifecycle development tools that they know and like. For many of their trained models, they can compile and deploy them on Inferentia by changing just one line of code, with no additional changes to application code.
The result is a high-performance inference deployment that can easily scale while keeping costs under control.
Sprinklr, a software-as-a-service company, has an AI-powered unified customer experience management platform that enables companies to collect real-time customer feedback across multiple channels and translate it into actionable insights. This results in proactive issue resolution, enhanced product development, improved content marketing, and better customer service. Sprinklr has used Inferentia to deploy its NLP models and some of its computer vision models, and has seen significant performance improvements.
Several Amazon services also deploy their machine learning models on Inferentia.
Amazon Prime Video uses computer vision ML models to analyze the video quality of live events to ensure an optimal viewer experience for Prime Video members. It deployed its image classification ML models on EC2 Inf1 instances and saw a 4x improvement in performance and up to 40% savings in cost compared to GPU-based instances.
Another example is Amazon Alexa's AI- and ML-based intelligence, powered by Amazon Web Services, which is currently available on more than 100 million devices. Alexa's promise to customers is that it is always getting smarter, more conversational, more proactive, and even more delightful. Delivering on that promise requires continuous improvement in response times and in machine learning infrastructure costs. By deploying Alexa's text-to-speech ML models on Inf1 instances, it was able to reduce inference latency by 25% and cost per inference by 30%, enhancing the service experience for the millions of customers who use Alexa each month.
Unlocking new machine learning capabilities in the cloud
As companies strive to future-proof their business by enabling the best digital products and services, no organization can afford to fall behind in deploying sophisticated machine learning models to help them innovate for their customers. Over the past few years, the application of machine learning to a wide variety of use cases, from personalization and churn prediction to fraud detection and supply chain forecasting, has grown enormously.
Fortunately, machine learning infrastructure in the cloud is unlocking capabilities that were not possible before, making machine learning far more accessible to non-specialist practitioners. That's why AWS customers are already using Inferentia-powered Amazon EC2 Inf1 instances to provide the intelligence behind their recommendation engines and chatbots and to gain actionable insights from customer feedback.
With AWS cloud-based machine learning infrastructure options suitable for different skill levels, it is clear that any organization can accelerate innovation and embrace the entire machine learning lifecycle at scale. As machine learning becomes more widespread, companies are now able to fundamentally transform the customer experience, and the way they do business, through cost-effective, high-performance cloud-based machine learning infrastructure.
Learn more about how AWS’s machine learning platform can help your company innovate.
This content is produced by AWS. It was not written by the editorial staff of MIT Technology Review.