Evaluate models with the Amazon Nova evaluation container using Amazon SageMaker AI

This blog post introduces the new Amazon Nova model evaluation features in Amazon SageMaker AI. This release adds custom metrics support, LLM-based preference testing, log probability capture, metadata analysis, and multi-node scaling for large evaluations. The new features include: Custom metrics use the bring your own metrics (BYOM) functions to control evaluation criteria for your […]

Beyond the technology: Workforce changes for AI

Workplaces are increasingly integrating AI tools into daily operations, with AI assistants supporting teams, predictive analytics informing strategies, and automation streamlining workflows. AI has moved from experimental technology to standard business practice, changing how work gets done. Organizations need to understand what AI can do and how it affects their workforce to implement it successfully. […]

Enhanced performance for Amazon Bedrock Custom Model Import

You can now achieve significant performance improvements when using Amazon Bedrock Custom Model Import, with reduced end-to-end latency, faster time-to-first-token, and improved throughput through advanced PyTorch compilation and CUDA graph optimizations. With Amazon Bedrock Custom Model Import, you can bring your own foundation models to Amazon Bedrock for deployment and inference at scale. These […]

Amazon SageMaker AI introduces EAGLE based adaptive speculative decoding to accelerate generative AI inference

Generative AI models continue to expand in scale and capability, increasing the demand for faster and more efficient inference. Applications need low latency and consistent performance without compromising output quality. Amazon SageMaker AI introduces new enhancements to its inference optimization toolkit that bring EAGLE based adaptive speculative decoding to more model architectures. These updates make […]

Train custom computer vision defect detection model using Amazon SageMaker

On October 10, 2024, Amazon announced the discontinuation of the Amazon Lookout for Vision service, with a scheduled shutdown date of October 31, 2025 (see the blog post Exploring alternatives and seamlessly migrating data from Amazon Lookout for Vision). As part of our transition guidance for customers, we recommend the use of Amazon SageMaker AI tools […]

Practical implementation considerations to close the AI value gap

Artificial Intelligence (AI) is changing how businesses operate. Gartner® predicts at least 15% of day-to-day work decisions will be made autonomously through agentic AI by 2028. And 92% of companies are boosting their AI spending, according to McKinsey. But here’s the problem: most companies have yet to realize a positive impact of AI on their […]

Introducing bidirectional streaming for real-time inference on Amazon SageMaker AI

In 2025, generative AI has evolved from text generation to multi-modal use cases ranging from audio transcription and translation to voice agents that require real-time data streaming. Today’s applications demand something more: continuous, real-time dialogue between users and models—the ability for data to flow both ways, simultaneously, over a single persistent connection. Imagine a speech […]

Warner Bros. Discovery achieves 60% cost savings and faster ML inference with AWS Graviton

This post is written by Nukul Sharma, Machine Learning Engineering Manager, and Karthik Dasani, Staff Machine Learning Engineer, at Warner Bros. Discovery. Warner Bros. Discovery (WBD) is a leading global media and entertainment company that creates and distributes the world’s most differentiated and complete portfolio of content and brands across television, film and streaming. With iconic […]

Physical AI in practice: Technical foundations that fuel human-machine interactions

In our previous post, Transforming the physical world with AI: the next frontier in intelligent automation, we explored how the field of physical AI is redefining a wide range of industries including construction, manufacturing, healthcare, and agriculture. Now, we turn our attention to the complete development lifecycle behind this technology – the process of creating intelligent […]

HyperPod now supports Multi-Instance GPU to maximize GPU utilization for generative AI tasks

We are excited to announce the general availability of GPU partitioning with Amazon SageMaker HyperPod, using NVIDIA Multi-Instance GPU (MIG). With this capability, you can run multiple tasks concurrently on a single GPU, minimizing the compute and memory wasted when entire hardware units (for example, entire GPUs) are dedicated to tasks that underutilize them. By […]