AWS Machine Learning Blog
Category: Expert (400)
Ray jobs on HAQM SageMaker HyperPod: scalable and resilient distributed AI
Ray is an open source framework that makes it straightforward to create, deploy, and optimize distributed Python jobs. In this post, we demonstrate the steps involved in running Ray jobs on SageMaker HyperPod.
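To give a flavor of the programming model the post builds on, here is a minimal Ray sketch: a remote task fanned out across a cluster's workers. The `address="auto"` connection and the toy workload are illustrative; provisioning the HyperPod cluster itself is what the post covers.

```python
import ray

# Connect to an existing Ray cluster (for example, one running on
# HyperPod); ray.init() with no address starts a local cluster instead.
ray.init(address="auto")

@ray.remote
def square(x: int) -> int:
    # Each invocation runs as a distributed task on any available worker.
    return x * x

# Launch tasks in parallel and gather the results.
futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]
```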
Ground truth generation and review best practices for evaluating generative AI question-answering with FMEval
In this post, we discuss best practices for applying LLMs to generate ground truth for evaluating question-answering assistants with FMEval at enterprise scale. FMEval is a comprehensive evaluation suite from HAQM SageMaker Clarify that provides standardized implementations of metrics to assess quality and responsibility. To learn more about FMEval, see Evaluate large language models for quality and responsibility.
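As an illustration of the generation step, the sketch below prompts an LLM through the HAQM Bedrock Converse API to draft a ground truth answer for later human review. The model ID, prompt wording, and helper name are assumptions for this example, not the post's exact implementation.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

def draft_ground_truth(question: str, context: str) -> str:
    """Ask an LLM to draft a ground truth answer for later human review."""
    prompt = (
        "Using only the context below, write a concise, factual answer "
        f"to the question.\n\nContext:\n{context}\n\nQuestion: {question}"
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # placeholder model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"temperature": 0},  # deterministic output for ground truth
    )
    return response["output"]["message"]["content"][0]["text"]
```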
LLM continuous self-instruct fine-tuning framework powered by a compound AI system on HAQM SageMaker
In this post, we present the continuous self-instruct fine-tuning framework as a compound AI system implemented with the DSPy framework. The framework first generates a synthetic dataset from the domain knowledge base and documents for self-instruction, then drives model fine-tuning through supervised fine-tuning (SFT), and finally introduces a human-in-the-loop workflow to collect human and AI feedback on model responses, which is used to further improve model performance by aligning the model with human preferences through reinforcement learning (RLHF/RLAIF).
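As a hedged sketch of the self-instruction step, the following shows how a DSPy signature and module might generate a synthetic instruction-response pair from a domain document. The model name and field names are illustrative assumptions, not the post's exact pipeline.

```python
import dspy

# Point DSPy at an LLM; the model name here is a placeholder.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class GenerateInstruction(dspy.Signature):
    """Generate an instruction-response pair from a domain document."""
    document: str = dspy.InputField(desc="passage from the domain knowledge base")
    instruction: str = dspy.OutputField(desc="question a user might ask about the passage")
    response: str = dspy.OutputField(desc="grounded answer to the instruction")

generate = dspy.ChainOfThought(GenerateInstruction)

# One synthetic SFT example from a single document passage.
pair = generate(document="The warranty covers parts and labor for two years.")
print(pair.instruction, pair.response)
```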
Achieve ~2x speed-up in LLM inference with Medusa-1 on HAQM SageMaker AI
Researchers developed Medusa, a framework that speeds up LLM inference by adding extra decoding heads that predict multiple tokens simultaneously. This post demonstrates how to use Medusa-1, the first version of the framework, to speed up an LLM by fine-tuning it on HAQM SageMaker AI, and confirms the speedup with a deployment and a simple load test. Medusa-1 achieves an inference speedup of around two times without sacrificing model quality, with the exact improvement varying by model size and dataset. In this post, we demonstrate its effectiveness with an observed 1.8x speedup on a sample dataset.
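For intuition, here is a conceptual PyTorch sketch of a Medusa-style decoding head: following the Medusa paper, each head is a small residual feedforward block over the base model's last hidden state that predicts the token k positions ahead. The dimensions and head count are illustrative.

```python
import torch
import torch.nn as nn

class MedusaHead(nn.Module):
    """One Medusa head: predicts the token k steps beyond the next token."""
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)
        self.act = nn.SiLU()
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Residual feedforward block on the base model's last hidden state,
        # followed by a dedicated language-model head.
        hidden = hidden + self.act(self.proj(hidden))
        return self.lm_head(hidden)

# With K heads, each forward pass proposes K extra tokens that the base
# model then verifies, which is where the wall-clock speedup comes from.
heads = nn.ModuleList([MedusaHead(4096, 32000) for _ in range(3)])
```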
Security best practices to consider while fine-tuning models in HAQM Bedrock
In this post, we show how to implement secure fine-tuning jobs in HAQM Bedrock, which is crucial for protecting sensitive data and maintaining the integrity of your AI models. By following the best practices outlined in this post, including proper IAM role configuration, encryption at rest and in transit, and network isolation, you can significantly strengthen the security posture of your fine-tuning processes.
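The hedged boto3 sketch below shows where those controls attach to a Bedrock model customization job; all ARNs, bucket names, and identifiers are placeholders.

```python
import boto3

bedrock = boto3.client("bedrock")

# All identifiers below are placeholders.
bedrock.create_model_customization_job(
    jobName="secure-fine-tune-job",
    customModelName="my-fine-tuned-model",
    roleArn="arn:aws:iam::111122223333:role/BedrockFineTuneRole",  # least-privilege IAM role
    baseModelIdentifier="amazon.titan-text-express-v1",
    trainingDataConfig={"s3Uri": "s3://my-training-bucket/data/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-output-bucket/results/"},
    # Encrypt the resulting custom model at rest with a customer managed KMS key.
    customModelKmsKeyId="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
    # Network isolation: run the job inside your VPC.
    vpcConfig={
        "subnetIds": ["subnet-0example"],
        "securityGroupIds": ["sg-0example"],
    },
    hyperParameters={"epochCount": "1"},
)
```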
Implementing login node load balancing in SageMaker HyperPod for enhanced multi-user experience
In this post, we explore a solution for implementing load balancing across login nodes in Slurm-based HyperPod clusters. By distributing user activity evenly across all available nodes, this approach provides more consistent performance, better resource utilization, and a smoother experience for all users. We guide you through the setup process, providing practical steps to achieve effective load balancing in your HyperPod clusters.
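As a lightweight illustration of the idea (not the post's full setup), the sketch below directs a user to the least busy login node by counting active sessions over SSH; the hostnames and session-counting command are assumptions.

```python
import subprocess

# Placeholder hostnames for the cluster's login nodes.
LOGIN_NODES = ["login-node-1", "login-node-2", "login-node-3"]

def active_sessions(node: str) -> int:
    """Count logged-in sessions on a node via SSH (who | wc -l)."""
    result = subprocess.run(
        ["ssh", node, "who | wc -l"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())

# Direct the next user to the least busy login node.
target = min(LOGIN_NODES, key=active_sessions)
print(f"ssh {target}")
```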
Governing the ML lifecycle at scale, Part 3: Setting up data governance at scale
This post dives deep into how to set up data governance at scale using HAQM DataZone for the data mesh. The data mesh is a modern approach to data management that decentralizes data ownership and treats data as a product. It enables different business units within an organization to create, share, and govern their own data assets, promoting self-service analytics and reducing the time required to convert data experiments into production-ready applications.
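As a hedged starting point, the boto3 sketch below creates a DataZone domain and a per-business-unit project; the names and execution role ARN are placeholders, and the post walks through the complete setup.

```python
import boto3

datazone = boto3.client("datazone")

# Placeholder names and ARN; a production data mesh involves more than
# this, such as environment profiles and asset publishing workflows.
domain = datazone.create_domain(
    name="enterprise-data-mesh",
    description="Root domain for decentralized, governed data products",
    domainExecutionRole="arn:aws:iam::111122223333:role/DataZoneExecutionRole",
)

# Each business unit gets its own project to own and publish data assets.
datazone.create_project(
    domainIdentifier=domain["id"],
    name="marketing-analytics",
    description="Marketing business unit's data products",
)
```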
Build a reverse image search engine with HAQM Titan Multimodal Embeddings in HAQM Bedrock and AWS managed services
In this post, you will learn how to extract key objects from image queries using HAQM Rekognition and build a reverse image search engine using HAQM Titan Multimodal Embeddings from HAQM Bedrock in combination with HAQM OpenSearch Serverless.
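To sketch the embedding and retrieval steps (the Rekognition object extraction step is omitted here), the example below embeds an image with HAQM Titan Multimodal Embeddings and builds a k-NN query body for an OpenSearch vector index; the index and field names are placeholder assumptions.

```python
import base64
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def embed_image(image_path: str) -> list[float]:
    """Embed an image with HAQM Titan Multimodal Embeddings."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-image-v1",
        body=json.dumps({"inputImage": image_b64}),
    )
    return json.loads(response["body"].read())["embedding"]

# A k-NN query body for an OpenSearch Serverless vector index; the
# field name ("image_vector") is a placeholder.
query = {
    "size": 5,
    "query": {"knn": {"image_vector": {"vector": embed_image("query.jpg"), "k": 5}}},
}
```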
Supercharge your LLMs with RAG at scale using AWS Glue for Apache Spark
In this post, we explore building a reusable Retrieval Augmented Generation (RAG) data pipeline with LangChain, an open source framework for building applications based on LLMs, and integrating it with AWS Glue and HAQM OpenSearch Serverless. The resulting solution is a reference architecture for scalable RAG indexing and deployment.
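As a minimal sketch of the indexing flow, assuming LangChain's Bedrock and OpenSearch integrations, the example below chunks a document, embeds the chunks, and indexes them; the endpoint, index name, and chunk sizes are placeholders, and authentication configuration (SigV4 for OpenSearch Serverless) is omitted for brevity.

```python
from langchain_aws import BedrockEmbeddings
from langchain_community.vectorstores import OpenSearchVectorSearch
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Chunk raw documents before embedding; sizes are illustrative.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(open("document.txt").read())

# Embed the chunks with HAQM Bedrock and index them in an OpenSearch
# Serverless collection; the endpoint and index name are placeholders.
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")
vector_store = OpenSearchVectorSearch.from_texts(
    texts=chunks,
    embedding=embeddings,
    opensearch_url="https://my-collection.us-east-1.aoss.amazonaws.com",
    index_name="rag-index",
)

# At query time, retrieve the most relevant chunks for the LLM prompt.
docs = vector_store.similarity_search("What is our refund policy?", k=4)
```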
Genomics England uses HAQM SageMaker to predict cancer subtypes and patient survival from multi-modal data
In this post, we detail our collaboration in creating two proof of concept (PoC) exercises around multi-modal machine learning for survival analysis and cancer sub-typing, using genomic (gene expression, mutation, and copy number variant data) and imaging (histopathology slides) data. We provide insights on interpretability, robustness, and best practices for architecting complex ML workflows on AWS with HAQM SageMaker. These multi-modal pipelines are being used on the Genomics England cancer cohort to enhance our understanding of cancer biomarkers and biology.
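As a purely illustrative sketch of one common multi-modal pattern (late fusion), the PyTorch snippet below combines genomic and imaging embeddings into a survival risk score; the dimensions and layer choices are assumptions, not the actual architecture from the collaboration.

```python
import torch
import torch.nn as nn

class LateFusionSurvivalModel(nn.Module):
    """Illustrative late fusion of genomic and histopathology embeddings."""
    def __init__(self, genomic_dim: int = 256, imaging_dim: int = 512):
        super().__init__()
        self.genomic_encoder = nn.Sequential(nn.Linear(genomic_dim, 128), nn.ReLU())
        self.imaging_encoder = nn.Sequential(nn.Linear(imaging_dim, 128), nn.ReLU())
        # Concatenate per-modality embeddings, then predict a survival risk score.
        self.risk_head = nn.Linear(256, 1)

    def forward(self, genomic: torch.Tensor, imaging: torch.Tensor) -> torch.Tensor:
        fused = torch.cat(
            [self.genomic_encoder(genomic), self.imaging_encoder(imaging)], dim=-1
        )
        return self.risk_head(fused)
```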