AWS Storage Blog

Tag: machine learning

Unlock higher performance for file system workloads with scalable metadata performance on Amazon FSx for Lustre

Imagine a company like a movie studio, one that works with enormous volumes of video files, scripts, and animation assets. They store these files on a high-performance file system such as Amazon FSx for Lustre, fully managed shared storage built on the world’s most popular high-performance file system. Each file has metadata, such as […]

Accelerate Amazon S3 throughput with the AWS Common Runtime

Data is at the center of every machine learning pipeline. Whether pre-training foundation models (FMs), fine-tuning FMs with business-specific data, or serving inference queries, every step of the machine learning lifecycle needs low-cost, high-performance data storage to keep compute resources busy and performing useful work. Customers use Amazon Simple Storage Service (Amazon S3) to store training data […]

Best practices for monitoring Amazon FSx for Lustre clients and file systems

Lustre is a high-performance parallel file system commonly used in workloads requiring throughput up to hundreds of GB/s and sub-millisecond per-operation latencies, such as machine learning (ML), high performance computing (HPC), video processing, and financial modelling. Amazon FSx for Lustre provides fully managed shared storage with the scalability and performance of the popular Lustre file […]

Machine Learning with Kubeflow on Amazon EKS with Amazon EFS

Training machine learning models involves multiple steps, and it becomes more complex and time-consuming when the training data set is hundreds of GBs in size. Data scientists run through a large number of experiments and research, which includes testing and training a large number of models. Kubeflow provides various ML capabilities […]

Novetta delivers IoT and Machine Learning to the edge for disaster response

During disaster response, maintaining high safety standards is paramount. Knowing the location of personnel, vehicles, and equipment is critical to maximizing the effectiveness of first responders and supporting safety efforts. Tracking the location of these resources across a complex response, with participation from multiple organizations, can be challenging. Responding organizations often lack interoperability with local, […]