AWS Partner Network (APN) Blog
Elevating LLM Observability with HAQM Bedrock and Dynatrace
By Kristof Muhi, Principal Product Manager – Dynatrace
By Varun Jasti, Solutions Architect – AWS
By Shashiraj Jeripotula, Principal Partner Solutions Architect – AWS
Introduction
Organizations leveraging HAQM Bedrock for their generative AI applications need to ensure reliable, secure, and responsible AI operations at scale. As these applications become integral to business processes, implementing comprehensive Large Language Model (LLM) observability becomes essential. Critical use cases include monitoring model performance; detecting hallucinations, prompt injections, toxic language, and PII leakage; tracking latency, drift, and data lineage; and maintaining cost control. By implementing robust observability practices, teams can gain deep insights into their LLM applications’ behavior, optimize resource utilization, ensure consistent response quality, and maintain compliance with governance requirements.
Dynatrace is an all-in-one observability platform that automatically collects production insights, traces, logs, metrics, and real-time application data at scale. With its AI engine, Davis AI, Dynatrace alerts teams about production issues before they disrupt users, predicts resource usage, costs, and performance problems, and delivers guardrails that protect data and maintain compliance.
In this post, we explain how Dynatrace provides end-to-end monitoring and visibility into generative AI applications that use HAQM Bedrock models, enabling comprehensive LLM observability.
LLM Observability Use Cases
Dynatrace helps with the following LLM and generative AI observability use cases at scale.
Complexity of Multi-Model Tracing
Multi-model tracing presents hidden complexities as interactions between models with different architectures, output formats, and latency profiles must be correlated coherently across the entire chain. When diverse models operate in sequence, errors can silently cascade through these heterogeneous systems, making root cause analysis especially challenging without standardized telemetry that can effectively connect the dots between varying inputs and outputs.
Dynatrace enables end-to-end tracing across various models, connecting the frontend and backend components of the application stack.
This multi-model tracing provides complete visibility and tracing of events, allowing you to understand what happened when an issue occurred or when an invalid response was sent to the customer in the whole model chain.
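To make this concrete, here is a minimal sketch of the pattern, assuming an OpenTelemetry SDK already configured to export to Dynatrace; the model IDs and prompts are illustrative:

```python
# Minimal sketch: one trace spanning two chained HAQM Bedrock model calls.
import boto3
from opentelemetry import trace

tracer = trace.get_tracer("travel-app.llm-chain")
bedrock = boto3.client("bedrock-runtime")

def call_model(model_id: str, prompt: str) -> str:
    # Each hop gets its own child span, tagged with the model that served it.
    with tracer.start_as_current_span(f"bedrock.converse {model_id}") as span:
        span.set_attribute("gen_ai.system", "aws.bedrock")
        span.set_attribute("gen_ai.request.model", model_id)
        resp = bedrock.converse(
            modelId=model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
        )
        span.set_attribute("gen_ai.usage.output_tokens", resp["usage"]["outputTokens"])
        return resp["output"]["message"]["content"][0]["text"]

def summarize_then_translate(document: str) -> str:
    # The parent span correlates both hops, so an invalid final answer can be
    # walked back to the exact model call that produced it.
    with tracer.start_as_current_span("llm-chain"):
        summary = call_model("haqm.titan-text-express-v1", f"Summarize: {document}")
        return call_model(
            "anthropic.claude-3-haiku-20240307-v1:0",
            f"Translate to German: {summary}",
        )
```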
Predictive Operations of AI Workloads: Cost and Performance
Powered by Davis AI, predictive operations leverage advanced analytics and machine learning to anticipate and optimize AI workload behavior before issues impact business operations. This proactive approach transforms traditional monitoring into forward-looking operational intelligence for AI systems.
- Cost projection and optimization: Forecast token usage, API calls, and associated costs in HAQM Bedrock to enable better budget planning and resource allocation, reducing operational costs through more accurate planning (a back-of-the-envelope version of this calculation is sketched after this list).
- Performance degradation prediction: Identify early warning signs of potential model performance issues through pattern recognition.
- Anomaly and problem detection: Use Dynatrace’s predictive Davis AI to spot unusual patterns in model behavior or usage peaks that could indicate emerging problems, minimizing service and application downtime by addressing potential issues before they become critical.
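As a back-of-the-envelope illustration of the cost-projection idea, token usage can be converted to spend and extrapolated. The per-1K-token prices below are placeholders rather than current HAQM Bedrock rates, and in practice Davis AI performs the forecasting on the ingested metrics:

```python
# Hypothetical prices per 1K tokens; check the HAQM Bedrock pricing page.
PRICE_PER_1K = {"input": 0.0008, "output": 0.0016}

def request_cost(input_tokens: int, output_tokens: int) -> float:
    # Dollar cost of a given amount of aggregated token usage.
    return (input_tokens / 1000) * PRICE_PER_1K["input"] + \
           (output_tokens / 1000) * PRICE_PER_1K["output"]

def project_monthly_cost(daily_costs: list[float]) -> float:
    # Naive linear projection from the observed daily average.
    return sum(daily_costs) / len(daily_costs) * 30

daily = [request_cost(1_200_000, 350_000)]  # e.g., one day's aggregated usage
print(f"Projected monthly spend: ${project_monthly_cost(daily):,.2f}")
```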
Guardrail Analysis
Guardrail analysis focuses on monitoring and enforcing safety boundaries around AI systems to ensure they operate within defined ethical, security, and performance parameters. This critical capability helps organizations maintain control over their AI applications while protecting against potential risks and misuse.
- Real-time detection of prompt injection attempts and security vulnerabilities protects your business and your customers’ data, and allows organizations to track unauthorized PII exposure and sensitive data leakage.
- Monitoring for toxic or inappropriate language, harmful content, and biased responses provides early threat detection and improves model reliability through consistent boundary enforcement and compliance.
- Guardrail analysis safeguards AI applications and protects brand reputation by preventing inappropriate responses and inquiries (a minimal screening sketch follows this list).
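As a minimal sketch of the enforcement side, the HAQM Bedrock ApplyGuardrail API can screen a prompt before it ever reaches a model. The guardrail identifier and version below are placeholders, and the returned assessments are the kind of signal that can be logged and forwarded to Dynatrace:

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

def screen_input(prompt: str) -> bool:
    # Evaluate the prompt against a pre-configured Bedrock guardrail.
    resp = bedrock.apply_guardrail(
        guardrailIdentifier="your-guardrail-id",  # placeholder
        guardrailVersion="1",                     # placeholder
        source="INPUT",
        content=[{"text": {"text": prompt}}],
    )
    if resp["action"] == "GUARDRAIL_INTERVENED":
        # Assessments detail what tripped: PII filters, prompt-attack
        # detection, denied topics, toxicity, and so on.
        print("Blocked:", resp["assessments"])
        return False
    return True
```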
Data Governance, Compliance and Audit
In the context of applications built using LLMs on HAQM Bedrock, governance, compliance, and audit capabilities through observability ensure organizations maintain control, transparency, and accountability for their generative AI applications while meeting regulatory requirements and industry standards.
Dynatrace helps track every input and output for a full audit trail. It enables you to query all data in real time and store it for future reference. It is easy to set up and maintain full data lineage from prompt to response across the whole pipeline, and to collect evidence for responsible AI practices and for regulatory frameworks such as FIPS, FedRAMP, and the EU AI Act.
Dynatrace’s model fingerprinting capability creates unique identifiers for LLM versions based on architecture, training data, and parameters, enabling precise version tracking that meets regulatory and auditing requirements.
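To illustrate the concept (this is a sketch of the idea, not Dynatrace’s internal implementation), a stable identifier can be derived by hashing the version-defining metadata of a model call and attaching it to every span or log record:

```python
import hashlib
import json

def model_fingerprint(model_id: str, params: dict) -> str:
    # Canonical JSON keeps the hash stable regardless of key ordering.
    payload = json.dumps({"model": model_id, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]

fp = model_fingerprint(
    "anthropic.claude-3-haiku-20240307-v1:0",
    {"temperature": 0.2, "top_p": 0.9, "max_tokens": 512},
)
print(f"model.fingerprint={fp}")  # attach as a span/log attribute for audits
```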
Dynatrace + Layers of LLM Observability
Observing LLMs requires an extensive approach that spans multiple layers, from the user-facing application to the underlying infrastructure. Each layer plays a crucial role in understanding LLM performance, identifying bottlenecks, ensuring reliable operation, and detecting potential security risks. Dynatrace provides a unified end-to-end observability platform that can help organizations gain deep insights into each of these layers enabling them to effectively monitor, optimize, troubleshoot, and secure their LLM-powered applications.
Figure 1: HAQM Bedrock observability pipeline of a travel app running on Kubernetes, utilizing Dynatrace
In our example, the application runs in a Kubernetes cluster. Traceloop’s OpenLLMetry enhances LLM observability for HAQM Bedrock models by capturing critical AI-specific KPIs. It enriches OpenTelemetry data and integrates seamlessly with Dynatrace, providing a holistic view of LLM application performance in production environments and empowering businesses to optimize and scale their AI deployments effectively.
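A minimal setup sketch follows. The Dynatrace endpoint, API token, and environment variable names are assumptions to adapt from the setup instructions referenced at the end of this post. Once initialized, OpenLLMetry auto-instruments the boto3 Bedrock client, so calls emit spans with model and token attributes without further code changes:

```python
import os

import boto3
from traceloop.sdk import Traceloop

# Point OpenLLMetry's OTLP exporter at the Dynatrace ingest API
# (placeholder environment ID and token; verify the exact variable names).
os.environ["TRACELOOP_BASE_URL"] = "https://YOUR_ENV.live.dynatrace.com/api/v2/otlp"
os.environ["TRACELOOP_HEADERS"] = "Authorization=Api-Token%20YOUR_TOKEN"
Traceloop.init(app_name="travel-app")

bedrock = boto3.client("bedrock-runtime")
resp = bedrock.converse(
    modelId="haqm.titan-text-express-v1",
    messages=[{"role": "user", "content": [{"text": "Suggest a 3-day Lisbon itinerary"}]}],
)
print(resp["output"]["message"]["content"][0]["text"])
```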
Figure 2: High-level overview of the different layers of instrumenting generative AI applications for observability
As the diagram above (Figure 2) shows, Dynatrace provides end-to-end visibility of AI applications through the entire technology stack.
- Application Layer: Dynatrace monitors user-facing applications interacting with LLMs, tracking performance metrics, user experience, and usage patterns. Continuous data collection for the whole architecture (frontend, backend, generative AI stack) reveals real-time application behavior, while logging captures user interactions and application-specific errors for debugging. Visualization tools offer customizable dashboards to track key application metrics and identify trends or anomalies related to LLM integration.
- Orchestration Layer: Orchestration frameworks (e.g., LangChain, LlamaIndex) manage prompt workflows and pipeline integrations. Dynatrace observes these workflows, providing metrics on prompt engineering effectiveness, chain performance, and caching, while its anomaly detection capabilities alert teams to potential issues and bottlenecks, ensuring smooth operation of AI-driven processes.
- Semantic Layer and Vector Databases: This layer analyzes the meaning and content of LLM inputs and outputs: understanding relationships between concepts, tracking sentiment, and identifying potential biases, inaccuracies, or anomalies, along with performance bottlenecks in Retrieval Augmented Generation (RAG) architectures that use vector databases (e.g., Pinecone, Milvus, Weaviate, Qdrant, Chroma). Dynatrace’s extensibility allows integration with semantic analysis tools. By ingesting data from vector databases, Dynatrace can provide a unified view of LLM outputs, including sentiment analysis, topic modeling, and bias detection, which can then be visualized and analyzed using Dynatrace’s metrics, performance analysis, and visualization capabilities.
- Model Layer: Dynatrace monitors token usage, stability, latency, throughput, resource consumption, and model drift in HAQM Bedrock, allowing deep dives into model performance. Model fingerprinting supports detailed version tracking, while anomaly detection flags significant changes in performance or output quality, helping teams understand and optimize model behavior (see the metrics sketch after this list).
- Infrastructure Layer: Dynatrace’s full-stack observability extends to compute (e.g., HAQM EC2, NVIDIA GPUs) and network resources. It automatically captures CPU/GPU utilization, memory usage, and other vital statistics, and real-time anomaly detection with Davis AI helps teams swiftly address hardware bottlenecks that could impact LLM performance.
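As a sketch of what model-layer telemetry can look like on the wire, the snippet below records token usage and call duration as OpenTelemetry metrics. The instrument names follow the OpenTelemetry generative AI semantic conventions, and exporter configuration to Dynatrace is assumed to happen elsewhere:

```python
from opentelemetry import metrics

meter = metrics.get_meter("travel-app.model-layer")
token_usage = meter.create_histogram(
    "gen_ai.client.token.usage",
    unit="{token}",
    description="Input/output tokens per HAQM Bedrock call",
)
duration = meter.create_histogram(
    "gen_ai.client.operation.duration",
    unit="s",
    description="Model call duration",
)

def record_call(model_id: str, input_tokens: int, output_tokens: int, seconds: float):
    # Dimensions let Dynatrace slice usage and latency per model.
    attrs = {"gen_ai.request.model": model_id, "gen_ai.system": "aws.bedrock"}
    token_usage.record(input_tokens, {**attrs, "gen_ai.token.type": "input"})
    token_usage.record(output_tokens, {**attrs, "gen_ai.token.type": "output"})
    duration.record(seconds, attrs)
```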
“From initial model assessment through production deployment, comprehensive monitoring is crucial for Generative AI systems. The integration between Dynatrace and HAQM Bedrock enables organizations to effortlessly track key performance indicators and trace data, ensuring their AI applications remain optimized and operate reliably.” – Denis Batalov, Tech Leader, ML & AI, AWS.
“Generative AI is rapidly becoming the standard for customer experience, pushing companies to deliver AI-native interactions at speed. At the same time, we’re witnessing an accelerated evolution of AI systems, resulting in exponentially increasing capabilities. However, deploying these highly complex AI application stacks in production presents significant challenges. AI observability plays a critical role in ensuring reliable performance, enhancing customer satisfaction, and driving measurable ROI for businesses.” – Alois Reitbauer, VP Chief Technology Strategist, Dynatrace
Summary
In this blog post, we discussed how Dynatrace provides enhanced visibility into generative AI applications that leverage HAQM Bedrock.
Taking advantage of Dynatrace’s capabilities, you can:
- Maintain Operational Efficiency: Monitor latency, resource utilization, and costs to optimize performance and control expenses.
- Accelerate Your Path to Production: Deploy reliable and secure AI applications faster with confidence.
- Make Data-Driven Decisions: Leverage comprehensive data to inform model selection, fine-tuning, and risk mitigation strategies.
- Improve Application Reliability: Proactively identify and resolve performance bottlenecks and other issues before they impact users.
- Enhance Your Security Posture: Detect and mitigate security risks in real-time, protecting your data and your reputation.
- Strengthen Governance and Compliance: Maintain clear data lineage and ensure responsible AI practices, meeting regulatory requirements and ethical standards.
For instructions on setting up the Dynatrace end-to-end instrumentation solution with HAQM Bedrock, refer to the following Dynatrace blog.
Dynatrace – AWS Partner Spotlight
Dynatrace is an AWS Advanced Technology Partner and AWS Competency Partner that provides software intelligence to simplify cloud complexity and accelerate digital transformation. With advanced observability, AI, and complete automation, our all-in-one platform provides answers, not just data, about the performance of applications, the underlying infrastructure, and the experience of all users.