Streamlining content compliance: Automating media analysis with HAQM Nova

Transform content compliance from a bottleneck to a competitive advantage. Learn how media organizations can reduce content review time using HAQM Bedrock, and HAQM Nova foundation models. Automate the analysis of content across multiple rating systems while maintaining accuracy and reducing costs.

Introduction

Every minute counts in today’s competitive streaming landscape. As content libraries expand and international markets multiply, the challenge of ensuring content compliance has grown. Content teams face:

Escalating volumes of content
Increasingly complex and disparate rating systems across regions
Pressing need to maintain accuracy while accelerating time-to-market

Compliance officers spend countless hours reviewing content frame-by-frame, juggling multiple rating systems, and documenting findings—all while racing against release deadlines. This approach isn’t just time-consuming; it’s expensive, prone to inconsistency, and increasingly unsustainable as content volumes grow.

The recent class of generative AI models includes HAQM Nova, a new generation of state-of-the-art foundation models available exclusively in HAQM Bedrock. HAQM Nova offers understanding models that accept text, images, and video inputs and generate text outputs.

These powerful models can be utilized to revolutionize content compliance workflows by bringing generative AI to content moderation at scale. Using these models, we can:

Create an intelligent content moderation system that can analyze content
Automatically flag potential issues with the associated timestamps
Adapt to multiple rating systems simultaneously

Compliance officers can then focus their expertise on edge cases and nuanced decisions that truly require human judgment.

Through the lens of analyzing music videos, we’ll explore how to build this automated content compliance system, examining the technical architecture, implementation steps, and real-world benefits. This solution offers a blueprint for scaling content compliance, while reducing cost, without sacrificing accuracy or speed.

Understanding the challenge

Content compliance in the media industry has reached a critical inflection point. Some customers manage libraries of hundreds of thousands of hours of content across multiple territories. This scale makes traditional frame-by-frame manual review increasingly impractical.

Adding to this complexity is the proliferation of rating systems and content standards worldwide. Content providers must simultaneously navigate multiple frameworks. Each system has its own nuanced criteria, cultural considerations, and compliance requirements, creating a multi-dimensional challenge for content teams.

Human moderators, despite their expertise, face inherent limitations in this environment. Consistency becomes difficult to maintain across long review sessions, and fatigue can impact judgment, particularly when dealing with sensitive or challenging content. The psychological toll of continuous exposure to potentially disturbing content also raises important concerns about reviewer wellbeing.

The financial implications of maintaining purely manual review processes are equally significant. Beyond the direct labor costs, the time required for comprehensive manual review can delay content releases, impacting revenue opportunities and market competitiveness. For organizations expanding into new territories, scaling up manual review teams for each market becomes prohibitively expensive.

These challenges create a clear imperative for transformation. Media organizations need a solution that can:

Process high volumes of content efficiently and consistently
Adapt to multiple rating systems simultaneously
Support human reviewers rather than replace them
Reduce operational costs while maintaining or improving accuracy

This is where the combination of HAQM Nova foundation models and HAQM Bedrock offers a compelling path forward, enabling a hybrid approach that amplifies human expertise with generative AI capabilities.

Solution overview

High-level architecture diagram of content compliance workflow in three stages - Content Preparation (take existing lyrics text file or use HAQM Transcribe to generate it), Step 1: HAQM Nova Video Understanding (HAQM Bedrock takes lyrics, video, and media analysis prompt containing compliance ratings as input, and generates compliance report), and Step 2: HAQM Nova Image Understanding (if Step 1 exceeds thresholds for compliance flags, extract frames from video at 1fps and analyze image using HAQM Bedrock to generate a compliance timeline analysis). Compliance ratings used in the media analysis prompt are: Region A, Region B, Region C, Redion D, and Moderation Flags.

Figure 1: Architecture diagram for content compliance using HAQM Nova.

This solution contains the following key components:

HAQM Bedrock provides access to leading foundation models through a single API, enabling flexible development of generative AI applications.
HAQM Nova Lite and HAQM Nova Pro are multimodal understanding models. HAQM Nova Lite and HAQM Nova Pro support a 300k context window and can analyze videos, documents, and images, making it ideal for comprehensive content analysis. HAQM Nova models understand the publicly available media taxonomies that are used to generate compliance analysis.

Traditional computer vision models, such as HAQM Rekognition, are used for image classification, label detection, object segmentation, text detection, face recognition, and content moderation. While computer vision models excel at certain tasks, we will instead use foundation models (FMs) for deep image and video understanding. FMs are capable of handling a variety of tasks in a single shot, and allow us to craft multi-use-case prompts that are capable of understanding media compliance taxonomies.

In Step 1, we pass in the music video (video), accompanying lyrics (text document), and a carefully crafted prompt with the necessary ratings and compliance taxonomies (text). HAQM Nova processes the request with low latency, and returns a compliance analysis with confidence scores. We convert the compliance analysis to a standard report, and use it to determine if we need a deeper frame-by-frame analysis (Step 2).

Step 1: Video level analysis using HAQM Nova

In order to achieve higher accuracy for video understanding, both video and audio should be used as context to the model. HAQM Nova multimodal understanding models currently require the audio transcript to be passed in separately from the video. In cases where the audio transcript is not available in text format, HAQM Transcribe may be used to provide the transcription. HAQM Transcribe is a fully managed automatic speech recognition service that provides high accuracy transcriptions.

In the following example prompt, we use the HAQM Bedrock Converse API to pass the video and transcript to HAQM Nova. This is to assess regional compliance against the multiple rating systems. We will also be requesting a general content moderation analysis and executive summary:

Analyze the attached video and lyrics. Provide a detailed analysis including:

Content Rating Analysis:
1. Region A Rating Assessment
- Suggested rating (Adult 18+, Teen 13+, Child 7+, General)
- Confidence score (0-100%)
- Justification for rating
- Content elements influencing rating

2. Region B Rating Assessment
- Suggested rating (Mature, Youth, Family, All Ages)
- Confidence score (0-100%)
- Key Classification issues
- Content advisories

3. Region C Rating Assessment
- Suggested rating (Restricted, Teen+, Child+, Universal)
- Confidence score (0-100%)
- Content warnings required
- Justification for rating

4. Content Moderation Flags
- Violence
- Adult Content
- Hate Speech
- Self-Harm
- Profanity
- Drug Reference
- Alcohol Reference
(Include confidence score for each flag)

Content Analysis:
5. Theme and Messaging
6. Visual Content Assessment
7. Lyrical Content Analysis
8. Target Audience
9. Potentially Sensitive Content Areas

Format the ratings and flags with confidence scores in the response.

Provide a brief executive summary (2-3 sentences) of the overall content analysis and rating recommendation.

An example response can be seen in the following. The response shows that HAQM Nova has performed the assessment and flagged that there are compliance considerations. It has also provided content moderation flags with confidence scores and an executive summary.

Content Rating Analysis:

1. Region A Rating Assessment:
Suggested rating: Adult 18+
Confidence score: 95%
Justification for rating: The video contains profanity, suggestive themes, references to alcohol and drug use, and some suggestive content. The lyrical content includes themes of crime, drinking, and struggle, which are not suitable for younger audiences.
Content elements influencing rating:
- Profanity (Confidence score: 90%)
- References to alcohol and drug use (Confidence score: 75%)

2. Region B Rating Assessment:
Suggested rating: Mature
Confidence score: 90%
Key classification issues: The video includes explicit language, and references to alcohol and drug use, which are not suitable for a younger audience
Content advisories: Viewer discretion advised for mature themes

3. Region C Rating Assessment:
Suggested rating: Teen+
Confidence score: 85%
Key classification issues: The video includes explicit language and references to alcohol and drug use, which may not be suitable for younger teens.

4. Content Moderation Flags:
- Violence: 10%
Depictions of violence in the video and lyrics.

- Adult Content: 60%
References to sexual content and mature themes.

- Hate Speech: 10%
No direct hate speech, but some lyrics may be interpreted as aggressive.

- Self-Harm: 5%
Indirect references to personal struggles and loss.

- Profanity: 80%
Frequent use of explicit language in the lyrics.

- Drug Reference: 75%
References to drug use and dealing in the lyrics.

- Alcohol Reference: 80%
References to partying and nightlife, including drinking alcohol.

Content Analysis:

5. Theme and Messaging: The video revolves around themes of betrayal, blame, and the consequences of reckless behavior. The lyrics suggest a narrative of mutual involvement in wrongdoing and the fallout from it.

6. Visual Content Assessment: The visuals are dark and moody, with flashing lights and silhouettes that enhance the dramatic and intense atmosphere of the content.

7. Lyrical Content Analysis: The lyrics contain strong language, references to alcohol and drug use, and suggest a narrative of mutual blame and involvement in wrongdoing.

8. Target Audience: The target audience is likely adults, given the mature themes and language. The content may not be suitable for younger viewers.

9. Potentially Sensitive Content Areas:
- References to drug use and alcohol consumption.
- Strong language.
- Suggestive themes.

Executive Summary:
The video contains strong language, references to drug and alcohol use, and suggestive themes. It carries a medium risk for brand safety and requires appropriate warnings and restrictions for broadcast. The content is likely to cause offense and is best suited for an adult audience.

The results of the analysis can be output in JSON format, which can then be parsed to create a compliance report for use within your organization.

$JSON output from video level analysis. Shows JSON visual hierarchy: { "metadata": { "artist": "Sample-Artist", "song_name": "Sample-Song", "genre": "Rock", "release_date": "2025-03-25" }, "ratings": { “region_a“: { }, “region”_b: { }, “region”_c: { “rating”: “Teen”+, "warnings_required": true, "confidence": 80 “justification”: “Mature themes and mild language” "content_flags": { "content_analysis": { } }$

Figure 2: JSON output from Video Level Analysis.

Following is an example compliance report created by parsing the JSON file:

A comprehensive content analysis report for a rock song, including video player, metadata, content ratings from Regional Rating Systems, and content moderation flags, showcasing detailed insights for a music video released on March 25, 2025. Text is highlighted green or red to draw attention to topics that may require human review. Video Metadata: Field Value Artist Sample-Artist Song Name Sample-Song Genre Rock Release Date 2025-03-25 Content Ratings Summary: Rating System Classification Confidence Region A Rating Teen 13+ 85% Region B Rating Youth 82% Region C Rating Teen+ 80% Content Moderation Flags: Violence Not Detected 65% Adult Content Not Detected 92% Hate Speech Not Detected 95% Self-Harm Not Detected 75% Profanity Detected 80% Drug Reference Detected 75% Alcohol Reference Detected 80%.

Figure 3: Output from Video Level Analysis.

If any of the flags exceed your thresholds, the detailed frame analysis can be triggered as a secondary step.

Step 2: Frame level analysis using HAQM Nova

This stage is only invoked if needed, to further explore any compliance issues discovered in the previous stage. With each frame analyzed, we can generate a timeline analysis visualization, which shows a time series and detailed analysis of specific compliance event.

Frame-by-frame analysis follows two steps:

1. Extract frames from the video at a given frequency. For example, by using AWS Elemental MediaConvert to extract one frame per second.
2. For each frame extracted:
  - Skip the frame if it is too similar to the previous frame. One way to assess this would be using perceptual hashing between frames—reducing costs.
  - Analyze frames for moderation flags.

Use the following prompt, along with passing the frame image to HAQM Nova:

You are an expert content moderator. Analyze the image for:
- Violence or graphic content
- Adult or inappropriate content
- Hate or offensive symbols
- Self-harm indicators
- Drug or alcohol references
Provide detailed analysis with confidence scores.

The timestamp of any moderation issues can be calculated by knowing the rate of extraction of frames and how many frames into the sequence of processing we are. Flags and timestamps can be collated to generate a timeline analysis visualization, which shows a time series of specific compliance events.

$A JSON data output showing frame-level content analysis results at timestamp 00:43, displaying confidence scores and descriptions for various content categories including violence, adult content, hate speech, self-harm, and alcohol references, with the analysis summary indicating "No significant flags detected." JSON hierarchy displayed: { "timestamp": 43, "timestamp_formatted": "00:43", "flags": { "violence_or_graphic_content": { "description": "There is no visible violence or graphic content in the image.", "confidence_score": 0.95 }, "adult_or_inappropriate_content": { "description": "The image shows a bar setting with people drinking, but there is no explicit adult or inappropriate content visible.", "confidence_score": 0.9 }, "hate_speech_or_offensive_symbols": { "description": "There are no visible hate speech or offensive symbols in the image.", "confidence_score": 0.95 }, "self_harm_indicators": { "description": "There are no visible indicators of self-harm in the image.", "confidence_score": 0.95 }, "drug_or_alcohol_references": { "description": "The image shows people drinking at a bar, which implies the presence of alcohol. There are no visible drug references.", "confidence_score": 0.85 } }, "analysis_summary": "There is presence of alcohol, but no other significant flags are detected." }$

Figure 4: JSON output from Frame Level Analysis.

Following (Figure 5) is an example compliance report created by parsing the timeline analysis JSON file.

A content analysis report showing video metadata, content ratings from various systems (Region A, Region B, Region C), and a timeline analysis with no significant flags detected across multiple timestamps for a rock song released on March 25, 2025. Content Ratings Summary shows rating system, classification, and confidence score. Video Metadata: Field Value Artist Sample-Artist Song Name Sample-Song Genre Rock Release Date 2025-03-25 Content Ratings Summary: Rating System Classification Confidence Region A Rating Teen 13+ 85% Region B Rating Youth 82% Region C Rating Teen+ 80% Timeline Analysis: Timestamp Detected Flags Flag Details 00:00 No significant flags detected None 00:01 No significant flags detected None 00:02 No significant flags detected None 00:03 Alcohol Two glasses on the bar counter contain liquid 00:04 No significant flags detected None 00:05 No significant flags detected None 00:06 No significant flags detected None 00:07 No significant flags detected None 00:08 No significant flags detected None 00:09 Alcohol There are visible glasses in a bar setting 00:10 No significant flags detected None 00:11 No significant flags detected None 00:12 No significant flags detected None 00:13 No significant flags detected None 00:14 Alcohol The man in the center is holding a glass with a dark liquid.

Figure 5: Output from Frame Level Analysis.

Benefits and results

Music videos pose unique challenges for content review due to their fast-paced nature, multiple themes, and complex visual elements. Music videos are fast-paced with frequent shot changes, they may present multiple themes in a short period of time, and are visually powerful. These elements make human review of content more error prone. We tested the visual understanding capabilities of HAQM Nova Lite and HAQM Nova Pro with music videos across regional ratings. There are many variables that contributed to our output, and this analysis is meant to be directional and not definitive. We encourage you to run your own tests on your specific content. The following section describes observations from our tests.

Key Observations:

1. - Both HAQM Nova models demonstrated caution, rating categories conservatively
  - HAQM Nova Pro showed higher accuracy in detecting nuanced content elements
  - HAQM Nova Lite proved sufficient for standard cases with greater than 83 percent accuracy
  - HAQM Nova Lite was sufficient in many test cases, though there were cases where HAQM Nova Pro provided more detailed descriptions of scenes (without being prompted to describe these aspects). Consider using HAQM Nova Pro when more detail is required for:
    - Facial expressions
    - Advanced plot or theme analysis
    - Spatial understanding
    - Specific location of objects
  - A human reviewer neglected to identify elements such as:
    - Fast flashing lights
    - Certain self-harm themes
    - Partial nudity in the background due to strong blurring in out-of-focus areas
    - Suggestive content displayed on body tattoos

Best practices and considerations

When implementing content compliance automation with HAQM Nova models, organizations should consider both cost and performance optimizations. Our testing revealed that analyzing a typical four-minute music video using HAQM Nova Lite consumed 76,497 input tokens and 496 output tokens, resulting in a cost of $0.0047 based on current pricing. This three-stage approach (content preparation, video analysis, and frame analysis) enables independent scaling and targeted problem-solving without disrupting the entire workflow.

For maximum effectiveness, prompts should be structured hierarchically with clear categories and specific examples, with media content presented before analysis instructions. For more prompting best practices, refer to the Prompting best practices for HAQM Nova understanding models.

Organizations should leverage HAQM Bedrock Prompt Management for version control and updates. Implementing human-in-the-loop workflows for content within specific confidence thresholds is recommended. You should also include feedback mechanisms that allow moderators to correct AI decisions and build valuable datasets for future improvements. Regular performance reviews and automated metrics should inform prompt refinements, enabling fast turnaround updates independent from the application.

By following these best practices, organizations can optimize their content compliance automation, ensuring efficient, accurate, and cost-effective operations while maintaining the flexibility to adapt to changing requirements.

Conclusion

Leveraging HAQM Nova foundation models and HAQM Bedrock for content compliance workflows represents a paradigm shift for media organizations. This generative AI-powered approach not only addresses the immediate challenges of scale, complexity, and cost but also positions content teams to thrive in an increasingly global and content-rich landscape.

Organizations can expect faster content review, consistent ratings, and significant cost savings. Begin by assessing your current workflow and implementing a proof of concept before scaling to full deployment.

Ready to transform your content compliance workflow? Contact an AWS Representative to know how we can help accelerate your business.

Visit the following links to learn more about additional media and entertainment industry use cases:

AWS for M&E Blog