AWS for M&E Blog

Nomalab automates ad break detection in media workflows with AWS

The Media and Entertainment industry is undergoing unprecedented transformation, with the volume and complexity of content continuing to grow at a rapid pace. To stay competitive, broadcasters and publishers must find innovative solutions to optimize their media workflows from ingest to delivery.

Recognizing this challenge, Nomalab, a leading cloud-based media management platform available on the HAQM Web Services (AWS) Marketplace, has leveraged the power of AI and generative AI through the Guidance for Media2Cloud on AWS (Media2Cloud), an open source, serverless framework. This implementation exemplifies how embracing the latest cloud and AI technologies can drive efficiency and position media organizations for long-term success.

We’ll explore how Nomalab’s innovative use of AWS services and generative AI is transforming media workflows and setting a new standard for content processing efficiency.

Nomalab use case

Nomalab’s customers, which include major media groups, are facing increasing demand to deliver more content to linear TV and over-the-top (OTT) platforms than ever before. This pressure has made automation not just desirable but essential for maintaining a competitive edge in viewer experience and operational efficiency.

A particular pain point in this content delivery pipeline is the generation of segmentation metadata for long-form content: the precise identification of title sequences, end credits, and ad insertion points that respect the viewer’s narrative experience.

The goal, then, was to create a system that accurately generates this metadata and can be deployed without subsequent human intervention. Achieving this would dramatically accelerate the content preparation workflow and ultimately reduce costs. This automation imperative became the driving force behind Nomalab’s solution, which transforms what was once a time-consuming manual process into a streamlined, automated operation that keeps pace with today’s demanding content delivery schedules.

“Our clients need to be able to insert ads seamlessly into their programming, but manually reviewing hours of footage to identify suitable ad breaks is extremely time-consuming,” explains Romane Lafon, Digital Media CTO at Nomalab. “We knew AI could be the answer, but we wanted a solution that was highly accurate and cost-effective to run at scale.”

Success for this automated solution was measured against rigorous quality standards. Nomalab set a target of over 90 percent accuracy for the identified segmentation points, compared to manual identification by human experts. Meeting this high threshold ensures that the automated system performs at a level comparable to traditional manual processes, giving Nomalab’s clients the confidence to fully embrace the automated workflow in their content delivery pipelines.
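
As an illustration of how such an accuracy figure could be computed, the short Python sketch below scores automated segmentation points against expert-placed timecodes, counting a point as correct when it falls within a small tolerance window; the tolerance and the sample timecodes are invented for illustration, not Nomalab’s actual evaluation data.

```python
# Hypothetical illustration only: score automated segmentation points against
# expert-placed timecodes. A point counts as correct when it lands within a
# small tolerance of a human-chosen timecode. All values here are invented.
def accuracy(automated, manual, tolerance=2.0):
    matched = sum(
        any(abs(a - m) <= tolerance for a in automated) for m in manual
    )
    return matched / len(manual)

manual_points = [311.9, 598.2, 1804.5, 2410.0]    # expert timecodes, in seconds
automated_points = [312.5, 599.0, 1803.8]         # system timecodes, in seconds
print(f"{accuracy(automated_points, manual_points):.0%}")  # prints 75%
```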

Integrating Media2Cloud and generative AI

When faced with the challenge of automating ad break detection in video content, Nomalab turned to Media2Cloud. The Guidance for Media2Cloud on AWS not only streamlines the migration of video assets and metadata to AWS but also unlocks deeper content insights. It enables automatic extraction of valuable metadata from multiple content types by leveraging AWS AI services.

“By integrating Media2Cloud and advanced AI capabilities, we’ve been able to develop a uniquely powerful solution that automates key tasks and delivers measurable results for our clients,” says Lafon.

One of the most powerful features of Media2Cloud is ad break opportunity detection. Its approach combines traditional artificial intelligence and machine learning (AI/ML) with generative AI to detect natural content breaks, known as chapter points. Chapter points are logical breaks where both the visual context and the narrative context shift. For instance, a transition from an indoor conversation to an outdoor scene can be considered a break. Identifying these effectively requires analyzing both visual and audio components simultaneously.

The process begins with the HAQM Rekognition Segment API, which provides frame-accurate camera shot detection and technical cue identification (such as color bars, black frames, and end credits).

Frame-accurate shots detected by the HAQM Rekognition Segment API, shown on a video timeline with markers indicating where transitions occur. In this example, the API has analyzed the segment and detected five different shots.

Figure 1: Frame accurate shots – HAQM Rekognition Segment API.
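
For readers who want to see what this first step looks like in code, the following is a minimal Python (boto3) sketch of a Segment API call. The bucket and object names are placeholders, and a production workflow would typically rely on the job’s SNS notification channel rather than polling for completion.

```python
import time

import boto3

# Minimal sketch: detect camera shots and technical cues with the
# HAQM Rekognition Segment API. Bucket and object names are placeholders.
rekognition = boto3.client("rekognition")

job = rekognition.start_segment_detection(
    Video={"S3Object": {"Bucket": "my-media-bucket", "Name": "episode-01.mp4"}},
    SegmentTypes=["SHOT", "TECHNICAL_CUE"],
)

# Poll the asynchronous job; a production pipeline would use the SNS
# notification channel instead of polling.
while True:
    result = rekognition.get_segment_detection(JobId=job["JobId"])
    if result["JobStatus"] in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(15)

# Each segment carries frame-accurate timecodes that the later steps reuse.
for segment in result.get("Segments", []):
    print(segment["Type"], segment["StartTimecodeSMPTE"], segment["EndTimecodeSMPTE"])
```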

The next step groups similar shots into scenes. To do so, the solution uses the HAQM Titan Multimodal Embeddings model through HAQM Bedrock, converting frame images into vector embeddings that capture their semantic features. The embeddings are stored in HAQM OpenSearch Service with the k-NN plugin, and the solution then analyzes similarity patterns to group related shots into coherent scenes.

Visualization of the process that groups similar camera shots into logical scenes, displaying frame thumbnails clustered by visual similarity. The original input consists of four different shots. The shots are vectorized using HAQM Bedrock with the HAQM Titan Multimodal Embeddings model and stored in an HAQM OpenSearch Service vector index. A k-NN search determines that two of the original segments share 80 percent contextual similarity and therefore combines them, so three scenes are detected rather than four.

Figure 2: Group similar shots into logical scenes.
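
The sketch below shows one way this embedding-and-indexing step could be wired together with boto3 and the opensearch-py client; the model identifier, index name, file name, and OpenSearch endpoint are assumptions for illustration, not the exact Media2Cloud implementation.

```python
import base64
import json

import boto3
from opensearchpy import OpenSearch

# Sketch: embed one keyframe with the Titan multimodal embeddings model on
# HAQM Bedrock, then index the vector in HAQM OpenSearch Service for k-NN search.
bedrock = boto3.client("bedrock-runtime")

with open("shot_0001_keyframe.jpg", "rb") as image_file:  # placeholder keyframe
    image_b64 = base64.b64encode(image_file.read()).decode("utf-8")

response = bedrock.invoke_model(
    modelId="amazon.titan-embed-image-v1",  # assumed model identifier
    body=json.dumps({"inputImage": image_b64}),
)
embedding = json.loads(response["body"].read())["embedding"]

# Store the vector in a k-NN enabled index (placeholder OpenSearch endpoint).
client = OpenSearch(hosts=[{"host": "search-domain.example.com", "port": 443}], use_ssl=True)
client.indices.create(
    index="shot-embeddings",
    body={
        "settings": {"index.knn": True},
        "mappings": {"properties": {"vector": {"type": "knn_vector", "dimension": len(embedding)}}},
    },
    ignore=400,  # the index may already exist
)
client.index(index="shot-embeddings", body={"shot_id": "shot_0001", "vector": embedding})

# A k-NN query against this index returns the most visually similar shots,
# which the workflow then groups into scenes when similarity is high enough.
```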

The final step combines scene detection with conversation analysis to create meaningful chapter breaks. Media2Cloud uses HAQM Transcribe to convert speech to text with precise timestamps, along with HAQM Bedrock foundation models to analyze conversation flow and identify topic transitions.

Depiction of a transcript generated by HAQM Transcribe being analyzed by an HAQM Bedrock foundation model to identify topic changes and conversation flow, breaking the transcript into chapters with start and end timecodes. Two windows are shown: the one on the left shows HAQM Transcribe and the one on the right shows HAQM Bedrock. Within the HAQM Bedrock window, two boxes highlight the start and end timecodes of two different conversational flows.

Figure 3: Analyze transcript for topic changes.
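
A simplified sketch of this transcription and chapter-analysis step might look like the following. The job name, bucket, model choice, and prompt wording are assumptions rather than the exact Media2Cloud implementation.

```python
import time

import boto3

# Sketch: transcribe the program's audio, then ask an HAQM Bedrock foundation
# model to break the transcript into chapters. Names and model ID are assumptions.
transcribe = boto3.client("transcribe")
bedrock = boto3.client("bedrock-runtime")

transcribe.start_transcription_job(
    TranscriptionJobName="episode-01-transcript",
    Media={"MediaFileUri": "s3://my-media-bucket/episode-01.mp4"},
    MediaFormat="mp4",
    LanguageCode="en-US",
    OutputBucketName="my-media-bucket",
)
while True:
    job = transcribe.get_transcription_job(TranscriptionJobName="episode-01-transcript")
    if job["TranscriptionJob"]["TranscriptionJobStatus"] in ("COMPLETED", "FAILED"):
        break
    time.sleep(15)

# In a real pipeline the timestamped transcript JSON is read from S3;
# the word-level timestamps keep chapter boundaries frame accurate.
transcript_text = "<timestamped transcript loaded from S3>"

prompt = (
    "Split the following timestamped transcript into chapters. Return JSON with "
    "a start timecode, end timecode, and one-sentence topic for each chapter.\n\n"
    + transcript_text
)
answer = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model choice
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)
print(answer["output"]["message"]["content"][0]["text"])
```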

As a result, this intelligent workflow generates natural transition points where commercial breaks would be least disruptive to the viewing experience—laying the groundwork for sophisticated ad placement strategies.

Aligning visual and audio elements to find scenes that result in natural, unintrusive ad break opportunities. In this example, aligning the audio and visual results combines five shots into three scenes, with timeline markers on the video indicating where the transitions occur.

Figure 4: Aligning visual and audio elements to find unintrusive breaks.
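
The exact alignment logic is more involved than this, but the short sketch below illustrates the underlying idea with assumed data structures: a visual scene boundary is kept as an ad break candidate only when it coincides with a pause between conversation chapters.

```python
# Illustrative only: a visual scene boundary is kept as an ad break candidate
# when it falls inside (or very close to) a pause between conversation chapters.
scene_boundaries = [102.4, 311.9, 598.2, 745.0]   # seconds, from the visual grouping
chapter_gaps = [(310.5, 313.0), (596.8, 600.1)]   # silences between chapters, seconds

def candidate_breaks(boundaries, gaps, tolerance=1.0):
    """Keep scene boundaries that land within a conversation gap, plus tolerance."""
    candidates = []
    for boundary in boundaries:
        if any(start - tolerance <= boundary <= end + tolerance for start, end in gaps):
            candidates.append(boundary)
    return candidates

print(candidate_breaks(scene_boundaries, chapter_gaps))  # prints [311.9, 598.2]
```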

Placing ad-breaks

Building on this foundation, the ad break detection feature of Media2Cloud takes video content analysis to the next level. The system pinpoints frame-accurate opportunities for ad placement while providing rich contextual insights through industry-standard classifications like IAB Content Taxonomy V3 and GARM Taxonomy. This contextual awareness is crucial—helping ad decision servers make smart choices about which advertisements to place based on the surrounding content.

Generation of the contextual response flow, combining the timeframe of the scene shots with the chapter summary provided by conversational analysis, using the multimodal capabilities of an HAQM Bedrock foundation model. The interface displays both visual scene information and the corresponding content classifications generated by the model, which help ad decision servers make informed placement choices based on content context. The chapter summary reads: The scene depicts a bustling street in Los Angeles in 1947. The image shows a crowded urban landscape with a mix of vintage cars, pedestrians, and buildings. The street is lined with storefronts and billboards, creating a vibrant and lively ... The prompt reads: Given a sequence of frame images starting from left to right and then from top to bottom, and the audio transcription, answer the following questions: (1) describe the frame images in detail; (2) classify the scene in IAB Content Taxonomy V3; (3) classify the scene in GARM taxonomy; (4) give the sentiment of the scene. This is all passed to HAQM Bedrock, which returns the results: A person in a hat and coat walks alone across a rocky shoreline towards a crashing wave. The person wearing a suit stands alone on a rocky beach, looking out towards the ocean in several frames of the sequence. VKIV56 - Nature; Death, injury or Military Conflict; Negative.

Figure 5: Generate the contextual response.
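
The sketch below shows how such a classification request could be sent to a multimodal model on HAQM Bedrock; the model ID, file name, and prompt wording are assumptions for illustration, not the exact prompt Media2Cloud uses.

```python
import boto3

# Sketch: classify a candidate scene with a multimodal model on HAQM Bedrock.
# The model ID, file name, and prompt wording are assumptions for illustration.
bedrock = boto3.client("bedrock-runtime")

with open("scene_042_frame.jpg", "rb") as image_file:  # placeholder keyframe
    frame_bytes = image_file.read()

prompt = (
    "Given the frame image and the dialogue excerpt below, describe the scene, "
    "classify it with IAB Content Taxonomy V3 and GARM categories, and state its "
    "overall sentiment.\n\nDialogue: <transcript excerpt for this scene>"
)
response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed multimodal model
    messages=[{
        "role": "user",
        "content": [
            {"image": {"format": "jpeg", "source": {"bytes": frame_bytes}}},
            {"text": prompt},
        ],
    }],
)
# The description, taxonomy labels, and sentiment travel with the ad break
# candidate so an ad decision server can choose a contextually appropriate creative.
print(response["output"]["message"]["content"][0]["text"])
```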

Integration

The intelligence of Nomalab’s solution extends beyond initial detection by Media2Cloud to include a sophisticated post-evaluation phase. This critical step employs proprietary algorithms that mirror human decision-making to refine and optimize ad break selection. By applying customized business rules, the system intelligently filters and prioritizes potential ad break opportunities, ensuring that only the most strategic and appropriate breaks make it to the final cut. This curation step transforms raw detection data into production-grade, actionable insights, delivering precisely the number of ad breaks needed while maintaining the integrity of the viewing experience.
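
Nomalab’s post-evaluation rules are proprietary, so the following sketch is purely illustrative of the kind of curation described here, using assumed rules for break count, minimum spacing, and distance from the start and end of the program.

```python
# Purely illustrative: Nomalab's actual post-evaluation rules are proprietary.
# Assumed rules here: keep a requested number of breaks, enforce minimum spacing,
# and avoid breaks too close to the start or end of the program.
def select_breaks(candidates, program_duration, wanted=3, min_gap=240, edge_margin=300):
    """candidates: list of (timecode_seconds, confidence) tuples from detection."""
    selected = []
    for timecode, confidence in sorted(candidates, key=lambda c: -c[1]):
        if timecode < edge_margin or timecode > program_duration - edge_margin:
            continue  # too close to the opening titles or the end credits
        if any(abs(timecode - kept) < min_gap for kept in selected):
            continue  # prevents breaks from clustering together
        selected.append(timecode)
        if len(selected) == wanted:
            break
    return sorted(selected)

candidates = [(311.9, 0.92), (598.2, 0.88), (640.0, 0.75), (1804.5, 0.81), (95.0, 0.90)]
print(select_breaks(candidates, program_duration=2700))  # prints [311.9, 598.2, 1804.5]
```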

Flowchart of Nomalab's analysis workflow showing the integration between AWS services (including HAQM API Gateway, Lambda functions, and S3) and Nomalab's proprietary post-evaluation algorithms that refine ad break selection based on business rules. The process follows six sequential steps: (1) initial asset upload to the system, (2) asset transmission to the Media2Cloud API through Nomalab's backend infrastructure, (3) Media2Cloud workflow trigger that initiates automated analysis, (4) analysis status notification sent back to the Nomalab backend with detected ad break opportunities, (5) review interface for the client or operator to verify and approve suggested breaks, and (6) processed asset preparation and delivery for distribution using the Nomalab packager. The diagram illustrates how cloud services and proprietary algorithms work together to transform manual processes into an efficient automated workflow.

Figure 6: Nomalab’s analysis workflow.

“The integration with Media2Cloud was critical,” says Lafon. “It provided us with a scalable, cloud-native foundation to build upon, allowing us to focus on the AI-powered features that would deliver real value to our clients and on our own post-evaluation algorithm. Also, the generative AI capabilities of HAQM Bedrock were a game-changer for us. By combining conventional signal analysis, computer vision, speech-to-text, natural language understanding, and business rules we were able to create an incredibly accurate ad break detection system that closely matches the decisions a human expert would make.”

Benefits and results

By automating ad break opportunity detection, Nomalab has dramatically improved the efficiency of its ad break detection workflow. A typical ad break and marker placement workload for a qualified craftsperson is approximately 25 programs per workday, so analyzing a complete library previously took days. Now it can be achieved in a matter of hours, at virtually unlimited scale, with a high degree of accuracy.

“Our overall target is a 90 percent or higher accuracy rate in our ad break detection, compared to manual evaluation, and we are nearly there or higher depending on content typology,” says Lafon. “The best part is, our clients don’t have to worry about the underlying complexity—they just get a fully enriched media asset ready for ingest and distribution.”

The scalability of the AWS infrastructure is key for Nomalab. “We handle tens of thousands of content pieces per year, all while keeping our costs under control,” Lafon adds. “It’s a win-win for us and our clients.”

Looking to the future

As the media and entertainment industry continues its rapid evolution, the capabilities showcased in Nomalab’s workflow highlight the transformative potential of AWS services and generative AI in optimizing media operations.

“This is just the beginning,” says Lafon. “We’re already exploring how we can use these technologies to automate tasks like content classification, metadata enrichment, and even media QC [quality control]. The possibilities are endless, because the availability of usable metadata is a major issue in the industry.”

The deep integration of generative AI capabilities powered by HAQM Bedrock, combined with services like HAQM Rekognition and HAQM Transcribe, showcases the transformative potential of converging AI and cloud technologies. By seamlessly integrating computer vision, speech-to-text, audio processing, large language models, and its own algorithms, Nomalab was able to develop a highly accurate ad break opportunity detection system that rivals human-level performance.

This synergy of foundational AI/ML services and the advanced reasoning of generative AI unlocks rich insights from media assets, setting a new bar for content analysis and automation in the media industry.

Looking ahead, the media industry can expect to see these technologies continue to converge, driving increasingly intelligent and automated content processing pipelines. Without any doubt, the ability to leverage cloud-based AI and machine learning will be a key competitive differentiator for media organizations seeking to stay ahead of the curve.

For media companies seeking to future-proof their operations, the success of Nomalab’s solution on AWS serves as a powerful example of what’s possible. By embracing the latest advancements in cloud, AI, and generative AI, industry leaders can unlock new levels of efficiency, accuracy, and innovation, transforming their media workflows for the digital era.

Contact an AWS Representative to learn how we can help accelerate your business.

Thomas Buatois

Thomas Buatois is a Solutions Architect for Startups (EMEA) at AWS. He helps founders build innovative solutions that solve business and technical challenges. With his expertise in Media & Entertainment technologies, Thomas enables startups to leverage AWS services effectively for sustainable growth.

Iryna Oliinykova

Iryna Oliinykova is an AWS Solutions Architect who works closely with Media and Entertainment clients. With a background as a systems engineer and leveraging deep knowledge of the media industry, she helps clients in this sector seamlessly adopt cloud technologies to achieve their key business objectives and drive innovation.

Ken Shek

Ken Shek is an AWS Principal Solutions Architect specializing in Data Science and Analytics for the Global Media, Entertainment, Games, and Sports industries. He assists media customers in designing, developing, and deploying workloads on the AWS Cloud using best practices. Passionate about artificial intelligence and machine learning use cases, he has built the Media2Cloud on AWS guidance to help hundreds of customers ingest and analyze content, enriching its value.