AWS for M&E Blog

Back to basics: Accessibility signaling with AWS Elemental Media Services

In the first part of this series, Back to basics: Accessibility services for Media, we reviewed the accessibility services available for broadcast and streaming today. In this second part we look at how these accessibility requirements can be met using HAQM Web Services (AWS) Elemental Media Services.

In many countries, regulators require broadcasters to increase the accessibility of their content. TV companies are now electing to apply these regulations to their Over The Top (OTT) streaming offerings as well. They want to preserve the broadcast end-user experience as more and more viewers consume media content delivered over the internet.

Streaming over the internet is not trivial, as the range of devices that can receive media content is much broader than in broadcast. Signaling of the accessibility components becomes a key part of content processing. To help with this, OTT streaming formats describe how accessibility services should be signaled in their specifications, such as Apple’s HTTP Live Streaming (HLS) specification or the DVB organization’s DVB DASH (Dynamic Adaptive Streaming over HTTP) specification. DASH and HLS are the two most commonly used streaming formats today.

In HLS, accessibility information is typically signaled through:

  1. EXT-X-MEDIA tags in the multivariant playlist, with:
    • CHARACTERISTICS attribute: Indicates accessibility features like “public.accessibility.describes-video” or “public.accessibility.transcribes-spoken-dialog”
    • LANGUAGE attribute: Specifies the language of the audio or subtitle track
    • TYPE attribute: Can be “CLOSED-CAPTIONS” or “SUBTITLES”
  2. CEA-608/708 closed captions, which can be signaled using:
    • CLOSED-CAPTIONS attribute in the EXT-X-STREAM-INF tag
    • “CLOSED-CAPTIONS=NONE”, which indicates no embedded closed captions
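
Putting these pieces together, here is a minimal sketch of a multivariant playlist with accessibility signaling (the group IDs, names, URIs, and bandwidth are illustrative):

#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="eng",AUTOSELECT=YES,CHARACTERISTICS="public.accessibility.transcribes-spoken-dialog",URI="subs_eng.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",NAME="English AD",LANGUAGE="eng",AUTOSELECT=YES,CHARACTERISTICS="public.accessibility.describes-video",URI="audio_ad_eng.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=5000000,CODECS="avc1.640028,mp4a.40.2",AUDIO="audio",SUBTITLES="subs",CLOSED-CAPTIONS=NONE
video_5000k.m3u8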

This helps players and streaming services properly handle accessibility features like closed captions, subtitles, and audio descriptions for viewers who need them.

In DASH, accessibility is typically signaled with accessibility metadata in the manifest file:

  • Closed captions/subtitles are signaled like this:
<AdaptationSet mimeType="application/ttml+xml">
  <Role schemeIdUri="urn:mpeg:dash:role:2011" value="subtitle"/>
  <Accessibility schemeIdUri="urn:mpeg:dash:role:2011" value="caption"/>
</AdaptationSet>
  • Audio description is signaled like this:
<AdaptationSet mimeType="audio/mp4">
  <Role schemeIdUri="urn:mpeg:dash:role:2011" value="alternate"/>
  <Accessibility schemeIdUri="urn:tva:metadata:cs:AudioPurposeCS:2007" value="1"/>
</AdaptationSet>

Interestingly, the DASH specification defines both Accessibility and Role descriptors. These can be confusing because of the overlap, as they can effectively signal the same information. For example, both can be set to values that indicate enhanced audio intelligibility. In this situation, it is important to understand which of these your client player supports and use the supported option.

Roles tell players which sets of content to default to and indicate the general type of bucket the content falls into:

  • Main – The primary language of the region or (more rarely) the language of the source content
  • Dub – Languages other than the primary, or languages the audio has also been translated to
  • Alternate – Visually impaired, hard of hearing, enhanced audio intelligibility
  • Commentary – Visually impaired
  • Caption – May be used with burned-in captions where the media type is “video”
  • Sign – Video representing sign-language interpretation
  • Description – Textual or audio media containing a textual description
  • Enhanced-audio-intelligibility – An experience containing an element for improved intelligibility of the dialogue; multiple AdaptationSets can be marked similarly but differentiated by codec or language.

If Roles are not supported in the player, the alternative is to use an Accessibility tag, which informs the player, depending on the scheme used, of specific details about what type of alternate or commentary data is present.

For example, urn:tva:metadata:cs:AudioPurposeCS:2007:

  • @value = “1” for the visually impaired
  • @value = “2” for the hard of hearing
  • @value = “8” for enhanced audio intelligibility or dialogue enhancement
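
To illustrate the overlap mentioned earlier, here is a minimal sketch of an audio AdaptationSet that signals dialogue enhancement both ways: with the enhanced-audio-intelligibility Role for players that support Roles, and with the TV-Anytime Accessibility value “8” as the alternative (the mimeType and language are illustrative):

<AdaptationSet mimeType="audio/mp4" lang="eng">
  <Role schemeIdUri="urn:mpeg:dash:role:2011" value="enhanced-audio-intelligibility"/>
  <Accessibility schemeIdUri="urn:tva:metadata:cs:AudioPurposeCS:2007" value="8"/>
</AdaptationSet>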

AWS, as a solution provider for modern OTT platforms, enables accessibility features in purpose-built services for video processing: AWS Elemental MediaLive, AWS Elemental MediaConvert, AWS Elemental MediaPackage, and other AWS Elemental Media Services.

Implementing accessibility features opens your content to a broader viewership while delivering superior user experiences. Beyond meeting regulatory requirements, these services provide a competitive advantage that distinguishes your content in the marketplace.

AWS Elemental MediaLive

AWS Elemental MediaLive (MediaLive) is a live video processing service that enables you to encode high-quality live video streams for broadcast television and multiscreen devices.

With MediaLive you can configure DASH audio description signaling for an audio track in Microsoft Smooth Streaming (MSS) and Common Media Application Format (CMAF) output groups. This signaling can be used by downstream packagers, like AWS Elemental MediaPackage (MediaPackage), to create the correct accessibility signaling for DASH output.

Figure 1: Example of accessibility signaling configuration for audio output on MediaLive in an MSS or CMAF output group, with DASH Role Audio set to DESCRIPTION and DVB DASH Accessibility set to DVBDASH_2_HARD_OF_HEARING.
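
With this configuration, a downstream packager could produce DASH signaling along these lines (a sketch, assuming the mapping of DVBDASH_2_HARD_OF_HEARING to the TV-Anytime value “2” listed earlier; the mimeType and language are illustrative):

<AdaptationSet mimeType="audio/mp4" lang="eng">
  <Role schemeIdUri="urn:mpeg:dash:role:2011" value="description"/>
  <Accessibility schemeIdUri="urn:tva:metadata:cs:AudioPurposeCS:2007" value="2"/>
</AdaptationSet>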

For captions, with MediaLive you can configure DASH accessibility signaling for an output with captions when an MSS or CMAF output group is used. Alternatively, if you are using an HTTP Live Streaming (HLS) output group, the caption track can define accessibility features, such as written descriptions of spoken dialog, music, and sounds.

Figure 2: Example of accessibility signaling configuration for caption output on MediaLive. As in the previous figure, IMPLEMENTS_ACCESSIBILITY_FEATURES is selected, but here, under Caption DASH Roles, Caption is set to SUBTITLE and DVB DASH Accessibility is set to DVBDASH_2_HARD_OF_HEARING.

AWS Elemental MediaConvert

AWS Elemental MediaConvert (MediaConvert) is a file-based transcoder with packaging capability. MediaConvert enables captions signaling for both HLS and DASH output types.

Figure 3: Example of the accessibility signaling configuration for a caption output on MediaConvert, packaging with HLS and DASH rather than MSS and CMAF. Captions 1 is selected, the destination type is WebVTT, Accessibility subtitles is set to Enabled, and the Language for this media is set to English.

For captions in HLS multivariant playlists, MediaConvert adds the following accessibility attributes under EXT-X-MEDIA for a subtitles track:

CHARACTERISTICS="public.accessibility.describes-spoken-dialog,public.accessibility.describes-music-and-sound" and AUTOSELECT="YES"

For captions with DASH playlists, MediaConvert adds the following in the adaptation set for the track:

<Accessibility schemeIdUri="urn:mpeg:dash:role:2011" value="caption"/>

For accessible audio, it is possible to use an audio track with pre-mixed audio descriptions, also known as Broadcast Mix audio description. To enable this signaling, use a CMAF output group and set the Descriptive Video Service (DVS) flag, which adds the Audio Description (AD) flags to your HLS multivariant playlist.

Figure 4: Example of accessibility signaling configuration for audio output on MediaConvert, with the Descriptive Video Service value set to Flag under Advanced HLS specific settings in the CMAF container settings.

When you set the Flag option, MediaConvert includes the parameter CHARACTERISTICS="public.accessibility.describes-video" in the EXT-X-MEDIA entry for this track.
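
For example, the resulting rendition entry could look like this sketch (the group ID, name, and URI are illustrative):

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",NAME="English AD",LANGUAGE="eng",AUTOSELECT=YES,CHARACTERISTICS="public.accessibility.describes-video",URI="audio_ad_eng.m3u8"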

In addition, with MediaConvert you can set the audio descriptor to signal audio description and broadcaster mix to the player device consuming the stream.

Here, under Audio Description Broadcaster Mix, select Broadcaster Mixed AD when the input contains pre-mixed main audio as well as a second audio description (AD) as a stereo pair. The value for Audio Type will then be set to signal to downstream systems that this stream contains “broadcaster mixed AD”.

Figure 5: Example of audio configuration on MediaConvert including the AD descriptor. For an audio track encoded as a CBR, 48 kHz, 2.0 stereo, 96 kbps track, Audio Description Broadcaster Mix is set to Broadcaster Mixed AD so the downstream audio decoder presents and decodes this track correctly.
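
If you drive MediaConvert through its API rather than the console, the equivalent job settings fragment might look like the following sketch of the AAC codec settings (only the relevant fields are shown; the values mirror the configuration in Figure 5):

"AudioDescriptions": [
  {
    "CodecSettings": {
      "Codec": "AAC",
      "AacSettings": {
        "AudioDescriptionBroadcasterMix": "BROADCASTER_MIXED_AD",
        "Bitrate": 96000,
        "CodingMode": "CODING_MODE_2_0",
        "RateControlMode": "CBR",
        "SampleRate": 48000
      }
    }
  }
]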

AWS Elemental MediaPackage

AWS Elemental MediaPackage is a just-in-time packager that prepares, protects, and distributes your video content to a broad range of connected devices. MediaPackage can process Video on Demand (VoD) as well as live content.

VoD

For VoD, MediaPackage propagates the accessibility subtitle signaling in the source HLS manifest to both HLS and DASH outputs, with both caption and audio accessibility signaling present. MediaPackage automatically creates the metadata on the output if the metadata is properly present in the input.

For HLS and CMAF endpoints and outputs, MediaPackage passes through the #EXT-X-MEDIA CHARACTERISTICS attribute value for all subtitle tracks.

Here is an example of the resulting signaling for captions:

#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="eng",CHARACTERISTICS="public.accessibility.describes-spoken-dialog,public.accessibility.describes-music-and-sound",URI="tearsofsteel_4ksubs.m3u8"

For audio on HLS and CMAF endpoints and outputs, the source HLS multivariant playlist must include a CHARACTERISTICS="public.accessibility.describes-video" attribute for audio tracks; MediaPackage passes through the #EXT-X-MEDIA CHARACTERISTICS attribute value for all audio tracks, as well as the #EXT-X-MEDIA TYPE, AUTOSELECT, and DEFAULT attributes.

For DASH endpoints and outputs, MediaPackage adds two elements at the AdaptationSet level when subtitle or audio tracks on the input present a CHARACTERISTICS="public.accessibility.describes-music-and-sound" attribute value or a public.accessibility.transcribes-spoken-dialog attribute value:

<Role schemeIdUri="urn:mpeg:dash:role:2011" value="subtitle" />
<Accessibility schemeIdUri="urn:mpeg:dash:role:2011" value="caption" />

Live

For live streams (MediaPackage v2), the CMAF ingest protocol is expected to be used. Accessibility attributes corresponding to a given stream are signaled in MP4 KIND boxes carrying DASH accessibility information. Each KIND box consists of a (scheme, value) pair, which looks like this:

{
   "scheme" : "urn:mpeg:dash:role:2011",
   "value" : "main"
}

As a result, you should expect the following metadata in the resulting DASH Manifest.

For captions:

<AdaptationSet id="sub1" startWithSAP="1" mimeType="application/mp4" lang="eng" codecs="stpp">
 <SupplementalProperty schemeIdUri="urn:dvb:dash:fontdownload:2014" value="1"
dvb:url="http://fonts.example.com/easilyreadablefont.woff" dvb:mimeType="application/font-woff"
dvb:fontFamily="easyread"/>
 <Role schemeIdUri="urn:mpeg:dash:role:2011" value="main"/>
 <Accessibility schemeIdUri="urn:tva:metadata:cs:AudioPurposeCS:2007" value="2"/>
 <SegmentTemplate startNumber="1" timescale="1000" duration="10000"
media="$RepresentationID$/$Number$" initialization="$RepresentationID$/IS" />
 <Representation id="subs" bandwidth="20000"/>
</AdaptationSet>

For audio:

<AdaptationSet id="1313833636" contentType="audio" mimeType="audio/mp4" segmentAlignment="true" startWithSAP="1" bitstreamSwitching="true" lang="eng">
  <Accessibility schemeIdUri="urn:tva:metadata:cs:AudioPurposeCS:2007" value="6"/>
  <Role schemeIdUri="urn:mpeg:dash:role:2011" value="main"/>
</AdaptationSet>

For HLS manifests, you should expect accessibility information to be associated with the CHARACTERISTICS attribute of the EXT-X-MEDIA tag. Here is an example of the resulting signaling for captions:

#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="eng",CHARACTERISTICS="public.accessibility.describes-spoken-dialog,public.accessibility.describes-music-and-sound,public.easy-to-read",URI="tearsofsteel_4ksubs.m3u8"

Correspondingly, for audio renditions, the only valid value is public.accessibility.describes-video.

Conclusion

Our accessibility series concludes with practical guidance for implementing accessibility features with AWS Elemental Media Services. These insights into packaging and player preferences provide you with the essential tools to create inclusive streaming experiences for all viewers.

Implementing accessibility features in AWS Elemental Media Services enables your content to reach viewers who might otherwise be excluded, while improving the overall viewing experience for everyone. By prioritizing accessibility, you’re contributing to a more inclusive digital media environment while ensuring your content serves the needs of all viewers.

Contact an AWS Representative to learn how we can help accelerate your business.


Ben Formesyn

Ben Formesyn is a Senior Specialist Solutions Architect, Media Services and Edge at AWS. Ben has 20 years of experience in broadcast and content delivery.

Roman Chekmazov

Roman Chekmazov is a Sr. Solution Architect for AWS Elemental.