AWS Public Sector Blog
Breaking barriers: How AWS is revolutionizing the accessibility of federal agency communications for people with visual disabilities
Over 7.2 million Americans with visual disabilities face barriers accessing critical government information. Equal access to public communications is both a legal requirement and an essential service. AWS enables federal agencies to provide accessible communications through automated document-to-speech conversion. This solution combines Amazon Simple Storage Service (Amazon S3), Amazon Textract, and Amazon Polly to transform written government documents into high-quality audio content.
When agencies upload documents to Amazon S3, Amazon Textract extracts the text while Amazon Polly converts it to natural-sounding speech. This automated process maintains document privacy, while giving visually impaired citizens independent access to important information. The solution we will explore in this post addresses three critical needs: compliance with accessibility regulations, improved service delivery to visually impaired citizens, and efficient use of agency resources.
Federal agencies can implement this solution to meet legal accessibility requirements, serve citizens more effectively, reduce manual processing costs, and scale accessibility services. This document outlines the technical architecture, implementation approach, and expected outcomes for deploying automated document-to-speech conversion using AWS services.
AWS services for accessibility implementation
The solution we will discuss in this post uses AWS services to automate document processing and delivery. Amazon S3 stores and manages source documents, while AWS Lambda functions process documents and coordinate service interactions across the workflow. Amazon Textract extracts text from documents, and Amazon Polly converts this text to natural-sounding speech. AWS Step Functions manages the workflow orchestration, with Amazon Simple Queue Service (Amazon SQS) handling message queuing to ensure reliable processing. Amazon DynamoDB tracks document status and metadata throughout the process. Finally, Amazon Connect delivers the audio content to citizens.
This architecture makes government communications accessible through automated text-to-speech conversion, improving service delivery for all citizens.
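To make the Lambda role described above more concrete, here is a minimal Python sketch of how a function might turn an S3 event notification into tracking records. The record layout (`document_id`, `bucket`, `key`, `status`) and the helper name `build_tracking_records` are assumptions for illustration, not the solution's actual schema; the `RECEIVED` status is the one named in the workflow steps of this post. The sketch deliberately stops short of calling DynamoDB or SQS.

```python
import uuid
from urllib.parse import unquote_plus


def build_tracking_records(event):
    """Turn an S3 event notification into one tracking record per uploaded object."""
    records = []
    for entry in event.get("Records", []):
        bucket = entry["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(entry["s3"]["object"]["key"])
        records.append({
            "document_id": str(uuid.uuid4()),  # hypothetical attribute name
            "bucket": bucket,
            "key": key,
            "status": "RECEIVED",
        })
    return records


def lambda_handler(event, context):
    records = build_tracking_records(event)
    # A real function would write each record to DynamoDB and
    # send the document details to the SQS queue at this point.
    return {"received": len(records)}
```

Keeping the event parsing in a plain function makes it possible to unit test the logic with a hand-built event dictionary before any AWS resources exist.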
Architecture
The following figure illustrates a serverless workflow through which text documents are processed by multiple AWS services. Documents stored in Amazon S3 trigger a processing pipeline that uses AWS Step Functions to coordinate Amazon Textract for text extraction and Amazon Polly for text-to-speech conversion. Amazon Connect provides the interface for citizens to access the audio output, while Amazon DynamoDB tracks the processing status.
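The status tracking mentioned above can be pictured as a small state machine. The sketch below uses the four status names that appear in the workflow steps of this post; the transition table and the `advance_status` helper are our own assumptions about how those statuses might be ordered, not a description of the deployed code.

```python
# Status names come from the workflow in this post; the allowed
# transitions between them are an assumption for illustration.
ALLOWED_TRANSITIONS = {
    "RECEIVED": {"PROCESSED"},
    "PROCESSED": {"AUDIO_READY"},
    "AUDIO_READY": {"DELIVERED"},
    "DELIVERED": set(),
}


def advance_status(current, new):
    """Return the new status, or raise if the transition skips a stage."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new
```

In DynamoDB, a check like this is commonly enforced with a conditional update, so that a late or duplicated message cannot move a document backward or skip a stage.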
Architecture workflow
Our accessibility solution processes documents through the following workflow:
- Federal agencies upload PDF notices to Amazon S3, initiating immediate processing. The S3 bucket uses versioning and server-side encryption, with AWS Identity and Access Management (IAM) policies restricting bucket access.
- An AWS Lambda function, triggered by S3 event notifications, processes document metadata using standard PDF processing libraries. It creates an Amazon DynamoDB entry with a “RECEIVED” status and unique ID, then routes document details to an Amazon SQS queue.
- AWS Step Functions processes queued documents in batches. A batch starts when 200 documents have accumulated or five minutes have passed, whichever comes first, and a Map state then handles up to 200 documents concurrently.
- AWS Step Functions invokes Amazon Textract’s asynchronous API to extract text, forms, and tables from PDFs, capturing spatial information and confidence scores. Amazon Textract sends completion notifications through Amazon Simple Notification Service (Amazon SNS).
- An AWS Lambda function processes extracted text using natural language processing techniques: it detects sentence boundaries, identifies named entities, and normalizes the text. It applies government-specific terminology rules, stores the processed text in Amazon S3, and updates the Amazon DynamoDB status to “PROCESSED.”
- AWS Step Functions sends the processed text with SSML (Speech Synthesis Markup Language) tags to Amazon Polly. SSML enhances the audio output quality by controlling aspects like pronunciation, volume, pitch, and pacing, creating more natural-sounding speech. For example, SSML can properly handle abbreviations, numbers, and specialized government terms. The system generates 24 kHz MP3 audio using neural text-to-speech (TTS) voices matched to document language and content.
- The system stores audio files in a read-optimized Amazon S3 bucket using document IDs, updates the Amazon DynamoDB status to “AUDIO_READY,” and signals readiness through Amazon SQS.
- Amazon Connect retrieves recipient lists from Amazon DynamoDB and creates call flows handling busy signals, voicemail, and failures while respecting time zones.
- Amazon Connect makes outbound calls using official government agency caller IDs, streaming audio files from Amazon S3 to recipients. Recipients control playback using phone keypad commands.
- Amazon Connect records outcomes in Amazon DynamoDB, marking successful deliveries as “DELIVERED.” For failures, AWS Lambda analyzes causes and schedules retries using exponential backoff.
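The SSML step in the workflow above is easier to picture with an example. The following sketch, written under our own assumptions, wraps plain-text paragraphs in SSML and expands known abbreviations with the `<sub alias="...">` tag. The `ABBREVIATIONS` table, the function names, and the 500 ms pause are hypothetical; an agency would supply its own terminology rules.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical expansion table; a real deployment would maintain its own.
ABBREVIATIONS = {
    "SSA": "Social Security Administration",
    "IRS": "Internal Revenue Service",
}


def _mark_abbreviations(text, abbreviations):
    """Escape text for XML and wrap each known abbreviation in a <sub> tag."""
    if not abbreviations:
        return escape(text)
    # Try longer abbreviations first so a short one never shadows a long one.
    names = sorted(abbreviations, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    out, pos = [], 0
    for match in pattern.finditer(text):
        out.append(escape(text[pos:match.start()]))
        alias = quoteattr(abbreviations[match.group(1)])
        out.append(f"<sub alias={alias}>{escape(match.group(1))}</sub>")
        pos = match.end()
    out.append(escape(text[pos:]))
    return "".join(out)


def to_ssml(paragraphs, abbreviations=None, pause_ms=500):
    """Build one SSML document with a pause between paragraphs."""
    if abbreviations is None:
        abbreviations = ABBREVIATIONS
    pause = f'<break time="{pause_ms}ms"/>'
    body = pause.join(
        f"<p>{_mark_abbreviations(p, abbreviations)}</p>" for p in paragraphs
    )
    return f"<speak>{body}</speak>"
```

The resulting string could then be passed to Amazon Polly, for example through the boto3 `synthesize_speech` call with `TextType="ssml"`, `Engine="neural"`, `OutputFormat="mp3"`, and `SampleRate="24000"`. Longer documents would likely need Polly's asynchronous synthesis tasks instead, because the synchronous call limits input size.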
Conclusion
The solution we discussed in this post uses AWS services to improve government communication accessibility and meet federal accessibility requirements. The serverless architecture automates document-to-speech conversion and delivers information to visually impaired citizens. The key features of the solution include secure document processing, text-to-speech conversion, and delivery tracking. The solution delivers value while maintaining flexibility to adapt to agency needs and citizen feedback.
Looking to the future, we plan to explore enhancements that could improve the solution’s functionality. These enhancements could include:
- Implementing AWS X-Ray for distributed tracing, CloudWatch for comprehensive monitoring, and AWS CloudTrail for API auditing.
- Integrating Amazon Comprehend to enable advanced text analysis, and Amazon Translate to support multi-language capabilities.
- Adding Amazon CloudWatch dashboards, cost allocation tags, and automated testing for audio quality to further enhance operational visibility and management.