AWS Big Data Blog
Accelerate data pipeline creation with the new visual interface in HAQM OpenSearch Ingestion
HAQM OpenSearch Ingestion is a fully managed serverless pipeline that allows you to ingest, filter, transform, enrich, and route data to an HAQM OpenSearch Service domain or HAQM OpenSearch Serverless collection. OpenSearch Ingestion is capable of ingesting data from a wide variety of sources and has a rich ecosystem of built-in processors to take care of your most complex data transformation needs.
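Under the hood, a pipeline is defined as a YAML configuration with a source, an optional chain of processors, and one or more sinks. The following minimal sketch shows that basic anatomy; the endpoint, role, and index names are hypothetical, and the walkthrough later in this post shows how the visual interface generates sections like these for you.

```yaml
version: "2"
log-pipeline:
  source:
    http:
      path: "/logs/ingest"                  # pipeline receives events over HTTP
  processor:
    - grok:
        match:
          log: ['%{COMMONAPACHELOG}']       # parse Apache-style access logs into fields
  sink:
    - opensearch:
        hosts: ["https://search-my-domain.us-east-1.es.amazonaws.com"]
        index: "application-logs"
        aws:
          sts_role_arn: "arn:aws:iam::111122223333:role/pipeline-role"
          region: "us-east-1"
```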
Today, we’re launching a new visual interface for OpenSearch Ingestion that makes it simple to create and manage your data pipelines from the AWS Management Console. With this new feature, you can build pipelines in minutes without writing complex configurations manually.
The new visual interface brings three key improvements to help streamline your workflow:
- A guided visual workflow that walks you through pipeline creation
- Automatic permission setup that eliminates manual AWS Identity and Access Management (IAM) policy management
- Real-time validation checks that help catch issues early
These enhancements make it straightforward to ingest, transform, enrich, and route your data, whether you’re setting up your first pipeline or architecting sophisticated data workflows with multiple transformations and sinks.
In this post, we walk through how these new features work and how you can use them to accelerate your data ingestion projects.
Automatic discovery
Before the visual interface, creating an OpenSearch Ingestion pipeline started with selecting a blueprint that provided a template with placeholders for sources and sinks. You would then need to manually modify this template to match your specific requirements.
The new visual interface improves this process by automatically discovering your sources and sinks as you build. Instead of modifying template code, you can simply select from available resources on the dropdown menus and watch your pipeline configuration build in real time.
This automatic discovery feature eliminates the need to switch between different service consoles to find your source and sink details. Previously, you had to navigate to services like HAQM Simple Storage Service (HAQM S3) or HAQM DynamoDB to copy resource details and HAQM Resource Name (ARN) values, then switch back to enter them into your template. Automatic discovery keeps you focused on your pipeline design, streamlining the entire creation process.
Automated IAM role management
With automatic permission creation, you no longer need to manually create IAM policies for your pipelines and the components involved. With the new interface, you can create a unified IAM role automatically that grants the necessary permissions for all the components in your pipeline. This significantly reduces the complexity of security management and minimizes the risk of permission-related errors. You can also continue to use existing roles if you have already defined them.
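To illustrate what such a unified role covers, the following sketch expresses a comparable role as an AWS CloudFormation resource; the account ID, table, bucket, and domain names are hypothetical, and the actions shown are representative rather than exhaustive. The console generates an equivalent role for you, so you would only author something like this yourself if you prefer to manage the role outside the console.

```yaml
PipelineRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:                   # trust policy: OpenSearch Ingestion assumes the role
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Principal:
            Service: osis-pipelines.amazonaws.com
          Action: sts:AssumeRole
    Policies:
      - PolicyName: unified-pipeline-access
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow                     # read the DynamoDB table, its stream, and its exports
              Action:
                - dynamodb:DescribeTable
                - dynamodb:DescribeStream
                - dynamodb:GetRecords
                - dynamodb:GetShardIterator
                - dynamodb:ExportTableToPointInTime
                - dynamodb:DescribeExport
              Resource:
                - arn:aws:dynamodb:us-east-1:111122223333:table/my-orders-table
                - arn:aws:dynamodb:us-east-1:111122223333:table/my-orders-table/*
            - Effect: Allow                     # HAQM S3 access for export data and failed events
              Action:
                - s3:GetObject
                - s3:PutObject
                - s3:AbortMultipartUpload
              Resource: arn:aws:s3:::my-pipeline-bucket/*
            - Effect: Allow                     # write documents to the OpenSearch Service domain sink
              Action: es:ESHttp*
              Resource: arn:aws:es:us-east-1:111122223333:domain/my-domain/*
```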
Real-time validation
The new interface introduces real-time validation capabilities that go far beyond basic syntax checking. Whereas previous versions only validated keyword syntax, the new interface executes your processor chain in real time, catching both configuration and runtime errors as you build. As you construct your pipeline, the interface continuously validates your entire configuration, helping you identify and resolve issues like processor misconfigurations, data type mismatches, or transformation errors before deployment. This proactive, execution-based validation helps make sure your pipelines work as intended from the start, so you don’t have to wait until runtime to discover problems in your processing chain.
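For example, a processor entry like the following sketch passes keyword-level checks because its syntax is valid, but execution-based validation can surface the runtime failure that occurs when a free-text field is converted to an integer (the field name and values here are hypothetical):

```yaml
processor:
  - convert_entry_type:
      key: "status_message"       # a free-text field, for example "OK" or "Timed out"
      type: "integer"             # valid syntax, but the conversion fails when executed on text
```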
Now that we’ve covered the key features, let’s walk through the process of creating a pipeline using the new interface.
Create a pipeline in OpenSearch Ingestion
Getting started with the visual interface is straightforward — you can choose a blueprint as your pipeline foundation or start with a clean slate from a blank template. The interface then guides you through each step, using intelligent resource discovery and automatic population features to simplify the entire creation process. For this post, we use the “Zero-ETL with DynamoDB” blueprint.
The visual interface streamlines source configuration by presenting your DynamoDB tables on an easy-to-navigate dropdown menu. After you select a table, the interface handles all the technical details, including automatically retrieving and configuring the ARN. This same functionality extends to HAQM S3 export configuration, where you can choose Browse S3 to select your bucket and folders directly within the pipeline creation workflow.
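The source section that the interface assembles from these selections looks roughly like the following sketch; the table, bucket, and role names are hypothetical.

```yaml
version: "2"
dynamodb-pipeline:
  source:
    dynamodb:
      tables:
        - table_arn: "arn:aws:dynamodb:us-east-1:111122223333:table/my-orders-table"
          stream:
            start_position: "LATEST"           # stream change data as it arrives
          export:
            s3_bucket: "my-ddb-export-bucket"  # bucket selected with Browse S3
            s3_prefix: "exports/"
      aws:
        sts_role_arn: "arn:aws:iam::111122223333:role/pipeline-role"
        region: "us-east-1"
```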
After your source is configured, you can enhance your pipeline with processors to transform your data. The processor configuration panel starts with a search field where you can find and select the processor you need. You can choose Add to include processors and then arrange them in the desired order. This flexibility allows you to build complex data transformation workflows by combining different processors in the sequence you need.
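For example, a chain that renames an attribute and then stamps each event with its ingest time might appear in the generated configuration roughly as follows; the key names are hypothetical.

```yaml
processor:
  - rename_keys:
      entries:
        - from_key: "cust_id"           # attribute name coming from DynamoDB
          to_key: "customer_id"         # friendlier field name in OpenSearch
  - date:
      from_time_received: true          # use the time the event was received
      destination: "@timestamp"         # write it to a dedicated timestamp field
```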
If there are any issues, such as missing required fields, the interface displays clear error messages, allowing you to address problems before moving forward. This validation at each step makes sure your pipeline is properly configured before deployment.
The following screen capture shows an example of the visual interface.
The interface’s real-time validation extends to processor configuration as well. Each processor is validated as you build your pipeline, with clear error messages guiding you toward a proper setup, so your data transformation logic is sound before you move to the next stage of pipeline creation.
The sink configuration panel offers flexibility in choosing your OpenSearch destination. You can select between a managed cluster or serverless option, depending on your specific needs. For added convenience, we’ve integrated the ability to create a new OpenSearch domain directly from this interface, streamlining the end-to-end pipeline setup process.
The sink configuration provides options for both dynamic and custom mapping. Dynamic mapping automatically handles data type detection and mapping creation, whereas custom mapping gives you precise control over your data structure. To maintain data reliability, you can enable a dead-letter queue (DLQ)—a holding area for messages that couldn’t be processed successfully—to capture and manage any failed events.
As you make choices in the visual interface, the corresponding YAML/JSON configuration updates in real time. This immediate feedback helps you understand how your selections translate into technical configurations, from index naming to mapping options and advanced settings like flush timeout and document versioning.
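As a rough illustration, the sink section of the generated configuration might resemble the following sketch, with mapping left to OpenSearch and a DLQ capturing failed events; the endpoint, index, bucket, and role names are hypothetical.

```yaml
sink:
  - opensearch:
      hosts: ["https://search-my-domain.us-east-1.es.amazonaws.com"]
      index: "orders"                        # index name chosen in the interface
      aws:
        sts_role_arn: "arn:aws:iam::111122223333:role/pipeline-role"
        region: "us-east-1"
      dlq:
        s3:
          bucket: "my-pipeline-dlq-bucket"   # failed events land here for later review
          key_path_prefix: "dlq/orders/"
          region: "us-east-1"
          sts_role_arn: "arn:aws:iam::111122223333:role/pipeline-role"
```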
Security configuration is now seamless with automated IAM role management. The interface intelligently handles the creation and management of permissions across all pipeline components. You can either create a new service role or use an existing one, and the interface automatically generates a unified IAM role that provides the precise permissions needed across pipeline components—from your source to HAQM S3 components needed for the DLQ and OpenSearch/HAQM S3 sinks. This automation not only saves time but also reduces the risk of permission-related errors that could occur when managing access controls across multiple resources. The following screen capture shows an example.
By consolidating resource selection into a single interface, we’ve eliminated the need to navigate between multiple AWS services. This saves time and reduces the potential for errors that could occur when manually copying resource identifiers. After you create a pipeline with the visual interface, you can also edit it through the same interface to quickly adjust its configuration.
Conclusion
The new visual interface for OpenSearch Ingestion introduces guided visual workflows that simplify pipeline creation, automatic discovery of resources, automated IAM role management, real-time validation, and dynamic configuration previews. These enhancements collectively streamline the pipeline creation process, reduce the potential for errors, and provide a more intuitive experience for users of all skill levels.
Ready to get started? Visit the OpenSearch Service console today and begin building your first visual pipeline. With this new interface, you can transform your data ingestion workflows and unlock new insights from your data more quickly and efficiently than ever before.
About the authors
Sam Selvan is a Principal Specialist Solution Architect with HAQM OpenSearch Service.
Jagadish Kumar (Jag) is a Senior Specialist Solutions Architect at AWS focused on HAQM OpenSearch Service. He is deeply passionate about Data Architecture and helps customers build analytics solutions at scale on AWS.