AWS Big Data Blog
Use the default IAM role in Amazon Redshift to simplify accessing other AWS services
Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL. Amazon Redshift offers up to three times better price performance than any other cloud data warehouse, and can expand to petabyte scale. Today, tens of thousands of AWS customers use Amazon Redshift to run mission-critical business intelligence dashboards, analyze real-time streaming data, and run predictive analytics jobs.
Many features in Amazon Redshift access other services, for example, when loading data from Amazon Simple Storage Service (Amazon S3). This requires you to create an AWS Identity and Access Management (IAM) role and grant that role to the Amazon Redshift cluster. Historically, this has required some degree of expertise to set up access configuration with other AWS services. For details about IAM roles and how to use them, see Create an IAM role for Amazon Redshift.
This post discusses the introduction of the default IAM role, which simplifies the use of other services such as Amazon S3, Amazon SageMaker, AWS Lambda, Amazon Aurora, and AWS Glue by allowing you to create an IAM role from the Amazon Redshift console and assign it as the default IAM role to a new or existing Amazon Redshift cluster. The default IAM role simplifies SQL operations that access other AWS services (such as COPY, UNLOAD, CREATE EXTERNAL FUNCTION, CREATE EXTERNAL SCHEMA, CREATE MODEL, or CREATE LIBRARY) by eliminating the need to specify the Amazon Resource Name (ARN) for the IAM role.
Overview of solution
The Amazon Redshift SQL commands for COPY, UNLOAD, CREATE EXTERNAL FUNCTION, CREATE EXTERNAL TABLE, CREATE EXTERNAL SCHEMA, CREATE MODEL, or CREATE LIBRARY historically require the role ARN to be passed as an argument. Usually, these roles and accesses are set up by admin users. Most data analysts and data engineers using these commands aren’t authorized to view cluster authentication details. To eliminate the need to specify the ARN for the IAM role, Amazon Redshift now provides a new managed IAM policy, AmazonRedshiftAllCommandsFullAccess, which has the required privileges to use other related services such as Amazon S3, SageMaker, Lambda, Aurora, and AWS Glue. This policy is used for creating the default IAM role via the Amazon Redshift console. End users can use the default IAM role by specifying IAM_ROLE with the DEFAULT keyword. When you use the Amazon Redshift console to create IAM roles, Amazon Redshift keeps track of all IAM roles created and preselects the most recent default role for all new cluster creations and restores from snapshots.
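As a rough illustration of the difference, the following sketch registers a Lambda-backed function twice: once with an explicit role ARN and once with the DEFAULT keyword. The account ID, role name, SQL function names, and Lambda function name are all placeholders rather than values from this post, and the sketch assumes the Lambda function already exists and is reachable by the role.

```sql
-- Before: the caller must know and pass the role ARN (placeholder ARN).
CREATE EXTERNAL FUNCTION exfunc_upper_arn(varchar)
RETURNS varchar
STABLE
LAMBDA 'redshift-example-upper'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftRole';

-- After: the cluster's default IAM role is used, so no ARN is needed.
CREATE EXTERNAL FUNCTION exfunc_upper_default(varchar)
RETURNS varchar
STABLE
LAMBDA 'redshift-example-upper'
IAM_ROLE default;
```

The second form is the one an analyst without access to cluster authentication details could write.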
The Amazon Redshift default IAM role simplifies authentication and authorization with the following benefits:
- It allows users to run SQL commands without providing the IAM role’s ARN
- It avoids the need to use multiple AWS Management Console pages to create the Amazon Redshift cluster and IAM role
- You don’t need to reconfigure default IAM roles every time Amazon Redshift introduces a new feature that requires additional permissions, because Amazon Redshift can modify or extend the AWS managed policy attached to the default IAM role as required
To demonstrate this, first we create an IAM role through the Amazon Redshift console that has a policy with permissions to run SQL commands such as COPY, UNLOAD, CREATE EXTERNAL FUNCTION, CREATE EXTERNAL TABLE, CREATE EXTERNAL SCHEMA, CREATE MODEL, or CREATE LIBRARY. We also demonstrate how to make an existing IAM role the default role, and remove a role as default. Then we show you how to use the default role with various SQL commands, and how to restrict access to the role.
Create a new cluster and set up the IAM default role
The default IAM role is supported in both Amazon Redshift clusters and Amazon Redshift Serverless (preview). To create a new cluster and configure our IAM role as the default role, complete the following steps:
- On the Amazon Redshift console, choose Clusters in the navigation pane.
This page lists the clusters in your account in the current Region. A subset of properties of each cluster is also displayed.
- Choose Create cluster.
- Follow the instructions to enter the properties for cluster configuration.
- If you know the required size of your cluster (that is, the node type and number of nodes), choose I’ll choose.
- Choose the node type and number of nodes.
If you don’t know how large to size your cluster, choose Help me choose. Doing this starts a sizing calculator that asks you questions about the size and query characteristics of the data that you plan to store in your data warehouse.
- Follow the instructions to enter properties for database configurations.
- Under Associated IAM roles, on the Manage IAM roles menu, choose Create IAM role.
- To specify an S3 bucket for the IAM role to access, choose one of the following methods:
- Choose No additional S3 bucket to create the IAM role without specifying specific S3 buckets.
- Choose Any S3 bucket to allow users that have access to your Amazon Redshift cluster to also access any S3 bucket and its contents in your AWS account.
- Choose Specific S3 buckets to specify one or more S3 buckets that the IAM role being created has permission to access. Then choose one or more S3 buckets from the table.
- Choose Create IAM role as default.
Amazon Redshift automatically creates and sets the IAM role as the default for your cluster.
- Choose Create cluster to create the cluster.
The cluster might take several minutes to be ready to use. You can verify the new default IAM role under Cluster permissions.
You can only have one IAM role set as the default for the cluster. If you attempt to create another IAM role as the default for the cluster when an existing IAM role is currently assigned as the default, the new IAM role replaces the other IAM role as default.
Make an existing IAM role the default for your new or existing cluster
You can also attach an existing role to the cluster and make it the default IAM role for more granular control of permissions with customized managed policies.
- On the Amazon Redshift console, choose Clusters in the navigation pane.
- Choose the cluster you want to associate IAM roles with.
- Under Associated IAM roles, on the Manage IAM roles menu, choose Associated IAM roles.
- Select an IAM role that you want to make the default for the cluster.
- Choose Associate IAM roles.
- Under Associated IAM roles, on the Set default menu, choose Make default.
- When prompted, choose Set default to confirm making the specified IAM role the default.
- Choose Confirm.
Your IAM role is now listed as default.
Make an IAM role no longer default for your cluster
You can make an IAM role no longer the default role by changing the cluster permissions.
- On the Amazon Redshift console, choose Clusters in the navigation pane.
- Choose the cluster that you want to associate IAM roles with.
- Under Associated IAM roles, select the default IAM role.
- On the Set default menu, choose Clear default.
- When prompted, choose Clear default to confirm.
Use the default IAM role to run SQL commands
Now we demonstrate how to use the default IAM role in SQL commands like COPY, UNLOAD, CREATE EXTERNAL FUNCTION, CREATE EXTERNAL TABLE, CREATE EXTERNAL SCHEMA, and CREATE MODEL using Amazon Redshift ML.
To run SQL commands, we use Amazon Redshift Query Editor V2, a web-based tool that you can use to explore, analyze, share, and collaborate on data stored on Amazon Redshift. It supports data warehouses on Amazon Redshift and data lakes through Amazon Redshift Spectrum. However, you can use the default IAM role with any tool of your choice.
For additional information, see Introducing Amazon Redshift Query Editor V2, a Free Web-based Query Authoring Tool for Data Analysts.
First verify the cluster is using the default IAM role, as shown in the following screenshot.
Load data from Amazon S3
The SQL in the following screenshot describes how to load data from Amazon S3 using the default IAM role.
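The screenshot is not reproduced in this text, so the following is a minimal sketch of what such a load might look like, not the exact statement from the post. The table name, bucket, prefix, file format, and Region are assumptions chosen for illustration; the only part taken from the post is the IAM_ROLE default clause.

```sql
-- Load CSV files from a hypothetical S3 prefix into a hypothetical table.
-- IAM_ROLE default tells Amazon Redshift to use the cluster's default IAM role.
COPY sales_staging
FROM 's3://example-bucket/sales/'
IAM_ROLE default
FORMAT AS CSV
IGNOREHEADER 1
REGION 'us-east-1';
```

The load succeeds only if the default role has read access to the bucket, for example through the S3 bucket option selected when the role was created.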
Unload data to Amazon S3
With an Amazon Redshift lake house architecture, you can query data in your data lake and write data back to your data lake in open formats using the UNLOAD command. After the data files are in Amazon S3, you can share the data with other services for further processing.
The SQL in the following screenshot describes how to unload data to Amazon S3 using the default IAM role.
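Since the screenshot is not available here, the following sketch shows one plausible shape for the statement. The query, table name, and S3 prefix are placeholders, and Parquet is picked only as an example of an open format.

```sql
-- Write query results to a hypothetical S3 prefix as Parquet files,
-- authorizing the write with the cluster's default IAM role.
UNLOAD ('SELECT * FROM sales_staging')
TO 's3://example-bucket/unload/sales_'
IAM_ROLE default
FORMAT AS PARQUET;
```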
Create an ML model
Redshift ML enables SQL users to create, train, and deploy machine learning (ML) models using familiar SQL commands. The SQL in the following screenshot describes how to build an ML model using the default IAM role. We use the Iris dataset from the UCI Machine Learning Repository.
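The screenshot is not reproduced here. As a hedged sketch, a CREATE MODEL statement using the default role could look like the following. It assumes the Iris data has already been loaded into a table named iris with the columns shown; the table, column, model, function, and bucket names are all hypothetical.

```sql
-- Train a model that predicts the species column from the four measurements.
-- S3_BUCKET names a hypothetical bucket used to stage training artifacts.
CREATE MODEL iris_species_model
FROM (
    SELECT sepal_length, sepal_width, petal_length, petal_width, species
    FROM iris
)
TARGET species
FUNCTION predict_iris_species
IAM_ROLE default
SETTINGS (S3_BUCKET 'example-bucket');

-- After training completes, call the generated function like any SQL function.
SELECT predict_iris_species(sepal_length, sepal_width, petal_length, petal_width)
FROM iris
LIMIT 10;
```

Training runs asynchronously, so the prediction query only works once the model has finished training.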
Create an external schema and external table
Redshift Spectrum is a feature of Amazon Redshift that allows you to perform SQL queries on data stored in S3 buckets using external schema and external tables. This eliminates the need to move data from a storage service to a database, and instead directly queries data inside an S3 bucket. Redshift Spectrum also expands the scope of a given query because it extends beyond a user’s existing Amazon Redshift data warehouse nodes and into large volumes of unstructured S3 data lakes.
The following SQL describes how to use the default IAM role in the CREATE EXTERNAL SCHEMA command. For more information, see Querying external data using Amazon Redshift Spectrum.
The default IAM role requires redshift as part of the catalog database name or resources tagged with the Amazon Redshift service tag due to security considerations. You can customize the policy attached to the default role as per your security requirements. In the following example, we use the AWS Glue Data Catalog name redshift_data.
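The SQL for this step did not survive in this text, so what follows is a sketch under stated assumptions. The catalog database name redshift_data comes from the post; the schema name, table definition, and S3 location are hypothetical.

```sql
-- Map an external schema to the AWS Glue Data Catalog database redshift_data,
-- creating that database if it does not exist yet.
CREATE EXTERNAL SCHEMA spectrum_demo
FROM DATA CATALOG
DATABASE 'redshift_data'
IAM_ROLE default
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- Define an external table over comma-delimited files in a hypothetical S3 prefix.
-- The table uses the role attached to its schema, so no IAM_ROLE clause appears here.
CREATE EXTERNAL TABLE spectrum_demo.sales (
    sale_id   integer,
    sale_date date,
    amount    decimal(10,2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://example-bucket/sales/';
```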
Restrict access to the default IAM role
To control access privileges of the IAM role that is created and set as the default for your Amazon Redshift cluster, use the ASSUMEROLE privilege. This access control applies to database users and groups when they run commands such as COPY and UNLOAD. After you grant the ASSUMEROLE privilege to a user or group for the IAM role, the user or group can assume that role when running these commands. With the ASSUMEROLE privilege, you can grant access to the appropriate commands as required.
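A minimal sketch of this pattern follows, to be run by a superuser. The user name analyst_user and the group name data_engineers are hypothetical, and the choice of commands granted to each is only an example.

```sql
-- Remove the broad ASSUMEROLE privilege from PUBLIC so that access
-- must be granted explicitly.
REVOKE ASSUMEROLE ON ALL FROM PUBLIC FOR ALL;

-- Allow one user to use the default IAM role for COPY and UNLOAD only.
GRANT ASSUMEROLE ON default TO analyst_user FOR COPY, UNLOAD;

-- Allow a group to use the default IAM role for every supported command.
GRANT ASSUMEROLE ON default TO GROUP data_engineers FOR ALL;
```

With these grants in place, a COPY run by analyst_user with IAM_ROLE default would be authorized, while a CREATE MODEL by the same user with that role would not.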
Best practices
Amazon Redshift uses the AWS security frameworks to implement industry-leading security in the areas of authentication, access control, auditing, logging, compliance, data protection, and network security. For more information, refer to Security in Amazon Redshift and Security best practices in IAM.
Conclusion
This post showed you how the default IAM role simplifies SQL operations that access other AWS services by eliminating the need to specify the ARN for the IAM role. This new functionality helps make Amazon Redshift easier than ever to use, and reduces reliance on an administrator to wrangle these permissions.
As an administrator, you can start using the default IAM role to grant IAM permissions to your Amazon Redshift cluster and allow your end users, such as data analysts and developers, to use the default IAM role with their SQL commands without having to provide the ARN for the IAM role.
About the Authors
Nita Shah is an Analytics Specialist Solutions Architect at AWS based out of New York. She has been building data warehouse solutions for over 20 years and specializes in Amazon Redshift. She is focused on helping customers design and build enterprise-scale well-architected analytics and decision support platforms.
Evgenii Rublev is a Software Development Engineer on the AWS Redshift team. He has worked on building end-to-end applications for over 10 years. He is passionate about innovations in building high-availability and high-performance applications to drive a better customer experience. Outside of work, Evgenii enjoys spending time with his family, traveling, and reading books.
Debu Panda, a Principal Product Manager at AWS, is an industry leader in analytics, application platform, and database technologies, and has more than 25 years of experience in the IT world. Debu has published numerous articles on analytics, enterprise Java, and databases and has presented at multiple conferences such as re:Invent, Oracle Open World, and Java One. He is the lead author of EJB 3 in Action (Manning Publications 2007, 2014) and Middleware Management (Packt).