Lab complete!
Now that you have completed this lab, make sure to update your Well-Architected review if you have implemented these changes in your workload.
Click here to access the Well-Architected Tool
Now, let’s validate consumer cluster access to the producer cluster datashare and objects. And then run queries from the consumer cluster against objects stored in producer cluster.
Connect to the consumer cluster in us-west-1
to perform below steps:
SELECT * FROM svv_datashares;
You can see MarketingShare is an INBOUND type data share type.
SELECT * FROM svv_datashare_objects;
You can see a list of objects and types (schema, table etc.) shared, and all of them are as INBOUND share type.
To consume the shared data, each consumer cluster administrator creates an Amazon Redshift database from the datashare:
us-east-1
and note down cluster namespace from the Amazon Redshift cluster details page:CREATE DATABASE consumer_marketing FROM DATASHARE MarketingShare of NAMESPACE 'replace_with_your_producer_cluster_namespace';
Users and groups can list the shared objects as part of the standard metadata queries by viewing the following metadata system views and can start querying data immediately:
SELECT * FROM SVV_REDSHIFT_DATABASES;
You can see the database type is “shared” for the “consumer_marketing” database.
SELECT * FROM SVV_REDSHIFT_SCHEMAS WHERE database_name = 'consumer_marketing';
You can see the schema type is “shared” for the schemas shared via the “consumer_marketing” database.
SELECT * FROM SVV_REDSHIFT_TABLES WHERE database_name = 'consumer_marketing';
You can now run queries on the data in producer cluster, without having data stored locally on the consumer cluster. You can also run queries by joining tables from your local database, and tables shared from the producer cluster.
SELECT COUNT(*) FROM consumer_marketing.public.lab_event;
The above query fetches the total events from the lab_event table stored in the producer cluster in us-east-1
via the datashare created earlier. The table is not stored locally on the consumer cluster in us-west-1
, so it reduced the data storage by half. You can also join consumer cluster locally stored tables with producer cluster shared tables in SQL queries.
This is in line with our Sustainability improvement goal for optimizing data patterns by removing unneeded or redundant data, and minimizing data movement across networks.
One key consideration is to note here is that, during the query execution, it did transfer the query processed result dataset over network, but limited to the result set (whereas the previous deployment transfers 10% of the total dataset every night part of the refresh cycle). Depending on the use case, trade-off analysis should be performed comparing daily refresh data transfer versus all queries execution data transfer over network.
We have now validated that the consumer cluster can access the data shared by the producer cluster, and ran queries against the producer database. Next, we will revisit metrics and KPIs to measure the sustainability optimization achieved by implementing the Amazon Redshift Data Sharing feature.
Now that you have completed this lab, make sure to update your Well-Architected review if you have implemented these changes in your workload.
Click here to access the Well-Architected Tool