We will use AWS Glue and setup a scheduled Crawler, which will run each day. This crawler will scan the CUR files and create a database and tables for the delivered files. If there are new versions of a CUR, or new months delivered - they will be automatically included.
We will use Athena to access and view our CUR files via SQL. Athena is a serverless solution to be able to execute SQL queries across very large amounts of data. Athena is only charged for data that is scanned, and there are no ongoing costs if data is not being queried, unlike a traditional database solution.
Go to the Glue console:
Click on Get started if you have not used Glue before
Ensure you are in the region where your CUR files are delivered, click on Crawlers and click Add crawler:
Enter a Crawler name starting with Cost, and click Next:
Select Data stores, and click Next:
Ensure you select Specified path in another account, and enter the S3 path of your bucket s3://(CUR bucket), expand Exclude patterns, enter the following patterns one line at a time and click next:
**.json, **.yml, **.sql, **.csv, **.gz, **.zip
Add another data store, click Next:
Select Create an IAM role, enter a role name of Cost_MasterCrawler, and click Next:
Click the Down arrow, and select a Daily Frequency:
Enter in a Start Hour and Start Minute, then click Next:
Click Add database:
Enter a Database name of costmaster, and click Create:
Review the crawler and click Finish:
Select the checkbox next to the crawler, click Run crawler:
You will see the Crawler was successful and created a table:
Select the costmaster database that Glue created:
Click Tables in costmaster:
Click the table name:
Verify the recordCount is not zero, if it is - go back and verify the steps above:
Go to the Athena Console:
Select the drop down arrow, and click on the new database:
A new table will have been created (named after the CUR), we will now load the partitions. Click on the 3 dot menu and select Load partitions:
You will see it execute the command MSCK REPAIR TABLE, and in the results it may add partitions to the metastore for each month that has a billing file:
NOTE: It may or may not add partitions and show the messages above.
If you are using the supplied files for this lab, check:
We will now preview the data. Click on the 3 dot menu and select Preview table:
It will execute a Select * from query, and in the results you will see the first 10 lines of your CUR file:
You have successfully setup your CUR file to be analyzed. You can now query your usage and costs via SQL.