Lab complete!
Now that you have completed this lab, make sure to update your Well-Architected review if you have implemented these changes in your workload.
Click here to access the Well-Architected Tool
In lab 2 you learned how to extend your AWS Cost & Usage Report data with additional data and assumptions in Amazon Athena views. The manual approach from the previous lab is tedious and becomes impractical when you need to provide larger data sets. In this lab you will learn to add more data from CSV files using an Infrastructure as Code (IaC) approach.
You will customize the library introduced in lab 1.4 for your own needs.
This lab utilises the AWS CDK. If you do not already have the AWS CDK installed in your environment, please follow the prerequisites and installation guide on the AWS CDK getting started page.
Clone the repository:
git clone git@github.com:aws-samples/aws-usage-queries.git
Install the dependencies from the cloned directory:
npm install
Start the compiler in watch mode:
npm run watch
Using run watch allows for continuous compilation as you make further changes to files later in the lab.
Deploy the stack:
cdk deploy
Alternatively, pass the parameters explicitly:
cdk deploy \
  --parameters CurBucketName=<bucket name> \
  --parameters ReportPathPrefix=<path without leading or trailing slash> \
  --parameters ReportName=<report name> \
  --databaseName=<optional databasename override, default: aws_usage_queries_database>
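The ReportPathPrefix parameter must be given without a leading or trailing slash. As a minimal sketch, the check below shows one way to tidy a value before passing it to cdk deploy. The function name normalizeReportPathPrefix and the sample path are hypothetical and are not part of the AWS CDK or the aws-usage-queries library.

```typescript
// Hypothetical helper (not part of the CDK or the library):
// strips surrounding whitespace and any leading or trailing slashes.
function normalizeReportPathPrefix(prefix: string): string {
  return prefix.trim().replace(/^\/+/, "").replace(/\/+$/, "");
}

// Hypothetical sample value; slashes inside the path are preserved.
const cleanedPrefix = normalizeReportPathPrefix("/cur/reports/");
```

With the sample value above, cleanedPrefix is "cur/reports", which is the form the parameter expects.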
Note: If you still have the serverless application repository app deployed from lab 1.4, you will need to delete the stack named serverlessrepo-aws-usage-queries from the AWS CloudFormation console before deploying via the CDK command to remove resource conflicts.
Note: If you have not previously used AWS CDK in your AWS account, you may need to bootstrap the account by running cdk bootstrap aws://ACCOUNT-NUMBER/REGION. Please review the AWS CDK getting started page for more information.
Now that you have the AWS CDK stack deployed in your account, the next step is to bring your own assumptions. You can use the example data to start, but you are encouraged to replace it with your own assumptions.
Navigate to the referenceData directory. Within this directory, you will see the instanceTypes data used in earlier labs.
Create a new directory named instanceFamilyPoints. You should end up with ../referenceData/instanceFamilyPoints.
In that directory, create a new file named data.csv. You should end up with ../referenceData/instanceFamilyPoints/data.csv.
Populate data.csv with the following example assumptions:
instance_family,points
t3,2
t2,2
t3a,1
m5,2
Repeat these steps to create ../referenceData/regionPoints/data.csv and populate it with the following example assumptions:
region,points
us-east-1,2
eu-central-1,1
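Because a malformed row only shows up later as missing or odd query results, it can help to check a data.csv file before deploying. The following is a minimal sketch of such a check, assuming a two-column file with a header row as in the examples above. The names PointsRow and parsePointsCsv are hypothetical, and the simple comma split does not handle quoted fields.

```typescript
// One parsed row: the key column (region or instance_family) and its points.
interface PointsRow {
  key: string;
  points: number;
}

// Hypothetical helper: parses "<keyColumn>,points" CSV text and
// throws on a wrong header or on a row whose points value is not numeric.
function parsePointsCsv(text: string, keyColumn: string): PointsRow[] {
  // Split into lines and drop blank ones (for example a trailing newline).
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (lines.length === 0) {
    throw new Error("CSV is empty");
  }
  const header = lines[0].split(",");
  if (header.length !== 2 || header[0] !== keyColumn || header[1] !== "points") {
    throw new Error("Expected header '" + keyColumn + ",points' but found '" + lines[0] + "'");
  }
  const rows: PointsRow[] = [];
  for (let i = 1; i < lines.length; i++) {
    const fields = lines[i].split(",");
    const points = Number(fields[1]);
    if (fields.length !== 2 || fields[0] === "" || fields[1] === "" || !isFinite(points)) {
      // The position counts non-empty lines, starting at 1 for the header.
      throw new Error("Non-empty line " + (i + 1) + " is not '<" + keyColumn + ">,<number>': " + lines[i]);
    }
    rows.push({ key: fields[0], points: points });
  }
  return rows;
}

// The region example data from this lab.
const regionCsv = "region,points\nus-east-1,2\neu-central-1,1\n";
const regionRows = parsePointsCsv(regionCsv, "region");
```

For the region example data, regionRows holds two entries: us-east-1 with 2 points and eu-central-1 with 1 point.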
Next we will modify the ../lib/aws-usage-queries.ts file to include our new assumption data. The file already includes some constructs we can use to ingest our assumption data defined above.
Add the following code alongside the existing referenceInstanceTypes definition. A complete file with the new code added can be found here.
// Table over the CSV data stored under the regionPoints prefix
const regionPoints = new Table(this, "regionPoints", {
  columns: [
    { name: "region", type: Schema.STRING },
    { name: "points", type: Schema.DOUBLE }
  ],
  database: database,
  tableName: "region_points",
  s3Prefix: "regionPoints", // should match the local directory name
  bucket: referenceDataBucket,
  dataFormat: {
    outputFormat: OutputFormat.HIVE_IGNORE_KEY_TEXT,
    inputFormat: InputFormat.TEXT,
    serializationLibrary: SerializationLibrary.OPEN_CSV,
  }
});
// Apply the CSV properties already defined in this file to the new table
setSerdeInfo(regionPoints, csvProperties);
// Table over the CSV data stored under the instanceFamilyPoints prefix
const instanceFamilyPoints = new Table(this, "instanceFamilyPoints", {
  columns: [
    { name: "instance_family", type: Schema.STRING },
    { name: "points", type: Schema.DOUBLE }
  ],
  database: database,
  tableName: "instance_family_points",
  s3Prefix: "instanceFamilyPoints", // should match the local directory name
  bucket: referenceDataBucket,
  dataFormat: {
    outputFormat: OutputFormat.HIVE_IGNORE_KEY_TEXT,
    inputFormat: InputFormat.TEXT,
    serializationLibrary: SerializationLibrary.OPEN_CSV,
  }
});
// Apply the CSV properties already defined in this file to the new table
setSerdeInfo(instanceFamilyPoints, csvProperties);
Let’s explore these objects further by looking at the regionPoints object. It defines two columns, region and points, with the string and double data types respectively. The table is named region_points, and s3Prefix gives the prefix of the S3 location where the data is stored. This prefix should match the name of the local directory that contains the region points data.csv file. The dataFormat section tells Amazon Athena that the data is stored as text and read with the OpenCSV serialization library.
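The two table definitions differ only in their names and key column, and each one has to keep the S3 prefix, the local directory name and the table name in step. As a rough sketch of that naming relationship, the plain TypeScript below derives the values from the directory name. PointsTableSpec, camelToSnake and pointsTableSpec are hypothetical names used for illustration only. They are not part of the AWS CDK or the aws-usage-queries library, and the result is a plain object, not a CDK construct.

```typescript
// Plain description of a points table; mirrors the fields discussed above.
interface PointsTableSpec {
  id: string;
  tableName: string;
  s3Prefix: string;
  columns: { name: string; type: string }[];
}

// Converts lowerCamelCase to snake_case, e.g. regionPoints -> region_points.
// Assumes the input starts with a lowercase letter.
function camelToSnake(name: string): string {
  return name.replace(/([A-Z])/g, function (_match: string, letter: string): string {
    return "_" + letter.toLowerCase();
  });
}

// Hypothetical helper: derives all names from the local directory name so
// that the S3 prefix and the directory cannot drift apart.
function pointsTableSpec(directoryName: string, keyColumn: string): PointsTableSpec {
  return {
    id: directoryName,
    tableName: camelToSnake(directoryName),
    s3Prefix: directoryName,
    columns: [
      { name: keyColumn, type: "string" },
      { name: "points", type: "double" },
    ],
  };
}

const regionSpec = pointsTableSpec("regionPoints", "region");
const familySpec = pointsTableSpec("instanceFamilyPoints", "instance_family");
```

Here regionSpec.tableName is "region_points" and familySpec.tableName is "instance_family_points", matching the table names used in the code above.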
Save aws-usage-queries.ts. Then, in your npm run watch terminal, you should see the file change detected and an incremental compilation.
Run cdk deploy in the terminal. This will take a few moments whilst the data is copied to the S3 bucket and the new tables are created in Athena.
Once the deployment completes, check that the new tables contain the rows from your data files, for example the ../instanceFamilyPoints/data.csv data file.
Now that the new tables have been created, they can be used in the same query used in lab 2:
SELECT instance_family,
       region,
       account_id,
       purchase_option,
       SUM(vcpu_hours) AS vcpu_hours,
       year,
       month,
       SUM(f.points * r.points * vcpu_hours) AS points
FROM monthly_vcpu_hours_by_account
JOIN region_points r
  USING (region)
JOIN instance_family_points f
  USING (instance_family)
GROUP BY instance_family, region, account_id, purchase_option, year, month
ORDER BY 8 DESC -- column 8 is points
If the query returns no results, try troubleshooting your data. Do you have any instances of the defined types in the regions you have given points?
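The query uses inner joins, so a usage row only appears in the result when both its instance family and its region have points assigned. The sketch below reproduces that behaviour and the points arithmetic in memory, without the GROUP BY aggregation, to show why unmatched rows vanish. The usage rows are hypothetical sample values, and UsageRow, PointsTable and scoreUsage are illustrative names, not part of the library.

```typescript
interface UsageRow {
  instance_family: string;
  region: string;
  vcpu_hours: number;
}

interface ScoredRow extends UsageRow {
  points: number;
}

type PointsTable = { [key: string]: number };

function hasKey(table: PointsTable, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(table, key);
}

// Mimics the two inner joins: rows without a match in either table are
// dropped from the scored result and collected for troubleshooting.
function scoreUsage(
  usage: UsageRow[],
  familyPoints: PointsTable,
  regionPoints: PointsTable
): { scored: ScoredRow[]; unmatched: UsageRow[] } {
  const scored: ScoredRow[] = [];
  const unmatched: UsageRow[] = [];
  for (let i = 0; i < usage.length; i++) {
    const row = usage[i];
    if (hasKey(familyPoints, row.instance_family) && hasKey(regionPoints, row.region)) {
      scored.push({
        instance_family: row.instance_family,
        region: row.region,
        vcpu_hours: row.vcpu_hours,
        // Same arithmetic as the query: f.points * r.points * vcpu_hours
        points: familyPoints[row.instance_family] * regionPoints[row.region] * row.vcpu_hours,
      });
    } else {
      unmatched.push(row);
    }
  }
  return { scored: scored, unmatched: unmatched };
}

// Example assumptions from this lab.
const familyPointsExample: PointsTable = { t3: 2, t2: 2, t3a: 1, m5: 2 };
const regionPointsExample: PointsTable = { "us-east-1": 2, "eu-central-1": 1 };

// Hypothetical usage rows, for illustration only.
const usageExample: UsageRow[] = [
  { instance_family: "t3", region: "us-east-1", vcpu_hours: 100 },
  { instance_family: "m5", region: "eu-central-1", vcpu_hours: 50 },
  { instance_family: "c5", region: "us-east-1", vcpu_hours: 10 },
  { instance_family: "t2", region: "ap-southeast-2", vcpu_hours: 20 },
];

const scoreResult = scoreUsage(usageExample, familyPointsExample, regionPointsExample);
```

In this sample, the t3 row scores 2 * 2 * 100 = 400 points and the m5 row scores 2 * 1 * 50 = 100 points. The c5 row has no instance family points and the t2 row has no region points, so both end up in unmatched. An empty query result means every usage row fell into that second group.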
Congratulations! You have now brought your own assumptions, defined them in infrastructure code, and deployed them with AWS CDK. Defining assumptions as IaC brings the benefit of change tracking and makes the reporting more maintainable. Make sure to follow the clean up instructions to remove resources you no longer need after running the lab.