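[AWS][Python] Using Python to Auto-Generate CloudFormation for Batch Deployment of AWS Canaries
Requirements
For business reasons, we needed to monitor a number of API endpoints, some of which sit inside AWS private networks. For the tool selection, see [AWS][Elasticsearch] A Brief Comparison of API Synthetics Monitoring Tools. We ended up choosing AWS CloudWatch Synthetics: API canaries probe the endpoints, alarms with thresholds on two metrics (success rate and duration) notify an SNS topic, and the whole setup is codified with CloudFormation.
Task Breakdown and Analysis
How a Canary Is Implemented
Inspecting a manually created canary shows that a canary is implemented on top of Lambda. Adding a canary by hand in the console generates a JS canary Lambda function. The canary script that carries the actual check configuration (also JS, either auto-generated or hand-written) is loaded onto that function as a layer, and the JS canary Lambda calls it: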
// Call customer's execution handler
let customerCanaryFilename = '/opt/nodejs/node_modules/' + fileName;
The full JS canary Lambda:
const log = require('SyntheticsLogger');
const synthetics = require('Synthetics');

exports.handler = async (event, context) => {
    const PASS_RESULT = 'PASSED';
    const FAIL_RESULT = 'FAILED';
    const NO_RESULT = 'ERROR';
    let canaryResult = NO_RESULT;
    let canaryError = null;
    let startTime = null;
    let endTime = null;
    let returnValue = null;
    let resetTime = null;
    let setupTime = null;
    let launchTime = null;
    try {
        let startCanaryLogString = "Start Canary";
        console.log(startCanaryLogString);
        resetTime = new Date();
        await log.reset();
        log.write(startCanaryLogString);
        await synthetics.reset();
        resetTime = new Date().getTime() - resetTime.getTime();
        setupTime = new Date();
        synthetics.setEventAndContext(event, context);
        await synthetics.beforeCanary();
        setupTime = new Date().getTime() - setupTime.getTime();
        launchTime = new Date();
        await synthetics.launch();
        launchTime = new Date().getTime() - launchTime.getTime();
    } catch (ex) {
        startTime = new Date();
        endTime = startTime;
        returnValue = await synthetics.afterCanary(canaryResult, canaryError, startTime, endTime, resetTime, setupTime, launchTime);
        let endCanaryLogString = "End Canary. Result: " + canaryResult;
        console.log(endCanaryLogString);
        log.write(endCanaryLogString);
        return context.fail(returnValue);
    }
    try {
        log.info('Start executing customer steps');
        startTime = new Date(); // Lambdas use UTC time
        let customerCanaryHandler = event.customerCanaryHandlerName;
        let fileName, functionName;
        if (customerCanaryHandler) {
            // Assuming handler format : fileName.functionName
            fileName = customerCanaryHandler.substring(0, customerCanaryHandler.indexOf("."));
            functionName = customerCanaryHandler.substring(customerCanaryHandler.indexOf(".") + 1);
            log.info(`Customer canary entry file name: ${JSON.stringify(fileName)}`);
            log.info(`Customer canary entry function name: ${JSON.stringify(functionName)}`);
        }
        // Call customer's execution handler
        let customerCanaryFilename = '/opt/nodejs/node_modules/' + fileName;
        log.info(`Calling customer canary: ${customerCanaryFilename}.handler()`);
        let customerCanary = require(customerCanaryFilename);
        let response = await customerCanary.handler();
        log.info(`Customer canary response: ${JSON.stringify(response)}`);
        endTime = new Date();
        log.info('Finished executing customer steps');
        canaryResult = PASS_RESULT;
    } catch (error) {
        endTime = new Date();
        canaryResult = FAIL_RESULT;
        canaryError = error;
        log.error('Canary execution exception.', canaryError);
    }
    returnValue = await synthetics.afterCanary(canaryResult, canaryError, startTime, endTime, resetTime, setupTime, launchTime);
    let endCanaryLogString = "End Canary. Result: " + canaryResult;
    console.log(endCanaryLogString);
    log.write(endCanaryLogString);
    await log.deleteLogFile();
    return context.succeed(returnValue);
};
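The canary script itself is largely reusable: replace hostname, method, path, and port with the required values, add headers if needed, and leave the rest unchanged. This is what makes automation feasible.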
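As a rough illustration of that substitution, the sketch below rebuilds the requestOptions line from Python values with json.dumps instead of hand-formatted strings, so quoting is handled by the JSON encoder. The helper name render_canary_script and the one-line BASELINE_JS stand-in are assumptions made for this example; the real baseline.js is not shown in this post.

```python
import json
import re

# Hypothetical one-line stand-in for baseline.js (the real file is longer).
BASELINE_JS = (
    'const requestOptions = {"hostname":"xxxxx.xxxxx.com","method":"GET",'
    '"path":"/v1/yyyy/zzzz/qqqq","port":443}\n'
)


def render_canary_script(baseline, hostname, method, path, port=443, headers=None):
    """Return the baseline text with its requestOptions object rebuilt."""
    options = {"hostname": hostname, "method": method, "path": path, "port": port}
    if headers:
        options["headers"] = headers
    new_line = "const requestOptions = " + json.dumps(options, separators=(",", ":"))
    # Replace from `const requestOptions = {` up to the last `}` on that line.
    # A callable replacement keeps re.subn from interpreting backslashes.
    rendered, count = re.subn(
        r"^const requestOptions = \{.*\}",
        lambda match: new_line,
        baseline,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError("expected exactly one requestOptions line, found %d" % count)
    return rendered
```

Raising when the line is not found exactly once is a deliberate choice here: a silent no-op would produce a canary that still probes the baseline host.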
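Finally, when generating canaries with CloudFormation, the script can either be embedded directly in the template (not recommended) or uploaded to an S3 bucket. Note that when building the zip, the JS files must sit under the nodejs/node_modules directory structure, or the canary will not run. The reason is the one explained above: the script is placed in a layer and consumed via let customerCanaryFilename = '/opt/nodejs/node_modules/' + fileName.
The reference documentation mentions this as well: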
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Synthetics_Canaries_WritingCanary.html
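A minimal sketch of that packaging rule, assuming the JS files may live anywhere on disk: passing an explicit arcname to ZipFile.write places each script under nodejs/node_modules/ inside the archive regardless of its local path. The function name zip_canary_scripts is made up for this example.

```python
import os
import zipfile


def zip_canary_scripts(script_paths, zip_path):
    """Write each script into zip_path under nodejs/node_modules/."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for script in script_paths:
            # The entry name inside the zip decides where the file ends up
            # in the layer, so set it explicitly instead of using the disk path.
            arcname = "nodejs/node_modules/" + os.path.basename(script)
            archive.write(script, arcname=arcname)
    return zip_path
```

Listing the archive afterwards with zipfile.ZipFile(zip_path).namelist() is a quick way to confirm the layout before uploading.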
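Python Automation Approach
- First, prepare the APIs to monitor (hostname, path, and so on) in CSV format. Data preparation is skipped here.
- Read the CSV and, from baseline.js plus the CSV values, generate one JS file per endpoint with a new name and the substituted values. This uses shutil.copy(src, dst) to copy the file and fileinput to replace text inside it.
- Zip the files with the zipfile library and upload the archive to the S3 bucket with boto3.
- Prepare canary_base.yml. The many parameters of AWS::Synthetics::Canary and AWS::CloudWatch::Alarm can be copied from other CloudFormation templates. Pay attention to resource naming, because some names act as placeholders that are later replaced by string substitution.
- Generate the head of the bot-generated CF template, up to the "Resources:" line.
- Read the CSV again and, from canary_base.yml plus the CSV values, generate the CF body in a loop, substituting placeholders such as "SyntheticsCanaryName" and "HandlerName", then append each block to the bot-generated template (remember to add a newline).
- Upload the CF template, and the job is done in one go. CloudFormation finds the matching JS in the zip on S3 and creates the canaries with its handler. The alarms locate their metrics through the Namespace and the CanaryName in Dimensions, then watch them and raise alerts.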
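The template-generation steps above can be sketched in memory as follows. The CANARY_BASE fragment is a guess at what canary_base.yml might contain (the real file is not shown in this post); its property values, the RuntimeVersion, the schedule, and the FIXED_VALUES entries are all placeholders chosen for illustration. Only the placeholder names and the CSV column index for the canary name come from the script in this post.

```python
import csv
import io

HEADER = (
    "---\nAWSTemplateFormatVersion: '2010-09-09'\n"
    "Description: 'Canary for API Endpoint Monitoring '\nResources:\n"
)

# Hypothetical stand-in for canary_base.yml, trimmed to one canary and one alarm.
CANARY_BASE = """\
  SyntheticsCanaryName:
    Type: AWS::Synthetics::Canary
    Properties:
      Name: SyntheticsCanaryName
      Code:
        S3Bucket: xxxxxx-canary-script
        S3Key: canaryscripts.zip
        Handler: HandlerName.handler
      ArtifactS3Location: S3CanaryResultArn
      ExecutionRoleArn: CanaryExecutionRoleArn
      RuntimeVersion: syn-nodejs-2.0  # assumed; check the supported runtimes
      Schedule:
        Expression: rate(5 minutes)
      StartCanaryAfterCreation: true
  CanaryNameSuccessPercentAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      Namespace: CloudWatchSynthetics
      MetricName: SuccessPercent
      Dimensions:
        - Name: CanaryName
          Value: SyntheticsCanaryName
      Statistic: Average
      Period: 300
      EvaluationPeriods: 1
      Threshold: 90
      ComparisonOperator: LessThanThreshold
      AlarmActions:
        - CanaryAlarmSNSArn
"""

# Placeholder -> value pairs that do not depend on the CSV row (made-up values).
FIXED_VALUES = {
    "CanaryExecutionRoleArn": "arn:aws:iam::000000000000:role/example-canary-role",
    "S3CanaryResultArn": "s3://xxxxxx-canary-result",
    "CanaryAlarmSNSArn": "arn:aws:sns:ap-southeast-1:000000000000:example-alarm",
}


def build_template(csv_text, base=CANARY_BASE):
    """Return a template: the header plus one substituted block per CSV row."""
    parts = [HEADER]
    for row in csv.reader(io.StringIO(csv_text)):
        name = row[2].lower()  # column 2 holds the canary name, as in canaryBot.py
        block = base.replace("SyntheticsCanaryName", name)
        block = block.replace("HandlerName", name)
        block = block.replace("CanaryNameSuccessPercentAlarm", name + "SuccessPercentAlarm")
        for placeholder, value in FIXED_VALUES.items():
            block = block.replace(placeholder, value)
        parts.append(block + "\n")
    return "".join(parts)
```

Because the substitution is plain string replacement, placeholder names must not be substrings of one another in a way that breaks the replacement order; "SyntheticsCanaryName" is replaced first here, which leaves "CanaryNameSuccessPercentAlarm" intact for its own step.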
The final result looks like this:
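Things to Watch Out For (and Some Pitfalls)
- This CF template covers only the canaries and alarms. The S3 bucket that stores the zipped JS canary scripts, the S3 bucket that stores run results, the ExecutionRole that runs the canaries (with its policies, such as permission to write CloudWatch Logs), and the alarm SNS topic were all created manually beforehand. You could create them in this template too, or add them to the template via Python, but the marginal benefit is small and it is easy to get wrong.
- The VPC for the canaries is not defined yet.
- This project also uses direnv, which already exports AWS_PROFILE, so no AWS access key ID is passed when creating the boto3 session.
- [Drawback] With automated deployment, after you update a canary script the canary still runs the old script. CloudFormation does not detect the new canary script or refresh the resource.
- [Drawback] CloudFormation currently cannot detect drift on canaries. I tried deleting individual canaries, but the CF stack still showed In-Sync.
- Do not forget the leading "/" in the canary path, for example "/v1/blahblah"; otherwise the check fails.
- Canary names are limited to 21 characters; names that were too long caused failures earlier.
- Canaries have a quota: "100 per Region per account in the following Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo). 20 per Region per account in all other Regions." You can ask AWS for an increase.
- [Limitation] Because the script is packaged in a zip, it cannot be viewed directly on the canary's console page; for manually created canaries it can.
- [Limitation] A canary created by CF does not show its associated alarm resources; a manually created one does.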
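Two of the pitfalls above (the 21-character name limit and the leading "/" in the path) are easy to catch before generating anything. The following is a small pre-flight check under those two rules only; the function name check_endpoint is hypothetical, and the limit is the one reported in the list above rather than a value read from AWS.

```python
MAX_CANARY_NAME_LENGTH = 21  # limit reported in the list above


def check_endpoint(name, path):
    """Return a list of problems for one canary definition (empty if none)."""
    problems = []
    if len(name) > MAX_CANARY_NAME_LENGTH:
        problems.append(
            "name %r has %d characters, limit is %d"
            % (name, len(name), MAX_CANARY_NAME_LENGTH)
        )
    if not path.startswith("/"):
        problems.append("path %r must start with '/'" % path)
    return problems
```

Running this over every CSV row and aborting when any row returns a non-empty list avoids a failed stack deployment halfway through a batch.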
Implementation
Finally, here are the folder structure and the Python code example:
canaryBot.py:
import shutil
import csv
import fileinput
import boto3
import os
import zipfile

# Baseline canary script that every generated script is copied from
src = "baseline.js"
SCRIPT_BUCKET_NAME = 'xxxxxx-canary-script'
YOUR_AWS_ACCOUNT_REGION = 'ap-southeast-1'


def replace_in_file(file_path, search_text, new_text):
    # Rewrite the file in place, replacing search_text on every line
    with fileinput.input(file_path, inplace=True) as f:
        for line in f:
            new_line = line.replace(search_text, new_text)
            print(new_line, end='')


def upload_files(path):
    # Upload every file under path, keyed by its path relative to path
    session = boto3.Session(region_name=YOUR_AWS_ACCOUNT_REGION)
    s3 = session.resource('s3')
    bucket = s3.Bucket(SCRIPT_BUCKET_NAME)
    for subdir, dirs, files in os.walk(path):
        for file in files:
            full_path = os.path.join(subdir, file)
            with open(full_path, 'rb') as data:
                bucket.put_object(Key=full_path[len(path)+1:], Body=data)


def upload_one_file(file):
    # Credentials come from the environment (AWS_PROFILE via direnv)
    s3 = boto3.client('s3')
    s3.upload_file(file, SCRIPT_BUCKET_NAME, file)


def zipdir(path, ziph):
    # ziph is zipfile handle
    for root, dirs, files in os.walk(path):
        for file in files:
            ziph.write(os.path.join(root, file))


def main():
    # Generate js files based on baseline js and CSV input
    with open('xxxxx Endpoints - Sheet2.csv', 'r') as file:
        reader = csv.reader(file)
        for row in reader:
            filename = row[2].lower()
            hostname = row[3]
            path = row[4]
            method = row[6]
            dst = "./nodejs/node_modules/%s.js" % filename
            shutil.copy(src, dst)
            search_text = 'const requestOptions = {"hostname":"xxxxx.xxxxx.com","method":"GET","path":"/v1/yyyy/zzzz/qqqq","port":443}'
            new_text = 'const requestOptions = {"hostname":"%s","method":"%s","path":"%s","port":443}' % (hostname, method, path)
            replace_in_file(dst, search_text, new_text)

    # Zip js and upload to s3
    zipf = zipfile.ZipFile('canaryscripts.zip', 'w', zipfile.ZIP_DEFLATED)
    zipdir('./nodejs', zipf)
    zipf.close()
    upload_one_file('canaryscripts.zip')

    # Generate cloudformation template: write the header first
    with open("canary_bot_gened.yaml", "w") as file_object:
        file_object.write("---\nAWSTemplateFormatVersion: '2010-09-09'\nDescription: 'Canary for API Endpoint Monitoring '\nResources:\n")

    # Then append one substituted copy of canary_base.yml per CSV row
    with open('xxxxx Endpoints - Sheet2.csv', 'r') as file1:
        reader = csv.reader(file1)
        for row in reader:
            filename = row[2].lower()
            hostname = row[3]
            path = row[4]
            method = row[6]
            with open('canary_base.yml', 'r') as file2:
                data = file2.read()
            data = data.replace("SyntheticsCanaryName", filename)
            data = data.replace("CanaryExecutionRoleArn", "arn:aws:iam::xxxxxxx:role/service-role/CloudWatchSyntheticsRole-yyyyy-103-a2zzzzzzz1737d")
            data = data.replace("HandlerName", filename)
            data = data.replace("S3CanaryResultArn", "s3://xxxxxx-canary-result")
            data = data.replace("CanaryNameSuccessPercentAlarm", filename+"SuccessPercentAlarm")
            data = data.replace("CanaryNameDurationAlarm", filename+"DurationAlarm")
            data = data.replace("CanaryAlarmSNSArn", "arn:aws:sns:ap-southeast-1:0000000000:xxxxxx-Alarm-test")
            with open("canary_bot_gened.yaml", "a") as file_object:
                file_object.write(data)
                file_object.write('\n')


if __name__ == '__main__':
    main()
References:
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-synthetics-canary.html
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Synthetics_Canaries_WritingCanary.html