[AWS][Python] Using Python to generate CloudFormation templates for batch, automated deployment of AWS Canaries


Requirements

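For business reasons, we needed to monitor a number of API endpoints, some of which sit inside AWS private networks. For the tool selection, see "[AWS][Elasticsearch] A brief comparison of API Synthetics Monitoring tools". We ended up choosing AWS CloudWatch Synthetics: API canaries probe the endpoints, alarm thresholds are set on two metrics (success rate and duration), and alarms notify an SNS topic. The goal was to codify the whole setup with CloudFormation.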

Task breakdown and analysis

How a Canary is implemented

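Looking at a manually created Canary shows that a Canary is implemented on top of Lambda. After you add a Canary in the console, a JS canary Lambda is generated. The canary script that holds the actual configuration (also JS, either auto-generated or hand-written) is attached to that Lambda as a layer, and the JS canary Lambda calls it: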

// Call customer's execution handler
let customerCanaryFilename = '/opt/nodejs/node_modules/' + fileName;


JS canary Lambda:

const log = require('SyntheticsLogger');
const synthetics = require('Synthetics');

exports.handler = async (event, context) => {
    const PASS_RESULT = 'PASSED';
    const FAIL_RESULT = 'FAILED';
    const NO_RESULT = 'ERROR';

    let canaryResult = NO_RESULT;
    let canaryError = null;
    let startTime = null;
    let endTime = null;
    let returnValue = null;
    let resetTime = null;
    let setupTime = null;
    let launchTime = null;

    try {
        let startCanaryLogString = "Start Canary";
        console.log(startCanaryLogString);

        resetTime = new Date();
        await log.reset();
        log.write(startCanaryLogString);

        await synthetics.reset();
        resetTime = new Date().getTime() - resetTime.getTime();
        
        setupTime = new Date();
        synthetics.setEventAndContext(event, context);
     
        await synthetics.beforeCanary();
        setupTime = new Date().getTime() - setupTime.getTime();
       
        launchTime = new Date();
        await synthetics.launch();
        launchTime = new Date().getTime() - launchTime.getTime();

    } catch (ex) {
        startTime = new Date();
        endTime = startTime;
        returnValue = await synthetics.afterCanary(canaryResult, canaryError, startTime, endTime, resetTime, setupTime, launchTime);
        let endCanaryLogString = "End Canary. Result: " + canaryResult;
        console.log(endCanaryLogString);
        log.write(endCanaryLogString);
        return context.fail(returnValue);
    }

    try {
        log.info('Start executing customer steps');

        startTime = new Date(); // Lambdas use UTC time

        let customerCanaryHandler = event.customerCanaryHandlerName;
        let fileName, functionName;
        if (customerCanaryHandler) {
            // Assuming handler format : fileName.functionName
            fileName = customerCanaryHandler.substring(0, customerCanaryHandler.indexOf("."));
            functionName = customerCanaryHandler.substring(customerCanaryHandler.indexOf(".") + 1);
            log.info(`Customer canary entry file name: ${JSON.stringify(fileName)}`);
            log.info(`Customer canary entry function name: ${JSON.stringify(functionName)}`);
        }

        // Call customer's execution handler        
        let customerCanaryFilename = '/opt/nodejs/node_modules/' + fileName;
        
        log.info(`Calling customer canary: ${customerCanaryFilename}.handler()`);
        let customerCanary = require(customerCanaryFilename);
        let response = await customerCanary.handler();
        log.info(`Customer canary response: ${JSON.stringify(response)}`);

        endTime = new Date();
        log.info('Finished executing customer steps');
        canaryResult = PASS_RESULT;
    } catch (error) {
        endTime = new Date();
        canaryResult = FAIL_RESULT;
        canaryError = error;
        log.error('Canary execution exception.', canaryError);
    }

    returnValue = await synthetics.afterCanary(canaryResult, canaryError, startTime, endTime, resetTime, setupTime, launchTime);
    let endCanaryLogString = "End Canary. Result: " + canaryResult;
    console.log(endCanaryLogString);
    log.write(endCanaryLogString);
    await log.deleteLogFile();
    return context.succeed(returnValue);
};

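The canary script itself is largely reusable: replace hostname, method, path, and port with the values you need, add headers if required, and leave everything else unchanged. That is what makes automation feasible.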
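
As a minimal sketch of that substitution, the snippet below builds the requestOptions line for one endpoint. It serializes with json.dumps instead of string formatting so that special characters are escaped. The helper name build_request_options and the example hostname are hypothetical; the default port 443 follows the baseline line used in canaryBot.py.

```python
import json


def build_request_options(hostname, method, path, port=443):
    """Return the JS line that configures one API canary (hypothetical helper)."""
    options = {"hostname": hostname, "method": method, "path": path, "port": port}
    # Compact separators reproduce the layout of the line in the baseline script.
    return "const requestOptions = " + json.dumps(options, separators=(",", ":"))


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(build_request_options("api.example.com", "GET", "/v1/ping"))
```

The returned string could then be passed as new_text to a replace step such as replace_in_file.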

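Finally, when creating Canaries with CloudFormation, the script can either be embedded directly in the CF template (not recommended) or uploaded to an S3 bucket. When building the zip, the JS file must sit under the nodejs/node_modules directory structure, otherwise the Canary will not run. The reason is the one explained above: the zip is mounted as a layer and consumed via let customerCanaryFilename = '/opt/nodejs/node_modules/' + fileName.
The documentation mentions this as well: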
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Synthetics_Canaries_WritingCanary.html
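
The layout requirement can be sketched as follows. This is an illustration under assumptions, not the code used in this post: the helper names zip_canary_scripts and missing_handlers are hypothetical. The first writes every script under nodejs/node_modules/ regardless of where it lives locally, and the second checks that each handler file the canary Lambda will look for is actually in the archive.

```python
import os
import zipfile


def zip_canary_scripts(script_paths, zip_path):
    """Zip JS files so each one lands under nodejs/node_modules/ in the archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for script in script_paths:
            # arcname sets the path inside the zip, independent of the local path.
            arcname = "nodejs/node_modules/" + os.path.basename(script)
            archive.write(script, arcname)


def missing_handlers(zip_path, handler_files):
    """Return handler file names (without .js) that are absent from the zip."""
    with zipfile.ZipFile(zip_path) as archive:
        names = set(archive.namelist())
    return [h for h in handler_files if "nodejs/node_modules/%s.js" % h not in names]
```

Running missing_handlers before uploading gives an early warning when a handler named in the template has no matching file in the zip.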

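Python automation approach
  1. Prepare the APIs to monitor (hostname, path, and so on) as a CSV file. Data preparation is not covered here.
  2. Read the CSV and, based on baseline.js and the CSV data, generate one JS file per endpoint with its own name and values. This uses shutil.copy(src, dst) to copy the file and fileinput to replace text inside it.
  3. Zip the scripts with the zipfile library and upload the archive to the S3 bucket with boto3.
  4. Prepare canary_base.yml. Most parameters of AWS::Synthetics::Canary and AWS::CloudWatch::Alarm can be copied from existing CF templates. Pay attention to resource naming, because some names act as placeholders that are replaced by string substitution later.
  5. Have the bot write the beginning of the CF template, up to and including the "Resources:" line.
  6. Read the CSV again and, based on canary_base.yml and the CSV data, generate the CF content in a loop, substituting placeholders such as "SyntheticsCanaryName" and "HandlerName", then append each block to the template the bot started. (Remember to add a newline after each block.)
  7. Deploy the CF template in one go. CF finds the matching JS in the zip on S3 and creates the Canaries with its handler. The Alarms locate the metrics to watch through the Namespace and the CanaryName in Dimensions.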
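
Steps 4 to 6 can be sketched as below. The real canary_base.yml is not shown in this post, so the CANARY_BASE fragment is an assumption: it uses the same placeholder names as canaryBot.py, but the concrete values (RuntimeVersion, schedule, threshold, period) are illustrative only, and the duration alarm is left out for brevity. Because the canary name doubles as the CloudFormation logical ID here, the sketch also assumes names are alphanumeric.

```python
HEADER = (
    "---\n"
    "AWSTemplateFormatVersion: '2010-09-09'\n"
    "Description: 'Canary for API Endpoint Monitoring'\n"
    "Resources:\n"
)

# Hypothetical stand-in for canary_base.yml; values are illustrative.
CANARY_BASE = """\
  SyntheticsCanaryName:
    Type: AWS::Synthetics::Canary
    Properties:
      Name: SyntheticsCanaryName
      ExecutionRoleArn: CanaryExecutionRoleArn
      ArtifactS3Location: S3CanaryResultArn
      RuntimeVersion: syn-1.0
      StartCanaryAfterCreation: true
      Schedule:
        Expression: rate(5 minutes)
      Code:
        Handler: HandlerName.handler
        S3Bucket: xxxxxx-canary-script
        S3Key: canaryscripts.zip
  CanaryNameSuccessPercentAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      Namespace: CloudWatchSynthetics
      MetricName: SuccessPercent
      Dimensions:
        - Name: CanaryName
          Value: SyntheticsCanaryName
      Statistic: Average
      Period: 300
      EvaluationPeriods: 1
      Threshold: 90
      ComparisonOperator: LessThanThreshold
      AlarmActions:
        - CanaryAlarmSNSArn
"""


def render_canary(name, role_arn, result_location, sns_arn):
    """Fill the placeholders of one canary block by plain string substitution."""
    replacements = [
        ("SyntheticsCanaryName", name),
        ("CanaryExecutionRoleArn", role_arn),
        ("HandlerName", name),
        ("S3CanaryResultArn", result_location),
        ("CanaryNameSuccessPercentAlarm", name + "SuccessPercentAlarm"),
        ("CanaryAlarmSNSArn", sns_arn),
    ]
    block = CANARY_BASE
    for placeholder, value in replacements:
        block = block.replace(placeholder, value)
    return block


def render_template(names, role_arn, result_location, sns_arn):
    """Return the header followed by one rendered block per canary name."""
    blocks = [render_canary(n, role_arn, result_location, sns_arn) for n in names]
    return HEADER + "\n".join(blocks)
```

Plain substitution keeps the script dependency-free, but it only works while no placeholder is a substring of a value that gets inserted before it is processed, which is why the naming in step 4 matters.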

Final result: [three screenshots omitted]

Things to watch out for (and some pitfalls)

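  1. This CF template covers only the Canaries and Alarms. The S3 bucket that stores the zipped JS canary scripts, the S3 bucket that stores run results, the ExecutionRole that runs the Canaries (including its policies, such as permission to write CloudWatch Logs), and the SNS topic for alarms were all created manually beforehand. You could create them in this CF template too, or add them via Python, but the marginal benefit is small and it is easy to get wrong.
  2. The VPC for the Canaries is not defined yet.
  3. This project uses direnv, which already exports AWS_PROFILE, so no AWS access key ID is passed when creating the boto3 session.
  4. [Drawback] Automated deployment has a drawback: after you update a canary script, the Canary still runs the old one. CF does not detect the new canary script and refresh the resource.
  5. [Drawback] CF currently cannot detect drift on Canaries. I tried deleting individual Canaries, but the CF stack still showed In-Sync.
  6. The Canary path must start with "/", for example "/v1/blahblah"; otherwise the check fails.
  7. Canary names are limited to 21 characters. Mine were too long at first, which made the deployment fail.
  8. Canaries have a quota: "100 per Region per account in the following Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo). 20 per Region per account in all other Regions." You can ask AWS for an increase.
  9. [Limitation] Because the script is packaged in a zip, it cannot be viewed directly on the Canary page in the console, whereas a manually created one can.
  10. [Limitation] A CF-created Canary does not show its associated Alarm resources, whereas a manually created one does.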
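
Several of these pitfalls (the leading "/", the 21-character name limit, the quota) can be caught before anything is deployed. The sketch below is one possible pre-flight check, not part of the original script: validate_endpoints and the (name, path) row shape are hypothetical, and the limits are the ones quoted in the list above.

```python
MAX_NAME_LENGTH = 21  # name limit reported in the list above
DEFAULT_QUOTA = 20    # quota quoted above for "all other Regions"


def validate_endpoints(rows, quota=DEFAULT_QUOTA):
    """Return a list of problems found in (name, path) rows; empty means OK."""
    problems = []
    if len(rows) > quota:
        problems.append("%d canaries exceed the quota of %d" % (len(rows), quota))
    seen = set()
    for name, path in rows:
        if len(name) > MAX_NAME_LENGTH:
            problems.append("%s: name longer than %d characters" % (name, MAX_NAME_LENGTH))
        if not path.startswith("/"):
            problems.append("%s: path %r must start with '/'" % (name, path))
        if name in seen:
            problems.append("%s: duplicate name" % name)
        seen.add(name)
    return problems


if __name__ == "__main__":
    # Made-up rows for illustration.
    for problem in validate_endpoints([("getusers", "/v1/users"), ("getorders", "v1/orders")]):
        print(problem)
```

Calling this on the CSV rows at the top of main() would stop the run before any JS file or template is generated.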


Implementation

Finally, here are the folder structure and a Python code example:
[screenshot of the folder structure omitted]
canaryBot.py:

import shutil
import csv
import fileinput
import boto3
import os
import zipfile

# Baseline canary script that every generated script is copied from
src="baseline.js"
SCRIPT_BUCKET_NAME = 'xxxxxx-canary-script'
YOUR_AWS_ACCOUNT_REGION = 'ap-southeast-1' 

def replace_in_file(file_path, search_text, new_text):
    with fileinput.input(file_path, inplace=True) as f:
        for line in f:
            new_line = line.replace(search_text, new_text)
            print (new_line, end='')

def upload_files(path):
    session = boto3.Session(
        region_name=YOUR_AWS_ACCOUNT_REGION
    )
    s3 = session.resource('s3')
    bucket = s3.Bucket(SCRIPT_BUCKET_NAME)
 
    for subdir, dirs, files in os.walk(path):
        for file in files:
            full_path = os.path.join(subdir, file)
            with open(full_path, 'rb') as data:
                bucket.put_object(Key=full_path[len(path)+1:], Body=data)

def upload_one_file(file):
    s3 = boto3.client('s3')
    s3.upload_file(file, SCRIPT_BUCKET_NAME, file)

def zipdir(path, ziph):
    # ziph is zipfile handle
    for root, dirs, files in os.walk(path):
        for file in files:
            ziph.write(os.path.join(root, file))


def main():
    # Generate js files based on baseline js and CSV input
    with open('xxxxx Endpoints - Sheet2.csv', 'r') as file:
        reader = csv.reader(file)
        for row in reader:
            filename = row[2].lower()
            hostname = row[3]
            path = row[4]
            method = row[6]
            dst="./nodejs/node_modules/%s.js" %filename
            # Make sure the layer directory exists before copying into it
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy(src,dst)
            search_text = 'const requestOptions = {"hostname":"xxxxx.xxxxx.com","method":"GET","path":"/v1/yyyy/zzzz/qqqq","port":443}'
            new_text = 'const requestOptions = {"hostname":"%s","method":"%s","path":"%s","port":443}' %(hostname,method,path)
            replace_in_file(dst, search_text, new_text)            
    # Zip js and upload to s3
    zipf = zipfile.ZipFile('canaryscripts.zip', 'w', zipfile.ZIP_DEFLATED)
    zipdir('./nodejs', zipf)
    zipf.close()
    upload_one_file('canaryscripts.zip')

    # Generate cloudformation template
    with open("canary_bot_gened.yaml", "w") as file_object:
        file_object.write("---\nAWSTemplateFormatVersion: '2010-09-09'\nDescription: 'Canary for API Endpoint Monitoring '\nResources:\n")
    with open('xxxxx Endpoints - Sheet2.csv', 'r') as file1:
        reader = csv.reader(file1)
        for row in reader:
            filename = row[2].lower()
            hostname = row[3]
            path = row[4]
            method = row[6]
            with open('canary_base.yml', 'r') as file2:
                data = file2.read()
                # print(type(data))
                # print(data)
                data = data.replace("SyntheticsCanaryName", filename) 
                data = data.replace("CanaryExecutionRoleArn", "arn:aws:iam::xxxxxxx:role/service-role/CloudWatchSyntheticsRole-yyyyy-103-a2zzzzzzz1737d")
                data = data.replace("HandlerName", filename) 
                data = data.replace("S3CanaryResultArn", "s3://xxxxxx-canary-result")
                data = data.replace("CanaryNameSuccessPercentAlarm", filename+"SuccessPercentAlarm")
                data = data.replace("CanaryNameDurationAlarm", filename+"DurationAlarm")
                data = data.replace("CanaryAlarmSNSArn", "arn:aws:sns:ap-southeast-1:0000000000:xxxxxx-Alarm-test")
                #print(data)
            with open("canary_bot_gened.yaml", "a") as file_object:
                file_object.write(data)
                file_object.write('\n')
                
if __name__ == '__main__':
    main()

References:
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-synthetics-canary.html
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Synthetics_Canaries_WritingCanary.html
