Some thoughts on writing my own synthetics check tool

Choosing a database

InfluxDB vs TimescaleDB

Front-end considerations

D3? AntV? Grafana

Some design notes

  • A few ways to retrieve a server's SSL certificate:
    • curl -v https://google.com
    • openssl s_client -connect google.com:443
    • ssl.get_server_certificate in Python
  • Checking when a certificate expires
import datetime
import ssl

import OpenSSL  # provided by the pyOpenSSL package

domain = 'google.com'

# Fetch the server certificate as a PEM string
cert = ssl.get_server_certificate((domain, 443))
x509 = OpenSSL.crypto.load_certificate(OpenSSL.crypto.FILETYPE_PEM, cert)
# notAfter is an ASN.1 time string such as '20250101120000Z'
x509info = x509.get_notAfter().decode("utf-8")

# The first 8 characters are the expiry date (YYYYMMDD)
expiry_date = datetime.datetime.strptime(x509info[0:8], '%Y%m%d').date()
today_date = datetime.date.today()
expire_window = (expiry_date - today_date).days

print("SSL certificate for domain", domain, "expires in", expire_window, "days, on", expiry_date)
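As an alternative, the same check can be sketched with the standard library alone, which avoids the pyOpenSSL dependency. The helper names fetch_not_after and days_until_expiry are made up for this illustration. The sketch assumes the server's certificate chain validates against the system trust store; if it does not, the handshake raises before the certificate can be read.

```python
import datetime
import socket
import ssl


def days_until_expiry(not_after, now=None):
    """Return whole days left until a certificate's notAfter time.

    not_after uses the format returned by SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT".
    """
    expiry_ts = ssl.cert_time_to_seconds(not_after)
    expiry = datetime.datetime.fromtimestamp(expiry_ts, tz=datetime.timezone.utc)
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return (expiry - now).days


def fetch_not_after(host, port=443, timeout=5.0):
    """Open a TLS connection and return the peer certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


# Example (needs network access):
# print(days_until_expiry(fetch_not_after("google.com")))
```

Splitting the network call from the date arithmetic keeps the arithmetic easy to test with a fixed notAfter string.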
  • Multithreading with ThreadPoolExecutor, pulling any exception out of each future
import concurrent.futures

# syntheticscheck, URL_LIST, logger, ps and pretty_memory_size are defined elsewhere in the script
def job():
    # The with statement ensures worker threads are cleaned up promptly
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Start the syntheticscheck operations and map each future to its URL
        future_to_url = {executor.submit(syntheticscheck, URL): URL for URL in URL_LIST}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                # result() re-raises any exception raised inside the worker
                data = future.result()
            except Exception as exc:
                logger.error('%r generated an exception: %s', url, exc)
    # Report the memory footprint after each run
    memoryUse = ps.memory_info().rss
    print(pretty_memory_size(memoryUse))
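To see the future-to-URL pattern in isolation, here is a small self-contained sketch. run_checks and fake_check are hypothetical names used only for this illustration; fake_check stands in for a real HTTP check and fails on purpose for one URL, so both the result path and the exception path are exercised.

```python
import concurrent.futures


def run_checks(check, urls, max_workers=8):
    """Run check(url) for every URL in parallel.

    Returns two dicts keyed by URL: successful results and raised exceptions.
    """
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_url = {executor.submit(check, url): url for url in urls}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                # result() re-raises whatever the worker raised
                results[url] = future.result()
            except Exception as exc:
                errors[url] = exc
    return results, errors


def fake_check(url):
    # Stand-in for a real check; fails for any URL containing "bad"
    if "bad" in url:
        raise ValueError("simulated failure for " + url)
    return 200


results, errors = run_checks(fake_check, ["https://ok.example", "https://bad.example"])
print(results)
print(errors)
```

Collecting the errors in a dict instead of only logging them lets the caller decide what to do with failed URLs, for example store them as failed checks.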
  • The schedule library takes care of running the checks periodically (remember to call schedule.clear() on startup)
  • The tldextract library takes care of domain parsing
  • logging to both the console and a log file
import datetime
import logging

logger = logging.getLogger()
# The root logger defaults to WARNING; lower it so INFO records are emitted too
logger.setLevel(logging.INFO)
today_date = datetime.date.today()

# Format log
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
# Create console handler, add formatter and add to logger
ch = logging.StreamHandler()
ch.setFormatter(formatter)
logger.addHandler(ch)
# Create file handler, add formatter and add to logger
fh = logging.FileHandler(r'./log-%s.txt' % today_date)
fh.setFormatter(formatter)
logger.addHandler(fh)
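One thing to watch with this kind of setup: if the configuration code runs twice, every handler is attached twice and each message is written twice. A possible way to guard against that is to wrap the setup in a function that only attaches handlers once. build_logger, the logger name "synthetics" and the temporary log path below are assumptions made for this sketch.

```python
import logging
import os
import tempfile


def build_logger(name, log_path, level=logging.INFO):
    """Return a logger that writes to the console and to log_path."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if logger.handlers:
        # Already configured; attaching again would duplicate every message
        return logger
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    for handler in (logging.StreamHandler(), logging.FileHandler(log_path)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


log_path = os.path.join(tempfile.mkdtemp(), "synthetics.log")
logger = build_logger("synthetics", log_path)
logger.info("check started")
```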
  • Getting the script's memory usage in human-readable form
import math
import os

import psutil

# Human-readable formatting (defined before it is used below)
def pretty_memory_size(nbytes):
  metric = ("B", "kB", "MB", "GB", "TB")
  if nbytes == 0:
    return "%s %s" % ("0", "B")
  nunit = int(math.floor(math.log(nbytes, 1024)))
  nsize = round(nbytes/(math.pow(1024, nunit)), 2)
  return '%s %s' % (format(nsize, ".2f"), metric[nunit])

# Memory usage of the current process
pid = os.getpid()
ps = psutil.Process(pid)

memoryUse = ps.memory_info().rss  # resident set size, in bytes
print(pretty_memory_size(memoryUse))
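The logarithm-based formatter relies on floating-point math, which may land just below a whole number at exact powers of 1024, and it runs past the end of the unit tuple for values of 1024**5 bytes or more. A possible integer-only variant is sketched below. pretty_size is a name made up for this illustration, and it assumes nbytes is an int, as psutil's rss value is.

```python
def pretty_size(nbytes):
    """Format a byte count using 1024-based units, two decimals."""
    units = ("B", "kB", "MB", "GB", "TB")
    if nbytes <= 0:
        return "0 B"
    # bit_length() - 1 equals floor(log2(nbytes)); each unit step is 2**10.
    # min() caps the index so very large values stay in the last unit.
    index = min((nbytes.bit_length() - 1) // 10, len(units) - 1)
    return "%.2f %s" % (nbytes / 1024 ** index, units[index])


for value in (0, 1023, 1536, 1024 ** 3, 1024 ** 5):
    print(value, "->", pretty_size(value))
```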
  • psycopg2 for reading and writing the database. When writing, remember to commit() (a plain SELECT does not need it), and sanitize input to prevent SQL injection
# Read
import psycopg2

with psycopg2.connect(
    host = "localhost",
    database = "nyc_data", 
    user = "postgres", 
    password = "password"
) as conn:
    with conn.cursor() as cur:
        cur.execute("select * from rides LIMIT 10")
        rows = cur.fetchall()        
        for r in rows:
            print (r)

You can also connect with a connection string:

connection_string = "host=localhost user=postgres password=password dbname=nyc_data"
with psycopg2.connect(connection_string) as conn:
    pass  # use conn exactly as in the read example above

When writing to the database, you can use execute_values:

import psycopg2
import datetime
from psycopg2.extras import execute_values

connection_string = "host=localhost user=agent password=agentPassword dbname=synchecks"

url_name = 'Microservices BT Prod Home domain'
url = 'https://newsapi.sphdigital.com/v1/feed/home/bt'
c_dt = '2020-03-10 10:29:52.890076'
r_c = '200'
r_d = '242.889'
c_l = 'singapore'

with psycopg2.connect(connection_string) as conn:
    with conn.cursor() as cur:
        values = []
        values.append((url_name, url, c_dt, r_c, r_d, c_l))
        execute_values(
            cur,
            """
            INSERT INTO synchecks (url_name, url, check_datetime, response_code, response_duration, check_location)
            VALUES %s
            """,
            values,
        )
        conn.commit()
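On the SQL injection point: with psycopg2 the usual approach is to leave %s placeholders in the SQL text and pass the values as a separate tuple, so the driver does the quoting. The sketch below assumes conn is an open psycopg2 connection and reuses the synchecks table and column names from the insert example. The function name fetch_checks_for_url is hypothetical.

```python
def fetch_checks_for_url(conn, url, limit=10):
    """Return the most recent checks for one URL.

    conn is assumed to be an open psycopg2 connection.
    """
    query = """
        SELECT check_datetime, response_code, response_duration
        FROM synchecks
        WHERE url = %s
        ORDER BY check_datetime DESC
        LIMIT %s
    """
    with conn.cursor() as cur:
        # Values travel separately from the SQL text; never splice them in
        # with % formatting or f-strings
        cur.execute(query, (url, limit))
        return cur.fetchall()


# Example (needs a reachable database):
# import psycopg2
# with psycopg2.connect("host=localhost user=agent password=agentPassword dbname=synchecks") as conn:
#     print(fetch_checks_for_url(conn, "https://newsapi.sphdigital.com/v1/feed/home/bt"))
```

Because the URL never becomes part of the SQL string, a value such as "x'; DROP TABLE synchecks; --" is treated as plain data.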

Some problems I ran into:

  • [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure
  • Grafana's time display is slightly off

Directions worth exploring further:

  • Many agents: caching and streaming strategies
  • HA for TimescaleDB
  • Moving the data onto a distributed Elasticsearch cluster

References:
https://www.youtube.com/watch?v=2PDkXviEMD0&