Thoughts on building my own synthetics check tool
Choosing a database
InfluxDB vs TimescaleDB
Frontend options
D3? AntV? Grafana
Some design notes
- Several ways to retrieve a site's SSL certificate:
- curl -v https://google.com
- openssl s_client -connect google.com:443
- Python's ssl.get_server_certificate
- Checking certificate expiry (the OpenSSL module comes from the pyOpenSSL package)
```python
import datetime
import ssl

import OpenSSL  # provided by the pyOpenSSL package

domain = 'google.com'
# Get the server's certificate (PEM) and parse it
cert = ssl.get_server_certificate((domain, 443))
x509 = OpenSSL.crypto.load_certificate(OpenSSL.crypto.FILETYPE_PEM, cert)
# notAfter is an ASN.1 time such as b'20250601120000Z'; keep the YYYYMMDD part
x509info = x509.get_notAfter().decode("utf-8")
expiry_date = datetime.datetime.strptime(x509info[0:8], '%Y%m%d').date()
today_date = datetime.date.today()
expire_window = (expiry_date - today_date).days
print("SSL certificate for domain", domain, "expires in", expire_window, "days, on", expiry_date)
```
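The same check can also be done with the standard library alone, without pyOpenSSL. A minimal sketch, assuming Python 3.7+: `SSLSocket.getpeercert()` returns a dict whose `notAfter` field is a string such as 'Jun  1 12:00:00 2025 GMT', and `ssl.cert_time_to_seconds` turns that string into a Unix timestamp. The helper names `fetch_not_after` and `days_until_expiry` are hypothetical; only the second one is network-free.

```python
import socket
import ssl
from datetime import datetime, timezone
from typing import Optional


def fetch_not_after(domain: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the certificate's notAfter string, e.g. 'Jun  1 12:00:00 2025 GMT'."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((domain, port), timeout=timeout) as sock:
        # server_hostname sends SNI, so the server can pick the right certificate
        with context.wrap_socket(sock, server_hostname=domain) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[datetime] = None) -> int:
    """Whole days from `now` (UTC, default: current time) until notAfter."""
    expiry = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expiry - now).days
```

Usage: `days_until_expiry(fetch_not_after("google.com"))`. One caveat of this design: `create_default_context()` verifies the certificate, so for a certificate that has already expired the handshake raises `ssl.SSLCertVerificationError` instead of returning a negative day count; a monitoring tool may want to catch that and report it as its own alert.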
- Multithreading with ThreadPoolExecutor, pulling out any exception raised inside each future
```python
import concurrent.futures

# syntheticscheck, URL_LIST, logger, ps and pretty_memory_size are defined elsewhere in the script
def job():
    # We can use a with statement to ensure threads are cleaned up promptly
    with concurrent.futures.ThreadPoolExecutor() as executor:
        #results = [executor.submit(syntheticscheck, URL) for URL in URL_LIST]
        # Start the syntheticscheck operations and mark each future with its URL
        future_to_url = {executor.submit(syntheticscheck, URL): URL for URL in URL_LIST}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                data = future.result()  # re-raises any exception raised inside syntheticscheck
            except Exception as exc:
                logger.error('%r generated an exception: %s', url, exc)
    # Report the process's memory usage after each round of checks
    memoryUse = ps.memory_info().rss
    print(pretty_memory_size(memoryUse))
```
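A variant of the same pattern that collects failures instead of logging them inline may be easier to test and to feed into alerting. A sketch under stated assumptions: `run_checks` and `fake_check` are hypothetical names (the latter stands in for `syntheticscheck`), and `max_workers=8` is an arbitrary default. It uses `Future.exception()`, which returns the exception raised inside the worker (or `None`), rather than wrapping `result()` in try/except.

```python
import concurrent.futures
from typing import Callable, Dict, Iterable, Tuple


def run_checks(
    check: Callable[[str], object],
    urls: Iterable[str],
    max_workers: int = 8,
) -> Tuple[Dict[str, object], Dict[str, BaseException]]:
    """Run check(url) concurrently and split successes from failures, keyed by URL."""
    results: Dict[str, object] = {}
    errors: Dict[str, BaseException] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_url = {executor.submit(check, url): url for url in urls}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            exc = future.exception()  # None if check() returned normally
            if exc is None:
                results[url] = future.result()
            else:
                errors[url] = exc
    return results, errors


def fake_check(url: str) -> int:
    """Hypothetical stand-in for syntheticscheck: 'fails' on URLs containing 'bad'."""
    if "bad" in url:
        raise ConnectionError(f"cannot reach {url}")
    return 200


results, errors = run_checks(fake_check, ["https://ok.example", "https://bad.example"])
for url, exc in errors.items():
    print(f"{url!r} generated an exception: {exc!r}")
```

Threads fit this job because each check spends most of its time waiting on network I/O, so the GIL is not the bottleneck.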
- The schedule library handles running the checks periodically (remember to call `schedule.clear()` at startup)
- The tldextract library handles domain-name parsing
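If you would rather not add a dependency for scheduling, a plain loop on `time.monotonic()` can run a job at a fixed interval. This is a swapped-in standard-library technique, not the schedule library's API; `run_every` and its `max_runs` parameter are hypothetical. Scheduling against a fixed grid of start times keeps one slow check from pushing every later run back, and missed slots are skipped rather than run back to back.

```python
import time
from typing import Callable, Optional


def run_every(interval_s: float, job: Callable[[], None], max_runs: Optional[int] = None) -> int:
    """Call job() every interval_s seconds on a fixed grid of start times.

    Returns how many times job() ran; max_runs=None means loop forever.
    """
    runs = 0
    next_run = time.monotonic()
    while True:
        job()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            return runs
        next_run += interval_s
        now = time.monotonic()
        # If job() overran one or more intervals, skip the missed slots instead of bursting.
        while next_run <= now:
            next_run += interval_s
        time.sleep(next_run - now)
```

For example, `run_every(60, job)` would run the `job()` function above once a minute, forever.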
- Logging to both the console and a log file
```python
import datetime
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)  # the root logger defaults to WARNING, which would drop info() messages
today_date = datetime.date.today()
# Log format
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
# Create a console handler, set its formatter and attach it to the logger
ch = logging.StreamHandler()
ch.setFormatter(formatter)
logger.addHandler(ch)
# Create a file handler named after today's date, set its formatter and attach it to the logger
fh = logging.FileHandler(r'./log-%s.txt' % today_date)
fh.setFormatter(formatter)
logger.addHandler(fh)
```
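Running the snippet above more than once in the same process (for example, re-running it in a notebook) attaches a second pair of handlers, and every message is then written twice. A sketch of a guarded version, assuming a named logger is acceptable instead of the root logger; `setup_logging` and the logger name 'synthetics' are hypothetical.

```python
import datetime
import logging
from pathlib import Path


def setup_logging(log_dir: str = ".", name: str = "synthetics", level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes to the console and to log-YYYY-MM-DD.txt in log_dir."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if logger.handlers:  # already configured: adding handlers again would duplicate every line
        return logger
    formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    log_file = Path(log_dir) / f"log-{datetime.date.today()}.txt"
    for handler in (logging.StreamHandler(), logging.FileHandler(log_file, encoding="utf-8")):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    logger.propagate = False  # keep records from also reaching handlers on the root logger
    return logger
```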
- Getting the script's memory usage, in human-readable form
```python
import math
import os

import psutil

# Human-readable formatting (defined before it is called below)
def pretty_memory_size(nbytes):
    metric = ("B", "kB", "MB", "GB", "TB")
    if nbytes == 0:
        return "%s %s" % ("0", "B")
    nunit = int(math.floor(math.log(nbytes, 1024)))
    nsize = round(nbytes / (math.pow(1024, nunit)), 2)
    return '%s %s' % (format(nsize, ".2f"), metric[nunit])

# Get the current process's memory usage (resident set size, in bytes)
pid = os.getpid()
ps = psutil.Process(pid)
memoryUse = ps.memory_info().rss
print(pretty_memory_size(memoryUse))
```
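The math.log version has two edge cases: flooring a floating-point logarithm can land one unit low when the exact answer is a whole number (the classic example is `math.log(1000, 10)` returning `2.9999999999999996`), and `metric[nunit]` raises `IndexError` from 1024 TB upward. A minimal integer-loop alternative, sketched as a drop-in replacement, avoids both by dividing until the value is below 1024 or the largest unit is reached. Unlike the original, it prints zero as '0.00 B'.

```python
def pretty_memory_size(nbytes: int) -> str:
    """Format a byte count with two decimals, e.g. 1536 -> '1.50 kB'."""
    units = ("B", "kB", "MB", "GB", "TB")
    size = float(nbytes)
    unit = 0
    # Divide by 1024 until the value fits, but never past the largest unit.
    while size >= 1024 and unit < len(units) - 1:
        size /= 1024
        unit += 1
    return f"{size:.2f} {units[unit]}"
```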
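- Reading and writing the database with psycopg2. After writes, remember to call `commit()` (plain SELECTs don't need it); when the connection is used in a `with` block, psycopg2 commits on a clean exit but does not close the connection. Also protect against SQL injection by passing values as query parameters instead of formatting them into the SQL string.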
```python
# Read
import psycopg2

with psycopg2.connect(
    host="localhost",
    database="nyc_data",
    user="postgres",
    password="password",
) as conn:
    with conn.cursor() as cur:
        cur.execute("select * from rides LIMIT 10")
        rows = cur.fetchall()
        for r in rows:
            print(r)
```
You can also connect with a connection string:
```python
import psycopg2

connection_string = "host=localhost user=postgres password=password dbname=nyc_data"
with psycopg2.connect(connection_string) as conn:
    ...  # use conn exactly as above
```
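Hard-coding the password in the script makes it easy to leak. libpq (and therefore psycopg2) already reads the PGHOST, PGDATABASE, PGUSER and PGPASSWORD environment variables, and psycopg2 also provides `psycopg2.extensions.make_dsn` for building connection strings. To show the quoting rule such a builder must follow, here is a standard-library sketch: a value that is empty or contains spaces is wrapped in single quotes, with backslashes and single quotes escaped by a backslash. `build_dsn`, `dsn_from_env` and the SYNCHECK_DB_* variable names are hypothetical.

```python
import os
from typing import Mapping, Optional


def quote_conninfo_value(value: str) -> str:
    """Quote a value for a libpq key=value connection string only when needed."""
    if value and not any(ch.isspace() or ch in "'\\" for ch in value):
        return value
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_dsn(params: Mapping[str, str]) -> str:
    """Join key=value pairs in insertion order, quoting each value as required."""
    return " ".join(f"{key}={quote_conninfo_value(val)}" for key, val in params.items())


def dsn_from_env(env: Optional[Mapping[str, str]] = None) -> str:
    """Build the DSN from hypothetical SYNCHECK_DB_* variables (default: os.environ)."""
    env = os.environ if env is None else env
    return build_dsn({
        "host": env.get("SYNCHECK_DB_HOST", "localhost"),
        "dbname": env.get("SYNCHECK_DB_NAME", "nyc_data"),
        "user": env.get("SYNCHECK_DB_USER", "postgres"),
        "password": env["SYNCHECK_DB_PASSWORD"],  # no default: fail loudly if unset
    })
```

The result can be passed straight to `psycopg2.connect(dsn_from_env())`.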
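For writes, execute_values (from psycopg2.extras) inserts a list of row tuples using as few INSERT statements as possible (100 rows per statement by default, set with page_size):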
```python
import psycopg2
from psycopg2.extras import execute_values

connection_string = "host=localhost user=agent password=agentPassword dbname=synchecks"
url_name = 'Microservices BT Prod Home domain'
url = 'https://newsapi.sphdigital.com/v1/feed/home/bt'
c_dt = '2020-03-10 10:29:52.890076'
r_c = '200'
r_d = '242.889'
c_l = 'singapore'
with psycopg2.connect(connection_string) as conn:
    with conn.cursor() as cur:
        values = []
        values.append((url_name, url, c_dt, r_c, r_d, c_l))
        execute_values(
            cur,
            """
            INSERT INTO synchecks (url_name, url, check_datetime, response_code, response_duration, check_location)
            VALUES %s
            """,
            values,
        )
    conn.commit()
```
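To make "protect against SQL injection" concrete: with parameterized queries the driver sends values separately from the SQL text, so hostile input is stored as plain data. In psycopg2 the placeholder is `%s`, e.g. `cur.execute("SELECT * FROM synchecks WHERE url = %s", (url,))`. The sketch below swaps in Python's built-in sqlite3 module, whose placeholder is `?`, only so that it runs without a PostgreSQL server; the table and its two columns are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE synchecks (url_name TEXT, response_code TEXT)")

# Hostile input that would break out of a string literal if pasted into the SQL text.
url_name = "x'); DROP TABLE synchecks; --"

# Unsafe (do not do this): f"INSERT INTO synchecks VALUES ('{url_name}', '200')"
# Safe: the value travels as a parameter, separate from the SQL text.
conn.execute("INSERT INTO synchecks (url_name, response_code) VALUES (?, ?)", (url_name, "200"))
conn.commit()

rows = conn.execute("SELECT url_name, response_code FROM synchecks").fetchall()
print(rows)
```

Whatever the driver, avoid building queries with `%` string formatting or f-strings; the psycopg2 documentation warns against exactly that.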
Some problems I ran into:
- [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure
- Grafana's time display was somewhat off
Directions to explore further:
- With many agents: caching and streaming strategies
- High availability (HA) for TimescaleDB
- Moving the data onto a distributed Elasticsearch cluster