[AWS][Log]Jsonify Apache Logs

The goal is to structure the logs as JSON so they are easy to push to Elasticsearch (ES).

References

https://httpd.apache.org/docs/2.4/logs.html
http://httpd.apache.org/docs/current/mod/mod_log_config.html
https://www.loggly.com/ultimate-guide/apache-logging-basics/

Applying this on AWS:

https://aws.amazon.com/premiumsupport/knowledge-center/elb-capture-client-ip-addresses/
https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/x-forwarded-headers.html#x-forwarded-for

Worked example:

Raw CloudWatch logs:

xxx.xx.x.x:80 xx.xx.x.xx - - [04/Jun/2020:06:45:56 +0000] "GET /wang-gungwu-even-if-west-has-lost-its-way-china-may-not-be-heir-apparent HTTP/1.1" 200 31476 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Mobile/15E148 Safari/604.1"

  1. Add the following 000-default.conf to the Apache configuration:
<VirtualHost *:80>
	DocumentRoot /var/www/html/public
	LogFormat "{ \"time\":\"%{%Y-%m-%d}tT%{%T}t.%{msec_frac}tZ\", \"process\":\"%D\",\"filename\":\"%f\", \"remoteip\":\"%a\", \"clientip\":\"%{True-Client-IP}i\", \"forwarded_for\":\"%{X-Forwarded-For}i\", \"host\":\"%V\", \"request\":\"%U\",\"query\":\"%q\",\"method\":\"%m\", \"status\":\"%>s\",\"userAgent\":\"%{User-agent}i\",\"referer\":\"%{Referer}i\"}" cloudwatch
	CustomLog ${APACHE_LOG_DIR}/access.log cloudwatch
</VirtualHost>
  2. Load this conf in the Dockerfile:
    COPY 000-default.conf /etc/apache2/sites-enabled/000-default.conf

Example log entry after applying the formatter:

{
    "time": "2020-06-04T15:25:35.494Z",
    "process": "108874",
    "filename": "/var/www/html/public/index.php",
    "remoteip": "1x.7x.0.2xx",
    "clientip": "-",
    "forwarded_for": "121.6.200.186",
    "host": "xxxx.zxxbxx.com.sg",
    "request": "/info/amlogout",
    "query": "",
    "method": "GET",
    "status": "200",
    "userAgent": "Mozilla/5.0 (Linux; Android 10; SAMSUNG SM-G975F) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/11.2 Chrome/75.0.3770.143 Mobile Safari/537.36",
    "referer": "https://acc-reg.xxxdigital.com/RegAuth2/xxxLogout.html?logoutdest=https://epaper.xxxxx.sg"
}
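Because every line is now a standalone JSON object, pushing it to ES is mostly a matter of wrapping each line for the Elasticsearch _bulk API, which takes newline-delimited JSON: one action line followed by one source line per document. The Python sketch below assumes a hypothetical index name "apache-access" and only converts "process" and "status" to integers, since the LogFormat quotes every value; it is an illustration, not part of the original setup.

```python
import json

# Hypothetical index name; choose your own.
INDEX_NAME = "apache-access"

def to_bulk_ndjson(log_lines, index=INDEX_NAME):
    """Build an Elasticsearch _bulk request body from JSON access-log lines."""
    out = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)  # each line must be one complete JSON object
        # The LogFormat quotes every value, so numeric fields arrive as strings;
        # convert a few so Elasticsearch can map them as numbers.
        for key in ("process", "status"):
            value = doc.get(key)
            if isinstance(value, str) and value.isdigit():
                doc[key] = int(value)
        out.append(json.dumps({"index": {"_index": index}}))  # action line
        out.append(json.dumps(doc))  # source line
    # _bulk expects newline-delimited JSON where every line, including the last, ends with "\n".
    return "".join(part + "\n" for part in out)
```

The resulting body can then be POSTed to the cluster's /_bulk endpoint with Content-Type: application/x-ndjson.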

Details

The format used here:

LogFormat "{ \"time\":\"%{%Y-%m-%d}tT%{%T}t.%{msec_frac}tZ\", \"process\":\"%D\",\"filename\":\"%f\", \"remoteip\":\"%a\", \"clientip\":\"%{True-Client-IP}i\", \"forwarded_for\":\"%{X-Forwarded-For}i\", \"host\":\"%V\", \"request\":\"%U\",\"query\":\"%q\",\"method\":\"%m\", \"status\":\"%>s\",\"userAgent\":\"%{User-agent}i\",\"referer\":\"%{Referer}i\"}" cloudwatch
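The "time" field is assembled from three %{...}t pieces into an ISO 8601-style string such as 2020-06-04T15:25:35.494Z. A minimal Python sketch of reading it back (the function name is illustrative). Note that the trailing Z is a literal in the LogFormat: mod_log_config renders %{format}t in the server's local time, so the value is only truly UTC if the server or container runs in UTC (as the +0000 in the raw log above suggests).

```python
from datetime import datetime, timezone

def parse_log_time(value):
    """Parse the "time" field produced by %{%Y-%m-%d}tT%{%T}t.%{msec_frac}tZ."""
    # %f accepts the 3-digit millisecond fraction; "Z" is matched as a literal character.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    # Assumption: the server runs in UTC, so the literal "Z" is truthful.
    return parsed.replace(tzinfo=timezone.utc)
```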

Another formatter example:

LogFormat "{ \"time\":\"%{%Y-%m-%d}tT%{%T}t.%{msec_frac}tZ\", \"process\":\"%D\", \"filename\":\"%f\", \"remoteIP\":\"%a\", \"host\":\"%V\", \"request\":\"%U\", \"query\":\"%q\", \"method\":\"%m\", \"status\":\"%>s\", \"userAgent\":\"%{User-agent}i\", \"referer\":\"%{Referer}i\" }," combined
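The alternative LogFormat above ends with "}," so each log line is written as {...}, which is not valid JSON on its own. The simplest fix is to drop the comma from the LogFormat; if logs in that shape already exist, a tolerant reader like this Python sketch (function name illustrative) can strip it before parsing.

```python
import json

def parse_json_log_line(line):
    """Parse one access-log line, tolerating a trailing comma after the object."""
    text = line.strip()
    # The alternative LogFormat writes "{...}," so drop the comma before parsing.
    if text.endswith(","):
        text = text[:-1].rstrip()
    return json.loads(text)
```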

Apache Official Documentation:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

LogFormat "%h %l %u %t \"%r\" %>s %b" common

%h IP address of the client (remote host). Behind a load balancer this is, by default, the load balancer's IP (the last proxy the traffic came through).
%l RFC 1413 identity of the client, as determined by identd on the client's machine. A hyphen ("-") means the information is not available; this field is highly unreliable.
%u userid of the person requesting the document as determined by HTTP authentication.
%t The time the request was received.
\"%r\" The request line from the client. It can be broken down: the format string "%m %U%q %H" logs the method, path, query string, and protocol, resulting in exactly the same output as "%r".
%>s status code server sends back to the client
%b Size of the object returned to the client, not including the response headers ("-" if no content was returned).
\"%{User-agent}i\" user agent
\"%{Referer}i\" referrer
%a Client IP address of the request (see the mod_remoteip module).
%V The server name according to the UseCanonicalName setting.
%D The time taken to serve the request, in microseconds.
%f Filename
%U The URL path requested, not including any query string.
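To make the field list concrete, here is a minimal Python sketch that parses the Common Log Format line from the official documentation above and splits %r back into the %m %U%q %H pieces. The regex and function name are illustrative assumptions, and it assumes a well-formed three-part request line.

```python
import re

# Regex for the Common Log Format: "%h %l %u %t \"%r\" %>s %b".
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_common_log(line):
    match = CLF_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not a Common Log Format line: {line!r}")
    entry = match.groupdict()
    # %r equals "%m %U%q %H": method, path, query string (with "?"), protocol.
    method, target, protocol = entry["request"].split(" ", 2)
    path, sep, query = target.partition("?")
    entry.update(method=method, path=path, query=sep + query, protocol=protocol)
    # %b logs "-" instead of 0 when no body bytes were sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry
```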

By default, the Apache access and error logs do not record the X-Forwarded-For header, so if the client connects through a proxy, the log may contain only the proxy server's IP address.

Adding the X-Forwarded-For header to the log lets us see the likely real IP address of the client.

Apache Module mod_proxy Documentation:

Be careful when using these headers on the origin server, since they will contain more than one (comma-separated) value if the original request already contained one of these headers. For example, you can use %{X-Forwarded-For}i in the log format string of the origin server to log the original clients IP address, but you may get more than one address if the request passes through several proxies.
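Since the logged X-Forwarded-For value may hold several comma-separated addresses, a consumer has to pick one. One common approach, sketched below in Python, counts back from the right by the number of proxies you control (for example 1 for a single load balancer), because entries further left were supplied upstream and can be forged by the client. The helper names and the trusted_hops parameter are illustrative assumptions about your setup, not part of Apache or AWS.

```python
def parse_forwarded_for(header):
    """Split a logged X-Forwarded-For value into a list of addresses."""
    if not header or header == "-":
        return []  # Apache logs "-" when the header is absent
    return [part.strip() for part in header.split(",") if part.strip()]

def client_ip(header, remote_addr, trusted_hops=1):
    """Address as seen by the outermost proxy you control.

    trusted_hops (an assumption about your setup) is how many of your own
    proxies append to X-Forwarded-For. Falls back to the peer address (%a/%h).
    """
    addresses = parse_forwarded_for(header)
    if len(addresses) >= trusted_hops >= 1:
        return addresses[-trusted_hops]
    return remote_addr
```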

AWS Documentation:

For Application Load Balancers and Classic Load Balancers with HTTP/HTTPS listeners, you must use X-Forwarded-For headers to capture client IP addresses. Then, you must print those client IP addresses in your access logs.

Resolution

Application Load Balancers and Classic Load Balancers with HTTP/HTTPS Listeners (Apache)

  1. Open your Apache configuration file in your preferred text editor. The location varies by configuration, such as /etc/httpd/conf/httpd.conf for Amazon Linux and RHEL, or /etc/apache2/apache2.conf for Ubuntu.
  2. In the LogFormat section, add %{X-Forwarded-For}i as follows:
    LogFormat "%{X-Forwarded-For}i %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
    LogFormat "%h %l %u %t \"%r\" %>s %b" common
  3. Save your changes.
  4. Reload the Apache service.
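With the AWS-style combined format above, the first field is the raw X-Forwarded-For value, which can itself contain ", " separators, so splitting the line on spaces is unreliable. The Python sketch below (regex and function name are illustrative, not from the AWS guide) anchors on the [time] bracket instead, and allows the \" escapes that mod_log_config uses for quotes inside %r and request headers.

```python
import re

# Matches lines written by:
# LogFormat "%{X-Forwarded-For}i %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
QUOTED = r'(?:[^"\\]|\\.)*'  # quoted field body; allows \" and other backslash escapes
XFF_COMBINED = re.compile(
    r'(?P<forwarded_for>.+?) (?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    rf'"(?P<request>{QUOTED})" (?P<status>\d{{3}}) (?P<size>\S+) '
    rf'"(?P<referer>{QUOTED})" "(?P<user_agent>{QUOTED})"$'
)

def parse_xff_combined(line):
    # The lazy forwarded_for group stops at the first spot where exactly three
    # space-separated fields are followed by "[", so "a, b" stays in one field.
    match = XFF_COMBINED.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return match.groupdict()
```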