Apache access.logs: example of good and bad log entries

Apache and PHP install for testing

First we need to install Apache and the PHP module, and enable user directories (public_html), with the commands:

sudo apt-get install apache2
sudo apt-get install libapache2-mod-php
sudo a2enmod userdir

PHP must then be enabled for user directories. In the file /etc/apache2/mods-available/php7.2.conf, use the comment mark # to disable the lines that prevent PHP from being used in user folders (in the default file this is the <IfModule mod_userdir.c> block).

Then I made the folder /home/user/public_html and a file index.php in it. After that I restarted Apache with the command:

sudo systemctl restart apache2

I added a PHP test (1+1) and a remote code execution function to the index.php file.

!! Do not use this on any publicly available server, as it allows remote code execution.

Making a few log entries

With the command “tail -f access.log” I can follow new entries as they are written to access.log.

First I opened the page normally with the URL: localhost/~joni/

# LOG ENTRY 1
::1 - - [29/Mar/2020:14:47:04 +0300] "GET /~joni/ HTTP/1.1" 200 204 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/80.0.3987.87 Chrome/80.0.3987.87 Safari/537.36"

Then I tried the working remote code execution function on the page with the URL: localhost/~joni/?fun=cat%20/etc/passwd

# LOG ENTRY 2
::1 - - [29/Mar/2020:14:48:30 +0300] "GET /~joni/?fun=cat%20/etc/passwd HTTP/1.1" 200 1169 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/80.0.3987.87 Chrome/80.0.3987.87 Safari/537.36"
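
The query string in entry 2 is percent-encoded, so the command is easier to read once it is decoded. A minimal sketch using Python's standard urllib.parse module (the variable names are only for illustration):

```python
from urllib.parse import urlsplit, parse_qs

# Request target copied from log entry 2.
target = "/~joni/?fun=cat%20/etc/passwd"

# Take the query string and decode percent-escapes such as %20 (a space).
query = urlsplit(target).query
params = parse_qs(query)

print(params)
```

Decoded, the fun parameter holds the shell command that was sent to the page.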

Then I tried a query that the page does not handle, with the URL: localhost/~joni/?hello

# LOG ENTRY 3
::1 - - [29/Mar/2020:14:49:59 +0300] "GET /~joni/?hello HTTP/1.1" 200 204 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/80.0.3987.87 Chrome/80.0.3987.87 Safari/537.36"

Lastly, I tried to view a page that does not exist on the server, with the URL: localhost/~joni/testing

# LOG ENTRY 4
::1 - - [29/Mar/2020:15:44:14 +0300] "GET /~joni/testing HTTP/1.1" 404 488 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/80.0.3987.87 Chrome/80.0.3987.87 Safari/537.36"
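
To compare entries like these programmatically, each line first has to be split into fields. The following is a rough sketch, assuming the entries follow Apache's "combined" layout (host, ident, user, time, request, status, size, referer, user agent). The regular expression and the function name parse_combined are my own illustration, not part of Apache.

```python
import re

# Layout assumed: host ident user [time] "request" status size "referer" "agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_combined(line):
    """Return a dict of fields, or None if the line does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Some size directives log "-" instead of 0; treat that as zero bytes.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

# Log entry 2 from above.
entry = ('::1 - - [29/Mar/2020:14:48:30 +0300] '
         '"GET /~joni/?fun=cat%20/etc/passwd HTTP/1.1" 200 1169 "-" '
         '"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) '
         'Ubuntu Chromium/80.0.3987.87 Chrome/80.0.3987.87 Safari/537.36"')

fields = parse_combined(entry)
print(fields["method"], fields["target"], fields["status"], fields["size"])
```

Requests with unusual quoting may not match this pattern; the sketch returns None for those rather than guessing.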

Comparison

1. "GET /~joni/" "200 204" --> NORMAL
2. "GET /~joni/?fun=cat%20/etc/passwd" "200 1169" --> ANOMALY
3. "GET /~joni/?hello" "200 204" --> NORMAL
4. "GET /~joni/testing" "404 488" --> NORMAL

If the HTTP status code is 200, the page loaded successfully. When the response is also much larger than usual (here 1169 bytes instead of 204), the page has probably done something it normally does not, and the entry should be treated as abnormal.

If the HTTP status code is 404, the entry can be labeled normal directly, because the requested page does not exist and nothing is loaded.
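
The comparison above can be sketched as a small hand-written rule. This is only an illustration of the reasoning, not the machine learning model: the 204-byte baseline is an assumption taken from the normal page load in entry 1, and the function name label is hypothetical.

```python
# (path, status, size) taken from the four log entries above.
entries = [
    ("/~joni/", 200, 204),
    ("/~joni/?fun=cat%20/etc/passwd", 200, 1169),
    ("/~joni/?hello", 200, 204),
    ("/~joni/testing", 404, 488),
]

# Assumption: 204 bytes is the normal response size for this page.
BASELINE_SIZE = 204

def label(status, size, baseline=BASELINE_SIZE):
    """Label one entry following the comparison in the text."""
    if status == 404:
        return "NORMAL"   # page not found, nothing was loaded
    if status == 200 and size > baseline:
        return "ANOMALY"  # page loaded and returned more than usual
    return "NORMAL"

labels = [label(status, size) for _, status, size in entries]
for (path, status, size), result in zip(entries, labels):
    print(path, status, size, result)
```

A fixed baseline only works for a single page with a stable size, which is one reason to let a model learn what normal looks like instead.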

There are many more bad examples. A POST request answered with HTTP 200 can be a bad sign if the server is not meant to accept data. The same goes for status codes in the 3xx range, which can mean network traffic is being redirected through your server. We are going to try to catch these anomalous entries with machine learning.
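
These two extra cases could be written as simple checks. The sketch below is hypothetical: the function name extra_flags and the accepts_uploads switch are made up for illustration, and whether a POST or a redirect is really a problem depends on what the server is supposed to do.

```python
def extra_flags(method, status, accepts_uploads=False):
    """Return a list of reasons why an entry may deserve a closer look."""
    reasons = []
    # A successful POST is suspicious only if the server should not accept data.
    if method == "POST" and status == 200 and not accepts_uploads:
        reasons.append("successful POST on a server not meant to accept data")
    # Any 3xx status means the client was redirected.
    if 300 <= status <= 399:
        reasons.append("redirect (3xx) response")
    return reasons

print(extra_flags("POST", 200))
print(extra_flags("GET", 302))
print(extra_flags("GET", 200))
```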

Apache2 access.log formatting

The easiest way to use Apache2 logs in a machine learning model is to change the log format directly in the Apache2 configuration file /etc/apache2/apache2.conf. The log formats for access.log and other_vhosts_access.log are defined on lines 212-213 of that file. I changed both to the new format shown below.

A) /var/log/apache2/other_vhosts_access.log – These logs are used for the virtual host page.

B) /var/log/apache2/access.log – These are for Apache2 default page logs.

New log format:

LogFormat "\"%{%d/%b/%Y/%T}t\" \"%h\" \"%>s\" \"%B\" \"%D\" \"%m\" \"%U/%q\" \"%H\"" vhost_combined
LogFormat "\"%{%d/%b/%Y/%T}t\" \"%h\" \"%>s\" \"%B\" \"%D\" \"%m\" \"%U/%q\" \"%H\"" combined

New log tags explained:

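%{%d/%b/%Y/%T}t = Timestamp (day/month/year/time)
%h = IP address of the client
%>s = Final status code
%B = Bytes sent, excluding HTTP headers
%D = Time taken to serve the request, in microseconds
%m = Method
%U/%q = Path and query string (the format above puts a literal / between them)
%H = Protocol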

Sample of new log format:

"11/Apr/2020/10:13:18" "192.168.10.61" "200" "12" "157" "GET" "/index.html/?asd asd" "HTTP/1.1"
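
Because every field in the new format is wrapped in double quotes, a line can be split with a shell-style tokenizer. Here is a minimal sketch using Python's standard shlex and datetime modules. The field names in FIELDS and the function parse_line are my own choices for illustration, and the sample is written with the straight quotes Apache actually logs.

```python
import shlex
from datetime import datetime

# One name per tag in the LogFormat line, in order (names are illustrative).
FIELDS = ["time", "host", "status", "bytes", "duration_us",
          "method", "target", "protocol"]

def parse_line(line):
    """Split one quoted, space-separated log line into a dict."""
    values = shlex.split(line)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    # Matches %d/%b/%Y/%T; %b assumes English month names (default locale).
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y/%H:%M:%S")
    for key in ("status", "bytes", "duration_us"):
        record[key] = int(record[key])
    return record

sample = ('"11/Apr/2020/10:13:18" "192.168.10.61" "200" "12" "157" '
          '"GET" "/index.html/?asd asd" "HTTP/1.1"')
record = parse_line(sample)
print(record)
```

The quoting is what keeps the space inside the query string from splitting the path field in two.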

You could remove more tags that the anomaly detection model does not need, but that would make access.log unusable for manual inspection of the log information.

Sources:

List of HTTP status codes: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
