Anomalies in Apache access logs

The topic and the picture tell the task at hand. Here’s where we are at:

Basic functionality

The program we are making is going to be hooked into a log file, and new appearing logs are fed to a pre-trained model, which knows how a normal log file line looks like.

…new appearing logs are fed…

The first part was solved in no time: python has ‘generators’, which allow for kind of monitoring loops:

Screenshot_2020-04-19_09-00-22

The key here is the ‘yield’ keyword, which actually creates the generator. The while-loop in the ‘follow’ function reads lines from a file, and sleeps for a second if no new lines are to be found. Every time a new line is found, the function ‘yields’ the line, and the last 2 lines of the code print it out. This functionality enables use of infinite loops in functions, and has a plethora of possibilities, including data pipelining…

Basics: https://realpython.com/introduction-to-python-generators/
A very detailed presentation about the possibilities: https://www.dabeaz.com/generators/

…a pre-trained model which knows what a normal log line looks like

This is the main problem. We have restricted it to Apache web server logs, and the more we are looking into it, the more it starts to look like that a general system log anomaly detector needs a lot more time and expertise than we have available.

How to build it?

Here is a graphical presentation of the general idea: the flowchart is work in progress. I’ll try to explain it later once I’m more familiar with the concepts employed here…

apache_log_anomaly

How to improve this? I’m going to print it and grab a pencil 😉

 

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s