Master’s thesis
in Engineering Physics
Ericsson R&D & Lund University
During the fall of 2020 and spring of 2021, I conducted my master’s thesis at Ericsson R&D, in collaboration with the Department of Automatic Control at Lund University. The project was titled Automatic Log Based Anomaly Detection in Cloud Operations Using Machine Learning.
- You can find my thesis here: Automatic Log Based Anomaly Detection in Cloud Operations using Machine Learning
- Scope: 30 hp (full semester)
- Collaborated with industry engineers and academic supervisors to address real-world cloud operation challenges
- Conducted research on machine learning methods for anomaly detection in large-scale cloud systems using system log data
- Processed and analyzed high-volume, text-based log datasets in Python
- Designed, implemented and evaluated multiple machine learning models in Python
Abstract
For modern large-scale cloud services, fast and reliable anomaly detection is of utmost importance. Traditionally, developers perform simple keyword searches in the log data, for keywords such as “error” or “fail”. The log data is one of the main data sources that depict the state of the system. In today’s large-scale systems, however, several TB of log messages can be output every day, making manual search highly ineffective.
To address this problem, many anomaly detection methods have been developed, based on the few publicly available log data sets. In this thesis we present a unique data collection method that uses a virtualized OpenStack cloud system to collect log data from six simulated anomaly scenarios. Three different detection methods are presented, using both the dynamic and static parts of the individual log messages.
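As a rough illustration of what the static and dynamic parts of a log message are, here is a minimal Python sketch. It is not the parser used in the thesis: the masking patterns, the `<*>` placeholder, the function name `split_message`, and the example log line are all assumptions made for this example.

```python
import re

# Hypothetical masking rules: each pattern marks a token type that is
# treated as a dynamic parameter. Order matters, most specific first.
DYNAMIC_PATTERNS = [
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",  # UUIDs
    r"\d{1,3}(?:\.\d{1,3}){3}",  # IPv4 addresses
    r"\d+(?:\.\d+)?",  # integers and decimals
]
DYNAMIC_RE = re.compile("|".join(f"(?:{p})" for p in DYNAMIC_PATTERNS))


def split_message(message):
    """Split a log message into a static template and its dynamic parameters."""
    parameters = DYNAMIC_RE.findall(message)     # dynamic part: the variable values
    template = DYNAMIC_RE.sub("<*>", message)    # static part: the constant text
    return template, parameters


# Made-up log line, for illustration only.
line = ("Instance 3f2a1b4c-0d5e-4f6a-8b7c-9d0e1f2a3b4c "
        "spawned in 12.5 seconds on 10.0.0.7")
template, parameters = split_message(line)
print(template)
print(parameters)
```

In this sketch the template identifies the type of event, while the parameter list carries the values that change from one occurrence to the next. A model working on the static parts would see only the sequence or count of templates; a model working on the dynamic parts would see the values.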
The impact of parameters such as time window size is investigated by evaluating the various anomaly types. Among the four conventional machine learning models based on the static parts, a good performance of a 50% detection rate with a 0.35% false alarm rate was achieved. In addition, the results show better LSTM model performance when using the dynamic rather than the static parts. For the LSTM using dynamic parameters, the results depended on the anomaly type and the parameter, with the best average scores around a 55-65% detection rate with a false alarm rate around 0.5-1%.
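To make the two reported metrics concrete, the sketch below computes a detection rate and a false alarm rate from per-window labels. It assumes the common definitions (detection rate as the fraction of anomalous windows that are flagged, false alarm rate as the fraction of normal windows that are flagged). The thesis may define or aggregate them differently, and the function name and example labels are hypothetical.

```python
def detection_and_false_alarm_rate(y_true, y_pred):
    """Return (detection rate, false alarm rate) for binary window labels.

    y_true: 1 if the time window truly contains an anomaly, else 0.
    y_pred: 1 if the model flagged the window as anomalous, else 0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # anomalies that were flagged
    fn = sum(1 for t, p in pairs if t and not p)      # anomalies that were missed
    fp = sum(1 for t, p in pairs if not t and p)      # normal windows flagged
    tn = sum(1 for t, p in pairs if not t and not p)  # normal windows left alone

    # Assumed definitions: TP / (TP + FN) and FP / (FP + TN).
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_alarm_rate


# Made-up labels for six time windows, for illustration only.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0]
print(detection_and_false_alarm_rate(y_true, y_pred))  # (0.5, 0.25)
```

Because normal windows vastly outnumber anomalous ones in log data, even a false alarm rate below 1% can correspond to many flagged windows per day, which is why both numbers are reported together.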
Pipeline for automatic anomaly detection from log data