Improving Process Discovery Results by Filtering Out Outliers from Event Logs with Hidden Markov Models
Published in 2021 IEEE 23rd Conference on Business Informatics (CBI) - Vol. 1, 2021
Recommended citation: Z. Zhang, R. Hildebrant, F. Asgarinejad, N. Venkatasubramanian, S. Ren - "Improving Process Discovery Results by Filtering Out Outliers from Event Logs with Hidden Markov Models", 2021 IEEE 23rd Conference on Business Informatics (CBI), Bolzano, Italy, 2021, pp. 171-180, doi: 10.1109/CBI52690.2021.00028. https://ieeexplore.ieee.org/abstract/document/9610661
Process mining is a technique for extracting process models from event logs. Event logs contain abundant explicit information about events, such as the timestamp and the actions that trigger each event. Much of the existing process mining research has focused on discovering the process models behind these event logs. However, process mining relies on the assumption that event logs accurately represent an ideal set of processes, that is, that the information in the log reflects what is really happening in a given environment. In practice, many event logs contain noisy, infrequent, missing, or false process information, which is generally classified as outliers. Beyond process discovery, many research efforts therefore focus on cleaning event logs to deal with these outliers. In this paper, we present an approach that uses hidden Markov models to filter out outliers from event logs before any process discovery algorithm is applied. Our proposed filtering approach can detect outlier behavior and consequently help process discovery algorithms return models that better reflect the real processes within an organization. Furthermore, we show that this filtering method outperforms two commonly used filtering approaches, namely the Matrix Filter approach and the Anomaly Free Automaton approach, on both artificial and real-life event logs.
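To illustrate the general idea of filtering an event log with a hidden Markov model, the following is a minimal sketch and not the method from the paper. It assumes an HMM whose parameters are already known, scores each trace with the scaled forward algorithm, and drops traces whose average log-likelihood per event falls below a threshold. The names `trace_log_likelihood` and `filter_log`, the toy parameters `START`, `TRANS`, and `EMIT`, and the threshold value are all hypothetical choices made for this example.

```python
import math

def trace_log_likelihood(trace, start, trans, emit):
    """Return log P(trace | HMM) using the scaled forward algorithm."""
    n_states = len(start)
    # Forward probabilities for the first observed activity.
    alpha = [start[s] * emit[s].get(trace[0], 0.0) for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(trace)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states))
                * emit[s].get(trace[t], 0.0)
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the trace is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik

def filter_log(log, start, trans, emit, threshold):
    """Keep traces whose per-event log-likelihood is at least `threshold`."""
    kept = []
    for trace in log:
        if not trace:
            continue  # skip empty traces
        score = trace_log_likelihood(trace, start, trans, emit) / len(trace)
        if score >= threshold:
            kept.append(trace)
    return kept

# Toy model (assumed, not learned): three hidden states that mostly emit
# activities "a", "b", "c" in that order, with a small amount of noise.
START = [0.98, 0.01, 0.01]
TRANS = [
    [0.01, 0.98, 0.01],
    [0.01, 0.01, 0.98],
    [0.01, 0.01, 0.98],
]
EMIT = [
    {"a": 0.98, "b": 0.01, "c": 0.01},
    {"a": 0.01, "b": 0.98, "c": 0.01},
    {"a": 0.01, "b": 0.01, "c": 0.98},
]

event_log = [["a", "b", "c"], ["a", "b", "c"], ["a", "c", "b"], ["c", "b", "a"]]
filtered = filter_log(event_log, START, TRANS, EMIT, threshold=-1.0)
print(filtered)
```

In this sketch, the two traces that follow the expected order are retained, while the reordered traces score poorly and are removed. In a realistic setting the HMM parameters would be estimated from the log itself, and the threshold would need tuning. The filtered log would then be passed to a process discovery algorithm.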