Using Domain Knowledge to Assist Process Scenario Discoveries

Published in 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC), 2022

Recommended citation: Z Zhang, Z Zhu, R Hildebrant, N Venkatasubramanian, S Ren - Using Domain Knowledge to Assist Process Scenario Discoveries. In 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC) (pp. 226–288). IEEE. https://doi.org/10.1109/COMPSAC54236.2022.00047 https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9842728

Trace clustering techniques are often used to assist in discovering different process scenarios. Most existing trace clustering methods aim to partition the event log into subsets in which the event traces most likely belong to the same scenario. Typically, the partitioning is based on certain patterns, on the similarity between traces, or on discovering a process model for each cluster of traces. However, most algorithms achieve this using the event log alone, without allowing a domain expert to influence the discovery in any way. The domain expert may have knowledge that could be exploited to create better process scenario models. This paper presents a density-based trace clustering technique that incorporates expert knowledge to assist process scenario discovery. A real wastewater treatment process provided by a domain expert is used as a case study to investigate the effectiveness and validity of the approach. We also use five real-life event logs to compare the performance of the proposed approach for process scenario discovery with that of the commonly used k-means clustering approach. To measure the validity of our results, we use the weighted average fitness, the weighted average precision, and the F1 score. The experimental data show that (1) the proposed approach is able to discover process scenarios from event logs by incorporating domain knowledge and (2) the process models obtained with the proposed approach have higher fitness, precision, and F1 scores than the models obtained with the commonly used k-means clustering approach.
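
As a rough illustration of the evaluation metrics named in the abstract, the sketch below computes a weighted average fitness, a weighted average precision, and an F1 score over a set of clusters. It is not taken from the paper. It assumes that each cluster's score is weighted by the number of traces in that cluster and that F1 is the harmonic mean of the two weighted averages. The function names and the numbers are hypothetical.

```python
from typing import Sequence


def weighted_average(values: Sequence[float], sizes: Sequence[int]) -> float:
    """Average per-cluster scores, weighting each by its cluster size (assumed weighting)."""
    if len(values) != len(sizes):
        raise ValueError("values and sizes must have the same length")
    total = sum(sizes)
    if total == 0:
        raise ValueError("at least one cluster must contain traces")
    return sum(v * n for v, n in zip(values, sizes)) / total


def f1_score(fitness: float, precision: float) -> float:
    """Harmonic mean of fitness and precision; 0.0 when both are zero."""
    if fitness + precision == 0:
        return 0.0
    return 2 * fitness * precision / (fitness + precision)


# Hypothetical per-cluster results, for illustration only.
cluster_sizes = [50, 30, 20]          # number of traces in each cluster
cluster_fitness = [0.9, 0.8, 1.0]     # fitness of the model discovered per cluster
cluster_precision = [0.7, 0.6, 0.5]   # precision of the model discovered per cluster

avg_fitness = weighted_average(cluster_fitness, cluster_sizes)
avg_precision = weighted_average(cluster_precision, cluster_sizes)
f1 = f1_score(avg_fitness, avg_precision)

print(f"weighted fitness={avg_fitness:.3f}, "
      f"weighted precision={avg_precision:.3f}, F1={f1:.3f}")
```

Weighting by cluster size keeps a very small cluster with a near-perfect model from dominating the overall score. Other weighting schemes are possible.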