Posts by Collection

portfolio

publications

Improving Process Discovery Results by Filtering Out Outliers from Event Logs with Hidden Markov Models

Process mining is a technique for extracting process models from event logs. Event logs contain abundant explicit information about events, such as their timestamps and the actions that trigger them. Much of the existing process mining research has focused on discovering the process models behind these event logs. However, process mining relies on the assumption that event logs accurately represent an ideal set of processes, that is, that the information contained in a log reflects what is really happening in a given environment. In practice, many event logs contain noisy, infrequent, missing, or false process information, which is generally classified as outliers. Beyond process discovery, many research efforts have therefore focused on cleaning event logs to deal with these outliers. In this paper, we present an approach that uses hidden Markov models to filter out outliers from event logs before any process discovery algorithm is applied. Our proposed filtering approach detects outlier behavior and consequently helps process discovery algorithms return models that better reflect the real processes within an organization. Furthermore, we show that this filtering method outperforms two commonly used filtering approaches, namely the Matrix Filter approach and the Anomaly-Free Automaton approach, on both artificial and real-life event logs.

Recommended citation: Z. Zhang, R. Hildebrant, F. Asgarinejad, N. Venkatasubramanian, S. Ren - "Improving Process Discovery Results by Filtering Out Outliers from Event Logs with Hidden Markov Models", 2021 IEEE 23rd Conference on Business Informatics (CBI), Bolzano, Italy, 2021, pp. 171-180, doi: 10.1109/CBI52690.2021.00028. https://ieeexplore.ieee.org/abstract/document/9610661

Empirical Studies of Three Commonly Used Process Mining Algorithms

Process mining aims to extract useful process knowledge and provide valuable insights to better understand, monitor, and improve current business processes. The most critical learning task in process mining is process discovery, which takes an event log as input and generates a process model as output. Over the last two decades, the process mining community has proposed several process discovery algorithms, many of which are based on or extend three commonly used algorithms: the α algorithm, the Heuristic algorithm, and the Inductive algorithm. This study evaluates these three algorithms using both artificial and real-life event logs, examining the impact of dependency patterns, noise, and complexity. Our work aims to provide clear guidelines that help academics and business organizations choose the most appropriate process discovery algorithm for discovering their hidden process models.

Recommended citation: W Peng, Z Zhang, R Hildebrant, S Ren - "Empirical Studies of Three Commonly Used Process Mining Algorithms", 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2021. https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9658861

A Generalizable Approach for Determining The Sensitivity of A Trace within An Event Log

In this Research-In-Progress work, we present a potentially generalizable approach for determining the sensitivity of an attribute, or a set of attributes, associated with an event in a given event log. The approach is based on the equivalence class that a given event or trace forms, and it associates the sensitivity of that event or trace with the size of its equivalence class. Because different equivalence classes can be formed for a given event depending on which attributes are considered, the proposed approach gives researchers a more granular tool for applying group-based privacy to event logs.

Recommended citation: R Hildebrant, Z Zhang, S Ren - "A Generalizable Approach for Determining The Sensitivity of A Trace within An Event Log", EMISA Forum: Vol. 41, No. 1, 2021
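The equivalence-class idea in this abstract can be illustrated with a minimal Python sketch. It is not the method from the paper: it assumes events are plain dicts of attribute values, groups events by their values on a chosen attribute subset, and uses a hypothetical score of 1 / (class size), in the spirit of k-anonymity, so that an event whose values are shared by no other event scores 1.0. The function names and the sample log are illustrative only.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence

# Assumption: an event is a plain dict mapping attribute name -> value.
Event = Dict[str, Hashable]


def equivalence_class_sizes(events: List[Event], attributes: Sequence[str]) -> List[int]:
    """For each event, return the size of the equivalence class it belongs to
    when events are grouped by their values on the chosen attributes."""
    keys = [tuple(event.get(attr) for attr in attributes) for event in events]
    counts = Counter(keys)
    return [counts[key] for key in keys]


def sensitivity_scores(events: List[Event], attributes: Sequence[str]) -> List[float]:
    """Hypothetical score: 1 / class size, so a unique event scores 1.0."""
    return [1.0 / size for size in equivalence_class_sizes(events, attributes)]


# Hypothetical event log used only for illustration.
log = [
    {"activity": "Register", "resource": "Alice"},
    {"activity": "Register", "resource": "Bob"},
    {"activity": "Register", "resource": "Alice"},
    {"activity": "Approve", "resource": "Carol"},
]

print(equivalence_class_sizes(log, ["activity"]))              # [3, 3, 3, 1]
print(equivalence_class_sizes(log, ["activity", "resource"]))  # [2, 1, 2, 1]
print(sensitivity_scores(log, ["activity", "resource"]))       # [0.5, 1.0, 0.5, 1.0]
```

Adding attributes to the grouping key can only split classes, never merge them, which is why the second event ("Register" by Bob) becomes unique once the resource is considered.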

Using Domain Knowledge to Assist Process Scenario Discoveries

Trace clustering techniques are often used to assist in discovering different process scenarios. Most existing trace clustering methods partition the event log into subsets such that traces in the same subset most likely belong to the same scenario. Typically, the partitioning is based on certain patterns, on the similarity between traces, or on a process model discovered for each cluster of traces. However, most algorithms rely solely on the event log and do not allow domain experts to influence the discovery in any way, even though their expertise could be exploited to create better process scenario models. This paper presents a density-based trace clustering technique that uses expert knowledge to improve process scenario discovery. A real wastewater treatment process provided by a domain expert serves as a case study to investigate the effectiveness and validity of the approach. We also use five real-life event logs to compare the performance of the proposed approach for process scenario discovery with that of the commonly used k-means clustering approach. To measure the validity of our results, we use the weighted average fitness, the weighted average precision, and the F1 score. The experimental results show that (1) the proposed approach is able to discover process scenarios from event logs by incorporating domain knowledge, and (2) the process models obtained with the proposed approach have higher fitness, precision, and F1 scores than the models obtained with the commonly used k-means clustering approach.

Recommended citation: Z Zhang, Z Zhu, R Hildebrant, N Venkatasubramanian, S Ren - "Using Domain Knowledge to Assist Process Scenario Discoveries", 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC), IEEE, 2022, pp. 226–288, doi: 10.1109/COMPSAC54236.2022.00047. https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9842728
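The evaluation measures named in this abstract can be sketched in a few lines of Python. This is an illustration, not the paper's evaluation code: it assumes per-cluster fitness and precision values have already been computed by a conformance checking tool, that each cluster is weighted by its number of traces, and that the F1 score is the harmonic mean of the weighted fitness and weighted precision, a common definition in the process mining literature. The cluster sizes and scores below are hypothetical.

```python
from typing import Sequence


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Average of per-cluster values, weighted (by assumption) by cluster size."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and of equal length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def f1_score(fitness: float, precision: float) -> float:
    """Harmonic mean of fitness and precision; 0.0 when both are zero."""
    if fitness + precision == 0:
        return 0.0
    return 2 * fitness * precision / (fitness + precision)


# Hypothetical per-cluster results: traces per cluster, fitness, precision.
cluster_sizes = [30, 10]
fitness = [0.9, 0.7]
precision = [0.8, 0.6]

avg_fitness = weighted_average(fitness, cluster_sizes)      # about 0.85
avg_precision = weighted_average(precision, cluster_sizes)  # 0.75
print(round(f1_score(avg_fitness, avg_precision), 3))       # 0.797
```

Weighting by trace count keeps a large, well-modeled cluster from being outweighed by a small, poorly modeled one; an unweighted mean would treat both clusters equally.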

PACE: Preventing Attacks on Case Identities in Event Logs Through Attribute Generalizations

Process mining is an emerging research field that analyzes event logs to build graphical models and provides businesses with new insights that allow them to make process-driven decisions. While process mining offers many benefits, some businesses and researchers are hesitant to adopt it in real applications because event logs contain sensitive attribute data. To address this issue, researchers have developed tools and frameworks that apply privacy protections to event logs. However, these works consider privacy attacks only from a control-flow perspective and do not fully address the potential privacy leaks that attributes can create. In PACE, we introduce a privacy-enhancing framework that focuses on the generalization of attribute values based on different organizational perspectives. The framework comprises three components: control-flow anonymization, heuristic-driven hierarchy selection for anonymizing attributes, and the application of attribute generalizations based on a perspective. To assess our framework, we apply PACE to the BPIC 2013 event log, measure the retained precision of handovers and the effect of the anonymized logs on decision trees, and present a sensitivity analysis of our privacy-preserving logs. Additionally, we show that PACE greatly outperforms a state-of-the-art differential privacy tool on the same organizational mining tasks.

Recommended citation: R Hildebrant - "PACE: Preventing Attacks on Case Identities in Event Logs Through Attribute Generalizations". https://www.proquest.com/docview/2679654624?pq-origsite=gscholar&fromopenview=true

talks

teaching

Introduction to Computer Programming at San Diego State University - Fall 2021

Undergraduate course, San Diego State University, Department of Computer Science, 2021

Instructor for a class of 70 students, teaching the principles of the Java programming language, including variables, arrays, classes, and data structures. Responsibilities included creating homework, exams, and in-class lecture material. I was rated the top student instructor and was offered another position in Spring 2022.

Introduction to Computer Programming at San Diego State University - Spring 2022

Undergraduate course, San Diego State University, Department of Computer Science, 2022

Instructor for a class of 70 students, teaching the principles of the Java programming language, including variables, arrays, classes, and data structures. Responsibilities included creating homework, exams, and in-class lecture material.

Intermediate Computer Programming Lab at San Diego State University - Summer 2022

Undergraduate course, San Diego State University, Department of Computer Science, 2022

Instructor for a class of 25 students, whom I split into project teams and tasked with creating a Java application for a coffee ordering system. Each week, the students were required to demonstrate their progress and new features. By the end of the course, 12 groups had their own functioning Java application. In this course, I taught elements of class design, abstract data types, design patterns, and the basics of algorithm design.

Graduate Operating Systems at University of California, Irvine - Fall 2022

Graduate course, University of California, Irvine, Department of Computer Science, 2022

Teaching assistant for a class of 108 Master’s students. In this course, I was responsible for creating lecture material, homework assignments, and discussion materials. I presented five discussion sections covering a wide range of operating systems topics.

Intermediate Programming in Python at University of California, Irvine - Winter 2023

Undergraduate course, University of California, Irvine, Department of Computer Science, 2023

Teaching assistant for two programming labs with 110 students. I am responsible for answering questions about projects that involve advanced Python concepts, e.g., object-oriented programming, file reading/writing, database access, and GUI creation. Beyond my lab duties, I grade these projects and provide feedback.