A framework for optimisation based stochastic process discovery

Pierre Cry, András Horváth, Paolo Ballarini, Pascale Le Gall · June 16, 2024

Summary

This paper presents a novel stochastic process mining approach that enhances traditional Petri net models by assigning optimal weights to transitions, so that the model reproduces the trace frequencies observed in event logs. The weights are optimised with respect to either the likelihood of the log or the Earth Mover's Distance (EMD) between the model's and the log's stochastic languages, so that the model captures the stochastic nature of the process. Unlike existing work, the approach targets trace probabilities directly and computes them by unfolding the net's reachability graph into a directed acyclic graph. Experiments on real-life logs show improved model fitness compared with non-optimised methods, and the optimisation algorithm achieves lower distances between the model's and the log's stochastic languages than other weight estimators. Future work includes reducing the cost of evaluating the net's stochastic language and exploring alternative distance measures.

Paper digest

What problem does the paper attempt to solve? Is this a new problem?

The paper addresses the problem of accurately reproducing the stochastic behavior of a system from observed data, specifically stochastic process discovery with Petri nets. It introduces a weight estimation framework that searches for the weights of a stochastic Petri net that most closely replicate the stochastic language of the observed log. The problem is not entirely new: existing stochastic discovery approaches already use Petri net weight estimators, but these are limited in how accurately they capture the stochastic behavior of the system. The paper's contribution is a new framework that overcomes these limitations and produces more accurate models from real-life event logs.
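To make the role of the weights concrete, the usual semantics of a weighted (stochastic) Petri net can be written as follows. This is a standard formulation assumed here, not a quotation from the paper, and the symbols w, En(m), λ and ρ are our own notation. In a marking m, an enabled transition t fires with probability proportional to its weight, and the probability of a trace σ sums over all complete firing sequences ρ whose visible labels λ(ρ) spell σ:

```latex
P(t \mid m) = \frac{w(t)}{\sum_{t' \in \mathrm{En}(m)} w(t')}, \qquad t \in \mathrm{En}(m)

P_w(\sigma) = \sum_{\rho = t_1 \cdots t_n \,:\, \lambda(\rho) = \sigma} \; \prod_{i=1}^{n} P(t_i \mid m_{i-1})
```

Here m_0 is the initial marking, m_i is reached by firing t_i in m_{i-1}, m_n is the final marking, and silent transitions contribute no label to λ(ρ). Weight estimation then amounts to choosing w so that P_w(σ) matches the relative frequency of σ in the log.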


What scientific hypothesis does this paper seek to validate?

The paper's hypothesis is that optimising the transition weights of a mined Petri net yields a stochastic Petri net whose stochastic language closely replicates that of the executions recorded in the event log. The weights are optimised either by maximising the likelihood of the log or by minimising the earth mover's distance between the two languages. The aim is to improve the accuracy of stochastic process mining, and the hypothesis is tested through experiments on real-life event logs.


What new ideas, methods, or models does the paper propose? What are the characteristics and advantages compared to previous methods?

The paper introduces a novel framework for optimisation-based stochastic process discovery that aims to improve the accuracy of process models derived from event logs. It focuses on weight estimation for stochastic Petri nets to better capture the probabilistic behavior of systems. Unlike existing approaches, which use Petri net weight estimators based on summary statistics, the framework uses an optimization scheme to search for the weights that most closely replicate the stochastic language of the observed system. The distance driving the optimization can be either likelihood-based or earth mover's based; the former is computationally lighter and preferable for large logs.
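As a minimal sketch of the likelihood-based objective, assuming the model's trace probabilities have already been computed, the negative log-likelihood of a log can be evaluated as below. The function name and the `eps` floor for traces the model cannot produce are our assumptions, not taken from the paper. Only the probabilities of traces that actually occur in the log are needed, which is consistent with the remark that the likelihood variant is the lighter of the two.

```python
import math
from collections import Counter


def neg_log_likelihood(log: Counter, trace_prob: dict, eps: float = 1e-12) -> float:
    """Negative log-likelihood of an event log under a model's stochastic language.

    log        -- trace variants (tuples of activity labels) mapped to their frequencies
    trace_prob -- probability the model assigns to each trace; traces missing
                  from the dict are treated as having probability 0
    eps        -- floor applied to zero probabilities so the result stays finite
    """
    total = 0.0
    for trace, count in log.items():
        p = max(trace_prob.get(trace, 0.0), eps)
        total -= count * math.log(p)
    return total
```

Minimising this quantity over the weights is the same as maximising the likelihood of the log; the empirical trace distribution of the log is the best any model can achieve.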

One key aspect of the paper is how the stochastic language of the Petri net is evaluated: the corresponding reachability graph is unfolded, and the weights are obtained by minimisation with respect to a stochastic language distance. The paper presents this divergence-based numerical weight optimization as a novel approach in the field. It also points to other types of language distance measures, such as entropy-related ones, as a way to handle infinite languages more effectively.
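The following hedged sketch shows how trace probabilities can be computed by unfolding a reachability graph along one trace, in the spirit of the unfolding described above; it is not the authors' Algorithm 1. The graph encoding (`graph[state] = [(label, weight, next_state), ...]`, with `None` for silent transitions) and the function name are hypothetical. The sketch assumes that silent transitions form no cycle, so the unfolding is a finite DAG, and that final states have no outgoing transitions.

```python
from functools import lru_cache


def trace_probability(graph, initial, finals, trace):
    """Probability that the weighted net emits exactly `trace` and then terminates.

    graph[state] lists the enabled transitions as (label, weight, next_state);
    a transition's firing probability is its weight divided by the total weight
    of all transitions enabled in that state. Node (state, k) of the unfolding
    means "in `state` after emitting the first k labels of the trace".
    """
    trace = tuple(trace)
    on_stack = set()  # nodes currently being expanded, used to detect silent cycles

    @lru_cache(maxsize=None)
    def prob_from(state, k):
        if state in finals:
            return 1.0 if k == len(trace) else 0.0
        edges = graph.get(state, [])
        total_w = sum(w for _, w, _ in edges)
        if total_w == 0:
            return 0.0  # deadlock outside a final state
        if (state, k) in on_stack:
            raise ValueError("cycle of silent transitions: the unfolding is not a DAG")
        on_stack.add((state, k))
        p = 0.0
        for label, w, nxt in edges:
            if label is None:  # silent step: no label consumed
                p += (w / total_w) * prob_from(nxt, k)
            elif k < len(trace) and label == trace[k]:  # visible step must match
                p += (w / total_w) * prob_from(nxt, k + 1)
        on_stack.discard((state, k))
        return p

    return prob_from(initial, 0)
```

For a net that fires a and then chooses between b (weight 3) and a silent step followed by c (weight 1), this yields 0.75 for ⟨a, b⟩ and 0.25 for ⟨a, c⟩. Memoising on (state, k) means shared sub-paths of the unfolding are evaluated only once.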

The paper also outlines the phases of its discovery methodology. A workflow net is first mined from the event log with a discovery algorithm and becomes a stochastic workflow net (sWN) once its transitions carry weights; an optimization engine then repeatedly evaluates the stochastic language of the sWN to search for the optimal transition weights. The goal is a weight function under which the stochastic language of the sWN closely resembles that of the event log, improving the accuracy of the resulting process models. The main characteristics and advantages of the framework compared with previous methods are summarised below.

Characteristics:

  • The framework focuses on weight estimation for stochastic Petri nets to capture the probabilistic behavior of systems accurately.
  • It uses an optimization scheme to search for the weights that most closely replicate the stochastic language of the observed system.
  • It offers a choice between likelihood-based and earth mover's based distances for the optimization, the former being computationally lighter and preferable for large logs.
  • It evaluates the stochastic language of the Petri net by unfolding the corresponding reachability graph and minimizes a stochastic language distance.
  • It uses divergence-based numerical weight optimization, presented as a novel approach in the field.
  • It points to other types of language distance measures, such as entropy-related ones, as a way to handle infinite languages more effectively.

Advantages:

  • It aims to improve the accuracy of process models derived from event logs by searching for the weights that most closely reproduce the log's stochastic language.
  • Unlike existing weight estimators based on summary statistics, it offers a more robust and accurate approach to weight estimation.
  • By combining an optimization scheme with a choice of distance measures, it yields considerably more accurate models than alternative approaches, as the experiments on real-life event logs show.
  • It addresses a limitation of previous methods, namely that existing weight estimators do not accurately capture the stochastic behavior of observed systems.
  • It provides a structured methodology for discovering stochastic process models from event logs, improving model accuracy in process mining.
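For contrast with the optimisation approach, here is a simplified summary-statistics estimator of the kind the advantages above refer to: each visible transition is weighted by how often its activity occurs in the log. This is an illustrative simplification, not a faithful reimplementation of any published estimator; the function name, the default weight for silent transitions and the small floor for unseen activities are all assumptions.

```python
from collections import Counter


def frequency_weights(log: Counter, transitions: dict, silent_weight: float = 1.0) -> dict:
    """Hypothetical summary-statistics weight estimator (no optimisation involved).

    log         -- trace variants (tuples of labels) mapped to their frequencies
    transitions -- transition id -> activity label, or None for a silent transition
    """
    label_counts = Counter()
    for trace, count in log.items():
        for label in trace:
            label_counts[label] += count  # occurrences of each activity in the log
    weights = {}
    for t, label in transitions.items():
        if label is None:
            weights[t] = silent_weight  # assumed default for silent transitions
        else:
            # Unseen activities would get weight 0; a tiny floor keeps them enabled.
            weights[t] = float(max(label_counts[label], 1e-6))
    return weights
```

Because the weights are read off the log in a single pass, such an estimator is cheap, but nothing in it checks that the resulting trace probabilities match the log, which is exactly what the optimisation targets.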

Overall, the emphasis on optimization-based weight estimation, on evaluating the net's stochastic language, and on a choice of distance measures sets the framework apart from previous methods, offering improved accuracy and robustness in stochastic process discovery.


Do any related studies exist? Who are the noteworthy researchers on this topic in this field? What is the key to the solution mentioned in the paper?

Several related works exist in the field of stochastic process discovery. Noteworthy researchers include Adriano Augusto, Raffaele Conforti, Marlon Dumas, Marcello La Rosa, Artem Polyvyanyy, Adam Burke, Sander J. J. Leemans, Moe Thandar Wynn, Pierre Cry, András Horváth, Paolo Ballarini, Pascale Le Gall, Wil van der Aalst, Arya Adriansyah, Boudewijn van Dongen, and many others.

The key to the solution is an optimisation framework that searches for the weights yielding a stochastic Petri net that closely reproduces the log's stochastic language. The framework evaluates the net's stochastic language by unfolding the corresponding reachability graph and minimises a stochastic language distance. Users can drive the optimisation with either a likelihood-based or an earth mover's based distance; the likelihood-based variant is computationally lighter and preferred for large logs.
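The earth mover's variant can be illustrated with the following self-contained sketch, which computes a plain EMD between two finite stochastic languages of equal total mass. The paper's rEMD and tEMD are restricted and truncated variants whose exact definitions are not given here, so they are not reproduced. Using the normalised Levenshtein distance between traces as the ground distance is our assumption (it is a common choice in stochastic conformance checking), and the transportation problem is solved with a small pure-Python min-cost-flow routine rather than an LP library.

```python
def levenshtein(a, b):
    """Edit distance between two traces (sequences of activity labels)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def norm_lev(a, b):
    """Levenshtein distance divided by the longer trace length (0 for two empty traces)."""
    n = max(len(a), len(b))
    return 0.0 if n == 0 else levenshtein(a, b) / n


def emd(p, q, dist=norm_lev, eps=1e-12):
    """Earth mover's distance between finite stochastic languages p and q.

    p, q map traces to probabilities and are assumed to have equal total mass.
    Solved as a min-cost transportation problem by successive shortest paths,
    using Bellman-Ford on the residual graph; intended for small languages only.
    """
    src, dst = list(p), list(q)
    n, m = len(src), len(dst)
    S, T = n + m, n + m + 1
    adj = [[] for _ in range(n + m + 2)]  # edge: [to, residual cap, cost, reverse index]

    def add_edge(u, v, cap, cost):
        adj[u].append([v, cap, cost, len(adj[v])])
        adj[v].append([u, 0.0, -cost, len(adj[u]) - 1])

    for i, s in enumerate(src):
        add_edge(S, i, p[s], 0.0)
        for j, t in enumerate(dst):
            add_edge(i, n + j, float("inf"), dist(s, t))
    for j, t in enumerate(dst):
        add_edge(n + j, T, q[t], 0.0)

    total = 0.0
    while True:
        # Bellman-Ford shortest path from S over edges with residual capacity.
        d = [float("inf")] * len(adj)
        d[S] = 0.0
        prev = [None] * len(adj)  # (node, edge index) used to reach each node
        for _ in range(len(adj) - 1):
            changed = False
            for u in range(len(adj)):
                if d[u] == float("inf"):
                    continue
                for k, (v, cap, cost, _) in enumerate(adj[u]):
                    if cap > eps and d[u] + cost < d[v] - eps:
                        d[v] = d[u] + cost
                        prev[v] = (u, k)
                        changed = True
            if not changed:
                break
        if prev[T] is None:
            return total  # no augmenting path left: all mass has been moved
        # Push as much mass as the path's bottleneck allows.
        flow, v = float("inf"), T
        while v != S:
            u, k = prev[v]
            flow = min(flow, adj[u][k][1])
            v = u
        v = T
        while v != S:
            u, k = prev[v]
            adj[u][k][1] -= flow
            adj[v][adj[u][k][3]][1] += flow
            v = u
        total += flow * d[T]
```

For example, moving half of the mass of ⟨a, b⟩ onto ⟨a, c⟩ (normalised Levenshtein distance 0.5) costs 0.25. Unlike the likelihood, the EMD needs probabilities for every trace on both sides plus a pairwise distance matrix, which plausibly explains why it is the heavier option for large logs.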


How were the experiments in the paper designed?

The experiments were designed to test the weight optimisation procedure of Algorithm 2, which relies on the trace-probability computation (TraceProbabilities) of Algorithm 1. The implementation includes a "convergence" parameter δ that controls when the parameter search ends: it stops either after a specified number of iterations or once the minimised distance has converged to within δ. Several real-life event logs of varying complexity were used, mainly from the Business Process Intelligence (BPI) Challenge, all freely accessible online. For each log, a workflow net (WN) was mined with the inductive miner, and its transition weights were then estimated to obtain the corresponding stochastic workflow net (sWN). The log-likelihood (LH), restricted Earth Mover's Distance (rEMD) and truncated Earth Mover's Distance (tEMD) obtained with the optimisation scheme were compared with those obtained with other weight estimators and stochastic process discovery approaches.
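The stopping rule described above can be sketched as a simple optimisation loop. This is not the authors' Algorithm 2: the paper's optimiser is not specified here, so plain gradient descent with finite-difference gradients stands in for it, and the names `optimise_weights`, `step`, `max_iter`, `h` and `toy_nll` are hypothetical. Only `delta` mirrors the convergence parameter δ. Weights are parameterised as exp(θ) so that they always stay positive.

```python
import math


def optimise_weights(objective, theta0, delta=1e-8, step=0.1, max_iter=10_000, h=1e-6):
    """Minimise objective(weights) over positive weights (hypothetical sketch).

    Weights are exp(theta), so every real theta gives valid positive weights.
    Gradient descent with central-difference gradients stands in for the paper's
    optimiser. Stops after max_iter iterations, or as soon as the objective
    changes by less than delta between two consecutive iterations.
    """
    def f(theta):
        return objective([math.exp(x) for x in theta])

    theta = list(theta0)
    value = f(theta)
    it = 0
    for it in range(1, max_iter + 1):
        grad = []
        for i in range(len(theta)):
            up, down = theta[:], theta[:]
            up[i] += h
            down[i] -= h
            grad.append((f(up) - f(down)) / (2 * h))
        theta = [x - step * g for x, g in zip(theta, grad)]
        new_value = f(theta)
        converged = abs(value - new_value) < delta
        value = new_value
        if converged:
            break
    return [math.exp(x) for x in theta], value, it


def toy_nll(weights):
    """Toy objective: free choice between traces <a> and <b>, log = {<a>: 3, <b>: 1}.

    Returns the mean negative log-likelihood per trace.
    """
    p_a = weights[0] / (weights[0] + weights[1])
    return -(3 * math.log(p_a) + 1 * math.log(1 - p_a)) / 4
```

Calling `optimise_weights(toy_nll, [0.0, 0.0])` drives the relative weight of a towards 0.75, the empirical frequency of ⟨a⟩ in the toy log, which is what maximum likelihood should give for a single free choice. In a real setting, the objective would wrap a trace-probability computation and a distance such as LH or an EMD variant.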


What is the dataset used for quantitative evaluation? Is the code open source?

The quantitative evaluation uses event logs from the Business Process Intelligence (BPI) Challenge, which are freely accessible at https://data.4tu.nl/. The code of the prototype tool developed for the study is open source and publicly available in the Git repository at https://github.com/DocPierro/optimised_spd.
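Assuming the BPI Challenge logs are in the XES format (they are typically distributed as XES files, often gzip-compressed), the log's stochastic language can be extracted with the Python standard library alone, as in this sketch. The function names are hypothetical, and the sketch loads the whole document into memory, so `xml.etree.ElementTree.iterparse` would be a better fit for large logs.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def _local(tag):
    """Strip an XML namespace prefix such as '{http://www.xes-standard.org/}'."""
    return tag.rsplit("}", 1)[-1]


def xes_variants(xes_text: str, key: str = "concept:name") -> Counter:
    """Count trace variants (sequences of activity names) in an XES document.

    Trace-level attributes are ignored; events without a `key` string attribute
    are skipped.
    """
    root = ET.fromstring(xes_text)
    variants = Counter()
    for trace in root:
        if _local(trace.tag) != "trace":
            continue
        labels = []
        for event in trace:
            if _local(event.tag) != "event":
                continue
            for attr in event:
                if _local(attr.tag) == "string" and attr.get("key") == key:
                    labels.append(attr.get("value"))
                    break
        variants[tuple(labels)] += 1
    return variants


def stochastic_language(variants: Counter) -> dict:
    """Relative frequency of each trace variant: the log's stochastic language."""
    n = sum(variants.values())
    return {trace: c / n for trace, c in variants.items()}
```

A compressed log could be read with `gzip.open(path, "rt").read()` before calling `xes_variants`.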


Do the experiments and results in the paper provide good support for the scientific hypotheses that need to be verified? Please analyze.

The experiments and results provide strong support for the paper's hypothesis. The weight estimation framework aims to accurately reproduce the stochastic behavior of observed systems, and the experiments on real-life event logs show that it yields considerably more accurate models than alternative approaches: the optimisation scheme, driven by either the likelihood-based or the earth mover's based distance, outperforms existing weight estimators and stochastic discovery approaches. The paper also reports how the distance between the net's and the log's stochastic languages converges over the iterative minimisation steps, which gives insight into the progress of the optimisation. Overall, the results offer solid empirical evidence that the framework produces models whose stochastic behavior closely resembles that of the observed system.


What are the contributions of this paper?

The contributions of the paper include:

  • Introducing a novel weight estimation framework for stochastic process discovery that optimizes weights to closely reproduce the log's stochastic language.
  • Providing an optimization scheme that searches for the weights of a stochastic Petri net that best reflect the observed system's stochastic behavior.
  • Offering a choice between likelihood-based and earth mover's based distances to drive the optimization, the former being computationally lighter for large logs.
  • Demonstrating through experiments on real-life event logs that the framework yields considerably more accurate models than alternative approaches.
  • Outlining future developments, such as reducing the complexity of evaluating the net's stochastic language and considering other distances, like entropy-related ones, for dealing with infinite languages.

What work can be continued in depth?

Further work in stochastic process discovery can focus on reducing the complexity of evaluating the net's stochastic language, which is currently the bottleneck of the framework. Exploring other types of language distance, such as entropy-related measures, could also help to deal with infinite languages more effectively. These advances would improve both the accuracy and the efficiency of stochastic models derived from event logs.
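As one concrete example of an entropy-related quantity (our illustrative choice; which measures the authors have in mind is not specified), the Kullback-Leibler divergence from the log's language to the model's only needs model probabilities on the log's finite support, so it stays computable even when the model's language is infinite. Since KL(L‖M) equals the cross-entropy minus the entropy H(L) of the log, and H(L) does not depend on the weights, minimising it is equivalent to maximising the likelihood, so on its own it would add nothing beyond the LH objective. The function names are hypothetical.

```python
import math


def entropy(lang: dict) -> float:
    """Shannon entropy (in nats) of a finite stochastic language."""
    return -sum(p * math.log(p) for p in lang.values() if p > 0)


def kl_divergence(log_lang: dict, model_prob: dict, eps: float = 1e-12) -> float:
    """KL(log || model), summed over the log's finite support only.

    The model's language may be infinite: only the probabilities it assigns
    to traces seen in the log are looked up. `eps` floors zero probabilities.
    """
    return sum(
        p * math.log(p / max(model_prob.get(t, 0.0), eps))
        for t, p in log_lang.items()
        if p > 0
    )
```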


Outline
Introduction
  Background
    Evolution of process mining techniques
    Limitations of traditional Petri nets
  Objective
    Introduce the need for stochastic modeling in process mining
    Goal: Enhance model accuracy with trace probabilities
Method
  Data Collection
    Event Logs
      Source and characteristics of real-life event logs
      Data preprocessing steps
  Data Preprocessing
    Cleaning and filtering event logs
    Handling missing and incomplete data
    Trace Frequency Analysis
      Calculation of transition frequencies
  Stochastic Model Enhancement
    Maximum Likelihood Estimation
      Formulation of the likelihood function
    Earth Mover's Distance (EMD)
      Definition and application in weight optimization
      Unfold Directed Acyclic Graph (UDAG) approach
  Weight Optimization
    Algorithmic implementation
    Comparison with existing weight estimators
    Model Fitness Evaluation
      Model fitness using stochastic languages
      Improved model accuracy over non-optimized methods
Experiments and Results
  Real-life Log Demonstrations
    Case studies with event logs
    Model fitness comparison
  Performance Evaluation
    Distance measures between model and log
    Optimization algorithm's effectiveness
    Complexity analysis
Future Research Directions
  Enhancing evaluation complexity
  Exploring alternative distance measures
  Extensions and applications to other domains
Conclusion
  Summary of key contributions
  Implications for process mining practice and theory

© 2025 Powerdrill. All rights reserved.