bupaR: Business Process Analysis with R

Organizations now store huge amounts of data about their business processes. Process mining provides methods and techniques to analyze and improve these processes, which can give companies a competitive advantage. The field began with the discovery of workflow models from event data, but over the past 20 years it has evolved into a broad and diverse research discipline.

bupaR is an open-source suite for handling and analyzing business process data in R. It was developed by the Business Informatics research group at Hasselt University, Belgium. The central package includes basic functionality for creating event log objects in R. It contains several functions to get information about an event log and also provides event-log-specific versions of generic R functions. Together with the related packages, each of which has its own specific purpose, bupaR aims to support each step in the analysis of event data with R, from data import to online process monitoring.

The table below shows an example event log. Each row is an event that belongs to a case (a patient). Several events together can form an activity instance, or execution (e.g. events 2-4 belong to surgery 2). Each event in such an execution has a different transactional lifecycle status. Note that a specific activity can have several instances (e.g. there are two surgeries in the example). Furthermore, each event has a timestamp, indicating when it happened, and a resource, indicating who performed it.
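
To make this structure concrete, here is a minimal sketch of such a log as a plain data.frame in base R. The column names match the mapping used in this post, but every value (patient name, resources, timestamps) is invented purely for illustration.

```r
# Illustrative event data: all values below are invented for this sketch.
data <- data.frame(
  patient = rep("John Doe", 8),
  activity = c("check-in", "surgery", "surgery", "surgery",
               "treatment", "surgery", "surgery", "check-out"),
  # Events sharing an activity_instance value form one execution of an activity
  activity_instance = c(1, 2, 2, 2, 3, 4, 4, 5),
  # Transactional lifecycle status of each event within its execution
  status = c("complete", "schedule", "start", "complete",
             "complete", "start", "complete", "complete"),
  timestamp = as.POSIXct(c(
    "2017-05-10 08:33:26", "2017-05-10 08:38:21", "2017-05-10 08:53:16",
    "2017-05-10 09:25:19", "2017-05-10 10:01:25", "2017-05-10 10:35:18",
    "2017-05-10 10:41:35", "2017-05-11 14:52:36"
  ), tz = "UTC"),
  resource = c("Samantha", "Danny", "Richard", "Richard",
               "Danny", "William", "William", "Samantha"),
  stringsAsFactors = FALSE
)

# Number of events recorded for each activity instance
table(data$activity_instance)

# Distinct executions of the activity "surgery"
unique(data$activity_instance[data$activity == "surgery"])
```

Here events 2 to 4 share activity instance 2, so they describe one execution of the surgery, while activity instance 4 is a second, separate surgery.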

Given that the data shown above is stored in a data.frame, it can be turned into an event log object by indicating all the relevant data fields.

library(bupaR)

# Map each column of the data.frame to its role in the event log
data %>%
  eventlog(
    case_id = "patient",
    activity_id = "activity",
    activity_instance_id = "activity_instance",
    lifecycle_id = "status",
    timestamp = "timestamp",
    resource_id = "resource"
  )
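
Once an event log object exists, the information functions mentioned earlier can be tried out. The following is a minimal sketch, assuming the bupaR and eventdataR packages are installed; it uses the `patients` example log from eventdataR as a stand-in, so the numbers it prints describe that dataset and not the table in this post.

```r
library(bupaR)
library(eventdataR)  # example event logs; `patients` is used as a stand-in here

# Basic counts describing the size of the log
patients %>% n_cases()
patients %>% n_events()
patients %>% n_activities()

# The distinct activity labels that occur in the log
patients %>% activity_labels()

# Which column plays which role (case, activity, timestamp, ...)
patients %>% mapping()
```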

Alternatively, event data can be read from XES files. XES (eXtensible Event Stream) is the IEEE standard for storing and sharing event data. The xesreadR package, which is part of bupaR, provides the functions read_xes and write_xes as an interface between R and XES files. The following statement shows how to read an event log from an XES file, in this case with data on an order-to-cash (otc) process.

log_otc <- read_xes("otc.xes")

Event log objects can be visualized with processmapR. It allows the user to create a customizable dotted chart, showing all the events by time and case identifier in one graph. Precedence relations between activities can also be shown with a process map. Frequent traces, i.e. activity sequences, can be explored with the trace_explorer.

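# Dotted chart: every event plotted by time and case
log_otc %>%
  dotted_chart()

# Process map of the most frequent traces
log_otc %>%
  filter_trace_frequency(perc = 0.9) %>%
  process_map()

# Most frequent traces, covering 90% of the cases
log_otc %>%
  trace_explorer(coverage = 0.9)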

edeaR stands for Exploratory and Descriptive Event-Data Analysis. This package provides several metric functions for in-depth analysis of event logs, as well as a diverse set of subsetting methods. The metrics can be calculated at different levels of granularity, which makes it possible to drill down into the data and focus on a specific part. Furthermore, all metrics are compatible with dplyr::group_by. The generic plot function can be used to create predefined graphs, which can be customized using ggplot2.

The example below shows in how many cases each of the activities is present. This shows that in the given event log, there is a set of very common activities, and a set of very rare activities.

log_otc %>% activity_presence %>% plot
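
The granularity levels and grouping described above can be sketched as follows. This is an illustration only: it assumes bupaR, edeaR and eventdataR are installed, uses the `patients` log from eventdataR in place of log_otc, and picks throughput_time as the example metric. The `level` and `units` arguments and the `group_by_resource` shortcut are used as I understand them from the package documentation, and argument names may differ between package versions.

```r
library(bupaR)
library(edeaR)
library(eventdataR)  # `patients` stands in for log_otc in this sketch

# One metric, two levels of granularity:
# a summary for the whole log ...
patients %>% throughput_time(level = "log", units = "days")

# ... and one value per case
per_case <- patients %>% throughput_time(level = "case", units = "days")
head(per_case)

# Grouping first computes the metric separately for each group,
# here per resource (bupaR's shortcut for grouping on the resource column)
patients %>%
  group_by_resource() %>%
  n_cases()
```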

Besides the metrics, edeaR provides a varied set of subsetting methods specific to event data. All functions are designed to work with the pipe operator (%>%).
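
As a hedged sketch of how such subsetting methods chain together, the example below again assumes bupaR, edeaR and eventdataR are installed and uses the `patients` log from eventdataR. The two filters and the 7-day threshold are arbitrary choices for illustration, and argument names may differ between edeaR versions.

```r
library(bupaR)
library(edeaR)
library(eventdataR)  # `patients` is an example log, used here as a stand-in

# Pick one activity label from the log itself, so no label is hard-coded
some_activity <- as.character(activity_labels(patients))[1]

# Keep only cases that contain this activity
# and that were completed within 7 days (threshold chosen arbitrarily)
subset_log <- patients %>%
  filter_activity_presence(activities = some_activity) %>%
  filter_throughput_time(interval = c(0, 7), units = "days")

# Compare the size of the subset with the full log
n_cases(patients)
n_cases(subset_log)
```

Because each filter returns an event log again, the result can be piped straight into a metric or a visualization.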

In addition to the packages discussed above, there is the eventdataR package, which contains example event datasets, and the processmonitR package, which provides predefined dashboards for online process monitoring. For more information about bupaR, you can visit the website, where you can also find a cheat sheet.

This entry was posted in Business, Process mining, R by Gert Janssenswillen.

About Gert Janssenswillen

Gert Janssenswillen obtained a Master of Business and Information Systems Engineering at Hasselt University. He is currently a PhD student at the Business Informatics research group at Hasselt University, where he focuses on business process management. In particular, his main interest is the quality measurement of discovered process models. His research has been presented at international conferences such as BPM and useR. Through the creation of R packages such as edeaR, petrinetR and bupaR, he has worked to enable process analysis using R - an increasingly popular environment for data science. His affection for R also shows in his teaching, where he lectures on exploratory and descriptive data analysis for students of Business Engineering.
