Process Mining Online Assessment Data

Ekaterina Vasilyeva

Mykola Pechenizkiy

N. Trcka

P. De Bra

Wil Van Der Aalst

Proceedings of Educational Data Mining, 2009

2009

Traditional data mining techniques have been extensively applied to find interesting patterns and to build descriptive and predictive models from large volumes of data accumulated through the use of different information systems. The results of data mining can be used for getting a better understanding of the underlying educational processes, for generating recommendations and advice to students, for improving management of learning objects, etc. However, most of the traditional data mining techniques focus on data dependencies or simple patterns and do not provide a visual representation of the complete educational (assessment) process ready to be analyzed. To allow for these types of analysis (in which the process plays the central role), a new line of data-mining research, called process mining, has been initiated. Process mining focuses on the development of a set of intelligent tools and techniques aimed at extracting process-related knowledge from event logs recorded by an information system. In this paper we demonstrate the applicability of process mining, and the ProM framework in particular, to the educational data mining context. We analyze assessment data from recently organized online multiple choice tests and demonstrate the use of process discovery, conformance checking and performance analysis techniques.

"Figure 1. The process mining spectrum supported by ProM.

3 Case Studies. We studied different issues related to authoring and personalization of online assessment procedures within a series of MCQ tests organized during the mid-term exams at Eindhoven University of Technology using the open source LMSs Moodle (Quiz module tools) and Sakai (Mneme testing component). To demonstrate the applicability of process mining we use data collected during two exams: one for the Data Modeling and Databases (DB) course and one for the Human-Computer Interaction (HCI) course.

In the first (DB) test, students (30 in total) answered the MCQs (15 in total) in a strict order, in which questions appeared one by one. After answering each question, students were able to proceed directly to the next question (clicking “Go to the next question”), or to first get knowledge of the correct response (clicking “Check the answer”) and after that either go to the next question (“Go to the next question”) or, before that, request a detailed explanation about their response (“Get Explanations”). In the second (HCI) test, students (65 in total) had the possibility to answer the MCQs (10 in total) in a flexible order and to revisit (and revise if necessary) the earlier questions and answers. Flexible navigation was facilitated by a menu page for quick jumps from one question to any other question, as well as by “next” and “previous” buttons.

In the MCQ tests we also asked students to include the confidence level of each answer. Our studies demonstrated that knowledge of the response certitude (specifying the student’s certainty or confidence in the correctness of the answer) together with response correctness helps in understanding the learning behavior and allows for determining what kind of feedback is preferable and more effective for the students, thus facilitating personalization in assessment [3].
For every student and for each question in the test we collected all the available information, including correctness, certitude, grade (determined by correctness and certitude), and time spent answering the question. For the DB test we also recorded whether an answer was checked for correctness, whether a detailed explanation was requested, and how much time was spent reading it; for the HCI test we recorded whether a question was skipped or revisited, and whether the answer was revised or the certitude changed. In the remainder of this section we demonstrate how various ProM plug-ins supporting dotted chart analysis, process discovery (Heuristic Miner and Fuzzy Miner), conformance checking, and performance analysis [1][6] allow us to gain a significantly better understanding of the assessment processes.

3.1 Dotted Chart Analysis. The dotted chart is a chart similar to a Gantt chart. It shows the spread of events over time by plotting a dot for each event in the log, which gives some insight into the complete set of data. The chart has three (orthogonal) dimensions: one showing the time of the event, and the other two showing (possibly different) components (such as instance ID, originator or task ID) of the event. Time is measured along the horizontal axis. The first component considered is shown along the vertical axis, in boxes. The second component of the event is given by the color of the dot.

Figure 2 illustrates the output of the dotted chart analysis of the flexible-order online assessment. All the instances (one per student) are sorted by the duration of the online assessment (reading and answering the questions and navigating to the list of questions). In the figure on the left, points in ochre and green/red denote the start and the end (passed/failed) of the test. Triangles denote the moment when the student submits an answer or just navigates to another question.
Green triangles denote correct responses with low (LCCR, light green) and high (HCCR, dark green) certainty; red triangles correspondingly denote wrong responses (LCWR, light red; HCWR, dark red); white triangles denote the cases when the student navigated to the next question without providing any response. The blue squares show the moments when the students navigated from the list of the questions (menu) to a question of the quiz (or just submitted the whole test).

Figure 2. Two dotted charts extracted from the test with flexible order navigation; (1) the overall navigation and answering of questions (left chart), and (2) the effects of changes (right chart).

We can clearly see from the figure that most of the students answered the questions one by one, and provided more correct answers for the first questions of the test than for the last questions. They used the possibility to navigate flexibly mainly at the end of the test, going to the list of the questions and from there to different questions. It can also be clearly seen that only a few students read and skipped some questions without answering them and later returned to those questions to provide an answer. In the figure on the right, we can see when students revisited the questions. Points in yellow correspond to the situations when the correctness of the answer did not change, and points in red and green correspond to changes to wrong and to correct answers, respectively. We can see that the correctness changed in very few cases; most changes do not affect correctness (e.g., a wrong answer was changed to another wrong answer). Moreover, changes from right to wrong and from wrong to right had similar frequencies, thus not significantly changing the end results.

3.2 Process discovery. In some cases, given a usage log, we may have limited knowledge about the exact procedure of the assessment but want to discover it based on the data from the log.
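As a rough illustration of the raw material that discovery works from, the sketch below counts how often one question directly follows another in a navigation log and derives the dependency measure commonly associated with heuristics-based mining, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1). It is not ProM's implementation; the function names and the toy log of question labels are assumptions made for this example, and the measure is only meaningful here for two distinct activities.

```python
from collections import Counter
from typing import List, Tuple

Trace = List[str]


def directly_follows(log: List[Trace]) -> Counter:
    """Count how often activity a is immediately followed by activity b."""
    counts: Counter = Counter()
    for trace in log:
        for pair in zip(trace, trace[1:]):
            counts[pair] += 1
    return counts


def dependency(counts: Counter, a: str, b: str) -> float:
    """Dependency measure for two distinct activities a and b.

    Values near 1 suggest that b tends to follow a; values near -1
    suggest the opposite direction.
    """
    ab = counts[(a, b)]  # Counter returns 0 for unseen pairs
    ba = counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)


# Hypothetical navigation log: one trace per student, one label per visited question.
example_log: List[Trace] = [
    ["Q1", "Q2", "Q3"],
    ["Q1", "Q2", "Q1", "Q2", "Q3"],
    ["Q1", "Q3", "Q2"],
]
example_counts = directly_follows(example_log)
q1_to_q2 = dependency(example_counts, "Q1", "Q2")
```

Frequent pairs with a high dependency value are the ones that would show up as the popular jumps between questions in a discovered model.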
There exist several algorithms that can automatically construct a depiction of a process. This process representation typically comes in the form of a (formal) mathematical model supporting concurrent, sequential and alternative behavior (e.g., Petri nets, or the models produced by the Heuristic or Fuzzy miner). Figure 3 illustrates, for the DB test, a part (for the first 3 questions) of the discovered process as a Heuristic net (left) and an animation of the same part after conversion to the Fuzzy model (middle), and, for the HCI test, the complete Heuristic net (right), which abstracts from the type of the answer but makes clear which jumps between the questions were popular. From the visualization of the DB test process we can see what possibilities students had, and what the main “flows” were globally or at a particular time.

Figure 3. Heuristic nets of strict order (left) and flexible order tests (right)

3.3 Process analysis. In some cases, the goal is not to discover the real learning process but to analyze some normative or descriptive model that is given a priori. For example, the Petri net shown in Figure 4 (formally) describes the generic pattern of answering questions in the DB test, allowing for answer checks and feedback. Now it is interesting to see whether this model conforms to reality (and vice versa) and to augment it with additional information learned from the event logs. The advantage of having the answering pattern represented as a Petri net is that this allows for many different analysis techniques. ProM offers various plug-ins to analyze Petri nets (verification, performance analysis, conformance, etc.). Models like the one in Figure 4 can be discovered or made by hand. It is also possible to first discover a model and then refine it using the tool Yasper (incorporated into ProM).
Figure 4 was constructed using Yasper, and this was a one-time task for this test type. In principle, an authoring tool can be developed to facilitate an automatic translation of multiple-choice tests with varying properties to Petri nets. As every question can be answered correctly or wrongly, and with either high or low confidence, there are four possibilities for the first step in the net from Figure 4. The transition HCCR, for example, denotes that the answer is given with high confidence and that it was correct; the other three starting transitions are similar. After answering the question the student can check the answer or just go to the next question. The latter decision is modeled by an internal transition (painted in black) that goes to the final place of the net. If the student has decided to check the answer, he or she can also ask for some feedback afterwards.

Figure 4. A Petri net representing the question pattern.

To illustrate the many analysis possibilities of ProM, we show some results obtained using the Conformance checker and the Performance Analysis with Petri net plugin. The purpose of conformance analysis is to find out whether the information in the log is as specified. This analysis may be used to detect deviations, to locate and explain these deviations, and to measure their severity. We are mostly interested in the notion of fitness, which is concerned with whether a process model is able to reproduce all execution sequences that are in the log, or, viewed from another angle, whether the log traces comply with the description in the model (the fitness is 100% if every trace in the log corresponds to a possible execution of the model). This notion is particularly useful for finding out whether (or how often) the students respected the specified order for answering questions (to discover fraud, for example).
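The idea of fitness described above can be sketched with a deliberately simplified stand-in: the question pattern is encoded as a small state machine, and fitness is taken as the fraction of traces that can be replayed from start to end. This is only an illustration under stated assumptions. The event labels `check_answer`, `get_explanation` and `next_question`, the state names, and the choice to log "go to the next question" as an explicit event are all hypothetical, and ProM's Conformance checker computes its fitness metric differently.

```python
from typing import Dict, List

Trace = List[str]

# The four answer types named in the text.
ANSWERS = ("HCCR", "LCCR", "HCWR", "LCWR")

# Hypothetical encoding of the per-question pattern: state -> {event -> next state}.
PATTERN: Dict[str, Dict[str, str]] = {
    "start": {answer: "answered" for answer in ANSWERS},
    "answered": {"check_answer": "checked", "next_question": "done"},
    "checked": {"get_explanation": "explained", "next_question": "done"},
    "explained": {"next_question": "done"},
}


def replays(trace: Trace) -> bool:
    """Return True if the trace is a complete execution of PATTERN."""
    state = "start"
    for event in trace:
        next_state = PATTERN.get(state, {}).get(event)
        if next_state is None:  # event not allowed in the current state
            return False
        state = next_state
    return state == "done"


def trace_fitness(log: List[Trace]) -> float:
    """Fraction of traces in the log that the pattern can reproduce."""
    if not log:
        raise ValueError("log must contain at least one trace")
    return sum(replays(trace) for trace in log) / len(log)


# Made-up traces for one question; the last one checks before answering.
example_log: List[Trace] = [
    ["HCCR", "next_question"],
    ["LCWR", "check_answer", "get_explanation", "next_question"],
    ["check_answer", "HCCR", "next_question"],
]
example_fitness = trace_fitness(example_log)
```

A value of 1.0 in this sketch corresponds to the situation described in the text in which every trace in the log is a possible execution of the model.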
Figure 5 shows the result of conformance checking when applied to our log and the Petri net from Figure 4. In this so-called log perspective of the result, each trace from the log has all its mismatched events colored in orange. In our case, however, there are no orange events; therefore there are no mismatches between the specified answering pattern and the actual exam data.

Figure 5. Result of conformance checking showing a 100% fitness.

Our next analysis is of a different kind. Instead of checking the correctness of the exam behavior, we provide a means to assess the performance of the answering process. The Performance analysis with Petri net plugin can extract the Key Performance Indicators from the log, summarize them in an intuitive way, and graphically present them on a Petri net describing the process under consideration. For our purpose we apply the plugin with the exam data log and the answering pattern from Figure 6 (only for the first question of the test).

Figure 6. Results of applying the Performance analysis with Petri net plug-in.

The result of the analysis is shown in Figure 6. In the right panel different throughput-type metrics are displayed; from there we see, e.g., that the average duration of the test was 64.41 minutes. The central panel shows the answering pattern, colored and annotated with performance information. The numbers on the arcs represent probabilities. As shown, 35% of the students answered the first question right and had high confidence. We could also see that almost all students checked their answers and asked for feedback afterwards. Places are colored with respect to their sojourn time, i.e., with respect to the time the process spends in this place. From the picture we can thus see that the answering time was short (the first question was easy), and that the students who answered with high confidence spent more time on the feedback (regardless of the correctness of the answer).

4 Conclusions and Future Work.
Data mining techniques have been successfully applied to different types of educational data and have helped to address many issues by using traditional classification, clustering and association analysis techniques. Although the process perspective in educational domains has received some attention, most of the traditional intelligent data analysis approaches applied in the context of educational data mining do not consider the process as a whole (i.e., the focus is on data or simple sequential structures rather than full-fledged process models). In this paper, we illustrated some of the potential of process mining techniques applied to online assessment data. In one of the tests, students were able to receive tailored immediate EF after answering each of the questions one by one in a strict order; in the other test, they received no feedback but could answer the questions in a flexible order. This data was of a sequential nature, i.e., it did not include concurrency. However, other educational processes have lots of concurrency, and this can be discovered by ProM. Applying process mining techniques to other types of assessment data, e.g., grades for traditional examinations, is therefore an interesting possibility. ProM 5.0 provides a pluggable environment for process mining, offering a wide variety of plug-ins for process discovery, conformance checking, model extension, and model transformation. Our further work includes the development of EDM-tailored ProM plug-ins. On the one hand, this would help bring process mining tools closer to the domain experts (i.e., educational specialists and researchers), who do not necessarily have all the technical background. On the other hand, this will help to better address some of the EDM-specific challenges related to data preprocessing and mining.
Besides this, the development of authoring tools for assessment modules with specialized ProM plug-ins would significantly simplify some of the processes for conformance analysis, since, e.g., a Petri net representing a certain assessment procedure could be generated completely automatically.

Acknowledgements. This work is supported by NWO (the Dutch Science Foundation). We would like to thank the many people involved in the development of ProM."


