We present and illustrate a methodology for taking data into account in a knowledge diagnosis method for orthopaedic surgery, using Bayesian networks and machine learning techniques. We aim to make the design of the student model less time-consuming and less subjective. A first Bayesian network was built like an expert system, in which experts (in didactics and surgery) provided both the structure and the probabilities. Learning the probability distributions of the variables from data, however, makes it possible to move from an expert network toward a more data-centric one. Here we compare and analyze several learning algorithms against experimental data. We then point out some crucial issues, such as the lack of data.
"1. THE STUDY.
TELEOS{1} (Technology Enhanced Learning Environment for Orthopaedical Surgery) is an Intelligent Tutoring System designed for percutaneous orthopaedic surgery [Vadcard and Luengo 2004]. A student model based on a Bayesian network was built after a long didactical analysis of the domain, as presented in Minh Chieu et al. [2010]. Bayesian networks for student modeling are usually expert systems; in TELEOS, surgeons were involved in designing both the structure and the probability tables of the network. However, a more automatic approach, as presented by Mayo and Mitrovic [2001], is worth exploring. First, the model includes many variables; experts can only roughly estimate all the parameters and are prone to error or approximation. Second, surgeons sometimes work in different ways, and taking their various points of view into account may be hard. Third, TELEOS also uses a robotic arm that records continuous data, such as force or position, for the knowledge diagnosis, and surgeons are not used to dealing with such data.
Our work therefore studies algorithms for learning the parameters of the Bayesian network. Given data, the parameters can be learned, i.e. computed from a base of observations. The literature offers several algorithms for learning the probabilities of a Bayesian network. They have different characteristics, with both strong and weak points, and none of them guarantees good results. We studied three of them:
─ Maximum likelihood (ML), which estimates the probabilities from the frequencies observed in the database
─ Maximum a posteriori (MAP), based on the ML algorithm but taking into account prior knowledge of the domain
─ Expectation-Maximisation (EM), which can handle missing values in the database
These algorithms are well known in the literature; see for example [Heckerman 1995].
{1} The TELEOS project is supported by the CNRS and by the French research agency (ANR-06-BLAN-0243).
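The difference between the ML and MAP estimates listed above can be sketched for a single discrete variable. This is a minimal illustration, not the TELEOS implementation: the function names, the toy observations and the Dirichlet pseudo-counts are all assumptions made for the example, and the three states simply mirror the three diagnosis outcomes used in this study. In a full network, the same computation would be repeated for each node and each configuration of its parents.

```python
from collections import Counter

# Hypothetical state names for one knowledge variable (illustrative only).
STATES = ("valid", "invalid", "not_used")


def ml_estimate(observations, states=STATES):
    """Maximum likelihood: relative frequency of each state in the data."""
    counts = Counter(observations)
    total = sum(counts[s] for s in states)
    if total == 0:
        raise ValueError("at least one observation is required")
    return {s: counts[s] / total for s in states}


def map_estimate(observations, alpha, states=STATES):
    """MAP estimate under a Dirichlet prior with pseudo-counts alpha[s] >= 1.

    Uses the mode of the Dirichlet posterior:
        (N_s + alpha_s - 1) / (N + sum(alpha) - K)
    With alpha_s = 1 for every state (uniform prior) this equals the ML estimate.
    """
    if any(alpha[s] < 1 for s in states):
        raise ValueError("this sketch assumes every alpha[s] >= 1")
    counts = Counter(observations)
    numerators = {s: counts[s] + alpha[s] - 1 for s in states}
    denominator = sum(numerators.values())
    if denominator == 0:
        raise ValueError("no observations and a flat prior: mode is undefined")
    return {s: numerators[s] / denominator for s in states}


# Toy data, invented for the example: 10 diagnosed actions.
toy_observations = ["valid"] * 6 + ["invalid"] * 3 + ["not_used"] * 1
# Assumed expert prior expressed as pseudo-counts (hypothetical values).
toy_prior = {"valid": 3, "invalid": 2, "not_used": 2}

print(ml_estimate(toy_observations))
print(map_estimate(toy_observations, toy_prior))
```

With few observations the prior pulls the MAP estimate away from the raw frequencies; as the number of observations grows, the two estimates converge.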
We need quality data, in terms of quantity, coverage and representativeness, to learn the parameters in a pertinent way. Data was collected at the Grenoble hospital in France. First, one surgeon and six interns performed a set of six exercises with TELEOS, each exercise presenting different characteristics and difficulties. Then the experimental team and a second surgeon each manually performed the knowledge diagnosis, based on the observation of the students. For each action, a piece of knowledge may be brought into play in a valid way, brought into play in an invalid way, or not used although it should have been. In the end we obtained a database of only 3000 entries; collecting data is expensive in our domain, as we need at least one surgeon.
2. RESULTS.
The data was used for learning the parameters and validating the network in two different manners. We first performed cross-validation to estimate the accuracy of the diagnosis (i.e. we trained the model on one part of the data and validated it on the other part). As the expert performed the diagnosis for only about one third of the data, we partitioned the data in different ways for the cross-validation. In method I we blended expert and non-expert data (Table I); in method II we used only expert data for learning the parameters and non-expert data for the validation (Table II). Then we used a 3-fold method (Table III). Results are shown below.
Table I. Prediction accuracy (method I).
Table II. Prediction accuracy (method II).
Table III. Accuracy with 3-fold cross-validation.
According to these results, the MAP estimation gives the best accuracy, probably because of a good prior distribution based on our knowledge of the domain. Since there are few missing values in the database, the EM algorithm gives almost the same results as maximum likelihood. However, the accuracy is still not very good. A first explanation may be the lack of data and the difference between expert and non-expert data.
On the other hand, diverse data reduce potential bias in the learning. To conclude, we compared different ways of learning the parameters of a Bayesian network for knowledge diagnosis while preserving the in-depth didactical analysis of the domain (here, the network structure). In future work, we want to evaluate this data-centric approach in a more qualitative way, with new experiments at the hospital. We also wish to develop a methodology that takes into account both expert knowledge and data."
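The 3-fold cross-validation used in the results section can be sketched as follows. This is only an illustration under stated assumptions: the records, the evidence labels and the very simple frequency-based predictor are hypothetical stand-ins for the actual Bayesian network and the hospital data, and the function names are invented for the example.

```python
import random
from collections import Counter, defaultdict


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train(records):
    """Count diagnoses per evidence value and keep the most frequent one.

    Stand-in for parameter learning; a real system would fit the network's
    probability tables instead.
    """
    per_evidence = defaultdict(Counter)
    overall = Counter()
    for evidence, diagnosis in records:
        per_evidence[evidence][diagnosis] += 1
        overall[diagnosis] += 1
    table = {e: c.most_common(1)[0][0] for e, c in per_evidence.items()}
    fallback = overall.most_common(1)[0][0]  # used for unseen evidence
    return table, fallback


def predict(model, evidence):
    table, fallback = model
    return table.get(evidence, fallback)


def cross_validate(records, k=3, seed=0):
    """Return one accuracy value per fold (train on k-1 folds, test on 1)."""
    accuracies = []
    for held_out in k_fold_indices(len(records), k, seed):
        held = set(held_out)
        training = [r for i, r in enumerate(records) if i not in held]
        testing = [records[i] for i in held_out]
        model = train(training)
        correct = sum(predict(model, e) == d for e, d in testing)
        accuracies.append(correct / len(testing))
    return accuracies


# Toy records (evidence, diagnosis), invented for the example.
toy_records = ([("trajectory_ok", "valid")] * 9
               + [("trajectory_off", "invalid")] * 9)
fold_accuracies = cross_validate(toy_records, k=3)
print(fold_accuracies, sum(fold_accuracies) / len(fold_accuracies))
```

Method II in the text corresponds to a fixed split instead of rotating folds: call `train` on the expert-diagnosed records only and compute the accuracy on the non-expert records.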