Predicting Correctness of Problem Solving from Low-level Log Data in Intelligent Tutoring Systems

InProceedings

Casey Hord

Luo Si

Suleyman Cetintas

Yan Ping Xin

Proceedings of Educational Data Mining, 2009

2009

This paper proposes a learning-based method that automatically estimates how likely a student is to give a correct answer to a problem in an intelligent tutoring system. Only log files that record students' actions with the system are used to train the model; the modeling process therefore requires no expert knowledge for identifying the domain-specific skills needed to solve a problem or the solution methods students might follow. The model utilizes a set of performance features, problem features, and time and mouse movement features, and is compared to i) a model that utilizes performance and problem features, and ii) a model that uses performance, problem and time features. To address the data sparseness problem, a robust Ridge Regression algorithm is designed to estimate the model parameters. An extensive set of experimental results demonstrates the power of using multiple types of evidence as well as the robust Ridge Regression algorithm.

Table 1. Details of the problem solving worksheets. The Include Diagram Boxes, Include Equation Boxes and Include Unknown Number columns indicate whether the problems of a worksheet include i) diagram boxes, ii) equation boxes, and iii) an unknown number to be solved for, respectively. The Shows Correct Answer column indicates whether the correct answer to a question is shown to students after they submit their answer.

Such partial skills for a problem include answers to i) diagram boxes, which check the student's mapping of the information given in a problem into an abstract model; ii) the equation box, which checks whether a student can form a correct equation from the information given in a problem; and iii) the final answer box, which checks whether a student can correctly solve for the unknown asked in a problem. Details about which worksheets include which groups of boxes are shown in Table 1. In some of the worksheets, the correct answer is shown to the student after each question (regardless of whether the student's answer is correct) so that the student can learn from his/her mistakes. Details about which worksheets show correct answers are also shown in Table 1.

The study with the tutoring system included 8 students, among them 3 students with learning disabilities, 1 student with emotional disorder and 1 student with emotional disorder combined with mental retardation. Students used the tutor for several 30-minute class sessions (on average 18.1255 sessions per student, with a standard deviation of 3.4408 sessions), during which their interaction with the tutoring system was logged in a centralized database. A total of 1960 problems were solved, 1291 of them correctly and 669 incorrectly.
The average number of correctly solved problems per student is 161.37 (with a standard deviation of 42.07), and the average number of incorrectly solved problems per student is 83.62 (with a standard deviation of 27.07). Data from 4 students were used as training data to build the models, which then made predictions for the other 4 students (the test data). Details about the training and test splits are given in Table 2.

3 Methods: Least Squares and Ridge Regression

Data sparseness is an important problem that arises when limited training data are used to learn the parameters of a model, and it leads to the common problem of over-fitting [9]. Over-fitting, as the name implies, is the problem of fitting the training data so closely that the fit is not a reliable indicator of performance on future test data, especially when data are sparse. Regularization is a technique that controls over-fitting by constraining the model parameters so as to discourage them from reaching the large values that lead to over-fitting. We briefly discuss the Least Squares technique, followed by the Ridge Regression technique that controls over-fitting [9].

The simplest linear model for regression involves a linear combination of input variables as follows: FORMULA_1. FORMULA_2. which can be minimized with a maximum likelihood solution that gives the Least Squares solution of the model parameters as follows: FORMULA_3. FORMULA_4. where λ is the regularization coefficient that controls the relative importance of the data-dependent error and the regularization term. The regularization coefficient in this work is learned with cross validation in the training phase (i.e. by splitting the training data into smaller training and test datasets). The exact minimizer of the total error function can be found in closed form as follows: FORMULA_5. which is the Ridge Regression solution of the parameters of the model.

4 Modeling Approaches
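Before turning to the individual models, the Ridge Regression estimator of Section 3 can be sketched in code. The equations themselves did not survive in this copy, so the sketch assumes the standard textbook closed form w = (λI + XᵀX)⁻¹Xᵀt, where X is the matrix of feature vectors and t the vector of targets. The function names `fit_ridge` and `select_lambda`, the number of folds and the candidate values of λ are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def fit_ridge(X, t, lam):
    """Closed-form ridge solution w = (lam*I + X^T X)^{-1} X^T t (lam > 0)."""
    n_features = X.shape[1]
    A = lam * np.eye(n_features) + X.T @ X
    # Solving the linear system is more stable than forming the inverse.
    return np.linalg.solve(A, X.T @ t)

def select_lambda(X, t, candidates, n_folds=4, seed=0):
    """Pick the candidate lambda with the lowest mean held-out squared error.

    The training data are split into smaller training and validation parts,
    as the text describes; the fold count is an assumption.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(t)), n_folds)
    best_lam, best_err = None, np.inf
    for lam in candidates:
        errs = []
        for k in range(n_folds):
            val = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            w = fit_ridge(X[train], t[train], lam)
            errs.append(np.mean((X[val] @ w - t[val]) ** 2))
        mean_err = float(np.mean(errs))
        if mean_err < best_err:
            best_lam, best_err = lam, mean_err
    return best_lam
```

A larger λ shrinks the weights toward zero, which is the behavior the regularization term is meant to produce; as λ approaches 0 the solution approaches the Least Squares one whenever XᵀX is invertible.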
This section describes the models that are used for evaluation: i) a model that considers performance and problem features, ii) a model that considers time features as well as performance and problem features, and iii) a more advanced model that incorporates mouse movement features along with performance, problem and time related features.

4.1 Performance and Problem Based Modeling (PerfProb_Mod)

Using performance and problem based features has been shown to be a useful approach for student modeling in prior work [3, 14]. The idea of using performance features is intuitive, since students' performance up to a certain problem is a good indicator of their performance on that problem. Similarly, problem related features, such as problem difficulty or the number of sub-skills required (types of question boxes in this work), are very informative about whether a student can correctly answer a problem. In this work, 4 performance features are used as a measure of the probability that the student knew the skills asked in a question. The first feature is the number of correct answers so far in a problem solving worksheet. Each problem solving worksheet consists of 12 math word problems, and a problem is counted as correct only if all question boxes for the problem are filled correctly. The number of correctly solved problems up to the current problem in a worksheet is a good indicator of the student's success on the current problem. The second, third and fourth performance features help to assess the student's partial skills that are needed for the solution of a problem when they can't give a full answer.
Such partial skills for a problem include, as mentioned before, the abilities to answer i) diagram boxes, which check the student's mapping of the information given in a problem into an abstract model; ii) the equation box, which checks whether a student can form a correct equation from the information given in a problem; and iii) the final answer box, which checks whether a student can correctly solve for the unknown asked in a problem. The corresponding features are the percentage of correct diagram answers so far, the percentage of correct equation box answers so far, and the percentage of correct final answers so far in a problem solving worksheet. They give the percentage of correct answers given by a student for the associated partial skill boxes over all solved problems of the current worksheet up to the current problem. In addition to the 4 performance features, 11 problem features are used to indicate which problem solving worksheet the current problem belongs to. In our model, there are 11 binary variables corresponding to the 11 worksheets. If the current problem belongs to the 5th worksheet (i.e. MC Worksheet 1), then the 5th binary variable is 1 and all others are 0. This encoding enables the model to associate each problem with the characteristics of its worksheet. This encoding scheme is also mentioned in Beck's work as "one hot" encoding [3]. Performance and problem based modeling serves as the baseline for all other models in this work and will be referred to as PerfProb_Mod.

4.2 Performance, Problem and Time Based Modeling (PerfProbTime_Mod)

The performance and problem based modeling approach is useful in many situations; however, many other kinds of data can be good indicators of students' success on a current problem, such as the time that a student spends solving it. Although not all prior work used time related features [14], time has been used as a feature by Beck [3].
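The 15 performance and problem features of Section 4.1 can be sketched as a single feature vector: 4 performance values followed by the 11 "one hot" worksheet indicators. The function name `perf_prob_features`, the layout of the `history` records and the choice of 0 for a percentage before any box has been answered are assumptions made for illustration; the paper does not specify them.

```python
NUM_WORKSHEETS = 11  # one binary indicator per worksheet, as in the text

def percent(correct, attempted):
    """Percentage of correct answers so far; 0.0 before any attempt (assumed)."""
    return 100.0 * correct / attempted if attempted else 0.0

def perf_prob_features(worksheet_index, history):
    """Build the 4 performance + 11 worksheet-indicator features.

    worksheet_index: 1-based worksheet number (1..11).
    history: problems already solved in the current worksheet, each a dict
             with keys 'diagram', 'equation', 'final' whose values are
             True/False, or None when the worksheet has no such box
             (a hypothetical record layout).
    """
    # A problem counts as correct only if every box it has is correct.
    n_correct = sum(
        all(v for v in p.values() if v is not None) for p in history
    )
    features = [float(n_correct)]
    for box in ("diagram", "equation", "final"):
        answered = [p[box] for p in history if p[box] is not None]
        features.append(percent(sum(answered), len(answered)))
    one_hot = [0.0] * NUM_WORKSHEETS
    one_hot[worksheet_index - 1] = 1.0
    return features + one_hot
```

For a problem in the 5th worksheet, the 5th indicator (position 9 of the full vector, counting from 1) is set to 1 and the other ten indicators stay 0.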
In addition to the 15 performance and problem features mentioned before, this modeling approach also incorporates a time feature for student modeling. The time feature in this work is defined as the time a student spends solving a problem. The performance, problem and time based modeling approach will be referred to as PerfProbTime_Mod.

4.3 Performance, Problem, Time and Mouse Tracking Based Modeling (PerfProbTimeMouseT_Mod)

Incorporating the time feature into performance and problem based modeling is an effective way of improving student modeling; however, there is still room for improvement. Both the performance and problem based approach and the performance, problem and time based approach ignore an important data source with which students interact almost constantly while solving problems in a problem solving environment: the mouse. As far as we know, there is no prior research on student modeling that utilizes mouse tracking data. More details about the prior work on this modeling approach, as well as on utilizing mouse movement data, can be found in the Introduction section. In addition to the 4 performance related features, 11 problem related features and 1 time feature already mentioned, this modeling approach incorporates 3 mouse tracking features. The first is the maximum mouse off time in a problem, i.e. the longest time interval (in seconds) in which the mouse is not used during the current problem. The second and third mouse tracking features are the average x movement and the average y movement, respectively: the average number of pixels the mouse is moved along the x and y axes in 0.2 second intervals. The performance, problem, time and mouse tracking based modeling that we propose will be referred to as PerfProbTimeMouseT_Mod.

5 Experimental Methodology: Evaluation Metric
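Before turning to evaluation, the three mouse tracking features of Section 4.3 can be sketched as follows. The sketch assumes that the log holds the mouse position sampled every 0.2 seconds for one problem and that the mouse counts as "not used" while its position does not change; the paper does not describe the log format or whether clicks are considered, so both points, along with the name `mouse_features`, are assumptions.

```python
SAMPLE_INTERVAL = 0.2  # seconds between position samples, as in the text

def mouse_features(positions):
    """Return (max mouse off time in seconds, average x movement, average y movement).

    positions: list of (x, y) pixel coordinates for one problem, sampled
               every SAMPLE_INTERVAL seconds (assumed log layout).
    """
    steps = len(positions) - 1
    if steps <= 0:
        return 0.0, 0.0, 0.0
    max_idle = idle = 0
    total_dx = total_dy = 0
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        dx, dy = abs(x1 - x0), abs(y1 - y0)
        total_dx += dx
        total_dy += dy
        # Extend the current idle run if nothing moved, otherwise reset it.
        idle = idle + 1 if dx == 0 and dy == 0 else 0
        max_idle = max(max_idle, idle)
    return max_idle * SAMPLE_INTERVAL, total_dx / steps, total_dy / steps
```

Appending these three values and the time feature to the 15 performance and problem features gives the 19 inputs used by PerfProbTimeMouseT_Mod.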
To evaluate the effectiveness of detecting whether a student can correctly solve a problem, we use the common F1 measure, which is the harmonic mean of precision and recall [2,13]. Precision (p) is the number of correct categorizations made by a model divided by all the categorizations made by that model. Recall (r) is the number of correct categorizations made by a model divided by the total number of instances that truly belong to the category. $F_1 = \frac{2pr}{p + r}$

6 Experiment Results

This section presents the experimental results of the methods presented in the Methods section. All the methods were evaluated on the dataset described in the Data section.

Table 3. Results of the PerfProbTimeMouseT_Mod method in comparison to the PerfProb_Mod and PerfProbTime_Mod methods for high level student modeling to detect whether a student can correctly solve a given problem. Results for the Least Squares technique are shown under the Least Squares column, and results for the Ridge Regression technique under the Ridge Regression column. The percentages in parentheses show the relative improvement of each method with respect to the Least Squares version of the PerfProb_Mod model.

An extensive set of experiments was conducted to address the following questions:
• How effective is the PerfProbTime_Mod method, which utilizes performance, problem and time features, with respect to the PerfProb_Mod method, which utilizes performance and problem features?
• How effective is the PerfProbTimeMouseT_Mod method, which utilizes mouse tracking data as well as performance, problem and time features, with respect to the PerfProb_Mod and PerfProbTime_Mod methods?
• How effective is the approach of utilizing the Ridge Regression technique to estimate the model parameters?

6.1 The Performance of Performance, Problem and Time Based Modeling (PerfProbTime_Mod)

The first set of experiments was conducted to measure the effect of including the time feature in the PerfProb_Mod model.
The details of this approach are given in Section 4.2. More specifically, the PerfProbTime_Mod model is compared with PerfProb_Mod, and their performances are shown in Table 3. It can be seen that the PerfProbTime_Mod model outperforms the PerfProb_Mod model. The lesson from this set of experiments is that the time feature is helpful when it is combined with performance and problem related features for the task of predicting whether a student will be able to correctly answer a current problem. This demonstrates the power of incorporating the time feature into performance and problem based modeling.

6.2 The Performance of Performance, Problem, Time and Mouse Tracking Based Modeling (PerfProbTimeMouseT_Mod)

The second set of experiments was conducted to measure the effect of including the mouse tracking data in the PerfProbTime_Mod model. The details of this approach are given in Section 4.3. More specifically, the PerfProbTimeMouseT_Mod method is compared with the other two models in Table 3. It can be seen that the PerfProbTimeMouseT_Mod model outperforms both the PerfProbTime_Mod and PerfProb_Mod models. This set of experiments shows that mouse movement features are helpful when they are combined with performance and problem related features along with the time feature for high level student modeling. This demonstrates the power of incorporating the mouse tracking features into performance, problem and time based modeling.

6.3 The Performance of Utilizing the Robust Ridge Regression Technique

The last set of experiments was conducted to measure the effect of utilizing Ridge Regression for learning the parameters of each of the models. The details of this approach are given in Section 3. More specifically, models learned with Ridge Regression are compared to models learned with Least Squares.
The performance of the Ridge Regression version of each model is shown in comparison to the Least Squares version in Table 3. It can be seen that, with its regularization framework, the Ridge Regression version of each model outperforms the Least Squares version. This confirms that Ridge Regression models better handle the data sparseness problem in this application.

7 Conclusion and Future Work

This paper proposes a novel machine learning method for high-level student modeling (one that doesn't require any expert knowledge of the domain to extract skills or the possible solutions that students may follow) to detect whether a student can correctly solve a current problem in a problem solving environment while using an intelligent tutoring system. The model relies only on the low-level log data available from the log files of students' actions within the software. The proposed model makes use of a set of evidence, namely performance, problem, time and mouse movement features, and is compared to i) a model that utilizes performance and problem related features, and ii) a model that uses performance, problem and time features together. To address the data sparseness problem, the proposed model utilizes a robust Ridge Regression technique to estimate the model parameters. An extensive set of empirical results shows that the proposed method, which automatically detects whether a student will be able to correctly answer a problem, substantially outperforms both the model that uses performance and problem related features and the model that utilizes performance, problem and time features together. Furthermore, the empirical results show that the proposed model attains better performance with Ridge Regression than with the standard Least Squares Regression technique. There are several possibilities for extending this research. For example, different students have different characteristics when solving problems (e.g.
using more or less time to solve the problems, having difficulties with particular types of questions and/or problems, or different mouse usage types). Therefore, personalized models tend to provide more accurate detection results than a single model for all students. Future research work will be conducted mainly in this direction.

Acknowledgements

This research was partially supported by the NSF grants IIS-0749462 and IIS-0746830. Any opinions, findings, conclusions, or recommendations expressed in this paper are the authors', and do not necessarily reflect those of the sponsor.
