Using Logistic Regression to Trace Multiple Subskills in a Dynamic Bayes Net

InProceedings

Jack Mostow

Proceedings of Educational Data Mining, 2011

2011

A challenge in estimating students' changing knowledge from sequential observations of their performance arises when each observed step involves multiple subskills. To overcome this mismatch in grain size between modelled skills and observed actions, we use logistic regression over each step's subskills in a dynamic Bayes net (LR-DBN) to model transition probabilities for the overall knowledge required by the step. Unlike previous methods, LR-DBN can trace knowledge of the individual subskills without assuming they are independent. We evaluate how well it fits children's oral reading fluency data logged by Project LISTEN's Reading Tutor, compared to other methods.

"1. INTRODUCTION.
Dynamic Bayes nets are often used to model skill acquisition, e.g., in Knowledge Tracing (KT) [Corbett and Anderson, 1995]. They model a skill as a hidden state of knowledge, and estimate the changing probability of this state by observing successive attempts to use the skill. However, KT does not model multiple subskills used in such an attempt.

Previous research has explored various approaches to this problem. One approach is to use a conjunctive model [Cen et al., 2008; Gong et al., 2010], which assumes that a student must master all of the subskills in order to perform the step correctly. If the subskills are independent, the probability of knowing them all can be estimated by multiplying the estimated probabilities of knowing the individual subskills. However, this product typically underestimates the probability of knowing all the subskills. The minimum of their estimated probabilities provides a less pessimistic estimate, based on the assumption that the likelihood of a correct answer is dominated by the student's weakest subskill [Gong et al., 2010]. Alternatively, Koedinger et al. [2011] use techniques from Bayesian nets to avoid blaming each subskill equally in conjunctive models. By pre-specifying the relationship between a step and its subskills, all of these approaches make strong assumptions about independence of subskills. Performance Factors Analysis [Pavlik Jr. et al., 2009b] and Learning Factors Analysis [Cen et al., 2006; Pavlik Jr. et al., 2009a] use non-linear regression to estimate multiple subskills without making such assumptions, but statically: that is, they do not trace subskills over time.

This paper presents LR-DBN, a method that uses logistic regression to trace multiple subskills in a dynamic Bayes net student model without assuming they are independent. Section 2 explains how LR-DBN works, Section 3 evaluates it, and Section 4 concludes.

2. LOGISTIC REGRESSION IN A DYNAMIC BAYES NET.
In a KT model, and in other dynamic Bayes net models for student modelling [Chang et al., 2006], we estimate the probability of a student knowing, learning, or forgetting the skill(s) required to perform each observed step. We therefore use a latent variable to model a hidden knowledge state that changes over time, and infer it from sequential observations of the student's performance. If we know (or assume) which set of subskills a step requires, it makes sense to estimate overall knowledge of the step as some function of the estimated knowledge of each individual subskill it requires. Accordingly, we propose to model the probabilities of transitions between successive knowledge states using logistic regression over all of the subskills. We later prove that this approach is equivalent to modeling the knowledge probabilities themselves using logistic regression, but it provides a more convenient way to trace the individual subskills.

Fig. 1. A dynamic Bayes architecture with logistic regression.

Fig. 1 shows a dynamic Bayes net architecture in the KT framework, with binary variables:
Sj(n): known indicator variable; 1 if step n requires subskill j, 0 otherwise.
K(n): hidden; true iff the student has the knowledge step n requires.
P(n): observed; true iff the student performs step n correctly.

In addition, already know denotes the probability of knowing at the initial state. We then use logistic regression to model (1 - already know) and the transition probabilities from the knowledge state K(n-1) at step n-1 to the knowledge state K(n) at step n over m subskills:
FORMULA_1. FORMULA_2. FORMULA_3.
Here equation (1) uses a coefficient fit for skill j at the initial state, and equations (2) and (3) use coefficients that represent skill j's respective contributions (when involved) to the transition probabilities from K(n-1) to K(n). We assume that these coefficients do not vary over time. These equations imply that the more subskills a step requires, or the harder they are to learn, the lower the probability of knowing or learning the step.
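As a rough illustration of the logistic form described above, the following Python sketch computes the probability of not knowing a step from its subskill indicators Sj, and then applies one transition of the knowledge state. It is a minimal sketch under stated assumptions, not the paper's implementation: the coefficient values are hypothetical, omitting an intercept term is an assumption, and the function names are invented for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def prob_not_known(coeffs, required):
    """Logistic regression over the subskills a step requires.

    coeffs[j]   -- coefficient for subskill j (hypothetical values)
    required[j] -- indicator S_j: 1 if the step requires subskill j, else 0
    Returns the modelled probability that the step's knowledge is NOT held.
    Assumption: no intercept term.
    """
    return sigmoid(sum(b * s for b, s in zip(coeffs, required)))


def next_known(p_known_prev, p_stay_unknown, p_forget):
    """One transition: P(K(n) = true) computed from P(K(n-1) = true)."""
    p_learn = 1.0 - p_stay_unknown
    return p_known_prev * (1.0 - p_forget) + (1.0 - p_known_prev) * p_learn


# Hypothetical coefficients for m = 3 subskills (not fitted values).
init_coeffs = [0.5, 1.0, 0.2]    # used for (1 - already know)
learn_coeffs = [0.8, 1.5, 0.3]   # used for (1 - learn)

one_skill = [1, 0, 0]    # step requiring subskill 0 only
two_skills = [1, 1, 0]   # step requiring subskills 0 and 1

p0_one = 1.0 - prob_not_known(init_coeffs, one_skill)
p0_two = 1.0 - prob_not_known(init_coeffs, two_skills)

# One practice opportunity on the two-subskill step, with forget = 0.
p1_two = next_known(p0_two, prob_not_known(learn_coeffs, two_skills), 0.0)
```

With positive coefficients, the step that requires two subskills starts with a lower probability of being known than the step that requires one, which matches the qualitative claim in the text. With forget set to 0, a practice opportunity can only raise the probability of knowing.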
We now show how the model can trace individual subskills, first in a simple case and then in the general case.

Tracing subskills in a simple case: Consider a simple scenario in which a student repeatedly practices a single step that involves multiple subskills. The transition probabilities in equations (1), (2), and (3) correspond to (1 - already know), (1 - learn), and forget in KT. For now we assume forget is 0, a standard assumption in KT. Thus:
FORMULA_4. FORMULA_5. FORMULA_6.
Note that the variables Sj, which indicate which subskills are used in a step, have no superscript n because their assignments are determined by the step, so they remain constant for repeated practice of the same step. Now we compute P(K(1) = true):
FORMULA_7. FORMULA_8. FORMULA_9. FORMULA_10. FORMULA_11. FORMULA_12.
To save space, we have shown here only how to update subskill knowledge independently of observed performance on the step. To condition on performance, we can further derive the knowledge probability given the observed performance, and from it the corresponding quantities in (9) and (12).

3. EXPERIMENTS.
We implemented LR-DBN in the Bayes Net Toolbox for Matlab [Murphy, 2006]. Specifically, we defined the knowledge K given m subskills Sj as a "softmax" node in the toolbox. We used LR-DBN to model the growth of children's oral reading fluency, where performance P denotes whether the student read a word fluently. Our data was recorded by Project LISTEN's Reading Tutor [Mostow and Aist, 2001] during the 2005-2006 school year. We scored each word as fluent if it was read without help or hesitation and accepted by the automated speech recognizer. We assume that whether a student read a word fluently depended on whether the student knew the grapheme-to-phoneme mappings in the word. So in our experiment, the subskills required by the step of reading a word are the word's grapheme-to-phoneme mappings. We modeled 27 children who read a total of 5,078 distinct word types with 332 unique grapheme-phoneme mappings.
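To make the subskill encoding concrete, here is a small Python sketch of how a word's grapheme-to-phoneme mappings could be turned into the binary indicators Sj. The tiny inventory, the `grapheme>/phoneme/` notation, and the function name are all hypothetical and chosen only for illustration; the actual inventory described above has 332 mappings.

```python
def indicator_vector(word_mappings, inventory):
    """Return S, where S[j] = 1 iff the word uses mapping inventory[j]."""
    index = {mapping: j for j, mapping in enumerate(inventory)}
    s = [0] * len(inventory)
    for mapping in word_mappings:
        s[index[mapping]] = 1
    return s


# Hypothetical inventory of grapheme-to-phoneme mappings (illustration only).
inventory = ["c>/k/", "a>/ae/", "t>/t/", "sh>/sh/", "i>/i/", "p>/p/"]

# Each word read is one step; its required subskills are its mappings.
cat = indicator_vector(["c>/k/", "a>/ae/", "t>/t/"], inventory)
ship = indicator_vector(["sh>/sh/", "i>/i/", "p>/p/"], inventory)
```

Each observed word then contributes one row of indicators, so words that share a mapping also share the corresponding subskill coefficient.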
To evaluate our models, we fit them separately for each student on the first half of that student's data, tested on the second half, and averaged the test results across students. The test set contains a total of 32,122 read words, of which 23,222 were fluent. For comparison, we also applied the original KT model and estimated the probability of knowing a word as the minimum of the probabilities of knowing each of its grapheme-to-phoneme mappings, based on assuming that the student's weakest subskill determined whether he read the word fluently. Table I shows the results. The values in parentheses show 95% confidence intervals based on standard error calculated from the unbiased weighted sample variance of individual students' accuracies. Since the data is unbalanced (72.3% of the words were fluent), we also report within-class accuracies. Table I shows that LR-DBN significantly outperformed the weakest-subskill KT model, especially on non-fluent words. High within-class accuracy on unbalanced data is often hard for KT [Zhang et al., 2008].

Table I. LR-DBN vs. KT Models of Children's Reading Fluency Growth.

4. CONCLUSION.
This paper describes and evaluates LR-DBN, a novel student modeling method to trace hidden subskills by using logistic regression within a dynamic Bayes net. We used oral reading data from 27 children to compare LR-DBN to conventional knowledge tracing of weakest subskills. LR-DBN performed significantly better overall, thanks to similar accuracy (92.0% ± 2.7% vs. 94.5% ± 5.8%) on fluent words combined with 4-fold higher accuracy on disfluent words (80.5% ± 9.5% vs. 19.3% ± 8.4%). We later [Xu and Mostow, 2011] tested both models on a published data set [Koedinger et al., 2010] from 123 students working on a geometry area unit of the Bridge to Algebra Cognitive Tutor®. LR-DBN fit this data significantly better too, with only half as many prediction errors on unseen data.
Future tests should use data from more students and tasks, and compare LR-DBN to other baselines, such as conjunctive modeling [Koedinger et al., 2011]."
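The reason Section 3 reports within-class accuracies can be illustrated with a short Python sketch. It uses only the class counts stated in the paper (32,122 test words, 23,222 fluent) and a hypothetical baseline that predicts every word as fluent; the function name and the baseline are assumptions for illustration, not part of the reported experiments.

```python
def within_class_accuracy(y_true, y_pred):
    """Return (overall accuracy, {label: accuracy on items with that label})."""
    per_class = {}
    for label in (True, False):
        idx = [i for i, y in enumerate(y_true) if y == label]
        if idx:
            per_class[label] = sum(y_pred[i] == label for i in idx) / len(idx)
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return overall, per_class


# Class counts from the test set described in Section 3.
n_total, n_fluent = 32122, 23222
y_true = [True] * n_fluent + [False] * (n_total - n_fluent)

# Hypothetical baseline: always predict "fluent".
y_pred = [True] * n_total

overall, per_class = within_class_accuracy(y_true, y_pred)
```

The always-fluent baseline reaches about 72.3% overall accuracy while getting every non-fluent word wrong, so overall accuracy alone would hide exactly the weakness that the within-class figures expose.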
