Mining Student Behavior Models in Learning-by-Teaching Environments

InProceedings

This paper discusses our approach to building models and analyzing student behaviors in different versions of our learning-by-teaching environment, where students learn by teaching a computer agent named Betty using a visual concept map representation. We have run studies in fifth-grade classrooms to compare the different versions of the system. Students' interactions with the system, captured in log files, represent their performance in generating the causal concept map structures and their activities in using the different tools provided by the system. We discuss methods for analyzing student behaviors and linking them to student performance. At the core of this approach is a hidden Markov model methodology that builds students' behavior models from data collected in the log files. We discuss our modeling algorithm and the interpretation of the resulting models.

1.1 Learning by teaching: The Betty's Brain system.

The Betty's Brain system is illustrated in Fig. 1. The teaching process is implemented as three primary activities: (i) teach Betty concepts and links using a concept map representation [6]; (ii) query Betty to find out how she answers questions using what she has been taught; and (iii) quiz Betty on a set of predefined questions generated by the mentor agent to see how she performs. Betty uses qualitative reasoning methods to reason through chains of links to answer questions and, if asked, explains her reasoning using text and animation schemes. She also provides feedback that reflects the students' teaching behaviors. The goal is to get the students to adopt metacognitive strategies in their learning tasks [11]. Students reflect on Betty's answers and her explanations, and revise their own knowledge as they make changes to the concept maps to teach Betty better.

1.2 Experimental Design.

Our participants were 56 students in two 5th grade science classrooms taught by the same teacher. Data for three students were dropped from the main study, and three more were dropped from the transfer study due to excessive absences. Students were assigned to one of three conditions using stratified random assignment based on standardized test scores. In the main phase of the study (seven 45-minute sessions), students created concept maps on river ecosystem concepts and causal relations. Depending on their assigned group, students used one of three versions of the system: (a) a learning-by-teaching (LBT) version in which students taught Betty; (b) a self-regulated learning-by-teaching (SRL) version in which students taught Betty and received metacognitive prompts from Betty; and (c) an intelligent coaching system (ICS) version in which students created a map for themselves with guidance from the mentor agent. The ICS system served as our control condition [3].
After an eight-week delay, students participated in the transfer phase (five 45-minute sessions), in which they learned about a new domain, the land-based nitrogen cycle. During this phase, all students used a stripped-down version of the LBT system in which all of the feedback provided by Betty or the mentor was removed.

Figure 1: Betty's Brain system with query window.

1.3 Results.

We scored the students' final main and transfer concept maps to identify correct inclusions of concepts and links based on the resources that were provided to the students. Table 1 shows the average concept map scores by condition for the two phases of the study. Students who taught developed more complete and interconnected concept maps than students who created maps for themselves. The differences in map scores are statistically significant (SRL > LBT, ICS; LBT > ICS; p < 0.05) in the main phase, and the difference between SRL and ICS persisted during the transfer phase.

Table 1: Concept map scores, main and transfer phases.

We also performed preliminary analyses of students' behavior sequences to shed light on their learning processes [1]. We found that students who generated better concept maps used balanced learning strategies: they read resources, built and edited their concept maps, asked queries to probe the correctness of their maps, and used quiz questions in equal measure to learn about the domain. In contrast, students who generated low-scoring concept maps adopted a myopic learning strategy that focused excessively on getting their quiz answers correct. They relied mostly on the quiz questions and did not often read resources or probe their maps by asking queries.

1.4 Model Selection.

To investigate this relation between learning performance and the use of strategies by group, it became important to go beyond frequency counts or proportions of individual activities and examine how these activities came together as larger behavior patterns and strategies.
To this end, we found hidden Markov models (HMMs) to be the most appropriate, as they allow us to identify some of the students' general behavior patterns from sequences of their interactions with the system.

2 A Data Mining Approach to Analyzing Student Behaviors.

Our approach to analyzing student behaviors in the main and transfer phases of the study involves four steps that appear in most data mining applications [12]: (i) devise a logging system that records student interactions with the system; (ii) perform data cleaning by parsing the generated log files and splicing the information into the desired activity sequence data that forms the input to the HMM-generating algorithm; (iii) construct the HMMs; and (iv) interpret the generated models as student learning behaviors and compare models across conditions. We describe each of these steps in greater detail below.

2.1 Generating the Raw Data: The Logging System.

The raw data corresponds to actions performed by students on the system and the responses provided by Betty and the mentor agent. The agents and the student interact by communicating through the system environment agent. For example, if the student asks Betty a question, the request is broadcast to the environment agent, who routes it to the other agents. The left side of Fig. 2 illustrates the message passing that takes place in the system: the question is first broadcast to the environment agent and routed to the teachable agent, who then requests that the environment agent use the current concept map to answer the question. This raw data is processed in real time and stored in log files, as shown in the right side of Fig. 2. At the end of a session, the logs are sent to a server and consolidated into a session-by-session sequence for each student in a database.

Figure 2: Events and messages being turned into log files.

2.2 Parsing the Log Files.
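As a minimal sketch of this data-cleaning step, raw log actions can be collapsed into per-student sequences of aggregate activity codes. The log-line format and the action-to-code mapping below are hypothetical illustrations, not the actual Betty's Brain log schema; the aggregate codes follow the convention of Table 2 (e.g., EM for Edit Map):

```python
# Hypothetical log-cleaning sketch: collapse raw log lines into
# per-student aggregate activity sequences (step ii of the pipeline).
ACTIVITY_MAP = {                       # raw action -> aggregate activity code
    "ADD_CONCEPT": "EM", "DELETE_LINK": "EM", "EDIT_NODE": "EM",
    "ASK_QUERY": "AQ",
    "REQUEST_QUIZ": "RQ",
    "OPEN_RESOURCE": "RA",
}

def parse_logs(lines):
    """Turn 'student_id<TAB>raw_action' lines into activity sequences."""
    sequences = {}
    for line in lines:
        student, action = line.strip().split("\t")
        code = ACTIVITY_MAP.get(action)
        if code is not None:                      # ignore unmapped events
            sequences.setdefault(student, []).append(code)
    return sequences

raw = ["s1\tADD_CONCEPT", "s1\tREQUEST_QUIZ", "s2\tASK_QUERY", "s1\tEDIT_NODE"]
print(parse_logs(raw))   # → {'s1': ['EM', 'RQ', 'EM'], 's2': ['AQ']}
```

The resulting per-student code sequences are exactly the kind of observation sequences an HMM-training algorithm consumes.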
In this study, we derive students' behavior patterns by analyzing the sequences of their interactions with the system over multiple sessions. To simplify the interpretation task, we lumped analogous student actions into one aggregate activity. For example, all of the map creation and modification activities, i.e., adding concepts and links, deleting concepts and links, and editing nodes and links, were combined into one aggregate activity called Edit Map (EM). All student activities were expressed as the eight activities summarized in Table 2. Examples of the resulting sequences are shown in Fig. 3.

Table 2: Student activities and related actions.

Figure 3: Parsed data for three students in session 1.

2.3 Constructing the HMMs.

The first step in interpreting this behavior data was to generate hidden Markov models. A hidden Markov model consists of hidden states that are not directly visible and are governed by three sets of parameters: the initial probability vector π, the transition probability matrix A, and the output probability matrix B [7]. By representing concise models of student activity patterns, a hidden Markov model has the potential to provide a global, aggregated view of how students approach the learning task [3]. The algorithm that constructs HMMs from a set of observation sequences derives an optimal set of these parameters (π, A, B) that maximizes the likelihood of the input sequences. Further, simpler models are easier to interpret (Occam's razor), so we apply an algorithm developed by Li and Biswas [5] that uses the Bayesian information criterion (BIC) to trade off the simplicity of the model against the information it provides. BIC, defined as log(L) − (d/2)log(N), uses the log-likelihood of the model, log(L), the model size, d, and the number of observations, N, to find the model that strikes a balance between high likelihood and low complexity [5]. Finding the optimal HMM parameters from data is an optimization problem.
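The BIC trade-off just described is straightforward to sketch. Assuming log-likelihoods have already been computed for candidate models of different sizes (the values below are made up for illustration), model selection reduces to maximizing log(L) − (d/2)log(N):

```python
import math

def bic(log_likelihood, d, n_obs):
    """BIC = log(L) - (d/2)log(N); larger is better under this convention."""
    return log_likelihood - (d / 2.0) * math.log(n_obs)

# Hypothetical training log-likelihoods for models with 3-5 hidden states.
candidates = {3: -1520.0, 4: -1480.0, 5: -1470.0}
n_obs = 1200   # total number of observed activities (also hypothetical)
scores = {d: bic(ll, d, n_obs) for d, ll in candidates.items()}
best_d = max(scores, key=scores.get)
print(best_d)   # the model size whose likelihood gain outweighs its penalty
```

With these particular numbers the 5-state model wins because each extra state costs only (1/2)log(N) ≈ 3.5 BIC points while buying a much larger likelihood improvement; a smaller likelihood gain would tip the balance toward the simpler model.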
Two common iterative convergence optimization schemes are the Baum-Welch [7] and the segmental K-means [4] algorithms. In this paper, we use the segmental K-means algorithm in conjunction with BIC for iterative segmentation and optimization steps to obtain the optimal model parameters, which include (π, A, B) and the number of states in the model, d. The segmentation step uses the Viterbi algorithm for sequential decoding, while the optimization step finds a new set of model parameters as dictated by the K-means method [4]. A chief advantage of the segmental K-means algorithm is its faster execution time, gained by setting a restricted optimization objective. In the future, this speed may allow online computation of the behavior models to provide guided feedback as the student works on the system.

A concern with the segmental K-means algorithm, however, is the possibility of convergence to local maxima. In this work, we ran the algorithm one hundred times with random initializations (sampling the initial parameter values from uniform distributions), all of which converged to the same configuration. We also compared the BIC values generated by the Baum-Welch and the segmental K-means algorithms, and found that the K-means algorithm produced consistently better results (see Fig. 4). While these empirical results do not conclusively rule out convergence to local maxima, they show the algorithm to be quite robust. The parsed activity sequences were used to derive two sets of three hidden Markov models for the three conditions using the above algorithm.

3 Analysis of the HMMs.

The behavior sequence models for the ICS, LBT, and SRL groups in the main and transfer studies, created using our HMM algorithm, are shown in Fig. 5. Each model is made up of a set of states, the activity pattern (the output probability) associated with each state, and the transition probabilities between states.
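The Viterbi decoding used in the segmentation step can be sketched as follows. This is a generic log-space implementation with toy parameters, not the paper's fitted models; it recovers the single most likely hidden-state path for one observation sequence:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state path for one observation sequence (log space)."""
    T, S = len(obs), len(pi)
    delta = np.zeros((T, S))           # best log-prob of paths ending in each state
    psi = np.zeros((T, S), dtype=int)  # back-pointers
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(A)   # (from_state, to_state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):      # follow back-pointers
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Toy 2-state, 2-symbol model (illustrative values only).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 0, 1, 1], pi, A, B))   # → [0, 0, 1, 1]
```

In segmental K-means, the decoded path assigns each observation to a state; the optimization step then re-estimates (π, A, B) from those assignments, and the two steps alternate until convergence.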
For example, the model predicts that students in the ICS condition in state RQ(75%)CE(25%) requested the quiz 75% of the time and asked for a continued explanation 25% of the time. The transition probability associated with a link between two states indicates the likelihood of the student transitioning from the current state to the indicated state. For example, the HMM states that a student in the ICS condition in state RQ(75%)CE(25%) of the main phase would have a 6% likelihood of transitioning to state EM, a 19% likelihood of remaining in the same state, and a 71% likelihood of transitioning to state QT. Likelihoods less than 2% are not represented in the figure, which explains why these numbers do not sum to 100%. HMMs are so named because their states are hidden; that is, the states are not directly observed in the input sequences, but provide an aggregated description of the students' interactions with the system. Sequences of states may be interpreted as the students' learning behavior patterns. We investigate further by interpreting these models in terms of the students' cognitive learning behaviors.

In looking at these HMMs, we find several interpretable patterns that present themselves through high intrastate transition probabilities and low interstate transition probabilities. For example, we see that the transition probabilities from states with Request Explanation (RE) to Continue Explanation (CE) are strong (≥49%). Also, these states are quite isolated, as the transition probabilities into these two states from other states are typically small (only 7% in the transfer SRL model). We combine these observations with knowledge of patterns relevant to interactive metacognition to identify three patterns: basic map building, map probing, and map tracing [3]. Basic map building is a pattern characterized by editing the map, submitting the map for a quiz, and occasionally referring to the reading resources.
The pattern reflects a basic and important metacognitive strategy: students work on their maps, check the map by taking a quiz to see if there are flaws, and occasionally refer to the readings. Map probing is defined by students asking questions of their map to check for specific relations between two concepts (e.g., if fish increase, what happens to algae?). This pattern exhibits a more proactive, conceptually driven strategy, because students are targeting specific relations rather than relying on the quiz to identify errors. Students also need to formulate their own questions to do so.

Figure 4: K-means- and Baum-Welch-generated BIC values for the ICS model.

Figure 5: HMM models for the three conditions in the main and transfer phases.

The map tracing pattern reflects students asking Betty or the mentor (depending on the system) to explain the reasoning step by step. When Betty or the mentor initially answers a question, they state that a change in one entity causes a change in another entity and highlight the paths they followed to reach their answer. To follow the details of the inference chain, students had to ask Betty or the mentor agent to explain their reasoning. The agents did so by hierarchically decomposing the chain of inference; for each explanation request, they showed how a particular path within the larger chain contributed to the final answer. Receiving more details about the reasoning process is particularly useful when maps become complex and there are multiple paths between two concepts.

To build reduced versions of the HMMs that incorporate these patterns, we first built aggregate states that represented the patterns of individual behaviors. For instance, Edit Map, Request Quiz, Quiz Taken, Quiz Denied, and Resource Access were combined into the basic map building state; Ask Query was treated as the map probing state; and Request Explanation and Continue Explanation were combined into the map tracing state.
The transitions were then constructed according to a proportional weighting of the individual behaviors' stationary probabilities.

3.1 Initial Analysis.

Our preliminary analysis consisted of examining the prevalence of each behavior in the resulting stationary probabilities. The derived stationary probability values are listed in Table 3. In a sense, this analysis is equivalent to the frequency count analysis that we have performed in other studies [3], and it indicates an estimate of the relative time spent in each state. Similar results in both studies help to validate our methods and results.

Table 3: Grouped stationary probabilities.

In the main phase, we see that the differences in stationary probabilities among the groups are quite pronounced. For example, there is a significant drop-off in the probabilities of the students' edit map behaviors between successive conditions. Meanwhile, we see proportional increases in activities belonging to higher-level patterns, such as requesting explanations and continuing explanations. The students' use of continue explanation is especially pronounced.

In the transfer phase, when the students operate in a common environment, the differences become smaller. In all three groups, we see a large spike in the number of resource accesses relative to the main phase. At the same time, we observe a decrease in the occurrence of some of the higher-level patterns. This may be due to the students learning about a new domain with a simpler expert map that contains fewer interdependencies between concepts, and being given fewer sessions than in the main phase (five versus seven). It could also be due to the students internalizing the reasoning mechanism, and therefore having a reduced need to perform such activities in the transfer phase [3].

3.2 Analysis of the HMM Patterns.

Our next level of analysis consisted of examining the interactions among the metacognitive states and their transitions in our models (Fig. 6).
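Stationary probabilities of the kind reported in Table 3 can be recovered from a fitted transition matrix as the left eigenvector of A with eigenvalue 1, normalized to sum to 1. The 3-state matrix below is illustrative only, not one of the paper's fitted models:

```python
import numpy as np

def stationary(A):
    """Left eigenvector of A for eigenvalue 1, normalized to a distribution."""
    vals, vecs = np.linalg.eig(A.T)
    v = vecs[:, np.argmin(np.abs(vals - 1.0))].real
    return v / v.sum()

# Illustrative 3-state transition matrix (rows sum to 1).
A = np.array([
    [0.8, 0.1, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.2, 0.6],
])
pi_stat = stationary(A)
print(pi_stat.round(3))     # → [0.552 0.207 0.241]: long-run time per state
assert np.allclose(pi_stat @ A, pi_stat)   # fixed point of the chain
```

The stationary vector gives the long-run fraction of time spent in each state, which is why it serves as a principled substitute for raw frequency counts when comparing conditions.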
These interactions among the metacognitive states inform us about the typical learning behavior patterns we have identified for each condition. In the main phase, we find that the students in the ICS group tend to stay mainly in the basic map building state, while the SRL students tend to stay longer in the higher-level states once they get there. For example, the self-transition probability for the map tracing state is much larger in the SRL group (23%) than in the ICS or the LBT groups (9% and 5%, respectively). Also, while the differences between ICS and LBT seem small, the ICS students spend most of their effort in basic map building, while the LBT students spend a little more time in map probing, a higher-level metacognitive activity.

In the transfer phase, the difference between the ICS and the LBT groups becomes harder to discern. As in the main phase, the LBT students seem more likely to enter the map tracing state than the ICS students (5% as opposed to 3%), but they are more likely to leave once they get there. Unlike in the main phase, however, the LBT students now seem to be more confined to the basic map building state than the ICS students (83% as opposed to 81%). The SRL students still perform more map probing than the other groups.

4 Conclusions and Future Work.

Our data mining approach to building HMM models of student behaviors from log files has been very revealing. It has helped us establish that learning by teaching provides better opportunities for learning, even among 5th grade students. Further, metacognitive prompts during learning enable students to develop higher-level learning strategies that they retain even when the feedback prompts are removed. In the future, we will further refine our data mining techniques and algorithms to set up a framework for designing adaptive learning environments, where the learning support and feedback provided to students will be guided by the information derived from the student behavior models.
We will also work toward developing a more quantitative way of analyzing and comparing the models; this may involve using distance metrics and more comprehensive cognitive learning patterns.

Figure 6: HMM patterns showing ICS, LBT, and SRL behaviors in the main and transfer studies.

Acknowledgements.

This work has been supported by Dept. of Education IES grant #R305H060089 and NSF REESE Award #0633856. The authors wish to acknowledge the help provided by John Wagster and Rod Roscoe in the data collection, data analysis, and data interpretation tasks.
