A Response Time Model for Bottom-Out Hints as Worked Examples

InProceedings

Benjamin Shih

Kenneth R. Koedinger

Richard Scheines

Proceedings of Educational Data Mining, 2008


Students can use an educational system's help in unexpected ways. For example, they may bypass abstract hints in search of a concrete solution. This behavior has traditionally been labeled as a form of gaming or help abuse. We propose that some examples of this behavior are not abusive and that bottom-out hints can act as worked examples. We create a model for distinguishing good student use of bottom-out hints from bad student use of bottom-out hints by means of logged response times. We show that this model not only predicts learning, but captures behaviors related to self-explanation.

"1. Notice that the reflection time for the second transaction is part of the logged time for the third transaction. Under the TER model, the reflection time for one transaction is indistinguishable from the thinking and entry time associated with the next transaction. Nevertheless, we need an estimate for the Think and Reflect times to understand student learning from bottom-out hints. The full problem, including external factors, is illustrated in Table 2, which shows a series of student transactions on a pair of problem steps, along with hypothetical, unobserved student cognition. Entries in italics are observed in the log while those in normal face are unobserved, and ellipses represent data not relevant to the example. The time the stu- dent spends thinking and reflecting on the bottom-out hint is about 6 seconds, but the only observed durations are 0.347, 15.152, and 4.944. In a case like this, the log dataâ€™s ob- served response times includes a mixture of Think and Reflect times across multiple steps. Table 3: TER Model With Estimators. Unfortunately, while the reflection time is important for properly estimating HINTt, it is categorized incorrectly. The reflection time for transaction t is actually part of the logged time for transaction (t+1). Teasing those times apart requires estimating the studentâ€™s time not spent on the hint. The first piece of our model separates out two types of bottom-out hint cognition: Think and Reflect. Thinking is defined as all hint cognition before entering the answer; reflecting is all hint cognition after entering the answer. Let Think time be denoted Kt and Reflect time be denoted Rt. We define HINTt = Kt + Rt. The task then reduces to estimating Kt and Rt. As shown earlier, this can be difficult for an arbitrary transaction. However, we focus only on bottom-out hints. Table 3 provides an example of how bottom-out hints differ from other transactions. Note the absence of a Reflect time Rtâˆ’1 in the bottom-out case. 
Except for time spent on answer entry and time spent off-task, the full time between receiving the hint and entering the answer is $K_t$. A similar, but slightly more complicated, result applies to $R_t$. For now, assume off-task time is zero; it will be properly addressed later. Let the answer entry time be denoted $E_t$. Let the total time for a transaction be $T_t$. Then the equation for $\mathit{HINT}_t$ becomes

$$
\begin{aligned}
\mathit{HINT}_t &= K_t + R_t \\
&= (T_t - E_t) + R_t \\
&= (T_t - E_t) + \bigl(T_{t+1} - (K_{t+1} + E_{t+1})\bigr)
\end{aligned}
$$

where $T_t$ and $T_{t+1}$ are observed in the log data. The first term consists of replacing $K_t$ with measured and unmeasured times from before the answer is submitted. The second term consists of times from after the answer is submitted. If we have an estimate for $E_t$, we can now estimate $K_t$. Similarly, if we have an estimate for $K_{t+1}$ and $E_{t+1}$, we can estimate $R_t$.

Constructing reliable estimates for any of the above values is impossible on a per-transaction basis. However, if we aggregate across all transactions performed by a given student, then the estimators become more reasonable. There are two other important points regarding the estimators we will use. First, response times, because of their open-ended nature, are extremely prone to outliers. For example, the longest recorded transaction is over 25 minutes in length. Thus, we will require our estimators to be robust. Second, some students have relatively few ($\approx 10$) bottom-out hint transactions that will fit our eventual criteria. Thus, our estimators must converge quickly.

Now we need some new notation. We will be using the $s$ subscript, where $s$ represents a student. We will also use the $\hat{E}_s$ notation for estimators and the $m(E_t)$ notation for medians. You can think of $m(E_t)$ as approximating the mean, but we will always be using the median because of outliers. $E_s$, the per-student estimator, will represent some measure of the "usual" $E_t$ for a student $s$. Also let $A$ be the set of all transactions and $A_s$ be the set of all transactions for a given student.
Let $A^1_s$ be the set of all correct answer transactions by a student $s$ where the transaction immediately follows a bottom-out hint. Similarly, let $A^2_s$ be the set of all transactions that follow a transaction $t \in A^1_s$. For convenience, we will let $T^1_s = m_{t \in A^1_s}(T_t)$ be the median time for transactions $t \in A^1_s$ and $T^2_s = m_{t \in A^1_s}(T_{t+1})$ be the median time for transactions $t \in A^2_s$. These two types of transactions are generalizations of the last two transactions shown in Table 3. This gives an equation for our estimator $\widehat{\mathit{HINT}}_s$:

$$
\widehat{\mathit{HINT}}_s = (T^1_s - E_s) + \bigl(T^2_s - (K^2_s + E_s)\bigr) \tag{4}
$$

Here, $K^2_s = m_{t \in A^2_s}(K_t)$ is the thinking time that takes place for transactions $t \in A^2_s$.

Consider $\hat{E}_s$, the median time for student $s$ to enter an answer. It always takes time to type an answer, but the time required is consistently short. If we assume that the variance is small, then $\hat{E}_s \approx \min_{t \in A_s}(E_t)$. That is, because the variance is small, $E_t$ can be treated as a constant. We use the minimum rather than a more common measure, like the mean, because we cannot directly observe $E_t$. Instead, note that if $K_t \approx 0$, then the total time spent on a post-hint transaction is approximately $E_t$. Thus, the minimum time student $s$ spends on an answer step is a good approximation of $\min_{t \in A_s}(E_t)$. In practice, the observed $\hat{E}_s$ is about 1 second. With $\hat{E}_s$, we can now estimate $K_t$ for $t \in A^1_s$.

To isolate the reflection time $R_s$, we need an approximation for $K^2_s$, the thinking time for transactions $t \in A^2_s$. Unfortunately, $K^2_s$ is difficult to estimate. Instead, we will estimate a value related to $K^2_s$. The key observation is that if a student has already thought through an answer on their own, without using any tutor help, they presumably engage in very little reflection after they enter their solution. To put it mathematically, let $N_s$ be the set of transactions for student $s$ where they do not use a bottom-out hint. We assume that $R_t \approx 0, \forall t \in N_s$.
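As a minimal sketch of the entry-time and think-time estimators just described, the following Python computes $\hat{E}_s$ as the minimum observed answer time and the per-student think time as $T^1_s - \hat{E}_s$. The function names and the sample durations are hypothetical illustrations, not taken from the paper, and the code assumes the transactions have already been filtered into the sets $A_s$ and $A^1_s$.

```python
from statistics import median


def estimate_entry_time(answer_times):
    """E-hat_s: the minimum logged time on any answer step of one student."""
    if not answer_times:
        raise ValueError("need at least one answer transaction")
    return min(answer_times)


def estimate_think_time(post_hint_times, entry_time):
    """Per-student think time: median time over A1_s (T1_s) minus E-hat_s."""
    return median(post_hint_times) - entry_time


# Hypothetical logged durations in seconds for one student.
all_answer_times = [1.1, 4.0, 9.5, 2.3, 1500.0, 6.2]  # A_s, with one off-task outlier
post_hint_times = [4.0, 9.5, 1500.0]                  # A1_s: correct answers right after a bottom-out hint

e_hat = estimate_entry_time(all_answer_times)
k_s = estimate_think_time(post_hint_times, e_hat)
print(e_hat, round(k_s, 3))
```

The 1500-second outlier has no effect on the think-time estimate here because the median, not the mean, summarizes the post-hint times.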
We can now use the following estimator to isolate $R_s$: FORMULA_2, where the change from line 6 to line 7 derives from the assumption $R_t \approx 0, \forall t \in N_s$. This is the last estimator we require:

$$
R_s \approx m_{t \in A_s}\Bigl(T_t - m_{u \in N_s,\, v = u+1}(T_v)\Bigr)
$$

That is, we use the median time for the first transaction on a step where the prior step was completed without worked examples. This approach avoids directly estimating $K^2_s$ and estimates the sum $(K^2_s + E_s)$ instead.

Table 4: Indicator Correlations in the Control Condition.

There is still the problem of off-task time. We have so far assumed that off-task time is approximately zero. We will continue to make that assumption. While students engage in long periods of off-task behavior, we assume that for most transactions, students are on-task. That implies that transactions with off-task behaviors are rare, albeit potentially of long duration. Since we use medians, we eliminate these outliers from consideration entirely, and thus continue to assume that on any given transaction, off-task time is zero.

A subtle point is that the model will not fit well for end-of-problem transactions. At the end of a problem there is a "done" step, where the student has to decide to hit "done". Thus, the model no longer accurately represents the student's cognitive process. These transactions could be valuable to an extended version of the model, but for this study, all end-of-problem transactions will be dropped.

5 Results

We first run the model for students in the control condition. These students were not required to do any formal reasoning steps. The goal is to predict the adjusted pre-post gain, $\max\left(\frac{post - pre}{1 - pre}, \frac{post - pre}{pre}\right)$. We will not use the usual Z-scores because the pre-test suffered from a floor effect and thus the pre-test scores are quite non-normal (Shapiro-Wilk: p < 0.005).
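Returning to the estimators defined before the results, the control-condition computation of $R_s$ and $\widehat{\mathit{HINT}}_s$ can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names and sample values are hypothetical, the outer median is taken over the follow-up transactions $A^2_s$, and the baseline list is assumed to hold the times $T_v$ of transactions that directly follow a transaction in $N_s$.

```python
from statistics import median


def estimate_reflect_time(follow_up_times, baseline_times):
    """R_s: median over follow-up times of (T_t - median baseline time).

    The median baseline time stands in for the sum (K2_s + E_s), which
    relies on the assumption that reflection is near zero after steps
    solved without a bottom-out hint.
    """
    base = median(baseline_times)
    return median([t - base for t in follow_up_times])


def estimate_hint_time(post_hint_times, follow_up_times, baseline_times, entry_time):
    """HINT-hat_s = (T1_s - E_s) + (T2_s - (K2_s + E_s)), as in equation (4)."""
    think = median(post_hint_times) - entry_time
    reflect = estimate_reflect_time(follow_up_times, baseline_times)
    return think + reflect


# Hypothetical per-student durations in seconds.
post_hint_times = [4.0, 9.5, 1500.0]      # A1_s
follow_up_times = [12.0, 15.0, 20.0]      # A2_s
baseline_times = [8.0, 10.0, 11.0, 600.0] # transactions following a non-hint transaction
entry_time = 1.0                          # E-hat_s, about 1 second in practice

hint_s = estimate_hint_time(post_hint_times, follow_up_times, baseline_times, entry_time)
print(hint_s)
```

Both long outliers (1500 and 600 seconds) leave the result unchanged, which is the practical meaning of treating off-task time as zero when medians are used.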
Two students were removed from the population for having fewer than 5 bottom-out hint requests, bringing the population down to 18. The results are shown in Table 4.

The first result of interest is that none of the indicators have statistically significant correlations with the pre-test. This suggests that they measure some state or trait of the students that is not well captured by the pre-test. The second result of interest is that all three indicators correlate strongly with both the post-test and learning gain. Notably, $\mathit{HINT}_s$, our main indicator, has a correlation of about 0.5 with both the post-test and the learning gain. To the extent that $\mathit{HINT}_s$ does distinguish between "good" versus "bad" bottom-out hint behaviors, this correlation suggests that the two types of behavior should indeed be distinguished.

It's possible that these indicators might only be achieving correlations comparable to time-on-task or average transaction time. As Table 5 shows, this is clearly not the case. All three hint time indicators out-perform the traditional time-on-task measures.

Table 5: Time-on-Task Correlations in the Control Condition.

Table 6: Correlations in the Experimental Condition.

Nevertheless, these results still do not show whether the indicator $\mathit{HINT}_s$ is actually measuring what it purports to measure: self-explanation on worked examples. For that, we use the experimental condition of the data. In the experimental condition, students are asked to justify their correct solutions by providing the associated theorem. This changes the basic pattern of transactions we are interested in from HINT-GUESS-GUESS to HINT-GUESS-JUSTIFY-GUESS. We can now directly measure $R_s$ using the time spent on the new JUSTIFY steps. $R_s$ is now the median time students spend on a correct justification step after a bottom-out hint, subtracting the minimum time they ever spend on correct justifications.
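The direct measurement of $R_s$ in the experimental condition can be illustrated with a short Python sketch. The function name and the sample durations are hypothetical, and the code assumes the JUSTIFY transactions have already been filtered to correct justifications.

```python
from statistics import median


def reflect_time_from_justifications(post_hint_justify_times, all_correct_justify_times):
    """R_s in the experimental condition.

    Median time on correct JUSTIFY steps that follow a bottom-out hint,
    minus the student's minimum time on any correct justification
    (used as a stand-in for the time needed just to enter the reason).
    """
    entry = min(all_correct_justify_times)
    return median(post_hint_justify_times) - entry


# Hypothetical durations in seconds for one student.
all_correct_justify_times = [2.0, 7.0, 12.0, 9.0, 3.5]
post_hint_justify_times = [7.0, 12.0, 9.0]

r_s = reflect_time_from_justifications(post_hint_justify_times, all_correct_justify_times)
print(r_s)
```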
We use the minimum for reasons analogous to those of $\hat{E}_s$: we only want to subtract time spent entering the reason. In this condition, there were sufficient observations for all 19 students. The resulting correlations are shown in Table 6.

There is almost no correlation between our indicators and the pre-test score, again showing that our indicators are detecting something not effectively measured by the pre-test. Also, the correlations with the post-test and learning gain are high for both $R_s$ and $\mathit{HINT}_s$. While $R_s$ by itself has a statistically significant correlation at p < 0.10, $K_s$ and $R_s$ combined demonstrate a statistically significant correlation at p < 0.05. This suggests that while some students think about a bottom-out hint before entering the answer and some students think about the hint only after entering the answer, for all students, regardless of style, spending time thinking about bottom-out hints is beneficial to learning. The corollary is that at least some bottom-out hints are proving beneficial to learning.

Thus far, we have shown that the indicator $\mathit{HINT}_s$ is robust enough for strong correlations with learning gain despite being measured in two different ways across two separate conditions. The first set of results demonstrated that $\mathit{HINT}_s$ can be measured without any direct observation of reasoning steps. The second set of results showed that direct observation of $\mathit{HINT}_s$ was similarly effective. Our data, however, also let us address two other interesting questions. First, does prompting students to explain their reasoning change their bottom-out hint behavior? Second, do changes in this behavior correlate with learning gain?

Table 7: Changes in Behavior in the Experimental Condition.

To answer both questions, we look at the indicators trained on only the first 20% of each student's transactions.
For this, we use only the experimental condition because, when 80% of the data is removed, the control condition has too few remaining students and too few observations. Even in the experimental condition, only 15 remaining students still have more than 5 bottom-out hint requests that meet our criteria. The results are shown in Table 7, with $\Delta\mathit{HINT}_s$ representing the difference between $\mathit{HINT}_s$ trained on the first 20% of the data and $\mathit{HINT}_s$ trained on the full data.

To answer the first question, the change in $\mathit{HINT}_s$ is not statistically different from zero. The prompting does not seem to encourage longer response times in the presence of bottom-out hints, so this mechanism does not explain the experimental results of Aleven et al.'s study [2]. However, some of the students did change their behaviors. As shown in Table 7, students who increased their $\mathit{HINT}_s$ times demonstrated higher learning gain. The evidence is substantial that $\mathit{HINT}_s$ measures an important aspect of student reasoning.

6 Conclusions and Future Work

In this study, we presented evidence that some bottom-out hint use can be good for learning. The correlations between our indicators and pre-post learning gain represent one form of evidence; the correlations between changes in our indicators and pre-post learning gain represent another. Both sets of results show that thinking about bottom-out hints predicts learning. However, extending our results to practical use requires additional work.

Our indicators provide estimates for student thinking about bottom-out hints. However, these estimates are aggregated across transactions, providing a student-level indicator. While this is useful for categorizing students and offering them individualized help, it does not provide the level of granularity required to choose specific moments for tutor intervention.
To achieve that level of granularity, a better distributional understanding of student response times would be helpful, as would an indicator capable of distinguishing between students seeking worked examples versus engaging in gaming. Exploring how the distribution of response times differs between high learning bottom-out hint students and low learning bottom-out hint students would go a long way to solving both problems.

That issue aside, our indicators for student self-explanation time have proven remarkably effective. They not only predict learning gain, they do so better than traditional time-on-task measures, they are uncorrelated with pre-test scores, and changes in our indicators over time also predict learning gain. These indicators achieve this without restrictive assumptions about domain or system design, allowing them to be adapted to other educational systems in other domains. Whether the results transfer outside of geometry or to other systems remains to be seen, but they have so far been robust.

The inclusion of two conditions, one without justification steps and one with justification steps, allowed us to show that the indicators do measure something related to reasoning or self-explanation. We estimated the indicators for the two conditions in different ways, yet both times, the results were significant. This provides a substantial degree of validity. However, one direction for future work is to show that the indicators correlate with other measures of self-explanation or worked example cognition. One useful study would be to compare these indicators with human estimates of student self-explanation."
