Identifying High-Level Student Behavior Using Sequence-based Motif Discovery

InProceedings

Proceedings of Educational Data Mining, 2010

2010 2010

We describe a data mining technique for the discovery of student behavior patterns while using a tutoring system. Student actions are logged during tutor sessions. The actions are categorized, binned and symbolized. The resulting symbols are arranged sequentially, and examined by a motif discovery algorithm to detect repetitive patterns, or motifs, that describe frequent tutor events. These motifs are examined and categorized as student behaviors. The categorized motifs can be used in real-time detection of student behaviors in the tutor system.

"1.2 sec, (j) 1.2 to 2.9 sec, (k) greater than 2.9 sec. First, there are two categorical bins, skip and solve on first attempt. These are each determined from an indicator in the log data for that problem. Skipping a problem implies only that students never clicked on a correct answer; they could have worked on the problem and then given up, or immediately skipped to the next problem with only a quick look. Solved on first attempt indicates correctly solving the problem. If neither of the first two bins are indicated in the logs, then the secOther metric measures the mean time for all attempts after the first. The divisions of 1.2 sec and 2.9 sec for the latter three bins were obtained using the mean and one standard deviation above the mean for all tutor usage; (i) less than 1.2 seconds would indicate guessing, (j) would indicate normal attempts, and (k) would indicate a long time between attempts. numIncorrect â€“ (o, p, q) - Each problem has four or five possible answer choices, that we divide into three groups: (o) zero incorrect attempts, indicates either solved on first attempt, skipped problem, or last hint solves problem (defined by the other metrics); (p) indicates choosing the correct answer in the second or third attempt, and (q) obtaining the answer by default in a four answer problem or possibly guessing when there is five answer problem. Some of the bins have dependencies that effect motif discovery and analysis. For example, last hint solves (c) precludes solved on first (h) and skipped (g); solved on first (h) requires zero incorrect attempts (o). In addition, by binning skip (g) in the secOther group, the timing of incorrect attempts is lost when the problem is skipped. 479 student strings representing 3762 total problems were constructed and concatenated into a 15048 character input string for motif discovery. The first 160 characters of the string (40 words/problems, separated at problems for clarity) are: afkq bfho cekp bfho aeho cekq bfiq bfkq bfip aeho aeho aeip aeho aeip afip cekp aeho afip cfho aeho cfho bfkp bekp bekp aeho aekp beho bekp aeho cfho bfkq aekp ceho cfkp bfjp aeip bfkp aeho afip afho In the first problem, afkq indicates no hints used, greater than 30 seconds to first attempt, over 2.9 mean seconds in other attempts, and most or all choices were made to find the solution. In the next problem, coded bfho, the student asks for one or two hints, greater than 30 seconds to first attempt, then solves the problem on first attempt. The sequence based motif discovery algorithm searches the input string in step two of the process. A detailed description follows. 5 Word-skipping sequence-based motif discovery algorithm. Our sequence-based motif discovery algorithm is a modification of the PROJECTION algorithm [10]. It is similar to the Chiu et al. [5] projection algorithm, except that our sliding window moves per word rather than per character. The PROJECTION algorithm is an efficient way to find planted strings in a long sequence of characters, and our modification to the algorithm allows us to apply it in a multivariate fashion where each character of the word represents a variable. In order to illustrate the algorithm, we first present an example, and then present a formal description of the algorithm. 5.1 Example word skipping projection. Take as input a string words, four characters each, with no separation. Construct a matrix S as a 40 character sliding window that slides 4 characters (1 word) per row. The below 6 x 40 character matrix (Figure 1a) represents the first 64 characters or 16 words. Randomly select 10 columns in the S matrix as the projection highlighted below. Project the selected columns to a new matrix (Figure 1b). If there is a string match between any pair of rows, then add a collision to the collision matrix (Figure 1c). The highlighted rows (3 and 4) match and thus the 3rd row and the 4th column has a collision. 5.2 Random projection multivariate motif discovery algorithm. Function Name: wordMotif Inputs: word sequence (t), motif size in characters (n), word size (w), projection length (p), number of iterations (m), max character distance (d), number of motifs to find (c). Outputs: c motif lists with start index and strings of length n for each motif example. Figure 1. Steps of word skipping projection. (a) the S matrix of 40 character substrings highlighted with a random projection (b) the projected matrix with 2 matching substrings (c) the resulting collision added to the collision matrix. wordMotif(t,n,w,p,m,d,c). 1. Construct S matrix of size (t/w * n) by sliding an n sized window w characters at a time. 2. Create collision matrix using project(S,p,m) 3. Compare pair of examples (A,B) with highest collision value 4. If (A,B) do not overlap and are within a hamming distance of d characters, then test them against other members of the S matrix, adding all members that are within hamming distance d to the motif set, and removing the collision values from the collision matrix. 5. Repeat 3 and 4 until c motifs are found or the collision matrix is exhausted. 6. Return the lists of up to c motifs project(S,p,m). 1. let k be the number of rows in S 2. construct an empty k x k collision matrix 3. repeat m times a. make a k x p matrix based on a random mask of p columns of the S matrix b. add a collision for each pair whose string is equivalent. 4. Return collision matrix The algorithm outputs a matrix of indices into the string, with a column for each discovered motif. For this study we found thirty motifs. The first two indices of the first 10 motifs are shown in Table 1. Table 1. The first 2 indices of the first 10 motifs. Each value in the matrix is an index to the first character of the motif. We can index into the original data to determine the symbols in each of these motifs. The first instance of the motif (in row 1) is representative of the pattern, and the other instances (in row 2, etc.) will be identical or within 10 characters of both the first and the second instance. Table 2 shows the first instance of the first ten motifs. Table 2. Ten of the motifs The index of the first character is followed by the 40 character motif with a separation each word to see problem characteristics. 6 Discovered motifs. In the third step, we examine and analyze the discovered motifs to determine the high level behavior that each motif discovered. We group motifs with similar behaviors. For this study we used the algorithm to detect thirty motifs. Chiu et al. [5] suggest a need to eliminate degenerate motifs, which are motifs that have no informational content, such as a sequence of repeated characters. In the case of the tutor data, repeated words are not degenerate because they inform us that the student is repeating a particular behavior. Repetition of undesired behavior is an important feature to capture, so we do not remove such motifs as degenerate. In the 30 discovered motifs, there were a number of repeated motifs. This is because a number of motifs were essentially straight, i.e. repetitions of a motif word, so there are cases of motifs that have essentially the same structure; these would overlap with each other, so they are considered distinct patterns. By grouping these exact match motifs, and also by comparing the different motifs by eye, we grouped the 30 motifs into 7 distinct meaning groups coded g, f, F, k, r, z, n: Game-like (g). adgo (10), adip (10), or adiq (10) â€“ Student is not reading the problems and either skipping or making quick guesses. Frustration (guess) (f). adiq (1) adgo (9) â€“ Guessing to find solution, then skipping the next 9 problems. This could be an indication of frustration. Frustration (hints) (F). cdgo (1) adgo (9) â€“ Using the hints to find solution, then skipping the next 9 problems. This also could indicate frustration, and was grouped together with the previous motif. Not challenged (k). a[ef]ho (10) â€“ Solving the problem on first attempt, not using hints, not guessing. This student is using the tutor appropriately, but not being challenged. Too difficult (r). ceho (10) - student is taking time to read the problem, then using hints to find the answer. These students could be working but the material is too difficult for them to solve the problem themselves. Skipping (z). adgo (5) aeho (5) â€“ Student skips 5 problems, then solves 5 normally on first attempt. Taking time to read all problems, then answering or skipping depending on whether the answer is known. On-task(n) aeiq aeho aeho aekp aeho aeiq aeho aeip aeho aeip - This is the most complex motif found and seems to indicate â€œon task learningâ€ tutor usage; the student is always reading the problem and making a good first attempt, with a mixture of solving on first attempt (aeho), solving after some attempts (aekp, aeip) and guessing (aeiq). The student is not using the hints nor skipping. With these groups coded as a single character, we can look a the progress of one student by converting the string of words to a string of motif groups for quicker interpretation. For example, in the too difficult (r) grouping, these problems show the student is taking 5 to 30 seconds before taking any action. However, if the student initially chose the wrong answer, subsequent attempts were quick guesses or gaming hints. In contrast, in the game-like (g) grouping the student was not reading the problem, simply skipping, or making quick guesses. In the frustration (guess) (f) grouping the student was skipping after an attempt. This could just be an indication of skipping problems, or possibly the error in the first attempt triggering a frustration response to skip the next 9 problems. And the on-task(n) grouping appears to be a mixture of responses for on task tutor use. 7 Evaluating student interaction. The final step in the process is to apply the 7 meaning groups to individual students. To do this we can convert each student string into a student motif string. This is done by scanning the student string and replacing each problem string with a dash (-) until a motif is detected; at this point, the problem string is replaced with the meaning group code associated with the motif. The meaning group code is only placed at the final problem of each motif (the last four characters). The conversion is shown below for two students. Using the meaning group code, we generated the student number followed by patterns for each student representing their tutor interaction. We examine two students below. For student 4362 the not challenged (k) motif indicates solving of problems on the first attempt without using hints. (The (-) patterns are those which did not match any motif.) The behavior is a lagging indicator, representing the current and previous 9 problem. Student 4363 started by gaming the tutor, game-like (g) either skipping or guessing quickly to find answers. This is followed by not challenged (k), a string problems solved on first attempt followed by more tutor gaming. The too difficult (r) string of râ€™s are where the student began using the hint facility but in a manner to find answers. After about 20 of these too difficult problems he/she returned to skipping or guessing. This conversion process illustrates how our discovered motifs can be used in a real-time application. A tutor can detect these patterns and respond based on the meaning group in order to have a more personalized interaction with the student. With student 4362 a tutor could increase problem difficulty. For student 4363, a tutor could intervene at problem 15 where gaming was detected, perhaps by introducing the hint system, or directing the student to a teaching video. 8 Discussion and future work. This paper describes a novel method for determining student behavior without linking the behavior to performance outcomes. We have shown a case study where a number of meaningful behaviors were discovered using a combination of hand chosen features and automatic pattern discovery. The features that have been found can be used to detect student behaviors so that the tutor can react in real time. However, these outcome of our study needs to be verified in future work. A number of future directions are discussed below. We will validate that the discovered motifs accurately represent student behavior by implementing them in the tutor. Upon motif detection, the corresponding meaning group will be verified by a person in real time. The student or a teacher observer will verify the meaning group by responding to a query during the tutoring session, e.g.. â€œAre you skipping problems because...â€ These responses will be compared with the predicted behaviors for validation. We will study automatic data categorization as modifications to our process. The data binning is the most user intensive part of this process. Finding methods to automate it would allow for broader use of these methods. We will compare a number of different motif window sizes in order to understand the time scale of problem behavior patterns. The value used in this paper, ten problems, is sufficient to describe behavior, and it yielded a manageable number of informative motifs. However, other window levels may yield motifs of different quality and quantity."

About this resource...

Visits 153

Save to My personal space
Send link

Categories:

Educational Data Mining (EDM)

Tags:

0 comments

Do you want to comment? Sign up or Sign in

¿Cómo puedes configurar o deshabilitar tus cookies?

Identifying High-Level Student Behavior Using Sequence-based Motif Discovery

InProceedings