Conditional Subspace Clustering of Skill Mastery: Identifying Skills that Separate Students

InProceedings

In educational research, a fundamental goal is identifying which skills students have mastered, which skills they have not, and which skills they are in the process of mastering. As the number of examinees, items, and skills increases, the estimation of even simple cognitive diagnosis models becomes difficult. We adopt a faster, simpler approach: cluster a capability matrix estimating each student's individual skill knowledge to generate skill set profile clusters of students. We complement this approach with the introduction of an automatic subspace clustering method that first identifies skills on which students are well-separated prior to clustering smaller subspaces. This method also allows teachers to dictate the size and separation of the clusters, if need be, for practical reasons. We demonstrate the feasibility and scalability of our method on several simulated datasets and illustrate the difficulties inherent in real data using a subset of online mathematics tutor data.

"1. Let γN = λi − λ j be the total descent gradient from a peak (Bin i) to a valley (Bin j). Let γP = λi − λ j be the total ascent gradient from a valley (Bin i) to a peak (Bin j). Let Lm be the location of the mode preceding the current valley (scan’s startpoint). Let Lv be the location of the lowest height of the current valley. Initialize Lm = Lv = Bin 1. 1) Scan γi,i+1 until γi,i+1 < 0. If no such gradient exists, there are no remaining valleys. 2) Else, scan γi,i+1 until γi,i+1 ≥ 0 (end of valley) or out of bins; compute γN. If |γN | > τd, have found a “significant” descent. Set Lv = Bin i + 1. 3) Scan γi,i+1 until γi,i+1 < 0 (end of peak) or out of bins; compute γP. If |γP| > τd, we have found a “significant” ascent. Find valley width w. If w > τw, significant valley; store mode locations. Else, do not store. In either case, set Lm = Lv = Bin i + 1. Scan for next valley (return to 1). Else, have not found significant ascent. Scan γi,i+1 until γi,i+1 ≥ 0 (end of next valley) or out of bins. If λi+1 < λLv, current valley is lower than valley at Lv. Set Lv = Bin i + 1. (return to 3) Else, current valley is higher than valley at Lv; have “hiccup mode”. (return to 3) Else, have not found a significant descent. Scan γi,i+1 until γi,i+1 < 0 (end of next peak) or out of bins. If λi+1 > λLm , current peak is higher than peak at Lm. Set Lm = Bin i + 1. Scan for next valley (return to 1). Else, current peak is lower than peak at Lm; have “hiccup mode”. (return to 2) Figure 2: Marginal Skill Distributions: Illustrative Example, Three Assistment Skills. The spirit of our algorithm is similar to mode-hunting (e.g. [12]) excepting that we only want to identify modes that are separated by a valley of substantial depth and width. In a sense, we are “valley-hunting”. For example, if while searching for a descent of substantial depth we find a “hiccup mode” where the marginal distribution slightly increases and then continues to decrease, the algorithm does not view that small valley to be important. (A “hiccup mode” might similarly be found when searching for a substantial ascent.) Figure 2a contains an example marginal distribution of Skill k, a histogram with bin width = 0.10. For example, say a teacher will only adapt classroom strategies for groups of students who are at least 10% of the class and whose capability values are separated by at least 20%. Given τd = 0.1, τw = 0.2, we start at Bin 1 and immediately find a descent of 0.14 (1.5 · 0.10 − 0.1 · 0.10). We know that there is at least one bin in the preceding mode with at least 10% of the students (our depth threshold). We continue scanning to find a total ascent of 0.135 (1.45 · 0.10 − 0.1 · 0.10) at Bin 4, evidence that the next mode also has at least 10% of the students. As both gradients exceed τd, we check that the valley is wide enough by measuring the distance between the two modes (0.0, 0.3). Since 0.3 > 0.2 = τw, both modes are separated by at least 20% capability, and we have identified a “significant valley”. Continuing to scan, we find another descent and valley at Bin 6. In this case, the descent is not large enough yet to indicate a well-separated group (Bin 7 is a “hiccup mode”). A large enough descent is eventually found between Bin 4 and Bin 8, followed by a significant ascent. The next significant valley is then from Bin 4 to Bin 10. We partition the skill at Bin 2 (0.15) and Bin 8 (0.75) to create three groups of students of size at least 10% of the class separated by at least 20% capability on Skill k. 
If our thresholds were instead τ_d = 0.045, τ_w = 0.10, four groups would have been found (cutpoints: 0.15, 0.55, 0.75). Figure 2 also includes the three Assistment skill marginal distributions. While Unit Conversion (Figure 2d) has three well-separated peaks, given reasonable depth/size thresholds, our algorithm would not partition this skill since two non-zero bin counts are very small (i.e. modes of trivial mass). We also would likely not partition the skewed Multiplication distribution. Given τ_d = 0.1, τ_w = 0.2, we do partition Evaluate Functions at 0.15 and 0.75 for three groups of students and cluster the three resulting two-dimensional subspaces. Figure 3 shows the methods' respective results. There is less cross-plane clustering in HC and k-means without partitioning Unit Conversion (Figures 3a,b). MBC again chose 14 clusters in total with similar results; however, the subspace clustering (including both finding the partitions and clustering the subspaces) took ≈ 6 seconds (vs. 21) for computational savings of 71%.

Figure 3: Cluster Assignments: a) HC, Complete, G = 3 · 2^2; b) K-means, G = 3 · 2^2; c) MBC, G = 14.

4 Recovering the True Skill Set Profiles.

In this section, we simulate data from the DINA model, a common educational research model, to compare the methods' ability to recover the students' true skill set profiles. The deterministic inputs, noisy "and" gate (DINA) model is a conjunctive cognitive diagnosis model used to estimate student skill knowledge [10]. The DINA item response form is

P(y_ij = 1 | η_ij, s_j, g_j) = (1 − s_j)^{η_ij} · g_j^{1 − η_ij},

where α_ik = I{student i has skill k} and η_ij = ∏_{k=1}^{K} α_ik^{q_jk} indicates whether student i has all skills needed for item j; s_j = P(y_ij = 0 | η_ij = 1) is the slip parameter; and g_j = P(y_ij = 1 | η_ij = 0) is the guess parameter. If student i is missing any of the required skills for item j, P(y_ij = 1) decreases due to the conjunctive assumption.

Prior to simulating the y_ij, we fix the skills to be of equal medium difficulty with an inter-skill correlation of either 0 or 0.25 and generate true skill set profiles C_i for each student. In our work thus far, only a perfect inter-skill correlation has a non-negligible effect on the results. These parameter choices evenly spread students among the 2^K natural skill set profiles. We randomly draw our slip and guess parameters (s_j ~ Unif(0, 0.30); g_j ~ Unif(0, 0.15)). Given the true skill set profiles and slip/guess parameters, we generate the student response matrix Y. Then, using a fixed Q matrix, we calculate and cluster the corresponding B matrix. For the first three methods, no partitioning is done (HC, k-means: G = 2^K; MBC: searches from 1 to some G > 2^K). In conditional subspace clustering, we initially use τ_d = 0.1, τ_w = 0.2 and then cluster the resulting subspaces (if any).

To gauge performance, we calculate each method's agreement with the true profiles using the Adjusted Rand Index (ARI), a common measure of agreement between two partitions [9]. Under random partitioning, E[ARI] = 0, and the maximum value is one; larger values indicate better agreement. Table 1 presents selected simulations for K = 3, 7, 10 and varying J, N. In the Cond (MBC) column, the first ARI corresponds to the partitioning alone, the second to the clustering of the partitioned subspaces (with MBC). We also vary the Q-matrix design to include only single skill items, only multiple skill items, or both. In addition, the Q-matrix was balanced (bal) or unbalanced (unbal): if balanced, all skills and skill combinations occur the same number of times; unbalanced refers to uneven representation of skills or missing skills (miss).
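As a concrete illustration of this simulation design, here is a minimal Python sketch, assuming numpy and scikit-learn. The toy dimensions, the random Q matrix, and the stand-in cluster labels are invented for the example; only the slip/guess ranges match those above. It sketches the general setup, not the paper's simulation code (which also controls skill difficulty and inter-skill correlation).

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

def simulate_dina(alpha, Q, s, g, rng):
    """Draw a DINA response matrix Y from true skill profiles alpha.

    alpha : N x K binary matrix, alpha[i, k] = 1 if student i has skill k
    Q     : J x K binary matrix, Q[j, k] = 1 if item j requires skill k
    s, g  : length-J slip and guess parameters
    """
    # eta[i, j] = 1 iff student i has ALL skills item j requires (conjunctive)
    eta = (alpha @ Q.T) == Q.sum(axis=1)
    p = np.where(eta, 1.0 - s, g)              # P(y_ij = 1 | eta_ij)
    return (rng.random(p.shape) < p).astype(int)

rng = np.random.default_rng(0)
N, J, K = 500, 30, 3                            # toy sizes, not the paper's
alpha = rng.integers(0, 2, size=(N, K))         # uncorrelated skills
Q = rng.integers(0, 2, size=(J, K))
Q[Q.sum(axis=1) == 0, 0] = 1                    # every item needs >= 1 skill
s = rng.uniform(0.0, 0.30, size=J)              # slip  ~ Unif(0, 0.30)
g = rng.uniform(0.0, 0.15, size=J)              # guess ~ Unif(0, 0.15)
Y = simulate_dina(alpha, Q, s, g, rng)

# Score any clustering against the true profiles via the ARI; a random
# labelling scores about 0, perfect recovery scores 1.
true_profile = alpha @ (2 ** np.arange(K))      # encode each profile as an id
random_labels = rng.integers(0, 2 ** K, size=N)
print(adjusted_rand_score(true_profile, random_labels))
```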
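The conditional subspace step itself can be sketched in the same style. Below, a hypothetical helper partitions students by the cutpoints found for each selected skill and then clusters each resulting subspace on the remaining skills. K-means is used here only as a stand-in for whichever method (HC, k-means, or MBC) clusters the subspaces, and all names and defaults are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def conditional_subspace_labels(B, cutpoints, n_clusters=2, seed=0):
    """Hypothetical sketch: partition on selected skills, cluster the rest.

    B         : N x K capability matrix (row i = student i's skill estimates)
    cutpoints : dict mapping skill index -> sorted cutpoints from the scan
    """
    N, K = B.shape
    rest = [k for k in range(K) if k not in cutpoints]   # unpartitioned skills
    # Assign each student a subspace id (mixed-radix code over the cutpoints).
    cell = np.zeros(N, dtype=int)
    for k in sorted(cutpoints):
        cell = cell * (len(cutpoints[k]) + 1) + np.searchsorted(cutpoints[k], B[:, k])
    labels, offset = -np.ones(N, dtype=int), 0
    for c in np.unique(cell):
        members = np.where(cell == c)[0]
        if not rest or len(members) <= n_clusters:       # tiny subspace: one group
            labels[members] = offset
            offset += 1
            continue
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        labels[members] = offset + km.fit_predict(B[np.ix_(members, rest)])
        offset += n_clusters
    return labels
```

Feeding this helper the cutpoints found per skill reproduces the pattern above: the partitioned skills define the subspaces, and only the remaining, smaller-dimensional data are clustered, which is where the reported computational savings come from.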
Table 1: Comparing Clustering Methods with the True Generating Skill Set Profiles via ARIs.

Excepting the multiple-skill unbalanced design, the subspace algorithm selected one or more skills for partitioning (in some cases, all skills were correctly selected). In almost all simulations, MBC was comparable to or better than HC and k-means for true skill set profile recovery. The partitioning method coupled with MBC on the reduced subspaces gave comparable or better results in all cases except the balanced single- and multiple-skill design. In addition, subspace partitioning with MBC was always faster than MBC alone.

Table 2: Comparison of Depth, Width Thresholds.

In addition, for the fourth K = 3, J = 30 Q-matrix design, we vary the depth and width thresholds (Table 2). Smaller values of τ_d, τ_w find narrower, shallower separations; in addition, smaller isolated clusters are found. In this particular example, we found that as we decreased the depth threshold, more skills were (correctly) selected, and the performance of the partitioning by itself improved. While the parameters are designed to be user-specified, we are currently exploring their behavior in order to make good default suggestions.

5 Thirteen Skill Assistment Example.

Finally, we briefly look at a higher dimensional Assistment example with K = 13 skills, N = 344 students, and J = 135 items. This data set included multiple skill items and a large amount of missing response data. HC and k-means are not appropriate choices, as finding 2^13 = 8192 clusters is unreasonable (without, say, allowing for empty clusters as in [1]), and MBC will largely depend on choosing an appropriate search range. The conditional subspace clustering algorithm, however, searches the space for obvious separation and partitions 9 of the 13 skills for a total of 221 subspaces (1 sec). All subspaces contained ≤ 13 students and so could likely be used alone or as subspaces for further clustering if needed.

6 Conclusions.

We presented a conditional subspace clustering algorithm for use with the capability matrix (or a similar skill knowledge estimate). The method selects skills that separate students well and reduces dimensionality for subsequent clustering. Our work so far shows that for most Q-matrix designs, the recovery of true skill set profiles is comparable to or better than that of other clustering methods, while also including skill selection. Since the true profiles in the Assistment examples are unknown, we cannot judge their recovery; however, visual inspection indicates that the partitions and skill selection seem sensible. To our knowledge, work in this area has not adequately addressed the need to analyze high-dimensional Q-matrices. The approach presented, while allowing for real time estimation of student skill set profiles, can handle large numbers of skills as well as incorporate practical user specifications.
