Conditional Subspace Clustering of Skill Mastery: Identifying Skills that Separate Students

InProceedings

In educational research, a fundamental goal is identifying which skills students have mastered, which skills they have not, and which skills they are in the process of mastering. As the number of examinees, items, and skills increases, the estimation of even simple cognitive diagnosis models becomes difficult. We adopt a faster, simpler approach: cluster a capability matrix estimating each student's individual skill knowledge to generate skill set profile clusters of students. We complement this approach with the introduction of an automatic subspace clustering method that first identifies skills on which students are well-separated prior to clustering smaller subspaces. This method also allows teachers to dictate the size and separation of the clusters, if need be, for practical reasons. We demonstrate the feasibility and scalability of our method on several simulated datasets and illustrate the difficulties inherent in real data using a subset of online mathematics tutor data.

"1. Let γN = λi − λ j be the total descent gradient from a peak (Bin i) to a valley (Bin j). Let γP = λi − λ j be the total ascent gradient from a valley (Bin i) to a peak (Bin j). Let Lm be the location of the mode preceding the current valley (scan’s startpoint). Let Lv be the location of the lowest height of the current valley. Initialize Lm = Lv = Bin 1. 1) Scan γi,i+1 until γi,i+1 < 0. If no such gradient exists, there are no remaining valleys. 2) Else, scan γi,i+1 until γi,i+1 ≥ 0 (end of valley) or out of bins; compute γN. If |γN | > τd, have found a “significant” descent. Set Lv = Bin i + 1. 3) Scan γi,i+1 until γi,i+1 < 0 (end of peak) or out of bins; compute γP. If |γP| > τd, we have found a “significant” ascent. Find valley width w. If w > τw, significant valley; store mode locations. Else, do not store. In either case, set Lm = Lv = Bin i + 1. Scan for next valley (return to 1). Else, have not found significant ascent. Scan γi,i+1 until γi,i+1 ≥ 0 (end of next valley) or out of bins. If λi+1 < λLv, current valley is lower than valley at Lv. Set Lv = Bin i + 1. (return to 3) Else, current valley is higher than valley at Lv; have “hiccup mode”. (return to 3) Else, have not found a significant descent. Scan γi,i+1 until γi,i+1 < 0 (end of next peak) or out of bins. If λi+1 > λLm , current peak is higher than peak at Lm. Set Lm = Bin i + 1. Scan for next valley (return to 1). Else, current peak is lower than peak at Lm; have “hiccup mode”. (return to 2) Figure 2: Marginal Skill Distributions: Illustrative Example, Three Assistment Skills. The spirit of our algorithm is similar to mode-hunting (e.g. [12]) excepting that we only want to identify modes that are separated by a valley of substantial depth and width. In a sense, we are “valley-hunting”. For example, if while searching for a descent of substantial depth we find a “hiccup mode” where the marginal distribution slightly increases and then continues to decrease, the algorithm does not view that small valley to be important. (A “hiccup mode” might similarly be found when searching for a substantial ascent.) Figure 2a contains an example marginal distribution of Skill k, a histogram with bin width = 0.10. For example, say a teacher will only adapt classroom strategies for groups of students who are at least 10% of the class and whose capability values are separated by at least 20%. Given τd = 0.1, τw = 0.2, we start at Bin 1 and immediately find a descent of 0.14 (1.5 · 0.10 − 0.1 · 0.10). We know that there is at least one bin in the preceding mode with at least 10% of the students (our depth threshold). We continue scanning to find a total ascent of 0.135 (1.45 · 0.10 − 0.1 · 0.10) at Bin 4, evidence that the next mode also has at least 10% of the students. As both gradients exceed τd, we check that the valley is wide enough by measuring the distance between the two modes (0.0, 0.3). Since 0.3 > 0.2 = τw, both modes are separated by at least 20% capability, and we have identified a “significant valley”. Continuing to scan, we find another descent and valley at Bin 6. In this case, the descent is not large enough yet to indicate a well-separated group (Bin 7 is a “hiccup mode”). A large enough descent is eventually found between Bin 4 and Bin 8, followed by a significant ascent. The next significant valley is then from Bin 4 to Bin 10. We partition the skill at Bin 2 (0.15) and Bin 8 (0.75) to create three groups of students of size at least 10% of the class separated by at least 20% capability on Skill k. 
If our thresholds were instead τ_d = 0.045, τ_w = 0.10, four groups would have been found (cutpoints: 0.15, 0.55, 0.75). Figure 2 also includes the three Assistment skill marginal distributions. While Unit Conversion (Figure 2d) has three well-separated peaks, given reasonable depth/size thresholds, our algorithm would not partition this skill since two non-zero bin counts are very small (i.e. modes of trivial mass). We also would likely not partition the skewed Multiplication distribution. Given τ_d = 0.1, τ_w = 0.2, we do partition Evaluate Functions at 0.15 and 0.75 for three groups of students and cluster the three resulting two-dimensional subspaces. Figure 3 shows the methods' respective results. There is less cross-plane clustering in HC and k-means without partitioning Unit Conversion (Figures 3a,b). MBC again chose 14 clusters in total with similar results; however, the subspace clustering (including both finding the partitions and clustering the subspaces) took ≈ 6 seconds (vs. 21) for computational savings of 71%.

Figure 3: Cluster Assignments: a) HC, Complete, G = 3 · 2^2; b) K-means, G = 3 · 2^2; c) MBC, G = 14.

4 Recovering the True Skill Set Profiles.

In this section, we simulate data from the DINA model, a common educational research model, to compare the methods' ability to recover the students' true skill set profiles. The deterministic inputs, noisy "and" gate (DINA) model is a conjunctive cognitive diagnosis model used to estimate student skill knowledge [10]. The DINA item response form is

P(y_ij = 1 | η_ij, s_j, g_j) = (1 − s_j)^{η_ij} · g_j^{1 − η_ij},

where α_ik = I{student i has skill k} and η_ij = ∏_{k=1}^{K} α_ik^{q_jk} indicates whether student i has all skills needed for item j; s_j = P(y_ij = 0 | η_ij = 1) is the slip parameter; and g_j = P(y_ij = 1 | η_ij = 0) is the guess parameter. If student i is missing any of the required skills for item j, P(y_ij = 1) decreases due to the conjunctive assumption.

Prior to simulating the y_ij, we fix the skills to be of equal medium difficulty with an inter-skill correlation of either 0 or 0.25 and generate true skill set profiles C_i for each student. In our work thus far, only a perfect inter-skill correlation has a non-negligible effect on the results. These parameter choices evenly spread students among the 2^K natural skill set profiles. We randomly draw our slip and guess parameters (s_j ~ Unif(0, 0.30); g_j ~ Unif(0, 0.15)). Given the true skill set profiles and slip/guess parameters, we generate the student response matrix Y. Then, using a fixed Q matrix, we calculate and cluster the corresponding B matrix. For the first three methods, no partitioning is done (HC, k-means: G = 2^K; MBC: searches from 1 to some G > 2^K). In conditional subspace clustering, we initially use τ_d = 0.1, τ_w = 0.2 and then cluster the resulting subspaces (if any).

To gauge performance, we calculate each method's agreement with the true profiles using the Adjusted Rand Index (ARI), a common measure of agreement between two partitions [9]. Under random partitioning, E[ARI] = 0, and the maximum value is one; larger values indicate better agreement. Table 1 presents selected simulations for K = 3, 7, 10 and varying J, N. In the Cond (MBC) column, the first ARI corresponds to the partitioning alone, the second to the clustering of the partitioned subspaces (with MBC). We also vary the Q-matrix design to include only single skill items, only multiple skill items, or both. In addition, the Q-matrix was balanced (bal) or unbalanced (unbal): if balanced, all skills and skill combinations occur the same number of times; unbalanced refers to uneven representation of skills or missing skills (miss).
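As a concrete illustration of this simulation design, here is a minimal Python sketch, assuming numpy and scikit-learn. The toy dimensions, the random Q matrix, and the stand-in cluster labels are invented for the example; only the slip/guess ranges match those above. It sketches the general setup, not the paper's simulation code (which also controls skill difficulty and inter-skill correlation).

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

def simulate_dina(alpha, Q, s, g, rng):
    """Draw a DINA response matrix Y from true skill profiles alpha.

    alpha : N x K binary matrix, alpha[i, k] = 1 if student i has skill k
    Q     : J x K binary matrix, Q[j, k] = 1 if item j requires skill k
    s, g  : length-J slip and guess parameters
    """
    # eta[i, j] = 1 iff student i has ALL skills item j requires (conjunctive)
    eta = (alpha @ Q.T) == Q.sum(axis=1)
    p = np.where(eta, 1.0 - s, g)              # P(y_ij = 1 | eta_ij)
    return (rng.random(p.shape) < p).astype(int)

rng = np.random.default_rng(0)
N, J, K = 500, 30, 3                            # toy sizes, not the paper's
alpha = rng.integers(0, 2, size=(N, K))         # uncorrelated skills
Q = rng.integers(0, 2, size=(J, K))
Q[Q.sum(axis=1) == 0, 0] = 1                    # every item needs >= 1 skill
s = rng.uniform(0.0, 0.30, size=J)              # slip  ~ Unif(0, 0.30)
g = rng.uniform(0.0, 0.15, size=J)              # guess ~ Unif(0, 0.15)
Y = simulate_dina(alpha, Q, s, g, rng)

# Score any clustering against the true profiles via the ARI; a random
# labelling scores about 0, perfect recovery scores 1.
true_profile = alpha @ (2 ** np.arange(K))      # encode each profile as an id
random_labels = rng.integers(0, 2 ** K, size=N)
print(adjusted_rand_score(true_profile, random_labels))
```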
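The conditional subspace step itself can be sketched in the same style. Below, a hypothetical helper partitions students by the cutpoints found for each selected skill and then clusters each resulting subspace on the remaining skills. K-means is used here only as a stand-in for whichever method (HC, k-means, or MBC) clusters the subspaces, and all names and defaults are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def conditional_subspace_labels(B, cutpoints, n_clusters=2, seed=0):
    """Hypothetical sketch: partition on selected skills, cluster the rest.

    B         : N x K capability matrix (row i = student i's skill estimates)
    cutpoints : dict mapping skill index -> sorted cutpoints from the scan
    """
    N, K = B.shape
    rest = [k for k in range(K) if k not in cutpoints]   # unpartitioned skills
    # Assign each student a subspace id (mixed-radix code over the cutpoints).
    cell = np.zeros(N, dtype=int)
    for k in sorted(cutpoints):
        cell = cell * (len(cutpoints[k]) + 1) + np.searchsorted(cutpoints[k], B[:, k])
    labels, offset = -np.ones(N, dtype=int), 0
    for c in np.unique(cell):
        members = np.where(cell == c)[0]
        if not rest or len(members) <= n_clusters:       # tiny subspace: one group
            labels[members] = offset
            offset += 1
            continue
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        labels[members] = offset + km.fit_predict(B[np.ix_(members, rest)])
        offset += n_clusters
    return labels
```

Feeding this helper the cutpoints found per skill reproduces the pattern above: the partitioned skills define the subspaces, and only the remaining, smaller-dimensional data are clustered, which is where the reported computational savings come from.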
Table 1: Comparing Clustering Methods with the True Generating Skill Set Profiles via ARIs.

Excepting the multiple-skill unbalanced design, the subspace algorithm selected one or more skills for partitioning (in some cases, all skills were correctly selected). In almost all simulations, MBC was comparable to or better than HC and k-means for true skill set profile recovery. The partitioning method coupled with MBC on the reduced subspaces gave comparable or better results in all cases except the balanced single- and multiple-skill design. In addition, subspace partitioning with MBC was always faster than MBC alone.

Table 2: Comparison of Depth, Width Thresholds.

In addition, for the fourth K = 3, J = 30 Q-matrix design, we vary the depth and width thresholds (Table 2). Smaller values of τ_d, τ_w find narrower, shallower separations; in addition, smaller isolated clusters are found. In this particular example, we found that as we decreased the depth threshold, more skills were (correctly) selected, and the performance of the partitioning by itself improved. While the parameters are designed to be user-specified, we are currently exploring their behavior in order to make good default suggestions.

5 Thirteen Skill Assistment Example.

Finally, we briefly look at a higher dimensional Assistment example with K = 13 skills, N = 344 students, and J = 135 items. This data set included multiple skill items and a large amount of missing response data. HC and k-means are not appropriate choices, as finding 2^13 = 8192 clusters is unreasonable (without, say, allowing for empty clusters as in [1]), and MBC will largely depend on choosing an appropriate search range. The conditional subspace clustering algorithm, however, searches the space for obvious separation and partitions 9 of the 13 skills for a total of 221 subspaces (1 sec). All subspaces contained ≤ 13 students and so could likely be used alone or as subspaces for further clustering if needed.

6 Conclusions.

We presented a conditional subspace clustering algorithm for use with the capability matrix (or a similar skill knowledge estimate). The method selects skills that separate students well and reduces dimensionality for subsequent clustering. Our work so far shows that for most Q-matrix designs, the recovery of true skill set profiles is comparable to or better than that of other clustering methods, while also including skill selection. Since the true profiles in the Assistment examples are unknown, we cannot judge their recovery; however, visual inspection indicates that the partitions and skill selection seem sensible. To our knowledge, work in this area has not adequately addressed the need to analyze high-dimensional Q-matrices. The approach presented, while allowing for real time estimation of student skill set profiles, can handle large numbers of skills as well as incorporate practical user specifications.
