Similarity Functions for Collaborative Master Recommendations

InProceedings

Proceedings of Educational Data Mining, 2012

2012

A memory-based collaborative system for recommending Master programs has recently been developed for University College Maastricht (UCM). Given the academic profile of a Bachelor student, the system recommends Master programs for that student based on the similarity of her profile to the profiles of alumni. The system has been operational since September 2011 and is already popular among UCM students. This paper considers how to improve the quality of Master recommendations. To that end, we study several academic profile representations and similarity functions. We identify the best representation strategy and show how to combine recommender systems based on different similarity functions to achieve superior Master recommendations.

1. INTRODUCTION

University College Maastricht (UCM) is a Bachelor program offering a liberal arts and sciences education. In this program, students build their own curriculum from approximately 40 of the 157 offered educational modules: courses, skills trainings, and projects. As a result, the academic profiles of UCM Bachelor students are diverse. To manage this diversity, UCM employs academic advisors, whose task is to help students choose courses in light of their final goals: desired type of Master program, job, etc.

To support the students and advisors at UCM, we have developed a memory-based collaborative system for recommending Master programs. Given the academic profile (list of past, current, and future academic modules) of a Bachelor student, the system suggests Master programs for that student based on the similarity of her profile to the academic profiles of alumni. The tool allows the Bachelor student to modify her own profile, and thus to explore different study alternatives and how they influence her Master program possibilities.

This paper considers how to improve the quality of Master recommendations. To that end, we study two representations of the students' academic profiles (binary and ECTS-based) and two classes of similarity functions (cosine similarity and Tversky index). We show that (1) the best representation is ECTS-based and (2) there is no single best similarity function. Nevertheless, we introduce an approach for combining recommender systems based on the similarity functions under study, so that the resulting combination achieves superior Master recommendations.

2. MASTER RECOMMENDATIONS

The Master Recommendation problem is stated as follows. Let S be the set of all students, C the set of all Bachelor modules, and M the set of all Master programs. The academic profile of a student s ∈ S is a vector p_s of values p_sc ∈ p_s corresponding to modules c ∈ C.
The values p_sc are binary or ECTS-based: if student s followed or plans to follow module c, then p_sc equals 1 (binary) or the number of ECTS credits for c (ECTS-based); otherwise, p_sc equals 0. The set of all academic profiles p_s is denoted by P.

In the context of this formalization, the Master Recommendation problem is to find a subset M' ⊆ M of Master programs that fit the academic profile p_s of a student s ∈ S, given data D ⊆ P × M of academic profiles of alumni Bachelor students, labeled by the Master programs they chose. The problem is essentially a classification problem, since each alumni profile is labeled by exactly one Master program, not by a set of programs or a preference ordering over them. In this respect, our problem differs from standard recommendation problems, where such sets or preferences are available [1].

To solve this problem we need a recommender system h: P → 2^M. We have designed h as a memory-based collaborative recommender system [1]. The system memory consists of the training data D ⊆ P × M of the academic profiles of UCM alumni, labeled by the Master programs they chose. The system operates in a collaborative way [1,2]: given the academic profile p_s of a student s ∈ S, the recommender system h returns the set M' of Master programs of the alumni whose academic profiles are among the k closest in the training data D to the profile p_s.

To specify the recommender system h completely, we need similarity functions over the set P of academic profiles. For UCM, the set C of modules is much larger than the set of modules a student takes. This implies that the module variables p_sc are asymmetric: a module that neither of two students took says far less about their similarity than a module that both took. Thus we need similarity functions for asymmetric binary variables, and we choose two such functions: cosine similarity and Tversky index. Given two academic profiles p_s, p_a ∈ P, the functions are defined as follows:

[Formula 1: cosine similarity and Tversky index]

The cosine similarity is a symmetric function for asymmetric variables.
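
As a rough illustration of the two similarity functions named above, the Python sketch below uses their common textbook forms. The exact expressions of Formula 1 are not available here, so everything in the sketch is an assumption, in particular the weighted generalization of the Tversky index to ECTS values (element-wise minimum as the shared part). The function names `cosine_similarity` and `tversky_index` and the toy profiles are hypothetical.

```python
import math


def cosine_similarity(p_s, p_a):
    """Cosine of the angle between two profile vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(p_s, p_a))
    norm_s = math.sqrt(sum(x * x for x in p_s))
    norm_a = math.sqrt(sum(y * y for y in p_a))
    if norm_s == 0 or norm_a == 0:
        return 0.0
    return dot / (norm_s * norm_a)


def tversky_index(p_s, p_a, alpha=1.0, beta=1.0):
    """Tversky index between a student profile p_s and an alumni profile p_a.

    Assumed weighted form (not taken from the paper):
      common = sum of element-wise minima (shared modules)
      only_s = weight present in p_s but not in p_a
      only_a = weight present in p_a but not in p_s
    For 0/1 vectors this reduces to the usual set-based definition.
    """
    common = sum(min(x, y) for x, y in zip(p_s, p_a))
    only_s = sum(max(x - y, 0) for x, y in zip(p_s, p_a))
    only_a = sum(max(y - x, 0) for x, y in zip(p_s, p_a))
    denom = common + alpha * only_s + beta * only_a
    return common / denom if denom else 0.0


# Toy binary profiles over five hypothetical modules.
student = [1, 1, 0, 1, 0]
alumnus = [1, 0, 0, 1, 1]

print(cosine_similarity(student, alumnus))
print(tversky_index(student, alumnus))                   # alpha = beta = 1
print(tversky_index(student, alumnus, alpha=1, beta=0))  # student-weighted
print(tversky_index(student, alumnus, alpha=0, beta=1))  # alumni-weighted
```

Swapping the two arguments leaves the cosine value unchanged, while the Tversky value changes whenever alpha and beta differ.
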
The Tversky index is an asymmetric function for asymmetric variables. If α = β = 1, it equals the Jaccard index; if α > β, the emphasis is on the student to be advised; if α < β, the emphasis is on the alumni students.

3. EXPERIMENTS

We evaluated our recommender system using the leave-one-out method. The UCM data for the system consist of the academic profiles of 223 alumni. The total number of Bachelor modules, past and present, that define the academic profiles is 329. The number of unique Master programs to recommend is 147. Among the alumni, 106 followed the same Master program as at least one other alumnus. Only for these alumni can the correct program appear in a recommendation, so the academic profiles of these 106 alumni were used in the test folds.

Table 1 shows the accuracy of our recommender system. A recommended set of Master programs counts as correct if the Master program of the test-fold student is in the set. Since increasing the parameter k enlarges the recommended set, the accuracy grows with k. Note that k is an upper bound on the size of the recommended set, because several neighbors may share the same Master program.

We observe that, for Master program recommendations, the similarity functions perform better on average on the ECTS representation than on the binary representation. Moreover, for between 2 and 80 recommending neighbors, the prediction accuracy is significantly better on the ECTS representation.

Table 1. Recommender accuracy versus k, as k increases.

Furthermore, we observe that the relative performance of the classifiers differs across ranges of the number of neighbors k. Comparing the three ECTS-based Tversky indexes in Table 1, we notice that: for k between 1 and 11, Tversky with α = 1 and β = 0 performs better than the other two; for k between 31 and 81, the Jaccard index outperforms the other variants; and for k from 91 to 171, Tversky with α = 0 and β = 1 outperforms the other Tversky variants.
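
The k-nearest-neighbor recommendation and its leave-one-out evaluation can be sketched roughly as follows. This is a minimal illustration under assumptions, not the system's actual code: `recommend`, `leave_one_out_accuracy`, and `jaccard` are hypothetical names, the similarity is passed in as a function, and the five toy alumni are invented for the example.

```python
from collections import Counter


def jaccard(p, q):
    """Jaccard index of two binary profiles (Tversky index with alpha = beta = 1)."""
    inter = sum(1 for x, y in zip(p, q) if x and y)
    union = sum(1 for x, y in zip(p, q) if x or y)
    return inter / union if union else 0.0


def recommend(profile, training, k, similarity):
    """Return the Master programs of the k training profiles most similar to `profile`.

    `training` is a list of (profile, master_program) pairs. The result is a set,
    so it holds at most k programs and fewer when neighbors share a program.
    """
    ranked = sorted(training, key=lambda pair: similarity(profile, pair[0]), reverse=True)
    return {master for _, master in ranked[:k]}


def leave_one_out_accuracy(data, k, similarity):
    """Fraction of tested alumni whose own Master program is in their recommended set.

    Only alumni whose program is shared with at least one other alumnus are tested.
    """
    counts = Counter(master for _, master in data)
    hits = tests = 0
    for i, (profile, master) in enumerate(data):
        if counts[master] < 2:
            continue  # program is unique, so it can never be recommended
        training = data[:i] + data[i + 1:]  # hold out alumnus i
        tests += 1
        if master in recommend(profile, training, k, similarity):
            hits += 1
    return hits / tests if tests else 0.0


# Invented toy data: (binary profile over five modules, Master program).
data = [
    ([1, 1, 0, 0, 0], "AI"),
    ([1, 1, 1, 0, 0], "AI"),
    ([0, 0, 1, 1, 1], "Law"),
    ([1, 1, 1, 1, 0], "Law"),
    ([0, 0, 0, 1, 1], "Economics"),
]

for k in (1, 2, 3):
    print(k, leave_one_out_accuracy(data, k, jaccard))
```

On this toy data the accuracy rises with k, and the fourth alumnus receives a one-element set at k = 2 because both nearest neighbors chose the same program.
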
We therefore conclude that no similarity function is clearly best over a large range of k.

4. COMBINED RECOMMENDING SYSTEM

Figure 1 shows the accuracy curves of two versions, V1 and V2, of our recommender system, built on the ECTS-based Tversky similarity functions with α = β = 1 and with α = 0, β = 1, respectively. The convex hull of these curves is the smallest convex set of points that contains the curves. We can build a combined recommender system whose accuracy curve is that of the convex hull.

To illustrate the idea, assume we need a recommender system whose accuracy curve contains the line segment (p1, p2) in Figure 1. This means that we need a recommender system V3 whose accuracy for some k defines a point p3 on (p1, p2). We design such a system by a very simple approach similar to that of [3]. To determine Master programs for the academic profile p_s, we flip a loaded coin whose probability of heads equals 1 - distance(p1, p3) / distance(p1, p2). If the coin lands heads, the recommended set of V1 for point p1 is returned; otherwise, the recommended set of V2 for point p2 is returned. It is straightforward to prove that, in the long run, the accuracy of the recommender system V3 for the k-value of point p3 equals the accuracy of point p3.

Figure 1. Accuracy Curves of the Recommender System.

5. CONCLUSION AND FUTURE WORK

This paper showed how to improve the quality of Master recommendations. It determined the best representation of academic profiles and introduced a new approach for combining recommender systems based on different similarity functions to achieve superior performance. Future work will focus on implementing and testing this approach.

6. ACKNOWLEDGMENTS

We would like to thank the members of the Maastricht University Leading in Learning program for their financial support, as well as Prof. Harm Hospers and the staff of UCM for their help and support.
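
The loaded-coin combination described in Section 4 can be sketched as follows. The sketch is illustrative only: the operating points p1, p2, and p3 are made-up (k, accuracy) pairs rather than values from Figure 1, and the names `heads_probability` and `combined_recommend` are hypothetical.

```python
import math
import random


def heads_probability(p1, p2, p3):
    """Probability of answering with V1: 1 - distance(p1, p3) / distance(p1, p2)."""
    return 1 - math.dist(p1, p3) / math.dist(p1, p2)


def combined_recommend(profile, recommend_v1, recommend_v2, p_heads, rng=random):
    """Flip a loaded coin: return V1's set on heads, V2's set on tails."""
    if rng.random() < p_heads:
        return recommend_v1(profile)
    return recommend_v2(profile)


# Made-up operating points as (k, accuracy); p3 lies on the segment p1-p2.
p1 = (10, 0.40)  # V1 at its chosen k
p2 = (30, 0.60)  # V2 at its chosen k
p3 = (15, 0.45)  # target point for the combined system V3

p_heads = heads_probability(p1, p2, p3)

# Expected coordinates of V3: the coin-weighted average of p1 and p2.
expected_k = p_heads * p1[0] + (1 - p_heads) * p2[0]
expected_accuracy = p_heads * p1[1] + (1 - p_heads) * p2[1]
print(p_heads, expected_k, expected_accuracy)

# Stand-in recommenders that only report which version answered.
rng = random.Random(0)
answers = [
    combined_recommend(None, lambda p: {"V1"}, lambda p: {"V2"}, p_heads, rng)
    for _ in range(20000)
]
share_v1 = sum(1 for a in answers if a == {"V1"}) / len(answers)
print(share_v1)
```

Because p3 divides the segment in the ratio set by the coin, the coin-weighted averages of k and of accuracy coincide with the coordinates of p3, which is the long-run argument made in the text.
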
