One of the principal concerns in graduate admissions is how future performance in graduate studies may be predicted from a candidate's undergraduate achievements. In this study, we examined the statistical relationship between B.Sc. and M.Sc. achievements using a dataset that is not subject to an admission-induced selection bias. Our analysis yielded three insights. First, we were able to explain 55% of the variance in M.Sc. performance using a small set of highly discriminative B.Sc. achievements alone. Second, we found the final B.Sc. year to be most informative for future M.Sc. performance. Third, when a failed exam was repeated, the crucial piece of information was the grade achieved in the final attempt. We envisage that results such as the ones described in this study may increasingly improve the design of future admission procedures.
"1. INTRODUCTION. Throughout continental Europe, the last decade has seen the gradual adoption of a three-tier education system, consisting of Bachelor’s, Master’s, and Doctoral programmes. Due to its modularity, the system places high demands on the admission process. Several previous studies have examined the utility of undergraduate GPA (UGPA) scores in predicting graduate-level performance [Lane et al. 2003; Owens 2007]. However, because the UGPA is frequently used as an admission criterion in its own right, most studies to date are based on data with an inherent selection bias and may have underestimated the predictive power of undergraduate performance [Dawes 1975]. In this study, we analysed a dataset that exhibits no selection bias. Specifically, we acquired data from Computer Science undergraduates at ETH Zurich, all of whom were subsequently admitted to the M.Sc. programme, regardless of their undergraduate achievements. We investigated (i) what proportion of the variance of graduate performance could be explained by B.Sc. grades; (ii) whether achievements in the competitive first year or achievements in the final year of the B.Sc. programme proved most predictive; and (iii) the informativeness of first attempts versus final attempts when failed exams had been repeated. 2. METHODS. Data. Data were collected from the B.Sc. and the consecutive M.Sc. programme in Computer Science at ETH Zurich over a seven-year time period. Most B.Sc. courses were mandatory, while the M.Sc. programme granted more freedom of choice. A data matrix was constructed on the basis of 176 students, 125 predic- tor variables, and one target variable (the GPA of the M.Sc. programme achievements, GGPA). Predictor variables included: gender, age at enrolment, rate of progress, single course achievements (first and final examination attempts, measured on the Swiss 6-point grading scale), several GPA’s (precision: two decimal places), and study duration. Methodology. A random-forest algorithm was used to estimate decision trees for regression on random partitions of the data. Predictions were evaluated using an out-of-bag scheme. We used the canonical variable- importance measure of random forests for feature selection, and we used the pseudo-R2 statistic for model selection. Fig. 1. (a) Degree of importance (x-axis) of individual undergraduate variables in predicting graduate-level performance. (b) Box-plots of 100 pseudo-R2 estimates for the first, the second, and the third B.Sc. study year. (c) Box-plots of 100 Pseudo-R2 estimates of single course achievements related to either first attempts or final attempts of exams. 3. RESULTS. Prediction performance and underlying predictors. Regarding the question of overall predictability, a small set of highly discriminative predictor variables explained 55% of the GGPA variance (Figure 1a). Informativeness of undergraduate years. Concerning the relative importance of different study years, the third undergraduate year was most informative for future performance (Figure 1a,b). Repeated exams. Regarding failed and repeated exams, models based on grades from final attempts yielded a significantly higher prediction accuracy than models based on grades from first attempts (Figure 1c). 4. DISCUSSION. Our analysis yielded three insights. First, we showed that it is feasible to explain as much as 55% of the variation in graduate performance purely on the basis of undergraduate achievements. 
This result outperforms previous attempts in the literature, and it highlights the significance of undergraduate achievements as criteria for M.Sc. admission decisions. Second, we found third-year achievements to be more predictive of future M.Sc. performance than first-year grades. This is an important result, given that one might intuitively overestimate the predictive power of the highly competitive first-year courses. Third, when exams were failed and repeated, our results indicate that final-attempt grades are more informative than first-attempt grades. Members of admission committees might feel tempted to ask for more information on failed exams, but our study suggests that the results routinely reported in academic transcripts may already be sufficient. This observation also indicates that success in subsequent studies may not critically depend on the speed with which students have mastered their material. Rather, the key factor appears to be the amount of knowledge they have acquired by the time they complete an undergraduate degree. An open question is to what extent the statistical approach adopted here can be extended to predict performance across universities and across countries. We will explore this question in a future study.
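The following sketch illustrates the modelling pipeline described in Section 2 (random-forest regression, out-of-bag evaluation, pseudo-R2, and variable importance). It is a minimal reconstruction rather than the authors' original implementation: the synthetic stand-in data, the use of scikit-learn, and its impurity-based importances (used here in place of the out-of-bag permutation measure that is canonical for random forests) are all assumptions made for illustration only.

```python
# Illustrative sketch only; NOT the authors' code. Data are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Placeholder data matrix: 176 students x 125 undergraduate predictor variables
# (gender, age at enrolment, course grades, GPAs, study duration, ...),
# plus one target variable, the graduate GPA (GGPA).
n_students, n_predictors = 176, 125
X = rng.normal(size=(n_students, n_predictors))                 # stand-in for B.Sc. records
y = X[:, :5] @ rng.normal(size=5) + rng.normal(scale=0.5, size=n_students)  # stand-in GGPA

# Random-forest regression with out-of-bag (OOB) evaluation.
forest = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
forest.fit(X, y)

# Pseudo-R2: proportion of GGPA variance explained by the OOB predictions,
# i.e. 1 - MSE_oob / Var(y); with oob_score=True this equals forest.oob_score_.
oob_mse = np.mean((y - forest.oob_prediction_) ** 2)
pseudo_r2 = 1.0 - oob_mse / np.var(y)

# Variable importance, used for feature selection: rank predictors and keep a
# small, highly discriminative subset (impurity-based importance as a stand-in).
ranking = np.argsort(forest.feature_importances_)[::-1]
top_predictors = ranking[:10]

print(f"pseudo-R2 (OOB): {pseudo_r2:.2f}")
print("most important predictor indices:", top_predictors)
```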