Integrating Data Mining in Program Evaluation of K-12 Online Education

Article

Journal of Educational Technology and Society, Special Issue on Learning and Knowledge Analytics, 2012

This study investigated an innovative approach to program evaluation through analyses of student learning logs, demographic data, and end-of-course evaluation surveys in an online K-12 supplemental program. The results support the development of a program evaluation model for decision making on teaching and learning at the K-12 level. A case study was conducted with a total of 7,539 students, whose activities generated 23,854,527 learning logs across 883 courses. Clustering analysis was applied to reveal students' shared characteristics, and decision tree analysis was applied to predict student performance and satisfaction with course and instructor. This study demonstrates how data mining can be incorporated into program evaluation in order to generate in-depth information for decision making. In addition, it explores, at the K-12 level, EDM applications that have already been broadly adopted in higher education institutions.

Introduction

Traditionally, the majority of online instructors and institutional administrators rely on web-based course evaluation surveys to evaluate online courses (Hoffman, 2003). The resulting data are then used to gauge online program effectiveness and to generate information for program-level decision-making. While the survey method enjoys wide use, it provides only learners' self-report data, not their actual learning behaviors. Several studies have found that self-reported data were not consistent with actual learning behaviors (Hung & Crooks, 2009; Picciano, 2002). This inconsistency can compound the already problematic lack of direct observation opportunities in online settings. Online program administrators need more effective tools to provide customized learning experiences, to track students' online learning activities for overseeing courses (Delavari, Phon-amnuaisuk, & Beikzadeh, 2008), to depict students' general learning characteristics (Wu & Leung, 2002), to identify struggling students (Ueno, 2006), to study trends across courses and/or years (Hung & Crooks, 2009), and to implement institutional strategies (Becker, Ghedini, & Terra, 2000). Each of these needs can be addressed by mining educational data. Today, a wide variety of educational data is stored in database systems. This is especially true for online programs, where student learning behaviors are recorded and stored in Learning Management Systems (LMS). Program administrators can take advantage of emerging knowledge and skills by extracting and interpreting those data. The purpose of this study is to propose a program evaluation framework based on educational data mining.

Program evaluation

Program evaluation is the means by which a program assures itself, its administration, accrediting organizations, and students that it is achieving the goals delineated in its mission statement (Nichols & Nichols, 2000). Evaluation can be done by a variety of means. The most common form is surveying students about courses, faculty, and programs (e.g., Cheng, 2001; Hoffman, 2003; Spirduso & Reeve, 2011). However, making causal inferences based on a one-time assessment is risky (Astin & Lee, 2003). Moreover, perception-based survey data cannot accurately reflect real learning behaviors (Hung & Crooks, 2009; Picciano, 2002). Although various scholars (e.g., Grammatikopoulos, 2012; Vogt & Slish, 2011) have proposed systematic frameworks (e.g., interviews and observation) to obtain objective knowledge via multiple means, these methods are difficult to implement in a fully online program.

Educational data mining

Data mining (DM) is a series of data analysis techniques applied to extract hidden knowledge from server log data (Roiger & Geatz, 2003) by performing two major tasks: pattern discovery and predictive modeling (Panov, Soldatova, & Dzeroski, 2009). Educational data mining (EDM) is a field that adopts data mining algorithms to solve educational issues (Romero & Ventura, 2010). Romero and Ventura (2010) reviewed 306 EDM articles from 1993 to 2009 and proposed desired EDM objectives based on the roles of users.
For the purpose of this study, which is designed to inform administrators, the list is limited to objectives for administrators:
- Enhance the decision processes in higher learning institutions
- Streamline efficiency in the decision-making process
- Achieve specific objectives
- Suggest certain courses that might be valuable for each class of learners
- Find the most cost-effective way of improving retention and grades
- Select the most qualified applicants for graduation
- Help to admit students who will do well in higher education settings

According to the theory of bounded rationality, decision-making can be rational only within the limits of the information available to the decision maker (Elster, 1983). An ideal program evaluation framework should therefore provide multiple facets of information to decision makers, so integrating more than one data source and analytic method is essential for an effective program evaluation.

Program evaluation framework

Figure 1 shows the proposed program evaluation framework. The core strategy of this framework is data triangulation (Jick, 1979), which combines multiple data sources (learning logs, course evaluation surveys, and demographic data) and multiple methods (pattern discovery and predictive modeling) to generate accurate, in-depth results. Using this framework, the authors conducted a program evaluation case study to examine how the proposed framework can support administrators' decision making.

Figure 1. Program evaluation framework.

Method

Data source

In this case study, data were collected from a statewide K-12 online institution that serves over 16,000 students in a northwestern state in the U.S. The institution provides fully online courses to K-12 students. Courses were designed by subject-matter curriculum designers and subject-matter teachers to standardize course materials. Teachers were required to complete an online orientation prior to teaching courses for the institution, and all received the same or similar online-teaching training provided by the institution. Site coordinators are located in each district in the state, and regional principals oversee teacher evaluation. The following data were collected for the 2009-2010 academic year (3,604 students enrolled in Fall 2009 and 3,935 in Spring 2010): 1) LMS activity logs; 2) student demographic data; and 3) course evaluation survey data. All data tables were stored in the database and interconnected with unique identifiers (e.g., course ID).

LMS activity logs

The LMS activity logs were collected from the Blackboard activity accumulator (Blackboard Inc., 2010) for the Fall 2009 and Spring 2010 academic terms. The following records were removed during data preprocessing: irrelevant fields (e.g., group ID), irrelevant records (e.g., login failures), and data stored in wrong or mismatched fields (about 11.8% of all activity logs). After preprocessing, a total of 23,854,527 activity logs remained for 7,539 students in 883 courses. These students took 1 to 18 courses in the 2009-2010 academic year.
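The preprocessing step described above can be illustrated with a short sketch. This is a minimal illustration assuming a pandas DataFrame; the file name and column names (event_type, group_id, student_id, course_id) are hypothetical, not the institution's actual Blackboard schema or the authors' exact cleaning rules.

```python
import pandas as pd

# Load a raw Blackboard activity-accumulator export (hypothetical file/columns).
logs = pd.read_csv("activity_accumulator_2009_2010.csv")

# Drop irrelevant fields (e.g., group ID), per the preprocessing described above.
logs = logs.drop(columns=["group_id"], errors="ignore")

# Remove irrelevant records (e.g., login failures).
logs = logs[logs["event_type"] != "login_failure"]

# Approximate the removal of data stored in wrong or mismatched fields by
# requiring valid student and course identifiers on every row.
logs = logs.dropna(subset=["student_id", "course_id"])

print(f"{len(logs):,} activity logs after preprocessing")
```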
Student demographic data

The following demographic data were collected for analysis: age, gender, graduation year, city, school district, number of online courses taken, number of online courses passed, number of online courses failed, and final grade average.

Course evaluation survey

A course evaluation survey investigated students' satisfaction with their course and instructor. The survey contained eight questions related to course content, five related to course structure, and eleven related to instructor satisfaction. Records containing any missing values were removed from the analysis. In addition, because student identifiers were not collected during the Fall 2009 survey implementation, which prevented the researchers from associating survey responses with demographic data and LMS activity logs, only Spring 2010 survey data (2,618 respondents) were analyzed in this study.

Engagement level

Engagement is considered a key variable for enabling and encouraging learners to interact with the material, with the instructor, and with one another, as well as for learning in general. In this study, engagement level was measured by the frequency of various learning interactions within the LMS. The variables under the category "Student Engagement Variable" in Table 1 were used to measure each student's engagement level:
- Average frequency of logins per course.
- Average frequency of tabs accessed per course (if the course was organized using "tabbed" navigation).
- Average frequency of modules accessed per course (if the course was organized using "modules").
- Average frequency of clicks per course.
- Average frequency of course accesses per course (from the Blackboard portal to the course site).
- Average frequency of pages accessed per course (content created using the Page tool, which allows instructors to include files, images, and text as links on the course menu).
- Average frequency of course content accessed per course (content created using the Content tool, which allows instructors to create course content within the content area).
- Average number of discussion board entries per course.

Variables

Table 1 lists the variables collected from Blackboard, the student demographic database, and the course evaluation survey. Some variables were transformed in order to generate more meaningful variables for analysis. For example, each student's birth year was transformed to age, and the sum of all learning activities was aggregated into a new variable, "frequency of clicks," representing each student's total clicks in the Blackboard LMS. If students took more than one course during the analysis period, variables for learning activities (e.g., frequency of total clicks and frequency of course access), performance (e.g., final grade), and survey responses (e.g., course satisfaction and instructor satisfaction) were averaged.

Table 1. Variables for data mining.

Analytic tools

SAS Enterprise Miner 6.1 (SAS Institute Inc., USA) was employed to perform the following data mining tasks: 1) student clustering, which describes shared characteristics of students who passed or failed their courses; and 2) perception and performance prediction, which identifies key predictors of course satisfaction, instructor satisfaction, and final grade. Because one of the major target audiences of this article is K-12 administrators, the authors chose methods such as decision trees and K-means clustering, which produce results that are more intuitive for non-data-miners.

Results

Student clustering

Clustering analysis. The K-means algorithm (Hartigan & Wong, 1979; Budayan, Dikmen, & Birgonul, 2009) was applied to group students based on their shared characteristics (Internal Standardization = Range; Maximum Number of Clusters = 6).
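The study ran this step in SAS Enterprise Miner; the same clustering step can be sketched with scikit-learn. This is a minimal sketch, not the authors' implementation: min-max scaling stands in for the "Internal Standardization = Range" setting, and `students` is an assumed per-student table whose engagement column names are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# `students` is assumed to hold one row per student with the per-course
# engagement averages from Table 1 (illustrative column names).
features = ["logins_per_course", "clicks_per_course", "content_per_course",
            "pages_per_course", "db_entries_per_course"]

# Min-max scaling approximates range standardization.
X = MinMaxScaler().fit_transform(students[features])

kmeans = KMeans(n_clusters=6, n_init=10, random_state=42)  # at most six clusters
students["cluster"] = kmeans.fit_predict(X)

# Profile each cluster against pass rate, the classification standard used here.
print(students.groupby("cluster")[["pass_rate"] + features].mean())
```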
The total number of clusters was limited in order to avoid trivially small or exclusive groups, whose identification was outside the purposes of this case study. A pass rate of "1" means a student passed all courses during the period of analysis; a pass rate of "0" means a student failed all courses; a pass rate between "0" and "1" means a student passed some, but not all, courses. In the clustering analysis, pass rate was set as the standard for classification, and six clusters were generated. Table 2 presents the results of the clustering analysis for academic year 2009-2010. The shared characteristics of each cluster are as follows.
- Cluster 1 (316 students, pass rate = 55.07%, all male): Cluster 1 consists of students who are older than those in Clusters 3 to 6. They were less engaged than Clusters 5 and 6 but more engaged than Clusters 3 and 4. On average, each student took 2.76 courses and failed about half of them.
- Cluster 2 (320 students, pass rate = 56.11%, all female): Similar to Cluster 1, Cluster 2 consists of students who are older than those in Clusters 3 to 6. They were less engaged than Clusters 5 and 6 but more engaged than Clusters 3 and 4. On average, each student took 3.03 courses and failed about half of them.
- Cluster 3 (594 students, pass rate = 0%, all male): Clusters 3 and 4 include the lowest-engaged students. Cluster 3 students are all male. On average, each student took 1.43 courses and failed all of them.
- Cluster 4 (601 students, pass rate = 0%, all female): Cluster 4 includes the lowest-engaged female students. On average, each student took 1.39 courses and failed all of them.
- Cluster 5 (2,311 students, pass rate = 100%, all male): Clusters 5 and 6 represent the highest-engaged students. Cluster 5 students are all male. On average, each student took 1.59 courses and passed all of them.
- Cluster 6 (3,397 students, pass rate = 100%, all female): Cluster 6 represents the highest-engaged female students. On average, each student took 1.64 courses and passed all of them.

Table 2. Results of clustering analysis.

The clusters were then associated with two geographical variables, city and school district, to examine whether certain types of students came from specific areas. Differences in engagement were found depending on location. Clusters 1 to 6 had similar geographical distributions except in three larger cities (populations over 100,000): Cluster 5 (all male, pass rate = 100%) drew a larger group of students from one large city, and Cluster 6 (all female, pass rate = 100%) drew larger groups from the other two. There were no notable differences in school district distributions across clusters.

Findings. The findings below are summarized from the clustering analysis.
1) Students with higher engagement levels usually had higher performance.
2) Younger students (Clusters 5 and 6) who lived in larger cities were more successful than those in smaller cities (Clusters 3 and 4) and older students (Clusters 1 and 2).
3) Students who failed all courses and were also low-engaged made up approximately 15.9% of students on average per course.
4) Students who passed all courses and were also high-engaged made up approximately 75.7% of students on average per course.
5) Based on Clusters 1 and 2, older students (age > 16.91) tended to take more than two courses, with pass rates ranging from 54.09% to 56.11%.
6) On average, high-engaged students demonstrated engagement levels twice those of low-engaged students.
7) Frequencies of reading behaviors (such as content access and page access) were much higher than those of discussion behaviors (p < 0.001).
8) Female students were more active than male students in online discussions (higher average frequency of discussion board entries).
9) Female students had higher pass rates than male students.
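As a methodological aside, the cluster-by-location association examined above can be sketched as a simple cross-tabulation. This assumes the `students` table from the earlier sketch; the city names are placeholders, since the study does not identify the three large cities.

```python
import pandas as pd

# Cross-tabulate cluster membership against city (row-normalized) to check
# whether particular clusters concentrate in particular locations.
geo = pd.crosstab(students["cluster"], students["city"], normalize="index")

# Inspect the three large cities (population > 100,000); names are placeholders.
print(geo.loc[:, ["CityA", "CityB", "CityC"]].round(3))
```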
Average clicks per course in different subject areas

Table 3 shows students' average frequencies of total clicks and performance per course in different subject areas. Total clicks equal the summed frequency of all learning activities. The results show that Math and Science had the highest numbers of total clicks per course and of total clicks per student per course. However, for students who took Math and/or Science courses, average final grades (56.70 and 64.41, respectively) were lower than the overall final grade average (71.11). This indicates that students participated actively in these two subject areas but failed to achieve expected outcomes (70 or higher). Possible reasons could relate to course design and/or teaching strategies for Math and Science courses. English courses, on the other hand, received a lower number of clicks combined with lower-than-expected outcomes; encouraging motivation and engagement in these courses could have a profound effect on future outcomes. Students in Foreign Language and Health not only participated actively in learning activities but also obtained the highest average grades in those two subject areas.

Table 3. Average frequencies of total clicks and performance in different subject areas.

Findings.
10) Subjects where the level of activity was effective and consistent with student outcomes included Driver Education, Electives, Foreign Language, Health, and Social Studies.
11) Subjects where the level of activity was inconsistent with student outcomes included Math and Science, which had high activity levels with lower-than-expected outcomes.
12) Subjects where the level of activity was low and consistent with low student outcomes included English.

Subject preferences

Figure 2 shows the percentages of female and male students in different subject areas. Because the overall female-to-male ratio was 1.34, subject preferences were revealed by comparing the female/male ratio in each subject against this overall ratio. Subjects above the dashed line are those with higher female ratios.

Findings.
13) Female students preferred taking Electives, Foreign Language, and Social Studies.
14) Male students preferred taking Driver Education, Math, and Science.

Figure 2. Gender preferences (female students/male students) by subject area.

Pass rate in different subject areas

Table 4 consists of two parts. The first part examines whether the pass rates of female and male students differ significantly in different subjects: "F vs. M" compares the gender pass-rate difference using t-tests. The second part examines the pass-rate difference between Fall 2009 and Spring 2010 within the same gender: for example, "F vs. F" compares the pass rates of Fall 2009 and Spring 2010 female students in different subjects using t-tests. Numbers marked with asterisks represent statistically significant differences.

Table 4. Pass rate comparisons and statistical tests by gender and subject areas. Note: Statistical significance indicates that an observed difference is unlikely to be due to chance;
"p < .05" means the probability of observing such a difference by chance is less than 5%. *p < .05, **p < .001

One such "F vs. M" comparison can be sketched with an independent-samples t-test. This is a minimal illustration assuming a `courses` table (one row per student-course enrollment, with a gender column and a 0/1 passed indicator); the Welch variant is shown here, although the paper does not report which form of t-test was used.

```python
from scipy import stats

# Compare female vs. male pass rates within one subject (cf. Table 4).
subj = courses[courses["subject"] == "English"]
f = subj.loc[subj["gender"] == "F", "passed"]
m = subj.loc[subj["gender"] == "M", "passed"]

t, p = stats.ttest_ind(f, m, equal_var=False)  # Welch's t-test
print(f"English, F vs. M: t = {t:.2f}, p = {p:.4f}")
```

Findings.
15) Overall, female students performed significantly better than male students, especially in the following subject areas: Electives, English, and Social Studies.
16) Fail rates during the Fall 2009 term were significantly higher than those during the Spring 2010 term, especially in subjects with higher fail rates such as English, Math, Science, and Social Studies. After these results were obtained, the researchers learned from the administrators of the program that an early-alert system had been adopted in Spring 2010 to track all communications between instructors and students. The results suggest those strategies could have improved students' pass rates in most subject areas.

Student performance and engagement by course number

Because the previous results indicated that students in Math, Science, and English performed worse than those in other subject areas, the researchers were interested in identifying anomalies within this group that might explain the results. Further analysis identified which Math, Science, and English courses produced the highest performance and which produced the lowest. The researchers divided courses into three conditions based on student behaviors within the course: (a) high-engaged, high-performance; (b) high-engaged, low-performance; and (c) low-engaged, low-performance. Courses categorized as high-engaged, high-performance might represent courses with both effective design and effective implementation, because students were highly engaged and achieved expected outcomes. Those categorized as high-engaged, low-performance might represent courses with less effective design, because students were unable to achieve expected outcomes despite what appears to be effective implementation. Finally, courses categorized as low-engaged, low-performance might represent courses with both less effective design and less effective implementation. A sketch of one way to operationalize this categorization follows.
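This sketch assumes median splits on course-level engagement and the 70-point expected-outcome threshold mentioned above; the paper does not report the exact thresholds it used, so both cut points are assumptions. It reuses the hypothetical `courses` table.

```python
# Aggregate to course level: mean engagement (clicks) and mean final grade.
course_stats = courses.groupby("course_id").agg(
    engagement=("clicks", "mean"), performance=("final_grade", "mean"))

hi_eng = course_stats["engagement"] >= course_stats["engagement"].median()
hi_perf = course_stats["performance"] >= 70  # expected-outcome level cited above

# Label the three conditions described in the text; other courses stay unlabeled.
course_stats["condition"] = None
course_stats.loc[hi_eng & hi_perf, "condition"] = "high-engaged, high-performance"
course_stats.loc[hi_eng & ~hi_perf, "condition"] = "high-engaged, low-performance"
course_stats.loc[~hi_eng & ~hi_perf, "condition"] = "low-engaged, low-performance"

print(course_stats["condition"].value_counts())
```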
Our analysis revealed that, regardless of content area, most high-engaged, low-performance and low-engaged, low-performance courses were entry-level courses, while most high-engaged, high-performance courses were advanced-level courses. Students' responses to the survey question asking their reasons for taking an online course were then incorporated to help interpret these results. The majority of responses from younger students enrolled in high-engaged, high-performance courses were "the course was not available in my school." The majority of responses from older students in low-engaged, low-performance courses were "I was making up a class I had failed."

Findings.
17) Regardless of subject matter (Math, Science, or English), entry-level courses tended to have lower performance whether students were low-engaged or high-engaged. This may speak more to course structure, design, and support than to the effectiveness of instruction.
18) The reasons students enrolled in a course may influence their engagement level and performance. Survey responses indicated that students retaking courses they had previously failed tended to demonstrate lower engagement and lower performance, whereas students taking courses that were not available in their schools were usually high-engaged and high-performing.

Predictive analysis

CRT decision tree analysis (Breiman, Friedman, Olshen, & Stone, 1984) was applied to construct predictive models combining course-related data and survey results (Splitting Criterion: Gini; Leaf Size: 60; Maximum Depth: 10; Assessment Measure: Average Squared Error). These settings allow for a larger sequence of subtrees in order to enrich the study's findings. A decision tree classifies instances by sorting them down the tree from the root to the leaf nodes: leaf nodes represent classifications, and branches represent conjunctions of features that lead to different target values. The following three variables were adopted as dependent variables (the survey questions on course and instructor satisfaction can be retrieved from http://goo.gl/x8j18):
- Average course grade is each student's final course grade (range: 0-100). If a student took more than one course, average course grade is the average across courses.
- Average course satisfaction was generated by averaging the scores of the eight survey questions on course content and the five on course structure (range: 1-7). If a student took more than one course, the satisfaction scores were averaged across courses.
- Average instructor satisfaction was generated by averaging the scores of the eleven survey questions on instructor satisfaction (range: 1-7). If a student took more than one course, the satisfaction scores were averaged across courses.

Final grade prediction

All variables in Table 1 were imported for final grade prediction; average course grade was used as the dependent variable, and the remaining variables were treated as independent variables. A sketch of this tree-fitting step appears below.
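The study fit its trees in SAS Enterprise Miner 6.1; an approximate scikit-learn equivalent is sketched here. The leaf size and depth mirror the reported settings, but scikit-learn's CART implementation and its squared-error split criterion are not identical to the SAS configuration, and the feature list is illustrative rather than the full Table 1 variable set.

```python
from sklearn.tree import DecisionTreeRegressor, export_text

# Illustrative subset of predictors; the study imported all Table 1 variables.
features = ["clicks_per_course", "logins_per_course", "tab_access_per_course",
            "db_entries_per_course", "age", "is_female"]
X, y = students[features], students["avg_final_grade"]

tree = DecisionTreeRegressor(min_samples_leaf=60,  # Leaf Size: 60
                             max_depth=10,         # Maximum Depth: 10
                             random_state=42)
tree.fit(X, y)

# Print the top of the fitted tree, analogous to reading Figure 3.
print(export_text(tree, feature_names=features, max_depth=3))
```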
Because the full tree contained too much information, blank nodes were used to represent results excluded from the interpretation. Figure 3 shows the decision tree for final grade prediction. In academic year 2009-2010, 75.7% of students passed all courses, 15.9% failed all courses, and 8.4% passed some but not all of their courses. The left branch of the decision tree represents students who passed all courses; the results indicate a positive relationship between engagement level and performance (higher engagement => higher performance). The right branch represents students who failed one or more courses; the results imply that lower engagement went with lower performance.

Figure 3. Final grade prediction (complete chart: http://goo.gl/NIfWu).

Findings.
19) Engagement level and gender had stronger effects on student final grades than age, school district, school, and city. For most students, higher engagement => higher performance.
20) Compared with other Blackboard components such as discussion board entries and content access, tab access had a negative effect on student performance (more tabs accessed => lower performance).
21) Female students performed better than male students.

Final grade prediction (external variables)

An additional decision tree analysis was conducted to investigate how external variables (i.e., non-learning-activity variables) influenced student performance. Figure 4 is a portion of this decision tree for academic year 2009-2010.

Figure 4. Final grade prediction with external variables only (complete chart: http://goo.gl/B8AvB).

Findings.
22) Based on the predictive model, female students performed better than male students.
23) Students around 16 years old or younger performed better than those 18 years or older.

Satisfaction prediction

Decision tree analysis was also conducted to predict students' satisfaction with their course and instructor. Because the Fall 2009 survey data could not be associated with variables in Blackboard, the following results are limited to Spring 2010.

Course satisfaction. The scores calculated from the responses to the course satisfaction survey questions were averaged into a single course satisfaction variable, where "7" represents the highest satisfaction with a course and "1" the lowest. Figure 5 is a portion of the decision tree for course satisfaction.

Figure 5. Course satisfaction prediction (complete chart: http://goo.gl/5NLWl).

Findings.
24) Students with higher average final grades (> 73.25 out of 100) had higher course satisfaction.
25) Students who passed all or some of their courses had higher course satisfaction than students who failed all courses.
26) Students who took two or more courses in Spring 2010, whether they passed them or not, had higher course satisfaction.
27) Female students had higher course satisfaction than male students.
28) Online behaviors (i.e., frequency of pages accessed and number of discussion board entries) had minor effects on course satisfaction (higher frequency/number => higher course satisfaction).
29) Students in different cities showed different course satisfaction levels.

Instructor satisfaction. The scores calculated from the responses to the instructor satisfaction survey questions were averaged into a single instructor satisfaction variable, where "7" represents the highest satisfaction with an instructor and "1" the lowest. Figure 6 is a portion of the decision tree for instructor satisfaction.

Figure 6. Instructor satisfaction prediction (complete chart: http://goo.gl/QCdpw).

Findings.
30) Students with higher average final grades (> 73.25 out of 100) indicated higher instructor satisfaction.
31) Students who took two or more courses in Spring 2010, whether they passed them or not, showed higher instructor satisfaction.
32) Female students indicated higher instructor satisfaction than male students.
33) Online behaviors (frequency of modules accessed) had minor effects on instructor satisfaction (higher frequency => higher instructor satisfaction). However, six students indicated low instructor satisfaction despite extremely high frequencies of course access and high final grades.
34) Older students (> 17.5 years old) taking one course had higher instructor satisfaction.
35) Students from different schools showed different satisfaction levels toward their online instructors.
36) Younger female students (< 15.5 years old) with lower average final grades (< 76.5) indicated lower instructor satisfaction.

Discussion

This study is a first attempt at program evaluation combining multiple data sources.
The goal of this project was to propose a new program evaluation framework that generates sufficient information for program-level decision-making. The advantages of this framework are data triangulation and data interpretation. The triangulated results are as follows:
- Female students generally performed better than male students (findings 9, 15, 21, and 22); however, the significant differences were limited to the following subject areas: Electives, English, and Social Studies (finding 15).
- High-engaged students generally performed better than low-engaged students (findings 1, 10, 12, and 19); however, this pattern was limited to non-STEM courses (findings 10 and 11). One possible factor in high-engaged students' failure to consistently achieve expected outcomes may have been poor course design.
- Younger students generally performed better than older students (findings 2 and 23); however, this pattern was limited to students in larger cities (finding 2).
- One possible explanation for older students' generally lower performance is that they took more than two courses per semester for credit recovery (findings 5 and 18). Younger students took fewer courses, and the fact that those courses were generally not available in their school districts may have increased motivation (findings 2 and 18).

Overall, using multiple forms of data allows for a more meaningful analysis of actual student behaviors and the identification of potential relationships among demographic data, satisfaction data, and student outcomes. The result is a much richer and deeper analysis of student performance, teaching, and effective course design than could be accomplished with survey data or behavior mining alone.

Demographics and performance

Based on the results revealed by the program evaluation framework, several indicators can be used to identify students who are more likely to be successful and those more likely to be at risk. In this study, a student with more of the following characteristics was more likely to be successful:
- Female
- Younger than 16.5 years
- Took one or two courses per semester
- Took a Foreign Language or Health course
- Lived in a larger city

A student with more of the following characteristics was more likely to be at risk of failure:
- Male
- Older than 18 years
- Took more than two courses per semester
- Took entry-level courses in Math, Science, or English
- Lived in a smaller city

These indicators can be used to develop an early warning system (Macfadyen & Dawson, 2010), so that administrators and teachers have a list of likely-successful and at-risk students before each semester starts; an illustrative rule-based sketch follows.
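As an illustration only, the indicator list above could seed a simple rule-based flag. The indicators and thresholds come from the findings, but the counting scheme itself is an assumption of this sketch, not something the study implemented.

```python
def at_risk_score(student: dict) -> int:
    """Count how many of the at-risk indicators listed above a student meets."""
    return sum([
        student["gender"] == "M",                       # male
        student["age"] > 18,                            # older than 18
        student["courses_per_semester"] > 2,            # heavy course load
        student["entry_level_math_science_english"],    # entry-level M/S/E course
        not student["large_city"],                      # smaller city
    ])

# Example: a hypothetical student matching all five indicators.
s = {"gender": "M", "age": 19, "courses_per_semester": 3,
     "entry_level_math_science_english": True, "large_city": False}
print(at_risk_score(s))  # 5 -> flag for early outreach before the semester
```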
Engagement and performance

Based on the data mining analysis, higher-engaged students usually had higher performance, a finding also supported by previous studies (e.g., Hung & Crooks, 2008; Hung & Zhang, 2009). However, this conclusion may be limited to courses that were well designed and well implemented. In this study, entry-level courses tended to show lower performance regardless of whether students were low-engaged or high-engaged; high-engaged students might still earn lower final grades in a course with structure, design, and/or support issues. Lim and Morris (2009), studying post-secondary students, found that junior and senior students had significantly higher survey mean scores in perceived learning, learning application, and learning involvement than freshman and sophomore students. Assuming that higher perceived learning, learning application, and learning involvement equate to higher motivation, those authors could not explain why older students were significantly more engaged than younger students. Our study, by combining data mining analysis of engagement and performance with survey responses, revealed why students had different levels of engagement and performance. For example, the majority of responses from students enrolled in high-engagement, high-performance courses were "The course was not available in my school," while the majority of responses from students in low-engagement, low-performance courses were "I was making up a class I had failed." The level of engagement in our study may therefore have been influenced by motivation. In addition, Lim and Morris (2009) found that older students had a better chance of succeeding in online learning at the higher education level, whereas our study found that older students were more likely to be at risk in K-12 online education: students older than 18 tended to be low-engaged and lower-performing in their courses.

Satisfaction and performance

The literature reports no confirmed relationship between student performance and satisfaction (positive correlation: Eiszler, 2002; Nasser & Hagtvet, 2006; no relation: Ladebo, 2003; Walker & Palmer, 2011). In this case study, students with higher final grades usually indicated higher satisfaction with both their courses and instructors (findings 24 and 30).
