Using learning analytics to assess students' behavior in open‐ended programming tasks


In this paper, we describe a strand of the Project-Based Learning Analytics project aimed at assessing student learning in unscripted, open-ended environments. We review the literature on educational data mining, computer vision applied to assessment, and emotion detection to exemplify the directions we are pursuing in this research. We then discuss the relevance of the work and describe one case study, in which students created computer programs and snapshots of the code were recorded whenever the users attempted a compilation. The subjects were sophomore undergraduate students in engineering. Underlying this work is the assumption that human cognition can evolve in ways that are too subtle or too complex to be detected by conventional techniques, and that computational techniques can not only help educators design systems with better feedback, but can also serve as a novel lens to investigate human cognition, finding patterns in massive datasets otherwise inaccessible to the human eye.

"1 Introduction and significance. Politicians, educators, business leaders, and researchers are unanimous to state that we need to redesign schools to teach the so-called 21st century skills: creativity, innovation, critical thinking, problem solving, communication, collaboration, among others. None of those skills are easily measured using current assessment techniques, such as multiple choice tests or even portfolios. As a result, schools are paralyzed by the push to teach new skills, and the lack of reliable ways to assess those skills. One of the difficulties is that current assessment instruments are based on end products (an exam, a project, a portfolio), and not on processes (the actual cognitive and intellectual development while performing a learning activity), due to the intrinsic difficulties in capturing detailed process data for large numbers of students. However, new sensing and data mining technologies are making it possible to capture and analyze massive amounts of data in all fields of human activity. Current attempts to use artificial intelligence techniques to assess human learning have focused on two main areas: text analysis and emotion detection. The work of Rus et al. (2009), for example, makes extensive use of text analytics within a computer-based application for learning about complex phenomena in science. Students were asked to write short paragraphs about scientific phenomena -- Rus et al. then explored which machine learning algorithm would enable them to most accurately classify each student in terms of their content knowledge, based on comparisons with expert-formulated responses. Similarly, speech analysis further removes the student from the traditional assessment setting by allowing them to demonstrate fluency in a more natural setting. Beck and Sison (2006) have demonstrated a method for using speech recognition to assess reading proficiency in a study with elementary school students that combines speech recognition with knowledge tracing (a form of probabilistic monitoring.) The second area of work is the detection of emotional states using non-invasive techniques. Understanding student sentiment is an important element in constructing a holistic picture of student progress, and it also helps to better enable computer-based systems to interact with students in emotionally supportive ways. Using the Facial Action Coding System (FACS), researchers have been able to develop a method for recognizing student affective state by simply observing and (manually) coding their facial expressions, and applying machine learning to the data produced (Craig et al., 2008). Researchers have also used conversational cues to realize student emotional state. Similar to the FACS study, Craig et al. (2008) designed an application that could use spoken dialogue to recognize the states of boredom, frustration, flow, and confusion. They were able to resolve the validity of their findings through comparison to emote-aloud (a derivative of talk-aloud where participants describe their emotions as they feel them) activities while students interacted with AutoTutor. Even though researchers have been trying to use all these artificial intelligence techniques for assessing students’ formal knowledge and emotional states, the field would derive significant benefit from three important additions: 1) student activity data (gestures, sketches, actions) as a primary component of analysis, 2) automation of data analysis processes, 3) multidimensional data collection and analysis. 
Educational data mining (EDM; Amershi & Conati, 2009; Baker, Corbett, Koedinger, & Wagner, 2004) has been used in many contexts to measure students' learning and affect, but in Baker and Yacef's review of its current uses, the majority of the work focuses on cognitive tutors or semi-scripted environments (Baker & Yacef, 2009). At the same time, qualitative approaches present some crucial shortcomings: (1) there is no persistent trace of the evolution of students' artifacts (computer code, robots, etc.), (2) crucial learning moments within a project can last only seconds and are easy to miss with normal data collection techniques, and (3) such methodologies are hard to scale to large groups or extended periods of time. Most previous work in EDM, however, has assessed very specific and limited tasks, whereas the 21st-century skills we now need to assess are much more complex: creativity, the ability to solve ill-structured problems, to navigate environments with sparse information, and to deal with uncertainty. Unscripted learning environments are well known for being challenging to measure and assess, but recent advances in low-cost biosensing, natural language processing, and computer vision could make it possible to understand students' trajectories in these environments.

In this paper, I present a first, simple example of using learning analytics and educational data mining (Baker & Yacef, 2009) to inspect students' behavior and learning in project-based, unscripted, constructionist (Papert, 1980) learning environments, in which traditional assessment methods might not capture students' evolution. The case study examines patterns in students programming scientific models. Snapshots of the code generated by students were automatically stored in a file, which was later analyzed using custom-built tools. One such tool is the Code Navigator, which allows researchers to go back and forth in time, "frame-by-frame," tracking students' progression and computing statistical data. Results show that we could not only identify phases in the development of a program (see Figure 2), but also find preliminary evidence of typical behaviors of novices and experts.

Many researchers have attempted to automate the collection of action data, such as gesture and emotion. For example, Weinland et al. (2006) and Yilmaz et al. (2005) were able to detect basic human actions related to movement. Craig et al. (2007) created a system for automatic detection of facial expressions (the FACS study). The technique that Craig et al. validated is a highly non-invasive mechanism for inferring student sentiment, and it can be coupled with computer vision technology and biosensors to enable machines to automatically detect changes in emotional state or cognitive affect. Another area of active development is speech and text mining. For example, researchers have combined natural language processing and machine learning to analyze student discussions and writing, leveraging Independent Component Analysis of student conversations, a technique whose validity has been repeatedly reproduced. The derived text is subsequently analyzed using Latent Semantic Analysis (Rus et al., 2009). Given the right training corpus and language model, LSA can give a clearer picture of each student's knowledge development throughout the course of the learning activity.
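As an illustration of the kind of LSA-based comparison described above, the sketch below projects student texts and an expert-formulated response into a low-dimensional semantic space and scores each student by cosine similarity to the expert. This is not Rus et al.'s pipeline; the example texts, the scikit-learn components, and the number of latent dimensions are assumptions made for illustration.

```python
# A minimal LSA sketch: TF-IDF term-document matrix + truncated SVD, then
# cosine similarity to an expert response as a rough proxy for content knowledge.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

expert = "Heat flows from warmer objects to cooler objects until both reach the same temperature."
students = [
    "The hot object gives heat to the cold one until their temperatures are equal.",
    "Cold flows into the warm object and makes it cold.",
]

# Build the term-document matrix and project it into a 2-dimensional latent space.
tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform([expert] + students)
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(matrix)

# Higher similarity to the expert vector suggests closer content coverage.
for text, score in zip(students, cosine_similarity(lsa[:1], lsa[1:])[0]):
    print(f"{score:.2f}  {text}")
```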
In the realm of exploratory learning environments, Bernardini, Amershi, and Conati (2009; 2010) built student models combining supervised and unsupervised classification, both with logfiles and eye-tracking, and showed that meaningful events could be detected with the combined data. Montalvo et al., also using a combination of automated and semi-automated real-time coding, showed that they could identify meaningful meta-cognitive planning processes while students were conducting experiments in an online virtual lab environment. However, most of these studies did not involve the creation of completely open-ended artifacts, with almost unlimited degrees of freedom. Our study is one attempt in the direction of understanding the process of creation of these artifacts, especially by novices. Our goal in this paper is to establish a proof of existence that automatically generated logs of students' programming can be used to infer consistent patterns in how students go about programming, and that by inspecting those patterns we could design better support materials and strategies, as well as detect critical points in the writing of software at which human assistance would be most needed. Since our data rely on just nine subjects, I do not make claims of statistical significance, but the data point to some clear qualitative distinctions between the students.

2 Methods and data collection

2.1 Dataset

The goal of the logfile analysis was to identify patterns in the model-building process using the NetLogo (Wilensky, 1999) programming environment. NetLogo can log all of a user's actions to an XML file: key presses, button clicks, changes in variables and, most importantly for this study, changes in the code. The logging module uses a special configuration file that specifies which actions are to be logged. This file was distributed to students along with instructions on how to enable logging, collect the logfiles, and send those files back for analysis. In what follows, I show some general data about the files collected and conduct a more detailed analysis for one student. I will try to show how the collected data can, rather than completely elucidate the problem, point researchers to instances during the model-building process in which a more in-depth qualitative analysis could be worthwhile.

Nine students in a sophomore-level engineering class were given a 3-week programming assignment. The task was to write a computer program to model a scientific phenomenon of their choice. Students had the assistance of a 'programming' teaching assistant, following the normal class structure. The teaching assistant was available for about 3-4 hours a week for each student, and an individual, 1-hour programming tutorial session was conducted with each of the students in the first week of the study. In total, 158 logfiles were collected. Using a combination of XQuery and regular-expression processors (such as 'grep'), the files (1.5 GB and 18 million lines of uncompressed text) were processed, parsed, and analyzed. Table 1 summarizes the collected data, listing, for each student: the number of files collected, the total size of those files in megabytes, the total number of events logged, the total number of global variable changes, and the proportion of global variable changes relative to all logged events (in percent and in absolute numbers).

Table 1. Number of events collected per student. The overwhelming majority of events collected were global variable changes (99.6% of the total, and 60.8% on average per student). In the table, "Globals" refers to events containing only a variable change, while "Not Globals" refers to all other events.
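Although the processing in this study was done with XQuery and grep, the sketch below shows, in Python, how such an XML log might be tallied by event type (to produce counts like those in Table 1) and filtered down to code-change events. The element name, attribute names, the "code-change" label, and the filename are assumptions made for illustration, not NetLogo's documented logging schema.

```python
# A minimal sketch of summarizing an event log: count events by type and keep
# only code-change events. The schema details below are assumed, not documented.
import xml.etree.ElementTree as ET
from collections import Counter

def summarize_logfile(path):
    counts = Counter()
    code_events = []
    # iterparse streams the file, which matters for multi-gigabyte logs.
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "event":
            kind = elem.get("type", "unknown")
            counts[kind] += 1
            if kind == "code-change":
                code_events.append(elem.get("timestamp"))
            elem.clear()  # free memory as the file is traversed
    return counts, code_events

counts, code_events = summarize_logfile("student01.xml")  # hypothetical filename
print(counts.most_common(5))
print(f"{len(code_events)} code-change events kept for analysis")
```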
These global variable changes take place when students are running or testing models: every single variable change gets recorded, which accounts for the very large number of events (almost 9 million). Since the analysis of students' interactions with models is outside the scope of this paper, all non-coding events were filtered out of the main dataset, leaving 1,187 events for 9 users.

3 Data and discussion

For the analysis, I first focus on one student and conduct an in-depth exploration of her coding strategies. Then, I compare her work with that of other students and show how differences in previous ability might have shaped their experience.

3.1 Coding strategies

3.1.1 Luca

Luca is a sophomore student in materials science and built a model of how crystals grow. She had modest previous experience with computers, and her grade in the class was around the average, which makes her a good case for an in-depth analysis of logfiles. Figure 2 summarizes Luca's model-building logs. The red curve represents the number of characters in her code, the blue dots represent the time between compilations (secondary y-axis, to the right), green dots placed at y=1800 represent successful compilations, and orange dots placed at y=1200 represent unsuccessful compilations. In the following paragraphs, I analyze each of the six parts of the plot. The analysis was done by looking at the overall increase in character count (Figure 2) and then using the Code Navigator tool (Figure 1) to locate the exact point in time when the events happened.

Figure 1. The Code Navigator, which allows researchers to go back and forth in time, tracking how students created a computer program.

Figure 2. Code size, time between compilations, and errors, for Luca's logfiles.

1. Luca started with one of the exemplar models seen in the tutorial (the "very simple" solidification model). In less than a minute, she deleted the unnecessary code and ended up with the skeleton of a new model (see the big drop at point A).
2. She spent the next half-hour building her first procedure, to generate one of the two types of crystal growth she intended to include in the model. During this time, between A and B, she had numerous unsuccessful compilations (see the orange dots) and went from 200 to 600 characters of code.
3. The size of the code remains stable for 12 minutes (point B), until there is a sudden jump from 600 to 900 characters (just before point C). This jump corresponds to Luca copying and pasting her own code: she duplicated her first procedure as the basis for a second one. During this period, she also opened many of the sample models in NetLogo.
4. Luca spends some time making her new procedure work. The frequency of compilation decreases (see the density of orange and green dots), the average time per compilation increases, and again we see a plateau, this one lasting about one hour (point D).
5. After one hour on the plateau, there is another sudden increase in code size, from 900 to 1300 characters (between D and E). What Luca did was open a sample model and copy a procedure that generates a hexagonal grid, which was needed for her model. Note that code compilations are even less frequent.
6. After making the "recycled" code work, Luca reached her final count of 1200 characters of code.
She then spent about 20 minutes "beautifying" the code, fixing the indentation, changing names of variables, and so on. No real changes in the code took place, and there are no incorrect compilations.

Luca's narrative thus suggests four prototypic modeling events:
- Stripping down an existing model as a starting point.
- Long plateaus of no coding activity, during which students browse other models (or their own model) for useful code.
- Sudden jumps in character count, when students import code from other models or copy and paste code from within the working model.
- A final phase in which students fix the formatting of the code: indentation, variable names, etc.

3.1.2 Shana, Lian, Leen, and Che

This initial analysis provides a useful basis for examining logfiles from other students in search of similarities. In the following, I show plots (character count vs. time) for four different students (Luca, Che, Leen, and Shana; Figure 3), which include all of the students' activity, including opening other models (the "spikes"). Note that the plot in Figure 2 did not show all of Luca's activities, but only her activities within her own model, i.e., excluding opening and manipulating other models.

Figure 3. Code size versus time for Luca, Shana, Che, and Leen.

First, let us examine Shana's logfiles. After many 'spikes,' there is a sudden jump (at time=75) from about 200 to 4,000 characters of code. A closer, systematic examination revealed that Shana employed a different approach. After some attempts to incorporate the code of other models into her own (the spikes), she gave up and decided to do the opposite: start from a ready-made model and add her code to it. She then chose a very well-established model and built hers on top of it. The sudden jump to 4,000 characters marks the moment when she opened that model and started making it 'her own' by adding her procedures. She seamlessly integrated the pre-existing model into her new one, adding significant new features.

Leen, on the other hand, had yet another coding style. He did open other models for inspiration or cues, but did not copy and paste code. Instead, he built his procedures in small increments by trial and error. In Table 2 we can observe how he coded a procedure to "sprout" a variable number of white screen elements in his model (during a 30-minute period). The changes in the code are indicated in red.

Table 2. Leen's attempts to write the "InsertVacancies" procedure.

His trial-and-error method had an underlying pattern: he went from simpler to more complex structures. For example, he first attempts a fixed, "hardcoded" number of events (sprout), then introduces control structures (loop, while) to generate a variable number of events, and finally introduces new interface widgets to give the user control over the number of events. Leen reported having higher familiarity with programming languages than Luca and Shana, which might explain his different coding style.

Che, with few exceptions, did not open other models during model building. Similarly to Leen, he also employed an incremental, trial-and-error approach, but we can clearly detect many more long plateaus in his graph.
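The jumps and plateaus discussed above could, in principle, be flagged automatically from the character-count series. The sketch below is a minimal illustration, not the Code Navigator or the paper's actual tooling: the (minute, code size) snapshots loosely mirror Luca's narrative, and the thresholds are arbitrary assumptions.

```python
# A minimal sketch of flagging "sudden jumps" and "plateaus" in code size from
# (minutes, characters) snapshots. Data points and thresholds are illustrative.
JUMP_CHARS = 200      # minimum increase between nearby snapshots to call a "jump"
JUMP_MINUTES = 2      # ...and the increase must happen within this many minutes
PLATEAU_MINUTES = 10  # minimum stretch with no size change to call a "plateau"

snapshots = [
    (0, 1400), (1, 200), (10, 350), (20, 480), (30, 600),
    (42, 600), (43, 900), (100, 900), (102, 1300),
]

events = []
for (t0, s0), (t1, s1) in zip(snapshots, snapshots[1:]):
    if s1 - s0 >= JUMP_CHARS and t1 - t0 <= JUMP_MINUTES:
        events.append(f"jump of {s1 - s0} chars at minute {t1}")
    elif s1 == s0 and t1 - t0 >= PLATEAU_MINUTES:
        events.append(f"plateau of {t1 - t0} minutes ending at minute {t1}")

print("\n".join(events))
```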
Therefore, based on these four logfiles and the other five analyzed, some canonical coding strategies were found in most of them:
a. Stripping down an existing model as a starting point.
b. Starting from a ready-made model and adding one's own procedures.
c. Long plateaus of no coding activity, during which students browse other models (or their own model) for useful code.
d. Long plateaus of no coding activity, during which students think of solutions without browsing other models.
e. Periods of linear growth in code size, during which students employ a trial-and-error strategy to get the code right.
f. Sudden jumps in character count, when students import code from other models or copy and paste code from within their working model.
g. A final phase in which students fix the formatting of the code: indentation, variable names, etc.

Based on those strategies, and on students' previous programming knowledge, the data suggest three coding profiles:
- "Copy and pasters": more frequent use of a, b, c, f, and g.
- Mixed-mode: a combination of c, d, e, and g.
- "Self-sufficients": more frequent use of d and e.

The empirical verification of these canonical coding strategies and coding profiles has important implications for the design of, in particular, constructionist environments. Each coding strategy and profile might call for different support strategies. For example, students with more advanced programming skills (many of whom exhibited the "self-sufficient" behavior) might require detailed and easy-to-find language documentation, whereas "copy and pasters" need more working examples with transportable code. The data show that students are in fact relatively autonomous in developing strategies for learning the programming language, and they point designers to the need for multiple forms of support (see, for example, Turkle and Papert (1991)).

3.2 Code compilation

Despite these differences, one behavior was rather similar across students: the frequency of code compilation. Figure 4 shows the moving average of unsuccessful compilations (thus, the error rate) versus time; the higher the value, the higher the number of unsuccessful compilations within one moving-average period. The period was 10% of the overall duration of the logfile: if there were 600 compilation attempts, the period of the moving average would be 60 attempts.

Figure 4. Error rate versus compilation attempts (time).

For all four students, setting aside the somewhat noisy first instants, the error-rate curve follows an inverted parabolic shape: it starts very low, reaches a peak halfway through the project, and then decreases to values close to zero. The blue dots at y=0 (correct compilations) and y=1 (incorrect compilations) indicate the actual compilation attempts. Most of them are concentrated in the first half of the activity, approximately 2/3 in the first half versus 1/3 in the second half. This further confirms the previous logfile analysis, in which the model-building process is not homogeneous and simple, but complex and composed of several phases: an initial exploration characterized by few unsuccessful compilations, followed by a phase with intense code evolution and many compilation attempts, and a final phase of finishing touches and smaller fixes.
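The error-rate curve in Figure 4 can be computed directly from the ordered sequence of compilation outcomes. The sketch below follows the convention stated above (window = 10% of the number of attempts); it is an illustration rather than the paper's actual analysis script, and the outcome sequence is made up.

```python
# A minimal sketch of the moving-average error rate underlying Figure 4:
# a sliding window over 0/1 compilation outcomes (1 = unsuccessful compilation),
# with the window set to 10% of the number of attempts. Data below are made up.
def error_rate_curve(failures, window_fraction=0.10):
    """Return the moving-average failure rate over a time-ordered 0/1 sequence."""
    window = max(1, int(len(failures) * window_fraction))
    return [
        sum(failures[i - window:i]) / window
        for i in range(window, len(failures) + 1)
    ]

# Illustrative outcomes: few errors early, a noisier middle, clean compilations late.
outcomes = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print([round(rate, 2) for rate in error_rate_curve(outcomes)])
```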
4 Conclusion

This paper is an initial step towards developing metrics (compilation frequency, code size, code evolution pattern, frequency of correct/incorrect compilations, etc.) that could serve both as formative assessment tools and as pattern-finding lenses into students' free-form explorations in technology-rich learning environments. The frequency of code compilations, together with the code-size plots analyzed previously, enables us to trace a reasonable approximation of each prototypical coding profile and style.

Such an analysis has three important implications for the design of open-ended environments:
─ To design and allocate support resources, moments of greater difficulty in the modeling process should be identified. Our data indicate that those moments happen midway through the project.
─ Support materials and strategies need to be designed to cater to diverse coding styles and profiles.
─ By better understanding each student's coding style, we also gain an extra window into students' cognition.
Paired with other data sources (interviews, tests, surveys), these data could offer a rich portrait of the model-building process and of how it affects students' understanding of the scientific phenomena and the programming language.
