Translating Learning into Numbers: A Generic Framework for Learning Analytics

Article

With the increase in available educational data, it is expected that Learning Analytics will become a powerful means to inform and support learners, teachers, and their institutions in better understanding and predicting personal learning needs and performance. However, the processes and requirements behind the beneficial application of Learning and Knowledge Analytics, as well as the consequences for learning and teaching, are still far from being understood. In this paper, we explore the key dimensions of Learning Analytics (LA), the critical problem zones, and some potential dangers to the beneficial exploitation of educational data. We propose and discuss a generic design framework that can act as a useful guide for setting up Learning Analytics services in support of educational practice and learner guidance, in quality assurance, curriculum development, and in improving teacher effectiveness and efficiency. Furthermore, the article addresses soft barriers and limitations of Learning Analytics. We identify the skills and competences required to make meaningful use of Learning Analytics data and to overcome gaps in interpretation literacy among educational stakeholders. We also discuss privacy and ethical issues and suggest ways in which these can be addressed through policy guidelines and best practice examples.

"Introduction. In the last few years, the amount of data that is published and made publicly available on the web has exploded. This includes governmental data, Web2.0 data from a plethora of social platforms (Twitter, Flickr, YouTube, etc.), and data produced by various sensors such as GPS location data from mobile devices. In the wake of this, data-driven companies like Google, Yahoo, Facebook, Amazon, etc. are growing exponentially by commercially exploiting such data for marketing or in the creation of new personalised services. The new “data economy” empowers companies to offer an increasing amount of data products at little or no cost to their users (e.g., Google Flu Trends, bit.ly customised URLs, Yahoo Pipes, Gapminder.com). This growth in data also renewed the interest in information retrieval technologies. Such technologies are used to analyse data and offer personalised data products customised to the needs and the context of individual users. It is already evident that data in combination with information retrieval technologies are not only the basis for the emergent data economy, but also hold substantial promises for use in education (Retalis et al., 2006; Johnson et al., 2011). One example of this is the research on personalisation with information retrieval technologies which has been a focus in the educational field for some time now (Manouselis et al., 2010). The main driver is the vision of improved quality, effectiveness, and efficiency of the learning processes. It is expected that personalised learning has the potential to reduce delivery costs while at the same time creating more effective learning experiences, accelerating competence development, and increasing collaboration between learners. Not so long ago, for universities and companies alike, gathering data on their users met with substantial limitations in terms of cost, time requirements, scope, and authenticity of the data, as this was typically done using questionnaires or interviews with a selected representative number of stakeholders. The new data economy has made data collection very much an affordable activity. It is based on the highly economic electronic data mining of people’s digital footprints and the automated analysis of behaviours of the entire constituency rather than sampling. Because data mining is not a separate act to normal user behaviour, the information retrieved is also highly authentic in terms of reflecting real and uninterrupted user behaviour. As such, data mining is more comparable to observational data gathering than to intrusive collection via direct methods. This will not make questionnaires and structured interviews obsolete, but it will greatly enhance our understanding and highlight possible inconsistencies between user behaviour and user perception (Savage and Burrows, 2007). The proliferation of interactive learning environments, learning management systems (LMS), intelligent tutoring systems, e-portfolio systems, and personal learning environments (PLE) in all sectors of education produces vast amounts of tracking data. But, although these e-learning environments store user data automatically, exploitation of the data for learning and teaching is still very limited. These educational datasets offer unused opportunities for the evaluation of learning theories, learner feedback and support, early warning systems, learning technology, and the development of future learning applications. 
As a result, the importance of Learning Analytics (LA) is increasingly recognised by governments, educators, funding agencies, research institutes, and software providers. The renewed interest in data science and information retrieval technologies such as educational data mining, machine learning, collaborative filtering, or latent semantic analysis in Technology-Enhanced Learning (TEL) reveals itself through an increasing number of scientific conferences, workshops, and projects combined under the new research term Learning Analytics. Examples are the 1st Learning Analytics conference in Banff, Canada, 2011; the 4th International Conference on Educational Data Mining 2011 in Eindhoven, Netherlands; the 1st dataTEL workshop on Educational Datasets for Technology-Enhanced Learning at the Alpine Rendez-Vous conference, La Clusaz, France, 2011; the 2nd International Conference on Learning Analytics and Knowledge (LAK12), Vancouver, 2012; the 1st Workshop on Learning Analytics and Linked Data (LALD 2012); and more. The increasing number of dedicated research events and publications makes a meta-analysis of the domain timely and necessary in order to establish a solid scientific basis that facilitates the development of new learner-oriented services.

Critical dimensions of learning analytics

Despite the great enthusiasm currently surrounding LA, it also raises substantial questions for research. In addition to technically focused research questions, such as the compatibility of educational datasets or the comparability and adequacy of algorithmic and technological approaches, there remain several "softer" issues and problem areas that influence the acceptance and impact of Learning Analytics. Among these are questions of data ownership and openness, ethical use and dangers of abuse, and the demand for new key competences to interpret and act on LA results. We shall point at these issues in more detail below. The implementation of LA in learning processes therefore needs to be carefully crafted in order to be successful and beneficial. This necessity motivated us to identify six critical dimensions (soft and hard) of LA, which need to be covered by the design to ensure an appropriate exploitation of LA in an educationally beneficial way. By soft issues we mean challenges that depend on assumptions being made about humans or society in general, e.g., competences or ethics. They stand in contrast to the hard challenges of the fact-based world of data and algorithms (cf. also the similar soft-hard distinction in Dron, 2011). In its coverage of soft issues, our framework differs from other, more workflow-oriented models for LA, like that by Siemens (2011), although in his presentation he does acknowledge these as of concern. Rather than being a process model such as those collected in Elias (2011), we aim at a description framework that can later be developed into a domain model or ontology.

The critical dimensions highlighted here have been deduced from discussions in the emerging research community using a general morphological analysis (GMA) approach (cf. Ritchey, 2011). In this early formation stage of the LA community, scientific exchanges such as the massive open online courses (MOOCs) in Learning and Knowledge Analytics (LAK11, LAK12), or the above-mentioned events and congregations, soon began to revolve around a number of key questions, like: Who is the target group for LA? What are we trying to achieve? How do we deal with privacy and data protection?
These questions are naturally extended by other ongoing debates, such as the openness of data, which has been a topic for some time in the EDM and Open Linked Data domains, as well as by technical and theoretical questions on achieving meaningful extraction of information from data.

Our approach leading to the proposed framework consisted of a number of gathering and analysis processes. First, as a matter of opinion mining, we scanned the scientific interactions from proceedings and presentations of the conferences and working groups mentioned above. We conducted a brief literature review of abstracts in the field of Learning Analytics and Educational Data Mining. Additionally, we scanned the live discussions in the LA Google Groups (http://groups.google.com/group/learninganalytics and http://groups.google.com/group/LAK11), as well as the LAK11 MOOC (presentation chats and social networking exchanges). Furthermore, we looked back at recent RTD projects that contained elements of analytics and the questions and lessons they produced; e.g., the Language Technologies for Lifelong Learning project (http://www.ltfll-project.org) contained an analytics approach related to learner positioning and conceptualisation. Following these reviews, we applied cognitive mapping (Ackermann, Eden, and Cropper, 2004) for synthesis and sense making. We analysed these discussions and clustered them into the proposed six fields of attention, which we then presented as a first draft of the framework to a community of commercial and academic experts for evaluation and feedback at the SURF seminar on Learning Analytics (Eindhoven, 30-31 August 2011). The number of six dimensions is not in itself significant, and other divisions are of course possible. However, we find the division into these six dimensions a useful and easy-to-follow orientation to the domain.

With the framework, we take the presumption that responsible designers of analytic processes will not only implement what is technically possible and legally allowed (or at least not prohibited), but will also consider holistically the outcomes for stakeholders and, even more importantly, the consequences for the data subjects, i.e., the people supplying the data (cf. the section on stakeholders below). The framework is intended to be a guide as much as a descriptor of the problem zones. Hence we refer to it as a "design framework" that can and should be used to design LA services from an inclusive perspective. We will argue below that this will help the transferability of LA approaches between different contexts of application and research.

Proposed design framework for learning analytics

Our proposed model for the domain and application of LA (Figure 1 below) considers six critical dimensions. Each dimension can be subdivided into several instantiations falling into that dimension. For example, the generic "stakeholder" dimension can have instantiations (values) like "learners" and "teachers". The list of instantiations in the diagram is not exhaustive and can be extended on a case-by-case basis. To stay with the above example, commercial service providers and even automated agents could also function as stakeholders in a LA process. It is useful to note that by connecting different (and also multiple) instantiations of each dimension, concrete use cases can be constructed.
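To make this concrete, the following minimal sketch shows how such a use case could be expressed as a simple data structure with one slot per dimension. The class name, field names, and example values are illustrative assumptions rather than part of the framework's specification; they anticipate the forum/SNAPP sample use case discussed next.

```python
# A minimal sketch of a LA use case as a combination of instantiations
# across the six critical dimensions. All names and values here are
# illustrative, not prescribed by the framework.
from dataclasses import dataclass

@dataclass
class LAUseCase:
    stakeholders: dict           # data clients and data subjects
    objectives: list             # e.g., reflection, prediction
    data: list                   # datasets drawn upon, and their openness
    instruments: list            # technologies, algorithms, theories
    external_constraints: list   # conventions and norms
    internal_limitations: list   # required competences, acceptance

# Hypothetical encoding of the forum/SNAPP example described below:
snapp_case = LAUseCase(
    stakeholders={"data_client": "teacher", "data_subjects": "learners"},
    objectives=["reflection on forum participation"],
    data=["discussion forum logs (protected)"],
    instruments=["social network analysis (SNAPP tool)"],
    external_constraints=["privacy of forum participants"],
    internal_limitations=["ability to interpret network graphs"],
)
print(snapp_case.instruments)
```

Encoded this way, a use case can be compared field by field with use cases from other contexts, which is what the framework's role as a sharable description format amounts to in practice.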
We call the dimensions "critical" in the sense that each of the six fields of attention is required to have at least one instantiation present in a fully formulated LA design. We realise, though, that some dimensions are vaguer than others in this respect.

Figure 1. Critical dimensions of learning analytics.

The six dimensions of the proposed LA framework are (cf. Figure 1): stakeholders, objectives, data, instruments, external constraints, and internal limitations. In the following, we discuss each of these dimensions individually and exemplify their instantiations, their impact on the LA process, and the benefits and opportunities they may determine. We also elaborate apparent problem zones and limitations that may hinder the envisaged benefits. Before embarking on the abstract dimensions in detail, we would like to illustrate the purpose and possible usage of the framework with the following sample use case, which is created out of a number of instantiations of the six dimensions. This specific example relates to conducting a social network analysis of students discussing in a forum using the SNAPP tool, based on the work by Dawson et al. (Dawson, 2008; Macfadyen & Dawson, 2010).

Table 1. Sample use case and values for dimensions.

The above use case can be used (1) as a checklist when designing a purposeful LA process, and (2) as a sharable description framework to compare context parameters with other similar approaches in other contexts, or to replicate the scientific environment. The framework allows an indefinite number of use cases with the respective value arguments.

Stakeholders

The stakeholder dimension includes data clients as well as data subjects. Data clients are the beneficiaries of the LA process who are entitled and meant to act upon the outcome (e.g., teachers). Conversely, data subjects are the suppliers of data, normally through their browsing and interaction behaviour (e.g., learners). It is important to make this distinction in order to understand the impact of the process on individuals. In certain cases, the two types of stakeholder groups can be the same, as when a LA application feeds information back to learners about their own learning rather than informing the teacher, as would commonly be the case in informal learning scenarios. In the traditional learner-teacher scenario, the teacher acts as the data client, who receives information gathered from the data subjects, i.e., the learners.

As shown in the framework model (Figure 1), the main stakeholder groups of LA in formal learning situations are learners, teachers, and educational institutions. These may be expanded or substituted by other stakeholder groups, such as researchers, service providers, or governmental agencies. Each group has different information needs and can be provided with tailored views on information using LA. The information flow between stakeholders can best be exemplified with the common hierarchical model taken from formal education (Figure 2). The diagram illustrates, by way of example, the ways in which benefits might be obtained from LA. The pyramid encapsulates the academic layers of education and training institutions. In the most direct way, data analysis from the student level, e.g., via a LMS, can inform the layer above, in this case the teachers. Teachers can then use the analytics information to plan targeted interventions or adjust their pedagogic strategies.
Institutions can similarly derive benefits from student and teacher data in order to provide staff development opportunities or to plan policies such as quality assurance and efficiency measures. We also want to stress the major benefits LA offers for self-reflection at every level (cf. the left side of the diagram). We would like to see institutions enabling and actively encouraging students to reflect on their learning data. Teachers and institutions, too, can gain new insights by reflecting on their own performance. Although not immediately involved in the learning processes, researchers (right of the diagram) could harvest data for the purpose of evaluating or innovating teaching processes or learning services. Finally (at the top of the diagram), government agencies may collect cross-institutional data to assess the requirements of Higher Education Institutions (HEI) and their constituencies.

Figure 2. Information flow between LA stakeholders.

Although they are the most widespread form in formal education, hierarchies are not the only flow models to describe where benefits can be obtained. For example, peer evaluation using Personal Learning Environments (PLE) may be another information environment for LA. Peer environments also prevail in academic transactions like conferences or publications that are based on peer review systems. Practical examples of a horizontal, peer-related information flow are the various scientific impact measures that exist, e.g., citation indexes. Equally, serious games can provide a non-hierarchical approach and/or a team perspective on collaborative learning, e.g., how fast a team completed a level. In each of these, however, lie some issues of dependency and possible legal constraints (cf. further below). Example opportunities for LA with respect to different stakeholder groups are:

- Students can be supported with specific learning process and reflection visualisations that compare their performance to the overall performance of a course (see the sketch after this list). Furthermore, they can be provided with personalised recommendations for suitable learning resources, learning paths, or peer students (Gaviria et al., 2011).
- Teachers can be provided with course monitoring systems that inform them about knowledge gaps of particular pupils and thus enable them to focus their attention on those pupils. They can also harvest emergent group models that can lead to a shared understanding of domain topics or processes for better curriculum design and on-the-fly adaptations.
- Institutions can monitor the performance of students regarding drop-out and graduation rates at a much finer level of granularity. In this way, they can evaluate their courses and improve course outcomes.
- Other stakeholders: We would like to emphasise that stakeholders need not be confined to formal education settings, but include all formal, non-formal, or informal environments, such as continuing professional development (CPD). In these cases, the stakeholders are to be substituted by the relevant entities. For non-formal learning, for example, stakeholders would include a "learner" instantiation with (only) a self-reflection dimension in which feedback is mirrored back to the same person. In work-based learning, employees and line managers may be the most common stakeholder groups involved. More notably, computer agents can also serve as stakeholders, for example as data clients that take further decisions on the learner's behalf or trigger an event (e.g., a notification e-mail, a recommendation of content or a peer, etc.).
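As an illustration of the first opportunity above, a comparison of a learner's own results with the course average can be computed from a simple grade table. The data layout, names, and output format in the following sketch are illustrative assumptions, not a reference implementation.

```python
# Minimal sketch of a reflection view that sets one learner's scores
# against the course average. The grade table and student names are
# invented for illustration.
course_scores = {
    "quiz_1":  {"alice": 7.5, "bob": 6.0, "carol": 9.0},
    "quiz_2":  {"alice": 5.0, "bob": 8.0, "carol": 7.0},
    "essay_1": {"alice": 8.0, "bob": 7.5, "carol": 6.5},
}

def reflection_view(student: str) -> None:
    """Print the student's score next to the course mean per activity."""
    for activity, scores in course_scores.items():
        mean = sum(scores.values()) / len(scores)
        own = scores[student]
        relation = "above" if own > mean else "at or below"
        print(f"{activity}: you scored {own:.1f}; course mean {mean:.1f} ({relation} average)")

reflection_view("alice")
```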
Objectives

The main opportunity for LA as a domain is to unveil and contextualise hitherto hidden information in educational data and prepare it for the different stakeholders (see above). Monitoring and comparing information flows and social interactions can offer new insights for learners as well as improve organisational effectiveness and efficiency. This new kind of information can support individual learning processes, but also organisational knowledge management processes (Butler & Winne, 1995). We can distinguish two fundamentally different objectives: reflection and prediction (cf. Figure 1 above).

Reflection: Reflection is seen here as the critical self-evaluation of a data client as indicated by their own datasets, in order to obtain self-knowledge. Wolf (2009) calls this process the "quantified self", i.e., self-observation and reacting to one's own performance log data. There is already a growing number of Personal Informatics systems, i.e., human-computer interaction systems that support this process (Li & Forlizzi, 2010). However, reflection may also be seen as critical self-evaluation based on other stakeholders' datasets. This would especially be true if, for example, a teacher were led to reflect upon their teaching style as indicated by the datasets of their students. In the above hierarchical flow model (Figure 2), the higher-order stakeholder would have the ability to utilise all the datasets from lower constituencies for their own reflection. On an individual level, LA can support reflection on learning processes and offer personalised information on the progress of the learner (Govaerts et al., 2010). On the institutional level, LA can enhance monitoring processes and suggest interventions or activities for particular students. The greatest care should, however, be taken not to confuse objectives and stakeholders in the design of a LA process, and not to let, e.g., economic and efficiency considerations on the institutional level dictate pedagogic strategies, as this would possibly lead to industrialisation rather than personalisation. LA is a support technology for decision-making processes. Therein also lies one of its greatest potential dangers: statistical analytic findings provide quantitative, not qualitative, support to such decision making. We are aware that aligning and regulating the performance and behaviour of individual teachers or learners against a statistical norm, without investigating the reasons for their divergence, may strongly stifle the innovation, individuality, creativity, and experimentation that are so important in driving learning and teaching developments and institutional growth.

Prediction: Apart from support for reflective practice, LA can also be used for predicting and modelling learner activities (Siemens, 2011; Verbert et al., 2011). This can lead to earlier intervention (e.g., to prevent drop-out), or to adaptive services and curricula. Using machine learning techniques, for example, learner profiles can be built dynamically and automatically, saving the learner from filling in and maintaining profile data. In predictive outcomes currently lies much hope for efficiency gains, in terms of establishing automatic decision making for learning paths and thus saving teacher time for other, more personal interventions.
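A dynamically built profile of this kind might, for instance, be aggregated directly from raw interaction events rather than asking the learner to maintain a form. The event fields and profile attributes in the following sketch are illustrative assumptions.

```python
# Minimal sketch of deriving a learner profile automatically from
# interaction logs. All field names and values are invented for
# illustration; real LMS logs would be richer and noisier.
from collections import Counter

events = [
    {"user": "alice", "action": "view_resource", "topic": "statistics"},
    {"user": "alice", "action": "post_forum",    "topic": "statistics"},
    {"user": "alice", "action": "view_resource", "topic": "algebra"},
]

def build_profile(user: str, log: list) -> dict:
    """Aggregate a user's events into a simple activity/interest profile."""
    own = [e for e in log if e["user"] == user]
    return {
        "user": user,
        "activity_mix": dict(Counter(e["action"] for e in own)),
        "topic_interest": dict(Counter(e["topic"] for e in own)),
    }

print(build_profile("alice", events))
```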
Prediction, however, potentially suffers from serious ethical problems (on which more below), in that judgements about a person, whether originating from another human or from a machine agent, could limit a learner's potential if they are based on a limited set of parameters. For example, not every learner who has difficulty mastering level two of a subject will automatically fail to master level three. We have to avoid reconfirming long-established prejudices of race, social class, gender, or the like with statistical data, leading to restrictions being placed upon individual learners.

Furthermore, there are limitations in the use of LA data as a means of supporting the learning process. Learning processes assume the leading role of the learner, rather than that of the teacher. The reliability of a LA-supported learner profile and its usefulness to the learners will, however, remain questionable. For example, what LA data can be used to determine whether a learning activity had a "high" or "low" impact on the learning process of learners, and at which points in the process itself? The diversity of learning also makes it problematic to judge which learning activity was of high value for learner A but of low value for learner B. With respect to pedagogic theories, we would argue that LA neither supports nor ignores specific pedagogic theories, and as an abstract concept it is pedagogically neutral. Indeed, we are of the opinion that LA can be used to evaluate different pedagogic strategies and their effects on learning and teaching through the analysis of learner data. This can be defined as a specific, pedagogically oriented objective under the current dimension; but, as we will discuss further below, certain technologies are not pedagogically neutral, and this will influence the analytics process in one way or another.

Educational data

LA takes advantage of available educational datasets from Learning Management Systems (LMS) and other systems. Institutions already possess a large amount of student data and use these for different purposes, among which administering student progress and reporting in order to receive funding from the public authorities are the most commonly known. Linking such available datasets would facilitate the development of mash-up applications that can lead to more learner-oriented services and therefore improved personalisation. LA strongly relies on data about learners, and one of the major challenges LA researchers face is the scarcity of publicly available datasets on which to evaluate their LA methods. Most of the data produced in institutions is protected, and the protection of student data and created learning artefacts is a high priority for IT services departments. Nevertheless, similar to Open Access publishing and related movements, calls for more openness of educational datasets have already been brought forward (Drachsler et al., 2010). Anonymisation is one means of creating access to so-called Open Data. Recently, Verbert et al. (in press) presented a state-of-the-art review of existing educational datasets. How open educational data should be requires a wider debate (cf. the section on legal constraints below), but already in 2010 several data initiatives were started to make more educational data publicly available:

- dataTEL challenge: The first dataTEL challenge was launched as part of the first workshop on Recommender Systems for TEL (Manouselis et al., 2010), jointly organised by the 4th ACM Conference on Recommender Systems and the 5th European Conference on Technology Enhanced Learning (EC-TEL 2010) in September 2010. In this call, research groups were invited to submit existing datasets from TEL applications that can be used for LA research purposes and for recommender systems for TEL.
- dataTEL workshop: The "Datasets for Technology Enhanced Learning" workshop was organised at the third STELLAR Alpine Rendez-Vous in March 2011. During this workshop, related initiatives that collect educational datasets and apply them in data-driven learning applications were presented, and challenges related to privacy and data protection were discussed.
- PSLC DataShop (Stamper, 2011): An open data repository that provides access to a large number of educational datasets, containing data derived from student interactions with intelligent tutoring systems.
- LinkedEducation.org (Dietze et al., 2012): Another initiative that provides an open platform to promote the use of data for educational purposes. At the time of writing, five organisations have contributed datasets.

Despite these pioneering activities, it still seems somewhat bizarre by comparison that, in the commercial world, a click on the "register" button by default hands access to all user data over to some company, whereas educational institutions operate on the default that everything is protected from virtually everyone. Distinguishing educational data by access rights into open and protected datasets (Figure 1) is not as simple as it sounds. Because the technical systems producing and collecting data are typically owned by the institution, the easiest assumption would be that this data belongs to them. However, exactly which employees of the institution are included in the data contract between a learner (or their parents) and the educational establishment is as yet unresolved. This places severe constraints on intra-institutional research or wider institutional use. We will raise some further legal considerations under external constraints below. As in related research domains, LA datasets create a new set of challenges for research and practice. These include:

- A lack of common dataset formats, such as the one suggested by the CEN/ISSS PT social data group (cf. the CAM schema at https://sites.google.com/site/camschema/home; and Wolpers et al., 2007).
- The need for version control and a common reference system to distinguish and point to different datasets.
- Methods to anonymise and pre-process data in accordance with privacy and legal protection rights (Drachsler et al., 2010); a minimal sketch of one such method is given after this list.
- Standardised documentation of datasets so that others can make proper use of them, like that promoted by the Data Seal of Approval initiative (cf. http://www.datasealofapproval.org).
- Data policies (licences) that regulate how users can use and share certain datasets. For instance, Creative Commons licences could be considered as a standard way to grant permissions on datasets. DataCite (Brase, 2009) is an organisation that makes it possible to register research datasets and assign licensing rights to them, so that datasets can be referenced in a similar way to academic articles.
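The following sketch illustrates one very simple pre-processing step of this kind: replacing direct identifiers with salted hashes before a dataset is shared. This is pseudonymisation rather than full anonymisation, and the record layout is an invented example.

```python
# Minimal sketch of pseudonymising log records before sharing a dataset:
# direct identifiers are dropped or replaced by salted hashes. Note that
# this is pseudonymisation, not full anonymisation; re-identification
# through behavioural patterns may still be possible.
import hashlib

SALT = "institution-secret-not-published-with-the-data"

def pseudonymise(record: dict) -> dict:
    out = dict(record)
    digest = hashlib.sha256((SALT + record["user_id"]).encode()).hexdigest()
    out["user_id"] = digest[:12]   # stable pseudonym for the same user
    out.pop("name", None)          # remove direct identifiers outright
    return out

raw = {"user_id": "s1234567", "name": "Alice", "resource": "lecture_3.pdf"}
print(pseudonymise(raw))
```

Keeping the salt secret and unpublished ensures that the pseudonyms cannot simply be recomputed from known student IDs, while the same user still maps to the same pseudonym across records.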
From a technical point of view, the idealisation of datasets probably remains the biggest challenge for analytics. That is to say, the assumption that datasets consist of context-free, meaningful and only meaningful data is highly optimistic. In most natural settings, users "pollute" databases by producing erroneous or incomplete datasets. For example, teachers who want to see their students' view of LMS courses often set themselves up as "test students" or create "test courses". These are not always obvious, but need to be removed from the data to be analysed. Empirical findings coming from a specific dataset are therefore almost certainly affected by the context of data collection and processing. Similarly, data collection often leads to "enmeshed identities" being used for analytics and prediction. A dataset typically cannot distinguish between a single individual and a shared presence in the learning space (e.g., group work on a single device). Students who often work together with others on shared devices (laptops, smartphones, lab spaces, etc.) produce enmeshed fingerprints in their educational data. This may lead to behaviours being attributed to a logged-in identity that actually originated from an "invisible" partner. Standardised documentation of datasets can be seen as paramount to raising awareness of this danger. Additionally, from a pedagogic perspective, it remains an ongoing challenge to formulate indicators from the available datasets that bear relevance for the evaluation of the learning process. The selection of specific data and their weighting (under the methods applied in the "instruments" dimension) against the real behaviour of students is of the greatest importance, as is the process of relating behaviour pattern data to cognitive developments.

Instruments

Different technologies can be applied in the development of educational services and applications that support the objectives of educational stakeholders. LA takes advantage of so-called information retrieval technologies like educational data mining (EDM; cf. Romero et al., 2008), machine learning, or classical statistical analysis techniques (cf. Figure 1), but other techniques may also be considered relevant, e.g., social network analysis (cf. Buckingham & Ferguson, 2011) or Natural Language Processing (NLP). Through these technologies, LA can contribute tailored information support systems to the stakeholders and report on demand. For instance, LA could be applied to develop a drop-out alert system. High drop-out rates are a challenging problem in education, especially distance education. Further research on LA could help to decrease drop-out rates by developing, e.g., a Drop-out Analyser that notifies the teacher of a course in good time which students are in danger of falling behind or dropping out. This could be done by taking LMS datasets and training an information retrieval technique (e.g., a Bayesian classifier) on them to learn the behavioural patterns of students who dropped out. The Drop-out Analyser could then be applied to a follow-up online course and flag students who show similar patterns (a minimal sketch of this idea follows below). The teacher of the course could then intervene in an appropriate manner. Preliminary prototypes of such systems are already available, like the Blackboard Early Warning System.
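The sketch below illustrates the Drop-out Analyser idea with a Gaussian Naive Bayes classifier from scikit-learn. The choice of features, the toy training data, and the alert threshold are all illustrative assumptions, not a validated model.

```python
# Minimal sketch of the Drop-out Analyser: train a Bayesian classifier
# on behavioural features from a completed course, then flag students in
# a follow-up course who show similar patterns. Features, data, and the
# threshold are invented for illustration.
from sklearn.naive_bayes import GaussianNB

# Per-student features from a past course: [logins/week, forum posts,
# assignments submitted]; label 1 = dropped out. "Test student" accounts
# set up by teachers (see above) should be filtered out before training.
X_past = [[5, 12, 8], [1, 0, 2], [4, 6, 7], [0, 1, 1], [3, 4, 6]]
y_past = [0, 1, 0, 1, 0]

clf = GaussianNB().fit(X_past, y_past)

# The same features collected for students in the follow-up course.
current = {"dana": [1, 0, 3], "eli": [4, 8, 7]}
for student, features in current.items():
    risk = clf.predict_proba([features])[0][1]  # probability of class 1
    if risk > 0.5:  # the alert threshold is a design decision
        print(f"alert: {student} shows drop-out-like patterns ({risk:.0%})")
```

Which classifier, which features, and which threshold are chosen here are exactly the kind of methodological decisions discussed next: different choices applied to the same data would flag different students.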
Under the dimension "instruments" in our model (Figure 1), we also subsume conceptual instruments such as theoretical constructs, algorithms, or weightings, by which we mean different ways of approaching data. These ways, in the broadest sense, "translate" raw data into information. The quality of the output information and its usefulness to the stakeholders depend heavily on the methods chosen. Hildebrandt (2010) quite rightly warns that "invisible biases, based on … assumptions … are inevitably embodied in the algorithms that generate the patterns". Competing methods, technologies, and algorithms applied to the same set of data will result in different outcomes, and may thus lead to different consequences in terms of the decisions made on the basis of those outcomes. LA designers and developers need to be aware that any algorithm or method they apply is reductive by nature, in that it simplifies reality to a manageable set of variables (cf. Verbert et al., 2011).

External constraints

Many different kinds of constraints can limit the beneficial application of LA processes, some being "softer" than others. It has been suggested to us to identify them as ethical, legal, and social constraints, but also to feature organisational, managerial, and process constraints. We find this a useful subdivision of external limitations, although other divisions are equally plausible. In the abstraction of the diagram above (cf. Figure 1), we propose a preliminary distinction between conventions, under which we count ethics, personal privacy, and similar socially motivated limitations, and norms, which are set by laws or by specifically mandated policies and standards. For reasons of space, we elaborate especially on the ethical aspects, as these have grown into a field of much recent attention and debate (Bollier, 2010) and have even spawned a collaborative effort in the Learning Analytics research community (Siemens, 2012).
