Generating Predictive Models of Learner Community Dynamics


In this paper we present a framework for learner modelling that combines latent semantic analysis and social network analysis of online discourse. The framework is supported by newly developed software, known as the Knowledge, Interaction, and Social Student Modelling Explorer (KISSME), that employs highly interactive visualizations of content-aware interactions among learners. Our goal is to develop, use and refine KISSME to generate and test predictive models of learner interactions to optimise learning.

"1 Introduction. The nascent field of Learning Analytics focuses on ""the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimising learning and the environments in which it occurs"" (https://tekri.athabascau.ca/analytics/call-papers). One approach to learning analytics is social network analysis, which examines the patterns of interaction among learners. Social network analysis of, in particular, elearning is facilitated by the availability of digital data that are amenable to such analysis. Considerably less attention has been paid to the content of the artifacts around which the learners are interacting. Content analysis is time-consuming, painstaking, and detailed work. Without content analysis, however, claims about the nature of the dynamics among learners are left wanting. Understanding learning, it seems, requires digging deeply into the data that are available. In this paper we introduce a framework that interweaves social network analysis, semi-automated content analysis, information visualization, and applied economic theory to help us understand and optimise learning. We are interested in investigating research questions such as: Can we “predict” when particular interactions will result in learning? What are some characteristics of interactions of effective learning? This paper begins with a brief introduction and survey of relevant literature using social network and latent semantic network analysis (LSA) to analyze online discourse. Next, a description of the prototypic software environment (the Knowledge Space Visualizer or KSV) on which the new software (the Knowledge, Interaction and Semantic Student Model Explorer, or KISSME) is being developed is presented. The use of LSA in the generation of student models suitable for studies of collaborative learning is then proposed. Finally, we present a theoretical framework for understanding the dynamics of collaborative learning in terms of examining the outcomes of social and semantic interactions among participants. 2 Background. Wasserman and Faust [1] describe social network analysis (SNA) as a methodology that focuses on relationships and patterns of relationships. As such it “requires a set of methods and analytic concepts that are distinct from the methods of traditional statistics and data analysis” (p. 3). They cast SNA in the broader list of topics that have been studied using network analytic methods, including community [2], group problem solving [3-5], diffusion and adoption of innovations [6-8], and cognition [9, 10]. No matter what the objective of the study, though, network analysis focuses on the relations between units. Studies have explored the application of SNA to explore learning and knowledge construction in Networked Learning/Computer-Supported Collaborative Learning (NL/CSCL) environments. However, researchers have yet to achieve consensus on what methods to use. For example, de Laat, Lally, and Lipponen, [11] used content analysis, critical event recall and SNA to study interaction patterns. They suggest that SNA can be used to complement content analysis [12, 13] to describe and understand patterns of interaction in NL/CSCL. Of the various network metrics that are available (see [1]), these researchers focus on density and centrality. In contrast, Reffay and Chanier [14] applied SNA to determine the cohesion of groups engaged in CSCL. 
Reffay and Chanier argue that embedding tools that perform such analyses in the design of the learning environment itself may be more effective in supporting teaching and learning than time-consuming content analysis. The importance of time-based analyses has also been noted [15, 16]; the study by de Laat et al. [11] was the first to use SNA to illustrate how patterns of interaction change over time and how those patterns relate to teaching and learning. An important generalization from this literature is that the essential requirements for SNA are two or more units, usually learners, and the elucidation of the relationships between them. But there is another, equally important, type of network to be considered in learning analytics and knowledge work: the network of ideas. Ideas, unfortunately, are difficult to delineate.

2.1 Latent Semantic Analysis

Latent semantic analysis (LSA) is both a statistical technique and a model of human knowledge acquisition. Landauer and Dumais [17] propose LSA as a model that could answer the question: how do individuals come to know so much given how little information they receive? This problem is variously known as Plato's Problem, the "problem of induction", the "poverty of the stimulus", or the "problem of the expert". (Plato's solution was that individuals possess innate knowledge and need only some stimulation to reveal it.)

LSA provides a high-dimensional representation of the associations between words and the documents containing those words. The final output from LSA is a series of measures that describe the relationships between units such as words, documents, or words-and-documents. In LSA, each document or word is represented by a vector in a high-dimensional latent semantic space. The vectors are calculated by examining patterns of co-occurrence of words in a term-by-document matrix, which is then reduced using Singular Value Decomposition (SVD). Each document is thus represented by a vector of numbers, typically with between 100 and 300 elements. Whereas the dimensions resulting from other applications of SVD to data can typically be interpreted (e.g. the dimensions from Principal Components Analysis), the dimensions resulting from LSA typically cannot. This limitation has made LSA-based analyses difficult to interpret in the past. Information visualization techniques seem a natural next step in interpreting LSA, and can be used to create meaningful representations of ongoing learning processes. Visualizing LSA-derived similarities directly may be problematic, though, because it requires reducing the dimensionality to the two or three dimensions suitable for display, far below what is optimal for LSA (typically around 300) [18].
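As a rough sketch of the pipeline just described, the following builds a term-by-document matrix from a toy corpus, reduces it with SVD, and compares the resulting document vectors by cosine. The corpus, the use of scikit-learn, and the tiny number of retained dimensions are all illustrative assumptions; a real analysis would keep 100-300 dimensions.

```python
# Sketch of the LSA pipeline described above: term-by-document
# matrix -> SVD -> document vectors -> cosine similarities.
# scikit-learn and the toy corpus are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "light reflects from a mirror at an equal angle",
    "the colour of light depends on its wavelength",
    "rainbows appear when water droplets split white light",
]

# Weighted co-occurrence matrix (scikit-learn builds it as
# document-by-term, the transpose; similarities are unaffected).
tdm = TfidfVectorizer().fit_transform(docs)

# SVD projects the matrix into a low-dimensional latent space.
# Real analyses typically keep 100-300 dimensions; 2 is a toy value.
vectors = TruncatedSVD(n_components=2).fit_transform(tdm)

# Pairwise cosine similarities between document vectors.
print(cosine_similarity(vectors))
```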
3 Software

In this section we describe software designed to support the visualization of learner models based on social and semantic networks. We begin with the Knowledge Space Visualizer (KSV), the prototypic system on which our new software, KISSME, is based.

3.1 The Knowledge Space Visualizer (KSV)

KISSME extends the Knowledge Space Visualizer, which was developed by the first author for his doctoral dissertation. The KSV was designed to allow researchers to use computer-assisted two-dimensional visualization of learner-generated contributions to an online discourse space. In its simplest form this produces a graph in which nodes are contributions and links are relationships between those contributions such as "reply", "reference", and "annotate" (see Figure 1). These explicit relationships are based on the behaviours of the contributors: a learner, for example, can intentionally make a contribution that is a reply to another learner's contribution. In the resulting graph the links reflect these behavioural relationships; content is not considered.

In addition to the explicit linkages defined by behaviours such as replying, referencing, and annotating, there exist implicit linkages between contributions based on the similarity of their content. Human raters can evaluate the similarity between documents reliably and with good validity, but doing so is tedious and time-consuming. A variety of automated and semi-automated techniques can be used instead to determine the similarity of text-based contributions; one powerful technique is LSA, described above.

Fig. 1. Structural relationships between contributions. Blue lines indicate "build-on" or "reply-to" relationships; magenta lines indicate "reference" links.

The preceding examples use a force-directed layout algorithm to position the nodes so as to respect the strength of the ties between them while minimizing the distortion of the network of relationships. Other layouts are also possible. For example, other researchers [19] have highlighted the importance of chronology when studying the dynamics of learning communities, and the KSV supports this sort of inquiry by allowing notes to be positioned chronologically. More generally, the KSV allows any categorical, ordinal, or continuous variable in the data set to define either axis of the display. So, in addition to using a continuous chronological scale for the horizontal axis, authorship can be used to define the vertical axis; an example of the resulting learner-time display is shown in Figure 2. Once contributions are positioned on whatever operationally defined axes the analyst has chosen, links between nodes can be overlaid without affecting the positioning of the nodes. For example, the behavioural links can be overlaid on the learner-time display to show how patterns of interaction change over time (Figure 3). In a similar way, links based on latent semantic analysis can be overlaid on the same display to show the degree to which contributions are similar across time and authorship.

More computationally intensive measures can also be visualized. For example, one can determine which contributions were opened (and possibly read) by a learner within some specified time interval before that learner added a new contribution to the discourse space. An example of this sort of "recency influence" diagram is shown in Figure 4; a sketch of the underlying computation follows below.

Fig. 2. Chronological-authorial layout of contributions.

Fig. 3. Chronological-authorial layout of contributions overlaid with structural links.
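A minimal sketch of the recency-influence computation follows. It assumes server logs of which contributions each learner opened; the field names, data structures, and the one-hour window are illustrative assumptions, not the KSV's actual implementation.

```python
# Sketch of the "recency influence" computation described above:
# link a new contribution to contributions its author opened within
# a fixed window beforehand. Data structures, field names, and the
# window length are illustrative assumptions.
from datetime import datetime, timedelta

# (reader, contribution_id, time_opened) events from server logs.
opens = [
    ("ana", "c1", datetime(2011, 3, 1, 8, 30)),
    ("ana", "c2", datetime(2011, 3, 1, 9, 40)),
]

# (author, contribution_id, time_posted) for new contributions.
posts = [("ana", "c3", datetime(2011, 3, 1, 10, 0))]

WINDOW = timedelta(hours=1)  # assumed interval

def recency_links(opens, posts, window=WINDOW):
    """Yield (opened_id, new_id) pairs where the author of the new
    contribution opened the other one within `window` of posting."""
    for author, new_id, posted in posts:
        for reader, seen_id, opened in opens:
            if reader == author and posted - window <= opened < posted:
                yield (seen_id, new_id)

print(list(recency_links(opens, posts)))  # [('c2', 'c3')]
```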
Perhaps the most interesting diagrams that can be produced using the KSV are based on the superposition of different link types on the same layout. For example, one can overlay links of LSA-based semantic similarity atop those based on "recency influence" to investigate the degree to which the content of recently opened (read) contributions is reflected in new contributions.

The KSV also allows the user to constrain the analysis by specifying beginning and end dates. Rather than fixing the dates a priori, the user can manipulate them with a specially designed slider; the beginning and end dates can be moved independently or together, the latter effectively providing time slices of the network graph.

Fig. 4. Chronological-authorial layout overlaid with structural and recency links.

One of the key innovations of the KSV was the use of flexible thresholds in the creation of network representations. This is what allowed us to create visualizations of LSA-based representations of texts. Rather than attempting to provide a two-dimensional layout based on the first few dimensions resulting from the matrix decomposition used in LSA, our approach has been to determine the similarities between documents from the cosines between the vectors representing them. A graph is then created in which nodes correspond to documents and edges to the LSA-based similarities between them, and a force-directed layout algorithm positions the nodes so that the two-dimensional representation minimizes the distortion of the (very low dimensional) projection. This representation of a maximally connected graph typically lacks clarity; in typical cases with tens or hundreds of nodes, the graph is essentially unintelligible due to the large number of edges. The problem of overly connected graphs also raises a conceptual question: does it make sense to connect two document nodes if their LSA-based similarity is very low? Other researchers [20] have attempted to address this "threshold problem", but their research suggests that no typical value of cosine threshold for determining document similarity exists.

Our approach to this problem is to give the end user control over the choice of threshold. We provide a slider control in the software that lets the user specify the cosine value below which edges are not drawn between document nodes. The dynamic nature of this control allows the user, for example, to examine patterns of cluster formation as the similarity threshold is varied; a sketch of this edge filtering appears below. This is an example of how visual approaches to learning analytics can offer solutions to previously intractable problems. The answer to the question "when are two documents (or ideas) different?" is typically "it depends on what you're looking for". Consider a collection of documents generated by students on, for example, the physics of light. At the most permissive similarity threshold, all documents are related by virtue of being in the same language; this corresponds to a threshold of zero. At a value slightly above zero, one could imagine the documents clustering into two groups: one about the colours of light and one about reflection. As the threshold is raised higher still, one could imagine the colours cluster fragmenting into smaller clusters of related notes on topics such as rainbows and wavelength. The ability to manipulate the threshold interactively supports this broad range of possibilities for determining the diversity of ideas present in a discourse space.
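The following sketches the threshold-controlled edge filtering just described: given pairwise cosine similarities, edges are drawn only where the similarity meets the current threshold. In the KSV the threshold comes from an interactive slider; here it is a plain function argument, with networkx standing in for the KSV's own graph and layout machinery.

```python
# Sketch of the flexible-threshold graph construction described
# above: connect two documents only when the cosine similarity of
# their LSA vectors meets the current threshold. networkx stands in
# for the KSV's own graph/layout code; the data are invented.
import networkx as nx
from itertools import combinations

def similarity_graph(labels, cosines, threshold):
    """Build a graph with an edge for each pair whose cosine
    similarity is at least `threshold` (0.0 connects everything)."""
    g = nx.Graph()
    g.add_nodes_from(labels)
    for i, j in combinations(range(len(labels)), 2):
        if cosines[i][j] >= threshold:
            g.add_edge(labels[i], labels[j], weight=cosines[i][j])
    return g

labels = ["mirrors", "wavelength", "rainbows"]
cosines = [[1.0, 0.1, 0.2],
           [0.1, 1.0, 0.8],
           [0.2, 0.8, 1.0]]

# Raising the threshold fragments the graph into tighter clusters,
# mimicking what the KSV's slider does interactively.
for t in (0.0, 0.5):
    print(t, similarity_graph(labels, cosines, t).edges)
```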
The Knowledge Space Visualizer, while providing powerful visualizations of multi-dimensional networks, has several limitations. First, it requires the end user to have a functional installation of a recent version of Java. Recent advances in browser-based technology, specifically the widespread adoption of HTML5, have enabled the production of highly interactive browser-based visualizations. Perhaps more significantly, the KSV was limited by its focus on document-based networks. It enables the visualization of relationships between documents, based on both explicit and implicit linkages, but beyond examining patterns of authorship and co-authorship it was not particularly good at generating visualizations of author-based networks. We are working on next-generation software that will facilitate the examination of networks of authors. In its earliest versions, the KSV was highly tuned to data from Knowledge Forum; it was recently enhanced to allow the importation of data from almost any source that provides indications of authorship, chronology, and content. The KSV was released as open source code and is maintained on Google Code at http://code.google.com/p/ksv.

3.2 Visualizing Student Models: The Knowledge, Interaction and Semantic Student Model Explorer (KISSME)

Recent work has led to the implementation of a learner model based on interactions with other learners. The functionality of the KSV, in terms of being able to manipulate the threshold at which two nodes are considered similar enough to be joined by visible edges, was extended from document nodes to learner nodes. Put another way, a learner model based on social network analysis was created in the KSV, and the implementation of a flexible threshold, based on the intensity of the interaction between any two learners, allowed researchers to investigate patterns of interaction. The analyst could exercise considerable control over parameters such as the intensity of interaction necessary to establish a social link between participants and the date at which the social network was analysed. The ability to vary these parameters allowed the detection of patterns of interaction that were previously obscured [21]. However, the network between authors was based solely on their patterns of interaction; no information about the content of their contributions was used in generating the graphs. A sketch of this interaction-intensity thresholding follows below.
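A minimal sketch of the interaction-intensity thresholding follows. The measure of intensity, a simple count of replies exchanged between two learners, is our assumption; the paper does not specify the exact measure the KSV uses.

```python
# Sketch of the learner-node thresholding described above: learners
# are linked only when their interaction intensity (here, an assumed
# simple count of reply exchanges) meets the analyst's threshold.
import networkx as nx
from collections import Counter

# (author_of_reply, author_replied_to) pairs from the discourse log.
replies = [("ana", "ben"), ("ben", "ana"), ("ana", "ben"),
           ("carla", "ben"), ("dev", "ana")]

def learner_graph(replies, min_intensity):
    """Link two learners when the number of replies between them,
    in either direction, is at least `min_intensity`."""
    intensity = Counter(frozenset(pair) for pair in replies)
    g = nx.Graph()
    for pair, count in intensity.items():
        if count >= min_intensity and len(pair) == 2:  # skip self-replies
            g.add_edge(*pair, weight=count)
    return g

# Raising the threshold isolates the strongest working relationships.
print(learner_graph(replies, 1).edges)  # all pairs that interacted
print(learner_graph(replies, 3).edges)  # only ana-ben (3 replies)
```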
The ability to model students or other participants, and then to visualize those models in an interactive environment, offers the potential to gain insight into the nature and outcomes of interactions between learners. In our work with the STEF lab we constrained our analyses to the social networks that formed among learners. While this approach revealed interesting patterns of interaction, we felt the results were incomplete because no attention was paid to the content of the learners' contributions to the online discourse space. Other researchers have conducted studies that meld automated interaction analysis with manual content analysis [11, 16]. However, manual content analysis is the rate-limiting step in this sort of analysis: because it takes so long, it is incompatible with real-time analysis, which is one of our goals.

Therefore, we are interested in some form of automated or semi-automated content analysis, and for the reasons given earlier we have chosen latent semantic analysis. For our purposes, LSA serves simply to generate mathematical representations of participants' contributions to the discourse space; we can then use those representations in a variety of ways. LSA uses a vector representation of text, and one characteristic of these vectors is that they are additive: the vectors of two documents can be added together to yield the vector of the combined document. We can exploit this property to generate latent semantic models of participants by adding together the vector representations of all of their contributions to the discourse space.
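As a minimal sketch of this additive modelling, the following sums toy LSA vectors per author and compares the resulting learner models by cosine. The three-dimensional vectors are stand-ins for real 100-300 dimensional LSA vectors, and the similarity band at the end anticipates the Vygotskian hypothesis discussed below; its limits are assumed, not empirical.

```python
# Sketch of the additive learner modelling described above: a
# participant's model is the sum of the LSA vectors of their
# contributions. The toy 3-dimensional vectors and the similarity
# band at the end are illustrative assumptions.
import numpy as np

# contribution -> (author, assumed LSA vector); real vectors would
# come from the SVD step sketched earlier and be much longer.
contributions = {
    "c1": ("ana",   np.array([0.9, 0.1, 0.0])),
    "c2": ("ana",   np.array([0.7, 0.3, 0.1])),
    "c3": ("ben",   np.array([0.1, 0.8, 0.4])),
    "c4": ("carla", np.array([0.6, 0.2, 0.2])),
}

# Sum each author's contribution vectors to form their learner model.
models = {}
for author, vec in contributions.values():
    models[author] = models.get(author, np.zeros(3)) + vec

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Flag pairs whose semantic relatedness is neither too low nor too
# high; the 0.4-0.9 band is an assumed value, not an empirical one.
names = sorted(models)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        c = cos(models[a], models[b])
        print(a, b, round(c, 2),
              "candidate pair" if 0.4 <= c <= 0.9 else "")
```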
This is not the first application of LSA to student modelling. Other researchers [22-24] have used LSA in student modelling, but without focusing on the collaborative nature of learning; still others have extended techniques from earlier LSA research to e-learning contexts [25-27]. Zampa and Lemaire's recent work [23] builds on the notion of matching students to texts based on Vygotsky's Zone of Proximal Development. Theirs, however, is an individualistic model: the selection of "stimuli" is meant to effect individualized optimization of learning. Our approach is somewhat different: we are interested in combining information about patterns of interaction among participants with information about the content of their contributions. We too take a Vygotskian approach: optimal learning should take place when interactions occur between individuals who are neither too similar nor too dissimilar to each other, based on the semantics of what they have written. This combination of social network analysis and latent semantic network analysis is an example of the sort of "multi-dimensional" network championed by Noshir Contractor [28].

Our current work includes the implementation of software that will allow us, as researchers, to examine the interplay between interactions among learners and the latent semantic models of those learners. We are interested in testing the Vygotskian hypothesis that uptake [29] is most likely to occur when the semantic relatedness of the corresponding contributor models is neither too high nor too low. We are also interested in simulations of learner interactions that take into consideration both interactions and semantic relatedness; this, we believe, would allow us to generate models of community dynamics in collaborative learning. Once we have simulation data that incorporates interaction and content, we can make inferences about the characteristics that lead to the success (broadly defined) of some learning communities.

4 Game Theoretical Approaches to Understanding Learner Group Dynamics

Our approach to understanding community dynamics is based on understanding the nature of the interactions between members of that community. We are examining a variety of theoretical approaches, but one that seems particularly promising is the application of game theory [30] to interactions between users. This approach requires us to consider the outcomes of interactions between users in terms of "payoffs" to each player. Of course, different players can employ different strategies. We consider this to be part and parcel of learning: our hypothesis is that as learners gain expertise, they enhance their repertoire of learning strategies, and through experience they learn when to employ particular strategies.

5 Summary

We have proposed a framework that combines social network analysis and latent semantic analysis of online discourse. The proposal is speculative, but previous work with latent semantic analysis has yielded promising results that may help us understand the nature of interactions among learners. Examining those interactions through a framework such as game theory may allow us to gain insight into the nature of community dynamics.
