Analyzing Participation of Students in Online Courses Using Social Network Analysis Techniques

InProceedings

A growing number of courses are delivered through e-learning environments, and their online discussions play an important role in students’ collaborative learning. Even in courses with a small number of students, thousands of messages can be generated within a few months in these forums. Manually evaluating student participation in such cases is a significant challenge, considering that current e-learning environments provide little information about the structure of interactions between students. There is a recent line of research on applying social network analysis (SNA) techniques to study these interactions, and it is interesting to investigate the practicability of SNA in evaluating student participation. Here we propose to exploit SNA techniques, including community mining, to discover relevant structures both in the social networks we generate from student communications and in the information networks we produce from the content of the exchanged messages. With visualization of these discovered structures and automated identification of central and peripheral participants, an instructor is provided with better means to assess participation in the online discussions. We implemented these ideas in a toolbox named Meerkat-ED, which prepares and visualizes overall snapshots of the participants in the discussion forums, their interactions, and the leader/peripheral students. Moreover, it creates a hierarchical summarization of the discussed topics, which gives the instructor a quick view of what is under discussion. We believe exploiting the mining abilities of this toolbox would facilitate fair evaluation of students’ participation in online courses.

"1. INTRODUCTION. There is a growing number of courses delivered using e-learning environments, especially in postsecondary education, using computer-supported collaborative learning (CSCL) tools, such as Moodle ,WebCT and Blackboard . Online asynchronous discussions in these environments play an important role in collaborative learning of students. It makes them actively engaged in sharing information and perspectives by interacting with other students [Erlin et al. 2009]. There is a theoretical emphasis in CSCL on the role of threaded discussion forums for online learning activities. Even basic CSCL tools enable the development of these threads where the learners could access text, revise it or reinterpret it; which allow them to connect, build, and refine ideas, along with stimulating deeper reflection [Calvani et al. 2009]. There could be thousands of messages generated in a few months within these forums, containing long discussion threads bearing many interactions between students. Therefore the CSCL tools should provide a means to help instructors for evaluating participation of students and analyzing the structure of these interactions; which otherwise could be very time consuming, if not impossible, for the instructors to be done manually. Up to now, current CSCL tools do not provide much information regarding the participation of students and structure of interactions between them in discussion threads. In many cases, only some statistical infor- mation is provided such as frequency of postings, which is not a useful measure for interaction activity [Erlin et al. 2009]. This means that the instructors who are using these tools, do not have access to convenient in- dicators that would allow them to evaluate the participation and interaction in their classes [Willging 2005]. Instructors usually have to monitor the discussion threads manually which is hard, time consuming, and prone to human error. 
On the other hand, there is a large body of research studying student participation in such discussion threads using traditional research methods: content analysis, interviews, survey observations and questionnaires [de Laat et al. 2007]. These methods try to detect the activities that students are involved in while ignoring the relations between students. For example, content analysis methods, the most common traditional methods, provide deep information about specific participants. However, they neglect the relationships between the participants, because their focus is on the content, not on the structure [Willging 2005]. To fully understand student participation, we need to understand students' patterns of interaction and answer questions such as who is involved in each discussion and who are the active and peripheral participants in a discussion thread [de Laat et al. 2007].

Nurmela et al. 1999 demonstrated the practicality of social network analysis methods in CSCL as a way of obtaining information about relations and fundamental structural patterns. Moreover, there is a recent line of work on applying social network analysis techniques to evaluate student participation in online courses, such as the work of Sundararajan 2010, Calvani et al. 2009, de Laat et al. 2007, Willging 2005, Laghos and Zaphiris 2006, and Erlin et al. 2009. The major challenges these works tried to tackle are: extracting social networks from asynchronous discussion forums (which might require content analysis), finding appropriate indicators for evaluating participation (from an educational point of view), and measuring these indicators using social network analysis. As clarified in the related work in Section 2, none of these works provides a complete or specific toolbox for analyzing discussion threads; rather, they attempted to address one of these challenges to some extent.
Here, we elaborate on the importance of social network analysis for mining structural data in the field of computer science and on its applicability to the domain of education, for monitoring and evaluating student participation in online courses. We propose Meerkat-ED, a specific and practical toolbox for analyzing interactions of students in asynchronous discussion forums of online courses. Meerkat-ED analyzes the structure of these interactions using social network analysis techniques, including community mining. It prepares and visualizes overall snapshots of participants in the discussion forums, their interactions, and the leader/peripheral students in these discussions. Moreover, it analyzes the content of the messages exchanged in these discussions by building an information network of terms and using community mining techniques to identify the topics discussed. Meerkat-ED creates a hierarchical summarization of the topics discussed in the forums, which gives the instructor a quick view of what is under discussion. It further illustrates how much each student has participated in these topics, by showing his/her centrality in the discussions on that topic, the number of posts and replies, and the portion of terms used by that student in the discussions. In the following, we first introduce some basic background on social network analysis and elaborate on its applications in the context of online education. We then present Meerkat-ED, our solution for social network analysis of online courses, in Section 3 and illustrate its practicability on our own case study data in Section 4.

2. BACKGROUND AND RELATED WORKS.

Social networks are formally defined as a set of actors or network members who are tied by one or more types of relations [Marin and Wellman 2010]. The actors are most commonly persons or organizations; however, they could be any entities, such as web pages, countries, proteins, documents, etc.
There could also be many different types of relationships: collaborations, friendships, web links, citations, and information flow, to name a few [Marin and Wellman 2010]. These relations are represented by the edges in the network connecting the actors, and may have a direction (showing the flow from one actor to the other) and a strength (showing how much, how often, or how important). Unlike proponents of attribute-based social sciences, social network analysts argue that causation is not located in the individuals, but in the social structure [Marin and Wellman 2010]. Social network analysis is the study of this structure. Rooted in sociology, social network analysis has become an interdisciplinary area of study, including researchers from anthropology, communications, computer science, education, economics, criminology, management science, medicine, political science, and other disciplines [Marin and Wellman 2010]. Social network analysis examines the structure and composition of ties in the network to provide insights into: 1) understanding the central actors in the network (prestige); 2) detecting the individuals with the most outgoing connections (influence), the most incoming connections (prominence), and the fewest connections (outlier); 3) identifying the proportion of possible ties that actually exist (density); 4) tracking the actors that are involved in passing information through the network (path length); 5) finding the actors that communicate more often with each other (community), etc. The availability and growth of large datasets of information networks make community mining a very challenging research topic in social network analysis. A considerable amount of work has been done on detecting communities in social networks [Palla et al. 2005], [Newman and Girvan 2004], [Chen et al. 2009], etc.
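Some of the measures listed above can be made concrete with a small sketch. The following Python snippet is only an illustration under stated assumptions, not part of Meerkat-ED: the function name `network_measures` and the toy ties are hypothetical, the network is treated as directed without self-loops, and density is taken as the number of distinct ties divided by the n(n-1) possible directed ties.

```python
from collections import defaultdict

def network_measures(edges):
    """Compute density and in/out-degree for a directed network.

    edges: iterable of (source, target) ties. Self-loops and repeated
    ties are ignored (an assumption made for this sketch).
    """
    ties = {(s, t) for s, t in edges if s != t}   # distinct directed ties
    actors = {a for tie in ties for a in tie}
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for s, t in ties:
        out_deg[s] += 1   # outgoing connections ("influence")
        in_deg[t] += 1    # incoming connections ("prominence")
    n = len(actors)
    # Density: proportion of possible directed ties that actually exist.
    density = len(ties) / (n * (n - 1)) if n > 1 else 0.0
    return {
        "density": density,
        "out_degree": {a: out_deg[a] for a in actors},
        "in_degree": {a: in_deg[a] for a in actors},
    }

# Hypothetical toy network with four actors and four ties.
toy_ties = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C")]
measures = network_measures(toy_ties)
print(measures["density"])
print(measures["out_degree"], measures["in_degree"])
```

In this toy network, actor A has the most outgoing ties and actor C the most incoming ones; an actor with very few ties of either kind would be a candidate outlier.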
2.1 Social Network Analysis of Asynchronous Discussions in Online Courses

In order to apply social network analysis techniques to assess student participation in an e-learning environment, we first need to extract the social network from the e-learning course. We then consider which measures indicate effective participation, and finally report these measures in an appropriate way. Here, we give an overview of previous work related to each of these phases.

Fig. 1: This nonagon illustrates a comparison of the participation of one group (blue lines) with the average participation of other groups (red lines) using the nine indicators defined by Calvani et al. 2009. Figure reproduced from [Calvani et al. 2009].

Extraction of Social Network. CSCL tools record log files that contain the detailed actions occurring within them. Hence, log files include information about the activity of the participants in the discussion forums [Nurmela et al. 1999]. de Laat et al. 2007, Willging 2005, Erlin et al. 2009 and Laghos and Zaphiris 2006 used these log files to extract the social network underlying the discussion threads. Laghos et al. stated that they considered each message as directed to all participants in that discussion thread, while the others considered it as directed only to the previous message. Gruzd and Haythornthwaite 2008 and 2009 proposed an alternative and more complicated way of extracting social networks, called the named network. They argue that the common method (connecting a poster to the previous poster in the thread) loses many of the connections. Their approach, briefly, is: first use named entity recognition to find the nodes of the network, then count the number of times each name is mentioned in posts by others to obtain the ties, and finally weight these ties by the amount of information exchanged in the posts.
However, their final reported results are not that promising, and even obtaining those results requires many manual corrections during the process. Regarding what should be considered participation when extracting the social network, Hrastinski 2008 suggested that, apart from writing, there are other indicators of participation, such as accessing the e-learning environment, reading posts, or the quantity and quality of the writing. However, all of these methods extracted networks based only on posts by students (the writing level).

Measuring the Effectiveness of Participation. Daradoumis et al. 2006 defined high-level weighted indicators (the weights showing importance) to represent the collaborative learning process: task performance, group functioning, social support, and help services. They further divided these indicators into skills and sub-skills, and assigned every sub-skill to an action. For example, group functioning is divided into active participation behavior, task processing, communication processing, etc. Communication processing is in turn divided into further sub-skills: clarification, evaluation, illustration, etc.; clarification is then mapped to the action of changing the description of a document or URL. In the education context, Calvani et al.
2009 defined 9 indicators for measuring the effectiveness of participation in order to compare different groups within a class: extent of participation (number of messages), proposing attitude (number of messages with a proposal label), equal participation (variance of messages across users), extent of role (portion of roles used), rhythm (variance of the number of messages per day), reciprocal reading (portion of messages that have been read), depth (average response depth), reactivity to proposal (number of direct answers to messages with a proposal label) and conclusiveness (number of messages with a conclusion label). All are summarized in a nonagon graph that shows the group's interactions relative to the mean behavior of all groups (Figure 1). However, for measuring the effectiveness of participation, most previous works simply used general social network measures (different centrality measures, betweenness, etc.) available in one of the common general-purpose social network analysis toolboxes. Sundararajan 2010, de Laat et al. 2007, Willging 2005, and Erlin et al. 2009 used UCINET [UCINET], and Laghos and Zaphiris 2006 used NetMiner [NetMiner].

3. SOCIAL NETWORK ANALYSIS FOR EDUCATION: MEERKAT-ED.

In this section, we illustrate the practicability of social network analysis in evaluating student participation in online discussion threads. We present our specific social network analysis toolbox, named Meerkat-ED, for analyzing online courses. Meerkat-ED is designed for assessing the participation of students in asynchronous discussion forums of online courses. It analyzes the structure of interactions between students in these discussions using social network analysis techniques. It exploits community mining techniques in order to discover relevant structures in social networks generated from student communications and in information networks produced from the content of the exchanged messages.
With visualization of these discovered structures and automated identification of central and peripheral participants, an instructor is provided with better means to assess participation in the online discussions. Meerkat-ED prepares and visualizes overall snapshots of participants in the discussion forums, their interactions, and the leader/peripheral students. It creates a hierarchical summarization of the topics discussed in the forums using community mining, which gives the instructor a quick view of what is under discussion in these forums. It further illustrates how much each student has participated in these topics, by showing his/her centrality in the discussions on that topic, the number of posts and replies, and the portion of terms used by that student in discussions on the topic.

Meerkat-ED builds and analyzes two kinds of networks out of the discussion forums: a social network of the students, where links represent correspondence, and a network of the phrases used in the discussions, where links represent co-occurrence of phrases in the same sentence. Interpreting the first network shows the interaction structure of the students who participated in the discussions. Furthermore, the centrality of students in this network corresponds to their leadership in the discussions. Interpreting the term network depicts the terms used in the discussions and the relations between these terms. Finding the hierarchical communities in this network reveals the topics addressed in the discussions. Choosing any of these topics outlines the students who participated in that topic and the extent of their participation.

3.1 Interpreting Students Interaction Network.

Interpreting the network of interactions between students helps instructors monitor the interaction structure of students, and examine which students are the leaders in given discussions and which are the peripheral students.
Here, we first describe how the network is extracted from the information in the discussion threads. We then analyze the leadership of the students based on their centrality in this network. The student network shows the interaction between students in the discussion forums: the nodes represent students of the course and the edges represent the interactions between these students (i.e. messages exchanged). The edges are weighted by the number of messages passed between the two incident students. This network can be built as either directed or undirected (as chosen by the instructor); in the directed model, each message is considered to connect the author of the message to the author of its parent message. The leadership and influence of students in the discussions can be compared by examining the centrality of their corresponding nodes in the network, since a node's centrality measures its relative importance within the network. Moreover, students can be ranked more explicitly in a concentric centrality graph, in which the more central/powerful a node is, the closer it is to the center (Figure 4).

3.2 Interpreting Term Network.

Interpreting the term network depicts the terms used in the discussions and the relations between these terms. Moreover, finding the hierarchical communities in this network reveals the topics exchanged in the discussions. Furthermore, choosing any of these topics outlines the students who participated in that topic and the extent of their participation. In the term network, nodes represent noun phrases occurring in the discussions, and edges show the co-occurrence of these terms in the same sentence. Each co-occurrence edge contains the messages in which its incident terms occurred together, and is weighted by the number of sentences in which these terms co-occurred.
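The edge definition above can be sketched in a few lines of Python. This is a minimal illustration, not the Meerkat-ED implementation: it assumes the terms have already been extracted and the messages already split into sentences, the function name `build_term_network` and the toy messages are hypothetical, and terms are matched by naive substring search for brevity.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_term_network(messages, terms):
    """Build a weighted term co-occurrence network.

    messages: dict mapping a message id to its list of sentences.
    terms: iterable of lower-case terms (assumed already extracted).
    Returns (weights, edge_messages): the number of sentences in which
    each pair of terms co-occurs, and the messages behind each edge.
    """
    vocabulary = set(terms)
    weights = Counter()
    edge_messages = defaultdict(set)
    for msg_id, sentences in messages.items():
        for sentence in sentences:
            text = sentence.lower()
            # Naive substring matching: a simplification for this sketch
            # ("patient" would also match inside "patients").
            present = sorted(t for t in vocabulary if t in text)
            for a, b in combinations(present, 2):
                weights[(a, b)] += 1            # one more co-occurring sentence
                edge_messages[(a, b)].add(msg_id)
    return weights, edge_messages

# Hypothetical toy data.
toy_messages = {
    "m1": ["Patient privacy depends on consent.", "Consent protects the patient."],
    "m2": ["Privacy and consent are related."],
}
toy_terms = ["patient", "privacy", "consent"]
term_weights, term_edge_messages = build_term_network(toy_messages, toy_terms)
print(term_weights)
```

Each edge key is an alphabetically ordered pair of terms, so the network is undirected; the set stored per edge corresponds to the messages that a co-occurrence edge "contains" in the description above.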
To build this network, we first need to extract the noun phrases from the discussions, then build the network by setting the extracted phrases as nodes and checking their co-occurrence in all the sentences of every message to create the edges. We used the OpenNlp toolbox [OpenNlp] to extract noun phrases from the discussions. OpenNlp is a set of natural language processing tools for performing sentence detection, tokenization, POS tagging, chunking, parsing, etc. Using the sentence detector in OpenNlp, we first segmented the content of messages into their constituent sentences. The tokenizer was then used to break those sentences down into words. Given the tokenized words, we used the part-of-speech tagger to determine their grammatical tags (noun, verb, adjective, etc.). Then, using the chunker, we grouped these words into phrases and picked the detected noun phrases, which are sequences of words surrounding at least one noun and functioning as a single unit in the syntax.

To obtain better sets of terms to represent the content of the discussions, pruning the extracted noun phrases was necessary. We removed all stop words, and split any phrase containing stop word(s) into separate phrases. For example, the phrase “privacy and confidentiality” is split into two terms: “privacy” and “confidentiality”. To avoid duplicates, the first character of each phrase was converted to lower case (if the other characters of the phrase were in lower case) and plurals were converted to singular forms (if the singular form appeared in the content). For instance, “Patients” would become “patients” and then “patient”. As a final modification, we removed all noun phrases that occurred only once, which prunes most of the unwanted phrases.

The term network can be further analyzed to group the terms that mostly co-occur.
These groups represent the different topics discussed in the messages and can be obtained by detecting the communities in the term network. This idea is similar to the work of Chen et al. 2008. To create the hierarchy of topics, we applied a community mining algorithm repeatedly to divide one of the current connected components of the network, until the size of every component was smaller than a threshold or dividing any of the components would result in a loose partitioning. We used FastModularity [Clauset et al. 2004] as the community detection algorithm; however, any other community mining approach could be used. Based on the detected term communities, the participation of students, and how broad that participation is, can be assessed. In other words, students who participated in many different topics can be considered more active than students who talked about a smaller number of topics. This can be examined by selecting each student and checking how many topics he/she participated in.

4. CASE STUDY.

In this section, we validate the feasibility of Meerkat-ED and illustrate its practical application on our own case study data. Here, Meerkat-ED is used for visualizing, monitoring and evaluating student participation in the discussion forums. The data set was obtained from a postsecondary course titled Electronic Health Record and Data Analysis, offered in Winter 2010 at the University of Alberta. Permission to use the anonymized course data for research purposes was obtained from all the students registered in the course, at the end of the semester so as not to bias the communications taking place. The data was further anonymized by assigning fake names to students and replacing any occurrence of the first, last or user name of a student in the data (including the content of the messages in discussion forums) with the assigned fake name. We also removed all email addresses from the data.
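The anonymization step just described could look roughly like the following Python sketch. It is an assumption-laden illustration rather than the procedure actually used for the course data: the function name `anonymize`, the simplified email pattern, the whole-word case-insensitive matching, and the sample names are all hypothetical.

```python
import re

# Simplified email pattern, good enough for a sketch (not RFC-complete).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def anonymize(text, name_map):
    """Remove email addresses and replace real names with fake ones.

    name_map: dict mapping each real first, last or user name to the
    fake name assigned to that student (hypothetical structure).
    """
    # Remove emails first so that names inside addresses are not
    # partially rewritten.
    text = EMAIL_RE.sub("", text)
    # Replace longer names first, so a user name is handled before any
    # shorter name it may contain.
    for real in sorted(name_map, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(real) + r"\b", re.IGNORECASE)
        text = pattern.sub(name_map[real], text)
    return text

# Hypothetical student whose first name and user name map to one fake name.
toy_name_map = {"Alice": "Chloe", "asmith": "Chloe"}
toy_post = "Thanks Alice, mail me at asmith@example.org (asmith)."
anonymized_post = anonymize(toy_post, toy_name_map)
print(anonymized_post)
```

A real pipeline would also need to make sure that an assigned fake name does not coincide with another student's real name, since replacements are applied one after another.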
In the chosen course, as is usual in other courses, the instructor initiated different discussion threads. For each thread, he posted a question or provided some information and asked students to discuss the issue. Students then posted subsequent messages in the thread, responding to the original question or to the responses of other students. The course was offered using Moodle, a widely used course management system. Moodle, like other CSCL tools, enables interaction and collaborative construction of content, mostly through its Forum tool, which is a place for students to share their ideas [Moodle]. Using Moodle alone, the instructor is limited to shallow means of evaluating student participation, such as the number of posts per thread and possibly the apparent size of messages. The instructor would have to manually monitor the content of each interaction to measure the extent of individual participation, which is hard, time consuming and even unrealistic in large classes or high-volume forums, where different participants can be assigned to moderate different discussions and threads.

To assess participation, we built and analyzed two kinds of networks from this information: the social network of students and the network of the terms used by them. The instructor of the course noted the usefulness of the results of these analyses in evaluating the participation of students in the course. As in [Sundararajan 2010], where the authors noted that using SNA it was easy to identify the workers and the lurkers in the class, in this case study the instructor reported that using Meerkat-ED it was easy to get an overview of overall participation and possible to identify influential students in each thread as well as quiet students or unvoiced opinions, something that would have been impossible with the simple statistics provided by Moodle.
More importantly, by focusing on the relationships in the graph, one can identify the real conduits of information rather than simply basing the assessment of participation on message size or frequency of submissions. Learners who are placed centrally in the network act as conduits for the information; they control and can cause more knowledge exchange, which is desirable in an online class. Regardless of the frequency of messages, their size or content, if the messages do not have influence, their authors remain marginal and sit on the periphery of the network (see Figure 4). This role of conduit of information versus marginal student can change during the course of the semester or from one discussion thread to another. The systematic analysis of the centrality of participants per discussed topic provided by Meerkat-ED allowed a better assessment of the participation of learners at the level of each discussion topic.

4.1 Interpreting Students Interaction Network.

As explained before, we first have to extract the student network from the discussion threads. Figure 2 shows the visualized network of students in the course. The size of a node corresponds to its degree centrality in the network, that is, the number of incident edges. This means that the bigger a node is, the more messages the student represented by that node sent and received. The thickness of an edge in the network represents the weight of the interaction, which is based on the number of messages exchanged between the communicating students. Choosing an edge brings up a pop-up window that shows these messages, as illustrated in Figure 3. The next step is to analyze the leadership of the students based on their centrality in this network. The nodes' centrality is depicted by the size of the nodes in the visualized network, as illustrated in Figure 2.
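As a rough illustration of how such a student network and its degree centrality could be derived from a thread, consider the following Python sketch. It is not the Meerkat-ED code: the message fields `id`, `author` and `parent`, the function names, and the decision to ignore self-replies are assumptions made for the example, and the thread itself is toy data.

```python
from collections import Counter

def build_student_network(messages, directed=True):
    """Connect the author of each message to the author of its parent.

    messages: list of dicts with hypothetical keys 'id', 'author' and
    'parent' (id of the parent message, or None for a thread starter).
    Returns a Counter mapping (student, student) edges to message counts.
    """
    author_of = {m["id"]: m["author"] for m in messages}
    weights = Counter()
    for m in messages:
        parent = m["parent"]
        if parent is None or parent not in author_of:
            continue                      # thread starters create no edge
        src, dst = m["author"], author_of[parent]
        if src == dst:
            continue                      # self-replies ignored (assumption)
        key = (src, dst) if directed else tuple(sorted((src, dst)))
        weights[key] += 1                 # edge weight = messages exchanged
    return weights

def degree_centrality(weights):
    """Degree of each student: the number of incident edges."""
    degree = Counter()
    for a, b in weights:
        degree[a] += 1
        degree[b] += 1
    return degree

# Hypothetical toy thread.
toy_thread = [
    {"id": 1, "author": "Instructor", "parent": None},
    {"id": 2, "author": "Chloe", "parent": 1},
    {"id": 3, "author": "Eric", "parent": 2},
    {"id": 4, "author": "Chloe", "parent": 3},
    {"id": 5, "author": "James", "parent": 1},
]
undirected_net = build_student_network(toy_thread, directed=False)
undirected_degree = degree_centrality(undirected_net)
directed_degree = degree_centrality(build_student_network(toy_thread))
print(undirected_net, undirected_degree)
```

In the undirected version the two messages exchanged between Chloe and Eric collapse into a single edge of weight 2, which is what the edge thickness in the visualization would reflect.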
Moreover, students can be ranked more explicitly in a concentric centrality graph, in which the more central/powerful a node is, the closer it is to the center, as presented in Figure 4.

4.2 Interpreting Term Network.

For this specific course, we extracted the term network from the discussion forum. Figure 5 presents the visualization of this term network, where the size of a node represents the frequency of its corresponding term and the thickness of an edge represents the weight of the co-occurrence (i.e. the number of sentences in which the incident terms occurred together). Selecting an edge shows the messages in which the incident terms co-occurred, as illustrated in Figure 6. In this visualization, the instructor sees a list of the discussion threads in the course; selecting any set of those discussions/messages brings up the corresponding term network, along with the list of terms occurring in them and the list of students who participated in the selected discussions/messages. Selecting any of these terms shows the students who used that term. Likewise, selecting any of the students outlines the terms used by that student, as illustrated in Figure 5, which highlights the terms discussed by the student named Chloe.

Fig. 2: Visualized Student Network: The left panel lists the students in the course. The right panel shows the social network of interactions between students in the course. The size of a node corresponds to its centrality/leadership in the discussions. The width of an edge represents the weight of communication between its incident nodes.

Fig. 3: Visualization of messages in an interaction: the interaction window shows the messages passed between the nodes incident to the selected edge: Chloe and Eric. Selecting a message from the left panel shows its title, sender, receiver and content.

Fig.
4: Comparing centrality of students: the students closer to the center are more central in the student network, i.e., have participated more in the discussions of the course. Likewise, the further a student is from the center, the less active the student was; here James is the least active student in the discussions and is placed on the outer circle.

Fig. 5: Visualized Term Network: The left panel lists the discussion threads in the course. The middle panel shows the network of terms in the selected set of discussions. The upper right panel shows the list of students who participated in the selected discussions, along with some statistics about their participation, such as the number of posts, replies, etc. The bottom right panel shows the terms used in these discussions. Selecting a student outlines the terms used by that student.

The difference between the numbers of terms discussed by the students can help the instructor compare the students' participation: students who discuss more terms also participate more. To further analyze the term network, as explained before, we group the terms that mostly co-occur. Figure 7a shows the detected topics (term communities) in the network given in Figure 5. The green nodes are the representative nodes of the communities. Each representative node contains the 10 most central terms of the community it represents. The size of a representative node corresponds to the number of terms in its community, while the size of a leaf node (a term) is related to its frequency, as in the term network. As in the term network, one can also select a set of terms here, usually within a topic, to see who participated in a discussion on that topic and to what extent, as illustrated in Figure 7b.

Fig.
6: Co-occurrence of terms: selecting a co-occurrence edge brings up a pop-up window that shows the messages in which the incident terms co-occurred, highlighting the corresponding terms in the content.

Fig. 7: Term communities (Topics): The gray circles outline the community boundaries and the green nodes represent the community representatives. Each community representative is accompanied by the top 10 phrases in its community, which can be seen in the tooltip in the figure. Selecting a topic outlines the students who participated in a discussion on that topic, and the terms in that topic. Here, the topic is roughly about “patient, disclosure, confidentiality and society”. Moreover, the students who participated in this topic and their contributions can be seen in the upper right panel.

5. CONCLUSIONS.

In this paper, we elaborated on the importance of social network analysis for mining structural data and its applicability in the domain of education. We introduced social network analysis and community mining for studying the structure in relational data. We illustrated the place of, and need for, social network analysis in the study of the interaction of users in e-learning environments, and then summarized some recent studies in this area. We also proposed Meerkat-ED, a specific and practical toolbox for analyzing student interactions in asynchronous discussion forums. Our toolbox prepares and visualizes overall snapshots of participants in the discussion forums, their interactions, and the leader/peripheral students. Moreover, it creates a hierarchical summarization of the discussed topics, which gives the instructor a quick view of what is under discussion. It further illustrates individual student participation in these topics, measured by their centrality in the discussions on that topic, their number of posts and replies, and the portion of terms used by them.
We believe exploiting the mining abilities of this toolbox would facilitate fair evaluation of students’ participation in online courses."
