Social and Semantic Network Analysis of Chat Logs

InProceedings

Proceedings of 1st Learning Analytics and Knowledge (LAK2011), Feb 28 - Mar 1, 2011

2011 2011

Multi-user virtual environments (MUVEs) allow many users to explore the environment and interact with other users as they learn new content and share their knowledge with others. The semi-synchronous communicative interaction within these learning environments is typically text-based Internet relay chat (IRC). IRC data is stored in the form of chatlogs and can generate a large volume of data, posing a difficulty for researchers looking to evaluate learning in the interaction by analyzing and interpreting the patterns of communication structure and related content. This paper describes procedures for the measurement and visualization of chat-based communicative interaction in MUVEs. Methods are offered for structural analysis via social networks, and content analysis via semantic networks. Measuring and visualizing social and semantic networks allows for a window into the structure of learning communities, and also provides for a large cache of analytics to explore individual learning outcomes and group interaction in any virtual interaction. A case study on a learning based MUVE, SRIâ€™s Tapped-In community, is used to elaborate analytic methods.

"1 Introduction. Multi-user virtual environments (MUVEs) are used for many different purposes in a number of contexts, but the interaction within these environments can often lead to learning outcomes and resource sharing, and there is an increase in their use for learning communities. Communicative interaction within these environments is commonly conducted via Internet relay chat (IRC). Text boxes displaying IRC has been a successful tool for allowing for communicative interaction. However, IRC poses a difficulty for researchers seeking to analyze and interpret the communicative interaction since data is stored in the form of chatlogs, which can produce large volumes of text data. This paper discusses and applies procedures for the representation and analysis of chat interaction in learning based MUVEs, or learning taking place in any type of MUVE, as social and semantic networks. A description of the social and semantic network approach to human communication is presented followed by a review of parallel methodological techniques. Elaboration of methods presented in this research is covered along with sample outputs from a case study on SRIâ€™s Tapped-In (tappedin.org) community [1]. Finally, applications and future research possibilities are offered. This work was supported by NSF Award #0943147. The views expressed herein do not necessarily represent the views of NSF. 2 Concepts. 2.1 Communicative Interaction in MUVE. Although MUVEs have a wide array of uses, communicative interaction within the environment is often conducted through Internet Relay Chat (IRC). IRC is conducted in a semi-synchronous way, where comments posted appear almost instantly for other users to view and respond to. IRC is a much more real time mode of computermediated communication than listserv messages, bulletin boards, and email. Much like instant messaging (IM), IRC allows users to select a username that appears before each comment they post, allowing multiple users to comment and maintain conversational interaction. IRC interaction is conducted within a chat-box that displays all usersâ€™ comments along with their username in a log file. In addition to IRC interaction being semisynchronous, it is also persistent. The persistence of these interactions allow for the storage of all data as chatlogs, which can in turn be used for analyses of the usersâ€™ interaction. However, the nature of chatlogs as a dynamic, non-threaded interaction introduces some methodological hurdles regarding analysis. Posts to IRC conversations are generally quite short, usually a few words to several lines allowing the IRC interaction to allow for multi-participant semi-synchronous interaction, similar to face-to-face (FtF) conversation in the sense that many people can interact in the same communicative space [2]. However, IRC interaction departs from FtF in how interactional coherence is achieved, and users adapt to CMC interaction in interesting ways [3]. One of the principle differences is adjacency relevance, where an utterance in FtF interaction is generally relevant to the one before it. In IRC, this constraint is loosened: an utterance may be relevant to the one that appeared several lines before it. Herring [3] showed that users invent devices to manage interactional coherence in spite of the fact that IRC contributions can arrive out of sequence, violating conventions from in face-to-face conversation concerning the sequential organization of contributions. Since we cannot assume that adjacent contributions are relevant to each other, as they may be relevant to contributions a little while back, we need an estimate of this relevance. Estimates of who was talking to who are generated from the concept of chat proximity: these people are co-present, and contributions may be relevant to any of these things said recently. The constraint that contribution C_i is relevant to C_(i-1) is loosened to a temporal window. The algorithm introduced in the Methods section below is motivated by this unique interaction, and uses a temporal window to capture such non-adjacent utterances. Smith, Farnham, & Drucker [4] investigated the social life of small graphical chat spaces by analyzing Microsoftâ€™s V-Chat systems. The VChat research illustrates the usage patterns of graphical chat systems, illuminating the ways physical proxemics are translated into social interactions in online environments. Krikorian, Lee, Chock, and Harms [5] developed methods to study user proximity in graphical chat rooms, and found that various perceived demographics influenced the social â€œdistanceâ€ of avatars in the graphical chat environment. In addition to the spatial analysis, there have also been methodological advancements regarding the communicative content of virtual environments. Sack [6] generated conversation maps of newsgroup postings and described very large conversations by visualizing large amounts of interaction in newsgroups. Suthers, Dwyer, Medina, and Vatrapu [7] developed a framework for representing and analyzing distributed interaction within multi-user virtual environments, including some structural representation of interaction in sequential records of events. Rosen, Woelfel, Barnett, and Krikorian [8] explicated a methodology for semantic network analyses of IRC interaction in virtual worlds. Rosen & Corbit [9] developed network analytic techniques for the measurement and representation of the structure of networks from IRC interaction. Understanding the structure and content of the interaction provides an in-depth and unique window into MUVEs along several lines. First, network position can be used to identify network roles, similar to Turner et al. [10], identifying roles such as answer person and question person. Second, network analytic techniques can be employed on the subsequent data. Network visualizations can be generated allowing for visual and representational analyses, elements that have traditionally important to community research [11]. Finally, network analytics and representations can be used in cohort with semantic network analysis for a more complete understanding of the learning environments and interactions. 2.1 Social Networks. Social network perspectives focus on the structure of social systems. Individual characteristics are only part of the story: people influence each other, and ideas and materials flow throughout the network [13]. From the network perspective, the social environment can be expressed as patterns or regularities in relationships among interacting units. This section elaborates some of the network concepts and terminology used in the subsequent methods for the analysis of MUVEs. The form of network that will be utilized herein is a communication network. Communication networks are generally defined as the patterns of contact that are created by the flow of messages among communicators through time and space [see 2]. However, these flows are not clear in IRC interaction from an adjacency approach, and the algorithmic solution presented in this paper is way of deal with this problem. Communication network analysis identifies the communication structure, or communication flow. Relation ties (linkages) between actors are channels for the transfer (flow) of either material or nonmaterial resources, or for an association between actors. The ties that exist between the nodes can vary along several elements, including direction, reciprocity, and strength. Ties between actors can be measured as being either directional, or non directional. Ties that are directional indicate the movement from one point to another, such as the number of phone calls one person makes to another, or the degree of liking one person has for another. Additionally, these links can also be symmetrical or asymmetrical. If the link is directional and the relation has different values in each direction then the link is asymmetrical and lacks reciprocity. Non-directional links simply indicate an association of two actors in a shared partnership, such as two students being part of the same class. There are many measures of centrality for individual nodes, as well as how connected the entire network is; select measures are discussed below. Degree Centrality. The degree measure of centrality is calculated by counting the number of adjacent links to or from an actor in a network [12]. Freeman [14] conceptualized this measure as an indicator of individual activity, yet it does not capture system-wide properties of the network like density and centralization, discussed below. It does, however, represent the number of alternatives available to an individual in the network. While a relatively straightforward measure, degree centrality provides insight into individual contributions to the interconnectedness of the overall network [14]. Betweenness Centrality. Betweenness centrality measures the relative brokerage of an individual node i by indicating the number of nodes j that need to go through i to get to other nodes k that could otherwise not be reached. Betweenness centrality is calculated by the proportion of all geodesics linking j & k that pass through i, Î£ for all nodes. Density. Density is used to measure the completeness of the relations in a network, also called connectedness. Measured as the ratio of total links to possible links, density can identify networks as being sparse (relatively disconnected) or dense (relatively well connected). Centralization. Centralization measures the disparity, or variation of the individualsâ€™ centrality (which can be betweeness or degree centrality) in a given network. The higher a networks degree centralization is the more likely it is that few individuals are well connected while others are less connected. Conversely, the more decentralized a network is, the more equal the membersâ€™ centrality scores are. 2.2 Semantic Networks. In semantic network analysis, a specific text is analyzed to generate a measure of the degree to which words are associated. The association is then used to infer something about their meaning or the meaning of the context they were used in. One of the more common approaches is to generate the amount of co-occurrence between word-pairs within a particular set of text. Then, the co-occurrence measure of relatedness across a particular set of words can be used to group, cluster, or scale the words (or a specific subset, such as frequently occurring words. The groups or clusters can be used for analysis, or used to obtain additional measures for use in other analyses, or bases for formal content analysis [15, 16]. 3 Method. 3.1 Social Network Analysis. The structure of the communicative interaction within a MUVE may be examined through network analysis. Network analysis is a set of research procedures for identifying structures in social systems based on the relations among the systemâ€™s components, and is the methodology used to operationalize the network approach to interaction, discussed above. The basic network data set is an n x n matrix S, where n equals the number of nodes in the analysis. A node is the unit of analysis; in the current research a MUVE participant will be considered a node. Each cell, Sij, indicates the strength of the relationship, which would typically represent the amount of communication from person i to person j. Since there is no inherent direct communicative relationship between individuals in IRC interaction, the relationship used herein assigns relational strength by capturing temporal proximity of contributions in IRC. Relationships in networks are analyzed as directional when possible, and in the current study direction is established based on the ordering of contributions within a temporal window in IRC. This method provides the directional differences between all analyzed parties, representing the communication matrix. Network mapping procedures are used to generate sociogram maps that visually represent the networks created using above procedures. These will allow for the visual analysis of other network data, as well as elaborate cliqueâ€™s and network roles that can remain cloaked when only analyzing numerical outputs. MUVE Communication Matrix Formation. To generate the n x n matrix used in the analysis of MUVE interaction, a process was developed that extracts the strength of the relationship between each cell, Sij. Since IRC is logged temporally based on the sequential comments of participants, methods can be used to generate relational strength based on proximity in the interaction. The algorithm includes several parameters to generate relational data from IRC interaction. Using the time stamp that accompanies all posts, a temporal parameter was used to help insure that a user is not considered connected to all users that posted after their post. This parameter can be set for use based on the context, as some interactions are faster moving than others; the current study used a limit of 120 seconds before a users connection was reset. See Table 1 for Pseudocode of the algorithm used. The algorithm is O(n), where n is the number of records in the chat dataset. Each record contains timestamp, userid, and contribution. The algorithm assumes that chat records in the dataset are sorted by timestamps. The algorithm was implemented in the Java programming language. The window size parameter was set to 120 seconds. Table. 1. Pseudocode of the algorithm used for communication matrix formation 3.2 Semantic Network Analysis. The method used in this study adapts and implement neural-based content analysis software to observe Internet communication patterns in chat rooms [8]. This implementation uses Catpacâ„¢ [17], a developed and proven semantic network analysis package that has the capability to extract word patterns and clusters. Clusters are extracted by sliding a text-window through the text and associating each word in the window with a neuron in an artificial neural network. Using a proprietary variation of an interactive activation and competition algorithm, connection strengths or weights are generated as a function of the coactivation patterns among the neurons. These weights in turn serve as the basis of cluster analysis and Galileo mappingâ„¢ [18]. Catpac has been used for the study of traditional text [20], such as articles and long response questionnaires. It has been successful in revealing clusters of associated words in text that provided helpful quantitative data to support qualitative interpretations. One of the most important aspects of the method used in the procedure discussed in this paper is the ability to analyze data based on set parameters. For this, an algorithm has been developed that parses chat data into separate and interrelated files used to determine individual, group, and systematic organizational patterns over time. This becomes useful when combined with a qualitative analysis where the researcher has an ethnographic understanding of the community members, whereas there is a ""name file"" that allows for directed analysis and the labeling of contributions. For example, if the learning community were associated with a large undergraduate class, the teacher would have the ability to observe semantic clusters extracted from any designated groupsâ€™ communication (e.g. freshmen, non-majors, etc.). If the analysis was on a mentor-based learning community one could observe the difference between communication originating from mentors/teachers as compared to student users. Other uses bridge to industry, where virtual task groups' general learning interactions could be parsed, revealing both potentially positive and negative trends in the interaction. 4 Outputs. Outputs below were generated from SRIâ€™s Tapped In [1], a virtual organization that hosts the content and activities of many thousands of education professionals annually in more than 8,000 user-created spaces that include IRC, threaded discussions, shared files and URLs, and other tools to support collaborative work. Education agencies and institutions of higher education use Tapped In to meet the needs of their students and faculty. Also, approximately 40-60 community-wide activities per month are explicitly designed to help connect members, and groups are often formed after members meet in these activities. Analytic outputs below were generated from a single chat session of a Tapped In user group that deals with the use of wikis in the classroom. The session was 1 hour long with 62 participants. 4.1 Social network analytics and visualization. The analytics employed in the current methodological explication are Degree Centrality, Betweenness Centrality, Density, and Centralization. The degree centrality for the users in the Tapped-In group can be found in Table 2, and the visualization of the network can be found in Figure 1. Network centralization is 6.372% (Outdegree) and 4.104% (Indegree), indicating a decentralized network. Network density is 46.20, indicating a fairly dense network. The centrality measures presented in Table 2 offer interesting insights into the user interaction within the chat. First, the degree centrality indicates the number of incoming and outgoing connections via chat posts. The values have been normalized relative to the number of users. The in-degree measures indicate the number of message posts within the time window that were pointed back to that user, and the out-degree indicates the number of messages that user posted pointing back to other users within the window. A few users were the most active, with some slight differences between their in- and out-degree values. However, since the centralization measure was very low, the distribution of interaction was indeed spread through the network, without a core group of users that were substantially outweighing a periphery. The betweenness centrality scores indicate that there are indeed several users that have very high scores, and thus act as bridges of information in the network. These users connect other individuals that could not otherwise reach many of the people in the network. This bridging role can be seen in the visualization of the network in Figure 1. It is clear that user 7, 44, 37, and 27, who have the highest betweenness centrality, are structurally positioned in the network between many other users that would otherwise be disconnected. The sociogram also reveals several cliques (i.e. clusters) with in the network, as well as a few people that are not very well connected, often connected only to one other user, such as 14 and 56. Table 2. Degree and Betweenness centrality rankings for users. Degree centralites below 1.0 and betweenness centralities below 0.5 have been removed from table. User IDâ€™s correspond to sociogram in Figure 1, and have been anonymized from original user logon names Fig. 1. Sociogram of chat-based user interaction generated using algorithm in Table 1. Thickness of lines indicated tie strength and arrows indicate direction of flow. Using network analytic measures, such as centrality and density, provide for structural analyses of IRC interaction based on chat proximity. These measures and visualizations can thus be used to decipher effects along different levels of granularity; at the micro level, user roles, such as bridging or leader roles, can be identified, at the meso level one can identify group formation through clique detection [see 21 for elaboration], and at the macro level overall network centralization and density can be calculated. These measures, however, represent only a small subset of possible analytic approached afforded by network analysis, and measures should be chosen that help explain the phenomena being explored. 4.2 Semantic network visualization. Using the neural network engine in the CATPAC package allows for the semantic analysis of parsed content. The plot in Figure 2, produced using ThoughtView [19], contains a multi-dimensional scaling representation of the top 40 words from the entire 1-hour of IRC from the group analyzed. Usernames were automatically stripped from the data for the analysis of the complete interaction. There are some contextual issues regarding automated textual analysis, like the occurrence of errors in user typing, abbreviations, and icons. Additional outputs available, but not included here, include dendograms, frequency lists, and two-dimensional plots [19].Additional parsing of content into individual text files is also possible with the algorithm, allowing for analysis of specific user content based on other parameters such as demographics, individual network metrics (e.g. centrality), learning outcomes, etc. Extracting semantic clusters from user activity in IRC can allow for further exploration of contributions from specific users identified in the social network analysis as relevant to research agendas. For example, there may be interest in analyzing the content of contributions from very central participants, or from participants that became more central over time (if longitudinal data is used). The individualized outputs are not included in this paper due to space limitations. Fig. 2. Multi-dimensional plot of word clusters. 5 Conclusion. This paper presented two approaches for the analysis of learning communities using IRC, and learning interaction in IRC. A social network approach for structural analysis is paired with a semantic network approach for content analysis. An algorithm was introduced for the formation of network matrices from IRC interaction. Similar versions of the social and semantic approaches discussed above have been introduced separately in earlier papers [8, 9], but the algorithm used for the network analysis is introduced here for the first time, as well as implications of combining the two procedures. One of the shortcomings of using the proposed algorithm to create social network ties in IRC from a temporal approach is that ties are an abstraction from chat interaction, rather than the traditional bilateral connections between actors. Unfortunately, some online learning environments offer little other evidence of social connectivity, and the chat proximity analysis offers a window into the social structure of chat interaction. One of the strengths of the technique is that latent or informal networks can be discovered from the interaction that may have otherwise been cloaked from analysis. Evidence of social network ties in learning communities can exist (such as direct messaging), but these connections represent intentional ties where users are choosing to be connected to each other. There is potential utility in uncovering informal network connections that may represent bridging between otherwise disconnected social groups, as well as pivotal moments where an idea or discussion has migrated across community boundaries. Communities can exist informally, and future research should employ clustering analysis and clique detection to enable automatic community detection."

Acerca de este recurso...

Visitas 268

Guardar en Mi espacio personal
Enviar enlace

Categorías:

Learning Analytics and Knowledge (LAK)

Etiquetas:

0 comentarios

¿Quieres comentar? Regístrate o inicia sesión

¿Cómo puedes configurar o deshabilitar tus cookies?

Social and Semantic Network Analysis of Chat Logs

InProceedings