Dataset-driven Research for Improving Recommender Systems for Learning

In the world of recommender systems, it is a common practice to use publicly available datasets from different application environments (e.g. MovieLens, Book-Crossing, or EachMovie) in order to evaluate recommendation algorithms. These datasets are used as benchmarks to develop new recommendation algorithms and to compare them to other algorithms in given settings. In this paper, we explore datasets that capture learner interactions with tools and resources. We use the datasets to evaluate and compare the performance of different recommendation algorithms for Technology Enhanced Learning (TEL). We present an experimental comparison of the accuracy of several collaborative filtering algorithms applied to these TEL datasets and elaborate on implicit relevance data, such as downloads and tags, that can be used to augment explicit relevance evidence in order to improve the performance of recommendation algorithms.

1. First, we present an analysis of datasets that have been (or will soon be) made publicly available and that capture learner interactions with tools and resources in TEL settings. These datasets can be used for a wide variety of research on learning analytics.
2. Second, the paper presents an experimental comparison of the accuracy of several collaborative filtering algorithms applied to TEL datasets.
3. Third, we research the extent to which implicit feedback of learners, such as reading information, downloads and tags, can be used to augment explicit relevance evidence in order to improve the performance of recommender systems for TEL.

The paper is organized as follows: Section 2 presents an analysis of datasets that capture learner interactions and that can be used for learning analytics. Section 3 presents an overview of existing recommendation algorithms, in particular collaborative filtering algorithms, that can be applied to these datasets to suggest relevant resources to learners or teachers. Section 4 presents an overview of evaluation metrics that are commonly used to evaluate recommendation algorithms. Then, we present our evaluation results of the application of these algorithms to TEL datasets. We evaluate algorithms based on both explicit rating data and implicit relevance data, such as tags and downloads, that are available in some datasets. Results and opportunities for future research in this area are discussed in Section 6. Conclusions are drawn in Section 7.

(6) http://adenu.ia.uned.es/workshops/recsystel2010/datatel.htm

2 DataTEL Challenge.

In this section, we present the objectives and results of the first dataTEL Challenge, which was targeted at collecting TEL datasets. These datasets capture user interactions with tools and resources in learning settings and can be used for various purposes in the learning analytics research area. In this paper, we focus on the application of these datasets to validate recommendation algorithms and to tackle challenges in supporting recommendation for learning.

2.1 Objectives.

In the world of recommender systems, it is a common practice to use publicly available datasets from different application environments (e.g. MovieLens, Book-Crossing, or EachMovie) in order to evaluate recommendation algorithms. These datasets are used as benchmarks to develop new recommendation algorithms and to compare them to other algorithms in given settings [7]. In such datasets, a representation of implicit or explicit feedback from the users regarding the candidate items is stored, in order to allow the recommender system to produce a recommendation. This feedback can take several forms. For example, in the case of collaborative filtering systems, it can be ratings or votes (e.g. whether an item has been viewed or bookmarked). In the case of content-based recommenders, it can be product reviews or simple tags (keywords) that users provide for items. Additional information is also required, such as a unique way to identify who provides this feedback (user ID) and upon which item (item ID). The user-rating matrix used in collaborative filtering is a well-known example. Although recommender systems are increasingly applied in TEL, it is still an application area that lacks such publicly available and interoperable datasets.
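To make this structure concrete, the following minimal sketch (not taken from the paper; the file layout, field names and function name are illustrative assumptions) shows how user-item-rating triples could be loaded into the kind of user-rating matrix that the collaborative filtering algorithms discussed in Section 3 operate on:

```python
import csv
from collections import defaultdict

def load_ratings(path):
    """Load (user_id, item_id, rating) triples into a user-rating matrix.

    The assumed CSV layout (one user_id, item_id, rating triple per row) is a
    hypothetical example; the dataTEL datasets each ship in their own formats.
    """
    ratings = defaultdict(dict)              # user_id -> {item_id: rating}
    with open(path, newline="") as f:
        for user_id, item_id, rating in csv.reader(f):
            ratings[user_id][item_id] = float(rating)
    return ratings

# Example: ratings["u42"] might look like {"resource-17": 4.0, "resource-90": 2.0}
```

Implicit feedback, such as views, downloads or bookmarks, can be represented in the same structure with binary values (1 if the event occurred, absent otherwise), which is how the binary experiments in Section 5.2 can be framed.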
Although there is a lot of research on recommender systems in TEL, the field lacks datasets that would allow the experimental evaluation of the performance of different recommendation algorithms using comparable, interoperable, and reusable data. This leads to awkward experimentation and testing, such as using movie datasets to evaluate educational recommendation algorithms, a practice that lacks the necessary validity for validating recommendation algorithms for TEL [18]. To this end, the dataTEL Theme Team of the STELLAR Network of Excellence (7) launched the first dataTEL Challenge, which invited research groups to submit existing datasets from TEL applications that can be used as input for TEL recommender systems. A special dataTEL Cafe event took place during the RecSysTEL 2010 workshop in Barcelona to discuss the submitted datasets and to facilitate dataset sharing in the TEL community.

(7) http://www.teleurope.eu/pg/groups/9405/datatel/

2.2 Collected Datasets.

Seven datasets have been collected as a result of the first dataTEL Challenge. In this paper, we use datasets that include usage-related data (such as ratings, tags, reads or downloads) as a basis to demonstrate and evaluate social recommendation for learning. We present an overview of datasets that include such usage data, with information on the data elements that are available and basic statistics on the number of resources, users and activities that are stored. Some of these datasets are already publicly available, whereas others are still under preparation and not yet publicly accessible. An up-to-date overview of datasets is available at http://www.teleurope.eu/pg/pages/view/50630/. We expect an increasing number of learning-related datasets in the upcoming year.

Mendeley dataset. The first dataset was submitted by Mendeley [13] and includes usage data of papers that are available through the Mendeley scientific portal (8). Mendeley is a research platform that helps users to organize research papers and collaborate with colleagues. In the context of learning, such a dataset provides useful data for recommender systems that are targeted at recommending papers to learners or teachers, or at suggesting suitable peer learners on the basis of common research or learning interests. Examples of paper recommenders that have been evaluated in TEL settings are InLinx (Intelligent Links) [2] and Papyres [26], as well as the pioneering work on the application of recommender systems in TEL conducted by Tang and McCalla [31]. Although research on paper recommenders has been elaborated more extensively in the Research 2.0 domain that emerged in recent years, the dataset is currently one of the few available datasets that capture a very large set of user activities. This dataset can be used meaningfully for research on TEL recommender systems in contexts where papers are considered as learning resources. Five files are included in the Mendeley dataset, capturing data since 2009:
– Online catalog. The online catalog file contains metadata for 1.857.912 articles. Articles have a title, year, number of readers and abstract.
– Online article view log. The online article view set includes a random sample of 200.000 users extracted from usage logs. The time at which each view occurred is provided.
– Library readership. The library readership set includes 41.220 user libraries that contain more than 20 articles. Of the 13.313.548 library entries, 2.655.578 (19.95%) have been read by users.
– Library stars. The library stars set provides data on articles that have been starred by users. 186.976 (1.40%) of the 13.313.548 library entries have been starred.
– Article tags. This collection contains 254.681 tags that were applied to 27.652 articles by 4.099 users.

(8) http://www.mendeley.com/

Among others, this dataset is useful for research on (1) the extraction of user interests, on the basis of articles that have been tagged, starred, read or added to libraries by users, and the evolution of these interests on the basis of time recordings, (2) the identification of users who share common interests, on the basis of their usage behavior, and (3) the identification of implicit quality/relevance indications for individual articles by analyzing their usage data.

APOSDLE-DS dataset. The APOSDLE-DS dataset [1] originates from the APOSDLE (9) project, which ran from March 2006 to February 2010. APOSDLE is an adaptive work-integrated learning system that aims to support learning within everyday work tasks. It recommends resources (documents, videos, links) and colleagues who can help a user with a task. The dataset captures 1500 user activities of 6 users during an evaluation period of 3 months. The activities captured are perform task, view resource, edit annotation, perform topic, selected learning goal, adapting experience level, adding resource to collection, being contacted, contacting person, browse data and creating new learning path. The dataset also includes 163 descriptions of documents and document fragments on which these activities were performed. From the collected data, the adding resource to collection action can provide direct information about the relevance of a resource. This action occurred 581 times within the evaluation period. Creating a new learning path is considered an attempt to plan learning activities over a longer time period and can provide a solid basis for research on the recommendation of sequences of resources. Unfortunately, this action occurred only a few times (< 25). Direct collaboration activities are also rare: being contacted occurred 11 times and contacting person 69 times. Implicit data to cluster users who share similar interests or goals are available more extensively (149 perform task, 861 perform topic and 414 select learning goal activities). Whereas the current collection contains data of only a few users and may be too small for statistical analysis, the dataset provides a good example of relevant learning activities to be captured in learning settings.

ReMashed dataset. The ReMashed dataset [10] was collected within the ReMashed environment (10), which focuses on community knowledge sharing. The main objective of ReMashed is to offer personalized recommendations from the emerging information space of a community. The ReMashed dataset is based on aggregating contributions of the users in the ReMashed portal [9].

(9) http://www.aposdle.tugraz.at/
(10) http://remashed.ou.nl

This portal aggregates Web 2.0 contributions from a range of remote services (delicious, Youtube, Flickr, Slideshare, blogs, and twitter) of the users. The data collection started in February 2009 and is still ongoing. It includes information about interests (learning goals), bookmarks, tags, ratings and contents. To date, 140 users have registered. In total, 23.000 tags and 264 ratings have been given to 96.000 items. The ReMashed dataset includes only publicly available contributions from users.
Although the data is publicly available, the dataset has not yet been prepared for public access, as it requires anonymization and the commitment of the users.

Organic.Edunet dataset. The Organic.Edunet dataset [21] was collected on the Organic.Edunet Web portal (11), a learning portal for organic agriculture educators that provides access to more than 10.500 learning resources from a federation of 11 institutional repositories. The portal mostly focuses on serving school teachers and university tutors and has attracted almost 12.000 unique visitors from more than 120 countries, of which about 1.000 are registered users. This dataset contains data from the initial operational phase of the portal, which took place in the context of the EC-funded Organic.Edunet project (12). The dataset was collected from January 2010 until September 2010 and includes information about 345 tags, 250 ratings and 325 textual reviews that these users have provided. The particularity of this dataset is that ratings are collected along three different dimensions/criteria: the usefulness of a resource as a learning tool, its relevance to the organic agriculture theme, and the quality of its metadata. This allows for the deployment of an elaborate multi-criteria recommendation service within the portal.

MACE dataset. The MACE dataset [36] originates from the MACE (13) project, which ran from September 2006 to September 2009. The MACE portal (14) provides advanced graphical metadata-based access to learning resources in architecture that are stored in different repositories all over Europe. MACE thus enables architecture students to search through and find learning resources that are appropriate for their context. From 2007 until now, 1.148 users have registered at the portal. The portal offers access to about 150.000 learning resources, of which 12.000 have been accessed by registered users. Together, these objects hold about 47.000 tags, 12.000 classification terms and 19.000 competency values. Tags were assigned by logged-in users, and the classification and competency terms by domain experts. Most user actions on the MACE portal were logged, including search activities (using faceted search, social tags, geographical locations, classifications and/or competencies), access to learning resources, downloads of resources, social tagging (including add tag, add comment and add rating), and access to user pages.

(11) http://www.organic-edunet.eu
(12) http://project.organic-edunet.eu
(13) http://www.mace-project.eu/
(14) http://portal.mace-project.eu/

The time of each user activity is recorded. The dataset provides useful and rich data for various research purposes. In addition to explicit rating feedback, access times, downloads, tags and comments can provide useful implicit indications that can be used to gain knowledge about user interests. The availability of a relatively large set of both explicit and implicit relevance data makes this dataset a potentially useful candidate for recommender research.

Travel well dataset. The Travel well dataset [35] was collected on the Learning Resource Exchange portal (15), which makes open educational resources available from 20 content providers in Europe and elsewhere. Most registered users are primary and secondary teachers who come from a variety of European countries. The dataset contains data from the pilot phase that was conducted during the EC-funded MELT project (16). These data were collected from August 2008 until February 2009 from 98 users.
The dataset includes explicit interest indicators that can be used to infer the relevance of a resource for the user. Users can rate resources on a scale of 1 to 5 for usefulness and add tags to resources. In total, 16.353 user activities were recorded on 1.923 resources. The particularity of the dataset is that it contains information on the home country, mother tongue and spoken languages of users. Additionally, it has metadata on the origin of each educational resource and its language. The dataset thus allows tracking the interest of users in resources that "travel well", i.e. cases where the user and the resource come from different countries and the language of the resource differs from the user's mother tongue [34]. Additionally, this dataset is useful for research on the extraction of teacher interests and the identification of teachers who share common interests, on the basis of their tags and ratings. The availability of a relatively large set of such explicit relevance indicators makes this dataset a potentially useful candidate for recommender research in TEL.

2.3 Summary.

Table 1 summarizes the details of the collected datasets, including information on the number of users, items and activities that are captured and details on the data elements that are provided. The MACE, Organic.Edunet and Mendeley datasets are the largest, with user data of 1.148, 1.000 and 200.000 users, respectively. The Travel well and ReMashed datasets contain ratings and tags of 98 and 140 users, respectively. The current sample of APOSDLE captures data of relatively few users. Of interest in this discussion are the data elements that are provided by the datasets. Explicit relevance feedback, such as ratings by users, is provided in the MACE, ReMashed, Organic.Edunet and Travel well datasets. These datasets provide ratings on a five-point Likert scale and are interesting datasets for evaluating recommender algorithms. Mendeley provides information on articles that are starred by a user (1 if the article has been starred and 0 otherwise), but the semantics of such stars in user libraries may differ between users (i.e. a star can indicate relevance feedback, but may also indicate that the user wants to read the article at a later stage). Therefore, the application of such data for recommendation is less straightforward.

(15) http://lreforschools.eun.org
(16) http://info.meltproject.eu/

Table 1. Overview of the datasets.

In addition to ratings/stars, most datasets include additional user interactions, such as tags, downloads or the inclusion of a resource in a user library. In Section 5.2, we research the extent to which such activities can be used to improve the performance of recommendation algorithms. The APOSDLE dataset includes a wide variety of additional learner-related activities, including tasks that are performed by a user, her learning goals and learning paths that she constructed. Whereas the dataset may be too sparse to draw conclusions at this point, the capturing of such activities has great potential for building recommender systems for learning. The application of this dataset for recommendation for learning is further discussed in Section 6. At the time of writing, the Mendeley, MACE, APOSDLE and Travel well datasets are already publicly available. The Organic.Edunet and ReMashed datasets will be made publicly available soon, after clearing the remaining privacy issues.
In the remainder of this paper, we report on experimental results with the datasets that are currently available.

3 Recommender Systems.

Recommender systems apply data analysis techniques to help users find items that are likely to be relevant. Recommender algorithms are often categorized into three areas: collaborative filtering, content-based filtering and hybrid filtering. Collaborative filtering is the most widely implemented and most mature technology [4]. Collaborative recommender systems recognize commonalities between users on the basis of their ratings or implicit relevance indications and generate new recommendations based on inter-user comparisons. Content-based filtering matches content resources to user characteristics [29]. These algorithms base their predictions on individual information and ignore contributions from other users. Hybrid recommender systems combine two or more recommendation techniques to gain better performance with fewer drawbacks [4]. In this paper, we evaluate the performance of collaborative filtering (CF) on TEL datasets. Similar experiments in TEL settings have been reviewed in Manouselis et al. [20]. The basic idea of CF-based algorithms is to provide recommendations based on the opinions of other like-minded users. The opinions of users can be obtained explicitly from the users or by using implicit measures. Two approaches are distinguished for recommending relevant items to a user:
– User-based collaborative filtering computes similarities between users to find the most similar users and predicts a rating based on how similar users rated the item. In a first step, a user-based collaborative filtering algorithm searches for users who share similar rating patterns with the active user. In a second step, ratings from these similar users are used to calculate a prediction for the active user.
– Item-based collaborative filtering applies the same idea, but uses similarity between items instead of users. The approach was popularized by Amazon.com ("users who bought x also bought y"). In a first step, an item-item matrix is built that determines relationships between pairs of items. In a second step, this matrix and the data on the active user are used to make a prediction. Once the most similar items are found, the prediction is then, for instance, computed by taking a weighted average of the target user's ratings on similar items.

To enable empirical comparison of different approaches, we implemented different metrics to compute similarities between users and between items, as well as different algorithms for computing predictions, including the standard weighted sum algorithm and the simplified Slope One scheme [17]. The different approaches are presented briefly in this section. A more thorough review of various design options that can be considered for collaborative filtering algorithms can be found in [18]. We report on experimental results in Section 5.

3.1 User-based Collaborative Filtering.

User-based collaborative filtering assigns weights to users based on the similarity of their ratings to those of the target user [6]. For calculating the similarity between a target user u and another user v, different similarity metrics can be used. We first briefly present commonly used metrics. Then, we present the standard weighted sum algorithm for generating predictions based on these similarity computations.

Cosine similarity. In this case, two users are thought of as two vectors in the m-dimensional item space.
First, the set of items I_{uv} that both user u and user v have rated is selected. Then, similarity weights are calculated using the following formula:

w_{uv} = \frac{\sum_{i \in I_{uv}} r_{ui} \, r_{vi}}{\sqrt{\sum_{i \in I_{uv}} r_{ui}^2} \; \sqrt{\sum_{i \in I_{uv}} r_{vi}^2}}    (1)

where r_{ui} is the rating of user u on item i and r_{vi} is the rating of user v on item i. Basically, the cosine similarity between user u and user v is the cosine of the angle between the rating vector of user u and the rating vector of user v.

Pearson correlation. In this case, the similarity between two users u and v is measured by computing the Pearson correlation between them using the following formula:

w_{uv} = \frac{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)^2} \; \sqrt{\sum_{i \in I_{uv}} (r_{vi} - \bar{r}_v)^2}}    (2)

where \bar{r}_u and \bar{r}_v denote the average ratings of users u and v, respectively. In essence, this similarity measure takes into account how much the ratings of other users for an item deviate from their average rating value.

Tanimoto-Jaccard. The Jaccard or Tanimoto coefficient [32] measures the degree of overlap between two sets by dividing the number of items rated by both users (intersection) by the number of distinct items rated by either user (union). The similarity between two users u and v is defined as:

w_{uv} = \frac{|I_u \cap I_v|}{|I_u \cup I_v|}    (3)

where I_u and I_v represent the sets of items that have been rated by user u and user v, respectively. This similarity metric considers only the number of items that have been rated in common and ignores rating values. The metric can therefore be applied to binary datasets that do not contain rating values. In addition, studies have shown that the metric is advantageous in the case of extremely asymmetrically distributed or sparse datasets [24].

Prediction computation. After computing similarity weights, the top-K users with maximum weights are selected as experts. Suppose u is a test user and i is a corresponding test item. Let \tau_u be the set of experts who have rated i. The predicted rating \hat{r}_{ui} is computed as:

\hat{r}_{ui} = \bar{r}_u + \frac{\sum_{v \in \tau_u} w_{uv} (r_{vi} - \bar{r}_v)}{\sum_{v \in \tau_u} |w_{uv}|}    (4)

Basically, the approach tries to capture how similar users rate the item in comparison to their average ratings. If \tau_u is empty, i.e. no expert has rated the test item i, then the average rating of the user is returned as the prediction.

3.2 Item-based Collaborative Filtering.

Item-based collaborative filtering applies the same idea, but uses similarity between items instead of users. Once similar items are found, predictions are computed by taking a weighted average of the target user's ratings on these similar items. We briefly describe the similarity computation and the prediction generation. The description is based on [30].

Item similarity computation. The computation of similarities between items proceeds in a similar way to computing similarities between users in user-based CF. The basic idea in similarity computation between two items i and j is to first isolate the users who have rated both items and then to apply a similarity computation technique to determine the similarity w_{ij}. We illustrate the approach using the cosine similarity metric. Alternative similarity measures, such as the Pearson correlation and the Tanimoto/Jaccard coefficient (see previous section), are also commonly applied to calculate similarity between items. To compute the cosine similarity, we first isolate the co-rated cases (i.e., cases where the users rated both i and j). Let the set of users who rated both i and j be denoted by U; then the cosine similarity is given by:

w_{ij} = \frac{\sum_{u \in U} r_{ui} \, r_{uj}}{\sqrt{\sum_{u \in U} r_{ui}^2} \; \sqrt{\sum_{u \in U} r_{uj}^2}}    (5)

where r_{ui} is the rating of user u on item i and r_{uj} is the rating of user u on item j. Thus, this formulation views two items and their ratings as vectors, and defines the similarity between them as the cosine of the angle between these vectors.
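As a concrete illustration of these similarity measures, the following minimal sketch (plain Python with illustrative function names, not the Apache Mahout implementation used in our experiments) computes the three metrics over {item: rating} dictionaries such as those produced by the loading sketch in Section 2.1:

```python
from math import sqrt

def cosine_sim(ru, rv):
    """Cosine similarity over co-rated items, cf. Eq. (1); ru, rv: {item: rating}."""
    common = set(ru) & set(rv)
    num = sum(ru[i] * rv[i] for i in common)
    den = sqrt(sum(ru[i] ** 2 for i in common)) * sqrt(sum(rv[i] ** 2 for i in common))
    return num / den if den else 0.0

def pearson_sim(ru, rv):
    """Pearson correlation over co-rated items, cf. Eq. (2).

    Here the user means are taken over all of a user's ratings; taking the
    means over the co-rated items only is an equally common convention.
    """
    common = set(ru) & set(rv)
    if not common:
        return 0.0
    mu, mv = sum(ru.values()) / len(ru), sum(rv.values()) / len(rv)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    den = sqrt(sum((ru[i] - mu) ** 2 for i in common)) * \
          sqrt(sum((rv[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

def tanimoto_sim(ru, rv):
    """Tanimoto/Jaccard coefficient, cf. Eq. (3): rating values are ignored."""
    iu, iv = set(ru), set(rv)
    union = len(iu | iv)
    return len(iu & iv) / union if union else 0.0
```

The same three functions can be reused for item-item similarity as in Eq. (5) by passing, for each item, the dictionary of ratings it received (one entry per user) instead of a user's rating profile.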
Prediction computation. In the case of item-based predictions, a weighted sum technique computes the prediction of an item i for a user u by summing the ratings given by the user on the items similar to i. Each rating is weighted by the corresponding similarity w_{ij} between items i and j. Formally, we can denote the prediction of item i for user u as:

\hat{r}_{ui} = \frac{\sum_{j \in S_i} w_{ij} \, r_{uj}}{\sum_{j \in S_i} |w_{ij}|}    (6)

where S_i denotes the set of items similar to i that user u has rated. Basically, this approach tries to capture how the active user rates the similar items. The weighted sum is scaled by the sum of the similarity weights to make sure the prediction is within the predefined range.

Slope One scheme. The Slope One scheme [17] is an alternative scheme to compute item-based CF predictions that simplifies the implementation of standard item-based collaborative filtering algorithms. The scheme is based on a simple "popularity differential". Let the set of users who rated both i and j be denoted by U. Given a training set and any two items i and j with ratings r_{ui} and r_{uj}, respectively, by some user u in U, the average deviation of item i with respect to item j is defined as:

\mathrm{dev}_{ij} = \sum_{u \in U} \frac{r_{ui} - r_{uj}}{|U|}    (7)

The Slope One scheme then simplifies the prediction formula to:

\hat{r}_{ui} = \bar{r}_u + \frac{1}{|R_i|} \sum_{j \in R_i} \mathrm{dev}_{ij}    (8)

where R_i denotes the set of items rated by user u for which the deviation \mathrm{dev}_{ij} is defined. Details are presented in [17]. The advantage is that this implementation of Slope One does not depend on how the user rated individual items, but only on the user's average rating and on which items the user has rated. Experimental results are presented in Section 5.
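The sketch below (again illustrative Python rather than the Mahout implementation used in Section 5) shows how the weighted-sum prediction of Eq. (6) and the simplified Slope One prediction of Eqs. (7)-(8) can be computed from the same dictionary-based representation; the user-based prediction of Eq. (4) follows the same weighted-sum pattern, with user similarities and mean-centered neighbour ratings in place of item similarities.

```python
def item_based_predict(user_ratings, item_sims):
    """Weighted-sum item-based prediction, cf. Eq. (6).

    user_ratings: {item: rating} of the active user.
    item_sims:    {item: similarity to the target item} for its neighbours.
    """
    num = sum(item_sims[j] * r for j, r in user_ratings.items() if j in item_sims)
    den = sum(abs(item_sims[j]) for j in user_ratings if j in item_sims)
    return num / den if den else None

def slope_one_deviations(all_ratings):
    """Average pairwise deviations dev_ij over co-rating users, cf. Eq. (7)."""
    sums, counts = {}, {}
    for ratings in all_ratings.values():          # one {item: rating} dict per user
        for i in ratings:
            for j in ratings:
                if i != j:
                    sums[i, j] = sums.get((i, j), 0.0) + ratings[i] - ratings[j]
                    counts[i, j] = counts.get((i, j), 0) + 1
    return {pair: sums[pair] / counts[pair] for pair in sums}

def slope_one_predict(user_ratings, deviations, target_item):
    """Simplified Slope One prediction, cf. Eq. (8): user mean plus mean deviation."""
    relevant = [j for j in user_ratings if (target_item, j) in deviations]
    if not relevant:
        return None
    user_mean = sum(user_ratings.values()) / len(user_ratings)
    return user_mean + sum(deviations[target_item, j] for j in relevant) / len(relevant)
```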
4 Evaluation Metrics.

In this paper, we focus on the measurement of accuracy and coverage of recommendation algorithms, which can be measured by offline analysis of data:
– Accuracy measures how well the system generates a list of recommendations. Measures typically used are precision, recall and F1. Precision indicates how many recommendations were useful to the user, whereas recall measures how many desired items appeared among the recommendations. F1 is the harmonic mean of precision and recall, that is, (2 × precision × recall) / (precision + recall).
– Predictive accuracy evaluates the accuracy of a system by comparing the numerical recommendation scores against the actual user ratings for the user-item pairs in the test dataset. The Mean Absolute Error (MAE) between ratings and predictions is a widely used metric. MAE is a measure of the deviation of recommendations from their true user-specified values. It is computed by first summing the absolute errors of the N corresponding rating-prediction pairs and then computing the average. The lower the MAE, the more accurately the recommendation engine predicts user ratings. Root Mean Squared Error (RMSE) and correlation are also used as statistical accuracy metrics.
– Coverage is a measure of the percentage of items and users for which a recommendation system can provide predictions. A prediction cannot be computed when no or very few users have rated an item, or when the active user has zero correlation with other users.

A more comprehensive review of evaluation metrics for collaborative filtering algorithms can be found in Herlocker et al. [12].
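As a small worked illustration of the two accuracy notions above (a sketch with illustrative function names, not the evaluation code used in our experiments):

```python
def mae(pairs):
    """Mean Absolute Error over (predicted, actual) rating pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)

def precision_recall_f1(recommended, relevant):
    """List-based accuracy: precision, recall and their harmonic mean F1."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example: the prediction-rating pairs (4.5, 5) and (3.0, 2) give an MAE of 0.75.
```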
5 Experimental Results.

In this section, we present our experimental results of applying collaborative filtering techniques to TEL datasets. We used the Apache Mahout (17) framework to compare the performance of different collaborative filtering algorithms on the datasets. Apache Mahout is an open source framework that provides implementations of standard item-based and user-based collaborative filtering algorithms, as well as implementations of different metrics to compute similarities between users and between items, including the Pearson, cosine and Tanimoto measures. First, we present results of collaborative filtering algorithms and the influence of different similarity metrics on datasets that contain ratings, including the MACE and Travel well datasets. We also compare these results with accuracy results of the algorithms on the MovieLens dataset [6], which is often used by the recommender system community to evaluate algorithms. Then, we present results of collaborative filtering algorithms applied to binary data without ratings, such as the Mendeley data. In this set of experiments, we used implicit relevance indications such as tags and downloads as a basis to generate recommendations.

(17) http://mahout.apache.org/

5.1 Collaborative filtering based on ratings.

In a first set of experiments, we applied collaborative filtering algorithms to datasets that contain rating data. First, we compare the influence of different similarity metrics on collaborative filtering. For this first set of experiments, we selected all users from the MACE and the Travel well collections who provided at least 5 ratings. User ratings were randomly split into two sets: observed items (80%) and held-out items (20%). Ratings for the held-out items were to be predicted. We used the Mean Absolute Error (MAE) as the evaluation metric for predictive accuracy in this experiment. Results are presented in Figure 1. These results indicate that item-based CF based on Tanimoto similarity outperforms item-based CF based on the Pearson and cosine similarity measures for both the MACE and Travel well datasets. In contrast, on the MovieLens dataset the cosine and Pearson measures improve the predictive accuracy of item-based collaborative filtering. These results are consistent with previous experiments demonstrating that the Tanimoto similarity measure is beneficial on very sparse datasets such as the MACE and Travel well datasets [24].

Fig. 1. MAE of item-based collaborative filtering based on different similarity metrics.

In a second experiment, we compared the results of item-based, user-based and Slope One collaborative filtering schemes. For each dataset, we used the best performing similarity measure. Results are presented in Figure 2 and indicate that the best choice of algorithm is also dataset dependent. In the case of MACE, standard item-based collaborative filtering outperforms user-based and Slope One collaborative filtering. For the Travel well data, user-based collaborative filtering outperforms the other schemes. The simplified Slope One scheme gives the most accurate results for the MovieLens dataset, which is consistent with findings reported in [16].

Fig. 2. MAE of user-based, item-based and slope-one collaborative filtering.

Whereas the predictive accuracy results of the best performing algorithms on the MACE and Travel well data are comparable to reported results of collaborative filtering schemes applied to the MovieLens dataset, the major bottleneck in applying these collaborative filtering schemes to the collected TEL data is the limited coverage of the approach. In MACE, only 113 of 1.148 users provided explicit relevance feedback in the form of ratings. In addition, only 1.706 of 12.000 accessed resources were rated. In the Travel well dataset, more users have provided ratings (56 out of 98), but the number of resources that have been rated by multiple users is very small. In order to address these sparsity issues, we elaborate on the use of implicit relevance indicators and the use of binary data for collaborative filtering in the next section.

5.2 Collaborative filtering on implicit relevance data.

Implicit feedback techniques appear to be attractive candidates to improve recommender performance in the TEL domain, where explicit feedback ratings are often sparse. The behaviors most extensively investigated as sources of implicit feedback in other areas have been reading, saving and printing [14]. Morita and Shinoda [25] show that there is a strong tendency for users to spend more time reading articles rated as interesting than articles rated as not interesting. This finding has been replicated by others in similar environments [15]. Other behaviors that have been explored include printing, saving, tagging and bookmarking [28]. We explore the use of implicit relevance data in the Travel well, MACE and Mendeley datasets. In addition to explicit rating data, the Travel well dataset includes 11.943 tags that were provided by 76 users on 1.791 resources. In the MACE dataset, 48.004 tags were provided by 283 users on 6.673 resources. In addition, MACE includes: (1) information about the access of resources (resultViewed event), including the date and time when the user viewed the resource.
