
Edu-mining for Book Recommendation for Pupils

Inproceedings

This paper proposes a novel method for recommending books to pupils based on a framework called Edu-mining. One of the properties of the proposed method is that it uses only loan histories (pupil ID, book ID, date of loan), whereas the conventional methods require additional information, such as taste information from a great number of users, which is costly to obtain. To achieve this, the proposed method solves the book recommendation problem as a problem of loan date prediction, relying solely on loan histories. Experiments show that the proposed method achieves an accuracy of 60% and outperforms the method used for comparison (weighted slope one collaborative filtering). In addition to its performance, the proposed method has the following two advantages: (i) it is inexpensive compared to the conventional methods and (ii) reading level is adjustable.

"1. These facts imply that Edu-mining has to solve the problems that arise from the differences between educational data and normal data in its own scheme. Second, it prefers simple and inexpensive techniques. It should be implemented at moderate cost since it mainly aims at the use in school. Also, the target users of Edu- mining are mainly teachers and/or students (including pupils). If the used techniques are simple, the target users are likely to use them easily. Besides, they may sometimes be able to give feedback on the techniques. Third and finally, whereas data/text mining aims at improving the quality of the mined knowledge, it is not necessarily the case in Edu-mining; its ultimate goal is to achieve good educational outcomes. In case of normal book recommendation, the ultimate goal is to find and recommend books that the user wishes to read (and probably to purchase). By contrast, in case of our task, this is not the ultimate goal; the ultimate goal is to facilitate their intellectual development by recommending proper books. This is the basic concept of Edu-mining. The next section describes the basic idea of the proposed method based on Edu-mining. 3 Basic Idea. So far, we have seen the basic concept of Edu-mining and its relation to book recommendation for pupils. This section describes the basic idea of the proposed method based on Edu-mining. In book recommendation for pupils, the peculiarity of the data is that taste information obtained from pupils may be unreliable as Section 1 describes. The proposed method overcomes the problem by not using taste information. Instead, it solves the book recommendation problem as a problem of loan date prediction. It uses a simple and intuitive way to predict loan dates. Before describing the basic idea of the proposed method, let us introduce a new loan date called absolute loan date. Loan date normally has the form of date, month, and grade of the pupil (e.g., 1st Sep. 1st grade). 
This form of loan date is not suitable for the calculation used in the proposed method, as we will see below. So, the absolute loan date is used instead of the normal loan date. The absolute loan date is a simple mapping of the normal loan date. The first day of the first grade is the base date and is mapped to 0. Every other loan date is mapped to the number of days elapsed since the first day of the first grade. For example, the date one month after the first day is mapped to 30 (or 31), the first day of the second grade is mapped to 365, and so on. Figure 1 illustrates the mapping between normal loan dates and absolute loan dates.

Figure 1. Mapping between normal loan date and absolute loan date.

Here, it is worthwhile to note that absolute loan dates roughly correspond to reading levels. Namely, first-grade pupils tend to borrow books of low reading levels whereas upper-grade pupils tend to borrow books of higher reading levels. This implies that if one can predict absolute loan dates, one can also estimate reading level. This is why reading level is adjustable in the recommendations of the proposed method.

Now let us describe the basic idea of the proposed method. The proposed method solves the book recommendation problem as a problem of loan date prediction, as already mentioned. This is equivalent to saying that the proposed method predicts absolute loan dates from loan histories. Once it predicts absolute loan dates, it can easily recommend books to the target pupil because it knows when s/he will borrow the books s/he has not borrowed yet. Simply, it recommends books which are predicted to be borrowed on or near the day of the recommendation. Or, if one wishes to recommend a book of a higher reading level, it can recommend books which are predicted to be borrowed some time (say, half a year) after the day of the recommendation; the opposite can also be done.
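
The mapping to absolute loan dates, and the idea of shifting the target date to adjust reading level, can be illustrated with a minimal Python sketch. The function names, the `grade_start` argument, and the example school-year start of 1 April are assumptions made for illustration; the text only fixes that the first day of the first grade maps to 0 and that each grade spans 365 days.

```python
from datetime import date

DAYS_PER_GRADE = 365  # the text maps the first day of the second grade to 365


def absolute_loan_date(grade: int, loan_day: date, grade_start: date) -> int:
    """Days elapsed since the first day of the first grade.

    grade       -- the pupil's grade when the book was borrowed (1 = first grade)
    loan_day    -- calendar date of the loan
    grade_start -- first day of that grade's school year (assumed input)
    """
    if grade < 1:
        raise ValueError("grade must be 1 or greater")
    days_into_grade = (loan_day - grade_start).days
    if days_into_grade < 0:
        raise ValueError("loan_day precedes grade_start")
    return (grade - 1) * DAYS_PER_GRADE + days_into_grade


def shifted_target_date(today_absolute: int, grades_ahead: float = 0.0) -> int:
    """Absolute date to aim at when a higher (or lower) reading level is wanted."""
    return today_absolute + int(grades_ahead * DAYS_PER_GRADE)


start = date(2008, 4, 1)  # hypothetical school-year start, not from the source
print(absolute_loan_date(1, start, start))                        # 0
print(absolute_loan_date(1, date(2008, 5, 1), start))             # 30
print(absolute_loan_date(2, date(2009, 4, 1), date(2009, 4, 1)))  # 365
print(shifted_target_date(400, 1.0))                              # 765
```

The sketch ignores leap days, in line with the flat 365 days per grade used in the text.
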
To see how the proposed method predicts absolute loan dates, suppose that we have the loan histories shown in Figure 2, where loan dates are expressed as absolute loan dates. Figure 2 shows, for example, that pupil A borrowed book A on the absolute date 365 (equivalently, the first day of the second grade). Further suppose that we are predicting the absolute loan date of book B for pupil A (the question mark in Figure 2 denotes that pupil A has not borrowed book B yet). If we look at the loan history of pupil B, we will notice that s/he borrowed book B 370 days after book A. Based on this, it is natural to predict the absolute loan date of book B for pupil A to be 370 (= 670 - 300) days after the loan date 365 of book A for pupil A, or equally 735 (= 365 + 670 - 300). Similarly, based on the loan history of pupil C, it is natural to predict the absolute loan date of book B for pupil A to be 725 (= 365 + 710 - 350). To obtain the final prediction, we take the average of the two absolute loan dates, that is, (735 + 725)/2 = 730 (equivalently, the first day of the third grade). It should be noted that adding the average of the differences between the loan dates to the loan date of the base book gives the same result. For instance, 730 = 365 + {(670 - 300) + (710 - 350)}/2.

Figure 2. Example of loan histories.

Although the loan histories in Figure 2 involve only three pupils and two books for illustration purposes, actual loan histories often involve far more pupils and books. Therefore, the average is taken over the relevant books and the relevant pupils in actual use. A rough definition of relevant pupils and relevant books is as follows (the next section will describe the strict definition). A relevant pupil is one who has borrowed the following two books: (a) one of the books the target pupil borrowed and (b) the book whose absolute loan date is to be predicted. A relevant book is a book that has been borrowed by (i) the target pupil and (ii) one or more of the relevant pupils.
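
The worked example above can be replayed in a few lines of Python. The dictionary layout and variable names are illustrative choices; the dates are the ones quoted in the text for pupils A, B, and C.

```python
# Loan histories from the Figure 2 example: pupil -> {book: absolute loan date}.
histories = {
    "A": {"book A": 365},                 # pupil A has not borrowed book B yet
    "B": {"book A": 300, "book B": 670},
    "C": {"book A": 350, "book B": 710},
}

target, base_book, target_book = "A", "book A", "book B"

# One simple prediction per pupil who borrowed both the base book and the target book:
# the target pupil's base date plus that pupil's gap between the two loans.
simple_predictions = [
    histories[target][base_book] + dates[target_book] - dates[base_book]
    for pupil, dates in histories.items()
    if pupil != target and base_book in dates and target_book in dates
]

final_prediction = sum(simple_predictions) / len(simple_predictions)
print(simple_predictions)  # [735, 725]
print(final_prediction)    # 730.0
```
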
This is the basic idea of how the proposed method predicts absolute loan dates from loan histories. The next section describes the prediction method in detail.

4 Proposed Method

To formalize the prediction method, we will use the symbols $p$ and $b$ to denote a pupil and a book, respectively, in the given loan histories. We will also use the symbol $d_{p,b}$ to denote the absolute loan date when the pupil $p$ borrowed the book $b$; if the pupil $p$ has not borrowed the book $b$ yet, then $d_{p,b}$ is set to $-1$. Now, let $p^*$ be the target pupil (target for book recommendation) and $b^*$ be the book whose absolute loan date is to be predicted. Then, a relevant pupil is one who has borrowed both $b^*$ and one of the books the target pupil borrowed. Thus, a set of relevant pupils is defined by

$$P_b = \{\, p \mid d_{p,b^*} \neq -1 \;\wedge\; d_{p,b} \neq -1 \,\} \qquad (1)$$

where $b$ denotes one of the books the target pupil borrowed. Also, a relevant book is a book that satisfies the following two conditions: (i) the target pupil borrowed it, and (ii) at least one relevant pupil exists for it. Using Equation (1), a set of relevant books is defined by

$$B = \{\, b \mid d_{p^*,b} \neq -1 \;\wedge\; |P_b| > 0 \,\} \qquad (2)$$

Using Equation (1) and Equation (2), absolute loan dates are predicted by

$$\hat{d}_{p^*,b^*} = \frac{\sum_{b \in B} \sum_{p \in P_b} \left( d_{p^*,b} + d_{p,b^*} - d_{p,b} \right)}{\sum_{b \in B} |P_b|} \qquad (3)$$

Here, $d_{p^*,b} + d_{p,b^*} - d_{p,b}$ corresponds to the simple prediction of absolute loan dates discussed in the basic idea in Section 3 (for instance, 365 + 670 - 300). The sums in the numerator are the total sum of the simple predictions over the relevant pupils and the relevant books; in the case of the same example, the sums correspond to 735 + 725. The denominator is the number of simple predictions. Hence, Equation (3) gives the average of the simple predictions. In case of $B = \emptyset$, Equation (3) is undefined, meaning that the proposed method cannot predict the absolute loan date.
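
The averaging rule of Equation (3) can be sketched in Python as follows. This is a minimal illustration of the described procedure, not the authors' implementation: the function name and data layout are assumptions, and an unborrowed book is represented by a missing dictionary key rather than by a sentinel date.

```python
from typing import Dict, Optional, Tuple

# pupil -> {book: absolute loan date}; a missing key means "not borrowed yet"
LoanHistories = Dict[str, Dict[str, int]]


def predict_loan_date(
    histories: LoanHistories, target_pupil: str, target_book: str
) -> Tuple[Optional[float], int]:
    """Average of the simple predictions, plus how many predictions were averaged.

    Returns (None, 0) when no relevant pupil and relevant book exist.
    """
    target_loans = histories[target_pupil]
    if target_book in target_loans:
        raise ValueError("the target pupil has already borrowed the target book")

    total = 0
    count = 0  # the denominator: number of simple predictions
    for base_book, base_date in target_loans.items():
        for pupil, loans in histories.items():
            if pupil == target_pupil:
                continue
            # A relevant pupil borrowed both the base book and the target book.
            if base_book in loans and target_book in loans:
                total += base_date + loans[target_book] - loans[base_book]
                count += 1

    if count == 0:
        return None, 0
    return total / count, count


example = {
    "A": {"book A": 365},
    "B": {"book A": 300, "book B": 670},
    "C": {"book A": 350, "book B": 710},
}
print(predict_loan_date(example, "A", "book B"))  # (730.0, 2)
print(predict_loan_date(example, "A", "book C"))  # (None, 0)
```

Returning the count alongside the prediction keeps the number of simple predictions available to later steps that need to judge how much evidence a prediction rests on.
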
Intuitively, books whose absolute loan date is given by Equation (3) are similar, in terms of topic, to the books that the target pupil borrowed, because the average is taken over the relevant pupils, who have borrowed some of the same books as the target pupil, and over the relevant books. In other words, the book preferences of the target pupil are implicitly included in the prediction through the relevant pupils and the relevant books. Furthermore, the similarity of each relevant book is considered in Equation (3). This can be seen by noting that Equation (3) can be rewritten as

$$\hat{d}_{p^*,b^*} = \frac{\sum_{b \in B} |P_b| \, d_{p^*,b} + \sum_{b \in B} \sum_{p \in P_b} \left( d_{p,b^*} - d_{p,b} \right)}{\sum_{b \in B} |P_b|}$$

In the rewritten version of Equation (3), the base date $d_{p^*,b}$ (the first term in the numerator) is weighted by the factor $|P_b|$, which denotes the number of pupils that borrowed both $b$ and $b^*$. It is reasonable to think that the more pupils borrow two books, the more similar the two are, and in turn it is reasonable to give a higher weight to such a pair in the prediction. Equation (3) does exactly this.

Also, it should be noted that the denominator can be regarded as the credibility of the prediction because it denotes the number of relevant pupils and relevant books involved in the prediction. The prediction is not reliable if it is based on few relevant pupils and few relevant books. Considering this, predictions whose denominator is below a certain threshold are discarded in the book recommendation.

Once absolute loan dates are predicted for the books that the target pupil has not borrowed yet, the proposed method recommends books to the target pupil as follows. It recommends books which are predicted to be borrowed on or near the day of the recommendation; here ( 5 or 10, for example).
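
The selection step can be sketched as follows, under stated assumptions. The parenthetical "5 or 10, for example" has lost its symbol in this copy; the sketch reads it as the number of books to return (`max_books`), which is a guess. The names `recommend`, `min_support`, and `offset_days` are likewise illustrative, and treating a denominator equal to the threshold as reliable is an assumption.

```python
from typing import Dict, List, Tuple


def recommend(
    predictions: Dict[str, Tuple[float, int]],
    today: int,
    min_support: int = 5,
    max_books: int = 5,
    offset_days: int = 0,
) -> List[str]:
    """Pick the books predicted to be borrowed closest to the aimed-at day.

    predictions -- book -> (predicted absolute loan date, number of simple predictions)
    today       -- absolute date of the recommendation for the target pupil
    min_support -- predictions built from fewer simple predictions are discarded
    offset_days -- positive aims at a higher reading level, negative at a lower one
    """
    aim = today + offset_days
    reliable = [
        (abs(predicted - aim), book)
        for book, (predicted, support) in predictions.items()
        if support >= min_support
    ]
    reliable.sort()  # nearest predicted loan date first
    return [book for _, book in reliable[:max_books]]


predictions = {
    "book X": (400.0, 7),
    "book Y": (410.0, 2),   # too few simple predictions, discarded
    "book Z": (760.0, 9),
    "book W": (395.0, 5),
}
print(recommend(predictions, today=398, max_books=2))                   # ['book X', 'book W']
print(recommend(predictions, today=398, max_books=1, offset_days=365))  # ['book Z']
```
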
Or, if a teacher wishes to recommend (or the target pupil wishes to read) books of a higher reading level, it recommends books which are predicted to be borrowed some days after the day of the recommendation. If one wishes the opposite, it recommends books which are predicted to be borrowed some days before the day of the recommendation. The number of days provides an intuitive way to specify reading level. Recall that the absolute loan date is simply a one-to-one mapping of the normal loan date. Setting the amount to 365 days after corresponds to specifying a one-grade-higher reading level.

5 Evaluation

For evaluation, we collected loan histories of pupils in an elementary school where the grades range from first to sixth. Table 1 shows the statistics on the loan histories.

Table 1. Statistics on the loan histories used for evaluation.

In the evaluation, we conducted two experiments. In the first, we evaluate how accurately the proposed method can recommend books similar to the books that the target pupil borrowed, which is described in 5.1. In the second, we evaluate the capability of the proposed method in estimating reading level, which is described in 5.2.

5.1 Experiment on Book Recommendation Accuracy

The experimental conditions and procedures are as follows. First, we randomly selected 10 target pupils (two for each grade, from first to fifth grade) from the loan histories; pupils in sixth grade were not included in the experiment because of a limitation of the proposed method, which will be discussed in Section 6. Second, we predicted absolute loan dates for the target pupils using the proposed method; the threshold, which was discussed in Section 4, was set to five. Third, we selected the five most difficult books and the five easiest books for each pupil according to the predicted loan dates. Then, the 10 books were shown to two elementary school teachers together with the corresponding loan history.
Finally, the two teachers separately rated each book as similar (to one or more of the books in the loan history in terms of its topic), not-related, or unknown, referring to the corresponding loan history. The performance of the proposed method was measured by accuracy. Accuracy was defined by FORMULA_5.

For comparison, we implemented the weighted slope one collaborative filtering [2], which had been shown to be effective in item recommendation. To fully implement the weighted slope one collaborative filtering, we need taste information for each book, as described in Section 1. However, normal loan histories, such as the ones used in this evaluation, do not contain taste information. For this reason, we implemented the weighted slope one collaborative filtering with the loan histories in which an equal rating was given to all books. As a result, it can recommend related books but cannot rank the recommended books; all books are equally favored. So, 10 books were randomly chosen from the recommended books and shown to the two teachers for evaluation. The performance was measured by accuracy, as for the proposed method.

Table 2 shows the results. It shows that the proposed method achieves an accuracy of 0.600. This means that, on average, six out of the 10 books recommended by the proposed method are related to the books that the target pupil borrowed. With 60% of the recommended books actually related, it should not be difficult for teachers, or even pupils, to select related books from the recommendations.

Table 2. Evaluation on book recommendation accuracy.

Table 2 also shows that the proposed method outperforms the weighted slope one collaborative filtering. Indeed, the difference between the two is significant (normal approximation to the binomial test, p<0.01). The performance of the weighted slope one collaborative filtering implies that its recommendations may confuse teachers and pupils because more than half of the recommended books are not relevant.

5.2 Experiment on Reading Level Estimation
The experimental conditions and procedures are as follows. First, we made five pairs of books by randomly selecting a book from the five most difficult books and a book from the five easiest books which the proposed method recommended to each target pupil (50 pairs in total). Second, we randomly labeled the two books in each pair as A and B. Third, four human raters (undergraduate students) separately determined which book in each pair was more difficult by referring to a book search system that retrieves book information including the title, the author(s), the number of pages, the picture(s) (if available), the reading level (if available), and the synopsis (if available). They separately gave each pair either +1, -1, or 0, meaning A is more difficult, B is more difficult, and indistinguishable, respectively. Then, we merged the results by summing the ratings. If the sum is equal to or greater than 3, then A is determined to be more difficult. Similarly, if the sum is equal to or smaller than -3, then B is determined to be more difficult. If the sum is 2 or -2, the first and second authors joined the four human raters in the evaluation, newly giving the pair +1, -1, or 0. If the new sum is equal to or greater (smaller) than 3 (-3), then A (B) is determined to be more difficult; otherwise, the pair is determined to be indistinguishable. Also, if the sum is between -1 and 1, the pair is determined to be indistinguishable.

As a result, 34 out of 50 pairs were distinguishable in terms of reading level. For the 34 pairs, the predictions of the proposed method agreed with the decisions of the human raters 62% of the time (21 out of 34 pairs). Although the results show that the predictions of the proposed method roughly agree with the decisions of the human raters, the agreement is not as high as we expected. We will discuss the reason in the next section.

6 Discussion

The evaluation has shown that the proposed method is effective in recommending books related to the books that the target pupil borrowed.
The reason is that the proposed method predicts absolute loan dates from the relevant books and the relevant pupils. The effects can be seen in the results of the recommendation. The proposed method is capable of recommending books in a series, as Table 3 shows. As underlined, the proposed method recommended Astronomical observation 1, 4, and 9 to the pupil who borrowed Astronomical observation 8. Information about books in a series is useful for recommendation since teachers or book database systems do not necessarily have this information. More importantly, the results show that the proposed method is effective in recommending related books. For instance, it recommended Constellation observation 1, which is highly related to Astronomical observation 8, The wonder of the Earth, The birth of the great telescope Subaru, and Journey in the space. Table 4 shows another example.

By contrast, the performance of the proposed method concerning reading level is not as high as we expected. As already described in the previous section, the differences in reading level were indistinguishable in 32% of the 50 pairs. For the rest, the predictions of the proposed method agreed with those of the human raters 62% of the time. One of the major reasons is that the loan histories used in the evaluation cover only one year (or should we say we could only collect that amount?). This means that the difference in reading level is at most one grade higher or lower, and often much less than one grade. This explains why 32% of the 50 pairs were indistinguishable in terms of reading level. Considering this, the proposed method is expected to improve in reading level prediction with longer-term loan histories. Another reason is related to the problem of evaluation. It is not easy to evaluate reading level accurately. It was sometimes difficult for the human raters to determine which book was more difficult by only referring to the book search system.
Other ways of evaluation are possible; exploring them is left for future work.

Table 3. Example of book recommendation (books in series).

Table 4. Example of book recommendation (related books).

This section has discussed the effectiveness of the proposed method in book recommendation. Here, it is also worthwhile to discuss the limitations of the proposed method. One of the limitations is that the proposed method is not effective in recommending books to pupils during the last days of school. It is often impossible to take the differences of absolute loan dates in Equation (3) because there are no pupils in higher grades. This is why we excluded pupils in sixth grade from the evaluation. Another limitation is that the proposed method is not capable of recommending books that have never been borrowed; other systems based on collaborative filtering have the same limitation. By contrast, teachers, and especially librarians, can properly recommend such books. Achieving this requires other techniques.

7 Conclusions

This paper proposed a novel method for book recommendation based on Edu-mining. It has three advantages over the conventional methods: (i) it is inexpensive, (ii) it can recommend books related to the books that the target pupil borrowed, and (iii) reading level is adjustable. The evaluation reveals that the proposed method achieves an accuracy of 60% in recommending related books and outperforms the weighted slope one collaborative filtering. The evaluation also reveals that the reading level predicted by the proposed method roughly agrees with the reading level determined by human raters. For future work, we will investigate how the prediction of reading level can be evaluated more accurately. We will also investigate how other sources of information can be used to improve the proposed method."
