Asia Pacific Journal of Information Systems (APJIS)

Date	December 2007
Vol. No.	Vol. 17 No. 4
DOI
Page	187~206
Title	Pre-Evaluation for Prediction Accuracy by Using the Customers Ratings in Collaborative Filtering
Author	Seok Jun Lee, Sun Ok Kim
Keyword	Recommender System, Collaborative Filtering, Pre-evaluation for prediction, Pre-information
Abstract	Advances in computer and information technology, combined with Internet infrastructure, have spread information beyond specialized fields into people's daily lives. This ubiquity of information changes traditional ways of transacting and leads to a new form of e-commerce, distinct from the existing one, that covers not only physical goods but also non-physical services. As the scale of e-commerce grows, people find it harder to locate the information they want. Recommender systems are becoming the main tools for mitigating this information overload in e-commerce.
Recommender systems can be defined as systems that suggest items (goods or services) in light of customers' interests or tastes. E-commerce web sites use them to suggest products to customers and to provide information that helps customers decide what to purchase. There are several approaches to recommending goods to customers, but this study focuses on collaborative filtering. It presents the possibility of pre-evaluating the prediction performance for each customer's preferences before the prediction process runs: customers likely to have low prediction performance are classified in advance by using the statistical features of the ratings each customer has given.
This study uses the MovieLens 100K dataset to analyze the accuracy of the classification. The classification criteria are set using the training set, which is 80% of the 100K dataset. In the classification process, the customers are divided into two groups, a classified group and a non-classified group.
To compare the prediction performance of the classified and non-classified groups, the 20% test set is run through the Neighborhood-Based Collaborative Filtering Algorithm and the Correspondence Mean Algorithm. The prediction errors of these algorithms are assigned to each customer, and the groups are compared on each user's error.
Research hypotheses
Two research hypotheses are formulated in this study to test the accuracy of the classification criteria.
Hypothesis 1: The estimation accuracy differs significantly among groups classified according to the standard deviation of each user's ratings.
To test Hypothesis 1, the standard deviation of ratings is calculated for each user in the training set (80% of the MovieLens 100K dataset). Users are classified into four groups according to the quartiles of these standard deviations, and the estimation errors of each group on the test set are tested for significant differences.
Hypothesis 2: The estimation accuracy differs significantly among groups classified according to the distribution of each user's ratings.
To test Hypothesis 2, the distribution of each user's ratings is compared with the distribution of the ratings of all customers in the training set (80% of the MovieLens 100K dataset). The study assumes that customers whose rating distribution differs from that of all customers will have low prediction performance, so six types of differing distributions are set for comparison. For each assumed type of distribution, the test groups are classified as a fit group or a non-fit group. The degree of agreement between each type of distribution and each customer's distribution is assessed with a goodness-of-fit test, and the two resulting groups are tested for a difference in mean errors.
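
The grouping behind Hypothesis 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the use of the sample standard deviation, the quartile interpolation rule of `statistics.quantiles`, and the toy ratings are all hypothetical choices that the abstract does not specify.

```python
from statistics import mean, quantiles, stdev


def group_users_by_rating_sd(ratings_by_user):
    """Map each user to a quartile group (1-4) of their rating standard deviation.

    Assumption: sample standard deviation, so every user needs at least two ratings.
    """
    sd_by_user = {user: stdev(ratings) for user, ratings in ratings_by_user.items()}
    # Three cut points (Q1, Q2, Q3) split the users' standard deviations into four groups.
    cuts = quantiles(sd_by_user.values(), n=4)
    return {
        user: 1 + sum(sd > cut for cut in cuts)
        for user, sd in sd_by_user.items()
    }


def mean_error_by_group(groups, error_by_user):
    """Average the per-user prediction errors within each quartile group."""
    errors = {}
    for user, group in groups.items():
        errors.setdefault(group, []).append(error_by_user[user])
    return {group: mean(values) for group, values in sorted(errors.items())}


# Toy training ratings on a 1-5 scale (made up for illustration only).
train = {
    "u1": [3, 3, 3, 3],
    "u2": [3, 4, 3, 4],
    "u3": [2, 4, 2, 4],
    "u4": [1, 5, 1, 5],
}
groups = group_users_by_rating_sd(train)
print(groups)
```

A significance test on the per-group errors would then follow; the abstract does not state which test the authors used.
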
In addition, the degree of goodness of fit between the distribution of each user's ratings and the average distribution of ratings in the training set is closely related to the prediction errors of these algorithms. In this study, customers whose prediction performance is lower than that of the rest of the system are classified before the prediction process by these two criteria, which are set from the statistical features of customers' ratings in the training set.
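
The comparison between one user's rating distribution and the overall training distribution can be illustrated with a small sketch. It assumes a Pearson chi-square goodness-of-fit statistic over the five rating levels and a 5% significance level (critical value 9.488 for 4 degrees of freedom); the abstract does not name the test or the threshold. The sketch compares against the overall training distribution only, not the six distribution types the study defines, and all names and toy data are hypothetical.

```python
from collections import Counter

RATING_LEVELS = (1, 2, 3, 4, 5)
# Upper 5% point of the chi-square distribution with 4 degrees of freedom
# (5 rating levels - 1). The significance level is an assumption.
CHI2_CRITICAL_DF4_ALPHA05 = 9.488


def rating_proportions(all_ratings):
    """Share of each rating level among all training ratings."""
    counts = Counter(all_ratings)
    total = len(all_ratings)
    return [counts[level] / total for level in RATING_LEVELS]


def chi_square_statistic(user_ratings, expected_props):
    """Pearson chi-square statistic of a user's ratings against reference proportions."""
    counts = Counter(user_ratings)
    n = len(user_ratings)
    stat = 0.0
    for level, prop in zip(RATING_LEVELS, expected_props):
        if prop <= 0:
            raise ValueError("every rating level needs a positive reference proportion")
        expected = n * prop
        stat += (counts[level] - expected) ** 2 / expected
    return stat


def fits_reference(user_ratings, expected_props, critical=CHI2_CRITICAL_DF4_ALPHA05):
    """True if the user's distribution is not rejected as different from the reference."""
    return chi_square_statistic(user_ratings, expected_props) <= critical


# Toy data (made up): 100 training ratings and two users with 20 ratings each.
reference = rating_proportions([1] * 10 + [2] * 10 + [3] * 30 + [4] * 30 + [5] * 20)
matching_user = [1] * 2 + [2] * 2 + [3] * 6 + [4] * 6 + [5] * 4
polarized_user = [1] * 10 + [5] * 10

print(fits_reference(matching_user, reference))
print(fits_reference(polarized_user, reference))
```

With small per-user rating counts the chi-square approximation is rough, so a real analysis would need to check the expected counts per rating level before relying on the threshold.
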

© 2013 The Korean Society of Management Information Systems. All rights reserved.