APJIS Asia Pacific Journal of Information Systems


The Journal for Information Professionals

Asia Pacific Journal of Information Systems (APJIS), a Scopus and ABDC indexed journal, is a
flagship journal of the information systems (IS) field in the Asia Pacific region.

ISSN 2288-5404 (Print) / ISSN 2288-6818 (Online)

Editor : Seung Hyun Kim


Most Downloaded Articles

Date December 2007
Vol. No. Vol. 17 No. 4
DOI
Page 187~206
Title Pre-Evaluation for Prediction Accuracy by Using the Customers Ratings in Collaborative Filtering
Author Seok Jun Lee, Sun Ok Kim
Keyword Recommender System, Collaborative Filtering, Pre-evaluation for prediction, Pre-information
Abstract The development of computer and information technology, combined with Internet infrastructure (the information superhighway), has spread information widely, not only in specialized fields but also in people's daily lives. This ubiquity of information is changing traditional ways of transacting and is giving rise to a new form of e-commerce that is distinct from the existing one. E-commerce now covers not only physical goods but also non-physical services. As the scale of e-commerce grows, people find it increasingly hard to locate the information they want. Recommender systems are becoming the main tools with which e-commerce mitigates this information overload.

Recommender systems can be defined as systems that suggest items (goods or services) by considering customers' interests or tastes. E-commerce web sites use them to suggest products to customers who are looking for something and to provide information that helps them decide what to purchase. There are several approaches to recommending goods to customers in a recommender system, but this study focuses on the collaborative filtering technique. The study presents the possibility of pre-evaluating the prediction performance for customers' preferences in collaborative filtering before the preference prediction process itself. In this pre-evaluation, customers who are likely to have low prediction performance are classified before the prediction process by using the statistical features of the ratings each customer has given.

In this study, the MovieLens 100K dataset is used to analyze the accuracy of the classification. The classification criteria are set by using the training set, which is 80% of the 100K dataset. In the classification process, the customers are divided into two groups: a classified group and a non-classified group.
To compare the prediction performance of the classified group and the non-classified group, the prediction process runs the 20% test set through the Neighborhood Based Collaborative Filtering Algorithm and the Correspondence Mean Algorithm. The prediction errors of those algorithms are allocated to each customer and compared user by user.

Research hypotheses

Two research hypotheses are formulated in this study to test the accuracy of the classification criteria, as follows.

Hypothesis 1: The estimation accuracy differs significantly among groups classified according to the standard deviation of each user's ratings.

To test Hypothesis 1, the standard deviation of the ratings is calculated for each user in the training set, which is 80% of the MovieLens 100K dataset. The users are classified into four groups according to the quartiles of these standard deviations. The estimation errors of each group on the test set are then tested for significant differences.

Hypothesis 2: The estimation accuracy differs significantly among groups classified according to the distribution of each user's ratings.

To test Hypothesis 2, the distribution of each user's ratings is compared with the distribution of the ratings of all customers in the training set. It is assumed that customers whose rating distributions differ from that of all customers would have low prediction performance, so six types of different distributions are set for comparison. The test groups are classified into a fit group or a non-fit group according to each assumed type of distribution. The degree of agreement between each type of distribution and each customer's distribution is assessed with a goodness-of-fit test, and the customers are classified into two groups to test the difference in the mean of errors.
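
The grouping step described for Hypothesis 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `(user_id, item_id, rating)` input layout, the use of the population standard deviation, and the default quartile method of `statistics.quantiles` are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev, quantiles


def group_users_by_rating_sd(train_ratings):
    """Assign each user to a quartile group (1-4) of rating standard deviation.

    train_ratings: iterable of (user_id, item_id, rating) tuples (assumed layout).
    Needs at least two users, because quartiles are undefined for one value.
    """
    ratings_by_user = defaultdict(list)
    for user_id, _item_id, rating in train_ratings:
        ratings_by_user[user_id].append(rating)

    # Population standard deviation per user (an assumption; the abstract
    # does not say whether the sample or population form was used).
    sd_by_user = {u: pstdev(r) for u, r in ratings_by_user.items()}

    # Three cut points that split the users' standard deviations into quartiles.
    q1, q2, q3 = quantiles(sd_by_user.values(), n=4)

    # Group 1 holds the lowest standard deviations, group 4 the highest.
    return {u: 1 + (sd > q1) + (sd > q2) + (sd > q3) for u, sd in sd_by_user.items()}


def mean_error_by_group(groups, error_by_user):
    """Average a per-user prediction error (e.g. MAE on the test set) per group."""
    errors = defaultdict(list)
    for user_id, error in error_by_user.items():
        if user_id in groups:
            errors[groups[user_id]].append(error)
    return {g: mean(e) for g, e in sorted(errors.items())}
```

The per-user errors passed to `mean_error_by_group` would come from whichever prediction algorithm is being evaluated. Testing whether the group means differ significantly is a separate statistical step that this sketch leaves out.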
The degree of goodness-of-fit between the distribution of each user's ratings and the average distribution of the ratings in the training set is also closely related to the prediction errors of those algorithms. With these two criteria, which are set from statistical features of customers' ratings in the training set, the customers who have lower prediction performance than the rest of the system can be classified before the prediction process.
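
The goodness-of-fit criterion can be illustrated in a similar way. The sketch below assumes a Pearson chi-square statistic over the five MovieLens rating levels and a 5% significance level. The abstract's own test symbol is illegible in this copy, so that choice is an assumption, as are the function names and the threshold. It compares a user only against the overall training distribution and does not reproduce the six distribution types mentioned in the abstract.

```python
from collections import Counter

RATING_LEVELS = (1, 2, 3, 4, 5)  # MovieLens 100K ratings are whole stars from 1 to 5
CRITICAL_VALUE = 9.488  # chi-square critical value for 4 degrees of freedom at alpha = 0.05 (assumed level)


def rating_proportions(ratings):
    """Share of each rating level in a non-empty list of ratings."""
    counts = Counter(ratings)
    return {level: counts[level] / len(ratings) for level in RATING_LEVELS}


def chi_square_statistic(user_ratings, reference_proportions):
    """Pearson chi-square statistic of a user's ratings against a reference distribution."""
    observed = Counter(user_ratings)
    n = len(user_ratings)
    statistic = 0.0
    for level in RATING_LEVELS:
        expected = n * reference_proportions[level]
        if expected == 0:
            # A rating level the reference never uses: any observation is an infinite misfit.
            if observed[level]:
                return float("inf")
            continue
        statistic += (observed[level] - expected) ** 2 / expected
    return statistic


def is_fit(user_ratings, reference_proportions, critical_value=CRITICAL_VALUE):
    """True if the user's rating distribution is not rejected as differing from the reference."""
    return chi_square_statistic(user_ratings, reference_proportions) <= critical_value
```

In use, `rating_proportions` would be applied once to all training ratings to obtain the reference distribution, and `is_fit` would then split users into a fit group and a non-fit group. Users with very few ratings give small expected counts, for which the chi-square approximation is unreliable, so a real analysis would need a minimum-ratings rule that this sketch omits.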



© 2013 The Korean Society of Management Information Systems. All rights reserved.