
4.3 Experimental Design and Results

In the first set of experiments, we compare the performance of the different methods as a function of the percentage of actors in the network with known labels. For each percentage of labeled actors, we randomly select the corresponding fraction of the labeled data for each node label for training and use the rest for testing. We repeat this process 10 times and report the average accuracy. The length of the random walk was set to 1, and the linking preferences were set to be equal (i.e., the TPWs of a type were set to be equal).
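The sampling protocol above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `classify` callback, and the choice to keep at least one labeled actor per class are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_split(labels, fraction, rng):
    """Pick `fraction` of the actors of each label for training; the rest are test."""
    by_label = defaultdict(list)
    for actor, label in labels.items():
        by_label[label].append(actor)
    train = set()
    for actors in by_label.values():
        # Keep at least one labeled actor per class (an assumption, not stated in the text).
        k = max(1, round(fraction * len(actors)))
        train.update(rng.sample(actors, k))
    test = set(labels) - train
    return train, test

def average_accuracy(labels, fraction, classify, repeats=10, seed=0):
    """Average test accuracy over `repeats` random stratified splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        train, test = stratified_split(labels, fraction, rng)
        known = {a: labels[a] for a in train}
        # `classify` is a hypothetical callback returning {actor: predicted_label}.
        predicted = classify(known, test)
        correct = sum(predicted[a] == labels[a] for a in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

Any classifier can be plugged in through `classify`, for example a trivial baseline that assigns the same label to every test actor. The sketch assumes `fraction` is below 1 so that the test set is never empty.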

Figure 1 shows the results of the first set of experiments. On the Last.fm data set, HGK significantly (p < 0.05) outperforms all other methods when at least 4% of the actors are labeled. nLB-Assort does not work well when the fraction of actors with known labels is less than 10%; this can be explained by the fact that it relies on label statistics aggregated from an actor's neighbors to label that actor. Not surprisingly, EdgeCluster-Cont, which uses more information than EdgeCluster, outperforms EdgeCluster. On the Flickr data set, AGK and HGK outperform the other methods when the fraction of actors with known labels is less than 10%. Furthermore, AGK significantly outperforms HGK when this fraction is less than 7%, and both HGK and AGK significantly outperform the other methods when it is between 7% and 10%. HGK significantly outperforms all other methods when the fraction of actors with known labels ranges between 10% and 60%. Both HGK and nLB-Assort outperform the other methods when the fraction is between 70% and 80%. On both data sets, HGK often significantly outperforms, or is at least competitive with, all other methods. This can be explained by the fact that HGK is able to exploit the information provided by multiple node and link types to uncover multi-relational latent information that reliably discriminates between different actor labels.

Fig. 1. Accuracies of six methods on Last.fm (left) and on Flickr (right)

The second set of experiments explores the sensitivity of the kernel methods (HGK and AGK) to the length of the random walk. The length l of the random walk is varied from 0 to 10, with the linking preferences set to be equal (across all the links from an actor). We report results averaged over 10-fold cross-validation runs.

Table 1. Accuracies (%) of kernel methods with different lengths of walk. Bold numbers represent best results based on paired t-test (p < 0.05) on 10-fold cross validation.

Data set  Method  l=0   l=1   l=2   l=3   l=4   l=5   l=6   l=7   l=8   l=9   l=10
Last.fm   HGK     61.4  63.8  62.9  61.0  59.9  58.8  58.3  56.7  56.6  55.4  55.5
Last.fm   AGK     39.0  43.8  51.2  53.1  54.5  55.1  55.0  55.1  54.9  54.2  54.0
Flickr    HGK     49.8  49.7  46.1  46.0  46.5  44.8  44.6  43.2  42.8  41.8  41.3
Flickr    AGK     33.9  38.1  42.1  42.2  41.2  41.6  40.9  41.1  41.1  41.1  41.2
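The caption of Table 1 refers to a paired t-test over the 10 cross-validation folds. A minimal sketch of such a test is given below, assuming that per-fold accuracies for two methods on the same folds are available; the function names are illustrative and the text does not specify the exact procedure the authors used. The critical value 2.262 is the standard two-sided Student's t value for 9 degrees of freedom at the 0.05 level.

```python
import math
from statistics import mean, stdev

# Two-sided critical value of Student's t for 9 degrees of freedom at alpha = 0.05.
T_CRIT_DF9 = 2.262

def paired_t_statistic(acc_a, acc_b):
    """t statistic of the per-fold accuracy differences (same folds for both methods)."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    # It is zero if all differences are identical, which this sketch does not handle.
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

def significantly_different(acc_a, acc_b, t_crit=T_CRIT_DF9):
    """True if the two methods differ at the 0.05 level on 10 paired folds."""
    return abs(paired_t_statistic(acc_a, acc_b)) > t_crit
```

The per-fold accuracies themselves are not reported in this section, so the sketch can only be applied to numbers obtained by rerunning the experiments.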

Table 1 shows that the kernel methods work well with some shorter walks (e.g., l = 1, 2). As the walk becomes longer, the performance of HGK and AGK decreases or remains about the same. This indicates that the farther a neighbor is from the node to be classified, the less informative it is for prediction. HGK significantly outperforms AGK for l ≤ 6. (Results for l = 0 correspond to simply using the similarity values given by Rp.)

The last set of experiments examines the performance of the learned model when the length of the random walk is fixed (l = 1) and the linking preferences of actors are varied. Specifically, in Last.fm, let w1 = w_{user,user} and w2 = w_{user,track} be the TPWs from type user. We examine the performance of the model as the ratio w1 : w2 changes (1:5, 1:4, 1:3, 1:2, 1:1, 2:1, ..., 9:1). We do the same for Flickr with w1 = w_{user,user} and w2 = w_{user,photo}. We report classification accuracy averaged over 10-fold cross-validation runs.
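To make the ratio sweep concrete, the sketch below turns a ratio w1 : w2 into the probabilities with which a surfer at a user node would follow a user link or an item (track or photo) link. Normalizing the two weights so that they sum to 1 is an assumption made for illustration; the precise definition of the TPWs is given elsewhere in the paper. Only the ratios written out explicitly in the text are listed.

```python
from fractions import Fraction

def preference_probabilities(w1, w2):
    """Normalize a ratio w1:w2 into (P(user link), P(item link)); assumes w1 + w2 > 0."""
    total = w1 + w2
    return Fraction(w1, total), Fraction(w2, total)

# Ratios stated explicitly in the text; the elided ones are left out.
ratios = [(1, 5), (1, 4), (1, 3), (1, 2), (1, 1), (2, 1), (9, 1)]

for w1, w2 in ratios:
    p_user, p_item = preference_probabilities(w1, w2)
    print(f"{w1}:{w2} -> P(user link) = {float(p_user):.3f}, "
          f"P(item link) = {float(p_item):.3f}")
```

Under this reading, a ratio of 1:5 means the surfer follows an item link five times as often as a user link, and 9:1 means the reverse preference, nine to one.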

Figure 2 shows the results of the last set of experiments, which investigates the influence of linking preferences on the performance of HGK. In the case of Last.fm, the model performs better when the linking preference between a user and a track is higher than that between users, whereas in the case of Flickr the situation is reversed. Our results suggest that in Last.fm a surfer (exploring music) is likely to move from a user to a track more often than from a user to another user, with the opposite being true for a surfer exploring pictures in Flickr.

 