Social Computing, Behavioral-Cultural Modeling and Prediction

4 Experimental Settings and Results

We describe the results of experiments that compare the performance of HGK with that of several baseline classifiers. We also study how sensitive the performance of HGK is to the length of the random walk and to the linking preferences of nodes of type actor.

4.1 Social Media Data

We crawled two real-world heterogeneous social networks. The first data set is from a music social network. We manually identified 11 disjoint groups (categories of users who share similar interests in music, e.g., users who enjoy Heavy Metal), each containing approximately the same number of users; we then crawled the users, the items, and the links that denote the relations among these objects. The subset of the data that we use consists of 1,612,019 links that connect 25,471 nodes. Each node belongs to one of four types: 10,197 users (actors), 8,188 tracks, 1,651 artists, and 5,435 tags. Each link belongs to one of five types: 38,743 user-user, 765,961 user-track, 8,672 track-artist, 702,696 track-tag, and 95,947 artist-tag links.

Our second data set is from Flickr. We manually identified 10 disjoint groups (communities of users who share the same taste in pictures, e.g., users interested in pictures related to the state of Iowa), each containing approximately the same number of users. The data set constructed by crawling the Flickr network contains 361,787 links that connect 22,347 nodes. Each node is one of three types (6,163 users, 14,481 photos, and 1,703 tags), and each link is one of three types (88,052 user-user, 144,627 user-photo, and 129,108 photo-tag). In both data sets, we use the group memberships of users as class labels to train and test all models.
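As a quick arithmetic check on the statistics above, the per-type counts can be encoded as a small typed schema and summed; the totals agree with the reported numbers of nodes and links for both networks. The dictionary layout and the `summarize` helper are illustrative assumptions, not the authors' data format.

```python
# Per-type node and link counts as reported for the two crawled networks.
# The dictionary layout is a hypothetical illustration, not the authors' format.
MUSIC = {
    "nodes": {"user": 10197, "track": 8188, "artist": 1651, "tag": 5435},
    "links": {
        ("user", "user"): 38743,
        ("user", "track"): 765961,
        ("track", "artist"): 8672,
        ("track", "tag"): 702696,
        ("artist", "tag"): 95947,
    },
}
FLICKR = {
    "nodes": {"user": 6163, "photo": 14481, "tag": 1703},
    "links": {
        ("user", "user"): 88052,
        ("user", "photo"): 144627,
        ("photo", "tag"): 129108,
    },
}


def summarize(schema):
    """Return (total nodes, total links), checking that every link type joins declared node types."""
    node_types = set(schema["nodes"])
    for src, dst in schema["links"]:
        assert src in node_types and dst in node_types, (src, dst)
    return sum(schema["nodes"].values()), sum(schema["links"].values())


if __name__ == "__main__":
    print("music:", summarize(MUSIC))    # music: (25471, 1612019)
    print("flickr:", summarize(FLICKR))  # flickr: (22347, 361787)
```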

4.2 Methods

We compare the performance of an SVM trained using HGK with that of several state-of-the-art methods for labeling actors in social networks:

1. Weighted-Vote Relational Neighbor Classifier with network data augmentation (wvRN-Assort) [2, 12, 13]: a method that first augments networked data by combining explicit links with links mined from the nodes' local attributes and then uses the augmented network as input to a Weighted-Vote Relational Neighbor Classifier.

2. Network-Only Link-Based Classification with network data augmentation (nLB-Assort) [2, 12, 13]: a method similar to wvRN-Assort, except that the augmented network is used as input to a Network-Only Link-Based classifier [4], which constructs a relational feature vector for each node by aggregating the labels of its neighbors and uses these vectors to train a logistic regression model.

3. EdgeCluster [11]: a method that extracts the social dimensions of each actor, i.e., the actor's affiliations with a number of latent social groups, and uses the resulting features to train a discriminative model that classifies actors.

4. EdgeCluster-Cont [11]: a method that combines social dimensions with features extracted from user profiles to build predictive models. For both data sets, we report results obtained with the subset of user profile features (e.g., artists, tags) that yields the best performance.

5. Augmented-Graph Kernel (AGK): a method that uses a homogeneous graph kernel [14, 17]. We augment the network data by adding an edge between two actors if they share links to a specified number (n) of items (when n = ∞, no edges are added and the method reduces to a homogeneous graph kernel on the unaugmented network data). For both data sets, we report results for the choice of n that yields the best performance.
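The weighted-vote relational neighbor step behind wvRN-Assort estimates a node's class distribution as the edge-weighted average of its neighbors' class distributions. A minimal sketch follows, assuming an undirected weighted graph stored as a dict of edge weights; the function name `wvrn`, the uniform initialization of unlabeled nodes, and the fixed number of synchronous iterations are illustrative assumptions. The link-mining augmentation step [2, 12, 13] and any collective-inference schedule used in practice (such as relaxation labeling) are omitted.

```python
from collections import defaultdict


def wvrn(edges, labels, classes, iterations=10):
    """Weighted-vote relational neighbor estimates (hypothetical sketch).

    edges: dict mapping an undirected pair (u, v) -> weight.
    labels: dict mapping each labeled node -> its class.
    Returns a dict node -> {class: probability} for every node with an edge.
    """
    nbrs = defaultdict(dict)
    for (u, v), w in edges.items():
        nbrs[u][v] = nbrs[u].get(v, 0.0) + w
        nbrs[v][u] = nbrs[v].get(u, 0.0) + w
    uniform = {c: 1.0 / len(classes) for c in classes}
    # Labeled nodes are clamped to a one-hot distribution; unlabeled nodes start uniform.
    probs = {
        n: ({c: float(c == labels[n]) for c in classes} if n in labels else dict(uniform))
        for n in nbrs
    }
    for _ in range(iterations):
        new = {}
        for n in nbrs:
            if n in labels:
                new[n] = probs[n]
                continue
            # Weighted vote: sum each neighbor's distribution scaled by the edge weight.
            score = {c: 0.0 for c in classes}
            for m, w in nbrs[n].items():
                for c in classes:
                    score[c] += w * probs[m][c]
            z = sum(score.values())
            new[n] = {c: s / z for c, s in score.items()} if z > 0 else dict(uniform)
        probs = new  # synchronous update: all nodes use the previous round's estimates
    return probs
```

On a path a-b-c-d with a labeled "x" and d labeled "y", the estimate for b converges toward P(x) = 2/3, since b is adjacent to one "x" node and one partially "x" node.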
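For the Network-Only Link-Based classifier, the relational feature vector of a node is obtained by aggregating its neighbors' labels. The sketch below uses edge-weighted, normalized counts of neighbor classes, which is one common aggregation choice and an assumption here; the helper `link_features` is hypothetical, and the resulting vectors would then be passed to any logistic regression implementation (not shown).

```python
from collections import defaultdict


def link_features(edges, labels, classes):
    """Relational features (hypothetical sketch): edge-weighted, normalized neighbor-label counts.

    Only labeled neighbors contribute; a node with no labeled neighbor gets an all-zero vector.
    """
    index = {c: i for i, c in enumerate(classes)}
    agg = defaultdict(lambda: [0.0] * len(classes))
    nodes = set()
    for (u, v), w in edges.items():
        nodes.update((u, v))
        # Treat the edge as undirected: each endpoint sees the other's label.
        for src, dst in ((u, v), (v, u)):
            if dst in labels:
                agg[src][index[labels[dst]]] += w
    features = {}
    for n in nodes:
        vec = agg[n]  # defaultdict yields zeros for nodes without labeled neighbors
        total = sum(vec)
        features[n] = [x / total for x in vec] if total > 0 else list(vec)
    return features
```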
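EdgeCluster [11] is described here only at a high level. As we understand the original work, it obtains social dimensions by clustering edges rather than nodes, so that an actor whose edges fall into several clusters is affiliated with several latent groups. The sketch below follows that reading under stated assumptions: each edge is represented by its two endpoints, edges are grouped with a small spherical (cosine) k-means seeded deterministically with the first k edges, and an actor's dimension j is 1 if any of its edges lands in cluster j. The function `edge_cluster` is hypothetical, and the discriminative classifier trained on these features is not shown.

```python
import math
from collections import defaultdict


def edge_cluster(edges, k, iterations=20):
    """Sparse social dimensions via k-means on an edge-centric view (hypothetical sketch).

    Each undirected edge (u, v) is a vector with 1s at positions u and v.
    Returns a dict node -> list of 0/1 cluster memberships.
    """
    edges = list(edges)
    # Deterministic initialization: the first k edges seed the centroids.
    centroids = [{u: 1.0, v: 1.0} for u, v in edges[:k]]
    assign = [0] * len(edges)
    for _ in range(iterations):
        norms = [math.sqrt(sum(x * x for x in c.values())) or 1.0 for c in centroids]
        # Assignment step: highest cosine similarity (the constant edge norm sqrt(2)
        # is dropped because it does not change the argmax).
        for i, (u, v) in enumerate(edges):
            sims = [(c.get(u, 0.0) + c.get(v, 0.0)) / nrm for c, nrm in zip(centroids, norms)]
            assign[i] = max(range(len(sims)), key=sims.__getitem__)
        # Update step: each centroid becomes the mean of its member edge vectors.
        sums = [defaultdict(float) for _ in centroids]
        counts = [0] * len(centroids)
        for (u, v), j in zip(edges, assign):
            sums[j][u] += 1.0
            sums[j][v] += 1.0
            counts[j] += 1
        centroids = [
            {n: x / counts[j] for n, x in sums[j].items()} if counts[j] else centroids[j]
            for j in range(len(centroids))
        ]
    # A node belongs to every cluster that contains at least one of its edges.
    dims = defaultdict(lambda: [0] * len(centroids))
    for (u, v), j in zip(edges, assign):
        dims[u][j] = 1
        dims[v][j] = 1
    return dict(dims)
```

Clustering edges lets an actor belong to several groups at once while keeping the resulting feature matrix sparse, which is the property that motivates the edge-centric view.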
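The augmentation step of AGK can be sketched as follows, assuming that "share links to n items" means at least n shared items; the function name `augment_actor_graph` and the input format (actor-actor pairs plus actor-item pairs, e.g., user-track links) are hypothetical. Passing `n = float("inf")` adds no edges and returns the original actor network, and the homogeneous graph kernel applied afterwards is not shown.

```python
from collections import defaultdict
from itertools import combinations


def augment_actor_graph(actor_edges, actor_item_links, n):
    """Add an edge between two actors that share links to at least n items (hypothetical sketch).

    actor_edges: iterable of (actor, actor) pairs from the original network.
    actor_item_links: iterable of (actor, item) pairs.
    Returns the augmented set of undirected actor-actor edges as sorted tuples.
    """
    edges = {tuple(sorted(e)) for e in actor_edges}
    # Invert the actor-item links: for each item, the set of actors linked to it.
    actors_by_item = defaultdict(set)
    for actor, item in actor_item_links:
        actors_by_item[item].add(actor)
    # Count shared items for every actor pair that co-occurs on at least one item.
    shared = defaultdict(int)
    for actors in actors_by_item.values():
        for a, b in combinations(sorted(actors), 2):
            shared[(a, b)] += 1
    edges.update(pair for pair, count in shared.items() if count >= n)
    return edges
```

Counting shared items through the inverted actor-item index avoids comparing all actor pairs, although an item linked to many actors still contributes a quadratic number of pairs.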
