4 Experiments

4.1 The Dataset and Experimental Setting

This paper uses DBLP to evaluate how well the inferred influence network predicts who will adopt a new topic once it emerges. The experiment examines publications in the top 20 computer science conferences across four areas,[1] over two periods, T0 = [1991, 2000] and T1 = [2001, 2010]. Topic terms are extracted from paper titles and abstracts, and filtered by the machine-learning-related keyword list on the Microsoft Academic website.[2] A total of 111 machine-learning-related topics first introduced in period T0 are selected for training, and an additional 57 topics from T1 are selected for testing; every term appears in more than 10 papers. For each topic, the time an author first adopted it is recorded. An author is treated as not adopting a topic if he or she never adopted it, or adopted it beyond the monitored period. As mentioned in Section 2.2, eight social connection features are extracted from period T0. Together with topic popularity (TP), the feature vector therefore contains nine features: CI, CA, CV, CSA, CT, CICI, CIS, SCI, TP.
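The construction of topic adoption records described above can be sketched as follows. The data layout (year, author list, term list) and the helper name are illustrative assumptions, not the paper's actual pipeline:

```python
from collections import defaultdict

def first_adoption_times(papers, topics, t_end):
    """Record the first year each author used each topic term.

    `papers` is a list of (year, authors, terms) tuples; `topics` is the
    filtered keyword set. First uses after `t_end` (the end of the
    monitored period) are dropped, so those authors count as non-adopters.
    """
    first = defaultdict(dict)  # topic -> {author: first adoption year}
    for year, authors, terms in papers:
        for topic in topics & set(terms):
            for a in authors:
                prev = first[topic].get(a)
                if prev is None or year < prev:
                    first[topic][a] = year
    # drop adoptions outside the monitored window
    return {t: {a: y for a, y in d.items() if y <= t_end}
            for t, d in first.items()}

# hypothetical toy records, not DBLP data
papers = [
    (1995, ["S"], ["svm"]),
    (1998, ["M", "S"], ["svm", "boosting"]),
    (2003, ["M"], ["boosting"]),
]
cascades = first_adoption_times(papers, {"svm", "boosting"}, t_end=2000)
```

Each resulting per-topic dictionary is one adoption cascade of the kind used to train the influence network.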

The experiment selects 196 authors who published at least one paper in these conferences in both periods as low-productivity authors (Low Pub Set), and another set of 47 authors who published at least three papers in both periods as high-productivity authors (High Pub Set). The Low Pub Set is a superset of the High Pub Set. These authors are selected because they are active in both periods, so the relationships among them are relatively stable.
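Selecting the two author sets amounts to counting each author's papers in both periods; a minimal sketch, with a hypothetical input format (one author list per paper):

```python
from collections import Counter

def select_active_authors(papers_t0, papers_t1, min_papers):
    """Keep authors with at least `min_papers` papers in BOTH periods.

    Each `papers_*` argument is a list of author lists, one per paper.
    With min_papers=1 this yields the Low Pub Set; with min_papers=3,
    the High Pub Set, a subset of the Low Pub Set by construction.
    """
    c0 = Counter(a for authors in papers_t0 for a in authors)
    c1 = Counter(a for authors in papers_t1 for a in authors)
    return {a for a in c0
            if c0[a] >= min_papers and c1.get(a, 0) >= min_papers}
```

Because the threshold is applied to both periods at once, raising `min_papers` can only shrink the result, which is why the High Pub Set is contained in the Low Pub Set.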

4.2 Prediction Study

Each algorithm uses the 111 topic cascades that appeared in T0 as the training set to infer the influence network, and the 57 terms from T1 for testing. For each term, the authors who adopted the topic earliest are labeled as the initial authors. To predict who will follow the topic, both algorithms return adoption probabilities for all authors. Performance is then measured by comparing all authors' adoption probabilities (excluding the initial adopters) against the actual adopters, using Mean Average Precision (MAP) and Area Under the ROC Curve (AUC).
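The two metrics can be computed per topic as below. This is a standard formulation of average precision and rank-based AUC, not code from the paper; MAP is then the mean of the per-topic average precision over the 57 test topics:

```python
def average_precision(scores, adopters):
    """Average precision of ranked adoption probabilities.

    `scores` maps author -> predicted adoption probability (initial
    adopters already excluded); `adopters` is the set of authors who
    actually adopted the topic in the test period.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits, ap = 0, 0.0
    for i, a in enumerate(ranked, start=1):
        if a in adopters:
            hits += 1
            ap += hits / i  # precision at each correctly ranked adopter
    return ap / hits if hits else 0.0

def auc(scores, adopters):
    """AUC = probability that a random adopter outranks a random non-adopter."""
    pos = [scores[a] for a in scores if a in adopters]
    neg = [scores[a] for a in scores if a not in adopters]
    if not pos or not neg:
        return 0.0
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Note that a random ranking yields an AUC near 0.5 and a MAP near the fraction of adopters, which is why "% Authors Adopt" is the natural baseline for the MAP numbers in Table 1.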

Table 1 shows the average MAP and AUC achieved by NetInf* and HetNetInf on the tested topics. A high MAP value means that the authors who actually adopted a topic are also predicted with high probability. In the "High Pub Set" column under MAP, "% Authors Adopt = 0.0621" means that roughly only 1 out of 16 authors follows a given topic. NetInf* achieves a MAP of 0.1649, while HetNetInf increases it to 0.2121. Both are significantly higher than 0.0621, and HetNetInf outperforms NetInf*. These numbers are, however, not close to 1.0, which would indicate perfect prediction; the reason is that each past cascade covers less than 10% of the authors. The MAP results are worse for the "Low Pub Set" because the same number of cascades must explain more authors.

In terms of AUC, HetNetInf also performs better than NetInf* on both sets. Interestingly, both algorithms achieve better AUC on the "Low Pub Set" than on the "High Pub Set". The reason is that, as the network grows, the percentage of authors adopting a topic decreases, but the absolute number of adopters increases. The larger number of positive samples makes the true positive rate rise more smoothly, which improves the AUC.

Table 1. Topic Adoption Prediction Performance

                 |             MAP             |             AUC
                 | High Pub Set | Low Pub Set  | High Pub Set | Low Pub Set
-----------------+--------------+--------------+--------------+-------------
NetInf*          |    0.1649    |    0.0970    |    0.5624    |    0.6333
HetNetInf        |    0.2121    |    0.1044    |    0.6188    |    0.6376
% Authors Adopt  |    0.0621    |    0.0305    |      -       |      -

Fig. 2 and Fig. 3 show the MAP for each tested topic on the "High Pub Set" and the "Low Pub Set", respectively. As the figures show, the result varies from topic to topic. The topic "gene expression data" is an example of how social connections help predict the adoption of a novel topic. In this case, authors S and M published on the topic in 2003 and 2007, respectively. In T0, M followed S only once across the 111 topics. Compared to other authors who followed S many times, the influence from S to M is therefore relatively weak in NetInf*. However, there is a very strong social connection between the two, which increases the probability of M following S under HetNetInf. For this topic, HetNetInf achieves MAP = 1, while NetInf* achieves MAP = 0.026.
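The intuition behind this example can be illustrated with a toy logistic combination of the two signals. This is not the paper's actual HetNetInf model, only a sketch of how a strong social-connection feature can offset a weak cascade-based influence estimate:

```python
import math

def adoption_probability(cascade_influence, social_features, beta):
    """Toy illustration (not HetNetInf's exact formulation): combine a
    cascade-based influence score with beta-weighted social-connection
    features through a logistic function, so a strong social tie can
    raise the predicted adoption probability even when past
    co-adoptions are rare.
    """
    z = cascade_influence + sum(b * f for b, f in zip(beta, social_features))
    return 1.0 / (1.0 + math.exp(-z))

# M followed S only once among 111 cascades -> weak influence signal;
# a strong social connection (feature value 1.0 vs 0.0) compensates.
weak = adoption_probability(-2.0, [0.0], [3.0])
strong = adoption_probability(-2.0, [1.0], [3.0])
```

Under this sketch, the same weak cascade signal yields a much higher adoption probability once the social-connection feature is switched on, mirroring how HetNetInf boosts the S-to-M edge.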

Fig. 2. Prediction Performance (High Pub Set)

Fig. 3. Prediction Performance (Low Pub Set)

HetNetInf also provides information about each author's preference (βi) over the factors that drive the adoption process. For example, independent researchers may tend to discover topics by themselves, while other researchers move into a new topic under the influence of their research community. Table 2 shows the normalized feature weights of two authors with very different following behaviors: author X's adoption is mostly affected by other authors' work in the same area, while author Y is mostly affected by topic popularity.

Table 2. Individual Feature Weights

Author |   CI   |   CA   |   CV   |  CSA   |   CT   |  CICI  |  CIS   |  SCI   |   TP
-------+--------+--------+--------+--------+--------+--------+--------+--------+--------
X      | 0.4471 | 0.6248 | 0.0103 | 0.6396 | 0.0104 | 0.0103 | 0.0102 | 0.0102 | 0.0103
Y      | 0.2686 | 0.2514 | 0.2548 | 0.2654 | 0.3155 | 0.2540 | 0.2897 | 0.2517 | 0.6465

Fig. 4 and Fig. 5 further show the normalized feature weight distributions. No single feature dominates the adoption process for all authors, and the weight distribution varies significantly from author to author. This validates the approach of HetNetInf, which learns a different βi for each author.
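One plausible way to make weight profiles comparable across authors, as in Table 2, is to rescale each author's learned vector βi to unit L2 norm. The paper does not state which normalization it uses, so this is only an assumption for illustration:

```python
import math

def l2_normalize(beta):
    """Rescale a weight vector to unit L2 norm so that weight profiles
    can be compared across authors (one plausible normalization; the
    paper does not spell out which one it applies)."""
    norm = math.sqrt(sum(b * b for b in beta))
    return [b / norm for b in beta] if norm > 0 else list(beta)
```

After this rescaling, the relative magnitudes within each row are what matter, e.g. author X's mass concentrating on CI, CA, and CSA versus author Y's on TP.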

  • [1] Data mining: KDD, PKDD, ICDM, SDM, PAKDD; Database: SIGMOD Conference, VLDB, ICDE, PODS, EDBT; Information Retrieval: SIGIR, ECIR, ACL, WWW, CIKM; and Machine Learning: NIPS, ICML, ECML, AAAI, IJCAI
  • [2] academic.research.microsoft.com
 