Traditional network inference algorithm such as NetInf [2] uses the direct observation i.e., adoption time difference to estimate the influence probability, where the infection probability for each link (ij) on cascade c is calculated as:

where Δij is the adoption time difference between i and j, and α is set to 1 according to [2]. Based on that, NetInf selects links which contribute mostly on infection probability over all cascades with a greedy algorithm. However, the purpose of NetInf is only to infer the most likely influence network based on past cascades. To do the prediction on future topic, NetInf* is developed as an extension based on NetInf that it estimates the influence probability for all links as:

where P rij is the probability i will adopt after j for any topic, and the transition probability matrix can be set accordingly.

3.2 HetNetInf

The second algorithm HetNetInf uses the indirect observation, i.e., social connections and topic popularity to generate the influence probability. Every author can be infected (adopt a topic) by her infected neighbor with a probability. This neighbor could be the actual peer with social connections, or a virtual author which represents the topic popularity. So the problem becomes how to use the social connection features and topic popularity to represent the infection probability for each author-pair, and to reflect the time order they adopt the topic. Given authors i and j, author i is infected at time ti and j is infected at time tj (ti > tj ). Let Xij be the feature vector between author i and j, and Xij = [Xij,f 1Xij,f 2...Xij,f N Xpt]. Xij,f k is the kth social connection feature and Xpt is the popularity of topic t.

if j is the virtual author:Xij = [0...0Xpt]

if j is a normal author:Xij = [Xij,f 1Xij,f 2...Xij,f N 0]

The probability author j infects author i in topic cascade c is defined as:

where Xij is the feature vector between authors i and j; and β > 0 is the weight which transfers the features to the influence probability.

For every infected neighbor of i, they all have chances to infect author i. Then the likelihood of infecting author i in cascade c is:

where K(i) is the neighbor set of author i, I is the indicate function, and T is the observation period. If ti ≥ T , this means author i is not infected in the observation window and the Lc is interpreted as the probability i is not infected by any neighbor; otherwise, Lc stands for the probability i is infected.

Two concerns remain. Firstly, it is unclear whether one should use the same parameter β for all authors, which assumes the same adoption behavior, i.e., weight distribution for all authors. So, each author has its own βi, and HetNetInf runs the optimization for each author independently. Equation (3) is updated as:

For cascade set C, the likelihood of author i being infected for all cascades c ∈ C is:

Substituting (4) in the (6), the log likelihood is:

Secondly, because the influence network is sparse, β need to be penalized to have fewer links which is same to have some links with very small link weight. Take it into consideration, the optimization is to minimize following function:

The ridge regularization term of βi 2 is the penalty for the influence network sparsity. For each author i, ten optimizations with random initial points will be run to find the local optimals, and the best one is selected.

Found a mistake? Please highlight the word and press Shift + Enter