A social network can be considered as a heterogeneous network of multiple types of objects and links. Formally, we define a social network as follows.

Definition 1. Heterogeneous Social Network. A heterogeneous social network with multiple types of objects and links is represented by a graph G = (V, E), in which V = V1 ∪ V2 ∪ ... ∪ Vm is the vertex set, where Vp denotes the set of vertices of type p; and E = ∪{Epq | 1 ≤ p, q ≤ m} is the edge set, where Epq = {(x, y) | x ∈ Vp, y ∈ Vq} is the set of edges between objects of types p and q.
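For concreteness, Definition 1 can be sketched as a small data structure with typed vertex and edge sets; the class and identifiers below are illustrative, not from the paper.

```python
from collections import defaultdict

class HeteroGraph:
    """Heterogeneous graph G = (V, E): V is partitioned into typed vertex
    sets V_p, and E into typed edge sets E_pq between types p and q."""

    def __init__(self):
        self.vertices = defaultdict(set)   # type p -> V_p
        self.edges = defaultdict(set)      # (p, q) -> E_pq

    def add_vertex(self, p, v):
        self.vertices[p].add(v)

    def add_edge(self, p, x, q, y):
        # An edge (x, y) in E_pq implies x in V_p and y in V_q.
        self.add_vertex(p, x)
        self.add_vertex(q, y)
        self.edges[(p, q)].add((x, y))

# Hypothetical toy network: actors linked to groups.
g = HeteroGraph()
g.add_edge("actor", "alice", "group", "chess")
g.add_edge("actor", "bob", "group", "chess")
```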

We define the transition probability from a node x to each of its neighbors to be inversely proportional to the number of x's neighbors, and allow a walk to stop at a node with a certain stopping probability. Let Tpq be the transition probability matrix between nodes in Vp and Vq (note that Tpq ≠ Tqp), and let Spq be the stopping probability matrix of nodes in Vq. Let A ∈ {V1, V2, ..., Vm} be the set of actors in the network. We formally describe our problem as follows: given a social network in which each actor x ∈ A belongs to one category in C = {c1, c2, ..., cN}, and a subset AL of labeled actors, our task is to complete the labeling for the unlabeled actors in AU = A − AL.

3 Heterogeneous Graph Kernel

3.1 Kernel Function

We consider two objects to be similar if they occur in similar contexts. We generalize the marginalized kernel [14], which has been used for computing the similarity between two objects in a homogeneous network, to the heterogeneous network setting as follows.

Let x be a node in a heterogeneous network G, and let hx = x − x1 − x2 − ... − xl be a random walk of length l starting from x. The probability of hx is defined as P(hx) = Pt(x1|x) Pt(x2|x1) ... Pt(xl|xl−1) Ps(xl), where Ps is the stopping probability and Pt is the transition probability.
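As a small illustration, the probability of a concrete walk is the product of the one-step transition probabilities along its edges times the stopping probability at the final node; the transition values below are hypothetical.

```python
def walk_probability(walk, Pt, rho):
    """P(h) for a walk [x, x1, ..., xl]: the product of one-step
    transition probabilities Pt(xi | xi-1), times the stopping
    probability rho at the last node."""
    p = 1.0
    for a, b in zip(walk, walk[1:]):
        p *= Pt[(a, b)]
    return p * rho

# Hypothetical transition probabilities for a tiny graph.
Pt = {("x", "x1"): 0.4, ("x1", "x2"): 0.8}
p = walk_probability(["x", "x1", "x2"], Pt, rho=0.1)  # 0.4 * 0.8 * 0.1
```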

We assume that the stopping probability for all nodes of all types is Ps = ρ (0 < ρ < 1). Suppose that xi−1 is an object of type p, and let Nq(xi−1) be the set of its neighbors of type q. Let wpq be the transition probability weight (TPW) from type p to type q, where Σ_{q=1}^{m} wpq = 1. Then

Σ_{xi ∈ Nq(xi−1)} Pt(xi|xi−1) = (1 − ρ) wpq.

Transition probabilities from xi−1 to its neighbors xi of type q are assumed to be equal, i.e.,

Pt(xi|xi−1) = (1 − ρ) wpq / |Nq(xi−1)|.

We define the linking preferences of type p to be proportional to the TPWs of type p, i.e., wp1 : wp2 : ... : wpm. In the absence of prior knowledge, we assume that the wpq are equal for all q, 1 ≤ q ≤ m.
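A transition matrix Tpq satisfying Pt(xi|xi−1) = (1 − ρ) wpq / |Nq(xi−1)| can be built from a typed adjacency matrix; a minimal sketch, assuming a binary adjacency matrix and hypothetical values for ρ and wpq:

```python
import numpy as np

def transition_matrix(A_pq, w_pq, rho):
    """Transition matrix T_pq from a binary adjacency matrix A_pq
    (rows: type-p nodes, columns: type-q nodes). Each row is divided
    by its neighbor count |Nq(xi-1)| and scaled by (1 - rho) * w_pq,
    so every nonzero row sums to (1 - rho) * w_pq."""
    deg = A_pq.sum(axis=1, keepdims=True).astype(float)
    deg[deg == 0] = 1.0   # guard against isolated nodes
    return (1.0 - rho) * w_pq * A_pq / deg

# Hypothetical 2x3 bipartite adjacency between types p and q,
# with a single neighbor type (w_pq = 1) and rho = 0.2.
A = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=float)
T = transition_matrix(A, w_pq=1.0, rho=0.2)
# Each row sums to (1 - 0.2) * 1.0 = 0.8.
```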

We define the kernel-induced similarity between two objects x and y of type p as

Kp(x, y) = Σ_{hx} Σ_{hy} P(hx) P(hy) Rh(hx, hy),

where Rh(hx, hy), the similarity between two paths hx and hy, is equal to 0 if they are of different lengths; otherwise, Rh(hx, hy) = Π_{i=1}^{l} R0(xi, yi).
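Under this definition, the path similarity Rh reduces to a product of vertex-level similarities over aligned walk positions; a small sketch, where the R0 function used is hypothetical:

```python
def path_similarity(hx, hy, R0):
    """Rh(hx, hy): 0 if the two walks differ in length, otherwise the
    product of R0(xi, yi) over aligned positions after the start nodes."""
    if len(hx) != len(hy):
        return 0.0
    r = 1.0
    for xi, yi in zip(hx[1:], hy[1:]):   # compare x1..xl with y1..yl
        r *= R0(xi, yi)
    return r

# Hypothetical constant vertex similarity for illustration.
R0 = lambda a, b: 0.5
s = path_similarity(["x", "x1", "x2"], ["y", "y1", "y2"], R0)  # 0.5 * 0.5
```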

If xi and yi are of the same type (say p), then R0(xi, yi) = Rp(xi, yi), where Rp(xi, yi) is defined using the Jaccard similarity coefficient on the sets of direct neighbors of xi and yi:

Rp(xi, yi) = (1/m) Σ_{q=1}^{m} |Nq(xi) ∩ Nq(yi)| / |Nq(xi) ∪ Nq(yi)|,

where Nq(xi) and Nq(yi) are the neighbors of type q of objects xi and yi, respectively.

Otherwise, R0 (xi, yi) = 0.

3.2 Efficient Computation

Computing the kernel value between two nodes by generating random walks starting from the two nodes is computationally expensive. We adapt a technique introduced in [14, 17] for efficient computation of the random walk graph kernel in the case of homogeneous networks to the heterogeneous network setting as follows.

The kernel matrix for type p is obtained by accumulating the contributions of walks of increasing length, Kp = Σ_{l=1}^{∞} Rp^l, where Rp^l is recursively defined in matrix form as

Rp^l = Σ_{q=1}^{m} Tpq (Rq^0 ◦ Rq^{l−1}) Tpq^T (l > 1)  and  Rp^1 = Σ_{q=1}^{m} (Tpq ◦ Spq) Rq^0 (Tpq ◦ Spq)^T, (4)

where Tpq^T is the transpose of Tpq and "◦" is the Hadamard product (see Appendix A and B for the formulation and convergence proof of (4), respectively). Let d = max(|V1|, ..., |Vm|); then the time for computing the kernel matrix corresponding to random walks of length 1 (i.e., Rp^1) is O(|Vp| d^2). The time for computing the kernel matrix Rp^l (l > 1) is O(d^3) per walk length. As a result, the time complexity for computing the kernel matrix for type p is O(l d^3).
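The recursion can be sketched in NumPy for the degenerate case of a single node type (m = 1, so p = q), assuming it reads R^1 = (T ◦ S) R^0 (T ◦ S)^T and R^l = T (R^0 ◦ R^{l−1}) T^T, with the Hadamard product written as elementwise `*`; the toy graph and parameter values are hypothetical.

```python
import numpy as np

def hetero_kernel(T, S, R0, L):
    """Kernel matrix for a single node type, accumulated over walk
    lengths 1..L: R^1 = (T*S) @ R0 @ (T*S).T, then
    R^l = T @ (R0 * R^{l-1}) @ T.T for l > 1, summed into K."""
    TS = T * S                        # Hadamard product T ◦ S
    R = TS @ R0 @ TS.T                # R^1
    K = R.copy()
    for _ in range(L - 1):
        R = T @ (R0 * R) @ T.T        # one O(d^3) step per walk length
        K += R
    return K

# Tiny hypothetical 3-node graph, stopping probability rho = 0.2.
n, rho = 3, 0.2
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
T = (1 - rho) * A / A.sum(axis=1, keepdims=True)
S = np.full((n, n), rho)
R0 = np.eye(n)                        # hypothetical base similarity
K = hetero_kernel(T, S, R0, L=10)
```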

3.3 Learning to Label Actors in Social Network

We first compute the kernel matrix that captures the pairwise similarity between actors and normalize it to obtain K̂p(x, y) = Kp(x, y) / √(Kp(x, x) Kp(y, y)). We then train a support vector machine (SVM^{[1]}) using the HGK for labeling actors in social networks^{[2]}.
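The normalization step can be sketched as cosine normalization of the kernel matrix; the resulting matrix has unit diagonal and can be passed to any SVM implementation that accepts precomputed kernels.

```python
import numpy as np

def normalize_kernel(K):
    """Cosine-normalize a kernel matrix:
    K_hat(x, y) = K(x, y) / sqrt(K(x, x) * K(y, y))."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

# Hypothetical 2x2 kernel matrix for illustration.
K = np.array([[4.0, 1.0],
              [1.0, 9.0]])
Kn = normalize_kernel(K)   # off-diagonal entry: 1 / (2 * 3)
```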