2 Background on the Generalized Dependency Network and Techniques for Learning the Network

As an alternative modeling tool to Bayesian networks or other graphical models based on directed acyclic graphs (DAGs) [5], a dependency network is a probabilistic model represented by a product of conditional probabilities p(xi|pa(xi)), where xi denotes a variable in the network and pa(xi) denotes the parents of xi. Consistent dependency networks are cyclic graphical models in which all the conditional probabilities can be derived from a single joint probability distribution. The consistent dependency network has not attracted much attention because of the restriction that the set of conditional distributions must be consistent with a joint probability distribution. Developed by a team of scientists at Microsoft Research [3], the GDN removes the consistency requirement. A (pseudo-)Gibbs sampler for inference was also proposed to reconcile inconsistency between conditional distributions. The GDN is especially useful for encoding predictive relationships, and its collective inference approach provides better predictive accuracy than inference based on individual conditional distributions.

To illustrate the idea, consider the simple (cyclic) GDN in Fig. 1, in which X, Y, and Z are variables. The parents of X, Y, and Z are, respectively, (Y, Z), X, and Y. Consider the following three regression models that are individually specified according to the GDN in Fig. 1: x = α0 + α1y + α2z + E1, y = β0 + β1x + E2, and z = γ0 + γ1y + E3, where E1, E2, and E3 represent Gaussian noise and α, β, γ denote regression coefficients. When empirical data from a data set are used to independently estimate the three submodels, the result may not lead to a proper joint distribution. It is argued that if a GDN is estimated from the same data set, which is assumed to be generated from a proper joint distribution, then the set of conditional distributions would be consistent when the sample size is large; in that case a generic Gibbs sampler would also accurately recover the joint distribution. By consistent (or compatible), we refer to the property that there exists a proper probability distribution from which each individual conditional distribution can be derived. The Gibbs sampler is a simulation-based estimation algorithm that iteratively draws a sample xi from each conditional distribution p(xi|pa(xi)). It can be proved that this simple scheme, regardless of where one starts, converges and yields samples drawn from the underlying joint distribution [6]. In Fig. 1, for example, the Gibbs sampler would iteratively sample from the following conditional distributions: p(x|y, z), p(z|y), and p(y|x). It is worth pointing out that when creating a GDN, there is no restriction on the predictive tool; e.g., a decision tree or neural network can be used instead of linear regression.
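The sampling scheme for Fig. 1 can be sketched in a few lines. The coefficients and noise scale below are illustrative assumptions (not fitted values); in practice each conditional would be estimated from data, possibly by a decision tree or neural network rather than linear regression.

```python
import random

# Assumed coefficients for the three conditional models of Fig. 1:
# x = a0 + a1*y + a2*z + E1,  y = b0 + b1*x + E2,  z = g0 + g1*y + E3
a0, a1, a2 = 0.5, 0.3, 0.2
b0, b1 = 1.0, 0.4
g0, g1 = -0.5, 0.6
SIGMA = 1.0  # common noise standard deviation (assumption)

def gibbs_sample(n_iter=5000, burn_in=1000, seed=42):
    """Iteratively draw from p(x|y,z), p(z|y), p(y|x), as in Fig. 1."""
    rng = random.Random(seed)
    x, y, z = 0.0, 0.0, 0.0  # arbitrary starting point
    samples = []
    for t in range(n_iter):
        x = rng.gauss(a0 + a1 * y + a2 * z, SIGMA)  # draw from p(x|y,z)
        z = rng.gauss(g0 + g1 * y, SIGMA)           # draw from p(z|y)
        y = rng.gauss(b0 + b1 * x, SIGMA)           # draw from p(y|x)
        if t >= burn_in:                            # discard burn-in draws
            samples.append((x, y, z))
    return samples

samples = gibbs_sample()
```

After burn-in, the retained triples approximate draws from the joint distribution implied by the three conditionals, regardless of the starting point.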

Fig. 1. A simple generalized dependency network

The system-subsystem dependency network we propose in this paper is even more general; we extend "variable" in the GDN to "subsystem". Without resorting to formality, we illustrate the approach using a simple two-subsystem example from an aging study. Assume that there are two subsystems, S1 of self-efficacy about functions and S2 of physical performance as measured objectively by several variables, and that there is a single measure, indicated by variable X0, that is common to both subsystems (Fig. 2). Furthermore, assume that there exist two distinct data sets D1 and D2 that are respectively associated with the two subsystems. Following the GDN approach for variables, the system-subsystem GDN proceeds as follows: (1) learn a GDN for subsystem S1 using data set D1; likewise, learn (independently) a GDN for subsystem S2 using data set D2; (2) apply a system-level pseudo-Gibbs sampler p(S1|S2), p(S2|S1), defined by the following implementation, which we call the ordered pseudo-Gibbs sampler: (a) for p(S1|S2), draw a sample from the three conditional distributions p(X1|X0, X2), p(X2|X0, X1), and then p(X0|X1, X2); and (b) for p(S2|S1), draw samples from p(X3|X0, X4, X5), p(X4|X0, X3, X5), p(X5|X0, X3, X4), and then p(X0|X3, X4, X5); (3) iterate; (4) when the ordered pseudo-Gibbs sampler reaches convergence, collect the sample X* = (X0*, X1*, X2*, X3*, X4*, X5*) for inference for the entire system.
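The ordered pseudo-Gibbs sampler for the two-subsystem example can be sketched as follows. Each conditional is represented here by a linear-Gaussian draw with assumed coefficients; in the actual procedure, the conditionals would come from the GDNs learned separately on D1 and D2.

```python
import random

rng = random.Random(0)

def draw(mean, sd=1.0):
    """Draw from a Gaussian conditional with the given mean (assumption)."""
    return rng.gauss(mean, sd)

def ordered_pseudo_gibbs(n_iter=2000, burn_in=500):
    """One ordered scan per iteration: subsystem S1, then subsystem S2.

    x[0] is the shared variable X0; x[1], x[2] belong to S1 and
    x[3], x[4], x[5] to S2. All coefficients are illustrative.
    """
    x = [0.0] * 6
    samples = []
    for t in range(n_iter):
        # (a) p(S1|S2): scan X1, X2, then the shared X0 given S1
        x[1] = draw(0.2 * x[0] + 0.1 * x[2])                # p(X1|X0,X2)
        x[2] = draw(0.1 * x[0] + 0.2 * x[1])                # p(X2|X0,X1)
        x[0] = draw(0.3 * x[1] + 0.3 * x[2])                # p(X0|X1,X2)
        # (b) p(S2|S1): scan X3, X4, X5, then the shared X0 given S2
        x[3] = draw(0.2 * x[0] + 0.1 * x[4] + 0.1 * x[5])   # p(X3|X0,X4,X5)
        x[4] = draw(0.1 * x[0] + 0.2 * x[3] + 0.1 * x[5])   # p(X4|X0,X3,X5)
        x[5] = draw(0.1 * x[0] + 0.1 * x[3] + 0.2 * x[4])   # p(X5|X0,X3,X4)
        x[0] = draw(0.2 * x[3] + 0.2 * x[4] + 0.2 * x[5])   # p(X0|X3,X4,X5)
        if t >= burn_in:
            samples.append(tuple(x))  # collect X* = (X0*, ..., X5*)
    return samples
```

Note that the shared variable X0 is resampled twice per iteration, once from each subsystem's conditional; this is precisely where incompatibility between the independently learned GDNs can surface.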

Fig. 2. Two subsystems with a common variable

The implementation of the ordered pseudo-Gibbs sampler for system-subsystem inference described above is an extension of the fixed-scan procedure in the GDN for potentially incompatible distributions. Fixed-scan Gibbs samplers follow a prefixed order of sampling; e.g., for three variables, X1 → X2 → X3 → X1 and so on [6]. However, the fixed-scan pseudo-Gibbs sampler, which is commonly used in Bayesian estimation, is not optimal for potentially incompatible conditional distributions, in the sense that the conditional distributions derived from the inferred joint distribution may have large error variances [7,8]. It has been shown that by exploiting the full range of possible scan orders, either through a technique called the Gibbs ensemble or by randomizing the scan order during sampling [9] in the pseudo-Gibbs procedure, one can substantially reduce the level of incompatibility as measured by error variances.
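The random-scan variant differs from the fixed-scan sampler only in that the visit order is reshuffled on every sweep. A minimal sketch, assuming the conditionals are supplied as callables (the toy linear-Gaussian conditionals below are illustrative, not fitted models):

```python
import random

def random_scan_sweep(x, conditionals, rng):
    """One sweep visiting variables in a randomized order.

    `conditionals` maps a variable index to a function that draws a new
    value for that variable given the current state; any fitted
    conditional model could be plugged in here.
    """
    order = list(conditionals.keys())
    rng.shuffle(order)  # randomize the scan order, per [9]
    for i in order:
        x[i] = conditionals[i](x)
    return x

# Toy usage with three assumed linear-Gaussian conditionals
rng = random.Random(1)
conds = {
    0: lambda x: rng.gauss(0.3 * x[1] + 0.2 * x[2], 1.0),  # p(X0|X1,X2)
    1: lambda x: rng.gauss(0.4 * x[0], 1.0),               # p(X1|X0)
    2: lambda x: rng.gauss(0.6 * x[1], 1.0),               # p(X2|X1)
}
state = [0.0, 0.0, 0.0]
for _ in range(1000):
    state = random_scan_sweep(state, conds, rng)
```

Averaging over many scan orders in this way is what allows the pseudo-Gibbs procedure to dampen the order-dependence that fixed scans exhibit under incompatible conditionals.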