The GDN promises to become an important tool set for system science because it is highly flexible, convenient to implement, and able to integrate heterogeneous data sources. One contribution of this paper is to extend the concept of the GDN from individual variables to subsystems. The current standard method for learning the joint distribution and making subsequent inferences is the pseudo Gibbs sampler, which we also adopted in this work and applied to a system of component networks, or subsystems. One surprising finding, from both this work and previous work, is that the usual fixed-scan Gibbs sampling method does not work well. Analogous to the local optima problem in optimization, a fixed-scan Gibbs sampler applied to potentially incompatible conditional distributions may become trapped in suboptimal paths; as a result, it either fails to converge or converges to a biased solution or to solutions with high variance. The random scan method offers a relatively simple and quick fix. As a second contribution, we offer further evidence that the pseudo Gibbs sampler can be improved when random scan is applied at the subsystem level. A third contribution of the paper is an illustration of using the GDN to integrate different data sources, as demonstrated in the joint analysis of the HABC and the LookAHEAD data sets.
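The dependence on scan order can be seen in a toy setting. The sketch below is our own illustration, not the sampler used in this paper: it assumes two hypothetical Gaussian conditionals, x | y ~ N(a·y, 1) and y | x ~ N(b·x, 1), which are compatible (some joint distribution has both) only when a = b. The function names `pseudo_gibbs` and `sample_cov` are illustrative. When the conditionals are incompatible, the fixed-scan and random-scan chains settle on different joint distributions, so the answer depends on how the sampler visits the variables. The example does not by itself show that random scan is superior; it only shows why the scan scheme matters once the conditionals disagree.

```python
import random


def pseudo_gibbs(a, b, n_iter, scan="fixed", burn_in=1_000, seed=0):
    """Pseudo Gibbs sampler for the hypothetical conditionals
    x | y ~ N(a*y, 1) and y | x ~ N(b*x, 1).

    When a != b no joint distribution has both conditionals
    (they are incompatible), and the chain's stationary
    distribution depends on the scan order.
    """
    rng = random.Random(seed)
    x = y = 0.0
    samples = []
    for t in range(burn_in + n_iter):
        if scan == "fixed":
            # Fixed scan: update x, then y, in the same order every sweep.
            x = rng.gauss(a * y, 1.0)
            y = rng.gauss(b * x, 1.0)
        elif scan == "random":
            # Random scan: update one coordinate chosen uniformly at random.
            if rng.random() < 0.5:
                x = rng.gauss(a * y, 1.0)
            else:
                y = rng.gauss(b * x, 1.0)
        else:
            raise ValueError(f"unknown scan: {scan!r}")
        if t >= burn_in:
            samples.append((x, y))
    return samples


def sample_cov(samples):
    """Sample covariance of the recorded (x, y) pairs."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    return sum((x - mx) * (y - my) for x, y in samples) / n


if __name__ == "__main__":
    for a, b in [(0.5, 0.5), (0.8, 0.3)]:  # compatible, then incompatible
        fixed = sample_cov(pseudo_gibbs(a, b, 100_000, scan="fixed"))
        rand = sample_cov(pseudo_gibbs(a, b, 100_000, scan="random"))
        print(f"a={a}, b={b}: fixed-scan cov={fixed:.3f}, random-scan cov={rand:.3f}")
```

Solving the stationary-covariance equations of each chain gives the values the simulation should approach: for a = b = 0.5 both scans give a covariance of 2/3, the value implied by the compatible joint, whereas for a = 0.8, b = 0.3 the fixed scan settles near 0.52 and the random scan near 0.72.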

The intuition behind the system-subsystem GDN is that it is often easier to study the components of a system separately and subsequently integrate them. The integration process is typically “messy”, so approximate methods are needed, and the pseudo Gibbs sampler emerges as a powerful tool for this purpose. The idea of using the Gibbs sampler in a subsystem analysis is to allow information from one subsystem to “diffuse” to another through repeated draws from the common, or overlapping, part of the subsystems, so that potentially incongruent subsystems “reconcile” with each other. However, many challenges remain to be studied and solved.
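To make the “diffusion” idea concrete, here is a toy sketch under our own assumptions, not the implementation used in the paper. Two hypothetical subsystems, `sub_a` = {x1, x2} and `sub_b` = {x2, x3}, each contribute linear-Gaussian conditionals with an assumed coefficient of 0.6, specified as if fitted separately; the names `SUBSYSTEMS`, `subsystem_random_scan`, and `sample_cov` are illustrative. At each step one subsystem is chosen uniformly at random and all of its variables are redrawn. Variables x1 and x3 never appear in the same subsystem, yet because the overlapping variable x2 is redrawn by both, the sampler induces a nonzero covariance between them.

```python
import random

# Hypothetical subsystems. Each entry (variable, parent, coefficient) means
# variable | parent ~ N(coefficient * parent, 1).
# x1 and x3 never share a subsystem; x2 is the overlapping variable.
SUBSYSTEMS = {
    "sub_a": [("x1", "x2", 0.6), ("x2", "x1", 0.6)],
    "sub_b": [("x2", "x3", 0.6), ("x3", "x2", 0.6)],
}


def subsystem_random_scan(subsystems, n_iter, burn_in=1_000, seed=0):
    """Pseudo Gibbs sampler with random scan at the subsystem level:
    each step picks one subsystem uniformly at random and redraws every
    variable that subsystem models, in the listed order."""
    rng = random.Random(seed)
    names = list(subsystems)
    state = {var: 0.0 for conds in subsystems.values() for var, _, _ in conds}
    samples = []
    for t in range(burn_in + n_iter):
        for var, parent, coef in subsystems[rng.choice(names)]:
            state[var] = rng.gauss(coef * state[parent], 1.0)
        if t >= burn_in:
            samples.append(dict(state))  # snapshot of all variables
    return samples


def sample_cov(samples, u, v):
    """Sample covariance between variables u and v."""
    n = len(samples)
    mu = sum(s[u] for s in samples) / n
    mv = sum(s[v] for s in samples) / n
    return sum((s[u] - mu) * (s[v] - mv) for s in samples) / n


if __name__ == "__main__":
    draws = subsystem_random_scan(SUBSYSTEMS, 100_000)
    print(f"cov(x1, x3) = {sample_cov(draws, 'x1', 'x3'):.3f}")
```

With these assumed coefficients, solving the stationary-covariance equations of the two subsystem updates gives cov(x1, x3) of roughly 0.21, so the simulated value should land near that even though no single subsystem ever relates x1 to x3 directly.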

First, by restricting the relationships between variables to be bidirectional and thus associative, we have not exploited the full modeling strength of the GDN, which in general allows directed arcs in the corresponding graph. Causality is typically well understood only for a few well-studied pathways; in many cases only correlation can be readily established, resulting in a “picture” that contains a mixture of directed causal pathways and undirected relationships. The examples we used did not address possible causal relationships or mixtures of relationships. We also did not use variable selection methods to simplify potentially complex structures; in the pseudo Gibbs sampler, we used the full set of conditioning variables. However, this limitation, while affecting the interpretation of the structure, is unlikely to have any meaningful impact on the evaluation of the performance of the methods.

The second challenge has more to do with the inherent heterogeneity that typically exists across different data sets. In Analysis 2, we observed that the data from the LookAHEAD study in the second subsystem (S2) did not seem to affect the parameters of the variables in the first subsystem (S1). The information exchange between subsystems might be more meaningful if, for example, both diabetic and non-diabetic individuals were present in both studies. Heterogeneity of data from different sources is a significant issue that is beyond the scope of this paper and will require further research.

Acknowledgment. The study is supported by NIH grants 1R21AG042761-01 and 1U01HL101066-01 (PI: Ip).
