5 Discussion

The GDN shows promise as an important tool set for system science because it is highly flexible, convenient to implement, and can potentially integrate heterogeneous data sources. One contribution of this paper is to extend the concept of the GDN from the variable level to the subsystem level. The current standard method for learning the joint distribution and making subsequent inferences is the pseudo Gibbs sampler, which we applied in this work to a system of component networks, or subsystems. One surprising finding, both from this work and from previous work, is that the usual fixed scan Gibbs sampling method does not work well. Analogous to the local optima problem in optimization, the fixed scan Gibbs sampler for potentially incompatible conditional distributions may become trapped in suboptimal paths; as a result, it either fails to converge or converges to a biased solution or to solutions of high variance. The random scan method offers a relatively simple and quick fix. A second contribution of this paper is further evidence that the pseudo Gibbs sampler can be enhanced when random scan is applied at the subsystem level. A third contribution is the illustration of how the GDN can integrate different data sources, as demonstrated in the joint analysis of the HABC and the LookAHEAD data sets.
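
The difference between fixed scan and random scan can be illustrated with a minimal sketch. It is not the implementation used in this work: the two binary variables, the conditional probability values, and the function names `update` and `pseudo_gibbs` are all assumptions chosen for illustration. The two conditionals are deliberately incompatible (no single joint distribution produces both), so the two fixed scan orders may settle on visibly different joint distributions, while random scan mixes over the update orders.

```python
import random
from collections import Counter

# Hypothetical, deliberately incompatible conditionals (illustrative numbers only).
# P_X1_GIVEN_X2[x2] = P(x1 = 1 | x2); P_X2_GIVEN_X1[x1] = P(x2 = 1 | x1).
P_X1_GIVEN_X2 = {0: 0.2, 1: 0.7}
P_X2_GIVEN_X1 = {0: 0.6, 1: 0.3}

STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def update(state, i, rng):
    """Redraw variable i from its conditional given the other variable."""
    x1, x2 = state
    if i == 0:
        x1 = int(rng.random() < P_X1_GIVEN_X2[x2])
    else:
        x2 = int(rng.random() < P_X2_GIVEN_X1[x1])
    return (x1, x2)


def pseudo_gibbs(scan, n_sweeps=20000, burn_in=1000, seed=0):
    """Return the empirical joint distribution recorded at the end of each sweep.

    scan is either a fixed update order such as (0, 1) or (1, 0),
    or the string "random", which picks each variable to update uniformly at random.
    """
    rng = random.Random(seed)
    state = (0, 0)
    counts = Counter()
    for sweep in range(n_sweeps + burn_in):
        if scan == "random":
            order = [rng.randrange(2) for _ in range(2)]
        else:
            order = list(scan)
        for i in order:
            state = update(state, i, rng)
        if sweep >= burn_in:
            counts[state] += 1
    return {s: counts[s] / n_sweeps for s in STATES}


if __name__ == "__main__":
    for scan in [(0, 1), (1, 0), "random"]:
        print(scan, pseudo_gibbs(scan))
```

Comparing the output for the orders `(0, 1)` and `(1, 0)` shows how strongly the fixed scan result depends on an arbitrary ordering choice when the conditionals are incompatible.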

The intuition behind the system-subsystem GDN is that it is often easier to study the components of a system and subsequently integrate them. The process of integration is typically “messy,” so approximate methods are needed, and the pseudo Gibbs sampler emerges as a powerful tool for this purpose. The idea of using the Gibbs sampler in a subsystem analysis is to allow information from one subsystem to “diffuse” to another through repeated draws of samples from the common, or overlapping, part of the subsystems, so that potentially incongruent subsystems “reconcile” with each other. Many challenges, however, remain to be studied and solved.
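
The diffusion idea can be sketched in a toy setting. This is one possible reading of subsystem-level random scan, not the procedure used in this work. The variables `a`, `b`, `c`, the probability tables, and the functions `make_subsystems` and `subsystem_random_scan` are hypothetical. Subsystem S1 models `a` and `b`, subsystem S2 models `b` and `c`, and `b` is the overlapping variable. At each iteration one subsystem is chosen at random and its variables are redrawn from that subsystem's own conditionals.

```python
import random


def make_subsystems(p_b_given_c):
    """Build two hypothetical subsystems that overlap on variable "b".

    Each subsystem maps a target variable to (parent, table), where
    table[parent_value] = P(target = 1 | parent = parent_value).
    """
    s1 = {"a": ("b", {0: 0.1, 1: 0.9}), "b": ("a", {0: 0.2, 1: 0.8})}
    s2 = {"c": ("b", {0: 0.3, 1: 0.7}), "b": ("c", p_b_given_c)}
    return [s1, s2]


def subsystem_random_scan(subsystems, n_iter=20000, burn_in=1000, seed=0):
    """Return the post-burn-in sample mean of each binary variable."""
    rng = random.Random(seed)
    state = {"a": 0, "b": 0, "c": 0}
    sums = {name: 0 for name in state}
    for it in range(n_iter + burn_in):
        # Random scan at the subsystem level: pick one subsystem per iteration.
        subsystem = rng.choice(subsystems)
        targets = list(subsystem)
        rng.shuffle(targets)  # random order within the chosen subsystem
        for target in targets:
            parent, table = subsystem[target]
            state[target] = int(rng.random() < table[state[parent]])
        if it >= burn_in:
            for name in state:
                sums[name] += state[name]
    return {name: sums[name] / n_iter for name in state}


if __name__ == "__main__":
    # Only S2's model for b changes; S1's conditionals stay the same.
    low = subsystem_random_scan(make_subsystems({0: 0.05, 1: 0.10}))
    high = subsystem_random_scan(make_subsystems({0: 0.90, 1: 0.95}))
    print("mean of a when S2 pushes b toward 0:", low["a"])
    print("mean of a when S2 pushes b toward 1:", high["a"])
```

Although `a` appears only in S1, its sample mean shifts when S2's model for `b` changes, because both subsystems repeatedly redraw the shared variable `b`.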

First, by restricting the relationships between variables to be bidirectional and thus associative, we have not exploited the full modeling strength of the GDN. The GDN in general allows directed arcs in the corresponding graph. Typically, causality is well understood for only a few well-studied pathways; in many cases only correlation can be readily established, resulting in a “picture” that contains a mixture of directed causal pathways and undirected relationships. The examples we used did not address possible causal relationships or mixtures of relationships. We also have not used variable selection methods for simplifying potentially complex structures; in the pseudo Gibbs sampler, we used the full set of conditioning variables. This limitation, while affecting interpretation of the structure, is not likely to have any meaningful impact on the evaluation of the performance of the methods.

The second challenge has more to do with the inherent heterogeneity that typically exists across different data sets. In Analysis 2, we observed that the data from the LookAHEAD study in the second subsystem (S2) did not seem to affect the parameters of the variables in the first subsystem (S1). The information exchange between subsystems might be more meaningful if, for example, both diabetic and non-diabetic individuals were present in both studies. Heterogeneity of data from different sources is a significant issue that is beyond the scope of this paper and will require further research.

Acknowledgment. The study is supported by NIH grants 1R21AG042761-01 and 1U01HL101066-01 (PI: Ip).
