
4 Analysis 2: Learning from Two Data Sets

The second example illustrates how the system-subsystem GDN approach could be used to learn networks from different data sources. In this example, besides some common variables (age, gender, BMI), we selected a few variables (lift index and narrow walking ratio) from the HABC study as well as a few variables (the physical component summary (PCS) and mental component summary (MCS)) from the Action For Health in Diabetes (LookAHEAD) study, a multi-site randomized clinical trial. We used the entire sample of HABC as in Analysis 1 and a sample of N=770 from the LookAHEAD trial in our analysis. Unlike the HABC sample, in which 18% of individuals have diabetes, the LookAHEAD sample contains only participants with diabetes. The LookAHEAD sample also represents a younger cohort than the HABC sample (50% below 71 years old in LookAHEAD versus 29% in HABC) as well as a more obese cohort (BMI > 30 in 78% of LookAHEAD versus 25% of HABC). Such differences in the samples arose from the design of the respective studies; e.g., only individuals who had diabetes and were overweight were eligible for LookAHEAD. As a result, the distributions of many of the variables considered in this analysis were not similar. We ran the methods described above using both data sets and then estimated the strengths of associations between various factors included in the system-subsystem GDN. Because the variables were all discretized, we followed a commonly used approach in bioinformatics and ordered the p-values of their associations. Then we used the magnitude of the p-values to indicate the strength of association.
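The idea of ranking associations between discretized variables by p-value can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which test produced the p-values, so a Pearson chi-square test of independence is assumed here, and the variable names and toy values are hypothetical rather than taken from HABC or LookAHEAD.

```python
import math
from collections import Counter
from itertools import combinations


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    t = x / 2.0
    # Start from a closed form, then apply Q(a+1, t) = Q(a, t) + t^a e^-t / Gamma(a+1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))
    while a < df / 2.0:
        q += math.exp(a * math.log(t) - t - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi2_independence(xs, ys):
    """Pearson chi-square test of independence (assumed test, not the authors')."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    df = (len(row) - 1) * (len(col) - 1)
    if df == 0:
        # A constant variable carries no information about association.
        return 0.0, 0, 1.0
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            stat += (joint[(r, c)] - expected) ** 2 / expected
    return stat, df, chi2_sf(stat, df)


def rank_associations(data):
    """Return (p_value, var_a, var_b) for every pair, smallest p-value first."""
    results = []
    for a, b in combinations(sorted(data), 2):
        _, _, p = chi2_independence(data[a], data[b])
        results.append((p, a, b))
    return sorted(results)


# Hypothetical discretized toy data (1 = yes, 0 = no); not study data.
toy = {
    "obese":    [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0],
    "diabetes": [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0],
    "gender":   [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
}

ranked = rank_associations(toy)
for p, a, b in ranked:
    # A smaller p-value is read as a stronger association.
    print(f"{a} ~ {b}: p = {p:.3g}")
```

In this toy setting the pair built to agree most often (obese and diabetes) sorts to the top. With real samples of different sizes, p-values also reflect sample size, so an ordering of this kind is best read as a rough indication of strength rather than an effect size.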

4.1 Results from Analysis 2

An inspection of the error rates across the four methods showed that there was less of a difference between the methods than in Analysis 1, although the overall trend still favored random scan methods. Because of limited space and given the evidence regarding the strength of the RR* method in the first analysis, here we only report results from the RR* method. Fig. 4 shows the relative strengths of the associations (bidirectional) between the variables within the two subsystems (indicated by boxes in Fig. 4). Diabetes status was included as a “common” variable because it was needed in both data sets to indicate the status of a participant. The strongest associations appear to be between the variable pairs BMI and diabetes, and lift index and gender (p < 10^-30). A moderate association exists between PCS and MCS in subsystem S2. However, neither PCS nor MCS appears to have a strong or moderate association with the other variables. We discuss this further in the Discussion section.

Fig. 4. Structure of two subsystems using data from two different sources
