Structural learning determines the dependence and independence of variables and suggests a direction of causation (or association), in other words, the position of the

Table 7.1b: Posterior probability table for the transport mode choice variable.

The calculation of P*(Choice, Gender, Near, Driving Licence) = P(Choice, Gender, Near, Driving Licence | Mode Choice = car)

Gender   Driving licence   Number of cars   Mode choice bike   Mode choice car
Male                       1                0                  0.146
                           >1               0                  0.291
                           1                0                  0.036
                           >1               0                  0.291
Female                     1                0                  0.036
                           >1               0                  0.049
                           1                0                  0.036
                           >1               0                  0.113
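A table such as the one above can be obtained mechanically: take the joint distribution, keep only the entries consistent with the evidence (here Mode Choice = car), and renormalise so the remaining probabilities again sum to one. The sketch below illustrates this on a tiny made-up joint distribution; the variable names and numbers are illustrative only, not the data behind Table 7.1b.

```python
# A minimal sketch of conditioning a joint distribution on evidence and
# renormalising, the operation that produces a posterior table like 7.1b.
# The joint distribution below is illustrative, not the book's actual data.

def condition(joint, var_index, value):
    """Keep entries consistent with the evidence, then renormalise."""
    kept = {k: p for k, p in joint.items() if k[var_index] == value}
    total = sum(kept.values())
    return {k: p / total for k, p in kept.items()}

# Joint over (gender, number of cars, mode choice): illustrative numbers only.
joint = {
    ("male", "1", "car"): 0.10,  ("male", "1", "bike"): 0.05,
    ("male", ">1", "car"): 0.25, ("male", ">1", "bike"): 0.05,
    ("female", "1", "car"): 0.05,  ("female", "1", "bike"): 0.10,
    ("female", ">1", "car"): 0.30, ("female", ">1", "bike"): 0.10,
}

posterior = condition(joint, 2, "car")  # P(gender, cars | mode = car)
print(round(sum(posterior.values()), 6))  # the posterior again sums to 1
```

All "bike" entries drop to zero under the evidence Mode Choice = car, which is why the bike column of Table 7.1b contains only zeros.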

links in the network. Experts can provide the structure of the network using domain knowledge, but the structure can also be extracted from empirical data. The latter option in particular offers important and interesting opportunities for travel demand modelling, because it enables one to identify visually which variable, or combination of variables, influences the target variable of interest. Note, however, that algorithms which learn the structure of the network sometimes have difficulty capturing the correct (causal) relationships. Causality is extremely difficult to model, often requiring human reasoning, and it cannot be captured reliably by a machine learning algorithm alone. As a result, the direction of some arrows may look counter-intuitive, and it is better to interpret a directed arc as an association rather than as a causal relationship per se.

Structural learning can be divided into two categories: search & scoring methods and dependency analysis methods. Algorithms belonging to the first category interpret the learning problem as a search for the structure that best fits the data. Different scoring criteria have been suggested to evaluate candidate structures, such as the Bayesian scoring method (Cooper & Herskovits, 1992; Heckerman, Geiger, & Chickering, 1995) and minimum description length (Lam & Bacchus, 1994). The underlying principle behind these scores is the well-known Ockham's razor: the best model of a phenomenon is the one that best balances accuracy and complexity. The scores therefore typically include two terms, one for accuracy and one for complexity, and the aim is to find/select a model that balances these terms appropriately.
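The two-term structure of such scores can be made concrete with a BIC-style score, which is in the same spirit as the minimum description length criterion cited above: a log-likelihood term rewards accuracy, and a penalty proportional to the number of free parameters punishes complexity. This is a minimal sketch under that assumption; the toy data and variable names are illustrative, not taken from the book.

```python
import math
from collections import Counter

# A minimal sketch of a structure score in the MDL/BIC spirit: the score is
# log-likelihood (accuracy) minus a penalty on free parameters (complexity).

def bic_score(data, parents, cardinality):
    """BIC-style score of a structure given fully observed discrete data.

    data        : list of dicts, one observation per record
    parents     : dict mapping each variable to a tuple of its parents
    cardinality : dict mapping each variable to its number of states
    """
    n = len(data)
    loglik, n_params = 0.0, 0
    for var, pa in parents.items():
        # Count joint configurations of (parents, child) and of parents alone.
        joint = Counter((tuple(r[p] for p in pa), r[var]) for r in data)
        marg = Counter(tuple(r[p] for p in pa) for r in data)
        for (pa_cfg, _), c in joint.items():
            loglik += c * math.log(c / marg[pa_cfg])  # MLE log-likelihood
        # Free parameters: (states - 1) per parent configuration.
        q = 1
        for p in pa:
            q *= cardinality[p]
        n_params += (cardinality[var] - 1) * q
    return loglik - 0.5 * n_params * math.log(n)

# Toy data: does owning more cars associate with choosing the car?
data = [{"cars": c, "mode": m} for c, m in
        [(">1", "car")] * 6 + [("1", "bike")] * 3 + [("1", "car")] * 1]

cards = {"cars": 2, "mode": 2}
empty = bic_score(data, {"cars": (), "mode": ()}, cards)
linked = bic_score(data, {"cars": (), "mode": ("cars",)}, cards)
print(linked > empty)  # the arc cars -> mode is worth its extra parameters
```

A search & scoring algorithm would evaluate many candidate structures this way and keep the highest-scoring one; here the structure with the arc wins because the gain in likelihood outweighs the penalty for its extra parameters.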

A Bayesian network is essentially a descriptive probabilistic graphical model that is well suited to unsupervised learning. Unsupervised learning can be defined as the search for a useful structure without labelled classes, an optimization criterion, or any other information beyond the raw data. It can help researchers discover the whole set of probabilistic relationships existing within the data (association discovery) instead of only learning a function for one specific dependent variable (supervised learning). With some tuning, the technique also becomes suitable for the latter task (supervised, or classification, learning), just like more traditional supervised learning algorithms such as decision trees, neural networks and support vector machines. A number of Bayesian network classifiers (e.g. Naive Bayes, Tree Augmented Naive Bayes, General Bayesian Network) have been developed for this purpose.
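The simplest of the classifiers listed above, Naive Bayes, is a Bayesian network in which the class variable is the sole parent of every feature, so the classifier picks the class maximising P(class) times the product of P(feature | class). The sketch below illustrates this on a small made-up mode-choice sample; the records and variable names are illustrative, not data from the book.

```python
from collections import Counter, defaultdict

# A minimal sketch of a Naive Bayes classifier: the class variable ("mode")
# is the single parent of every feature. Toy data, illustrative only.

def train_naive_bayes(records, class_var):
    """Estimate P(class) and P(feature | class) from fully observed records."""
    class_counts = Counter(r[class_var] for r in records)
    feat_counts = defaultdict(Counter)  # (feature, class) -> value counts
    for r in records:
        for f, v in r.items():
            if f != class_var:
                feat_counts[(f, r[class_var])][v] += 1
    return class_counts, feat_counts

def predict(class_counts, feat_counts, obs):
    """Return the class maximising P(class) * prod P(feature | class)."""
    n = sum(class_counts.values())
    best, best_score = None, -1.0
    for c, cc in class_counts.items():
        score = cc / n
        for f, v in obs.items():
            score *= feat_counts[(f, c)][v] / cc
        if score > best_score:
            best, best_score = c, score
    return best

records = [
    {"gender": "male", "cars": ">1", "mode": "car"},
    {"gender": "male", "cars": ">1", "mode": "car"},
    {"gender": "female", "cars": ">1", "mode": "car"},
    {"gender": "female", "cars": "1", "mode": "bike"},
    {"gender": "male", "cars": "1", "mode": "bike"},
    {"gender": "female", "cars": "1", "mode": "bike"},
]

model = train_naive_bayes(records, "mode")
print(predict(*model, {"gender": "male", "cars": ">1"}))  # prints "car"
```

Tree Augmented Naive Bayes and General Bayesian network classifiers relax the "naive" assumption by allowing additional arcs between the features themselves.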
