A knowledge engineer can build a Bayesian network by following a number of steps.
Example problem − Lung cancer. A patient has been suffering from breathlessness. He visits the doctor, suspecting he has lung cancer. The doctor knows that, apart from lung cancer, the patient might have various other diseases, such as tuberculosis and bronchitis.
Gather Relevant Information about the Problem
● Is the patient a smoker? If yes, the chances of lung cancer and bronchitis are high.
● Is the patient exposed to air pollution? If yes, what sort of air pollution?
● Take an X-ray. A positive X-ray would indicate either TB or lung cancer.
Identify Interesting Variables
The knowledge engineer tries to answer the questions −
● Which nodes to represent?
● What values can they take? In which state can they be?
For now, let us consider only nodes with discrete values. Each variable must take on exactly one of these values at a time.
Common types of discrete nodes are −
● Boolean nodes − They represent propositions, taking binary values TRUE (T) and FALSE (F).
● Ordered values − A node Pollution might take values from {low, medium, high}, describing the degree of a patient’s exposure to pollution.
● Integral values − A node called Age might represent the patient’s age, with possible values from 1 to 120. Even at this early stage, modeling choices are being made.
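The three kinds of discrete nodes can be sketched in Python as value domains. This is a minimal illustration under stated assumptions, not a library API: the names BOOLEAN_DOMAIN, PollutionLevel, AGE_DOMAIN and is_valid_value are hypothetical, and IntEnum is used only to give the ordered values a natural ordering.

```python
from enum import IntEnum

# Boolean node (hypothetical name): a proposition that is either TRUE or FALSE.
BOOLEAN_DOMAIN = (True, False)

# Ordered node (hypothetical name): IntEnum makes LOW < MEDIUM < HIGH comparable.
class PollutionLevel(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2

# Integral node (hypothetical name): ages 1 to 120 inclusive.
AGE_DOMAIN = range(1, 121)

def is_valid_value(value, domain):
    """A discrete node holds exactly one value at a time, taken from its domain."""
    return value in domain

print(PollutionLevel.LOW < PollutionLevel.HIGH)                      # True
print(is_valid_value(45, AGE_DOMAIN))                                # True
print(is_valid_value(0, AGE_DOMAIN))                                 # False
print(is_valid_value(PollutionLevel.MEDIUM, list(PollutionLevel)))   # True
```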
Possible nodes and values for the lung cancer example −
Node Name | Type | Values
--- | --- | ---
Pollution | Binary | {LOW, HIGH}
Smoker | Boolean | {TRUE, FALSE}
Lung-Cancer | Boolean | {TRUE, FALSE}
X-Ray | Binary | {Positive, Negative}
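A minimal Python sketch of these node domains, assuming Pollution is a two-valued node {LOW, HIGH} (the two values used by the Lung-Cancer CPT example in this section). The DOMAINS dictionary and the helper functions are hypothetical names. Counting complete assignments shows how quickly the state space grows with each modeling choice: four two-valued nodes already give 2 × 2 × 2 × 2 = 16 joint states.

```python
from itertools import product
from math import prod

# Hypothetical encoding of the node domains; Pollution assumed to be {LOW, HIGH}.
DOMAINS = {
    "Pollution": ("LOW", "HIGH"),
    "Smoker": (True, False),
    "Lung-Cancer": (True, False),
    "X-Ray": ("Positive", "Negative"),
}

def num_joint_states(domains):
    """Number of complete assignments (exactly one value per node)."""
    return prod(len(values) for values in domains.values())

def joint_states(domains):
    """Yield every complete assignment as a dict {node: value}."""
    names = list(domains)
    for combo in product(*(domains[name] for name in names)):
        yield dict(zip(names, combo))

print(num_joint_states(DOMAINS))  # 16
first = next(joint_states(DOMAINS))
print(first)  # {'Pollution': 'LOW', 'Smoker': True, 'Lung-Cancer': True, 'X-Ray': 'Positive'}
```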
Create Arcs between Nodes
The topology of the network should capture the qualitative relationships between variables.
For example, what causes a patient to have lung cancer? Pollution and smoking. So add arcs from node Pollution and from node Smoker to node Lung-Cancer.
Similarly, if the patient has lung cancer, the X-ray result is likely to be positive. So add an arc from node Lung-Cancer to node X-Ray.
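The arcs can be recorded as (cause, effect) pairs. A Bayesian network must be a directed acyclic graph, so this sketch uses Kahn's algorithm for topological sorting, which also rejects any arc set containing a directed cycle. The returned order places every parent before its children, which fits the convention of drawing arcs from top to bottom. ARCS and topological_order are hypothetical names chosen for illustration.

```python
from collections import deque

# Hypothetical arc list: (cause, effect) pairs for the lung cancer example.
ARCS = [
    ("Pollution", "Lung-Cancer"),
    ("Smoker", "Lung-Cancer"),
    ("Lung-Cancer", "X-Ray"),
]

def topological_order(arcs):
    """Kahn's algorithm: order nodes so that every arc points forward.

    Raises ValueError if the arcs contain a directed cycle.
    """
    nodes = {n for arc in arcs for n in arc}
    children = {n: [] for n in nodes}
    in_degree = {n: 0 for n in nodes}
    for parent, child in arcs:
        children[parent].append(child)
        in_degree[child] += 1
    # Start from root nodes (no incoming arcs); sorted for a deterministic order.
    queue = deque(sorted(n for n in nodes if in_degree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("arcs contain a directed cycle")
    return order

print(topological_order(ARCS))  # ['Pollution', 'Smoker', 'Lung-Cancer', 'X-Ray']
```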
Specify Topology
Conventionally, BNs are laid out so that the arcs point from top to bottom. The set of parent nodes of a node X is given by Parents(X).
The Lung-Cancer node has two parents (reasons or causes), Pollution and Smoker, while node Smoker is an ancestor of node X-Ray. Similarly, X-Ray is a child (consequence or effect) of node Lung-Cancer and a successor of nodes Smoker and Pollution.
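These relationships can be checked mechanically. In this hypothetical sketch, parents(X) mirrors Parents(X), while ancestors and descendants follow the arcs backward and forward; a "successor" in the sense used here corresponds to descendants. The function names and the ARCS list are illustrative assumptions.

```python
# Hypothetical arc list: (parent, child) pairs of the lung cancer network.
ARCS = [
    ("Pollution", "Lung-Cancer"),
    ("Smoker", "Lung-Cancer"),
    ("Lung-Cancer", "X-Ray"),
]

def parents(node, arcs):
    """Parents(X): nodes with an arc pointing directly into X."""
    return {p for p, c in arcs if c == node}

def children(node, arcs):
    """Nodes that X points to directly."""
    return {c for p, c in arcs if p == node}

def _reachable(node, arcs, step):
    """All nodes reachable from node by repeatedly applying step."""
    found, stack = set(), list(step(node, arcs))
    while stack:
        n = stack.pop()
        if n not in found:
            found.add(n)
            stack.extend(step(n, arcs))
    return found

def ancestors(node, arcs):
    """Nodes from which a directed path leads to X."""
    return _reachable(node, arcs, parents)

def descendants(node, arcs):
    """Nodes reachable from X along directed arcs."""
    return _reachable(node, arcs, children)

print(sorted(parents("Lung-Cancer", ARCS)))    # ['Pollution', 'Smoker']
print("Smoker" in ancestors("X-Ray", ARCS))    # True
print(sorted(descendants("Pollution", ARCS)))  # ['Lung-Cancer', 'X-Ray']
```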
Conditional Probabilities
Now quantify the relationships between connected nodes: this is done by specifying a conditional probability distribution for each node. As only discrete variables are considered here, this takes the form of a Conditional Probability Table (CPT).
First, for each node we need to look at all the possible combinations of values of its parent nodes. Each such combination is called an instantiation of the parent set. For each distinct instantiation of parent node values, we need to specify the probability that the child will take each of its values.
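Enumerating the instantiations of a parent set amounts to taking the Cartesian product of the parents' domains, so the number of CPT rows equals the product of the domain sizes. This sketch assumes the two-valued Pollution (H, L) and Smoker (T, F) nodes of the lung cancer network; PARENT_DOMAINS and instantiations are hypothetical names.

```python
from itertools import product

# Hypothetical parent domains for the Lung-Cancer node.
PARENT_DOMAINS = {
    "Pollution": ("H", "L"),
    "Smoker": ("T", "F"),
}

def instantiations(parent_domains):
    """Every combination of parent values; each one needs its own CPT row."""
    names = list(parent_domains)
    return list(product(*(parent_domains[name] for name in names)))

rows = instantiations(PARENT_DOMAINS)
print(rows)  # [('H', 'T'), ('H', 'F'), ('L', 'T'), ('L', 'F')]

# Modeling choices matter: a three-valued Pollution would need 3 x 2 = 6 rows.
print(len(instantiations({"Pollution": ("L", "M", "H"), "Smoker": ("T", "F")})))  # 6
```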
For example, the Lung-Cancer node’s parents are Pollution and Smoker. Their possible joint values are {(H,T), (H,F), (L,T), (L,F)}. The CPT specifies the probability that Lung-Cancer is TRUE for each of these cases as <0.05, 0.02, 0.03, 0.001>, respectively.
Each node has its own associated conditional probability table, as follows −