Infer.NET user guide : Tutorials and examples

## Tutorial 5: Clinical trial

This tutorial shows how to do Bayesian model selection in Infer.NET to determine if a new medical treatment is effective. We will construct two models, corresponding to an effective or ineffective treatment, and use model selection to determine the posterior probability of each, given some fictional clinical trial data.

You can run the code in this tutorial either using the Examples Browser or by opening the Tutorials solution in Visual Studio and uncommenting the line to execute ClinicalTrial.cs. Code is also available in F# and Python.

### A healthy challenge

The data in this tutorial consists of the outcomes for individuals who took part in a fictional clinical trial. Each individual was either given the new treatment or given a placebo (individuals given a placebo are in the control group). A good outcome is indicated by **true** and a bad one by **false**. Here is the data:

```
// Data from clinical trial
VariableArray<bool> controlGroup =
Variable.Observed(new bool[] { false, false, true, false, false });
VariableArray<bool> treatedGroup =
Variable.Observed(new bool[] { true, false, true, true, true });
Range i = controlGroup.Range; Range j = treatedGroup.Range;
```

Notice that we have also set up a couple of ranges **i** and **j**, which range over the people in the control group and in the treated group respectively. We’ll use these later.

To determine whether the treament is effective, we will build two models of this data: one which assumes the treatment has an effect and one which doesn’t. To perform Bayesian model selection, we need to introduce a boolean random variable which switches between the two models. In this analysis, we will give this variable a uniform prior. What this prior should be in the case of a real clinical trial would require some thought - what is the *a priori* effectiveness of a new treatment?

```
// Prior on being effective treatment
Variable<bool> isEffective = Variable.Bernoulli(0.5);
```

### Cause and effect

First, let us consider if the treatment has an effect on the outcome. In this case the probability of a good outcome will be different for people in the control group and the treated group. Because we don’t know these two probabilities, we define random variables for them with Beta priors and learn them during inference. The code for model is shown in the snippet below. To achieve model selection, we put this modelling code in an *if block*, so that the model only applies if **isEffective** is true.

**See also:** Branching on variables to create mixture models and Computing model evidence for model selection.

```
Variable<double> probIfTreated, probIfControl;
using (Variable.If(isEffective))
{ // Model if treatment is effective
probIfControl = Variable.Beta(1, 1);
controlGroup[i] = Variable.Bernoulli(probIfControl).ForEach(i);
probIfTreated = Variable.Beta(1, 1);
treatedGroup[j] = Variable.Bernoulli(probIfTreated).ForEach(j);
}
```

The variables **probIfTreated** and **probIfControl** are declared outside of the if block but defined inside. This means the variables can be referred to outside of the *using* statement, which will allow us to infer their values later.

Notice that we have not specified whether the treatment has a *good* effect or not, only that it has some effect. We will be able to see if it is a good effect by comparing the posterior distributions over **probIfTreated** and **probIfControl**.

### A bit of background

Now let us consider the alternative model, where the treatment has no effect *i.e.* the background model. In this case, the probability of a good outcome will be the same for people in both groups. Again, the value of this probability is unknown, so we will put a Beta prior on it. This time we use `Variable.IfNot`

to create the surrounding if block, so that the model will apply in the case where **isEffective** is false. You can think of this as being the **else** clause for the previous **if** block.

```
using (Variable.IfNot(isEffective))
{ // Model if treatment is not effective
Variable<double> probAll = Variable.Beta(1, 1);
controlGroup[i] = Variable.Bernoulli(probAll).ForEach(i);
treatedGroup[j] = Variable.Bernoulli(probAll).ForEach(j);
}
```

The variable **probAll** is both declared and defined inside the if block, since we will not be using it later on.

### Clinical accuracy

We have now fully defined the model and can go ahead and infer the distributions of interest.

```
InferenceEngine engine = new InferenceEngine();
Console.WriteLine("Probability treatment has an effect = " + engine.Infer(isEffective));
Console.WriteLine("Probability of good outcome if given treatment = "
+ (float)engine.Infer<Beta>(probIfTreated).GetMean());
Console.WriteLine("Probability of good outcome if control = "
+ (float)engine.Infer<Beta>(probIfControl).GetMean());
```

When we run this code, it prints out:

```
Probability treatment has an effect = Bernoulli(0.7549)
Probability of good outcome if given treatment = 0.7142857
Probability of good outcome if control = 0.2857143
```

Hence, there is some evidence from this data that the treatment has an effect and, furthermore, the effect is a positive one.

### Factor graph

This is what the factor graph of this model *should* look like (and what you will get if you save it in DGML format):

However, if you tick the box in the Examples Browser to show the factor graph, or equivalently set `engine.ShowFactorGraph = true`

, you will see the following:

Due to a limitation of the graph drawing tool, the ‘if’ blocks in the code are not explicitly drawn. Instead, the variable **isEffective** points to **controlGroup** and **treatedGroup** via a **condition** edge. The condition edge selects which parent of **controlGroup** and **treatedGroup** is active.

If you find these tutorials to be effective, you can move on to the next.