$\newcommand{\entropfrac}[2]{\frac{#1}{#2} \log \left( \frac{#1}{#2} \right)}$

## Mutual Information (MI)

### Introduction

Mutual Information (MI) distance is used to measure the distance between two gene vectors, for example $x_1 = (1, 0, 1, 1, 1, 1, 0)$ and $y_1 = (0, 1, 1, 1, 1, 1, 0)$. It is easy to summarize the two vectors in a 2×2 binary table:

| X/Y | 1 (Presence) | 0 (Absence) | Sum |
|---|---|---|---|
| 1 (Presence) | $a$ | $b$ | $a+b$ |
| 0 (Absence) | $c$ | $d$ | $c+d$ |
| Sum | $a+c$ | $b+d$ | $n = a+b+c+d$ |

Here the two vectors are treated as discrete variables; the mutual information between $x_1$ and $y_1$ is

$$I(X; Y) = \sum_{x} \sum_{y} p(x, y) \log \left( \frac{p(x, y)}{p(x)\,p(y)} \right) \tag{1}\label{eq:1}$$

Expanded over the four symbol combinations of two binary vectors, $\eqref{eq:1}$ is equal to

$$I(X; Y) = p(1,1) \log \left( \frac{p(1,1)}{p_X(1)\,p_Y(1)} \right) + p(1,0) \log \left( \frac{p(1,0)}{p_X(1)\,p_Y(0)} \right) + p(0,1) \log \left( \frac{p(0,1)}{p_X(0)\,p_Y(1)} \right) + p(0,0) \log \left( \frac{p(0,0)}{p_X(0)\,p_Y(0)} \right) \tag{2}\label{eq:2}$$

$p(x)$ is the probability that a symbol (here, 0 or 1) appears in gene vector $X$, regardless of the symbol in gene vector $Y$; $p(y)$ is defined similarly for $Y$. $p(x, y)$ is the probability that a symbol combination appears jointly in gene vectors $X$ and $Y$. In this example there are four kinds of symbol combinations: $(1, 1)$, $(1, 0)$, $(0, 1)$ and $(0, 0)$.

If we express the probabilities with the counts from the binary table, $\eqref{eq:1}$ becomes

$$I = \frac{a}{n} \log \left( \frac{an}{(a+b)(a+c)} \right) + \frac{b}{n} \log \left( \frac{bn}{(a+b)(b+d)} \right) + \frac{c}{n} \log \left( \frac{cn}{(c+d)(a+c)} \right) + \frac{d}{n} \log \left( \frac{dn}{(c+d)(b+d)} \right) \tag{3}\label{eq:3}$$

The $\eqref{eq:3}$ is mathematically equal to

$$I = \entropfrac{a}{n} + \entropfrac{b}{n} + \entropfrac{c}{n} + \entropfrac{d}{n} - \entropfrac{a+b}{n} - \entropfrac{c+d}{n} - \entropfrac{a+c}{n} - \entropfrac{b+d}{n} \tag{4}\label{eq:4}$$

which is the familiar identity $I(X; Y) = H(X) + H(Y) - H(X, Y)$ written out with the table counts.
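As a concrete check, counting the symbol pairs of $x_1$ and $y_1$ gives $a = 4$, $b = 1$, $c = 1$, $d = 1$ and $n = 7$. Substituting these counts (and using the natural logarithm) yields

$$I = \frac{4}{7} \log \left( \frac{4 \cdot 7}{5 \cdot 5} \right) + \frac{1}{7} \log \left( \frac{1 \cdot 7}{5 \cdot 2} \right) + \frac{1}{7} \log \left( \frac{1 \cdot 7}{2 \cdot 5} \right) + \frac{1}{7} \log \left( \frac{1 \cdot 7}{2 \cdot 2} \right) \approx 0.0428 \text{ nats}$$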

### Example

We can use R to directly calculate the MI between two gene vectors mentioned above.

1. Use basic R functions
1. Use the R package `bioDist`
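A minimal sketch of both approaches. The base-R version implements the definition of MI directly from the joint and marginal frequencies; the `bioDist` part is an assumption about that Bioconductor package's `mutualInfo()` interface (it is assumed to take a matrix and compute MI between its rows), so it is shown commented out:

```r
# Approach 1: basic R. Computes I(X;Y) from the joint and marginal
# frequencies of the two vectors (natural log, i.e. nats).
mutual_info <- function(x, y) {
  joint <- table(x, y) / length(x)   # p(x, y) from pair counts
  px <- rowSums(joint)               # marginal p(x)
  py <- colSums(joint)               # marginal p(y)
  terms <- joint * log(joint / outer(px, py))
  sum(terms[joint > 0])              # 0 * log(0) is treated as 0
}

x1 <- c(1, 0, 1, 1, 1, 1, 0)
y1 <- c(0, 1, 1, 1, 1, 1, 0)
mutual_info(x1, y1)                  # ~ 0.0428 nats

# Approach 2: the bioDist package (Bioconductor); assumed API:
# BiocManager::install("bioDist")
# library(bioDist)
# mutualInfo(rbind(x1, y1))
```

Note that the base-R version skips zero cells of the joint table, since the corresponding $0 \log 0$ terms contribute nothing to the sum.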

02/11/2016