$\newcommand{\entropfrac}[2]{\frac{#1}{#2} \log \left( \frac{#1}{#2} \right)}$

Mutual Information (MI)

Introduction

Mutual Information (MI) distance is used to measure the distance between two gene vectors, for example $x_1 = \{1, 0, 1, 1, 1, 1, 0\}$ and $y_1 = \{0, 1, 1, 1, 1, 1, 0\}$. It is easy to convert the two vectors into a binary (2 x 2) table, with rows for X and columns for Y:


X/Y            1 (Presence)   0 (Absence)   Sum
1 (Presence)   a              b             a+b
0 (Absence)    c              d             c+d
Sum            a+c            b+d           n=a+b+c+d
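
As a minimal sketch of how the four cells can be counted in R, assuming that rows refer to X and columns to Y (so `a` counts positions where both genes are present). The name `counts` is only illustrative.

```r
# Example gene vectors from the text
x1 <- c(1, 0, 1, 1, 1, 1, 0)
y1 <- c(0, 1, 1, 1, 1, 1, 0)

# Count each presence/absence combination (rows = X, columns = Y)
counts <- c(
  a = sum(x1 == 1 & y1 == 1),  # present in both
  b = sum(x1 == 1 & y1 == 0),  # present in X only
  c = sum(x1 == 0 & y1 == 1),  # present in Y only
  d = sum(x1 == 0 & y1 == 0)   # absent in both
)
n <- sum(counts)               # total number of positions

print(counts)
print(n)
```

For these two vectors the counts are a = 4, b = 1, c = 1, d = 1 and n = 7.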

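Here we take the case of two discrete variables. The mutual information between $x_1$ and $y_1$ is

$$
\begin{equation}
I(X; Y) = \sum_{x} \sum_{y} p(x, y) \log \left( \frac{p(x, y)}{p(x) \, p(y)} \right)
\label{eq:1}
\end{equation}
$$

Equation $\eqref{eq:1}$ is equal to

$$
\begin{equation}
I(X; Y) = \sum_{x} \sum_{y} p(x, y) \log p(x, y) - \sum_{x} p(x) \log p(x) - \sum_{y} p(y) \log p(y)
\label{eq:2}
\end{equation}
$$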

$p(x)$ is the probability that a symbol (here 0 or 1) appears in gene vector X, regardless of the symbol in gene vector Y; $p(y)$ is defined in the same way for Y. $p(x, y)$ is the probability that a given symbol combination appears at the same position in gene vectors X and Y. In this example, there are four possible symbol combinations: $(1, 1)$, $(1, 0)$, $(0, 1)$ and $(0, 0)$.
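
The probabilities described above can be estimated from the observed frequencies. The following is a minimal sketch in base R, assuming the usual definition of MI as the sum over symbol combinations of $p(x, y) \log \left( p(x, y) / (p(x) p(y)) \right)$ with the natural logarithm. The variable names are illustrative only.

```r
# Example gene vectors from the text
x1 <- c(1, 0, 1, 1, 1, 1, 0)
y1 <- c(0, 1, 1, 1, 1, 1, 0)

joint <- prop.table(table(x1, y1))  # p(x, y): relative frequency of each pair
px <- rowSums(joint)                # p(x): marginal over Y
py <- colSums(joint)                # p(y): marginal over X
expected <- outer(px, py)           # p(x) * p(y) for every combination

# Skip empty cells, following the convention 0 * log(0) = 0
nonzero <- joint > 0
mi <- sum(joint[nonzero] * log(joint[nonzero] / expected[nonzero]))
print(mi)
```

For the two example vectors this should agree with the hand calculation, about 0.0428.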

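If we use the binary table to illustrate this equation, $\eqref{eq:1}$ becomes:

$$
\begin{equation}
I(X; Y) = \frac{a}{n} \log \left( \frac{an}{(a+b)(a+c)} \right) + \frac{b}{n} \log \left( \frac{bn}{(a+b)(b+d)} \right) + \frac{c}{n} \log \left( \frac{cn}{(c+d)(a+c)} \right) + \frac{d}{n} \log \left( \frac{dn}{(c+d)(b+d)} \right)
\label{eq:3}
\end{equation}
$$

Equation $\eqref{eq:3}$ is mathematically equal to:

$$
\begin{equation}
I(X; Y) = \entropfrac{a}{n} + \entropfrac{b}{n} + \entropfrac{c}{n} + \entropfrac{d}{n} - \entropfrac{a+b}{n} - \entropfrac{c+d}{n} - \entropfrac{a+c}{n} - \entropfrac{b+d}{n}
\label{eq:4}
\end{equation}
$$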

Example

We can use R to calculate the MI between the two gene vectors mentioned above directly.

  1. Use base R functions
x1 <- c(1, 0, 1, 1, 1, 1, 0)
y1 <- c(0, 1, 1, 1, 1, 1, 0)
table(x1, y1)
#>    y1
#> x1  0 1
#>   0 1 1
#>   1 1 4
# calculate MI from the table counts (a = 4, b = c = d = 1, n = 7), natural log
4/7 * log(28/25) + 1/7 * log(7/10) + 1/7 * log(7/10) + 1/7 * log(7/4)
#> [1] 0.04279723
  2. Use the R package bioDist
library(bioDist)
mutualInfo(rbind(x1, y1))
#>            x1
#> y1 0.04279723
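
The hand calculation can also be wrapped in small functions that work directly on the table counts. This is only a sketch: `plogp`, `mi_from_counts` and `mi_entropy_form` are hypothetical helper names, not part of bioDist. The first function sums the log-ratio term of each cell. The second sums the $\frac{k}{n} \log \frac{k}{n}$ terms of the four cells and subtracts those of the four margins. Both treat empty cells as contributing zero.

```r
# k/n * log(k/n), with the convention 0 * log(0) = 0
plogp <- function(k, n) {
  p <- k / n
  ifelse(p > 0, p * log(p), 0)
}

# Log-ratio form: each cell contributes k/n * log(k * n / (row total * column total)).
# Note: in call position R still resolves c() to the base function, even though
# an argument is also named c.
mi_from_counts <- function(a, b, c, d) {
  n <- a + b + c + d
  cells <- c(a, b, c, d)
  row_tot <- c(a + b, a + b, c + d, c + d)
  col_tot <- c(a + c, b + d, a + c, b + d)
  keep <- cells > 0  # empty cells contribute nothing
  sum(cells[keep] / n * log(cells[keep] * n / (row_tot[keep] * col_tot[keep])))
}

# Entropy-like form: cell terms minus margin terms
mi_entropy_form <- function(a, b, c, d) {
  n <- a + b + c + d
  sum(plogp(c(a, b, c, d), n)) - sum(plogp(c(a + b, c + d, a + c, b + d), n))
}

# Counts for the example vectors: a = 4, b = 1, c = 1, d = 1
print(mi_from_counts(4, 1, 1, 1))
print(mi_entropy_form(4, 1, 1, 1))
```

Both calls should return the same value as the manual calculation for the example table.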

Update record

02/11/2016