A minimal workflow for creating an R package.

1. Load the tool packages

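Install and load the devtools and roxygen2 packages. devtools provides basic tools for checking, installing, and building packages, while roxygen2 makes writing R help documentation much easier. If you are used to Emacs, you can build the package with ESS, which keeps the R code and its help documentation together and easy to manage. RStudio works as well.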

library('devtools')
library('roxygen2')
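
To illustrate how roxygen2 keeps code and help documentation together, here is a minimal sketch. The function name add_numbers and its arguments are hypothetical and chosen only for illustration. The `#'` comment lines are what roxygen2 reads when generating help files.

```r
# Hypothetical example function; the name and arguments are illustrative only.

#' Add two numbers
#'
#' @param x A numeric vector.
#' @param y A numeric vector.
#' @return The element-wise sum of x and y.
#' @export
add_numbers <- function(x, y) {
  x + y
}

# From the package root, devtools::document() turns the #' comments
# into .Rd help files under man/ (left commented out here on purpose):
# devtools::document()

result <- add_numbers(1, 2)
```

Because the roxygen lines are ordinary R comments, the file still runs as plain R before any documentation is generated.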

Mapping data onto a Circos figure requires that you identify what patterns in your data are (a) likely to be important and (b) likely to be present, and create a figure that exposes such patterns. Remember, if the pattern exists, you can’t afford to miss it. If it doesn’t exist, you can’t afford to be fooled into thinking that it’s there, or left wondering whether it’s occluded by other data.

1. Run circos

bin/circos \
  -png \
  -svg \
  -conf etc/circos.conf \
  -outputdir /path/to/your/output/directory \
  -outputfile yourimage.png

$\newcommand{\entropfrac}[2]{\frac{#1}{#2} \log \left( \frac{#1}{#2} \right)}$

Mutual Information (MI)

Introduction

Mutual Information (MI) distance is used to measure the distance between two gene vectors, for example $x_1 = \{1, 0, 1, 1, 1, 1, 0\}$ and $y_1 = \{0, 1, 1, 1, 1, 1, 0\}$. The two vectors are easily converted into a 2 x 2 contingency table, where each cell counts the positions with the given presence/absence combination:


| X/Y | 1 (Presence) | 0 (Absence) | Sum |
|---|---|---|---|
| 1 (Presence) | a | b | a+b |
| 0 (Absence) | c | d | c+d |
| Sum | a+c | b+d | n=a+b+c+d |
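
To make the table concrete, the sketch below counts the four cells for the example vectors $x_1$ and $y_1$ in base R. The variable names (cell_a to cell_d, tab, mi) are illustrative. The last step uses the textbook mutual information formula, which is an assumption here and may differ from the MI distance defined later in this note.

```r
# Example vectors from the text (1 = presence, 0 = absence)
x1 <- c(1, 0, 1, 1, 1, 1, 0)
y1 <- c(0, 1, 1, 1, 1, 1, 0)

# Count the four cells of the 2 x 2 table
cell_a <- sum(x1 == 1 & y1 == 1)  # present in both
cell_b <- sum(x1 == 1 & y1 == 0)  # present in x only
cell_c <- sum(x1 == 0 & y1 == 1)  # present in y only
cell_d <- sum(x1 == 0 & y1 == 0)  # absent in both
# For these vectors: a = 4, b = 1, c = 1, d = 1, n = 7

tab <- matrix(c(cell_a, cell_b, cell_c, cell_d),
              nrow = 2, byrow = TRUE,
              dimnames = list(X = c("1", "0"), Y = c("1", "0")))

# Textbook mutual information in nats (assumption: not necessarily
# the exact distance used in this note)
p <- tab / sum(tab)                          # joint probabilities
expected <- outer(rowSums(p), colSums(p))    # product of the marginals
nz <- p > 0                                  # skip empty cells: 0 * log(0) := 0
mi <- sum(p[nz] * log(p[nz] / expected[nz]))
```

Empty cells are skipped so that a zero count contributes nothing instead of producing NaN.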

R Package ggplot2 Notes

1. Basic grammar

1.1 Plot types

The R package ggplot2 is a widely used plotting tool for producing high-quality scientific figures. ggplot2-style figures appear often in papers published in journals such as PNAS, Nature, and Cell.

The input data should be a data frame; other objects can be converted with the function as.data.frame(). The "+" operator connects the different plot statements. A typical ggplot2 plot statement looks like:
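
As a minimal sketch of such a statement, the example below builds a scatter plot with a connecting line. The toy matrix, the data frame df, and its column names x and y are hypothetical and not taken from the original note.

```r
library(ggplot2)

# Hypothetical toy data: start from a matrix and convert it to a data frame
m <- matrix(c(1, 2, 3, 4,
              2.1, 3.9, 6.2, 7.8), ncol = 2)
df <- as.data.frame(m)
colnames(df) <- c("x", "y")

# Layers are chained with "+": data and aesthetics first, then geoms and labels
p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  geom_line() +
  labs(title = "Toy example", x = "x", y = "y")

# print(p) draws the figure on the active graphics device
```

Each "+" adds one layer or setting to the plot object, so the statement can be built up step by step and stored in a variable before it is drawn.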

Derivation of the EM Algorithm

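Consider observed data $Y=\{y_1, y_2, \dots, y_m\}$, with unobserved data $Z=\{z_1, z_2, \dots, z_k\}$ and parameters to be estimated $\theta=\{\theta_1, \theta_2, \dots, \theta_t\}$. $Z$ may be a discrete or a continuous random variable; the derivation below assumes that $Z$ is discrete (if $Z$ is continuous, the sum in the law of total probability is replaced by an integral). The log-likelihood function of the observed data is then: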
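
With $Z$ discrete, marginalizing over the unobserved data by the law of total probability gives the standard form below. The symbol $L(\theta)$ for the log-likelihood is a naming assumption, not taken from the text above.

$$
L(\theta) = \log P(Y \mid \theta) = \log \sum_{Z} P(Y, Z \mid \theta) = \log \sum_{Z} P(Y \mid Z, \theta)\, P(Z \mid \theta)
$$

The sum inside the logarithm is what makes direct maximization over $\theta$ difficult, which is the usual motivation for the iterative EM procedure.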