Open2
geode(python) を用いたDEG解析手法
examples/demo.py
の中身を見ていく
from geode import *
from pprint import pprint
import numpy as np
import os.path
## read expression data
mat = []
genes = []
with open (os.path.join(os.path.dirname(__file__), 'example_expression_data.txt')) as f:
next(f)
for line in f:
sl = line.strip().split('\t')
gene = sl[0]
row = list(map(float, sl[1:]))
genes.append(gene)
mat.append(row)
mat = np.array(mat)
col_labels = ['1','1','1','2','2','2']
## compute characteristic direction
chdir_res = chdir(mat, col_labels, genes, calculate_sig=0, nnull=100)
pprint(chdir_res)
# perform PAEA gene-set enrichment analysis
paea_res = paea_wrapper(chdir_res, 'GeneOntology_BP.gmt')
# look at the top enriched GO terms
pprint(paea_res[0:10])
入力
'example_expression_data.txt'
はこんな感じ。
genenames Controll Controll.1 Controll.2 Pert. Pert..1 Pert..2
MTERFD2 138.642 167.13 156.199 186.64 122.005 161.382
SCRIB 52.6538 38.9778 68.9632 94.3009 60.6343 99.0118
ZXDC 59.3739 53.9525 55.1033 82.7805 52.77 80.15
MRPL32 333.802 375.288 475.2 477.085 327.193 468.316
WDR69 0.33557 0.614205 0.989874 0.421603 0.890432 1.05624
FOXL1 1.03177 0.90172 1.64425 1.1704 1.33711 1.31237
chdir
chdir_res = chdir(mat, col_labels, genes, calculate_sig=0, nnull=100)
chdir: characteristic direction の略称
change directory と被るからやめてほしい……
def chdir(data, sampleclass, genes, gamma=1., sort=True, calculate_sig=False, nnull=10, sig_only=False, norm_vector=True):
"""
Calculate the characteristic direction for a gene expression dataset
Input:
data: numpy.array, is the data matrix of gene expression where rows correspond to genes and columns correspond to samples
sampleclass: list or numpy.array, labels of the samples, it has to be consist of 0, 1 and 2, with 0 being columns to be excluded, 1 being control and 2 being perturbation
example: sampleclass = [1,1,1,2,2,2]
genes: list or numpy.array, row labels for genes
gamma: float, regulaized term. A parameter that smooths the covariance matrix and reduces potential noise in the dataset
sort: bool, whether to sort the output by the absolute value of chdir
calculate_sig: bool, whether to calculate the significance of characteristic directions
nnull: int, number of null characteristic directions to calculate for significance
sig_only: bool, whether to return only significant genes; active only when calculate_sig is True
norm_vector: bool, whether to return a characteristic direction vector normalized to unit vector
Output:
A list of tuples sorted by the absolute value in descending order characteristic directions of genes.
If calculate_sig is set to True, each tuple contains a third element which is the ratio of characteristic directions to null ChDir
"""
……
入力
- mat: 遺伝子とサンプルに対応する遺伝子発現の行列
- col_labels: サンプルのラベル。0は除外する列、1はコントロール、2は摂動。
- genes: 遺伝子名の行ラベル
- calculate_sig: 特徴方向の有意性を計算するかどうか
- nnull: 有意性計算のための null 特徴方向の数
出力
[(-0.5769213649565056, 'MCL1'),
(0.39834648998286887, 'LIMD2'),
……
(0.001078716752045701, 'FMO4'),
(0.0005796964007400516, 'SYK')]
遺伝子名とDEGの値を含む tuple の list で返される。
paea_wrapper
paea_res = paea_wrapper(chdir_res, 'GeneOntology_BP.gmt')
Perform principal angle enrichment analysis (PAEA) のラッパー
def paea_wrapper(chdir, gmt_fn, case_sensitive=False, sort=True):
"""
A wrapper function for PAEA gene-set enrichment analysis
Input:
chdir: characteristic directions computed by chdir function
gmt_fn: file name of a gene-set library in GMT format
case_sensitive: whether gene symbols should be considered as case_sensitive
Output:
a sorted list of tuples (term, p_val)
"""
……
入力
- chdir_res: chdir の出力結果
- genr_set_file: 使用する遺伝子セットのファイル名。ここではGene Ontologyから取得したらしき遺伝子セットがある。
regulation of blood vessel endothelial cell migration (GO:0043535) na AGTR2 PDGFB APOE VEGFA NF1 AMOT THBS1 FGF2 VASH1 PLG TGFB1
positive regulation of osteoblast differentiation (GO:0045669) na BMP4 ACVR2A IL6 ACVR2B IL6ST BMPR2 IL6R BMP7 BMPR1B BMPR1A ACVR1 BMP6
vesicle docking (GO:0048278) na CAV2 RAB8A CPLX2 PLDN PLEK SNPH RABEPK VTI1B VAMP3 YKT6 SNAP25
response to estrogen stimulus (GO:0043627) na SOCS2 GPR81 STAT5B BRCA2 CRIPAK TGFB1 STAT3 GHRHR CITED1 BRCA1 GH1 GSTM3 KRT19 CYP27B1 CAV SLC25A36 GATA3 GHRL RARA TFF1 CD24 RNF14 GHR
regulation of cytokine biosynthetic process (GO:0042035) na TNF TLR1 IGF2BP1 TLR3 TNFRSF8 IGF2BP2 TLR4 IGF2BP3 TLR6 TLR7 IL10 TLR8 TLR9 AZU1 APOA2 HMOX1 IL17F SFTPD BCL3 IL1B CD4 SPN EBI3 IL1A CD28 BCL10 PRG3 IL27 CD276 INHA IL21 FOXP3 SIGIRR INHBB INHBA CD86 CD80 ASB1 NLRP12 GHRL GLMN IL12B IRF4 GHSR ELA2
出力
エンリッチメント解析の結果(GOとp値)のリストが渡される。
[('negative regulation of programmed cell death (GO:0043069)',
2.5177300868648543e-09),
('negative regulation of apoptosis (GO:0043066)', 2.5177300868648543e-09),
('regulation of apoptosis (GO:0042981)', 1.0687895787640712e-08),
('cellular homeostasis (GO:0019725)', 1.4296576080369608e-07),
('homeostatic process (GO:0042592)', 1.4296576080369608e-07),
('response to chemical stimulus (GO:0042221)', 2.607392230588065e-07),
('IL2', 2.7799212152691344e-06),
('metabolic process (GO:0008152)', 0.0027827104848820346),
('biological_process (GO:0008150)', 0.0038817674973872563),
('translation (GO:0006412)', 0.010044713151421738)]