Open2

geode(python) を用いたDEG解析手法

TsumuraTsumura

examples/demo.py の中身を見ていく

from geode import *
from pprint import pprint
import numpy as np
import os.path

## read expression data
mat = []
genes = []
with open (os.path.join(os.path.dirname(__file__), 'example_expression_data.txt')) as f:
	next(f)
	for line in f:
		sl = line.strip().split('\t')
		gene = sl[0]
		row = list(map(float, sl[1:]))
		genes.append(gene)
		mat.append(row)
mat = np.array(mat)
col_labels = ['1','1','1','2','2','2']

## compute characteristic direction
chdir_res = chdir(mat, col_labels, genes, calculate_sig=0, nnull=100)
pprint(chdir_res)
# perform PAEA gene-set enrichment analysis
paea_res = paea_wrapper(chdir_res, 'GeneOntology_BP.gmt')
# look at the top enriched GO terms
pprint(paea_res[0:10])

入力

'example_expression_data.txt' はこんな感じ。

genenames	Controll	Controll.1	Controll.2	Pert.	Pert..1	Pert..2
MTERFD2	138.642	167.13	156.199	186.64	122.005	161.382
SCRIB	52.6538	38.9778	68.9632	94.3009	60.6343	99.0118
ZXDC	59.3739	53.9525	55.1033	82.7805	52.77	80.15
MRPL32	333.802	375.288	475.2	477.085	327.193	468.316
WDR69	0.33557	0.614205	0.989874	0.421603	0.890432	1.05624
FOXL1	1.03177	0.90172	1.64425	1.1704	1.33711	1.31237

chdir

chdir_res = chdir(mat, col_labels, genes, calculate_sig=0, nnull=100)

chdir: characteristic direction の略称
change directory と被るからやめてほしい……

def chdir(data, sampleclass, genes, gamma=1., sort=True, calculate_sig=False, nnull=10, sig_only=False, norm_vector=True):
	"""
	Calculate the characteristic direction for a gene expression dataset
	
	Input:
		data: numpy.array, is the data matrix of gene expression where rows correspond to genes and columns correspond to samples
		sampleclass: list or numpy.array, labels of the samples, it has to be consist of 0, 1 and 2, with 0 being columns to be excluded, 1 being control and 2 being perturbation
				example: sampleclass = [1,1,1,2,2,2]
		genes: list or numpy.array, row labels for genes 
		gamma: float, regulaized term. A parameter that smooths the covariance matrix and reduces potential noise in the dataset
		sort: bool, whether to sort the output by the absolute value of chdir
		calculate_sig: bool, whether to calculate the significance of characteristic directions
		nnull: int, number of null characteristic directions to calculate for significance
		sig_only: bool, whether to return only significant genes; active only when calculate_sig is True
		norm_vector: bool, whether to return a characteristic direction vector normalized to unit vector
	Output:
		A list of tuples sorted by the absolute value in descending order characteristic directions of genes.
			If calculate_sig is set to True, each tuple contains a third element which is the ratio of characteristic directions to null ChDir
	"""
……

入力

  • mat: 遺伝子とサンプルに対応する遺伝子発現の行列
  • col_labels: サンプルのラベル。0は除外する列、1はコントロール、2は摂動。
  • genes: 遺伝子名の行ラベル
  • calculate_sig: 特徴方向の有意性を計算するかどうか
  • nnull: 有意性計算のための null 特徴方向の数

出力

[(-0.5769213649565056, 'MCL1'),
 (0.39834648998286887, 'LIMD2'),
……
 (0.001078716752045701, 'FMO4'),
 (0.0005796964007400516, 'SYK')]

遺伝子名とDEGの値を含む tuple の list で返される。

paea_wrapper

paea_res = paea_wrapper(chdir_res, 'GeneOntology_BP.gmt')

Perform principal angle enrichment analysis (PAEA) のラッパー

def paea_wrapper(chdir, gmt_fn, case_sensitive=False, sort=True):
	"""
	A wrapper function for PAEA gene-set enrichment analysis

	Input:
		chdir: characteristic directions computed by chdir function
		gmt_fn: file name of a gene-set library in GMT format
		case_sensitive: whether gene symbols should be considered as case_sensitive
	Output:
		a sorted list of tuples (term, p_val)
	"""
……

入力

  • chdir_res: chdir の出力結果
  • genr_set_file: 使用する遺伝子セットのファイル名。ここではGene Ontologyから取得したらしき遺伝子セットがある。
regulation of blood vessel endothelial cell migration (GO:0043535)	na	AGTR2	PDGFB	APOE	VEGFA	NF1	AMOT	THBS1	FGF2	VASH1	PLG	TGFB1
positive regulation of osteoblast differentiation (GO:0045669)	na	BMP4	ACVR2A	IL6	ACVR2B	IL6ST	BMPR2	IL6R	BMP7	BMPR1B	BMPR1A	ACVR1	BMP6
vesicle docking (GO:0048278)	na	CAV2	RAB8A	CPLX2	PLDN	PLEK	SNPH	RABEPK	VTI1B	VAMP3	YKT6	SNAP25
response to estrogen stimulus (GO:0043627)	na	SOCS2	GPR81	STAT5B	BRCA2	CRIPAK	TGFB1	STAT3	GHRHR	CITED1	BRCA1	GH1	GSTM3	KRT19	CYP27B1	CAV	SLC25A36	GATA3	GHRL	RARA	TFF1	CD24	RNF14	GHR
regulation of cytokine biosynthetic process (GO:0042035)	na	TNF	TLR1	IGF2BP1	TLR3	TNFRSF8	IGF2BP2	TLR4	IGF2BP3	TLR6	TLR7	IL10	TLR8	TLR9	AZU1	APOA2	HMOX1	IL17F	SFTPD	BCL3	IL1B	CD4	SPN	EBI3	IL1A	CD28	BCL10	PRG3	IL27	CD276	INHA	IL21	FOXP3	SIGIRR	INHBB	INHBA	CD86	CD80	ASB1	NLRP12	GHRL	GLMN	IL12B	IRF4	GHSR	ELA2

出力

エンリッチメント解析の結果(GOとp値)のリストが渡される。

[('negative regulation of programmed cell death (GO:0043069)',
  2.5177300868648543e-09),
 ('negative regulation of apoptosis (GO:0043066)', 2.5177300868648543e-09),
 ('regulation of apoptosis (GO:0042981)', 1.0687895787640712e-08),
 ('cellular homeostasis (GO:0019725)', 1.4296576080369608e-07),
 ('homeostatic process (GO:0042592)', 1.4296576080369608e-07),
 ('response to chemical stimulus (GO:0042221)', 2.607392230588065e-07),
 ('IL2', 2.7799212152691344e-06),
 ('metabolic process (GO:0008152)', 0.0027827104848820346),
 ('biological_process (GO:0008150)', 0.0038817674973872563),
 ('translation (GO:0006412)', 0.010044713151421738)]