dcRWRpipeline
is supposed to estimate sample relationships (ie.
contact strength between samples) from an input domain-sample matrix
and an input graph (such as a domain-domain semantic network). The
pipeline includes: 1) random walk restart (RWR) of the input graph
using the input matrix as seeds; 2) calculation of contact strength
(inner products of RWR-smoothed columns of input matrix); 3) estimation
of the contact signficance by a randomalisation procedure. It supports
two methods how to use RWR: 'direct' for directly applying RWR in the
given seeds; 'indirectly' for first pre-computing affinity matrix of
the input graph, and then deriving the affinity score. Parallel
computing is also supported for Linux or Mac operating systems.
dcRWRpipeline(data, g, method = c("indirect", "direct"), normalise = c("laplacian", "row", "column", "none"), restart = 0.75, normalise.affinity.matrix = c("none", "quantile"), permutation = c("random", "degree"), num.permutation = 100, p.adjust.method = c("BH", "BY", "bonferroni", "holm", "hochberg", "hommel"), adjp.cutoff = 0.05, parallel = TRUE, multicores = NULL, verbose = T)
Dnetwork
source("http://bioconductor.org/biocLite.R");
biocLite(c("foreach","doMC"))
. If not yet installed, this option will
be disabledan object of class "iContact", a list with following components:
ratio
: a symmetric matrix storing ratio (the observed
against the expected) between pairwise samples
zscore
: a symmetric matrix storing zscore between pairwise
samples
pval
: a symmetric matrix storing pvalue between pairwise
samples
adjpval
: a symmetric matrix storing adjusted pvalue
between pairwise samples
icontact
: the constructed contact graph (as an 'igraph'
object) under the cutoff of adjusted value
Amatrix
: a pre-computated affinity matrix when using
'inderect' method; NULL otherwise
call
: the call that produced this result
The choice of which method to use RWR depends on the number of seed sets and the number of permutations for statistical test. If the total product of both numbers are huge, it is better to use 'indrect' method (for a single run).
# 1) load onto.GOMF (as 'Onto' object) g <- dcRDataLoader('onto.GOMF')'onto.GOMF' (from package 'dcGOR' version 1.0.5) has been loaded into the working environment# 2) load SCOP superfamilies annotated by GOMF (as 'Anno' object) Anno <- dcRDataLoader('SCOP.sf2GOMF')'SCOP.sf2GOMF' (from package 'dcGOR' version 1.0.5) has been loaded into the working environment# 3) prepare for ontology appended with annotation information dag <- dcDAGannotate(g, annotations=Anno, path.mode="shortest_paths", verbose=TRUE)At level 15, there are 2 nodes, and 3 incoming neighbors. At level 14, there are 6 nodes, and 7 incoming neighbors. At level 13, there are 8 nodes, and 8 incoming neighbors. At level 12, there are 19 nodes, and 22 incoming neighbors. At level 11, there are 24 nodes, and 28 incoming neighbors. At level 10, there are 53 nodes, and 44 incoming neighbors. At level 9, there are 101 nodes, and 81 incoming neighbors. At level 8, there are 198 nodes, and 147 incoming neighbors. At level 7, there are 367 nodes, and 192 incoming neighbors. At level 6, there are 644 nodes, and 221 incoming neighbors. At level 5, there are 489 nodes, and 167 incoming neighbors. At level 4, there are 246 nodes, and 56 incoming neighbors. At level 3, there are 106 nodes, and 13 incoming neighbors. At level 2, there are 20 nodes, and 1 incoming neighbors. At level 1, there are 1 nodes, and 0 incoming neighbors.# 4) calculate pair-wise semantic similarity between 10 randomly chosen domains alldomains <- unique(unlist(nInfo(dag)$annotations)) domains <- sample(alldomains,10) dnetwork <- dcDAGdomainSim(g=dag, domains=domains, method.domain="BM.average", method.term="Resnik", parallel=FALSE, verbose=TRUE)Start at 2015-07-23 12:58:42 First, extract all annotatable domains (2015-07-23 12:58:42)... there are 10 input domains amongst 1083 annotatable domains Second, pre-compute semantic similarity between 188 terms (forced to be the most specific for each domain) using Resnik method (2015-07-23 12:58:52)... Last, calculate pair-wise semantic similarity between 10 domains using BM.average method (2015-07-23 12:58:57)... 1 out of 10 (2015-07-23 12:58:57) 2 out of 10 (2015-07-23 12:58:57) 3 out of 10 (2015-07-23 12:58:57) 4 out of 10 (2015-07-23 12:58:57) 5 out of 10 (2015-07-23 12:58:57) 6 out of 10 (2015-07-23 12:58:57) 7 out of 10 (2015-07-23 12:58:57) 8 out of 10 (2015-07-23 12:58:57) 9 out of 10 (2015-07-23 12:58:57) Finish at 2015-07-23 12:58:57 Runtime in total is: 15 secsdnetworkAn object of S4 class 'Dnetwork' @adjMatrix: a weighted symmetric matrix of 10 domains X 10 domains @nodeInfo (InfoDataFrame) nodeNames: 48056 48230 57889 ... 52335 57603 (10 total) nodeAttr: id# 5) estimate RWR dating based sample/term relationships # define sets of seeds as data # each seed with equal weight (i.e. all non-zero entries are '1') data <- data.frame(aSeeds=c(1,0,1,0,1), bSeeds=c(0,0,1,0,1)) rownames(data) <- id(dnetwork)[1:5] # calcualte their two contact graph coutput <- dcRWRpipeline(data=data, g=dnetwork, parallel=FALSE)Start at 2015-07-23 12:58:57 First, RWR on input graph (10 nodes and 33 edges) using input matrix (5 rows and 2 columns) as seeds (2015-07-23 12:58:57)... using 'indirect' method to do RWR (2015-07-23 12:58:57)... Second, calculate contact strength (2015-07-23 12:58:57)... Third, generate the distribution of contact strength based on 100 permutations on nodes respecting 'random' (2015-07-23 12:58:57)... 1 out of 100 (2015-07-23 12:58:57) 10 out of 100 (2015-07-23 12:58:57) 20 out of 100 (2015-07-23 12:58:57) 30 out of 100 (2015-07-23 12:58:57) 40 out of 100 (2015-07-23 12:58:57) 50 out of 100 (2015-07-23 12:58:57) 60 out of 100 (2015-07-23 12:58:57) 70 out of 100 (2015-07-23 12:58:57) 80 out of 100 (2015-07-23 12:58:57) 90 out of 100 (2015-07-23 12:58:57) 100 out of 100 (2015-07-23 12:58:57) Last, estimate the significance of contact strength: zscore, pvalue, and BH adjusted-pvalue (2015-07-23 12:58:57)... Also, construct the contact graph under the cutoff 5.0e-02 of adjusted-pvalue (2015-07-23 12:58:57)... Your input object 'icontact' of class 'igraph' has been converted into an object of class 'Cnetwork'. Finish at 2015-07-23 12:58:57 Runtime in total is: 0 secscoutputAn object of S4 class 'Coutput', containing following slots: @ratio: a matrix of 2 X 2, containing ratio @zscore: a matrix of 2 X 2, containing z-scores @pvalue: a matrix of 2 X 2, containing p-values @adjp: a matrix of 2 X 2, containing adjusted p-values @cnetwork: an object of S4 class 'Cnetwork', containing 0 interacting nodes
dcRWRpipeline.r
dcRWRpipeline.Rd
dcRWRpipeline.pdf
dcRDataLoader
, dcDAGannotate
,
dcDAGdomainSim
, dcConverter