dcAlgo applies the dcGO algorithm to infer a domain-centric ontology from
input files. It requires two input files: 1) an annotation file containing
annotations between proteins/genes and ontology terms; 2) an architecture
file containing domain architectures for proteins/genes.
dcAlgo(anno.file, architecture.file, output.file = NULL,
    ontology = c(NA, "GOBP", "GOMF", "GOCC", "DO", "HPPA", "HPMI", "HPON", "MP", "EC", "KW", "UP"),
    feature.mode = c("supra", "individual", "comb"), min.overlap = 3,
    fdr.cutoff = 0.001, hscore.type = c("zscore", "fdr"), parallel = TRUE,
    multicores = NULL, verbose = T, RData.ontology.customised = NULL,
    RData.location = "https://github.com/hfang-bristol/RDataCentre/blob/master/dcGOR")
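Several arguments in the signature above (for example feature.mode and hscore.type) list all of their allowed values as a vector default. A common R idiom for such arguments is match.arg, which falls back to the first listed value when nothing is supplied. Whether dcAlgo resolves its arguments exactly this way is an assumption here, and resolve_choices is a hypothetical function written only for illustration:

```r
# Hypothetical illustration (not part of dcGOR): how vector defaults such as
# feature.mode = c("supra", "individual", "comb") are commonly resolved in R.
resolve_choices <- function(feature.mode = c("supra", "individual", "comb"),
                            hscore.type = c("zscore", "fdr")) {
  # match.arg() returns the first choice when the argument is not supplied,
  # and raises an error for a value outside the listed choices
  feature.mode <- match.arg(feature.mode)
  hscore.type <- match.arg(hscore.type)
  list(feature.mode = feature.mode, hscore.type = hscore.type)
}

defaults <- resolve_choices()
custom <- resolve_choices(feature.mode = "comb", hscore.type = "fdr")
print(defaults)
print(custom)
```

Under this convention, leaving feature.mode and hscore.type unset would be equivalent to passing "supra" and "zscore".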
... RData.ontology.customised below)
... -1*log_2(fdr))
... source("http://bioconductor.org/biocLite.R"); biocLite(c("foreach","doMC")). If not yet installed, this option will be disabled
... dcBuildOnto for how to create this object
... RData.location=".". If the RData to load is already part of the package itself, this parameter can be ignored (since this function will try to load it via the function data first). Here is the UNIX command for downloading all RData files (preserving the directory structure):
wget -r -l2 -A "*.RData" -np -nH --cut-dirs=0 "http://dcgor.r-forge.r-project.org/data"
a data frame containing three columns: 1st column 'Feature_id' for features, 2nd 'Term_id' for terms, and 3rd 'Score' for the hypergeometric score indicative of the strength of associations between features and terms
When 'output.file' is specified, a tab-delimited text file is output with the same three columns: 1st column 'Feature_id' for features, 2nd 'Term_id' for terms, and 3rd 'Score' for the hypergeometric score indicative of the strength of associations between features and terms
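The meaning of the 'Score' column depends on hscore.type. For the "fdr" type, the argument description gives the transformation -1*log_2(fdr). A minimal sketch of that arithmetic follows; fdr_to_hscore is a hypothetical helper, not a dcGOR function, and the FDR values are made up for illustration:

```r
# Hypothetical helper (not part of dcGOR): the -1*log_2(fdr) transformation
# mentioned for hscore.type = "fdr".
fdr_to_hscore <- function(fdr) {
  -1 * log2(fdr)
}

# Made-up FDR values; 0.001 is the default fdr.cutoff in the usage above
fdr <- c(0.001, 1e-04, 1e-06)
scores <- fdr_to_hscore(fdr)
print(round(scores, 2))
```

Smaller FDR values give larger scores, so under this transformation an FDR equal to the default cutoff of 0.001 corresponds to a score of about 9.97.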
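As a sketch of how such a tab-delimited file could be read back and filtered with base R: the rows below are copied from the example output on this page, while the temporary file, the score threshold of 2 and the variable names are assumptions made for illustration only.

```r
# Rows in the documented three-column layout (values taken from the example
# output of res[1:5,]); written to a temporary file to stand in for output.file
res <- data.frame(
  Feature_id = c("100895", "103025", "103025", "103025", "103473"),
  Term_id = c("HP:0004295", "HP:0001939", "HP:0003011", "HP:0011804", "HP:0004352"),
  Score = c(5.00, 2.00, 2.00, 0.71, 1.40),
  stringsAsFactors = FALSE
)
out <- tempfile(fileext = ".txt")
write.table(res, file = out, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back, keeping identifiers as character strings
tab <- read.delim(out, stringsAsFactors = FALSE,
                  colClasses = c("character", "character", "numeric"))

# Keep associations with Score >= 2 (arbitrary threshold), strongest first
strong <- tab[tab$Score >= 2, ]
strong <- strong[order(-strong$Score), ]
print(strong)
```

Reading Feature_id as character avoids treating domain identifiers as numbers, which matters if identifiers carry leading zeros or combine several domains.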
# 1) Prepare input file: anno.file and architecture.file
anno.file <- "http://dcgor.r-forge.r-project.org/data/Algo/HP_anno.txt"
architecture.file <- "http://dcgor.r-forge.r-project.org/data/Algo/SCOP_architecture.txt"

# 2) Do inference using built-in ontology
res <- dcAlgo(anno.file, architecture.file, ontology="HPPA", feature.mode="supra", parallel=FALSE)
Start at 2015-07-23 12:25:16
First, load the ontology 'HPPA' (2015-07-23 12:25:16) ...
'onto.HPPA' (from package 'dcGOR' version 1.0.5) has been loaded into the working environment
Second, import files for annotations 'http://dcgor.r-forge.r-project.org/data/Algo/HP_anno.txt' and architectures 'http://dcgor.r-forge.r-project.org/data/Algo/SCOP_architecture.txt' (2015-07-23 12:25:16) ...
Third, propagate annotations (2015-07-23 12:25:17) ...
    At level 16, there are 2 nodes, and 5 incoming neighbors.
    At level 15, there are 7 nodes, and 9 incoming neighbors.
    At level 14, there are 21 nodes, and 42 incoming neighbors.
    At level 13, there are 54 nodes, and 82 incoming neighbors.
    At level 12, there are 105 nodes, and 105 incoming neighbors.
    At level 11, there are 274 nodes, and 188 incoming neighbors.
    At level 10, there are 463 nodes, and 294 incoming neighbors.
    At level 9, there are 782 nodes, and 441 incoming neighbors.
    At level 8, there are 1004 nodes, and 538 incoming neighbors.
    At level 7, there are 1182 nodes, and 581 incoming neighbors.
    At level 6, there are 1295 nodes, and 527 incoming neighbors.
    At level 5, there are 940 nodes, and 290 incoming neighbors.
    At level 4, there are 408 nodes, and 99 incoming neighbors.
    At level 3, there are 114 nodes, and 21 incoming neighbors.
    At level 2, there are 21 nodes, and 1 incoming neighbors.
    At level 1, there are 1 nodes, and 0 incoming neighbors.
    There are 6673 terms used (2015-07-23 12:26:26).
Fourth, define groups using feature mode 'supra' (2015-07-23 12:26:26) ...
    i) split into features (2015-07-23 12:26:26) ...
        1 out of 4351 (2015-07-23 12:26:26)
        436 out of 4351 (2015-07-23 12:26:27)
        872 out of 4351 (2015-07-23 12:26:27)
        1308 out of 4351 (2015-07-23 12:26:27)
        1744 out of 4351 (2015-07-23 12:26:28)
        2180 out of 4351 (2015-07-23 12:26:28)
        2616 out of 4351 (2015-07-23 12:26:28)
        3052 out of 4351 (2015-07-23 12:26:28)
        3488 out of 4351 (2015-07-23 12:26:29)
        3924 out of 4351 (2015-07-23 12:26:29)
        4351 out of 4351 (2015-07-23 12:26:29)
    ii) obtain feature-based groups (2015-07-23 12:26:29) ...
        1 out of 4351 (2015-07-23 12:26:29)
        436 out of 4351 (2015-07-23 12:26:29)
        872 out of 4351 (2015-07-23 12:26:29)
        1308 out of 4351 (2015-07-23 12:26:29)
        1744 out of 4351 (2015-07-23 12:26:29)
        2180 out of 4351 (2015-07-23 12:26:29)
        2616 out of 4351 (2015-07-23 12:26:29)
        3052 out of 4351 (2015-07-23 12:26:29)
        3488 out of 4351 (2015-07-23 12:26:29)
        3924 out of 4351 (2015-07-23 12:26:29)
        4351 out of 4351 (2015-07-23 12:26:29)
    There are 2194 features used (2015-07-23 12:26:29).
Finally, estimate associations between 2194 features and 6673 terms, with 3 min overlaps and 1.0e-03 fdr cutoff (2015-07-23 12:26:29) ...
    1 out of 2194 (2015-07-23 12:26:32)
    220 out of 2194 (2015-07-23 12:26:33)
    440 out of 2194 (2015-07-23 12:26:34)
    660 out of 2194 (2015-07-23 12:26:35)
    880 out of 2194 (2015-07-23 12:26:36)
    1100 out of 2194 (2015-07-23 12:26:37)
    1320 out of 2194 (2015-07-23 12:26:38)
    1540 out of 2194 (2015-07-23 12:26:39)
    1760 out of 2194 (2015-07-23 12:26:39)
    1980 out of 2194 (2015-07-23 12:26:40)
    2194 out of 2194 (2015-07-23 12:26:41)
End at 2015-07-23 12:26:41
Runtime in total is: 85 secs

res[1:5,]
  Feature_id    Term_id Score
1     100895 HP:0004295  5.00
2     103025 HP:0001939  2.00
3     103025 HP:0003011  2.00
4     103025 HP:0011804  0.71
5     103473 HP:0004352  1.40

# 3) Advanced usage: using customised ontology
x <- base::load(base::url("http://dcgor.r-forge.r-project.org/data/onto.HPPA.RData"))
Error: the input does not start with a magic number compatible with loading from a connection
RData.ontology.customised <- 'onto.HPPA.RData'
base::save(list=x, file=RData.ontology.customised)
Error in base::save(list = x, file = RData.ontology.customised): object 'x' not found
#list.files(pattern='*.RData')
## you will see an RData file 'onto.HPPA.RData' in local directory
res <- dcAlgo(anno.file, architecture.file, feature.mode="supra", parallel=FALSE, RData.ontology.customised=RData.ontology.customised)
Start at 2015-07-23 12:26:41
First, load customised ontology 'onto.HPPA.RData' (2015-07-23 12:26:41)...
Second, import files for annotations 'http://dcgor.r-forge.r-project.org/data/Algo/HP_anno.txt' and architectures 'http://dcgor.r-forge.r-project.org/data/Algo/SCOP_architecture.txt' (2015-07-23 12:26:42) ...
Third, propagate annotations (2015-07-23 12:26:42) ...
    At level 16, there are 2 nodes, and 5 incoming neighbors.
    At level 15, there are 7 nodes, and 9 incoming neighbors.
    At level 14, there are 21 nodes, and 42 incoming neighbors.
    At level 13, there are 54 nodes, and 82 incoming neighbors.
    At level 12, there are 105 nodes, and 105 incoming neighbors.
    At level 11, there are 274 nodes, and 188 incoming neighbors.
    At level 10, there are 463 nodes, and 294 incoming neighbors.
    At level 9, there are 782 nodes, and 441 incoming neighbors.
    At level 8, there are 1004 nodes, and 538 incoming neighbors.
    At level 7, there are 1182 nodes, and 581 incoming neighbors.
    At level 6, there are 1295 nodes, and 527 incoming neighbors.
    At level 5, there are 940 nodes, and 290 incoming neighbors.
    At level 4, there are 408 nodes, and 99 incoming neighbors.
    At level 3, there are 114 nodes, and 21 incoming neighbors.
    At level 2, there are 21 nodes, and 1 incoming neighbors.
    At level 1, there are 1 nodes, and 0 incoming neighbors.
    There are 6673 terms used (2015-07-23 12:27:53).
Fourth, define groups using feature mode 'supra' (2015-07-23 12:27:53) ...
    i) split into features (2015-07-23 12:27:53) ...
        1 out of 4351 (2015-07-23 12:27:53)
        436 out of 4351 (2015-07-23 12:27:53)
        872 out of 4351 (2015-07-23 12:27:54)
        1308 out of 4351 (2015-07-23 12:27:54)
        1744 out of 4351 (2015-07-23 12:27:54)
        2180 out of 4351 (2015-07-23 12:27:55)
        2616 out of 4351 (2015-07-23 12:27:55)
        3052 out of 4351 (2015-07-23 12:27:55)
        3488 out of 4351 (2015-07-23 12:27:56)
        3924 out of 4351 (2015-07-23 12:27:56)
        4351 out of 4351 (2015-07-23 12:27:56)
    ii) obtain feature-based groups (2015-07-23 12:27:56) ...
        1 out of 4351 (2015-07-23 12:27:56)
        436 out of 4351 (2015-07-23 12:27:56)
        872 out of 4351 (2015-07-23 12:27:56)
        1308 out of 4351 (2015-07-23 12:27:56)
        1744 out of 4351 (2015-07-23 12:27:56)
        2180 out of 4351 (2015-07-23 12:27:56)
        2616 out of 4351 (2015-07-23 12:27:56)
        3052 out of 4351 (2015-07-23 12:27:56)
        3488 out of 4351 (2015-07-23 12:27:56)
        3924 out of 4351 (2015-07-23 12:27:56)
        4351 out of 4351 (2015-07-23 12:27:56)
    There are 2194 features used (2015-07-23 12:27:56).
Finally, estimate associations between 2194 features and 6673 terms, with 3 min overlaps and 1.0e-03 fdr cutoff (2015-07-23 12:27:56) ...
    1 out of 2194 (2015-07-23 12:27:59)
    220 out of 2194 (2015-07-23 12:28:00)
    440 out of 2194 (2015-07-23 12:28:02)
    660 out of 2194 (2015-07-23 12:28:02)
    880 out of 2194 (2015-07-23 12:28:03)
    1100 out of 2194 (2015-07-23 12:28:04)
    1320 out of 2194 (2015-07-23 12:28:05)
    1540 out of 2194 (2015-07-23 12:28:06)
    1760 out of 2194 (2015-07-23 12:28:07)
    1980 out of 2194 (2015-07-23 12:28:08)
    2194 out of 2194 (2015-07-23 12:28:09)
End at 2015-07-23 12:28:09
Runtime in total is: 88 secs

res[1:5,]
  Feature_id    Term_id Score
1     100895 HP:0004295  5.00
2     103025 HP:0001939  2.00
3     103025 HP:0003011  2.00
4     103025 HP:0011804  0.71
5     103473 HP:0004352  1.40
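In the recorded session, loading the ontology straight from a URL connection in step 3 failed, so the subsequent save call had no object 'x' to work with. One possible workaround, sketched here under the assumption that the RData file is still served at that address, is to download the file to disk first and load it from there. load_remote_rdata is a hypothetical helper, not part of dcGOR, and the cause of the original error is not established by this page:

```r
# Hypothetical helper (not part of dcGOR): download an RData file in binary
# mode, load it from disk, and return the names of the objects it contained.
load_remote_rdata <- function(rdata_url, destfile = basename(rdata_url),
                              envir = parent.frame()) {
  utils::download.file(rdata_url, destfile = destfile, mode = "wb", quiet = TRUE)
  base::load(destfile, envir = envir)
}

# Intended use with the example from this page (requires network access,
# and assumes the file is still hosted at this address):
# x <- load_remote_rdata("http://dcgor.r-forge.r-project.org/data/onto.HPPA.RData")
# res <- dcAlgo(anno.file, architecture.file, feature.mode="supra",
#               parallel=FALSE, RData.ontology.customised="onto.HPPA.RData")
```

Because the downloaded copy is already a local RData file named 'onto.HPPA.RData', it could be passed to RData.ontology.customised directly, without the intermediate save step.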
dcAlgo.r
dcAlgo.Rd
dcAlgo.pdf
dcRDataLoader, dcSplitArch, dcConverter, dcDuplicated, dcAlgoPropagate