dcAlgoPredictMain predicts ontology terms from an input file containing domain architectures (including individual domains).
dcAlgoPredictMain(input.file, output.file = NULL,
    RData.HIS = c(NA, "Feature2GOBP.sf", "Feature2GOMF.sf", "Feature2GOCC.sf",
        "Feature2HPPA.sf", "Feature2GOBP.pfam", "Feature2GOMF.pfam",
        "Feature2GOCC.pfam", "Feature2HPPA.pfam", "Feature2GOBP.interpro",
        "Feature2GOMF.interpro", "Feature2GOCC.interpro", "Feature2HPPA.interpro"),
    merge.method = c("sum", "max", "sequential"),
    scale.method = c("log", "linear", "none"),
    feature.mode = c("supra", "individual", "comb"),
    slim.level = NULL, max.num = NULL, parallel = TRUE, multicores = NULL,
    verbose = T, RData.HIS.customised = NULL,
    RData.location = "https://github.com/hfang-bristol/RDataCentre/blob/master/dcGOR")
RData.HIS: (... RData.HIS.customised below)
merge.method: ... $\sum_{i=1}{\frac{R_{i}}{i}}$, where $R_{i}$ is the $i^{th}$ ranked highest hscore
scale.method: ... $\frac{S - S_{min}}{S_{max} - S_{min}}$, where $S_{min}$ and $S_{max}$ are the minimum and maximum values for $S$
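The two formulas above can be illustrated with a short R sketch. The helper names rank_weighted_sum and minmax_scale are hypothetical and not part of dcGOR. How the package applies these formulas internally, and which merge.method or scale.method option triggers each one, is not shown here. Returning 1 for constant input is likewise an assumption made only to avoid dividing by zero.

```r
# Illustrative helpers only; these are NOT dcGOR functions.

# Rank-weighted sum: sort scores from highest to lowest, then divide the
# i-th ranked score by its rank i and add everything up.
rank_weighted_sum <- function(hscores) {
  ranked <- sort(hscores, decreasing = TRUE)
  sum(ranked / seq_along(ranked))
}

# Min-max scaling: (S - S_min) / (S_max - S_min), mapping scores into [0, 1].
# Assumption: if all scores are identical, return 1 for each of them.
minmax_scale <- function(s) {
  rng <- range(s)
  if (rng[1] == rng[2]) return(rep(1, length(s)))
  (s - rng[1]) / (rng[2] - rng[1])
}

merged <- rank_weighted_sum(c(2, 6, 3))  # 6/1 + 3/2 + 2/3
scaled <- minmax_scale(c(1, 3, 5))       # 0.0 0.5 1.0
```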
slim.level
parallel: ... source("http://bioconductor.org/biocLite.R"); biocLite(c("foreach","doMC")). If not yet installed, this option will be disabled
RData.HIS.customised: ... dcAlgoPropagate on how this object is created
RData.location: ... RData.location=".". If the RData to load is already part of the package itself, this parameter can be ignored (since this function will first try to load it via the function data). Here is the UNIX command for downloading all RData files (preserving the directory structure):
wget -r -l2 -A "*.RData" -np -nH --cut-dirs=0 "http://dcgor.r-forge.r-project.org/data"
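As a minimal sketch of how the notes above might be combined, the snippet below checks whether "foreach" and "doMC" are installed before requesting parallel computation. The variable names has_backend and rdata_dir are hypothetical, and it is an assumption that rdata_dir points at the directory holding the downloaded RData files. The call itself is left commented out because it needs dcGOR, the input file and the RData files.

```r
# Check whether both parallel backends are installed.
# requireNamespace() returns TRUE/FALSE and does not attach the package.
has_backend <- requireNamespace("foreach", quietly = TRUE) &&
  requireNamespace("doMC", quietly = TRUE)

# Assumption: RData files were downloaded beforehand and are reachable here.
rdata_dir <- "."

# Hypothetical call (needs dcGOR, an input file and local RData files):
# output <- dcAlgoPredictMain(input.file, RData.HIS = "Feature2GOMF.sf",
#                             parallel = has_backend,
#                             RData.location = rdata_dir)
```

Passing the result of the check to parallel makes the intent explicit on systems where the backends are missing.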
A data frame with three columns: the 1st column is the same as in the input file (e.g. 'SeqID'), the 2nd is 'Term' (predicted ontology terms), and the 3rd is 'Score' (the predicted scores).
When 'output.file' is specified, a tab-delimited text file is written out with the same column names: the 1st column the same as in the input file (e.g. 'SeqID'), the 2nd 'Term' (predicted ontology terms), and the 3rd 'Score' (the predicted scores).
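To make the three-column, tab-delimited layout concrete, here is a small self-contained sketch that writes such a file and reads it back. The rows are copied from the example output on this page. The use of write.table with these particular options is an assumption for illustration, not a statement of how dcAlgoPredictMain writes its file.

```r
# Toy predictions using values taken from the example output on this page.
pred <- data.frame(
  SeqID = rep("ENSP00000477790", 3),
  Term  = c("GO:0003674", "GO:0005488", "GO:0003823"),
  Score = c(1, 0.9808, 0.9667),
  stringsAsFactors = FALSE
)

# Write a tab-delimited file with a header row, then read it back.
out_file <- tempfile(fileext = ".txt")
write.table(pred, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)
back <- read.delim(out_file, stringsAsFactors = FALSE)
```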
# 1) Prepare an input file containing domain architectures
input.file <- "http://dcgor.r-forge.r-project.org/data/Feature/hs.txt"

# 2) Do prediction using built-in data
output <- dcAlgoPredictMain(input.file, RData.HIS="Feature2GOMF.sf", parallel=FALSE)

Start at 2015-07-23 12:32:07

Read the input file 'http://dcgor.r-forge.r-project.org/data/Feature/hs.txt' ...
Predictions for 99458 sequences (with 7644 distinct architectures) using 'Feature2GOMF.sf' RData, 'sum' merge method, 'log' scale method and 'supra' feature mode (2015-07-23 12:32:08) ...

##############################
'dcAlgoPredict' is being called...
##############################
Start at 2015-07-23 12:32:08

Load the HIS object 'Feature2GOMF.sf' (2015-07-23 12:32:08) ...
'Feature2GOMF.sf' (from https://github.com/hfang-bristol/RDataCentre/blob/master/dcGOR/Feature2GOMF.sf.RData?raw=true) has been loaded into the working environment
Predictions for 7644 architectures using 'sum' merge method, 'log' scale method and 'supra' feature mode (2015-07-23 12:32:09)...
    1 out of 7644 (2015-07-23 12:32:09)
    765 out of 7644 (2015-07-23 12:32:11)
    1530 out of 7644 (2015-07-23 12:32:13)
    2295 out of 7644 (2015-07-23 12:32:16)
    3060 out of 7644 (2015-07-23 12:32:19)
    3825 out of 7644 (2015-07-23 12:32:21)
    4590 out of 7644 (2015-07-23 12:32:23)
    5355 out of 7644 (2015-07-23 12:32:26)
    6120 out of 7644 (2015-07-23 12:32:28)
    6885 out of 7644 (2015-07-23 12:32:30)
    7644 out of 7644 (2015-07-23 12:32:32)

End at 2015-07-23 12:32:32
Runtime in total is: 24 secs
##############################
'dcAlgoPredict' has been completed!
##############################

Preparations for output (2015-07-23 12:32:32)...

End at 2015-07-23 12:32:37
Runtime in total is: 30 secs

output[1:5,]

     SeqID             Term         Score
[1,] "ENSP00000477790" "GO:0003674" "1"
[2,] "ENSP00000477790" "GO:0005488" "0.9808"
[3,] "ENSP00000477790" "GO:0003823" "0.9667"
[4,] "ENSP00000477790" "GO:0004872" "0.8886"
[5,] "ENSP00000477790" "GO:0060089" "0.8454"

# 3) Advanced usage: using customised data
x <- base::load(base::url("http://dcgor.r-forge.r-project.org/data/Feature2GOMF.sf.RData"))

Error: the input does not start with a magic number compatible with loading from a connection

RData.HIS.customised <- 'Feature2GOMF.sf.RData'
base::save(list=x, file=RData.HIS.customised)

Error in base::save(list = x, file = RData.HIS.customised): object 'x' not found

#list.files(pattern='*.RData')
## you will see an RData file 'Feature2GOMF.sf.RData' in local directory
output <- dcAlgoPredictMain(input.file, parallel=FALSE, RData.HIS.customised=RData.HIS.customised)

Start at 2015-07-23 12:32:37

Read the input file 'http://dcgor.r-forge.r-project.org/data/Feature/hs.txt' ...
Predictions for 99458 sequences (with 7644 distinct architectures) using 'Feature2GOMF.sf.RData' RData, 'sum' merge method, 'log' scale method and 'supra' feature mode (2015-07-23 12:32:38) ...

##############################
'dcAlgoPredict' is being called...
##############################
Start at 2015-07-23 12:32:38

Load the customised HIS object 'Feature2GOMF.sf.RData' (2015-07-23 12:32:38)...
Predictions for 7644 architectures using 'sum' merge method, 'log' scale method and 'supra' feature mode (2015-07-23 12:32:38)...
    1 out of 7644 (2015-07-23 12:32:38)
    765 out of 7644 (2015-07-23 12:32:40)
    1530 out of 7644 (2015-07-23 12:32:43)
    2295 out of 7644 (2015-07-23 12:32:45)
    3060 out of 7644 (2015-07-23 12:32:47)
    3825 out of 7644 (2015-07-23 12:32:49)
    4590 out of 7644 (2015-07-23 12:32:52)
    5355 out of 7644 (2015-07-23 12:32:54)
    6120 out of 7644 (2015-07-23 12:32:57)
    6885 out of 7644 (2015-07-23 12:32:59)
    7644 out of 7644 (2015-07-23 12:33:01)

End at 2015-07-23 12:33:01
Runtime in total is: 23 secs
##############################
'dcAlgoPredict' has been completed!
##############################

Preparations for output (2015-07-23 12:33:01)...

End at 2015-07-23 12:33:06
Runtime in total is: 29 secs

output[1:5,]

     SeqID             Term         Score
[1,] "ENSP00000477790" "GO:0003674" "1"
[2,] "ENSP00000477790" "GO:0005488" "0.9808"
[3,] "ENSP00000477790" "GO:0003823" "0.9667"
[4,] "ENSP00000477790" "GO:0004872" "0.8886"
[5,] "ENSP00000477790" "GO:0060089" "0.8454"
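The printed rows above show quoted values and [1,]-style row labels, which suggests the scores are held as character strings in that printout. As a hedged sketch, assuming that is the case, the snippet below rebuilds those five rows as a character matrix and converts them into a data frame with a numeric Score column so that rows can be filtered by score. The names m, df and high, and the 0.9 cutoff, are illustrative only.

```r
# Rebuild the five printed rows as a character matrix (values copied from above).
m <- cbind(
  SeqID = rep("ENSP00000477790", 5),
  Term  = c("GO:0003674", "GO:0005488", "GO:0003823", "GO:0004872", "GO:0060089"),
  Score = c("1", "0.9808", "0.9667", "0.8886", "0.8454")
)

# Convert to a data frame and make Score numeric so comparisons work as numbers.
df <- data.frame(
  SeqID = m[, "SeqID"],
  Term  = m[, "Term"],
  Score = as.numeric(m[, "Score"]),
  stringsAsFactors = FALSE
)

# Keep only predictions at or above an arbitrary illustrative cutoff.
high <- df[df$Score >= 0.9, ]
```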
dcAlgoPredictMain.r
dcAlgoPredictMain.Rd
dcAlgoPredictMain.pdf
dcRDataLoader, dcAlgoPropagate, dcAlgoPredict