dcAlgoPredict predicts ontology terms for given domain
architectures (including individual domains). It involves three steps: 1)
splitting an architecture into individual domains and all possible
consecutive domain combinations (viewed as component features); 2)
merging the hscores of these component features; 3) scaling the merged
hscores into predictive scores across terms.
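Step 1 can be illustrated with a minimal sketch. It assumes an architecture is written as a comma-separated string of domain identifiers (an assumption; the input format is not stated in this text), and `split_architecture` is a hypothetical helper written for illustration, not the package's own dcSplitArch. The domain names "A", "B" and "C" are placeholders.

```r
# Hypothetical helper (not part of dcGOR): enumerate the individual domains
# and all consecutive domain combinations of one architecture.
# Assumption: domains within an architecture are separated by commas.
split_architecture <- function(arch, sep = ",") {
  domains <- strsplit(arch, sep, fixed = TRUE)[[1]]
  n <- length(domains)
  features <- character(0)
  for (start in seq_len(n)) {
    for (end in start:n) {
      # consecutive run of domains from position 'start' to position 'end'
      features <- c(features, paste(domains[start:end], collapse = sep))
    }
  }
  unique(features)
}

features <- split_architecture("A,B,C")
print(features)
```

An architecture of n distinct domains yields at most n(n+1)/2 component features under this scheme; repeated domains can produce duplicates, which `unique` drops.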
dcAlgoPredict(data,
    RData.HIS = c(NA, "Feature2GOBP.sf", "Feature2GOMF.sf", "Feature2GOCC.sf",
        "Feature2HPPA.sf", "Feature2GOBP.pfam", "Feature2GOMF.pfam",
        "Feature2GOCC.pfam", "Feature2HPPA.pfam", "Feature2GOBP.interpro",
        "Feature2GOMF.interpro", "Feature2GOCC.interpro", "Feature2HPPA.interpro"),
    merge.method = c("sum", "max", "sequential"),
    scale.method = c("log", "linear", "none"),
    feature.mode = c("supra", "individual", "comb"),
    slim.level = NULL, max.num = NULL, parallel = TRUE, multicores = NULL,
    verbose = T, RData.HIS.customised = NULL,
    RData.location = "https://github.com/hfang-bristol/RDataCentre/blob/master/dcGOR")
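RData.HIS: when set to NA, a customised object is supplied instead (see RData.HIS.customised below)
merge.method: the "sequential" merging computes $\sum_{i=1}{\frac{R_{i}}{i}}$, where $R_{i}$ is the $i^{th}$ ranked highest hscore
scale.method: scaling into the range between 0 and 1 uses $\frac{S - S_{min}}{S_{max} - S_{min}}$, where $S_{min}$ and $S_{max}$ are the minimum and maximum values for $S$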
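The merging and scaling formulas above can be tried out on toy numbers. The following is a minimal sketch, not the package's internal code: `merge_hscores` and `scale_linear` are hypothetical names, the input values are made up, and the handling of a constant score vector is an assumption. Only linear scaling is shown.

```r
# Illustrative helpers (hypothetical names, not dcGOR internals).

# Merge the hscores that one term receives from several component features.
merge_hscores <- function(hscores, method = c("sum", "max", "sequential")) {
  method <- match.arg(method)
  switch(method,
    sum = sum(hscores),
    max = max(hscores),
    sequential = {
      ranked <- sort(hscores, decreasing = TRUE)  # R_1 >= R_2 >= ...
      sum(ranked / seq_along(ranked))             # sum over i of R_i / i
    }
  )
}

# Linear scaling of merged scores S into the range between 0 and 1.
scale_linear <- function(s) {
  rng <- range(s)
  # Assumption: if all scores are equal, return 1 for each to avoid 0/0.
  if (rng[1] == rng[2]) return(rep(1, length(s)))
  (s - rng[1]) / (rng[2] - rng[1])
}

h <- c(1, 4, 2)                               # made-up hscores for one term
merged_sum <- merge_hscores(h, "sum")         # 1 + 4 + 2
merged_max <- merge_hscores(h, "max")         # 4
merged_seq <- merge_hscores(h, "sequential")  # 4/1 + 2/2 + 1/3

merged <- c(termA = 2, termB = 5, termC = 8)  # made-up merged scores
scaled <- scale_linear(merged)                # 0, 0.5, 1
```

The sequential merge sits between the other two: the best hscore counts in full, and each lower-ranked hscore is down-weighted by its rank.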
slim.level
parallel: the required packages "foreach" and "doMC" can be installed via: source("http://bioconductor.org/biocLite.R"); biocLite(c("foreach","doMC")). If not yet installed, this option will be disabled
RData.HIS.customised: see dcAlgoPropagate on how this object is created
RData.location: for files downloaded into the current working directory, set RData.location=".". If the RData to load is already part of the package itself, this parameter can be ignored (since this function will try to load it via function data first). Here is the UNIX command for downloading all RData files (preserving the directory structure):
wget -r -l2 -A "*.RData" -np -nH --cut-dirs=0 "http://dcgor.r-forge.r-project.org/data"
a named list of architectures, each containing predictive scores
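To work with the returned list, it can help to flatten it into a table. The sketch below uses a mock object rather than real output: the architecture names and scores are made up, and it assumes each list element is a numeric vector named by term ID (the visualisation example passes `names(data)` as term nodes, which suggests this, but it is an assumption here).

```r
# Mock result with the documented shape: a named list of architectures, each
# holding predictive scores. Names and scores are invented for illustration.
# Assumption: each element is a numeric vector named by ontology term ID.
pscore_mock <- list(
  "arch1" = c("GO:0005515" = 1.00, "GO:0003677" = 0.42),
  "arch2" = c("GO:0005515" = 0.73)
)

# Flatten into one row per (architecture, term) pair.
pscore_df <- do.call(rbind, lapply(names(pscore_mock), function(arch) {
  scores <- pscore_mock[[arch]]
  data.frame(architecture = arch, term = names(scores),
             score = unname(scores), stringsAsFactors = FALSE)
}))

# Sort by architecture, then by decreasing score, and keep the top term each.
pscore_df <- pscore_df[order(pscore_df$architecture, -pscore_df$score), ]
top_terms <- pscore_df[!duplicated(pscore_df$architecture), ]
print(top_terms)
```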
# 1) randomly generate 5 domains and/or domain architectures
x <- dcRDataLoader(RData="Feature2GOMF.sf")
'Feature2GOMF.sf' (from https://github.com/hfang-bristol/RDataCentre/blob/master/dcGOR/Feature2GOMF.sf.RData?raw=true) has been loaded into the working environment
data <- sample(names(x$hscore), 5)

# 2) get predictive scores of all predicted terms for these domain architectures
## using the default merge method ('sum')
pscore <- dcAlgoPredict(data=data, RData.HIS="Feature2GOMF.sf", parallel=FALSE)
Start at 2015-07-23 12:28:12
Load the HIS object 'Feature2GOMF.sf' (2015-07-23 12:28:12) ...
'Feature2GOMF.sf' (from https://github.com/hfang-bristol/RDataCentre/blob/master/dcGOR/Feature2GOMF.sf.RData?raw=true) has been loaded into the working environment
Predictions for 5 architectures using 'sum' merge method, 'log' scale method and 'supra' feature mode (2015-07-23 12:28:13)...
1 out of 5 (2015-07-23 12:28:13)
2 out of 5 (2015-07-23 12:28:13)
3 out of 5 (2015-07-23 12:28:13)
4 out of 5 (2015-07-23 12:28:13)
5 out of 5 (2015-07-23 12:28:13)
End at 2015-07-23 12:28:13
Runtime in total is: 1 secs

## using 'max' method
pscore_max <- dcAlgoPredict(data=data, RData.HIS="Feature2GOMF.sf", merge.method="max", parallel=FALSE)
Start at 2015-07-23 12:28:13
Load the HIS object 'Feature2GOMF.sf' (2015-07-23 12:28:13) ...
'Feature2GOMF.sf' (from https://github.com/hfang-bristol/RDataCentre/blob/master/dcGOR/Feature2GOMF.sf.RData?raw=true) has been loaded into the working environment
Predictions for 5 architectures using 'max' merge method, 'log' scale method and 'supra' feature mode (2015-07-23 12:28:14)...
1 out of 5 (2015-07-23 12:28:14)
2 out of 5 (2015-07-23 12:28:14)
3 out of 5 (2015-07-23 12:28:14)
4 out of 5 (2015-07-23 12:28:14)
5 out of 5 (2015-07-23 12:28:14)
End at 2015-07-23 12:28:14
Runtime in total is: 1 secs

## using 'sum' method
pscore_sum <- dcAlgoPredict(data=data, RData.HIS="Feature2GOMF.sf", merge.method="sum", parallel=FALSE)
Start at 2015-07-23 12:28:14
Load the HIS object 'Feature2GOMF.sf' (2015-07-23 12:28:14) ...
'Feature2GOMF.sf' (from https://github.com/hfang-bristol/RDataCentre/blob/master/dcGOR/Feature2GOMF.sf.RData?raw=true) has been loaded into the working environment
Predictions for 5 architectures using 'sum' merge method, 'log' scale method and 'supra' feature mode (2015-07-23 12:28:15)...
1 out of 5 (2015-07-23 12:28:15)
2 out of 5 (2015-07-23 12:28:15)
3 out of 5 (2015-07-23 12:28:15)
4 out of 5 (2015-07-23 12:28:15)
5 out of 5 (2015-07-23 12:28:15)
End at 2015-07-23 12:28:15
Runtime in total is: 1 secs

# 3) advanced usage
## a) focus on those terms at the 2nd level (general)
pscore <- dcAlgoPredict(data=data, RData.HIS="Feature2GOMF.sf", slim.level=2, parallel=FALSE)
Start at 2015-07-23 12:28:15
Load the HIS object 'Feature2GOMF.sf' (2015-07-23 12:28:15) ...
'Feature2GOMF.sf' (from https://github.com/hfang-bristol/RDataCentre/blob/master/dcGOR/Feature2GOMF.sf.RData?raw=true) has been loaded into the working environment
Predictions for 5 architectures using 'sum' merge method, 'log' scale method and 'supra' feature mode (2015-07-23 12:28:16)...
1 out of 5 (2015-07-23 12:28:16)
2 out of 5 (2015-07-23 12:28:16)
3 out of 5 (2015-07-23 12:28:16)
4 out of 5 (2015-07-23 12:28:16)
5 out of 5 (2015-07-23 12:28:16)
Focus on predicted terms at '2' slim level(s)
End at 2015-07-23 12:28:16
Runtime in total is: 1 secs

## b) visualise predictive scores in the ontology hierarchy
### load the ontology
g <- dcRDataLoader("onto.GOMF", verbose=FALSE)
ig <- dcConverter(g, from='Onto', to='igraph', verbose=FALSE)
### do visualisation for the 1st architecture
data <- pscore[[1]]
subg <- dnet::dDAGinduce(ig, nodes_query=names(data), path.mode="shortest_paths")
dnet::visDAG(g=subg, data=data, node.info="term_id")
dcRDataLoader, dcSplitArch, dcConverter, dcAlgoPropagate, dcAlgoPredictMain, dcAlgoPredictGenome