dcNaivePredict
performs naive prediction from known input annotations. For each gene/protein, the score of a term to be predicted is simply the frequency with which that term appears in the known annotations.
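As a rough illustration of the frequency idea described above, the following base-R sketch scores each term by the fraction of annotated genes that carry it and assigns the same ranked list to every query gene. The helper `naive_predict` and the toy data are hypothetical and not part of dcGOR; reading "frequency" as a fraction of annotated genes is an assumption, and the sketch skips any propagation of annotations through the ontology.

```r
# Toy known annotations (hypothetical data): one row per gene-term pair.
known <- data.frame(
  SeqID = c("g1", "g1", "g2", "g2", "g3"),
  Term  = c("T1", "T2", "T1", "T3", "T1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of dcGOR: score each term by the fraction of
# annotated genes carrying it, then give every query gene the same ranked list.
naive_predict <- function(query, known, max.num = 1000) {
  known   <- unique(known[, c("SeqID", "Term")])
  n_genes <- length(unique(known$SeqID))
  counts  <- table(known$Term)
  # Convert to a named numeric vector of frequencies, highest first.
  freq <- sort(setNames(as.numeric(counts) / n_genes, names(counts)),
               decreasing = TRUE)
  # Keep at most 'max.num' top-scoring terms.
  freq <- head(freq, max.num)
  data.frame(
    SeqID = rep(query, each = length(freq)),
    Term  = rep(names(freq), times = length(query)),
    Score = rep(unname(freq), times = length(query)),
    stringsAsFactors = FALSE
  )
}

pred <- naive_predict(c("g1", "g4"), known)
```

Because the score depends only on the known annotations, a query gene absent from them (here "g4") receives the same predictions as any other gene.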
dcNaivePredict(data, GSP.file, output.file = NULL,
    ontology = c(NA, "GOBP", "GOMF", "GOCC", "DO", "HPPA", "HPMI", "HPON", "MP", "EC", "KW", "UP"),
    max.num = 1000, verbose = T, RData.ontology.customised = NULL,
    RData.location = "https://github.com/hfang-bristol/RDataCentre/blob/master/dcGOR")
RData.ontology.customised: see dcBuildOnto for how to create this object.
RData.location: for example, RData.location=".". If the RData to load is already part of the package itself, this parameter can be ignored (since this function will first try to load it via the function data). Here is the UNIX command for downloading all RData files (preserving the directory structure):
wget -r -l2 -A "*.RData" -np -nH --cut-dirs=0 "http://dcgor.r-forge.r-project.org/data"
A data frame containing three columns: the 1st column is the same as in the input file (e.g. 'SeqID'), the 2nd is 'Term' (predicted ontology terms), and the 3rd is 'Score' (predicted scores).
When 'output.file' is specified, a tab-delimited text file is also written out with the same three columns and column names.
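A tab-delimited file with these three columns can be read back with base R for downstream filtering. The sketch below is only an illustration: the toy predictions, the temporary file standing in for 'output.file', and the 0.5 cut-off are all assumptions.

```r
# Hypothetical predictions in the three-column layout described above.
pred <- data.frame(
  SeqID = c("g1", "g1", "g2"),
  Term  = c("HP:0000118", "HP:0000707", "HP:0000271"),
  Score = c(1, 0.6591, 0.4583),
  stringsAsFactors = FALSE
)

# Write a tab-delimited file with a header row (an assumed stand-in for the
# file produced when 'output.file' is set).
out_file <- tempfile(fileext = ".txt")
utils::write.table(pred, file = out_file, sep = "\t",
                   quote = FALSE, row.names = FALSE)

# Read it back, keeping identifiers as character and scores as numeric.
back <- utils::read.delim(out_file, header = TRUE, sep = "\t",
                          colClasses = c("character", "character", "numeric"))

# Keep only the higher-scoring predictions (the 0.5 cut-off is arbitrary).
confident <- back[back$Score >= 0.5, ]
```

Reading 'SeqID' as character avoids numeric-looking identifiers (such as Entrez gene IDs) being converted to numbers.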
# 1) prepare genes to be predicted
input.file <- "http://dcgor.r-forge.r-project.org/data/Algo/HP_anno.txt"
#input.file <- "http://dcgor.r-forge.r-project.org/data/Algo/SCOP_architecture.txt"
input <- utils::read.delim(input.file, header=TRUE, sep="\t", colClasses="character")
data <- unique(input[,1])

# 2) do naive prediction
GSP.file <- "http://dcgor.r-forge.r-project.org/data/Algo/HP_anno.txt"
res <- dcNaivePredict(data=data, GSP.file=GSP.file, ontology="HPPA")

Start at 2015-07-23 12:56:25
First, load the ontology 'HPPA' (2015-07-23 12:56:25) ...
    'onto.HPPA' (from package 'dcGOR' version 1.0.5) has been loaded into the working environment
Second, import files for GSP (2015-07-23 12:56:25) ...
Third, propagate GSP annotations (2015-07-23 12:56:26) ...
    At level 16, there are 2 nodes, and 5 incoming neighbors.
    At level 15, there are 7 nodes, and 9 incoming neighbors.
    At level 14, there are 21 nodes, and 42 incoming neighbors.
    At level 13, there are 54 nodes, and 82 incoming neighbors.
    At level 12, there are 105 nodes, and 105 incoming neighbors.
    At level 11, there are 274 nodes, and 188 incoming neighbors.
    At level 10, there are 463 nodes, and 294 incoming neighbors.
    At level 9, there are 782 nodes, and 441 incoming neighbors.
    At level 8, there are 1004 nodes, and 538 incoming neighbors.
    At level 7, there are 1182 nodes, and 581 incoming neighbors.
    At level 6, there are 1295 nodes, and 527 incoming neighbors.
    At level 5, there are 940 nodes, and 290 incoming neighbors.
    At level 4, there are 408 nodes, and 99 incoming neighbors.
    At level 3, there are 114 nodes, and 21 incoming neighbors.
    At level 2, there are 21 nodes, and 1 incoming neighbors.
    At level 1, there are 1 nodes, and 0 incoming neighbors.
    There are 6673 terms in GSP (2015-07-23 12:57:15).
Fourth, do naive predictions for 3085 genes/proteins (2015-07-23 12:57:15) ...
    Focus on top 1000 predicted terms for each gene/protein
End at 2015-07-23 12:57:18
Runtime in total is: 53 secs

res[1:10,]

      SeqID    Term         Score
 [1,] "10225"  "HP:0000118" "1"
 [2,] "10806"  "HP:0000707" "0.6591"
 [3,] "11020"  "HP:0012638" "0.6053"
 [4,] "1131"   "HP:0000924" "0.5299"
 [5,] "123016" "HP:0000152" "0.5217"
 [6,] "129880" "HP:0011842" "0.5171"
 [7,] "1312"   "HP:0000234" "0.5138"
 [8,] "1376"   "HP:0000478" "0.5098"
 [9,] "139285" "HP:0012639" "0.4675"
[10,] "145173" "HP:0000271" "0.4583"

# 3) calculate Precision and Recall
res_PR <- dcAlgoPredictPR(GSP.file=GSP.file, prediction.file=res, ontology="HPPA")

Start at 2015-07-23 12:57:18
First, load the ontology 'HPPA' (2015-07-23 12:57:18) ...
    'onto.HPPA' (from package 'dcGOR' version 1.0.5) has been loaded into the working environment
Second, import files for GSP and predictions (2015-07-23 12:57:18) ...
Third, propagate GSP annotations (2015-07-23 12:57:20) ...
    At level 16, there are 2 nodes, and 5 incoming neighbors.
    At level 15, there are 7 nodes, and 9 incoming neighbors.
    At level 14, there are 21 nodes, and 42 incoming neighbors.
    At level 13, there are 54 nodes, and 82 incoming neighbors.
    At level 12, there are 105 nodes, and 105 incoming neighbors.
    At level 11, there are 274 nodes, and 188 incoming neighbors.
    At level 10, there are 463 nodes, and 294 incoming neighbors.
    At level 9, there are 782 nodes, and 441 incoming neighbors.
    At level 8, there are 1004 nodes, and 538 incoming neighbors.
    At level 7, there are 1182 nodes, and 581 incoming neighbors.
    At level 6, there are 1295 nodes, and 527 incoming neighbors.
    At level 5, there are 940 nodes, and 290 incoming neighbors.
    At level 4, there are 408 nodes, and 99 incoming neighbors.
    At level 3, there are 114 nodes, and 21 incoming neighbors.
    At level 2, there are 21 nodes, and 1 incoming neighbors.
    At level 1, there are 1 nodes, and 0 incoming neighbors.
    There are 3048 genes/proteins in GSP (2015-07-23 12:58:08).
Fourth, process input predictions (2015-07-23 12:58:08) ...
    There are 3085 genes/proteins in predictions (2015-07-23 12:58:14).
Fifth, calculate the precision and recall for each of 3048 predicted and GSP genes/proteins (2015-07-23 12:58:14).
Finally, calculate the averaged precision and recall (2015-07-23 12:58:16).
In summary, Prediction coverage: 1.00 (amongst 3048 targets in GSP), and F-measure: 0.35.
End at 2015-07-23 12:58:16
Runtime in total is: 58 secs

res_PR

          Precision     Recall
1        1.00000000 0.04034783
0.901476 1.00000000 0.04034783
0.802952 1.00000000 0.04034783
0.704428 1.00000000 0.04034783
0.605904 0.82956037 0.05303582
0.50738  0.60707841 0.09848956
0.408856 0.52151001 0.15324301
0.310332 0.44000040 0.23356765
0.211808 0.32728481 0.36977436
0.113284 0.23301021 0.53028895
0.01476  0.07161905 0.84493817

# 4) plot PR-curve
plot(res_PR[,2], res_PR[,1], xlim=c(0,1), ylim=c(0,1), type="b", xlab="Recall", ylab="Precision")
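To relate the res_PR table to the F-measure quoted in the log, the sketch below computes the harmonic mean of precision and recall at each threshold from the printed values and takes the largest. That the reported F-measure is this maximum over thresholds is an assumption made for illustration; the source does not say how dcAlgoPredictPR derives it.

```r
# Precision and recall columns copied from the res_PR output in the example.
precision <- c(1, 1, 1, 1, 0.82956037, 0.60707841, 0.52151001,
               0.44000040, 0.32728481, 0.23301021, 0.07161905)
recall    <- c(0.04034783, 0.04034783, 0.04034783, 0.04034783, 0.05303582,
               0.09848956, 0.15324301, 0.23356765, 0.36977436, 0.53028895,
               0.84493817)

# F-measure (harmonic mean of precision and recall) at each threshold.
f <- 2 * precision * recall / (precision + recall)

# Index and value of the best trade-off between precision and recall.
best  <- which.max(f)
f_max <- round(f[best], 2)
```

With these values the maximum falls on the ninth row (threshold 0.211808) and rounds to 0.35, which agrees with the figure in the log summary.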
dcNaivePredict.r
dcNaivePredict.Rd
dcNaivePredict.pdf
dcRDataLoader, dcAlgoPropagate