dcAlgoPropagate propagates ontology annotations read from an input
file. The input file contains the original annotations between
domains/features and ontology terms, along with the hypergeometric
scores (hscore) supporting those annotations. The annotations are
propagated to the ontology root, either retaining the maximum hscore or
additively accumulating the hscore. After propagation, ontology terms
at increasing levels are determined based on the concept of Information
Content (IC) to produce a slim version of the ontology. It returns an
object of S3 class "HIS" with three components: "hscore", "ic" and
"slim".
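
The two propagation modes can be illustrated on a toy ontology. This is only a sketch in base R, not the package's implementation: the term names, the `parents` list and the `propagate()` helper are hypothetical, and the "sum" mode here counts each original annotation once per ancestor, which may differ from how dcAlgoPropagate accumulates scores internally.

```r
# Toy ontology (hypothetical term names): each term lists its direct parents.
parents <- list(
  root = character(0),
  A    = "root",
  B    = "root",
  C    = c("A", "B")   # C has two parents
)

# Return a term together with all of its ancestors.
ancestors <- function(term) {
  out <- term
  for (p in parents[[term]]) out <- union(out, ancestors(p))
  out
}

# Original annotations for one feature: a term-named vector of hscores.
hscore <- c(C = 5, A = 2)

# Propagate towards the root, combining scores with "max" or "sum".
propagate <- function(hscore, op = c("max", "sum")) {
  op <- match.arg(op)
  combine <- if (op == "max") max else sum
  terms <- unique(unlist(lapply(names(hscore), ancestors)))
  sapply(terms, function(t) {
    # originally annotated terms that are t itself or a descendant of t
    hits <- vapply(names(hscore), function(s) t %in% ancestors(s), logical(1))
    combine(hscore[hits])
  })
}

res_max <- propagate(hscore, "max")
res_sum <- propagate(hscore, "sum")
```

With "max" every ancestor of C inherits the strongest supporting score, while with "sum" the terms A and root, which receive both annotations, accumulate them.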
dcAlgoPropagate(input.file,
    ontology = c(NA, "GOBP", "GOMF", "GOCC", "DO", "HPPA", "HPMI", "HPON", "MP", "EC", "KW", "UP"),
    propagation = c("max", "sum"), output.file = "HIS.RData", verbose = T,
    RData.ontology.customised = NULL,
    RData.location = "https://github.com/hfang-bristol/RDataCentre/blob/master/dcGOR")
RData.ontology.customised below)
HIS object as an RData-formatted file (see 'Value' for details). If
NULL, this file will be saved into "HIS.RData" in the current working
local directory. If NA, there will be no output file
dcBuildOnto for how to create this object
RData.location=".". If the RData to load is already part of the package
itself, this parameter can be ignored (since this function will first
try to load it via the function data). Here is the UNIX command for
downloading all RData files (preserving the directory structure):
wget -r -l2 -A "*.RData" -np -nH --cut-dirs=0 "http://dcgor.r-forge.r-project.org/data"
an object of S3 class HIS, with the following components:
hscore: a list of features, each with a term-named vector containing
hscore
ic: a term-named vector containing information content (IC). Terms are
ordered first by IC and then by longest-path level, making sure that
for terms with the same IC, parental terms always come first
slim: a list of four slims, each with a term-named vector containing
information content (IC). Slim '1' for very general terms, '2' for
general terms, '3' for specific terms, '4' for very specific terms
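
To make the "ic" and "slim" components more concrete, the sketch below computes an information content for a toy set of propagated annotations and then picks the terms falling inside an IC window. It rests on assumptions: IC is taken to be -log10 of the fraction of features annotated by a term, the features `f1` to `f4`, the term names and the `slim_window()` helper are hypothetical, and ties are not broken by longest-path level as the real "ic" component is.

```r
# Toy propagated annotations (hypothetical features and terms).
hscore <- list(
  f1 = c(root = 3, A = 3, C = 3),
  f2 = c(root = 2, A = 2),
  f3 = c(root = 4, B = 4),
  f4 = c(root = 1)
)

# Number of features annotated by each term.
term_count <- table(unlist(lapply(hscore, names)))

# Assumed IC definition: -log10 of the fraction of features carrying the term.
ic <- -log10(as.numeric(term_count) / length(hscore))
names(ic) <- names(term_count)
ic <- sort(ic)   # general (low-IC) terms first

# Stand-in for one slim level: the terms whose IC lies inside a window.
slim_window <- function(ic, lower, upper) ic[ic >= lower & ic <= upper]
general <- slim_window(ic, 0, 0.4)
```

Under this definition the root, which annotates every feature, has an IC of zero, and rarer terms score higher, so a slim for general terms keeps only those near the low end of the IC range.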
None
# build an "HIS" object for GO Molecular Function
input.file <- "http://dcgor.r-forge.r-project.org/data/Feature/Feature2GO.sf.txt"
Feature2GOMF.sf <- dcAlgoPropagate(input.file=input.file, ontology="GOMF", output.file="Feature2GOMF.sf.RData")

Start at 2015-07-23 12:34:03

Read the input file 'http://dcgor.r-forge.r-project.org/data/Feature/Feature2GO.sf.txt' (2015-07-23 12:34:03) ...
Load the ontology 'GOMF' (2015-07-23 12:34:06) ...
'onto.GOMF' (from package 'dcGOR' version 1.0.5) has been loaded into the working environment
Do propagation via 'max' operation (2015-07-23 12:34:11) ...
	At level 15, there are 3 nodes, and 4 incoming neighbors (2015-07-23 12:34:12).
	At level 14, there are 6 nodes, and 7 incoming neighbors (2015-07-23 12:34:12).
	At level 13, there are 10 nodes, and 12 incoming neighbors (2015-07-23 12:34:12).
	At level 12, there are 24 nodes, and 28 incoming neighbors (2015-07-23 12:34:13).
	At level 11, there are 32 nodes, and 34 incoming neighbors (2015-07-23 12:34:13).
	At level 10, there are 76 nodes, and 63 incoming neighbors (2015-07-23 12:34:13).
	At level 9, there are 132 nodes, and 99 incoming neighbors (2015-07-23 12:34:14).
	At level 8, there are 270 nodes, and 175 incoming neighbors (2015-07-23 12:34:15).
	At level 7, there are 459 nodes, and 229 incoming neighbors (2015-07-23 12:34:17).
	At level 6, there are 842 nodes, and 254 incoming neighbors (2015-07-23 12:34:20).
	At level 5, there are 568 nodes, and 172 incoming neighbors (2015-07-23 12:34:25).
	At level 4, there are 273 nodes, and 60 incoming neighbors (2015-07-23 12:34:27).
	At level 3, there are 120 nodes, and 13 incoming neighbors (2015-07-23 12:34:28).
	At level 2, there are 20 nodes, and 1 incoming neighbors (2015-07-23 12:34:29).
after propagation, there are 6018 features annotated by 2836 terms.
Determining IC-based slim levels (2015-07-23 12:34:29) ...
	1 level with 6 terms with IC falling around 0.47 (between 0.00 and 0.94).
	2 level with 38 terms with IC falling around 1.42 (between 1.18 and 1.65).
	3 level with 217 terms with IC falling around 2.36 (between 2.13 and 2.60).
	4 level with 838 terms with IC falling around 3.31 (between 3.07 and 3.54).
An object of S3 class 'HIS' has been built and saved into '/Users/hfang/Sites/SUPERFAMILY/dcGO/dcGOR/Feature2GOMF.sf.RData'.

End at 2015-07-23 12:41:49
Runtime in total is: 466 secs

names(Feature2GOMF.sf)

[1] "hscore" "ic"     "slim"

Feature2GOMF.sf$hscore[1]

$`100879`
GO:0003674 GO:0003824 GO:0016740 GO:0016772 GO:0016779 GO:0017125 GO:0034061
     14.83      14.83      14.83      14.83      14.83       6.85       8.34
GO:0003887
      1.84

Feature2GOMF.sf$ic[1:10]

GO:0003674 GO:0005488 GO:0005515 GO:0003824 GO:0043167 GO:0016787 GO:0016740
 0.0000000  0.2416331  0.3770188  0.3901089  0.7416274  0.7521026  0.7922330
GO:0097159 GO:1901363 GO:0043168
 0.7980867  0.8008152  0.9179178

Feature2GOMF.sf$slim[1]

$`1`
GO:0003824 GO:0005515 GO:0043167 GO:0097159 GO:1901363 GO:0004872
 0.3901089  0.3770188  0.7416274  0.7980867  0.8008152  0.9331151

# extract hscore as a matrix with 3 columns (Feature_id, Term_id, Score)
hscore <- Feature2GOMF.sf$hscore
hscore_mat <- dcList2Matrix(hscore)

The input list has been converted into a matrix of 75504 X 3.

colnames(hscore_mat) <- c("Feature_id", "Term_id", "Score")
dim(hscore_mat)

[1] 75504     3

hscore_mat[1:10,]

      Feature_id Term_id      Score
 [1,] "100879"   "GO:0003674" "14.83"
 [2,] "100879"   "GO:0003824" "14.83"
 [3,] "100879"   "GO:0016740" "14.83"
 [4,] "100879"   "GO:0016772" "14.83"
 [5,] "100879"   "GO:0016779" "14.83"
 [6,] "100879"   "GO:0017125" "6.85"
 [7,] "100879"   "GO:0034061" "8.34"
 [8,] "100879"   "GO:0003887" "1.84"
 [9,] "100895"   "GO:0003674" "33.55"
[10,] "100895"   "GO:0003824" "6.73"
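
The example above saves its result to an RData file and then flattens "hscore" with dcList2Matrix, which yields a character matrix. As a possible complement, the base-R sketch below reads a saved file back and builds a data frame that keeps Score numeric. It uses a toy stand-in file rather than a real result: the three values are copied from the output above, while the file name `toy.HIS.RData` and the in-file object name `HIS` are assumptions.

```r
# Toy stand-in for a saved result; the in-file object name `HIS` is assumed.
HIS <- list(
  hscore = list(
    `100879` = c("GO:0003674" = 14.83, "GO:0003824" = 14.83),
    `100895` = c("GO:0003674" = 33.55)
  )
)
rdata_file <- file.path(tempdir(), "toy.HIS.RData")
save(HIS, file = rdata_file)

# Load into a private environment so the workspace is not overwritten.
env <- new.env()
loaded <- load(rdata_file, envir = env)   # names of the objects in the file
hscore <- get(loaded[1], envir = env)$hscore

# One row per (feature, term) pair, with Score kept numeric.
hscore_df <- data.frame(
  Feature_id = rep(names(hscore), lengths(hscore)),
  Term_id    = unlist(lapply(hscore, names), use.names = FALSE),
  Score      = unlist(hscore, use.names = FALSE),
  stringsAsFactors = FALSE
)
```

Loading into a separate environment avoids having to know the stored object name in advance, and a numeric Score column can be filtered or sorted directly without converting from character.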
dcAlgoPropagate.r
dcAlgoPropagate.Rd
dcAlgoPropagate.pdf
dcRDataLoader, dcConverter, dcAlgo, dcList2Matrix