This function runs the complete workflow of CALANGO and generates the HTML5 output pages and export files.
run_CALANGO(
defs,
type = "correlation",
cores = NULL,
render.report = TRUE,
basedir = ""
)
defs
either a CALANGO-type list object or a path to a text file containing the required definitions (see Details).
type
type of analysis to perform. Currently only "correlation" is supported.
cores
positive integer, how many CPU cores to use. Setting this parameter overrides any `cores` field from `defs`.
render.report
logical: should an HTML5 report be generated?
basedir
path to the base folder to which all relative paths in `defs` refer.
Updated `defs` list, containing:
All input parameters originally passed or read from a `defs` file (see **Details**).
Derived fields loaded and preprocessed from the files indicated in `defs`.
Several statistical summaries of the data (used to render the report), including correlations, contrasts, covariances, p-values and other summary statistics.
Results are also saved to files under `defs$output.dir`.
The function expects a `CALANGO`-type list, passed either as an actual list object or as a file path. In the latter case, the file must be a text file in `field = value` format. Blank lines and lines starting with `#` are ignored. The input list must have the following fields:
annotation.files.dir
(required, string) - Folder where
annotation files are located.
output.dir
(required, string) - output folder for results
dataset.info
(required, string) - genome metadata file. It
should contain at least:
File names. Note that this must be the first column of the metadata file;
Phenotype data (numeric, this is the value CALANGO uses to rank species when searching for associations)
Normalization data (numeric, this is the value CALANGO uses as a denominator to compute annotation term frequencies, removing potential biases caused by, for instance, over-annotation of model organisms or large differences in the counts of genomic elements). Note that CALANGO does not require normalization data for GO, as it computes the total number of GO terms per species and uses it as the normalizing factor.
x.column
(required, numeric) - which column in "dataset.info"
contains the phenotype data?
ontology
(required, string) - which dictionary data type to
use? Possible values are "GO" and "other". For GO, CALANGO can compute
normalization data.
dict.path
(required, string) - path to the dictionary file
(a two-column file containing annotation IDs and their descriptions). Not
needed for GO.
column
(required, string) - which column in annotation files
should be used (column name)
denominator.column
(optional, numeric) - which column contains
normalization data (column number)
tree.path
(required, string) - path for tree file in either
newick or nexus format
tree.type
(required, string) - tree file type (either "nexus"
or "newick")
cores
(optional, numeric) - how many cores to use? If not
provided the function defaults to 1.
linear.model.cutoff
(required, numeric) - parameter that
regulates how much graphical output is produced. Plots are generated
only for annotation terms whose corrected q-values for phylogenetically
independent contrasts fall below this cutoff (standard value: 0.5).
MHT.method
(optional, string) - type of multiple hypothesis
correction to be used. Accepts all methods listed in
`stats::p.adjust.methods`. If not provided, the function defaults to
"BH".
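As a rough illustration of how `MHT.method` and `linear.model.cutoff` relate, the sketch below adjusts a handful of p-values with `stats::p.adjust()` and keeps the terms whose adjusted values fall below the cutoff. The term IDs and p-values are invented for this example, and the filtering shown here is an assumption about the general idea, not a copy of CALANGO's internal code.

```r
# Hypothetical raw p-values for five annotation terms (illustrative only)
p.raw <- c(GO_A = 0.001, GO_B = 0.02, GO_C = 0.40, GO_D = 0.45, GO_E = 0.90)

# Valid MHT.method values are the entries of this character vector
MHT.method <- "BH"
stopifnot(MHT.method %in% stats::p.adjust.methods)

# Correct for multiple hypothesis testing; names are preserved
q.values <- stats::p.adjust(p.raw, method = MHT.method)

# Keep only the terms whose corrected values are below the cutoff
linear.model.cutoff <- 0.5
selected <- names(q.values)[q.values < linear.model.cutoff]
print(selected)
```

A lower cutoff selects fewer terms and therefore produces less graphical output.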
if (FALSE) { # \dontrun{
## Install any missing Bioconductor packages for report generation
## (only needs to be done once)
# CALANGO::install_bioc_dependencies()
# Retrieve example files
basedir <- tempdir()
retrieve_data_files(target.dir = paste0(basedir, "/data"))
defs <- paste0(basedir, "/data/parameters/parameters_domain2GO_count_less_phages.txt")
# Run CALANGO
res <- run_CALANGO(defs, cores = 2)
} # }
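The `field = value` file format described in Details can be sketched with base R alone. All paths and values below are placeholders rather than files shipped with CALANGO, and `read_defs_sketch()` is a hypothetical helper written for this illustration; it is not CALANGO's own parser.

```r
# Write a small definitions file (all paths are hypothetical placeholders)
defs.file <- tempfile(fileext = ".txt")
writeLines(c(
  "# Lines starting with '#' and blank lines are ignored",
  "",
  "annotation.files.dir = data/annotation",
  "output.dir = results",
  "dataset.info = data/metadata.txt",
  "x.column = 2",
  "ontology = GO",
  "column = GO",
  "tree.path = data/tree.nwk",
  "tree.type = newick",
  "linear.model.cutoff = 0.5",
  "MHT.method = BH"
), defs.file)

# Hypothetical helper: read 'field = value' pairs into a named list
read_defs_sketch <- function(path) {
  lines <- trimws(readLines(path))
  # Drop blank lines and comment lines
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  # Split each line at the first '=' only
  keys   <- trimws(sub("=.*$", "", lines))
  values <- trimws(sub("^[^=]*=", "", lines))
  # Values are kept as character strings in this sketch
  as.list(stats::setNames(values, keys))
}

defs <- read_defs_sketch(defs.file)
str(defs)
```

In practice the file path itself can be passed as `defs` to `run_CALANGO()`, which leaves the parsing and type handling to the package.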