This document lists the input parameters expected / accepted in the CALANGO definition files (or, alternatively, in the defs list).

## General parameters

#### annotation.files.dir

Type: character string

Description: path to the directory where annotation files are located

Required: YES

Default: none

#### output.dir

Type: character string

Description: path to the output directory where results should be saved

Required: YES

Default: none

#### dataset.info

Type: character string

Description: path to a file containing the genome metadata. For each genome, it should contain at least: (1) the path to the annotation data; (2) phenotype data (numeric); (3) normalization data (numeric). It must be a tab-separated value file with no column headers.

Required: YES

Default: none

#### x.column

Type: integer/numeric

Description: index of the column from the file specified in dataset.info containing the phenotype data, which will be used to sort the genomes and find annotation terms associated with that phenotype.

Required: YES

Default: none

#### short.name.column

Type: integer/numeric

Description: index of the column from the file specified in dataset.info containing the short names for species/lineages to be used when plotting data.

Required: YES

Default: none

#### group.column

Type: integer/numeric

Description: index of the column from the file specified in dataset.info containing the group to be used for coloring the heatmaps.

Required: YES

Default: none
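As an illustration of how the column-index parameters relate to the dataset.info file, the sketch below reads a small headerless tab-separated table and picks columns by index. It is written in Python for illustration only and is not CALANGO code. The file contents, the column order, and the assumption that indices are 1-based (as is usual in R) are all hypothetical.

```python
import csv
import io

# Hypothetical dataset.info contents: tab-separated, no header row.
# Assumed column order: annotation file, phenotype, normalization value,
# short name, group.
EXAMPLE_TSV = (
    "genome_A.tsv\t12.5\t4200\tSpA\tmammals\n"
    "genome_B.tsv\t3.1\t3900\tSpB\tbirds\n"
)


def read_dataset_info(handle):
    """Read a headerless TSV into a list of rows (lists of strings)."""
    return [row for row in csv.reader(handle, delimiter="\t") if row]


def column(rows, index):
    """Return one column, using a 1-based index (an assumption)."""
    return [row[index - 1] for row in rows]


rows = read_dataset_info(io.StringIO(EXAMPLE_TSV))
x_values = [float(v) for v in column(rows, 2)]  # x.column = 2
short_names = column(rows, 4)                   # short.name.column = 4
groups = column(rows, 5)                        # group.column = 5
print(x_values, short_names, groups)
```

With this layout, the definitions would use x.column = 2, short.name.column = 4 and group.column = 5.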

#### ontology

Type: character string

Description: dictionary data type to use. Accepts “GO” or “other”.

Required: YES

Default: none

#### dict.path

Type: character string

Description: path to dictionary file (a two-column tab-separated value file containing annotation IDs and their descriptions). Not needed if ontology = "GO".

Required: NO

Default: none

#### column

Type: character string

Description: the name of the column in the annotation file that should be used.

Required: YES

Default: none

#### denominator.column

Type: integer/numeric

Description: index of the column from the file specified in dataset.info containing the normalization data.

Required: NO

Default: none

#### tree.path

Type: character string

Description: path to the tree file.

Required: YES

Default: none

#### tree.type

Type: character string

Description: tree file type. Accepts “nexus” or “newick”. Case-sensitive.

Required: YES

Default: none

#### type

Type: character string

Description: type of analysis to perform. Currently accepts only “correlation”.

Required: YES

Default: none

#### MHT.method

Type: character string

Description: type of multiple hypothesis testing correction to apply. Accepts all methods listed in stats::p.adjust.methods.

Required: NO

Default: “BH”
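For readers unfamiliar with the default “BH” (Benjamini-Hochberg) correction, the following Python sketch shows how adjusted p-values (q-values) are obtained from raw p-values. It is a minimal illustration of the standard procedure, not CALANGO's own code; the function name `bh_adjust` and the input values are made up for the example.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    n = len(p_values)
    # Indices of the p-values, sorted from largest to smallest p-value.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # rank of p_values[i] in ascending order
        # Scale by n / rank, then enforce monotonicity with a running minimum.
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


q_values = bh_adjust([0.01, 0.04, 0.03, 0.005])
print(q_values)
```

The resulting q-values are the quantities compared against the q-value cutoffs described in the next section.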

#### cores

Type: integer/numeric

Description: Number of cores to use. Must be a positive integer.

Required: NO

Default: 1

## Cutoff values

Cutoffs are used to regulate how much graphical output is produced by CALANGO. The tab-separated value files that are generated at the end of the analysis (and saved in the output.dir) will always contain the complete, unfiltered results.

Q-value cutoffs are used for correlation and phylogeny-aware linear models. Only entries with q-values smaller than these cutoffs will be shown.

#### spearman.qvalue.cutoff

Type: numeric between 0 and 1

Required: NO

Default: 1

#### pearson.qvalue.cutoff

Type: numeric between 0 and 1

Required: NO

Default: 1

#### kendall.qvalue.cutoff

Type: numeric between 0 and 1

Required: NO

Default: 1

#### linear_model.qvalue.cutoff

Type: numeric between 0 and 1

Required: NO

Default: 1
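The q-value rule above amounts to a strict "smaller than" comparison. The Python sketch below illustrates it; it is not CALANGO code, and the function name and the term identifiers are placeholders.

```python
def shown_by_qvalue(q_values, cutoff=1.0):
    """Return the terms whose q-value is strictly smaller than the cutoff."""
    return [term for term, q in q_values.items() if q < cutoff]


# Placeholder q-values for three hypothetical annotation terms.
spearman_q = {"term_A": 0.003, "term_B": 0.04, "term_C": 0.6}
print(shown_by_qvalue(spearman_q, cutoff=0.05))
```

Here only `term_A` and `term_B` would appear in the graphical output when the cutoff is set to 0.05.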

Correlation cutoffs are used to establish thresholds of positive/negative correlation values for the graphical output. Important: these parameters are a bit counter-intuitive. Please check the examples given for the Spearman cutoffs for clarity.

#### spearman.cor.lower.cutoff / spearman.cor.upper.cutoff

Type: numeric values between -1 and 1

Description: Thresholds for Spearman correlation values. The selection criterion is: (Spearman correlation < lower.cutoff) OR (Spearman correlation > upper.cutoff)

Required: NO

Defaults: spearman.cor.upper.cutoff = -1; spearman.cor.lower.cutoff = 1 (i.e., no filtering)

Example 1: If you set spearman.cor.upper.cutoff = 0.8 and spearman.cor.lower.cutoff = -0.8, only pairs with Spearman correlation values smaller than -0.8 OR greater than 0.8 will be shown.

Example 2: If you set spearman.cor.upper.cutoff = 0 and spearman.cor.lower.cutoff = -1, pairs with Spearman correlation values smaller than -1 OR greater than 0 will be shown. Since the Spearman correlation cannot be smaller than -1, this means that only positively correlated pairs will be shown.

Example 3: If you set any values such that spearman.cor.upper.cutoff < spearman.cor.lower.cutoff, all pairs are shown (no filtering is performed).
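The three examples above can be checked with a short Python sketch of the selection criterion. It illustrates the rule as stated in this document and is not taken from CALANGO's source; the function name and the sample correlation values are made up.

```python
def passes_cor_cutoff(cor, lower_cutoff=1.0, upper_cutoff=-1.0):
    """Selection rule: (cor < lower.cutoff) OR (cor > upper.cutoff)."""
    return cor < lower_cutoff or cor > upper_cutoff


correlations = [-0.9, -0.5, 0.0, 0.3, 0.85]

# Example 1: keep only strong negative or strong positive correlations.
example_1 = [c for c in correlations if passes_cor_cutoff(c, -0.8, 0.8)]

# Example 2: keep only positive correlations.
example_2 = [c for c in correlations if passes_cor_cutoff(c, -1.0, 0.0)]

# Example 3: upper cutoff below lower cutoff, so nothing is filtered out.
example_3 = [c for c in correlations if passes_cor_cutoff(c, 0.2, -0.2)]

print(example_1, example_2, example_3)
```

The default arguments mirror the documented defaults (lower = 1, upper = -1), under which every correlation value satisfies at least one of the two conditions.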

#### pearson.cor.lower.cutoff / pearson.cor.upper.cutoff

Type: numeric values between -1 and 1

Description: Thresholds for Pearson correlation values. The selection criterion is: (Pearson correlation < lower.cutoff) OR (Pearson correlation > upper.cutoff)

Required: NO

Defaults: pearson.cor.upper.cutoff = -1; pearson.cor.lower.cutoff = 1 (i.e., no filtering)

#### kendall.cor.lower.cutoff / kendall.cor.upper.cutoff

Type: numeric values between -1 and 1

Description: Thresholds for Kendall correlation values. The selection criterion is: (Kendall correlation < lower.cutoff) OR (Kendall correlation > upper.cutoff)

Required: NO

Defaults: kendall.cor.upper.cutoff = -1; kendall.cor.lower.cutoff = 1 (i.e., no filtering)

Standard deviation and coefficient of variation cutoffs: only annotation terms with values greater than the cutoff will be shown.

#### sd.cutoff

Type: non-negative numeric value

Required: NO

Default: 0

#### cv.cutoff

Type: non-negative numeric value

Required: NO

Default: 0
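As a reminder of what these two statistics measure, the sketch below computes the standard deviation and the coefficient of variation (standard deviation divided by the mean) for one term's values across lineages. It is a Python illustration under stated assumptions, not CALANGO code: the use of the sample standard deviation (as R's `sd()` does), the handling of a zero mean, and the input values are all assumptions.

```python
from statistics import mean, stdev


def sd_and_cv(values):
    """Return (standard deviation, coefficient of variation) of the values."""
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    avg = mean(values)
    # Assumption: treat the CV as infinite when the mean is zero.
    cv = sd / avg if avg != 0 else float("inf")
    return sd, cv


sd, cv = sd_and_cv([2, 4, 4, 6])
print(round(sd, 4), round(cv, 4))
```

A term would then be shown only if its standard deviation exceeds sd.cutoff and its coefficient of variation exceeds cv.cutoff.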

Sum of annotation terms cutoff: only annotation terms with values greater than the cutoff will be shown.

#### annotation_size.cutoff

Type: non-negative integer/numeric value

Required: NO

Default: 0

Prevalence and heterogeneity cutoffs: only annotation terms with values greater than the cutoff will be shown. Prevalence is defined as the percentage of lineages where the annotation term was observed at least once. Heterogeneity is defined as the percentage of lineages where the annotation term count is different from the median.

#### prevalence.cutoff

Type: numeric value between 0 and 1

Required: NO

Default: 0

#### heterogeneity.cutoff

Type: numeric value between 0 and 1

Required: NO

Default: 0
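The definitions of prevalence and heterogeneity given above can be sketched in Python as follows. This is an illustration only, not CALANGO's implementation. The function names and counts are made up, and the results are expressed as fractions between 0 and 1 (an assumption made to match the type of the cutoffs).

```python
from statistics import median


def prevalence(counts):
    """Fraction of lineages where the term was observed at least once."""
    return sum(1 for c in counts if c > 0) / len(counts)


def heterogeneity(counts):
    """Fraction of lineages whose count differs from the median count."""
    m = median(counts)
    return sum(1 for c in counts if c != m) / len(counts)


# Hypothetical counts of one annotation term in four lineages.
counts = [0, 2, 2, 5]
print(prevalence(counts), heterogeneity(counts))
```

In this example the term is present in three of four lineages (prevalence 0.75), and two of the four counts differ from the median of 2 (heterogeneity 0.5).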

## Advanced configurations

#### raw_data_sd_filter

Type: character string. Accepts “TRUE” or “FALSE”

Description: If “TRUE”, all annotation terms whose raw values (before normalization) have a standard deviation of zero are removed. This filter is used to remove the (quite common) bias that arises when QPAL (phenotype) and normalizing factors are strongly associated by chance.

Required: YES

Default: “TRUE”
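The effect of this filter can be pictured with the Python sketch below, which drops terms whose raw counts are identical in every lineage. It is an illustration of the idea and not CALANGO code; the function name, term names, and counts are hypothetical.

```python
from statistics import pstdev


def drop_constant_terms(raw_counts):
    """Keep only terms whose raw counts vary across lineages (sd > 0)."""
    return {
        term: counts
        for term, counts in raw_counts.items()
        if pstdev(counts) > 0
    }


# Hypothetical raw counts per term across three lineages.
raw_counts = {
    "term_A": [1, 1, 1],  # constant: standard deviation is zero, removed
    "term_B": [0, 3, 7],  # varies: kept
}
print(drop_constant_terms(raw_counts))
```

Constant terms carry no signal before normalization, so any association they show afterwards would come from the normalizing factor alone.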