Configuring Janis Assistant

Warning

The Janis configuration had backwards-incompatible changes to specific keys in v0.12.0.

When we talk about configuring Janis, we’re really talking about configuring the Janis assistant, and hence Cromwell / CWLTool, to interact with batch systems (e.g. Slurm, PBS / Torque) and container environments (Docker / Singularity).

Janis has built-in templates for the following compute environments:

Local (Docker or Singularity): the only environment compatible with CWLTool.
Slurm (Singularity only)
PBS / Torque (Singularity only)

In an extension of Janis (janis-templates), we’ve produced a number of location-specific templates, usually with sensible defaults.

For the full list of templates, see: https://janis.readthedocs.io/en/latest/templates/index.html

Syntax

The config should be in YAML, and can contain nested dictionaries.

Note

In this guide we might use dots (.) to refer to a nested key. For example, notifications.email refers to the following structure:

notifications:
  email: <value>
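The dot notation can be made concrete with a small Python sketch that resolves a dotted key against a parsed config. Here get_dotted is a hypothetical helper (not part of janis_assistant), and the config is shown as the plain dictionary a YAML parser would produce for the structure above.

```python
from typing import Any, Dict


def get_dotted(config: Dict[str, Any], dotted_key: str, default: Any = None) -> Any:
    """Resolve a dotted key such as "notifications.email" in a nested dict.

    Hypothetical helper: returns `default` if a segment is missing or its
    parent is not a dictionary.
    """
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


# The structure above, as a YAML parser would return it.
config = {"notifications": {"email": "you@example.com"}}
print(get_dotted(config, "notifications.email"))
```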

Location of config

By default:

Your configuration directory is ~/.janis/.
Your configuration file is janis.conf in this directory, i.e. ~/.janis/janis.conf.

Both janis run and janis translate let you provide the path to a config file with the -c / --config parameter.

You can also set the following environment variables:

JANIS_CONFIGPATH - the path to your config file.
JANIS_CONFIGDIR - the configuration directory; the config path becomes $JANIS_CONFIGDIR/janis.conf.
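The following is a minimal Python sketch of how a config path could be resolved from these options. The precedence order (-c / --config, then JANIS_CONFIGPATH, then $JANIS_CONFIGDIR/janis.conf, then ~/.janis/janis.conf) is an assumption for illustration, and resolve_config_path is a hypothetical helper rather than a Janis function.

```python
import os
from typing import Mapping, Optional


def resolve_config_path(cli_config: Optional[str], env: Mapping[str, str]) -> str:
    """Return a config file path under the ASSUMED precedence:
    -c / --config, then JANIS_CONFIGPATH, then $JANIS_CONFIGDIR/janis.conf,
    then ~/.janis/janis.conf.
    """
    if cli_config:
        return cli_config
    if env.get("JANIS_CONFIGPATH"):
        return env["JANIS_CONFIGPATH"]
    config_dir = env.get("JANIS_CONFIGDIR") or os.path.expanduser("~/.janis")
    return os.path.join(config_dir, "janis.conf")


print(resolve_config_path(None, {"JANIS_CONFIGDIR": "/opt/janis"}))
```

In practice you would pass os.environ as env.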

Initialising a template

janis init <template> [...options]

By default, this writes the config to ~/.janis/janis.conf. If a config already exists there, it will NOT be overwritten.

-o / --output: Output path to write to (default: ~/.janis/janis.conf).

-f / --force: Overwrite the config if one exists at the output path.

--stdout: Write the output to stdout.
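To illustrate the overwrite rules, here is a hypothetical Python sketch (write_config is not a Janis function) that follows the documented behaviour: write to the output path by default, refuse to replace an existing file unless force is set, and print to stdout instead when requested.

```python
import os
import tempfile

DEFAULT_OUTPUT = os.path.expanduser("~/.janis/janis.conf")


def write_config(text: str, output: str = DEFAULT_OUTPUT,
                 force: bool = False, to_stdout: bool = False) -> bool:
    """Write a generated config following the documented `janis init` rules.

    Illustrative only. Returns True if the config was written (or printed),
    False if an existing file was left untouched.
    """
    if to_stdout:
        # --stdout: print instead of writing a file.
        print(text, end="")
        return True
    if os.path.exists(output) and not force:
        # Without -f / --force, an existing config is never overwritten.
        return False
    os.makedirs(os.path.dirname(output) or ".", exist_ok=True)
    with open(output, "w") as handle:
        handle.write(text)
    return True


# Demonstrate the rules in a throwaway directory rather than ~/.janis.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "janis.conf")
    first = write_config("engine: cromwell\n", path)              # new file: written
    second = write_config("engine: cwltool\n", path)              # exists: kept
    third = write_config("engine: cwltool\n", path, force=True)   # force: replaced
    with open(path) as handle:
        final_text = handle.read()
```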

Environment variables

The following environment variables allow you to customise the behaviour of Janis without writing a configuration file. This is a convenient way to configure Janis automatically through an HPC's module system.

All the environment variables start with JANIS_. Use the value of each enum member declared below (not its key name), for example:

export JANIS_CROMWELLJAR='/path/to/cromwell.jar'

class janis_assistant.management.envvariables.EnvVariables

An enumeration of the environment variables that Janis reads.

config_dir = 'JANIS_CONFIGDIR': (Default: ~/.janis) Directory of default Janis settings

config_path = 'JANIS_CONFIGPATH': (Default: $JANIS_CONFIGDIR/janis.conf) Default configuration file for Janis

cromwelljar = 'JANIS_CROMWELLJAR': Override the Cromwell JAR that Janis uses

default_template = 'JANIS_DEFAULTTEMPLATE': Default template to use. NB: this template should have NO required arguments.

exec_dir = 'JANIS_EXCECUTIONDIR': Use this directory for intermediate files

output_dir = 'JANIS_OUTPUTDIR': Use this directory as a BASE to generate a new output directory for each Janis run

recipe_directory = 'JANIS_RECIPEDIRECTORY': Directories (comma separated) in which each file (ending in .yaml | .yml) holds the input values for the recipe named after that file. See the Recipes section for more information.

recipe_paths = 'JANIS_RECIPEPATHS': List of YAML recipe files (comma separated) for Janis to consume. See the Recipes section for more information.

search_path = 'JANIS_SEARCHPATH': Additional search paths (comma separated) to look up Janis workflows in
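Several of these variables hold comma-separated lists. As a sketch of how such a value can be split, assuming surrounding whitespace is ignored and empty entries are dropped (Janis's own parsing may differ), the hypothetical env_list helper below turns one variable into a Python list.

```python
import os
from typing import List, Mapping


def env_list(name: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Split a comma-separated JANIS_* variable into non-empty, stripped entries.

    Hypothetical helper; an unset variable yields an empty list.
    """
    raw = env.get(name, "")
    return [item.strip() for item in raw.split(",") if item.strip()]


env = {"JANIS_SEARCHPATH": "/data/workflows, /home/me/workflows"}
print(env_list("JANIS_SEARCHPATH", env))
```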

Configuration keys

We use a class definition to automatically document which keys you can provide to build a Janis configuration. These keys exactly match the keys you should provide in your YAML dictionary. The type is either a Python literal, or a dictionary.

For example, the class definition below corresponds to the following (partial) YAML configuration:

# EXAMPLE CONFIGURATION ONLY
engine: cromwell
run_in_background: false
call_caching_enabled: true
cromwell:
    jar: /Users/franklinmichael/broad/cromwell-53.1.jar
    call_caching_method: fingerprint

class janis_assistant.management.configuration.JanisConfiguration(type)

__init__(output_dir: str = None, execution_dir: str = None, call_caching_enabled: bool = True, engine: str = 'cromwell', cromwell: Union[janis_assistant.management.configuration.JanisConfigurationCromwell, dict] = None, template: Union[janis_assistant.management.configuration.JanisConfigurationTemplate, dict] = None, recipes: Union[janis_assistant.management.configuration.JanisConfigurationRecipes, dict] = None, notifications: Union[janis_assistant.management.configuration.JanisConfigurationNotifications, dict] = None, environment: Union[janis_assistant.management.configuration.JanisConfigurationEnvironment, dict] = None, run_in_background: bool = None, digest_cache_location: str = None, container: Union[str, janis_assistant.containers.base.Container] = None, search_paths: List[str] = None)

Parameters:

engine ("cromwell" | "cwltool") – Default engine to use
template (JanisConfigurationTemplate) – Specify options for a Janis template for configuring an execution environment
cromwell (JanisConfigurationCromwell) – A dictionary for how to configure Cromwell for Janis
recipes (JanisConfigurationRecipes) – Configure recipes in Janis
notifications (JanisConfigurationNotifications) – Configure Janis notifications
environment (JanisConfigurationEnvironment) – Additional ways to configure the execution environment for Janis
output_dir – A directory that Janis will use to generate a new output directory for each janis run
execution_dir – Move all execution to a static directory outside the regular output directory.
call_caching_enabled – (default: true) call-caching is enabled for subsequent runs, on the SAME output directory
run_in_background (bool) – By default, run workflows as a background process. In a SLURM environment, this might submit Janis as a SLURM job.
digest_cache_location (str) – Location of a cache that maps Docker tags to their digests; Janis uses it to replace your Docker tag with its digest.
container ("docker" | "singularity") – Container technology to use; this is important for checking whether the container environment is available and for running the MySQL instance.
search_paths (List[str]) – A list of paths to check when looking for python files and input files
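To show how YAML keys line up with constructor parameters, here is a cut-down Python mirror of a few JanisConfiguration keys. ConfigSketch and CromwellSection are hypothetical stand-ins, not the janis_assistant classes; they only illustrate a nested YAML dictionary being turned into a typed section, as the Union[..., dict] parameter types suggest.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class CromwellSection:
    # Hypothetical mirror of a few JanisConfigurationCromwell keys and defaults.
    jar: Optional[str] = None
    call_caching_method: str = "fingerprint"
    timeout: int = 10


@dataclass
class ConfigSketch:
    # Hypothetical mirror of a few JanisConfiguration keys and defaults.
    engine: str = "cromwell"
    call_caching_enabled: bool = True
    run_in_background: Optional[bool] = None
    cromwell: Union[CromwellSection, dict, None] = None

    def __post_init__(self) -> None:
        # A nested section may arrive as a plain dict straight from the YAML parser.
        if isinstance(self.cromwell, dict):
            self.cromwell = CromwellSection(**self.cromwell)


# The example YAML configuration above, after parsing.
parsed = {
    "engine": "cromwell",
    "run_in_background": False,
    "call_caching_enabled": True,
    "cromwell": {
        "jar": "/Users/franklinmichael/broad/cromwell-53.1.jar",
        "call_caching_method": "fingerprint",
    },
}
conf = ConfigSketch(**parsed)
print(conf.cromwell)
```

In this sketch, an unknown key raises a TypeError from the generated __init__.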

Template

Janis templates are a convenient way to configure Janis, Cromwell and CWLTool for special environments. A number of templates are prebuilt into Janis, such as slurm_singularity and slurm_pbs, and a number of additional templates for specific HPCs (like Peter Mac, and Spartan at UoM) are available and documented on the List of templates page.

You could use a template like the following:

# rest of janis configuration
template:
  id: slurm_singularity
  # arguments for template 'slurm_singularity', like 'container_dir'.
  container_dir: /shared/path/to/containerdir/

class janis_assistant.management.configuration.JanisConfigurationTemplate(type)

__init__(id: str = None, **d)

Parameters:

id – The identifier of the template
**d – Any other keys, passed as arguments to that template (like container_dir for slurm_singularity)
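The **d in the template signature means every key other than id is forwarded to the template. The hypothetical split_template_section helper below sketches that split for the slurm_singularity example; it illustrates the idea rather than Janis's template-loading code, and raising on a missing id is a choice made for this sketch only.

```python
from typing import Any, Dict, Tuple


def split_template_section(section: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Separate the template `id` from the arguments meant for that template."""
    args = dict(section)  # copy so the caller's dict is left untouched
    template_id = args.pop("id", None)
    if template_id is None:
        raise ValueError("template section needs an 'id' key")
    return template_id, args


template_id, template_args = split_template_section(
    {"id": "slurm_singularity", "container_dir": "/shared/path/to/containerdir/"}
)
print(template_id, template_args)
```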

Cromwell

Sometimes Cromwell can be hard to configure from a simple template, so we’ve exposed some extra common options. Feel free to raise an issue (https://github.com/PMCC-BioinformaticsCore/janis-assistant/issues/new) if you have questions or ideas.

class janis_assistant.management.configuration.JanisConfigurationCromwell(type)

__init__(jar: str = None, config_path: str = None, url: str = None, memory_mb: int = None, call_caching_method: str = 'fingerprint', timeout: int = 10, polling_interval=None, db_type: janis_assistant.data.enums.dbtype.DatabaseTypeToUse = <DatabaseTypeToUse.filebased: 'filebased'>, mysql_credentials: Union[dict, janis_assistant.management.configuration.MySqlInstanceConfig] = None, additional_config_lines: str = None)

Parameters:

url (str) – Use an existing Cromwell instance with this URL (with port). Use the BASE url, do NOT include http.
jar – Specific Cromwell JAR to use (prioritised over $JANIS_CROMWELLJAR)
config_path – Use the supplied config path when running a Cromwell instance. Also see additional_config_lines for including specific Cromwell options.
memory_mb – Amount of memory to give the Cromwell instance, through java -Xmx<max-memory>M -jar <jar>
call_caching_method – (Default: “fingerprint”) Cromwell caching strategy to use; see Call cache strategy options for local filesystem for more information.
timeout – Suspend a Janis workflow if unable to contact Cromwell for <timeout> MINUTES.
polling_interval – How often to poll Cromwell. By default, this starts at 5 seconds and gradually increases to 60 seconds over 30 minutes. For more information, see the janis_assistant.Cromwell.get_poll_interval method.
db_type ("none" | "existing" | "managed" | "filebased" | "from_script") – (Default: filebased) DB type to use for Janis. “none” -> no database; “existing” -> use mysql credentials from cromwell.mysql_credentials; “managed” -> Janis will start and manage a containerised MySQL instance; “filebased”: Use the HSQLDB filebased db through Cromwell for SMALL workflows only (NB: this can produce large files, and timeout for large workflows); “from_script”: Call the script $JANIS_DBCREDENTIALSGENERATOR for credentials. See get_config_from_script for more information.
mysql_credentials (MySqlInstanceConfig) – A dictionary of MySQL credentials
additional_config_lines (str) – A string to add to the bottom of a generated Cromwell configuration. This is NOT used for an existing Cromwell instance, or when a config is supplied.
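The default polling schedule can be approximated as below. This Python sketch assumes a linear ramp from 5 to 60 seconds across the first 30 minutes; the real curve is defined by janis_assistant.Cromwell.get_poll_interval and may differ, and poll_interval here is a hypothetical function.

```python
def poll_interval(seconds_since_start: float,
                  start: float = 5.0, end: float = 60.0,
                  ramp_seconds: float = 30 * 60) -> float:
    """Polling interval (seconds) growing from `start` to `end` over `ramp_seconds`.

    ASSUMPTION: a linear ramp, held at `end` once the ramp is over.
    """
    if seconds_since_start <= 0:
        return start
    if seconds_since_start >= ramp_seconds:
        return end
    fraction = seconds_since_start / ramp_seconds
    return start + (end - start) * fraction


for minutes in (0, 15, 30, 45):
    print(minutes, poll_interval(minutes * 60))
```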

class janis_assistant.management.configuration.MySqlInstanceConfig(type)

__init__(url, username, password, dbname='cromwell')

Configuration options for a MySQL instance

Parameters:

url – URL of the MySQL instance (including the port if it is not 3306)
username – Username
password – Password. Note: this is embedded into the Cromwell configuration (<output-dir>/janis/configuration/cromwell.conf)
dbname – Database name to use (default: ‘cromwell’)
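As a sketch of how these fields fit together, the Python dataclass below mirrors MySqlInstanceConfig's parameters and builds a MySQL JDBC URL. MySqlCredentials and jdbc_url are hypothetical, and the URL shape (jdbc:mysql://host:port/dbname) is the generic MySQL JDBC form, assumed here rather than taken from the configuration Janis actually generates.

```python
from dataclasses import dataclass


@dataclass
class MySqlCredentials:
    # Field names follow MySqlInstanceConfig's parameters; the class is hypothetical.
    url: str
    username: str
    password: str
    dbname: str = "cromwell"

    def host_and_port(self) -> str:
        # MySQL's default port is 3306; append it when the URL has no port.
        return self.url if ":" in self.url else f"{self.url}:3306"

    def jdbc_url(self) -> str:
        # ASSUMPTION: generic JDBC form; Janis's generated URL and options may differ.
        return f"jdbc:mysql://{self.host_and_port()}/{self.dbname}"


creds = MySqlCredentials(url="db.example.org", username="janis", password="secret")
print(creds.jdbc_url())
```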

Existing Cromwell instance

In addition to the cromwell.url key above, you can manage this through the command line. To configure Janis to submit to an existing Cromwell instance (e.g. one already set up for the cloud), pass the Cromwell URL with the --cromwell-url parameter:

urlwithport="127.0.0.1:8000"
janis run --engine cromwell --cromwell-url $urlwithport hello

OR

In ~/.janis/janis.conf:

engine: cromwell
cromwell:
  url: 127.0.0.1:8000

Overriding Cromwell JAR

In addition to the cromwell.jar key above, you can set the location of the Cromwell JAR through the environment variable JANIS_CROMWELLJAR (cromwell.jar takes priority if both are set):

export JANIS_CROMWELLJAR=/path/to/cromwell.jar
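The precedence between the config key and the environment variable (cromwell.jar wins over $JANIS_CROMWELLJAR) can be sketched in Python as follows. resolve_cromwell_jar is hypothetical, and returning None stands in for whatever Janis does when no JAR is configured, which is not modelled here.

```python
import os
from typing import Mapping, Optional


def resolve_cromwell_jar(config_jar: Optional[str],
                         env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Pick the Cromwell JAR: cromwell.jar first, then $JANIS_CROMWELLJAR."""
    if config_jar:
        return config_jar
    # An unset or empty variable means "not configured".
    return env.get("JANIS_CROMWELLJAR") or None


print(resolve_cromwell_jar(None, {"JANIS_CROMWELLJAR": "/path/to/cromwell.jar"}))
```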

Recipes

Across different workflows, you often have a set of inputs that you want to apply repeatedly. Recipes let you define these once.

Note

Recipes only match on the input name. Your input names MUST be consistent across your pipelines for this concept to be useful.

class janis_assistant.management.configuration.JanisConfigurationRecipes(type)

__init__(recipes: dict = None, paths: Union[str, List[str]] = None, directories: Union[str, List[str]] = None)

Parameters:

recipes (dict) – A dictionary of input values, keyed by the recipe name.
paths (List[str]) – A list of .yaml files, where each file contains a dictionary of input values keyed by the recipe name, in the same format as recipes.
directories (List[str]) – A list of directories of *.yaml files, where * is the recipe name.

For example, every time I run the WGSGermlineGATK pipeline with hg38, I know I want to provide the same reference files. There are a few ways to configure this:

Recipes: A dictionary of input values, keyed by the recipe name.
Paths: a list of *.yaml files, where each file contains a dictionary of input values, keyed by the recipe name, in the same format as the recipes dictionary.
Directories: a directory of *.yaml files, where the * is the recipe name.

The examples below encode the following information: when we use the hg38 recipe, we want to provide the input value /path/to/hg38/reference.fasta for reference, and hg38 for type. The hg19 recipe does the same for hg19.

Recipes dictionary

You can specify this recipe directly in your janis.conf:

recipes:
  recipes:
    hg38:
      reference: /path/to/hg38/reference.fasta
      type: hg38
    hg19:
      reference: /path/to/hg19/reference.fasta
      type: hg19
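To illustrate that recipes match only on input name, here is a Python sketch that applies the recipes above to a workflow's inputs. apply_recipes and the fastqs input are hypothetical, and the merge rules (later recipes override earlier ones, explicitly supplied inputs override recipes, recipe keys that are not workflow inputs are skipped) are assumptions, not confirmed Janis behaviour.

```python
from typing import Any, Dict, Iterable


def apply_recipes(recipes: Dict[str, Dict[str, Any]], names: Iterable[str],
                  workflow_inputs: Iterable[str],
                  user_inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Merge recipe values into a workflow's inputs, matching ONLY on input name."""
    wanted = set(workflow_inputs)
    merged: Dict[str, Any] = {}
    for name in names:
        for key, value in recipes[name].items():
            if key in wanted:  # a recipe key is used only if an input has that name
                merged[key] = value
    merged.update(user_inputs)  # explicitly supplied inputs take priority
    return merged


recipes = {
    "hg38": {"reference": "/path/to/hg38/reference.fasta", "type": "hg38"},
    "hg19": {"reference": "/path/to/hg19/reference.fasta", "type": "hg19"},
}
inputs = apply_recipes(recipes, ["hg38"], ["reference", "fastqs"],
                       {"fastqs": ["s1_R1.fastq.gz", "s1_R2.fastq.gz"]})
print(inputs)
```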

Recipes Paths

Or you could create a myrecipes.yaml with the contents:

hg38:
  reference: /path/to/hg38/reference.fasta
  type: hg38
hg19:
  reference: /path/to/hg19/reference.fasta
  type: hg19

Then instruct Janis to use this file in one of two ways:

In your janis.conf with:

recipes:
  paths:
  - /path/to/myrecipes.yaml

OR, you can export the comma-separated environment variable:

export JANIS_RECIPEPATHS="/path/to/myrecipes.yaml,/path/to/myrecipes2.yaml"

Recipe Directories

Create two files in a directory:

hg38.yaml:

reference: /path/to/hg38/reference.fasta
type: hg38

hg19.yaml:

reference: /path/to/hg19/reference.fasta
type: hg19

As with paths, you can specify this directory in one of two ways:

In your janis.conf with:

recipes:
  directories:
  # /path/to/recipes has two files, hg38.yaml | hg19.yaml
  - /path/to/recipes/

OR, you can export the comma-separated environment variable:

export JANIS_RECIPEDIRECTORY="/path/to/recipes/,/path/to/recipes2/"
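Discovering directory recipes amounts to mapping each file name (without extension) to a recipe name. The hypothetical recipe_files function below sketches this with pathlib, accepting both .yaml and .yml as the JANIS_RECIPEDIRECTORY description suggests; it leaves YAML parsing out so it needs only the standard library, and the rule that a later directory overrides an earlier one is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def recipe_files(directories: Iterable[str]) -> Dict[str, Path]:
    """Map recipe name -> file for every *.yaml / *.yml file in the directories.

    The recipe name is the file name without its extension (hg38.yaml -> "hg38").
    """
    found: Dict[str, Path] = {}
    for directory in directories:
        for path in sorted(Path(directory).iterdir()):
            if path.is_file() and path.suffix in (".yaml", ".yml"):
                found[path.stem] = path  # later directories override earlier ones
    return found


with tempfile.TemporaryDirectory() as tmp:
    for name in ("hg38.yaml", "hg19.yml", "notes.txt"):
        (Path(tmp) / name).write_text("reference: /path/to/reference.fasta\n")
    names = sorted(recipe_files([tmp]))
print(names)
```

For a comma-separated environment variable, you could call recipe_files(os.environ["JANIS_RECIPEDIRECTORY"].split(",")).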

Notifications

class janis_assistant.management.configuration.JanisConfigurationNotifications(type)

__init__(email: str = None, from_email: str = 'janis-noreply@petermac.org', mail_program: str = None)

Parameters:

email – Email address to send status updates to
from_email – (Default: janis-noreply@petermac.org) Address the emails are sent from
mail_program – Which mail program to use to send emails. A fully formatted email is passed to it through stdin (e.g. sendmail -t)
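The mail_program option receives a fully formatted email on stdin. A Python sketch of that hand-off using the standard email and subprocess modules is below; build_status_email and send_with_mail_program are hypothetical names, the subject and body text are placeholders, and the real message Janis sends will differ.

```python
import shlex
import subprocess
from email.message import EmailMessage


def build_status_email(to: str, subject: str, body: str,
                       from_email: str = "janis-noreply@petermac.org") -> str:
    """Return a fully formatted email (headers and body) as text."""
    msg = EmailMessage()
    msg["To"] = to
    msg["From"] = from_email
    msg["Subject"] = subject
    msg.set_content(body)
    return msg.as_string()


def send_with_mail_program(email_text: str, mail_program: str = "sendmail -t") -> None:
    """Pipe the formatted email to the mail program on stdin.

    With `sendmail -t`, recipients are read from the message's To: header.
    """
    subprocess.run(shlex.split(mail_program), input=email_text, text=True, check=True)


email_text = build_status_email("you@example.com", "Example Janis status update",
                                "Placeholder body text.")
# send_with_mail_program(email_text)  # requires a configured mail program
```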

Environment

class janis_assistant.management.configuration.JanisConfigurationEnvironment(type)

__init__(max_cores: int = None, max_memory: int = None, max_duration: int = None)

Additional settings to configure a Janis environment. Currently, this mostly involves restricting resources (like cores, memory, duration) to fit within specific compute requirements. Notably, these values only limit the requested values if they’re a number; a value determined via an operator is not currently limited.

Parameters:

max_cores (int) – Limit the number of CPUs a job can request
max_memory (int) – Limit the amount of memory (in GB) a job can request
max_duration (int) – (Default: 86400) Limit the amount of time (in seconds) a job can request.
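The capping rule (numbers are limited, operator-derived values are not) can be sketched in Python as below. cap_request is a hypothetical helper; a non-numeric value stands in for a resource determined via an operator.

```python
from typing import Optional, Union

Request = Union[int, float, str]


def cap_request(requested: Request, limit: Optional[float]) -> Request:
    """Limit a requested resource to `limit` when the request is a plain number.

    Non-numeric requests (standing in for operator-derived values) pass through
    unchanged, and a limit of None means "no limit".
    """
    is_number = isinstance(requested, (int, float)) and not isinstance(requested, bool)
    if limit is None or not is_number:
        return requested
    return min(requested, limit)


max_cores = 8  # hypothetical environment.max_cores value
print(cap_request(16, max_cores))
print(cap_request(4, max_cores))
print(cap_request("<operator>", max_cores))
```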