Configuring Janis Assistant

Warning

Janis configuration had backwards-incompatible changes in v0.12.0 that affect specific keys.

When we talk about configuring Janis, we’re really talking about how to configure the Janis assistant and hence Cromwell / CWLTool to interact with batch systems (eg: Slurm / PBS Torque) and container environments (Docker / Singularity).

Janis has built-in templates for the following compute environments:

  • Local (Docker or Singularity)
    • The only environment compatible with CWLTool.
  • Slurm (Singularity only)
  • PBS / Torque (Singularity only)

In an extension of Janis (janis-templates), we’ve produced a number of location specific templates, usually with sensible defaults.

See the full list of templates: https://janis.readthedocs.io/en/latest/templates/index.html

Syntax

The config should be in YAML, and can contain nested dictionaries.

Note

In this guide we might use dots (.) to refer to a nested key. For example, notifications.email refers to the following structure:

notifications:
  email: <value>
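
As an illustration of the dotted notation, the sketch below resolves a dotted key against a nested dictionary (the form a parsed YAML config would take). The helper name get_nested is hypothetical and is not part of Janis.

```python
from typing import Any


def get_nested(config: dict, dotted_key: str, default: Any = None) -> Any:
    """Look up a dotted key such as 'notifications.email' in a nested dict."""
    current: Any = config
    for part in dotted_key.split("."):
        # Stop as soon as a level is missing or is not a dictionary.
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


# The nested structure that 'notifications.email' refers to.
config = {"notifications": {"email": "user@example.org"}}
print(get_nested(config, "notifications.email"))
```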

Location of config

By default,

  • Your configuration directory is placed at ~/.janis/.
  • Your configuration path is a file called janis.conf in this directory, eg: ~/.janis/janis.conf

Both janis run and janis translate accept a path to a config file through the -c / --config parameter.

In addition, you can also configure the following environment variables:

  • JANIS_CONFIGPATH - simply the path to your config file
  • JANIS_CONFIGDIR - The configuration path is determined by $JANIS_CONFIGDIR/janis.conf
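
The lookup described above can be sketched roughly as follows. The precedence used here (command-line path first, then JANIS_CONFIGPATH, then JANIS_CONFIGDIR, then the default directory) is an assumption for illustration; the function resolve_config_path is hypothetical and is not the Janis implementation.

```python
import os
from typing import Mapping, Optional


def resolve_config_path(env: Mapping[str, str], cli_config: Optional[str] = None) -> str:
    """Return the config file path under an ASSUMED precedence order."""
    # 1. A path given with -c / --config (assumed to win over everything).
    if cli_config:
        return cli_config
    # 2. JANIS_CONFIGPATH points directly at the config file.
    if env.get("JANIS_CONFIGPATH"):
        return env["JANIS_CONFIGPATH"]
    # 3. Otherwise use $JANIS_CONFIGDIR/janis.conf, defaulting to ~/.janis.
    config_dir = env.get("JANIS_CONFIGDIR") or os.path.join(os.path.expanduser("~"), ".janis")
    return os.path.join(config_dir, "janis.conf")


print(resolve_config_path(os.environ))
```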

Initialising a template

janis init <template> [...options]

By default, this writes the config to ~/.janis/janis.conf. If a config already exists there, it will NOT be overwritten.

  • -o / --output: output path to write to, default (~/.janis/janis.conf).
  • -f / --force: Overwrite the config if one exists at the output path.
  • --stdout: Write the output to stdout.

Environment variables

The following environment variables allow you to customise the behaviour of Janis without writing a configuration file. This is a convenient way to configure Janis automatically through an HPC's module system.

All of the environment variables start with JANIS_. The name to export is the value of the enum declared below (not the key name), for example:

export JANIS_CROMWELLJAR='/path/to/cromwell.jar'
class janis_assistant.management.envvariables.EnvVariables[source]

An enumeration.

config_dir = 'JANIS_CONFIGDIR'

(Default: ~/.janis) Directory of default Janis settings

config_path = 'JANIS_CONFIGPATH'

(Default: $JANIS_CONFIGDIR/janis.conf) Default configuration file for Janis

cromwelljar = 'JANIS_CROMWELLJAR'

Override the Cromwell JAR that Janis uses

default_template = 'JANIS_DEFAULTTEMPLATE'

Default template to use, NB this template should have NO required arguments.

exec_dir = 'JANIS_EXCECUTIONDIR'

Use this directory for intermediate files

output_dir = 'JANIS_OUTPUTDIR'

Use this directory as a BASE to generate a new output directory for each Janis run

recipe_directory = 'JANIS_RECIPEDIRECTORY'

Directories in which each file (ending in .yaml | .yml) is a recipe: the file name is the key for a set of input values. See the RECIPES section for more information.

recipe_paths = 'JANIS_RECIPEPATHS'

List of YAML recipe files (comma separated) for Janis to consume. See the RECIPES section for more information.

search_path = 'JANIS_SEARCHPATH'

Additional search paths (comma separated) to lookup Janis workflows in
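
Several of these variables (JANIS_RECIPEPATHS, JANIS_SEARCHPATH) hold comma-separated lists. A minimal sketch of reading such a value is shown below; the helper split_env_list is hypothetical, and trimming whitespace around each entry is an assumption rather than documented Janis behaviour.

```python
from typing import List, Mapping


def split_env_list(env: Mapping[str, str], name: str) -> List[str]:
    """Split a comma-separated environment variable into a list of entries."""
    raw = env.get(name, "")
    # Drop empty entries so an unset or empty variable yields an empty list.
    return [item.strip() for item in raw.split(",") if item.strip()]


env = {"JANIS_SEARCHPATH": "/workflows/a, /workflows/b"}
print(split_env_list(env, "JANIS_SEARCHPATH"))
```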

Configuration keys

We use a class definition to automatically document which keys you can provide to build a Janis configuration. These keys exactly match the keys you should provide in your YAML dictionary. The type is either a Python literal, or a dictionary.

For example, the class definition below corresponds to the following (partial) YAML configuration:

# EXAMPLE CONFIGURATION ONLY
engine: cromwell
run_in_background: false
call_caching_enabled: true
cromwell:
    jar: /Users/franklinmichael/broad/cromwell-53.1.jar
    call_caching_method: fingerprint
class janis_assistant.management.configuration.JanisConfiguration(type)[source]
__init__(output_dir: str = None, execution_dir: str = None, call_caching_enabled: bool = True, engine: str = 'cromwell', cromwell: Union[janis_assistant.management.configuration.JanisConfigurationCromwell, dict] = None, template: Union[janis_assistant.management.configuration.JanisConfigurationTemplate, dict] = None, recipes: Union[janis_assistant.management.configuration.JanisConfigurationRecipes, dict] = None, notifications: Union[janis_assistant.management.configuration.JanisConfigurationNotifications, dict] = None, environment: Union[janis_assistant.management.configuration.JanisConfigurationEnvironment, dict] = None, run_in_background: bool = None, digest_cache_location: str = None, container: Union[str, janis_assistant.containers.base.Container] = None, search_paths: List[str] = None)[source]
Parameters:
  • engine ("cromwell" | "cwltool") – Default engine to use
  • template (JanisConfigurationTemplate) – Specify options for a Janis template for configuring an execution environment
  • cromwell (JanisConfigurationCromwell) – A dictionary for how to configure Cromwell for Janis
  • recipes (JanisConfigurationRecipes) – Configure recipes in Janis
  • notifications (JanisConfigurationNotifications) – Configure Janis notifications
  • environment (JanisConfigurationEnvironment) – Additional ways to configure the execution environment for Janis
  • output_dir – A directory that Janis will use to generate a new output directory for each janis-run
  • execution_dir – Move all execution to a static directory outside the regular output directory.
  • call_caching_enabled – (default: true) call-caching is enabled for subsequent runs, on the SAME output directory
  • run_in_background (bool) – By default, run workflows as a background process. In a SLURM environment, this might submit Janis as a SLURM job.
  • digest_cache_location (str) – A cache that maps Docker tags to their digests; Janis uses it to replace your Docker tag with its digest.
  • container ("docker" | "singularity") – Container technology to use, important for checking if container environment is available and running mysql instance.
  • search_paths (List[str]) – A list of paths to check when looking for python files and input files

Template

Janis templates are a convenient way to configure Janis, Cromwell and CWLTool for special environments. A number of templates are prebuilt into Janis, such as slurm_singularity and slurm_pbs, and a number of additional templates for specific HPCs (like Peter Mac, Spartan at UoM) are available and documented in the list of templates: https://janis.readthedocs.io/en/latest/templates/index.html

You could use a template like the following:

# rest of janis configuration
template:
  id: slurm_singularity
  # arguments for template 'slurm_singularity', like 'container_dir'.
  container_dir: /shared/path/to/containerdir/
class janis_assistant.management.configuration.JanisConfigurationTemplate(type)[source]
__init__(id: str = None, **d)[source]
Parameters:
  • id – The identifier of the template

Cromwell

Sometimes Cromwell can be hard to configure from a simple template, so we’ve exposed some extra common options. Feel free to [raise an issue](https://github.com/PMCC-BioinformaticsCore/janis-assistant/issues/new) if you have questions or ideas.

class janis_assistant.management.configuration.JanisConfigurationCromwell(type)[source]
__init__(jar: str = None, config_path: str = None, url: str = None, memory_mb: int = None, call_caching_method: str = 'fingerprint', timeout: int = 10, polling_interval=None, db_type: janis_assistant.data.enums.dbtype.DatabaseTypeToUse = <DatabaseTypeToUse.filebased: 'filebased'>, mysql_credentials: Union[dict, janis_assistant.management.configuration.MySqlInstanceConfig] = None, additional_config_lines: str = None)[source]
Parameters:
  • url (str) – Use an existing Cromwell instance with this URL (with port). Use the BASE url, do NOT include http.
  • jar – Specific Cromwell JAR to use (prioritised over $JANIS_CROMWELLJAR)
  • config_path – Use a supplied Config path when running a Cromwell instance. Also see additional_config_lines for including specific cromwell options.
  • memory_mb – Amount of memory to give the Cromwell instance through java -Xmx<max-memory>M -jar <jar>
  • call_caching_method – (Default: “fingerprint”) Cromwell caching strategy to use, see Call cache strategy options for local filesystem for more information.
  • timeout – Suspend a Janis workflow if unable to contact Cromwell for <timeout> MINUTES.
  • polling_interval – How often to poll Cromwell. By default this starts at 5 seconds and gradually increases to 60 seconds over 30 minutes. For more information, see the janis_assistant.Cromwell.get_poll_interval method
  • db_type ("none" | "existing" | "managed" | "filebased" | "from_script") – (Default: filebased) DB type to use for Janis. “none” -> no database; “existing” -> use mysql credentials from cromwell.mysql_credentials; “managed” -> Janis will start and manage a containerised MySQL instance; “filebased”: Use the HSQLDB filebased db through Cromwell for SMALL workflows only (NB: this can produce large files, and timeout for large workflows); “from_script”: Call the script $JANIS_DBCREDENTIALSGENERATOR for credentials. See get_config_from_script for more information.
  • mysql_credentials (MySqlInstanceConfig) – A dictionary of MySQL credentials
  • additional_config_lines (str) – A string to add to the bottom of a generated Cromwell configuration. This is NOT used for an existing Cromwell instance, or if a config is supplied.
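
To make the default polling behaviour concrete (an interval that starts at 5 seconds and reaches 60 seconds over 30 minutes), here is a minimal sketch. It assumes a linear ramp, which may not match the real get_poll_interval method; the function poll_interval below is hypothetical.

```python
def poll_interval(
    elapsed_seconds: float,
    start: float = 5.0,
    end: float = 60.0,
    ramp_seconds: float = 30 * 60,
) -> float:
    """Seconds to wait before the next poll, ASSUMING a linear ramp."""
    # Clamp progress to [0, 1] so the interval stays between start and end.
    fraction = min(max(elapsed_seconds / ramp_seconds, 0.0), 1.0)
    return start + (end - start) * fraction


for minutes in (0, 15, 30, 120):
    print(minutes, poll_interval(minutes * 60))
```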
class janis_assistant.management.configuration.MySqlInstanceConfig(type)[source]
__init__(url, username, password, dbname='cromwell')[source]

Configuration options for a MySQL instance

Parameters:
  • url – URL of the MySQL instance (including the port if it is not 3306)
  • username – Username
  • password – Password; note that this is embedded into the Cromwell configuration (<output-dir>/janis/configuration/cromwell.conf)
  • dbname – Database name to use, default ‘cromwell’

Existing Cromwell instance

In addition to the previous cromwell.url method, you can also manage this through the command line. To configure Janis to submit to an existing Cromwell instance (eg: one already set up for the cloud), the CLI has a mechanism for setting the cromwell_url:

urlwithport="127.0.0.1:8000"
janis run --engine cromwell --cromwell-url $urlwithport hello

OR

In your janis.conf (by default ~/.janis/janis.conf):

engine: cromwell
cromwell:
  url: 127.0.0.1:8000
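
Because the URL must be the base host and port without http, a small helper can normalise a value before it is written to the config or passed to --cromwell-url. This is a sketch only; strip_scheme is a hypothetical helper, and whether Janis performs any such normalisation itself is not stated here.

```python
def strip_scheme(url: str) -> str:
    """Remove a leading http:// or https:// and any trailing slash."""
    for prefix in ("http://", "https://"):
        if url.lower().startswith(prefix):
            url = url[len(prefix):]
            break
    return url.rstrip("/")


print(strip_scheme("http://127.0.0.1:8000/"))
```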

Overriding Cromwell JAR

In addition to the previous cromwell.jar key, you can set the location of the Cromwell JAR through the environment variable JANIS_CROMWELLJAR:

export JANIS_CROMWELLJAR=/path/to/cromwell.jar

Recipes

Often between a number of different workflows, you have a set of inputs that you want to apply multiple times. For that, we have recipes.

Note

Recipes only match on the input name. Your input names MUST be consistent through your pipelines for this concept to be useful.

class janis_assistant.management.configuration.JanisConfigurationRecipes(type)[source]
__init__(recipes: dict = None, paths: Union[str, List[str]] = None, directories: Union[str, List[str]] = None)[source]
Parameters:
  • recipes (dict) – A dictionary of input values, keyed by the recipe name.
  • paths (List[str]) – a list of *.yaml files, where each path contains a dictionary of input values, keyed by the recipe name, similar to the previous recipes name.
  • directories (List[str]) – a directory of *.yaml files, where the * is the recipe name.

For example, every time I run the WGSGermlineGATK pipeline with hg38, I know I want to provide the same reference files. There are a few ways to configure this:

  • Recipes: A dictionary of input values, keyed by the recipe name.
  • Paths: a list of *.yaml files, where each path contains a dictionary of input values, keyed by the recipe name, similar to the previous recipes name.
  • Directories: a directory of *.yaml files, where the * is the recipe name.

The examples below encode the following information. When we use the hg38 recipe, we want to provide an input value for reference as /path/to/hg38/reference.fasta, and an input value for type as hg38. The same applies to a second recipe for hg19.

Recipes dictionary

You can specify this recipe directly in your janis.conf:

recipes:
  recipes:
    hg38:
      reference: /path/to/hg38/reference.fasta
      type: hg38
    hg19:
      reference: /path/to/hg19/reference.fasta
      type: hg19
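
The sketch below illustrates how a recipe name selects input values, and why input names must be consistent: only keys that match a workflow input name are applied. The function inputs_from_recipes is hypothetical, and letting later recipes overwrite earlier ones is an assumption, not documented Janis behaviour.

```python
from typing import Any, Dict, Iterable, Set

# The same recipes as in the configuration above, as a parsed dictionary.
RECIPES: Dict[str, Dict[str, Any]] = {
    "hg38": {"reference": "/path/to/hg38/reference.fasta", "type": "hg38"},
    "hg19": {"reference": "/path/to/hg19/reference.fasta", "type": "hg19"},
}


def inputs_from_recipes(
    recipes: Dict[str, Dict[str, Any]],
    selected: Iterable[str],
    workflow_inputs: Set[str],
) -> Dict[str, Any]:
    """Collect recipe values whose keys match the workflow's input names."""
    values: Dict[str, Any] = {}
    for name in selected:
        for key, value in recipes.get(name, {}).items():
            if key in workflow_inputs:
                # ASSUMPTION: a later recipe overwrites an earlier one.
                values[key] = value
    return values


print(inputs_from_recipes(RECIPES, ["hg38"], {"reference", "bam"}))
```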

Recipes Paths

Or you could create a myrecipes.yaml with the contents:

hg38:
  reference: /path/to/hg38/reference.fasta
  type: hg38
hg19:
  reference: /path/to/hg19/reference.fasta
  type: hg19

And then instruct Janis to use this file in two ways:

  1. In your janis.conf with:
recipes:
  paths:
  - /path/to/myrecipes.yaml
  2. OR, you can export the comma-separated environment variable:
export JANIS_RECIPEPATHS="/path/to/myrecipes.yaml,/path/to/myrecipes2.yaml"

Recipe Directories

Create two files in a directory:

  1. hg38.yaml:

    reference: /path/to/hg38/reference.fasta
    type: hg38
    
  2. hg19.yaml:

    reference: /path/to/hg19/reference.fasta
    type: hg19
    

And similar to the paths, you can specify this directory in two ways:

  1. In your janis.conf with:
recipes:
  directories:
  # /path/to/recipes has two files, hg38.yaml | hg19.yaml
  - /path/to/recipes/
  2. OR, you can export the comma-separated environment variable:
export JANIS_RECIPEDIRECTORY="/path/to/recipes/,/path/to/recipes2/"
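
For a recipe directory, the recipe name comes from the file name. The sketch below lists the recipe names a directory would provide; recipe_files is a hypothetical helper, and it only maps names to files (parsing the YAML contents would need a YAML library, which is omitted here).

```python
from pathlib import Path
from typing import Dict


def recipe_files(directory: str) -> Dict[str, Path]:
    """Map recipe name (file name without extension) to its YAML file."""
    found: Dict[str, Path] = {}
    for path in sorted(Path(directory).iterdir()):
        # Only .yaml / .yml files count as recipes; everything else is ignored.
        if path.is_file() and path.suffix in (".yaml", ".yml"):
            found[path.stem] = path
    return found
```

With the two files above, recipe_files("/path/to/recipes/") would contain the keys hg19 and hg38.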

Notifications

class janis_assistant.management.configuration.JanisConfigurationNotifications(type)[source]
__init__(email: str = None, from_email: str = 'janis-noreply@petermac.org', mail_program: str = None)[source]
Parameters:
  • email – Email address to send status updates to
  • from_email – (Default: janis-noreply@petermac.org)
  • mail_program – Which mail program to use to send emails. A fully formatted email will be directed as stdin (eg: sendmail -t)
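
To show what "a fully formatted email directed as stdin" can look like, the sketch below builds a message with Python's standard library and pipes it to a mail program. The helper names, the subject line and the message body are hypothetical; this is not how Janis itself composes notifications.

```python
import shlex
import subprocess
from email.message import EmailMessage


def build_email(to_addr: str, from_addr: str, subject: str, body: str) -> EmailMessage:
    """Create a plain-text email with the usual headers."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["From"] = from_addr
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_program(msg: EmailMessage, mail_program: str) -> subprocess.CompletedProcess:
    """Pipe the formatted email to the mail program's stdin (eg: 'sendmail -t')."""
    return subprocess.run(
        shlex.split(mail_program),
        input=msg.as_bytes(),
        capture_output=True,
        check=True,
    )


msg = build_email(
    "user@example.org",
    "janis-noreply@petermac.org",
    "Workflow status update",  # hypothetical subject
    "Your workflow has completed.",  # hypothetical body
)
```

Calling send_via_program(msg, "sendmail -t") would hand the message to sendmail, which reads the recipients from the headers.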

Environment

class janis_assistant.management.configuration.JanisConfigurationEnvironment(type)[source]
__init__(max_cores: int = None, max_memory: int = None, max_duration: int = None)[source]

Additional settings to configure a Janis environment. Currently, it mostly involves restricting resources (like cores, memory, duration) to fit within specific compute requirements. Notably, these values limit the requested values only if they are numbers. A value is not currently limited if it is determined via an operator.

Parameters:
  • max_cores (int) – Limit the number of CPUs a job can request
  • max_memory (int) – Limit the amount of memory (in GB) a job can request
  • max_duration (int) – (Default: 86400) Limit the amount of time (in seconds) a job can request.
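
The limiting rule described above (cap plain numbers, leave operator-derived values alone) can be sketched as follows. The function cap_request is hypothetical, and an operator is represented here by a plain string purely for illustration.

```python
from typing import Any, Optional


def cap_request(requested: Any, maximum: Optional[float]) -> Any:
    """Limit a requested resource to a maximum, but only for plain numbers."""
    if maximum is None:
        return requested  # no limit configured
    if isinstance(requested, bool) or not isinstance(requested, (int, float)):
        return requested  # not a plain number (eg: an operator): left untouched
    return min(requested, maximum)


print(cap_request(32, 16))                    # a numeric request is capped
print(cap_request("inputs.threads * 2", 16))  # a non-numeric request passes through
```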