Configuring Janis Assistant¶
Warning
The Janis configuration had backwards-incompatible changes in v0.12.0 affecting specific keys.
When we talk about configuring Janis, we’re really talking about how to configure the Janis assistant and hence Cromwell / CWLTool to interact with batch systems (eg: Slurm / PBS Torque) and container environments (Docker / Singularity).
Janis has built-in templates for the following compute environments:
- Local (Docker or Singularity)
  - The only environment compatible with CWLTool.
- Slurm (Singularity only)
- PBS / Torque (Singularity only)
In an extension of Janis (janis-templates), we’ve produced a number of location-specific templates, usually with sensible defaults.
See the full list of templates: https://janis.readthedocs.io/en/latest/templates/index.html.
Syntax¶
The config should be in YAML, and can contain nested dictionaries.
Note
In this guide we might use dots (.) to refer to a nested key. For example, notifications.email refers to the following structure:
notifications:
  email: <value>
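The dotted notation can be read as a path through nested dictionaries. The following is a minimal Python sketch, assuming the YAML has already been parsed into a dict; `get_nested` is a hypothetical helper for illustration and is not part of Janis.

```python
from typing import Any


def get_nested(config: dict, dotted_key: str, default: Any = None) -> Any:
    """Resolve a dotted key such as 'notifications.email' in nested dicts."""
    current: Any = config
    for part in dotted_key.split("."):
        # Stop as soon as a level is missing or is not a dictionary
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


# Equivalent of the YAML structure shown above (the address is a placeholder)
config = {"notifications": {"email": "user@example.org"}}
email = get_nested(config, "notifications.email")
```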
Location of config¶
By default:
- Your configuration directory is placed at ~/.janis/.
- Your configuration path is a file called janis.conf in this directory, eg: ~/.janis/janis.conf
Both janis run and janis translate allow you to provide the location of a config using the -c / --config parameter.
In addition, you can also configure the following environment variables:
- JANIS_CONFIGPATH: simply the path to your config file
- JANIS_CONFIGDIR: the configuration path is determined by $JANIS_CONFIGDIR/janis.conf
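One way to picture how these options combine is the sketch below. The precedence order (command-line option, then JANIS_CONFIGPATH, then JANIS_CONFIGDIR, then the default directory) is an assumption made for illustration, and `resolve_config_path` is a hypothetical helper, not a Janis function.

```python
import os
from typing import Mapping, Optional


def resolve_config_path(cli_config: Optional[str] = None,
                        env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a config path. NOTE: the precedence order here is an assumption."""
    env = os.environ if env is None else env
    if cli_config:                       # value of -c / --config
        return cli_config
    if env.get("JANIS_CONFIGPATH"):      # explicit path to the config file
        return env["JANIS_CONFIGPATH"]
    # Otherwise fall back to <config dir>/janis.conf, defaulting to ~/.janis
    config_dir = env.get("JANIS_CONFIGDIR") or os.path.join(os.path.expanduser("~"), ".janis")
    return os.path.join(config_dir, "janis.conf")


default_path = resolve_config_path(env={})
```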
Initialising a template¶
```bash
janis init <template> [...options]
```
By default this will place your config at ~/.janis/janis.conf. If there’s already a config there, it will NOT overwrite it.
- -o / --output: Output path to write to (default: ~/.janis/janis.conf).
- -f / --force: Overwrite the config if one exists at the output path.
- --stdout: Write the output to stdout.
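The overwrite rule can be sketched in a few lines of Python. This only mirrors the behaviour described here (never replace an existing file unless forced); `write_config` is a hypothetical helper and not how janis init is implemented.

```python
import os
import tempfile


def write_config(contents: str, output: str, force: bool = False) -> bool:
    """Write `contents` to `output`; return True only if the file was written."""
    if os.path.exists(output) and not force:
        # Mirrors the documented default: an existing config is left untouched
        return False
    os.makedirs(os.path.dirname(output) or ".", exist_ok=True)
    with open(output, "w") as handle:
        handle.write(contents)
    return True


# Demonstration in a throwaway directory rather than ~/.janis
demo_path = os.path.join(tempfile.mkdtemp(), "janis.conf")
first = write_config("engine: cromwell\n", demo_path)               # file created
second = write_config("engine: cwltool\n", demo_path)               # skipped, file exists
forced = write_config("engine: cwltool\n", demo_path, force=True)   # overwritten
```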
Environment variables¶
The following environment variables allow you to customise the behaviour of Janis without writing a configuration file. This is a convenient way to automatically configure Janis through an HPC’s module system.
All the environment variables start with JANIS_, and the name to export is the value of the enum declared below (not the key name), for example:
export JANIS_CROMWELLJAR='/path/to/cromwell.jar'
class janis_assistant.management.envvariables.EnvVariables
An enumeration.
- config_dir = 'JANIS_CONFIGDIR': (Default: ~/.janis) Directory of default Janis settings
- config_path = 'JANIS_CONFIGPATH': (Default: $JANIS_CONFIGDIR/janis.conf) Default configuration file for Janis
- cromwelljar = 'JANIS_CROMWELLJAR': Override the Cromwell JAR that Janis uses
- default_template = 'JANIS_DEFAULTTEMPLATE': Default template to use, NB this template should have NO required arguments.
- exec_dir = 'JANIS_EXCECUTIONDIR': Use this directory for intermediate files
- output_dir = 'JANIS_OUTPUTDIR': Use this directory as a BASE to generate a new output directory for each Janis run
- recipe_directory = 'JANIS_RECIPEDIRECTORY': Directories for which each file (ending in .yaml | .yml) is a key of input values. See the RECIPES section for more information.
- recipe_paths = 'JANIS_RECIPEPATHS': List of YAML recipe files (comma separated) for Janis to consume. See the RECIPES section for more information.
- search_path = 'JANIS_SEARCHPATH': Additional search paths (comma separated) to lookup Janis workflows in
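The distinction between the enum key and the exported name can be illustrated with a small stand-in enum. `EnvVariablesSketch` and `read_env` are hypothetical names used only for this sketch; they copy three of the documented members and do not import Janis.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class EnvVariablesSketch(Enum):
    # A subset of the documented members, copied here for illustration only
    config_dir = "JANIS_CONFIGDIR"
    cromwelljar = "JANIS_CROMWELLJAR"
    search_path = "JANIS_SEARCHPATH"


def read_env(variable: EnvVariablesSketch,
             env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Look up a setting by the enum VALUE, which is the exported variable name."""
    env = os.environ if env is None else env
    return env.get(variable.value)


# A plain dict stands in for the process environment
env = {"JANIS_CROMWELLJAR": "/path/to/cromwell.jar", "JANIS_SEARCHPATH": "/a,/b"}
jar = read_env(EnvVariablesSketch.cromwelljar, env)
# Comma-separated variables are split into a list
search_paths = (read_env(EnvVariablesSketch.search_path, env) or "").split(",")
```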
Configuration keys¶
We use a class definition to automatically document which keys you can provide to build a Janis configuration. These keys exactly match the keys you should provide in your YAML dictionary. The type is either a Python literal, or a dictionary.
For example, the class definition below corresponds to the following (partial) YAML configuration:
# EXAMPLE CONFIGURATION ONLY
engine: cromwell
run_in_background: false
call_caching_enabled: true
cromwell:
  jar: /Users/franklinmichael/broad/cromwell-53.1.jar
  call_caching_method: fingerprint
class janis_assistant.management.configuration.JanisConfiguration(type)
__init__(output_dir: str = None, execution_dir: str = None, call_caching_enabled: bool = True, engine: str = 'cromwell', cromwell: Union[janis_assistant.management.configuration.JanisConfigurationCromwell, dict] = None, template: Union[janis_assistant.management.configuration.JanisConfigurationTemplate, dict] = None, recipes: Union[janis_assistant.management.configuration.JanisConfigurationRecipes, dict] = None, notifications: Union[janis_assistant.management.configuration.JanisConfigurationNotifications, dict] = None, environment: Union[janis_assistant.management.configuration.JanisConfigurationEnvironment, dict] = None, run_in_background: bool = None, digest_cache_location: str = None, container: Union[str, janis_assistant.containers.base.Container] = None, search_paths: List[str] = None)
Parameters:
- engine ("cromwell" | "cwltool") – Default engine to use
- template (JanisConfigurationTemplate) – Specify options for a Janis template for configuring an execution environment
- cromwell (JanisConfigurationCromwell) – A dictionary for how to configure Cromwell for Janis
- recipes (JanisConfigurationRecipes) – Configure recipes in Janis
- notifications (JanisConfigurationNotifications) – Configure Janis notifications
- environment (JanisConfigurationEnvironment) – Additional ways to configure the execution environment for Janis
- output_dir – A directory that Janis will use to generate a new output directory for each janis-run
- execution_dir – Move all execution to a static directory outside the regular output directory.
- call_caching_enabled – (default: true) call-caching is enabled for subsequent runs, on the SAME output directory
- run_in_background (bool) – By default, run workflows as a background process. In a SLURM environment, this might submit Janis as a SLURM job.
- digest_cache_location (str) – A cache mapping docker tags to their digests, which Janis uses to replace your docker tag with its digest.
- container ("docker" | "singularity") – Container technology to use, important for checking if the container environment is available and for running a MySQL instance.
- search_paths (List[str]) – A list of paths to check when looking for Python files and input files
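Several parameters accept either a configuration object or a plain dictionary (the `Union[..., dict]` annotations), which is what lets nested YAML map onto nested classes. The sketch below illustrates that pattern with hypothetical stand-in dataclasses; `Configuration` and `CromwellOptions` are assumptions for illustration and are not the Janis classes.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class CromwellOptions:  # hypothetical stand-in for JanisConfigurationCromwell
    jar: Optional[str] = None
    call_caching_method: str = "fingerprint"


@dataclass
class Configuration:  # hypothetical stand-in for JanisConfiguration
    engine: str = "cromwell"
    call_caching_enabled: bool = True
    run_in_background: Optional[bool] = None
    cromwell: Union[CromwellOptions, dict, None] = None

    def __post_init__(self):
        # Accept either a ready-made object or the plain dict parsed from YAML
        if isinstance(self.cromwell, dict):
            self.cromwell = CromwellOptions(**self.cromwell)
        elif self.cromwell is None:
            self.cromwell = CromwellOptions()


# What a YAML parser would return for a small configuration file
parsed_yaml = {
    "engine": "cromwell",
    "run_in_background": False,
    "call_caching_enabled": True,
    "cromwell": {"jar": "/path/to/cromwell.jar", "call_caching_method": "fingerprint"},
}
config = Configuration(**parsed_yaml)
```

With this shape, an unknown key in the YAML raises a TypeError at construction time, which is one reason to keep the YAML keys identical to the parameter names.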
Template¶
Janis templates are a convenient way to handle configuring Janis, Cromwell and CWLTool for special environments. A number of templates are prebuilt into Janis, such as slurm_singularity and slurm_pbs, and a number of additional templates for specific HPCs (like Peter Mac, Spartan at UoM) are available, and documented in the List of templates page.
You could use a template like the following:
# rest of janis configuration
template:
  id: slurm_singularity
  # arguments for template 'slurm_singularity', like 'container_dir'.
  container_dir: /shared/path/to/containerdir/
Cromwell¶
Sometimes Cromwell can be hard to configure from a simple template, so we’ve exposed some extra common options. Feel free to [raise an issue](https://github.com/PMCC-BioinformaticsCore/janis-assistant/issues/new) if you have questions or ideas.
class janis_assistant.management.configuration.JanisConfigurationCromwell(type)
__init__(jar: str = None, config_path: str = None, url: str = None, memory_mb: int = None, call_caching_method: str = 'fingerprint', timeout: int = 10, polling_interval=None, db_type: janis_assistant.data.enums.dbtype.DatabaseTypeToUse = <DatabaseTypeToUse.filebased: 'filebased'>, mysql_credentials: Union[dict, janis_assistant.management.configuration.MySqlInstanceConfig] = None, additional_config_lines: str = None)
Parameters:
- url (str) – Use an existing Cromwell instance with this URL (with port). Use the BASE url, do NOT include http.
- jar – Specific Cromwell JAR to use (prioritised over $JANIS_CROMWELLJAR)
- config_path – Use a supplied config path when running a Cromwell instance. Also see additional_config_lines for including specific Cromwell options.
- memory_mb – Amount of memory to give the Cromwell instance through java -Xmx<max-memory>M -jar <jar>
- call_caching_method – (Default: “fingerprint”) Cromwell caching strategy to use, see Call cache strategy options for local filesystem for more information.
- timeout – Suspend a Janis workflow if unable to contact Cromwell for <timeout> MINUTES.
- polling_interval – How often to poll Cromwell. By default this starts at 5 seconds and gradually increases to 60 seconds over 30 minutes. For more information, see the janis_assistant.Cromwell.get_poll_interval method.
- db_type ("none" | "existing" | "managed" | "filebased" | "from_script") – (Default: filebased) DB type to use for Janis:
  - “none”: no database
  - “existing”: use MySQL credentials from cromwell.mysql_credentials
  - “managed”: Janis will start and manage a containerised MySQL instance
  - “filebased”: use the HSQLDB file-based DB through Cromwell for SMALL workflows only (NB: this can produce large files, and time out for large workflows)
  - “from_script”: call the script $JANIS_DBCREDENTIALSGENERATOR for credentials. See get_config_from_script for more information.
- mysql_credentials (MySqlInstanceConfig) – A dictionary of MySQL credentials
- additional_config_lines (str) – A string to add to the bottom of a generated Cromwell configuration. This is NOT used for an existing Cromwell instance, or if a config is supplied.
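To make the default polling behaviour concrete, here is a sketch of an interval that moves from 5 seconds to 60 seconds over 30 minutes. The linear ramp is an assumption made for illustration; the actual curve is whatever janis_assistant.Cromwell.get_poll_interval implements, and `poll_interval` below is a hypothetical function.

```python
def poll_interval(elapsed_seconds: float,
                  start: float = 5.0,
                  end: float = 60.0,
                  ramp_seconds: float = 30 * 60) -> float:
    """Grow the polling interval linearly from `start` to `end` over `ramp_seconds`.

    NOTE: the linear shape is an assumption, not the Janis implementation.
    """
    if elapsed_seconds <= 0:
        return start
    if elapsed_seconds >= ramp_seconds:
        return end
    fraction = elapsed_seconds / ramp_seconds
    return start + (end - start) * fraction


# Interval at the start, halfway through the ramp, and after the ramp has ended
intervals = [poll_interval(t) for t in (0, 15 * 60, 45 * 60)]
```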
class janis_assistant.management.configuration.MySqlInstanceConfig(type)
__init__(url, username, password, dbname='cromwell')
Configuration options for a MySQL instance
Parameters:
- url – URL of the MySQL instance (including port if not 3306)
- username – Username
- password – Password; note this is embedded into the Cromwell configuration (<output-dir>/janis/configuration/cromwell.conf)
- dbname – Database name to use, default ‘cromwell’
Existing Cromwell instance¶
In addition to the previous cromwell.url method, you can also manage this through the command line.
To configure Janis to submit to an existing Cromwell instance (eg: one already set up for the cloud), the CLI has a mechanism for setting the cromwell_url:
urlwithport="127.0.0.1:8000"
janis run --engine cromwell --cromwell-url $urlwithport hello
OR
`~/janis.conf`
engine: cromwell
cromwell:
  url: 127.0.0.1:8000
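Because the URL is expected as a bare host and port without http, it can help to clean up a value before placing it in the config. The helper below is a hypothetical pre-check written for this guide; whether Janis performs any such normalisation itself is not assumed.

```python
def normalise_cromwell_url(url: str) -> str:
    """Strip a leading scheme and any trailing slash, leaving 'host:port'."""
    for scheme in ("http://", "https://"):
        if url.lower().startswith(scheme):
            url = url[len(scheme):]
            break
    return url.rstrip("/")


base_url = normalise_cromwell_url("http://127.0.0.1:8000/")
```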
Overriding Cromwell JAR¶
In addition to the previous cromwell.jar option, you can set the location of the Cromwell JAR through the environment variable JANIS_CROMWELLJAR:
export JANIS_CROMWELLJAR=/path/to/cromwell.jar
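Since the cromwell.jar key is documented as prioritised over $JANIS_CROMWELLJAR, the lookup order can be sketched as follows. `resolve_cromwell_jar` is a hypothetical helper for illustration and operates on an already parsed config dictionary.

```python
import os
from typing import Mapping, Optional


def resolve_cromwell_jar(config: dict,
                         env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Prefer cromwell.jar from the config, then fall back to JANIS_CROMWELLJAR."""
    env = os.environ if env is None else env
    configured = (config.get("cromwell") or {}).get("jar")
    return configured or env.get("JANIS_CROMWELLJAR")


# The config value wins when both are present
chosen = resolve_cromwell_jar({"cromwell": {"jar": "/a.jar"}}, {"JANIS_CROMWELLJAR": "/b.jar"})
```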
Recipes¶
Often between a number of different workflows, you have a set of inputs that you want to apply multiple times. For that, we have recipes.
Note
Recipes only match on the input name. Your input names MUST be consistent across your pipelines for this concept to be useful.
class janis_assistant.management.configuration.JanisConfigurationRecipes(type)
__init__(recipes: dict = None, paths: Union[str, List[str]] = None, directories: Union[str, List[str]] = None)
Parameters:
- recipes (dict) – A dictionary of input values, keyed by the recipe name.
- paths (List[str]) – a list of *.yaml files, where each file contains a dictionary of input values, keyed by the recipe name, similar to the previous recipes parameter.
- directories (List[str]) – a directory of *.yaml files, where the * is the recipe name.
For example, every time I run the WGSGermlineGATK pipeline with hg38, I know I want to provide the same reference files. There are a few ways to configure this:
- Recipes: A dictionary of input values, keyed by the recipe name.
- Paths: a list of *.yaml files, where each file contains a dictionary of input values, keyed by the recipe name, similar to the previous Recipes option.
- Directories: a directory of *.yaml files, where the * is the recipe name.
The examples below encode the following information: when we use the hg38 recipe, we want to provide an input value for reference as /path/to/hg38/reference.fasta, and an input value for type as hg38. The second recipe, hg19, is configured similarly.
Recipes dictionary¶
You can specify this recipe directly in your janis.conf:
recipes:
  recipes:
    hg38:
      reference: /path/to/hg38/reference.fasta
      type: hg38
    hg19:
      reference: /path/to/hg19/reference.fasta
      type: hg19
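Because recipes match purely on the input name, applying one amounts to copying over the values whose keys the workflow actually declares. The sketch below illustrates that idea; `inputs_from_recipes` is a hypothetical function, and letting explicitly supplied inputs override recipe values is an assumption rather than documented behaviour.

```python
from typing import Any, Dict, Iterable

# The same two recipes as in the configuration above
recipes: Dict[str, Dict[str, Any]] = {
    "hg38": {"reference": "/path/to/hg38/reference.fasta", "type": "hg38"},
    "hg19": {"reference": "/path/to/hg19/reference.fasta", "type": "hg19"},
}


def inputs_from_recipes(recipe_names: Iterable[str],
                        workflow_inputs: Iterable[str],
                        user_inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Fill workflow inputs from recipes by name; user inputs win (assumption)."""
    wanted = set(workflow_inputs)
    resolved: Dict[str, Any] = {}
    for name in recipe_names:
        for key, value in recipes.get(name, {}).items():
            if key in wanted:  # recipes only match on the input name
                resolved[key] = value
    resolved.update(user_inputs)
    return resolved


# A hypothetical workflow that declares 'reference' and 'fastqs' but not 'type'
result = inputs_from_recipes(["hg38"], ["reference", "fastqs"],
                             {"fastqs": "/data/sample.fastq"})
```

The recipe's type value is dropped here because the example workflow has no input called type, which is why consistent input names across pipelines matter.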
Recipes Paths¶
Or you could create a myrecipes.yaml with the contents:
hg38:
  reference: /path/to/hg38/reference.fasta
  type: hg38
hg19:
  reference: /path/to/hg19/reference.fasta
  type: hg19
And then instruct Janis to use this file in two ways:
- In your janis.conf with:
    recipes:
      paths:
        - /path/to/myrecipes.yaml
- OR, you can export the comma-separated environment variable:
    export JANIS_RECIPEPATHS="/path/to/myrecipes.yaml,/path/to/myrecipes2.yaml"
Recipe Directories¶
Create two files in a directory:
- hg38.yaml:
    reference: /path/to/hg38/reference.fasta
    type: hg38
- hg19.yaml:
    reference: /path/to/hg19/reference.fasta
    type: hg19
And similar to the paths, you can specify this directory in two ways:
- In your janis.conf with:
    recipes:
      directories:
        # /path/to/recipes has two files, hg38.yaml | hg19.yaml
        - /path/to/recipes/
- OR, you can export the comma-separated environment variable:
    export JANIS_RECIPEDIRECTORY="/path/to/recipes/,/path/to/recipes2/"
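The directory convention (file name without extension becomes the recipe name) can be sketched with the standard library alone. `split_env_list` and `recipe_files` are hypothetical helpers; the sketch only discovers recipe names and leaves parsing of the YAML contents to a YAML library.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def split_env_list(value: str) -> List[str]:
    """Split a comma-separated environment value, dropping empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def recipe_files(directories: List[str]) -> Dict[str, Path]:
    """Map recipe name (file stem) to file path for every .yaml / .yml file."""
    found: Dict[str, Path] = {}
    for directory in directories:
        for path in sorted(Path(directory).iterdir()):
            if path.suffix in (".yaml", ".yml"):
                found[path.stem] = path
    return found


# Demonstration with a temporary directory standing in for /path/to/recipes/
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("hg38.yaml", "hg19.yml", "notes.txt"):
        Path(demo_dir, name).write_text("type: demo\n")
    recipe_names = sorted(recipe_files(split_env_list(demo_dir)))
```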
Notifications¶
class janis_assistant.management.configuration.JanisConfigurationNotifications(type)
__init__(email: str = None, from_email: str = 'janis-noreply@petermac.org', mail_program: str = None)
Parameters:
- email – Email address to send status updates to
- from_email – (Default: janis-noreply@petermac.org)
- mail_program – Which mail program to use to send emails. A fully formatted email will be directed as stdin (eg: sendmail -t)
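The phrase "a fully formatted email will be directed as stdin" can be pictured with Python's standard email and subprocess modules. This is a sketch under assumptions: `build_status_email` and `send_via_program` are hypothetical helpers, the subject and body are invented placeholders, and the real message Janis produces may look different.

```python
import shlex
import subprocess
from email.message import EmailMessage


def build_status_email(to: str, subject: str, body: str,
                       sender: str = "janis-noreply@petermac.org") -> EmailMessage:
    """Build a complete message (headers and body) ready to be piped to a mailer."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = to
    message["Subject"] = subject
    message.set_content(body)
    return message


def send_via_program(message: EmailMessage, mail_program: str) -> int:
    """Pipe the formatted email to the mail program's stdin, eg: 'sendmail -t'."""
    completed = subprocess.run(shlex.split(mail_program), input=message.as_bytes())
    return completed.returncode


# Only the message is built here; sending requires a mail program on the host
message = build_status_email("user@example.org", "Workflow finished", "Status: completed")
```

With sendmail -t the recipients are read from the message headers, which is why the To header is set on the message itself.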
Environment¶
class janis_assistant.management.configuration.JanisConfigurationEnvironment(type)
__init__(max_cores: int = None, max_memory: int = None, max_duration: int = None)
Additional settings to configure a Janis environment. Currently, it mostly involves restricting resources (like cores, memory, duration) to fit within specific compute requirements. Notably, these values limit the requested values if they’re a number. It doesn’t currently limit a value if it’s determined via an operator.
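The capping rule described here (numbers are limited, operator-derived values are left alone) can be sketched as below. `cap_resource` is a hypothetical function, and representing an operator as a string expression is an assumption made purely for the example.

```python
from typing import Any, Optional


def cap_resource(requested: Any, maximum: Optional[float]) -> Any:
    """Limit a numeric request to `maximum`; pass anything else through unchanged."""
    if maximum is None:
        return requested  # no limit configured
    if isinstance(requested, (int, float)) and not isinstance(requested, bool):
        return min(requested, maximum)
    # Values determined via an operator are not numbers yet, so they are not capped
    return requested


capped_cores = cap_resource(32, 16)
uncapped_expression = cap_resource("inputs.threads * 2", 16)
```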