Workflow¶
Manages the connections between tools
Declaration¶
There are two major ways to construct a workflow:
- Inline, using the janis.WorkflowBuilder class, or
- By inheriting from the janis.Workflow class and implementing the required methods.
- class janis.WorkflowBuilder(identifier: str = None, friendly_name: str = None, version: str = None, metadata: janis_core.utils.metadata.WorkflowMetadata = None, tool_provider: str = None, tool_module: str = None, doc: str = None)
Advanced Workflows¶
Janis allows you to dynamically create workflows based on inputs. More information can be found on the Dynamic Workflows page.
Overview¶
The janis.Workflow and janis.WorkflowBuilder classes expose inputs, manage the connections between those inputs and tools, and expose outputs.
A janis.WorkflowBuilder is the class used to declare workflows inline. The janis.Workflow class should only be used by subclassing it.
A workflow does not directly execute, but declares what inputs a janis.CommandTool should receive.
A representation of a workflow can be exported to cwl or wdl through the janis.Workflow.translate() method.
Translating¶
Currently Janis supports two translation targets, cwl and wdl, both through the translate method:
- Workflow.translate(translation: Union[str, janis_core.translationdeps.supportedtranslations.SupportedTranslation], to_console=True, tool_to_console=False, to_disk=False, write_inputs_file=True, with_docker=True, with_hints=False, with_resource_overrides=False, validate=False, should_zip=True, export_path='./{name}', merge_resources=False, hints=None, allow_null_if_not_optional=True, additional_inputs: Dict[KT, VT] = None, max_cores=None, max_mem=None, max_duration=None, allow_empty_container=False, container_override: dict = None)
Structure of a workflow¶
A workflow has the following nodes:
- Inputs: janis.Workflow.input()
- Steps: janis.Workflow.step()
- Outputs: janis.Workflow.output()
Once a node has been added to the workflow, it may be referenced through dot-notation on the workflow. For this reason, identifiers have certain naming restrictions. In the following examples we’re going to create an inline workflow using the WorkflowBuilder class.
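The link between dot-notation and naming restrictions can be sketched with a toy node table. This is not Janis's implementation: the `NodeRegistry` class and its `add` method are hypothetical, and the rules shown (a valid Python identifier, not a keyword, not clashing with an existing attribute) are an assumption about what such restrictions might look like, since the exact rules are not listed here.

```python
import keyword


class NodeRegistry:
    """Toy stand-in for a workflow's node table (hypothetical, not Janis code)."""

    def __init__(self):
        self._nodes = {}

    def add(self, identifier, node):
        # Assumed rule: the identifier must work as a Python attribute name
        if not identifier.isidentifier() or keyword.iskeyword(identifier):
            raise ValueError(f"'{identifier}' cannot be used with dot-notation")
        # Assumed rule: no duplicates, and no shadowing of methods such as 'add'
        if identifier in self._nodes or hasattr(type(self), identifier):
            raise ValueError(f"'{identifier}' is already in use")
        self._nodes[identifier] = node
        return node

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails: fall back to the nodes
        try:
            return self.__dict__["_nodes"][name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, name):
        # Index notation reads from the same table as dot-notation
        return self._nodes[name]


registry = NodeRegistry()
node = registry.add("my_input", object())
assert registry.my_input is registry["my_input"] is node
```

An identifier such as "my-input" would be rejected in this sketch because `registry.my-input` is not a valid attribute access in Python.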
Creating an input¶
An input requires a unique identifier (string) and a janis.DataType.
- Workflow.input(identifier: str, datatype: Union[Type[Union[str, float, int, bool]], janis_core.types.data_types.DataType, Type[janis_core.types.data_types.DataType]], default: any = None, value: any = None, doc: Union[str, janis_core.tool.documentation.InputDocumentation, Dict[str, any]] = None)
Create an input node on a workflow.
The input node is returned from this function, and is also available as a property on a workflow (accessible through dot-notation OR index notation).
import janis as j
w = j.WorkflowBuilder("myworkflow")
myInput = w.input("myInput", j.String)
myInput == w.myInput == w["myInput"] # True
Note
Default vs Value: The input
Creating a step¶
A step requires a unique identifier (string), a mapped tool (either a janis.CommandTool or janis.Workflow called with its inputs), and scattering information (if required).
- Workflow.step(identifier: str, tool: janis_core.tool.tool.Tool, scatter: Union[str, List[str], janis_core.utils.scatter.ScatterDescription] = None, _foreach: Union[janis_core.operators.selectors.Selector, List[janis_core.operators.selectors.Selector]] = None, when: Optional[janis_core.operators.operator.Operator] = None, ignore_missing=False, doc: str = None)
Construct a step on this workflow.
Parameters:
- identifier – The identifier of the step, unique within the workflow.
- tool – The tool that should run for this step.
- scatter (Union[str, ScatterDescription]) – Indicate whether a scatter should occur, on what, and how.
- when (Optional[Operator]) – An operator / condition that determines whether the step should run
- ignore_missing – Don’t throw an error if required params are missing from this function
- _foreach – NB: this is unimplemented. Iterate over each value of this resolved list, using the “ForEachSelector” to select each value in the iterable.
Returns:
Janis will throw an error if any required input is not provided. You can pass ignore_missing=True to the step function to skip this check.
from janis.unix.tools.echo import Echo
# Echo has the required input: "inp": String
# https://janis.readthedocs.io/en/latest/tools/unix/echo.html
echoStep = w.step("echoStep", Echo(inp=w.myInput))
echoStep == w.echoStep == w["echoStep"] # True
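The required-input check and the effect of ignore_missing described above can be pictured with a small toy model. The function names below are hypothetical and this is not how Janis implements the check; it only illustrates the behaviour described: raise when a required input is absent, unless the caller opts out.

```python
def missing_required(required, provided):
    """Return the required input names that were not supplied, sorted."""
    return sorted(set(required) - set(provided))


def check_step_inputs(required, provided, ignore_missing=False):
    # Toy version of the check: raise unless every required input is present,
    # or the caller opted out with ignore_missing=True
    missing = missing_required(required, provided)
    if missing and not ignore_missing:
        raise ValueError(f"Missing required inputs: {', '.join(missing)}")
    return missing


# Echo's only required input is "inp", so supplying it passes the check
check_step_inputs(["inp"], {"inp": "hello"})

# With ignore_missing=True the gap is reported back instead of raising
unfilled = check_step_inputs(["inp"], {}, ignore_missing=True)
```

Skipping the check only defers the problem: an input left unconnected in this way would still need a value from somewhere before the step could run.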
Creating an output¶
An output requires a unique identifier (string), an output source and an optional janis.DataType. If a data type is provided, it is type-checked against the output source. Don’t be put off by the automatically generated interface for the output method; it’s there to be exhaustive for the type definitions.
Here is the (simplified) method definition:
def output(
    self,
    identifier: str,
    datatype: Optional[ParseableType] = None,
    source: Union[Selector, ConnectionSource] = None,  # or List[Selector, ConnectionSource]
    output_folder: Union[str, Selector, List[Union[str, Selector]]] = None,
    output_name: Union[bool, str, Selector, ConnectionSource] = True,  # let janis decide output name
    extension: Optional[str] = None,  # file extension if janis names file
    doc: Union[str, OutputDocumentation] = None,
):
    ...
- Workflow.output(identifier: str, datatype: Union[Type[Union[str, float, int, bool]], janis_core.types.data_types.DataType, Type[janis_core.types.data_types.DataType], None] = None, source: Union[List[Union[janis_core.operators.selectors.Selector, janis_core.graph.node.Node, janis_core.operators.selectors.StepOutputSelector, Tuple[janis_core.graph.node.Node, str]]], janis_core.operators.selectors.Selector, janis_core.graph.node.Node, janis_core.operators.selectors.StepOutputSelector, Tuple[janis_core.graph.node.Node, str]] = None, output_folder: Union[str, janis_core.operators.selectors.Selector, List[Union[janis_core.operators.selectors.Selector, str]]] = None, output_name: Union[bool, str, janis_core.operators.selectors.Selector, janis_core.graph.node.Node, janis_core.operators.selectors.StepOutputSelector, Tuple[janis_core.graph.node.Node, str]] = True, extension: Optional[str] = None, doc: Union[str, janis_core.tool.documentation.OutputDocumentation] = None)
Create an output on a workflow
Parameters:
- identifier – The identifier for the output
- datatype – Optional data type of the output to check. This will be automatically inferred if not provided.
- source – The source of the output, must be an output of a step node
- output_folder – Decides the output folder(s) where the output will reside. If a list is passed, it represents a structure of nested directories, the first element being the root directory.
  - None (default): the assistant will copy to the root of the output directory
  - Type[Selector]: will be resolved before the workflow is run, this means it may only depend on the inputs
  NB: If the output_source is an array, a “shard_n” will be appended to the output_name UNLESS the output_source also resolves to an array, in which case the assistant can unwrap multiple dimensions of arrays, but ONLY if the number of elements in the scattered output source and the number of resolved elements are equal.
- output_name – Decides the name of the output (without extension) that an output will have:
  - True (default): the assistant will choose an output name based on output identifier (tag),
  - None / False: the assistant will use the original filename (this might cause filename conflicts)
  - Type[Selector]: will be resolved before the workflow is run, this means it may only depend on the inputs
  NB: If the output_source is an array, a “shard_n” will be appended to the output_name UNLESS the output_source also resolves to an array, in which case the assistant can unwrap multiple dimensions of arrays.
- extension – The extension to use if janis renames the output. By default, it will pull the extension from the inherited data type (eg: CSV -> “.csv”), or it will attempt to pull the extension from the file.
Returns: janis.WorkflowOutputNode
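How output_folder, output_name and extension combine into a final location can be illustrated with a simplified model. The `resolve_output_path` helper is hypothetical and is not part of Janis; it ignores Selectors, array sharding and data-type extensions, and assumes the extension falls back to the one on the original file.

```python
import posixpath


def resolve_output_path(identifier, original_filename, output_folder=None,
                        output_name=True, extension=None):
    # output_folder: None -> root of the output directory,
    # a string -> one folder, a list -> nested folders (first element is the root)
    if output_folder is None:
        folders = []
    elif isinstance(output_folder, str):
        folders = [output_folder]
    else:
        folders = list(output_folder)

    if output_name is None or output_name is False:
        # Keep the original filename (this might cause filename conflicts)
        filename = original_filename
    else:
        # True -> name after the output identifier; a string -> use it directly
        stem = identifier if output_name is True else output_name
        # Assumed fallback: reuse the original file's extension when none is given
        if extension is None:
            extension = posixpath.splitext(original_filename)[1]
        filename = stem + extension

    return posixpath.join(*folders, filename)


# Default behaviour: named after the output identifier, placed in the root
default_path = resolve_output_path("out", "tmp123.csv")

# A list of folders becomes a nested directory structure
nested_path = resolve_output_path("out", "tmp123.csv", output_folder=["results", "tables"])
```

In this model `default_path` is "out.csv" and `nested_path` is "results/tables/out.csv"; passing output_name=False would keep "tmp123.csv" unchanged.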
You are unable to connect an input node directly to an output node, and an output node cannot be referenced as a step input.
# w.echoStep added to workflow
w.output("out", source=w.echoStep)
Subclassing Workflow¶
Instead of creating inline workflows, it’s possible to subclass janis.Workflow and implement the required methods, which allows a tool to have documentation automatically generated.
Required methods:
- Workflow.id() → str
- Workflow.friendly_name()
  Overriding this method is not required UNLESS you distribute your tool. Generating the docs will fail if your tool does not provide a name.
  Returns: A friendly name of your tool
- Workflow.constructor()
  A place to construct your workflows. This is called directly after initialisation.
Within the constructor method, you have access to self to add inputs, steps and outputs.
OPTIONAL:¶
- Workflow.bind_metadata()
  A convenient place to add metadata about the tool. You are guaranteed that self.metadata will exist. It’s possible to return a new instance of the ToolMetadata / WorkflowMetadata which will be rebound. This is usually called after the initialiser, though it may be called multiple times.
Examples¶
Inline example¶
import janis as j
from janis.unix.tools.echo import Echo
w = j.WorkflowBuilder("my_workflow")
w.input("my_input", j.String)
echoStep = w.step("echo_step", Echo(inp=w.my_input))
w.output("out", source=w.echo_step)
# Will print the CWL, input file and relevant tools to the console
w.translate("cwl", to_disk=False) # or "wdl"
Subclass example¶
import janis as j
from janis.unix.tools.echo import Echo

class MyWorkflow(j.Workflow):
    def id(self):
        return "my_workflow"

    def friendly_name(self):
        return "My workflow"

    def constructor(self):
        # Nodes are added on self, mirroring the inline WorkflowBuilder example
        self.input("my_input", j.String)
        echoStep = self.step("echo_step", Echo(inp=self.my_input))
        self.output("out", source=self.echo_step)

    # optional
    def bind_metadata(self):
        self.metadata.author = "Michael Franklin"
        self.metadata.version = "v1.0.0"
        self.metadata.documentation = "A tool that echoes the input to standard_out"