Collecting tool outputs¶
This guide walks you through the different ways of collecting outputs from a CommandTool. The core of this tutorial is the ToolOutput and how you use it with the various Selectors / Operators.
class janis.ToolOutput(tag: str, output_type: Union[Type[Union[str, float, int, bool]], janis_core.types.data_types.DataType, Type[janis_core.types.data_types.DataType]], selector: Union[janis_core.operators.selectors.Selector, str, None] = None, presents_as: str = None, secondaries_present_as: Dict[str, str] = None, doc: Union[str, janis_core.tool.documentation.OutputDocumentation, None] = None, glob: Union[janis_core.operators.selectors.Selector, str, None] = None, _skip_output_quality_check=False)
A ToolOutput instructs the engine how to collect an output and how it may be referenced in a workflow.
Parameters:
- tag – The identifier of an output; must be unique among the inputs and outputs.
- output_type – The type of output that is being collected.
- selector – How to collect this output; accepts any janis.Selector.
- glob – (DEPRECATED) An alias for selector.
- doc – Documentation on what the output is, used to generate docs.
- _skip_output_quality_check – DO NOT USE THIS PARAMETER. It is a workaround for parsing CWL ExpressionTools when a cwl.output.json is generated.
Note that there are key differences between how strings are coerced into Files / Directories in CWL and WDL:
- In WDL, a string is automatically coercible to a file, where the path is relative to the execution directory.
- In CWL, a path is NOT automatically coercible; instead, a File object ({path: "<path>", class: "File / Directory"}) must be created. Janis shortcuts this by inserting your strings as globs and letting CWL create the object. There may be unintended side effects of this process.
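To make the difference concrete, here is a minimal Python sketch of the two coercion styles. The helper names (wdl_style_path, cwl_file_object) and the example paths are hypothetical; they are not part of Janis, CWL or WDL, and only model the behaviour described above.

```python
import posixpath


def wdl_style_path(value, execution_dir):
    """Model WDL-style coercion: a plain string is read as a path
    relative to the execution directory (an assumption for illustration)."""
    return posixpath.normpath(posixpath.join(execution_dir, value))


def cwl_file_object(path, is_directory=False):
    """Model the explicit object CWL expects instead of a bare string."""
    return {"class": "Directory" if is_directory else "File", "path": path}


# Hypothetical example values
wdl_result = wdl_style_path("results/out.txt", "/work/execution")
cwl_result = cwl_file_object("results/out.txt")
```

In the WDL-style case the string is enough on its own, whereas the CWL-style case needs the wrapping object, which is the step Janis sidesteps by emitting a glob.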
Convention¶
We’ll presume in this guide that you’ve imported Janis as follows:
import janis_core as j
Examples¶
Stderr / Stdout¶
Collecting stdout and stderr can be done by simply annotating the types. This is functionally equivalent to type File, and using Stderr / Stdout as a selector:
outputs=[
    # stdout
    j.ToolOutput("out_stdout_1", j.Stdout()),
    j.ToolOutput("out_stdout_2", j.File(), selector=j.Stdout()),
    # stderr
    j.ToolOutput("out_stderr_1", j.Stderr()),
    j.ToolOutput("out_stderr_2", j.File(), selector=j.Stderr()),
]
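As a rough picture of what collecting stdout and stderr as files means, the sketch below runs a command and redirects both streams to files, which can then be treated like any other File output. This is plain Python, not how Janis or any engine is implemented; the function name run_and_capture and the file names stdout.txt / stderr.txt are assumptions made for the example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_capture(argv, workdir):
    """Run a command, writing its stdout and stderr to files in workdir."""
    out_path = Path(workdir) / "stdout.txt"  # hypothetical file name
    err_path = Path(workdir) / "stderr.txt"  # hypothetical file name
    with open(out_path, "w") as out, open(err_path, "w") as err:
        subprocess.run(argv, stdout=out, stderr=err, check=True)
    return out_path, err_path


with tempfile.TemporaryDirectory() as tmp:
    # A tiny stand-in "tool" that writes one line to each stream
    script = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"
    out_file, err_file = run_and_capture([sys.executable, "-c", script], tmp)
    captured_out = out_file.read_text().strip()
    captured_err = err_file.read_text().strip()
```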
Wildcard / glob outputs¶
If it’s impractical or impossible to determine the names of the outputs in advance, you can use a janis.WildcardSelector to find all the files that match a particular pattern. This glob pattern is not transformed, and differences may occur between CWL / WDL depending on the glob syntax each uses. Please refer to their individual documentation for more information:
- CWL: Globs
- WDL: Globs
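As a rough illustration of what a wildcard pattern collects, the sketch below uses Python’s own glob module on a temporary directory. It shows the difference between keeping every match and keeping only the first one. The file names are made up, and sorting the matches is an assumption made here for determinism; the ordering and glob syntax of a real CWL or WDL engine may differ.

```python
import glob
import os
import tempfile


def collect_matches(directory, pattern):
    """Return every file in `directory` matching `pattern`, sorted by name."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


with tempfile.TemporaryDirectory() as workdir:
    # Create a few hypothetical tool outputs
    for name in ("a.txt", "b.txt", "run.log"):
        with open(os.path.join(workdir, name), "w") as handle:
            handle.write(name)

    matches = collect_matches(workdir, "*.txt")
    # An array output keeps every match ...
    all_text_files = [os.path.basename(path) for path in matches]
    # ... while a single-file output keeps only the first one
    first_text_file = all_text_files[0]
```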
You can use a glob in Janis with:
outputs=[
    j.ToolOutput("out_text_files", j.Array(j.File), selector=j.WildcardSelector("*.txt")),
    # the next two are functionally equivalent; each selects a single File
    j.ToolOutput("out_single_text_file_1", j.File, selector=j.WildcardSelector("*.txt", select_first=True)),
    j.ToolOutput("out_single_text_file_2", j.File, selector=j.WildcardSelector("*.txt")[0]),
]
Roughly, this is translated to the following:
WDL:
Array[File] out_text_files = glob("*.txt")
File out_single_text_file_1 = glob("*.txt")[0]
File out_single_text_file_2 = glob("*.txt")[0]
CWL:
Named outputs¶
Often we’ll use a string or a Filename generator to name an output of a tool. For example, samtools sort accepts an argument -o, which is an output filename, in addition to the regular "bam" input. We want our output to be of type Bam, so we’ll use the janis.Filename class. It accepts a few arguments (prefix, suffix and extension) and will generate a filename based on these attributes.
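A rough model of how the three attributes could combine into a name is sketched below, for instance with a prefix derived from an input file’s basename. This is not the janis.Filename implementation: plain concatenation is an assumption chosen to match the generated names shown later in this guide, and the helper names and example paths are hypothetical.

```python
import posixpath


def strip_extension(path):
    """Take the basename of `path` and drop its final extension."""
    base = posixpath.basename(path)
    stem, _ext = posixpath.splitext(base)
    return stem


def generate_filename(prefix="", suffix="", extension=""):
    """Concatenate the three parts (assumed rule; the real class may differ)."""
    return f"{prefix}{suffix}{extension}"


# Hypothetical input path
generated = generate_filename(
    prefix=strip_extension("/data/sample1.bam"),
    suffix=".sorted",
    extension=".bam",
)
```

Note that only the final extension is dropped in this model, so a name such as reads.fastq.gz would keep its .fastq part.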
We want our filename to be based on the input bam to keep our naming consistent, so let’s choose the following attributes:
- prefix - the Bam, with the file extension removed (this will automatically take the basename)
- suffix - ".sorted"
- extension - ".bam"
We can create the following ToolInput value to match this (we use a ToolInput as it means we could override it later):
ToolInput(
    "outputFilename",
    Filename(
        prefix=InputSelector("bam", remove_file_extension=True),
        suffix=".sorted",
        extension=".bam",
    ),
    position=1,   # Ensure it appears before the "bam" input
    prefix="-o",  # Prefix, eg: '-o <output-filename>'
)
Then, on the ToolOutput, we can use selector=InputSelector("outputFilename") to get this value. This results in the final tool:
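from janis_core import CommandToolBuilder, ToolInput, ToolOutput, Filename, InputSelector
# Bam is assumed here to come from the janis-bioinformatics package
from janis_bioinformatics.data_types import Bam

SamtoolsSort_1_9_0 = CommandToolBuilder(
    tool="SamToolsSort",
    base_command=["samtools", "sort"],
    inputs=[
        ToolInput("bam", input_type=Bam(), position=2),
        ToolInput(
            "outputFilename",
            Filename(
                prefix=InputSelector("bam", remove_file_extension=True),
                suffix=".sorted",
                extension=".bam",
            ),
            position=1,
            prefix="-o",
        ),
    ],
    # The output is collected from the generated (or overridden) filename
    outputs=[ToolOutput("out", Bam(), selector=InputSelector("outputFilename"))],
    container="quay.io/biocontainers/samtools:1.9--h8571acd_11",
    version="1.9.0",
)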
Looking at the relevant WDL:
task SamToolsSort {
  input {
    File bam
    String? outputFilename
  }
  command <<<
    set -e
    samtools sort \
      -o '~{select_first([outputFilename, "~{basename(bam, ".bam")}.sorted.bam"])}' \
      '~{bam}'
  >>>
  output {
    File out = select_first([outputFilename, "~{basename(bam, ".bam")}.sorted.bam"])
  }
}
And CWL:
class: CommandLineTool
id: SamToolsSort
baseCommand:
- samtools
- sort
inputs:
- id: bam
  type: File
  inputBinding:
    position: 2
- id: outputFilename
  type:
  - string
  - 'null'
  default: generated.sorted.bam
  inputBinding:
    prefix: -o
    position: 1
    valueFrom: $(inputs.bam.basename.replace(/.bam$/, "")).sorted.bam
outputs:
- id: out
  type: File
  outputBinding:
    glob: $(inputs.bam.basename.replace(/.bam$/, "")).sorted.bam
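The output expression in the WDL task above can be read as "use outputFilename if it was provided, otherwise fall back to the name generated from the bam". The Python sketch below models only that WDL expression. The functions wdl_basename, select_first and resolve_output_filename are hypothetical stand-ins written for this example, loosely mirroring the WDL functions of the same names, and the input paths are made up.

```python
import posixpath


def wdl_basename(path, suffix=None):
    """Return the file name of `path`, dropping `suffix` if it is present."""
    name = posixpath.basename(path)
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


def select_first(values):
    """Return the first value that is not None, or fail if there is none."""
    for value in values:
        if value is not None:
            return value
    raise ValueError("select_first: no defined value")


def resolve_output_filename(bam, output_filename=None):
    """Prefer an explicit name; otherwise derive '<bam stem>.sorted.bam'."""
    default = wdl_basename(bam, ".bam") + ".sorted.bam"
    return select_first([output_filename, default])


generated_name = resolve_output_filename("/data/sample1.bam")
overridden_name = resolve_output_filename("/data/sample1.bam", "custom.bam")
```

Because the same expression is used for both the -o argument and the output declaration, the collected file always matches the name the tool was told to write.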