Collecting tool outputs

This guide walks through the different ways of collecting outputs from a CommandTool. It centres on the ToolOutput and on how to use the various Selectors / Operators with it.

class janis.ToolOutput(tag: str, output_type: Union[Type[Union[str, float, int, bool]], janis_core.types.data_types.DataType, Type[janis_core.types.data_types.DataType]], selector: Union[janis_core.operators.selectors.Selector, str, None] = None, presents_as: str = None, secondaries_present_as: Dict[str, str] = None, doc: Union[str, janis_core.tool.documentation.OutputDocumentation, None] = None, glob: Union[janis_core.operators.selectors.Selector, str, None] = None, _skip_output_quality_check=False)
__init__(tag: str, output_type: Union[Type[Union[str, float, int, bool]], janis_core.types.data_types.DataType, Type[janis_core.types.data_types.DataType]], selector: Union[janis_core.operators.selectors.Selector, str, None] = None, presents_as: str = None, secondaries_present_as: Dict[str, str] = None, doc: Union[str, janis_core.tool.documentation.OutputDocumentation, None] = None, glob: Union[janis_core.operators.selectors.Selector, str, None] = None, _skip_output_quality_check=False)

A ToolOutput instructs the engine how to collect an output and how it may be referenced in a workflow.

Parameters:
  • tag – The identifier of an output; it must be unique among the inputs and outputs.
  • output_type – The type of output that is being collected.
  • selector – How to collect this output, can accept any janis.Selector.
  • glob – (DEPRECATED) An alias for selector
  • doc – Documentation on what the output is, used to generate docs.
  • _skip_output_quality_check – DO NOT USE THIS PARAMETER. It is a workaround for parsing CWL ExpressionTools when a cwl.output.json is generated.

Note that there are key differences between how strings are coerced into Files / Directories in CWL and WDL.

  • In WDL, a string is automatically coercible to a file, where the path is relative to the execution directory.
  • In CWL, a path is NOT automatically coercible; instead, a File object ({path: "<path>", class: "File / Directory"}) must be created. Janis shortcuts this by inserting your strings as globs and letting CWL create the object. There may be unintended side effects of this process.
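
The difference can be sketched in plain Python. This is only an illustration, not Janis's implementation: the helper names cwl_file_object and cwl_glob_binding are hypothetical, and Python's fnmatch stands in for whatever glob rules the CWL engine actually applies. The last two lines show one possible side effect: a literal filename that contains glob metacharacters no longer matches itself once it is treated as a pattern.

```python
from fnmatch import fnmatchcase


def cwl_file_object(path, is_directory=False):
    """The explicit object CWL expects in place of a bare path string."""
    return {"class": "Directory" if is_directory else "File", "path": path}


def cwl_glob_binding(path):
    """The shortcut: hand the string to CWL as a glob pattern instead."""
    return {"glob": path}


print(cwl_file_object("out/result.txt"))
print(cwl_glob_binding("out/result.txt"))

# Assumption: the engine uses POSIX-style glob rules, emulated here with fnmatch.
# "[1]" is read as a character class, so the literal name does not match itself.
print(fnmatchcase("sample[1].txt", "sample[1].txt"))  # False
print(fnmatchcase("sample1.txt", "sample[1].txt"))    # True
```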

Convention

We’ll presume in this guide that you’ve imported Janis as follows:

import janis_core as j

Examples

Stderr / Stdout

Collecting stdout and stderr can be done by simply using Stdout / Stderr as the output type. This is functionally equivalent to using the type File with Stdout / Stderr as the selector:

outputs=[
    # stdout
    j.ToolOutput("out_stdout_1", j.Stdout()),
    j.ToolOutput("out_stdout_2", j.File(), selector=j.Stdout()),
    # stderr
    j.ToolOutput("out_stderr_1", j.Stderr()),
    j.ToolOutput("out_stderr_2", j.File(), selector=j.Stderr()),
]
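
To make concrete what "collecting stdout as a File" means, here is a minimal sketch using only the Python standard library. It is not how Janis or any engine implements this; the function run_and_capture and the file names stdout.txt / stderr.txt are hypothetical. The idea is simply that each stream is written to a file in the execution directory, and that file becomes the output.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_capture(command, workdir):
    """Run `command`, writing its stdout and stderr to files inside `workdir`."""
    stdout_path = Path(workdir) / "stdout.txt"  # hypothetical file names
    stderr_path = Path(workdir) / "stderr.txt"
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        subprocess.run(command, stdout=out, stderr=err, check=True)
    return stdout_path, stderr_path


with tempfile.TemporaryDirectory() as workdir:
    # A stand-in "tool" that writes one line to each stream
    command = [
        sys.executable,
        "-c",
        "import sys; print('result'); print('warning', file=sys.stderr)",
    ]
    out_file, err_file = run_and_capture(command, workdir)
    print(out_file.read_text().strip())  # result
    print(err_file.read_text().strip())  # warning
```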

Wildcard / glob outputs

If it’s impractical or impossible to determine the names of the outputs, you can use a janis.WildcardSelector to find all the files that match a particular pattern. This glob pattern is not transformed, so behaviour may differ between CWL and WDL depending on the glob syntax each one uses. Please refer to their individual documentation for more information:

  • CWL: Globs
  • WDL: Globs

You can use a glob in Janis with:

outputs=[
    # every matching file, collected as an array
    j.ToolOutput("out_text_files", j.Array(j.File), selector=j.WildcardSelector("*.txt")),
    # the next two are functionally equivalent: each selects the first match, so the type is a single File
    j.ToolOutput("out_single_text_file_1", j.File, selector=j.WildcardSelector("*.txt", select_first=True)),
    j.ToolOutput("out_single_text_file_2", j.File, selector=j.WildcardSelector("*.txt")[0]),
]

Roughly, this is translated to the following:

WDL:

Array[File] out_text_files = glob("*.txt")
File out_single_text_file_1 = glob("*.txt")[0]
File out_single_text_file_2 = glob("*.txt")[0]

CWL:

Named outputs

Often we’ll use a string or a Filename generator to name an output of a tool. For example, samtools sort accepts an argument -o, which is the output filename, in addition to the regular "bam" input. We want our output to be of type Bam, and we’ll generate its name with the janis.Filename class. It accepts a few arguments (prefix, suffix and extension) and generates a filename from them.

We want our filename to be based on the input bam to keep consistency in our naming, so let’s choose the following attributes:

  • prefix - the input Bam with its file extension removed (this automatically takes the basename)
  • suffix - ".sorted"
  • extension - ".bam"

We can create the following ToolInput value to match this (we use a ToolInput as it means we could override it later):

ToolInput(
    "outputFilename",
    Filename(
        prefix=InputSelector("bam", remove_file_extension=True),
        suffix=".sorted",
        extension=".bam",
    ),
    position=1,     # Ensure it appears before the "bam" input
    prefix="-o",    # Prefix, eg: '-o <output-filename>'
)
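
The name this input is meant to produce can be sketched in plain Python. This is an illustration of the prefix + suffix + extension idea only, not Janis's Filename implementation; the helper names remove_file_extension and build_filename are hypothetical.

```python
import posixpath


def remove_file_extension(path, extension):
    """Take the basename of `path` and drop `extension` if it is present."""
    name = posixpath.basename(path)
    if extension and name.endswith(extension):
        return name[: -len(extension)]
    return name


def build_filename(prefix, suffix="", extension=""):
    """Concatenate the three parts in order: prefix, suffix, extension."""
    return f"{prefix}{suffix}{extension}"


prefix = remove_file_extension("/data/sample.bam", ".bam")
print(build_filename(prefix, suffix=".sorted", extension=".bam"))  # sample.sorted.bam
```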

Then, on the ToolOutput, we can use selector=InputSelector("outputFilename") to get this value. This results in the final tool:

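from janis_core import CommandToolBuilder, ToolInput, ToolOutput, InputSelector, Filename
# Assumption: Bam is provided by the janis-bioinformatics package
from janis_bioinformatics.data_types import Bam

SamtoolsSort_1_9_0 = CommandToolBuilder(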
    tool="SamToolsSort",
    base_command=["samtools", "sort"],
    inputs=[
        ToolInput("bam", input_type=Bam(), position=2),
        ToolInput(
            "outputFilename",
            Filename(
                prefix=InputSelector("bam", remove_file_extension=True),
                suffix=".sorted",
                extension=".bam",
            ),
            position=1,
            prefix="-o",
        ),
    ],
    outputs=[ToolOutput("out", Bam(), selector=InputSelector("outputFilename"))],
    container="quay.io/biocontainers/samtools:1.9--h8571acd_11",
    version="1.9.0",
)

Looking at the relevant WDL:

task SamToolsSort {
  input {
    File bam
    String? outputFilename
  }
  command <<<
    set -e
    samtools sort \
      -o '~{select_first([outputFilename, "~{basename(bam, ".bam")}.sorted.bam"])}' \
      '~{bam}'
  >>>
  output {
    File out = select_first([outputFilename, "~{basename(bam, ".bam")}.sorted.bam"])
  }
}

And CWL:

class: CommandLineTool
id: SamToolsSort
baseCommand:
- samtools
- sort

inputs:
- id: bam
  type: File
  inputBinding:
    position: 2
- id: outputFilename
  type:
  - string
  - 'null'
  default: generated.sorted.bam
  inputBinding:
    prefix: -o
    position: 1
    valueFrom: $(inputs.bam.basename.replace(/.bam$/, "")).sorted.bam

outputs:
- id: out
  type: File
  outputBinding:
    glob: $(inputs.bam.basename.replace(/.bam$/, "")).sorted.bam
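
In the WDL translation, select_first is what allows the generated name to be overridden: it returns the first defined value in the list, so a supplied outputFilename wins and the generated name is the fallback. A minimal Python sketch of that behaviour, with None standing in for an undefined WDL value:

```python
def select_first(values):
    """Return the first value that is not None, mirroring WDL's select_first."""
    for value in values:
        if value is not None:
            return value
    raise ValueError("select_first: no defined value in the list")


default_name = "sample.sorted.bam"  # the generated fallback name

# No outputFilename supplied: the generated name is used
print(select_first([None, default_name]))          # sample.sorted.bam
# outputFilename supplied: it takes precedence
print(select_first(["custom.bam", default_name]))  # custom.bam
```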