Tutorial 2 - Wrapping a new tool¶
This tutorial builds on the content and output from Tutorial 1.
Introduction¶
A CommandTool is the interface between Janis and a program to be executed. Simply put, a CommandTool has a name, a command, inputs, outputs and a container to run in. Inputs and arguments can have a prefix and / or position, and this is used to construct the command line.
The Janis documentation for the CommandTool gives an introduction to the tool structure and a template for constructing your own tool. A tool wrapper must provide all of the information needed to configure and run the specified tool: the base_command, the janis.ToolInput and janis.ToolOutput declarations, a container and its version.
Container¶
Further information: Containerising your tools
For portability, Janis requires that you specify an OCI-compliant container (eg: Docker) for your tool. With some searching you will often find that a container already exists; if not, the guide linked above covers preparing your tools in containers so that they work across all environments.
Preparation¶
The sample data to test this tool is computed in Tutorial 1. You can follow this tutorial without it, but running the example requires the bam produced by the first tutorial.
Samtools flagstat¶
In this tutorial we’re going to wrap the samtools flagstat tool - flagstat counts the number of alignments for each FLAG type within a bam file.
Samtools project links¶
- Latest version: 1.9
- Project page: http://www.htslib.org/doc/samtools.html
- Github: samtools/samtools
- Docker containers: quay.io/biocontainers/samtools (automatically / community generated)
  - Latest tag: 1.9--h8571acd_11
Command to build¶
We want to replicate the following command for Samtools Flagstat in Janis:
samtools flagstat [--threads n] <in.bam>
Hence, we can isolate the following information:
- Base commands: "samtools", "flagstat"
- The positional <in.bam> input
- The configuration --threads input
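To make the target concrete, here is a minimal sketch of the command line we want Janis to produce, written as a plain Python helper. The function name `flagstat_command` is hypothetical and is not part of Janis or samtools; it only mirrors the usage string above.

```python
import shlex
from typing import List, Optional


def flagstat_command(bam: str, threads: Optional[int] = None) -> List[str]:
    """Build the argv for: samtools flagstat [--threads n] <in.bam>."""
    argv = ["samtools", "flagstat"]  # base command
    if threads is not None:
        # optional configuration input, only emitted when a value is given
        argv += ["--threads", str(threads)]
    argv.append(bam)  # positional input comes last
    return argv


print(shlex.join(flagstat_command("tutorial1/out.bam")))
print(shlex.join(flagstat_command("tutorial1/out.bam", threads=4)))
```

The rest of this tutorial expresses the same three pieces of information (base command, optional prefixed input, positional input) declaratively, so that Janis can generate the command for us.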
Command tool template¶
The following template is the minimum amount of information required to wrap a tool. For more information, see the CommandToolBuilder documentation.
We’ve removed the optional fields: tool_module, tool_provider, metadata, cpu, memory from the following template.
We’re going to use Bam and TextFile data types, so let’s import them as well.
from janis_core import CommandToolBuilder, ToolInput, ToolOutput, Int, Stdout
from janis_unix.data_types import TextFile
from janis_bioinformatics.data_types import Bam

ToolName = CommandToolBuilder(
    tool="toolname",
    base_command=["base", "command"],
    inputs=[],  # List[ToolInput]
    outputs=[],  # List[ToolOutput]
    container="container/name:version",
    version="version"
)
Tool information¶
Let’s start by creating a file with this template inside a second output directory:
mkdir -p tools
vim tools/samtoolsflagstat.py
We can start by filling in the basic information:
- Rename the variable (ToolName) to be SamtoolsFlagstat
- Fill the parameters:
  - tool: A unique tool identifier, eg: "samtoolsflagstat".
  - base_command to be ["samtools", "flagstat"]
  - container to be "quay.io/biocontainers/samtools:1.9--h8571acd_11"
  - version to be "v1.9.0"
You’ll have a tool definition like the following:
SamtoolsFlagstat = CommandToolBuilder(
    tool="samtoolsflagstat",
    base_command=["samtools", "flagstat"],
    container="quay.io/biocontainers/samtools:1.9--h8571acd_11",
    version="v1.9.0",
    inputs=[],  # List[ToolInput]
    outputs=[]  # List[ToolOutput]
)
Inputs¶
Further reading: ToolInput
We’ll use the ToolInput class to represent these inputs. A ToolInput provides a mechanism for binding this input onto the command line (eg: prefix, position, transformations). See the documentation for more ways to configure a ToolInput.
janis.ToolInput(
tag: str,
input_type: DataType,
position: Optional[int] = None,
prefix: Optional[str] = None,
# more configuration options
separate_value_from_prefix: bool = None,
prefix_applies_to_all_elements: bool = None,
presents_as: str = None,
secondaries_present_as: Dict[str, str] = None,
separator: str = None,
shell_quote: bool = None,
localise_file: bool = None,
default: Any = None,
doc: Optional[str] = None
)
Nb: A ToolInput must have a position OR prefix in order to be bound onto the command line. If the prefix is specified with no position, a position=0 is automatically applied.
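The binding rule in the note above can be illustrated with a small, self-contained sketch. This is not Janis's implementation: the `Binding` dataclass and `bind` function are hypothetical names, and the sketch assumes only what the note states (an input needs a position or a prefix, and a prefix without a position is treated as position 0).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Binding:
    tag: str
    position: Optional[int] = None
    prefix: Optional[str] = None


def bind(bindings: List[Binding], values: Dict[str, object]) -> List[str]:
    """Order the supplied values by position and attach prefixes."""
    bound = []
    for b in bindings:
        if b.position is None and b.prefix is None:
            continue  # neither set: not bound onto the command line
        if values.get(b.tag) is None:
            continue  # optional input with no value supplied
        # a prefix with no position is treated as position 0
        position = b.position if b.position is not None else 0
        bound.append((position, b))
    # Python's sort is stable, so ties keep declaration order (a choice of this sketch)
    bound.sort(key=lambda pair: pair[0])

    args: List[str] = []
    for _, b in bound:
        if b.prefix is not None:
            args.append(b.prefix)
        args.append(str(values[b.tag]))
    return args


inputs = [Binding("bam", position=1), Binding("threads", prefix="--threads")]
print(bind(inputs, {"bam": "in.bam", "threads": 2}))  # ['--threads', '2', 'in.bam']
print(bind(inputs, {"bam": "in.bam"}))                # ['in.bam']
```

Under these assumptions, a prefixed input with no explicit position sorts ahead of an input declared with position=1, which is why `--threads` ends up before the bam.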
Now we can declare our two inputs:
- Positional bam input
- Threads configuration input with the prefix --threads
We’re going to give each input a name by which we can reference it. This allows us to specify a value from the command line, or connect the result of a previous step within a workflow.
SamtoolsFlagstat = CommandToolBuilder(
    # ... tool information
    inputs=[
        # 1. Positional bam input
        ToolInput(
            "bam",  # name of our input
            Bam,
            position=1,
            doc="Input bam to generate statistics for"
        ),
        # 2. `threads` input
        ToolInput(
            "threads",  # name of our input
            Int(optional=True),
            prefix="--threads",
            doc="(-@) Number of additional threads to use [0]"
        )
    ],
    # ... outputs
)
Outputs¶
Further reading: ToolOutput
We’ll use the ToolOutput class to collect and represent these outputs. A ToolOutput has a type, and if not using stdout we can provide a glob parameter.
janis.ToolOutput(
tag: str,
output_type: DataType,
glob: Union[janis_core.types.selectors.Selector, str, None] = None,
# more configuration options
presents_as: str = None,
secondaries_present_as: Dict[str, str] = None,
doc: Optional[str] = None
)
The only output of samtools flagstat is the statistics that are written to stdout. We give this the name "stats", and collect this with the Stdout data type. We can additionally tell Janis that the Stdout has type TextFile.
SamtoolsFlagstat = CommandToolBuilder(
    # ... tool information + inputs
    outputs=[
        ToolOutput("stats", Stdout(TextFile))
    ]
)
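Conceptually, collecting from stdout amounts to redirecting the program's standard output into a file. The sketch below shows that idea with the standard library only; `collect_stdout` is a hypothetical helper, not a Janis function, and a short Python one-liner stands in for samtools so the example runs without samtools installed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


def collect_stdout(argv: List[str], out_path: Path) -> Path:
    """Run argv and write everything it prints to stdout into out_path."""
    with open(out_path, "w") as handle:
        subprocess.run(argv, stdout=handle, check=True)
    return out_path


# Stand-in for `samtools flagstat in.bam`
fake_tool = [sys.executable, "-c", "print('0 + 0 secondary')"]
with tempfile.TemporaryDirectory() as tmp:
    stats = collect_stdout(fake_tool, Path(tmp) / "stats.txt")
    print(stats.read_text().strip())
```

In Janis you do not write this redirection yourself: declaring the output as Stdout(TextFile) is what tells the generated workflow to capture the stream.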
Tool definition¶
Putting this all together, you should have the following tool definition:
from janis_core import CommandToolBuilder, ToolInput, ToolOutput, Int, Stdout
from janis_unix.data_types import TextFile
from janis_bioinformatics.data_types import Bam

SamtoolsFlagstat = CommandToolBuilder(
    tool="samtoolsflagstat",
    base_command=["samtools", "flagstat"],
    container="quay.io/biocontainers/samtools:1.9--h8571acd_11",
    version="v1.9.0",
    inputs=[
        # 1. Positional bam input
        ToolInput("bam", Bam, position=1),
        # 2. `threads` input
        ToolInput("threads", Int(optional=True), prefix="--threads"),
    ],
    outputs=[ToolOutput("stats", Stdout(TextFile))],
)
Testing the tool¶
We can test the translation of this from the CLI:
If you have multiple command tools or workflows declared in the same file, you will need to provide the --name parameter with the name of your tool.
janis translate tools/samtoolsflagstat.py wdl # or cwl
In the following translation, we can see the WDL representation of our tool. In particular, the command block gives us an indication of how the command line might look:
task samtoolsflagstat {
  input {
    Int? runtime_cpu
    Int? runtime_memory
    File bam
    Int? threads
  }
  command <<<
    samtools flagstat \
      ~{"--threads " + threads} \
      ~{bam}
  >>>
  runtime {
    docker: "quay.io/biocontainers/samtools:1.9--h8571acd_11"
    cpu: if defined(runtime_cpu) then runtime_cpu else 1
    memory: if defined(runtime_memory) then "~{runtime_memory}G" else "4G"
    preemptible: 2
  }
  output {
    File stats = stdout()
  }
}
Running the workflow¶
A reminder that the sample data for this section requires you to have completed Tutorial 1.
We can call the janis run functionality, and use the output from tutorial1:
janis run -o tutorial2 tools/samtoolsflagstat.py --bam tutorial1/out.bam
OUTPUT:
WID: f9e89f
EngId: f9e89f
Name: samtoolsflagstatWf
Engine: cwltool
Task Dir: $HOME/janis-tutorials/tutorial2/
Exec Dir: $HOME/janis-tutorials/tutorial2/janis/execution/
Status: Completed
Duration: 4s
Start: 2019-11-14T04:51:59.744526+00:00
Finish: 2019-11-14T04:52:03.869735+00:00
Updated: Just now (2019-11-14T04:52:05+00:00)
Jobs:
[✓] samtoolsflagstat (N/A)
Outputs:
- stats: $HOME/janis-tutorials/tutorial2/stats.txt
Janis (and CWLTool) reported that the tool executed correctly. Let’s check the output file:
cat tutorial2/stats.txt
20061 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
337 + 0 supplementary
0 + 0 duplicates
19971 + 0 mapped (99.55% : N/A)
19724 + 0 paired in sequencing
9862 + 0 read1
9862 + 0 read2
18606 + 0 properly paired (94.33% : N/A)
19544 + 0 with itself and mate mapped
90 + 0 singletons (0.46% : N/A)
860 + 0 with mate mapped to a different chr
691 + 0 with mate mapped to a different chr (mapQ>=5)
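If you want to check these numbers programmatically, the output is regular enough to parse. The following is a minimal sketch, assuming the `passed + failed description` line layout shown above; `parse_flagstat` is a hypothetical helper and not part of Janis or samtools.

```python
import re
from typing import Dict, Tuple

# "<QC-passed> + <QC-failed> <description>", with a trailing "(... % ...)"
# summary dropped from the description when present.
LINE = re.compile(r"^(\d+) \+ (\d+) (.+?)(?: \([^()]*%[^()]*\))?$")


def parse_flagstat(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each description to its (QC-passed, QC-failed) counts."""
    counts = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            passed, failed, label = match.groups()
            counts[label] = (int(passed), int(failed))
    return counts


sample = """\
20061 + 0 in total (QC-passed reads + QC-failed reads)
19971 + 0 mapped (99.55% : N/A)
691 + 0 with mate mapped to a different chr (mapQ>=5)
"""
stats = parse_flagstat(sample)
print(stats["mapped"])  # (19971, 0)
```

Only percentage summaries are stripped from the label, so the two "with mate mapped to a different chr" lines keep distinct keys. To parse the real file, pass the contents of tutorial2/stats.txt to the function instead of the sample string.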