Tutorial 2 - Wrapping a new tool
This tutorial builds on the content and output from Tutorial 1.
Introduction
A CommandTool is the interface between Janis and a program to be executed. Simply put, a CommandTool has a name, a command, inputs, outputs and a container to run in. Inputs and arguments can have a prefix and/or a position, which Janis uses to construct the command line.
The Janis documentation for the CommandTool gives an introduction to the tool structure and a template for constructing your own tool. A tool wrapper must provide all of the information needed to configure and run the specified tool. This includes the base_command, the inputs (janis.ToolInput), the outputs (janis.ToolOutput), a container and its version.
Container
Further information: Containerising your tools
For portability, Janis requires that you specify an OCI-compliant container (e.g. Docker) for your tool. With some searching you will often find an existing container; otherwise, follow the guide on preparing your tools in containers to make sure your tool works across all environments.
Preparation
The sample data to test this tool is computed in Tutorial 1. You can follow this tutorial without it, but running the example requires the BAM file produced in the first tutorial.
Samtools flagstat
In this tutorial we're going to wrap the samtools flagstat tool. flagstat counts the number of alignments for each FLAG type within a BAM file.
Samtools project links
- Latest version: 1.9
- Project page: http://www.htslib.org/doc/samtools.html
- GitHub: samtools/samtools
- Docker containers: quay.io/biocontainers/samtools (automatically / community generated)
  - Latest tag: 1.9--h8571acd_11
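To see how the container tag is structured, the following sketch splits an image reference into repository and tag, then splits the tag on "--". It assumes BioContainers tags follow a `<tool version>--<build string>` pattern. That assumption is inferred from the tag listed here and is not a guarantee. The helper names are hypothetical, and nothing here is part of Janis.

```python
from typing import Optional, Tuple


def split_image_reference(ref: str) -> Tuple[str, Optional[str]]:
    """Split 'registry/repo:tag' into (repository, tag).

    The tag is the text after the last ':' that follows the last '/', so a
    registry port such as 'localhost:5000/samtools' is not mistaken for a tag.
    Digest references ('@sha256:...') are not handled in this sketch.
    """
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    if last_colon > last_slash:
        return ref[:last_colon], ref[last_colon + 1:]
    return ref, None


def split_biocontainers_tag(tag: str) -> Tuple[str, str]:
    """Assumes the '<tool version>--<build string>' convention (hypothetical helper)."""
    version, _, build = tag.partition("--")
    return version, build


repo, tag = split_image_reference("quay.io/biocontainers/samtools:1.9--h8571acd_11")
print(repo)                          # quay.io/biocontainers/samtools
print(split_biocontainers_tag(tag))  # ('1.9', 'h8571acd_11')
```

The tool version recovered this way ("1.9") is what the wrapper's version field describes.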
Command to build
We want to replicate the following command for Samtools Flagstat in Janis:
samtools flagstat [--threads n] <in.bam>
Hence, we can isolate the following information:
- Base command: "samtools", "flagstat"
- The positional <in.bam> input
- The configuration --threads input
Command tool template
The following template is the minimum amount of information required to wrap a tool. For more information, see the CommandToolBuilder documentation.
We've removed the optional fields tool_module, tool_provider, metadata, cpu and memory from the following template.
We're going to use the Bam and TextFile data types, so let's import them as well.
from janis_core import CommandToolBuilder, ToolInput, ToolOutput, Int, Stdout
from janis_unix.data_types import TextFile
from janis_bioinformatics.data_types import Bam

ToolName = CommandToolBuilder(
    tool="toolname",
    base_command=["base", "command"],
    inputs=[],  # List[ToolInput]
    outputs=[],  # List[ToolOutput]
    container="container/name:version",
    version="version",
)
Tool information
Let's start by creating a file containing this template inside a tools directory:
mkdir -p tools
vim tools/samtoolsflagstat.py
We can start by filling in the basic information:
- Rename the variable (ToolName) to SamtoolsFlagstat
- Fill in the parameters:
  - tool: a unique tool identifier, e.g. "SamtoolsFlagStat"
  - base_command: ["samtools", "flagstat"]
  - container: "quay.io/biocontainers/samtools:1.9--h8571acd_11"
  - version: "v1.9.0"
You'll have a tool definition like the following:
SamtoolsFlagstat = CommandToolBuilder(
    tool="samtoolsflagstat",
    base_command=["samtools", "flagstat"],
    container="quay.io/biocontainers/samtools:1.9--h8571acd_11",
    version="v1.9.0",
    inputs=[],  # List[ToolInput]
    outputs=[],  # List[ToolOutput]
)
Inputs
Further reading: ToolInput
We'll use the ToolInput class to represent these inputs. A ToolInput provides a mechanism for binding this input onto the command line (e.g. prefix, position, transformations). See the documentation for more ways to configure a ToolInput.
janis.ToolInput(
tag: str,
input_type: DataType,
position: Optional[int] = None,
prefix: Optional[str] = None,
# more configuration options
separate_value_from_prefix: bool = None,
prefix_applies_to_all_elements: bool = None,
presents_as: str = None,
secondaries_present_as: Dict[str, str] = None,
separator: str = None,
shell_quote: bool = None,
localise_file: bool = None,
default: Any = None,
doc: Optional[str] = None
)
NB: A ToolInput must have a position OR a prefix in order to be bound onto the command line. If a prefix is specified with no position, position=0 is automatically applied.
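To make these binding rules concrete, here is a minimal sketch in plain Python. It is not Janis's actual implementation. It assumes the following rules:

- A prefix-only input takes position=0.
- Inputs with neither a position nor a prefix are not bound.
- Unset optional inputs are omitted.
- Ties are broken by declaration order.

SketchInput and build_command are hypothetical names. Options such as separate_value_from_prefix are ignored.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SketchInput:
    """Hypothetical stand-in for a ToolInput, keeping only the binding fields."""
    tag: str
    position: Optional[int] = None
    prefix: Optional[str] = None


def build_command(base_command: List[str], inputs: List[SketchInput],
                  values: Dict[str, Any]) -> List[str]:
    bound = []
    for index, inp in enumerate(inputs):
        if inp.position is None and inp.prefix is None:
            continue  # neither position nor prefix: not bound to the command line
        value = values.get(inp.tag)
        if value is None:
            continue  # unset optional input: omitted entirely
        # A prefix with no position is treated as position=0.
        position = inp.position if inp.position is not None else 0
        tokens = [inp.prefix, str(value)] if inp.prefix else [str(value)]
        bound.append((position, index, tokens))
    # Sort by position, then by declaration order.
    bound.sort(key=lambda item: (item[0], item[1]))
    args = list(base_command)
    for _, _, tokens in bound:
        args.extend(tokens)
    return args


flagstat_inputs = [SketchInput("bam", position=1), SketchInput("threads", prefix="--threads")]
print(build_command(["samtools", "flagstat"], flagstat_inputs, {"bam": "out.bam", "threads": 2}))
# ['samtools', 'flagstat', '--threads', '2', 'out.bam']
print(build_command(["samtools", "flagstat"], flagstat_inputs, {"bam": "out.bam"}))
# ['samtools', 'flagstat', 'out.bam']
```

Because --threads has no explicit position, it lands at position 0 and so precedes the positional BAM.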
Now we can declare our two inputs:
- Positional bam input
- Threads configuration input with the prefix --threads
We're going to give each input a name by which we can reference it. This allows us to specify a value from the command line, or to connect the result of a previous step within a workflow.
SamtoolsFlagstat = CommandToolBuilder(
    # ... tool information
    inputs=[
        # 1. Positional bam input
        ToolInput(
            "bam",  # name of our input
            Bam,
            position=1,
            doc="Input bam to generate statistics for",
        ),
        # 2. `threads` input
        ToolInput(
            "threads",  # name of our input
            Int(optional=True),
            prefix="--threads",
            doc="(-@) Number of additional threads to use [0]",
        ),
    ],
    # ... outputs
)
Outputs
Further reading: ToolOutput
We'll use the ToolOutput class to collect and represent these outputs. A ToolOutput has a type and, if the output is not taken from stdout, we can provide a glob parameter.
janis.ToolOutput(
tag: str,
output_type: DataType,
glob: Union[janis_core.types.selectors.Selector, str, None] = None,
# more configuration options
presents_as: str = None,
secondaries_present_as: Dict[str, str] = None,
doc: Optional[str] = None
)
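When an output is not written to stdout, the glob parameter tells Janis which files to collect from the tool's working directory. The following standard-library sketch shows what glob-based collection amounts to. collect_outputs and the file names are hypothetical. In practice the matching is performed by the execution engine, not by code like this.

```python
import tempfile
from pathlib import Path
from typing import List


def collect_outputs(execution_dir: Path, pattern: str) -> List[Path]:
    """Return files in execution_dir matching a glob pattern, sorted by name."""
    return sorted(p for p in execution_dir.glob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    # Hypothetical files a tool might leave in its working directory.
    for name in ["stats.txt", "log.err", "summary.txt"]:
        (workdir / name).write_text("example\n")
    matched = [p.name for p in collect_outputs(workdir, "*.txt")]

print(matched)  # ['stats.txt', 'summary.txt']
```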
The only output of samtools flagstat is the statistics that are written to stdout. We give this output the name "stats" and collect it with the Stdout data type. We can additionally tell Janis that the Stdout has type TextFile.
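Conceptually, a Stdout output means the tool's standard output is redirected into a file, and that file is reported as the output. In the WDL translation this appears as stdout(). Here is a minimal sketch of that idea. It uses a Python one-liner as a stand-in for samtools flagstat, and run_capturing_stdout is a hypothetical helper, not part of Janis.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_capturing_stdout(command, output_path: Path) -> Path:
    """Run command and redirect its standard output into output_path."""
    with open(output_path, "w") as handle:
        subprocess.run(command, stdout=handle, check=True)
    return output_path


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for `samtools flagstat in.bam`: a one-liner that prints one line.
    command = [sys.executable, "-c", "print('20061 + 0 in total')"]
    stats = run_capturing_stdout(command, Path(tmp) / "stats.txt")
    content = stats.read_text()

print(content)  # 20061 + 0 in total
```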
SamtoolsFlagstat = CommandToolBuilder(
# ... tool information + inputs
outputs=[
ToolOutput("stats", Stdout(TextFile))
]
)
Tool definition
Putting this all together, you should have the following tool definition:
from janis_core import CommandToolBuilder, ToolInput, ToolOutput, Int, Stdout
from janis_unix.data_types import TextFile
from janis_bioinformatics.data_types import Bam
SamtoolsFlagstat = CommandToolBuilder(
tool="samtoolsflagstat",
base_command=["samtools", "flagstat"],
container="quay.io/biocontainers/samtools:1.9--h8571acd_11",
version="v1.9.0",
inputs=[
# 1. Positional bam input
ToolInput("bam", Bam, position=1),
# 2. `threads` inputs
ToolInput("threads", Int(optional=True), prefix="--threads"),
],
outputs=[ToolOutput("stats", Stdout(TextFile))],
)
Testing the tool
We can test the translation of this tool from the CLI:
janis translate tools/samtoolsflagstat.py wdl # or cwl
If you have multiple command tools or workflows declared in the same file, you will need to provide the --name parameter with the name of your tool.
In the following translation, we can see the WDL representation of our tool. In particular, the command block gives us an indication of how the command line might look:
task samtoolsflagstat {
input {
Int? runtime_cpu
Int? runtime_memory
File bam
Int? threads
}
command <<<
samtools flagstat \
~{"--threads " + threads} \
~{bam}
>>>
runtime {
docker: "quay.io/biocontainers/samtools:1.9--h8571acd_11"
cpu: if defined(runtime_cpu) then runtime_cpu else 1
memory: if defined(runtime_memory) then "~{runtime_memory}G" else "4G"
preemptible: 2
}
output {
File stats = stdout()
}
}
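The ~{"--threads " + threads} placeholder is how the optional prefix is expressed in WDL. If threads is undefined, the concatenation is also undefined and the placeholder renders as an empty string, so --threads disappears from the command. The Python sketch below mimics that behaviour for illustration only. The function names are hypothetical.

```python
from typing import Optional


def optional_concat(prefix: str, value: Optional[int]) -> str:
    """Emulate WDL's ~{"--threads " + threads} placeholder for an optional Int.

    If the value is undefined (None), the concatenation is undefined and the
    placeholder renders as an empty string; otherwise prefix and value are joined.
    """
    if value is None:
        return ""
    return prefix + str(value)


def render_command(bam: str, threads: Optional[int] = None) -> str:
    parts = ["samtools flagstat", optional_concat("--threads ", threads), bam]
    # Drop empty placeholders so no stray spaces remain.
    return " ".join(part for part in parts if part)


print(render_command("out.bam", threads=4))  # samtools flagstat --threads 4 out.bam
print(render_command("out.bam"))             # samtools flagstat out.bam
```

Note that a defined value of 0 still produces "--threads 0", matching WDL, where only an undefined value removes the flag.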
Running the workflow
A reminder that the sample data for this section requires you to have completed Tutorial 1.
We can use the janis run functionality with the output from Tutorial 1:
janis run -o tutorial2 tools/samtoolsflagstat.py --bam tutorial1/out.bam
OUTPUT:
WID: f9e89f
EngId: f9e89f
Name: samtoolsflagstatWf
Engine: cwltool
Task Dir: $HOME/janis-tutorials/tutorial2/
Exec Dir: $HOME/janis-tutorials/tutorial2/janis/execution/
Status: Completed
Duration: 4s
Start: 2019-11-14T04:51:59.744526+00:00
Finish: 2019-11-14T04:52:03.869735+00:00
Updated: Just now (2019-11-14T04:52:05+00:00)
Jobs:
[✓] samtoolsflagstat (N/A)
Outputs:
- stats: $HOME/janis-tutorials/tutorial2/stats.txt
Janis (and cwltool) reports that the tool executed correctly, so let's check the output file:
cat tutorial2/stats.txt
20061 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
337 + 0 supplementary
0 + 0 duplicates
19971 + 0 mapped (99.55% : N/A)
19724 + 0 paired in sequencing
9862 + 0 read1
9862 + 0 read2
18606 + 0 properly paired (94.33% : N/A)
19544 + 0 with itself and mate mapped
90 + 0 singletons (0.46% : N/A)
860 + 0 with mate mapped to a different chr
691 + 0 with mate mapped to a different chr (mapQ>=5)
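Each line of the flagstat report has the form `<QC-passed> + <QC-failed> <description>`. If you want to use these numbers downstream, a small parser is enough. This sketch assumes the samtools 1.9 layout in the output above. parse_flagstat is a hypothetical helper. A trailing percentage block such as "(99.55% : N/A)" is dropped. Other parentheticals are kept, so the "(mapQ>=5)" line stays distinct from the line before it.

```python
import re
from typing import Dict, Tuple

# "<passed> + <failed> <description>", optionally followed by a percentage
# block containing ':' such as "(99.55% : N/A)", which is not kept.
LINE_PATTERN = re.compile(r"^(\d+) \+ (\d+) (.+?)(?: \([^()]*:[^()]*\))?$")


def parse_flagstat(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each flagstat description to its (QC-passed, QC-failed) counts."""
    stats = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            passed, failed, description = match.groups()
            stats[description] = (int(passed), int(failed))
    return stats


sample = """19971 + 0 mapped (99.55% : N/A)
860 + 0 with mate mapped to a different chr
691 + 0 with mate mapped to a different chr (mapQ>=5)"""
print(parse_flagstat(sample)["mapped"])  # (19971, 0)
# For the real file: parse_flagstat(open("tutorial2/stats.txt").read())
```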