Tutorial 4: Simple Galaxy Tool -> Nextflow
Sections
- Introduction
- Running Janis Translate
- Manual Adjustments
- Running Samtools Flagstat as a Workflow
- Conclusion
Introduction
This section demonstrates translation of a basic samtools flagstat Galaxy Tool Wrapper to Nextflow using janis translate.
The Galaxy Tool Wrapper used in this section was created by contributors to the Galaxy Devteam repository of tools.
The underlying software used in the Galaxy Tool Wrapper - samtools_flagstat - displays summary information for an alignment file.
Software
Before continuing, ensure you have the following software installed:
IDE
Any IDE or CLI text editor (VIM, nano) is sufficient for this material.
We recommend Visual Studio Code (VS Code) as it is lightweight and has rich support for extensions to add functionality.
Obtaining Janis
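In this tutorial we will use a singularity container to run janis translate.
Containers bundle the software together with its dependencies, so nothing needs to be installed through a package manager and the tool behaves the same on any machine that can run the container.
Run the following command from your home directory to pull the janis image (later commands refer to it as ~/janis.sif):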
singularity pull janis.sif docker://pppjanistranslate/janis-translate:0.13.0
Check your image by running the following command:
singularity exec ~/janis.sif janis translate
If the image is working, you should see the janis translate helptext.
Downloading Source Files and Sample Data
For this tutorial we will fetch all necessary data from zenodo using wget.
This archive contains sample data and the finished translations as a reference.
Run the following commands to download & decompress the zenodo archive:
wget https://zenodo.org/record/8275567/files/tutorial4.tar.gz
tar -xvf tutorial4.tar.gz
After you have decompressed the tar archive, change into the new directory:
cd tutorial4
Inside this folder we have the following structure:
tutorial4
├── data
│   └── alignments.bam
└── final
    ├── nextflow.config
    └── samtools_flagstat.nf
We will translate the Galaxy “samtools flagstat” Tool Wrapper into nextflow using janis, then test our translation using the bam file in the data/ folder.
Running Janis Translate
To translate a tool / workflow, we use janis translate.
janis translate --from <src> --to <dest> <filepath>
The --from argument specifies the workflow language of the source file(s), and --to specifies the destination we want to translate to.
In our case, this will be --from galaxy --to nextflow.
The <filepath> argument is the source file we will translate.
Aside from local filepaths, janis translate can also access Galaxy Tool Wrappers using a tool ID.
We will use this method today as it is an easier way to access Tool Wrappers.
To get the samtools_flagstat Tool ID, navigate to the tool on any usegalaxy.org server.
The following is a link to the samtools flagstat tool in Galaxy Australia:
https://usegalaxy.org.au/root?tool_id=toolshed.g2.bx.psu.edu/repos/devteam/samtools_flagstat/samtools_flagstat/2.0.4
Once here, we will copy the Tool ID.
At time of writing, the current Tool ID for the samtools_flagstat tool wrapper is:
toolshed.g2.bx.psu.edu/repos/devteam/samtools_flagstat/samtools_flagstat/2.0.4
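The Tool ID is a slash-separated path. As a minimal sketch, assuming the six-field layout seen in this example (Tool Shed host, the literal "repos", repository owner, repository name, tool name, wrapper version), it can be split in the shell to confirm which wrapper version is about to be translated. The variable names are illustrative only and are not used by janis.

```shell
# Tool ID copied from the Galaxy tool page
TOOL_ID="toolshed.g2.bx.psu.edu/repos/devteam/samtools_flagstat/samtools_flagstat/2.0.4"

# Split on "/" into six fields (layout assumed from this example ID)
IFS=/ read -r host repos owner repo tool version <<EOF
$TOOL_ID
EOF

echo "host:    $host"
echo "owner:   $owner"
echo "tool:    $tool"
echo "version: $version"
```

If a newer wrapper version is listed on the Galaxy server, the last field of the copied ID will differ from the one shown in this tutorial.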
Now that we have the Tool ID, we can access & translate this Galaxy Tool Wrapper to a Nextflow process.
To translate the Galaxy Tool, run the following command:
singularity exec ~/janis.sif janis translate -o samtools_flagstat --from galaxy --to nextflow toolshed.g2.bx.psu.edu/repos/devteam/samtools_flagstat/samtools_flagstat/2.0.4
Once complete, you will see a folder called translated/ appear, and a nextflow process called samtools_flagstat.nf will be present inside.
For your own reference / interest, the actual Galaxy Tool Wrapper files are downloaded during translation and placed in translated/source/.
Manual Adjustments
The translated/samtools_flagstat.nf file should be similar to the following:
nextflow.enable.dsl=2
process SAMTOOLS_FLAGSTAT {
    container "quay.io/biocontainers/samtools:1.13--h8c37831_0"
    input:
    path input1
    val addthreads
    output:
    path "output1.txt", emit: output1
    script:
    """
    samtools flagstat \
    --output-fmt "txt" \
    -@ ${addthreads} \
    ${input1} \
    > output1.txt
    """
}
We can see that this nextflow process has two inputs, a single output, and calls samtools flagstat.
Before continuing, let’s check the samtools flagstat documentation. In the documentation, we see the following:
samtools flagstat in.sam|in.bam|in.cram
-@ INT
Set number of additional threads to use when reading the file.
--output-fmt/-O FORMAT
Set the output format. FORMAT can be set to `default', `json' or `tsv' to select the default, JSON or tab-separated values output format. If this option is not used, the default format will be selected.
By matching up the process input: section and the script: section, we can see that:
- path input1 will be the input sam | bam | cram
- val addthreads will be the threads argument passed to -@
- the --output-fmt option has been assigned the default value of "txt"
We can also see that a container image is available for this tool.
This translation is correct for the samtools flagstat Galaxy tool wrapper and needs no adjusting.
Note:
If you would like to expose the --output-fmt option as a process input, you can do the following:
- add a val format input to the process
- reference this input in the script, replacing the hardcoded "txt" value (e.g. --output-fmt ${format})
Running Samtools Flagstat as a Workflow
Setting up nextflow.config
To run this process, we will set up a nextflow.config file and add some lines to the top of our process definition to turn it into a workflow.
Create a new file called nextflow.config in the translated/ folder alongside samtools_flagstat.nf.
Copy and paste the following code into your nextflow.config file:
nextflow.enable.dsl = 2
singularity.enabled = true
singularity.cacheDir = "$HOME/.singularity/cache"
params {
    bam = "../data/alignments.bam"
    threads = 1
}
This tells nextflow how to run, and sets up input parameters we can use to supply values to the SAMTOOLS_FLAGSTAT process:
- The bam parameter is the input bam file we wish to analyse.
- The threads parameter is an integer, and controls how many additional compute threads to use during runtime.
From here, we can refer to these inputs as params.bam / params.threads in other files.
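The values in the params block act as defaults. Nextflow lets a double-dash command-line option override the parameter of the same name for a single run, so a different bam file or thread count can be tried without editing nextflow.config. The wrapper below is a hypothetical convenience function, not part of the tutorial files; it only assembles the nextflow run call.

```shell
# Hypothetical helper: run the workflow with per-run parameter overrides.
# --bam and --threads (double dash) set params.bam and params.threads;
# single-dash options such as -resume belong to nextflow itself.
run_flagstat() {
    bam="$1"
    threads="${2:-1}"   # default to 1 additional thread, as in nextflow.config
    nextflow run samtools_flagstat.nf --bam "$bam" --threads "$threads"
}

# Example call, from the translated/ folder once the workflow file is complete:
#   run_flagstat ../data/alignments.bam 2
```

Running plain nextflow run samtools_flagstat.nf with no extra options keeps the defaults from nextflow.config.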
Creating Workflow & Passing Data
Now that we have the nextflow.config file set up, we will add a few lines to samtools_flagstat.nf to turn it into a workflow.
Copy and paste the following lines at the top of samtools_flagstat.nf, replacing the nextflow.enable.dsl=2 line that is already there:
nextflow.enable.dsl=2
ch_bam = Channel.fromPath( params.bam )
workflow {
    SAMTOOLS_FLAGSTAT(
        ch_bam,           // input1
        params.threads    // addthreads
    )
    SAMTOOLS_FLAGSTAT.out.output1.view()
}
The ch_bam line creates a nextflow Channel for our bam input.
The params.bam global variable we set up previously is used to supply the path to our sample data.
The new workflow {} section declares the main workflow entry point.
When we run this file, nextflow will look for this section and run the workflow contained within.
In our case, the workflow only contains a single task, which runs the SAMTOOLS_FLAGSTAT process defined below the workflow section. We then supply input values to SAMTOOLS_FLAGSTAT using the ch_bam channel we created for input1, and params.threads for the addthreads input.
Adding publishDir directive
So that we can collect the output of SAMTOOLS_FLAGSTAT when it runs, we will add a publishDir directive to the process:
process SAMTOOLS_FLAGSTAT {
    container "quay.io/biocontainers/samtools:1.13--h8c37831_0"
    publishDir "./outputs"
    ...
}
Now that we have set up SAMTOOLS_FLAGSTAT as a workflow, we can run it and check the output.
Running Our Workflow
Ensure you are in the translated/ working directory, where nextflow.config and samtools_flagstat.nf reside.
cd translated
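Since params.bam is a relative path, the run depends on being in the right folder. As an optional sketch, the hypothetical check_inputs function below (not part of the tutorial files) reports any file the run relies on that cannot be found from the current directory.

```shell
# Hypothetical pre-flight check: report every expected file that is missing
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Paths are relative to translated/, matching nextflow.config
check_inputs nextflow.config samtools_flagstat.nf ../data/alignments.bam \
    && echo "all inputs present" \
    || echo "fix the paths above before running nextflow"
```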
To run the workflow using our sample data, we can now write the following command:
nextflow run samtools_flagstat.nf
Once completed, check the ./outputs folder inside translated/.
If everything went well, you should see a single file called output1.txt with the following contents:
200 + 0 in total (QC-passed reads + QC-failed reads)
200 + 0 primary
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
25 + 0 mapped (12.50% : N/A)
25 + 0 primary mapped (12.50% : N/A)
200 + 0 paired in sequencing
100 + 0 read1
100 + 0 read2
0 + 0 properly paired (0.00% : N/A)
0 + 0 with itself and mate mapped
25 + 0 singletons (12.50% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
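Because this output has one category per line in a "passed + failed label" layout, individual numbers can be pulled out with awk, for example to compare against an expected value in a script. The helper names below are hypothetical, and the field positions are assumed from the output shown above; other samtools versions or output formats may differ.

```shell
# Print the QC-passed count from the "in total" line
flagstat_total() {
    awk '$4 == "in" && $5 == "total" { print $1 }' "$1"
}

# Print the QC-passed count from the "mapped" line
# (the "primary mapped" line has "primary" in field 4, so it is skipped)
flagstat_mapped() {
    awk '$4 == "mapped" { print $1 }' "$1"
}

# Only report if the published output exists
if [ -f ./outputs/output1.txt ]; then
    echo "total reads:  $(flagstat_total ./outputs/output1.txt)"
    echo "mapped reads: $(flagstat_mapped ./outputs/output1.txt)"
fi
```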
Conclusion
In this tutorial we explored how to translate a simple Galaxy Tool to a Nextflow process.
If needed, you can check the final/ folder as a reference for the translated nextflow process and config.