Tutorial 1: Simple CWL Tool -> Nextflow
Introduction
This section demonstrates translation of a basic samtools flagstat tool from CWL to Nextflow using janis translate.
The CWL tool used in this section - samtools_flagstat.cwl - is taken from the McDonnell Genome Institute (MGI) analysis-workflows repository.
This resource stores publicly available analysis pipelines for genomics data.
It is a fantastic piece of research software, and the authors thank MGI for their contribution to open-source research software.
The underlying software run by this tool - Samtools Flagstat - displays summary information for an alignment file.
Software
Before continuing, ensure you have the following software installed:
IDE
Any IDE or CLI text editor (Vim, nano) is sufficient for this material.
We recommend Visual Studio Code (VS Code) as it is lightweight and has rich support for extensions to add functionality.
Obtaining Janis
In this tutorial we will use a singularity container to run janis translate.
Containers remove the need to install the software and its dependencies with a package manager, and let it run the same way on any machine with a container runtime.
Run the following command to pull the janis image into your home directory:
singularity pull ~/janis.sif docker://pppjanistranslate/janis-translate:0.13.0
Check your image by running the following command:
singularity exec ~/janis.sif janis translate
If the image is working, you should see the janis translate helptext.
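Before moving on, it can help to confirm that the command-line tools used in this tutorial are on your PATH. The function below is a minimal sketch: require_tools is a hypothetical helper name, and the tool list in the usage comment is an assumption based on the commands that appear in this tutorial.

```shell
#!/bin/sh
# Report which of the given commands are missing from PATH.
# Prints one "missing: <name>" line per absent command and
# returns non-zero if any of them were missing.
require_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Usage (tool list assumed from the commands used in this tutorial):
#   require_tools singularity nextflow wget tar
```

If the function prints nothing, every listed tool was found.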
Downloading Source Files and Sample Data
For this tutorial we will fetch all necessary data from Zenodo using wget.
This archive contains the source CWL file, sample data, and finished translations as a reference.
Run the following commands to download and decompress the Zenodo archive:
wget https://zenodo.org/record/8275567/files/tutorial1.tar.gz
tar -xvf tutorial1.tar.gz
After you have decompressed the tar archive, change into the new directory:
cd tutorial1
Inside this folder we have the following structure:
tutorial1
├── data
│   ├── 2895499223_sorted.bam
│   └── 2895499223_sorted.bam.bai
├── final
│   ├── nextflow.config
│   └── samtools_flagstat.nf
└── source
    └── samtools_flagstat.cwl
We will translate the source/samtools_flagstat.cwl tool into nextflow using janis, then test our translation using the indexed bam file in the data/ folder.
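To confirm the archive was extracted as expected, a quick check along these lines may help. It is only a sketch: check_layout is a hypothetical helper, and the file list is copied from the directory tree shown above.

```shell
#!/bin/sh
# Check that an extracted tutorial directory contains the files listed
# in the directory tree. Prints each missing path and returns non-zero
# if any file is absent.
check_layout() {
    root=$1
    status=0
    for rel in \
        data/2895499223_sorted.bam \
        data/2895499223_sorted.bam.bai \
        final/nextflow.config \
        final/samtools_flagstat.nf \
        source/samtools_flagstat.cwl
    do
        if [ ! -f "$root/$rel" ]; then
            echo "missing: $root/$rel"
            status=1
        fi
    done
    return "$status"
}

# Usage, from inside the tutorial1 directory:
#   check_layout .
```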
Running Janis Translate
To translate a tool / workflow, we use janis translate.
janis translate --from <src> --to <dest> <filepath>
The --from argument specifies the workflow language of the source file(s), and --to specifies the destination we want to translate to.
In our case, this will be --from cwl --to nextflow.
The <filepath> argument is the source file we will translate.
In this tutorial, the filepath will be source/samtools_flagstat.cwl.
To translate our cwl tool, run the following command:
singularity exec ~/janis.sif janis translate --from cwl --to nextflow source/samtools_flagstat.cwl
You will see a folder called translated/ appear, and a nextflow process called samtools_flagstat.nf will be present inside.
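One way to confirm that the translation produced what this tutorial expects is to check that the output file exists and defines the SAMTOOLS_FLAGSTAT process. The helper below is a rough sketch: has_process is a hypothetical name, and the pattern assumes the process name and its opening brace sit on the same line.

```shell
#!/bin/sh
# Confirm that a translated file exists and defines the named process.
# Arguments: <path to .nf file> <process name>
# Assumes "process <NAME> {" appears on a single line.
has_process() {
    nf_file=$1
    process_name=$2
    [ -f "$nf_file" ] || { echo "not found: $nf_file"; return 1; }
    grep -Eq "^[[:space:]]*process[[:space:]]+$process_name[[:space:]]*\{" "$nf_file" \
        || { echo "no process $process_name in $nf_file"; return 1; }
}

# Usage, from the tutorial1 directory after translating:
#   has_process translated/samtools_flagstat.nf SAMTOOLS_FLAGSTAT
```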
Manual Adjustments to Translated Tool
The translated/samtools_flagstat.nf file should be similar to the following:
nextflow.enable.dsl = 2

process SAMTOOLS_FLAGSTAT {

    container "quay.io/biocontainers/samtools:1.11--h6270b1f_0"

    input:
    path bam

    output:
    path "${bam[0]}.flagstat", emit: flagstats

    script:
    def bam = bam[0]
    """
    /usr/local/bin/samtools flagstat \
    ${bam} \
    > ${bam}.flagstat \
    """

}
We can see that this nextflow process has a single input, a single output, and calls samtools flagstat on the input bam.
We can also see that a container image is available for this tool. In the next section we will run this process using some sample data and the specified container.
This translation is correct for the samtools_flagstat.cwl file and needs no adjusting.
Have a look at the source CWL file to see how they match up.
Note:
def bam = bam[0] in the script block is used to handle datatypes with secondary files.
The bam input is an indexed bam type, so it requires a .bai file to also be present in the working directory alongside the .bam file.
For this reason the bam input is supplied as an Array with 2 files - the .bam and the .bai.
Here the def bam = bam[0] is used so that bam refers to the .bam file in that Array.
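The pairing described in this note can be illustrated outside of Nextflow. The sketch below prints the two paths in the order the process expects, .bam first and .bai second. indexed_bam_pair is a hypothetical helper, and it assumes the index is named <bam>.bai, as in the sample data.

```shell
#!/bin/sh
# Given a .bam path, print the bam and its index on separate lines,
# bam first and index second. Fails if either file is absent.
# Assumes the index is named <bam>.bai, as in the sample data.
indexed_bam_pair() {
    bam=$1
    bai="$bam.bai"
    for f in "$bam" "$bai"; do
        [ -f "$f" ] || { echo "not found: $f" >&2; return 1; }
    done
    printf '%s\n%s\n' "$bam" "$bai"
}

# Usage, from the tutorial1 directory:
#   indexed_bam_pair data/2895499223_sorted.bam
```

The first line of output corresponds to bam[0] in the process, which is why the script block can treat it as the alignment file.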
Running Translated Tool as a Workflow
Collecting Process Outputs
Let’s add a publishDir directive to our translated process so that we can capture the outputs of this process.
process SAMTOOLS_FLAGSTAT {

    container "quay.io/biocontainers/samtools:1.11--h6270b1f_0"
    publishDir "./outputs"    // <- add this line
    ....

}
Nextflow allows us to capture the outputs created by a process using the publishDir directive seen above.
Setting up nextflow.config
To run this process, we will set up a nextflow.config file and add some lines to the top of our process definition to turn it into a workflow.
Create a new file called nextflow.config in the translated/ folder alongside samtools_flagstat.nf.
Copy and paste the following code into your nextflow.config file:
nextflow.enable.dsl = 2
singularity.enabled = true
singularity.cacheDir = "$HOME/.singularity/cache"

params {
    bam = [
        '../data/2895499223_sorted.bam',
        '../data/2895499223_sorted.bam.bai',
    ]
}
This tells nextflow how to run, and sets up an input parameter for our indexed bam input.
The bam parameter is a list which provides paths to the .bam and .bai sample data we will use to test the nextflow translation. From here, we can refer to the indexed bam input as params.bam in other files.
NOTE
nextflow.enable.dsl = 2 ensures that we are using the dsl2 nextflow syntax, which is the current standard.
singularity.enabled = true tells nextflow to run processes using singularity. Our samtools_flagstat.nf has a directive of the form container "quay.io/biocontainers/samtools:1.11--h6270b1f_0", so nextflow will use the specified image when running this process.
singularity.cacheDir = "$HOME/.singularity/cache" tells nextflow where singularity images are stored.
Nextflow will handle the singularity image download and store it in the cache specified above.
Creating Workflow & Passing Data
Now that we have the nextflow.config file set up, we will add a few lines to samtools_flagstat.nf to turn it into a workflow.
Copy and paste the following lines at the top of samtools_flagstat.nf:
ch_bam = Channel.fromPath( params.bam ).toList()

workflow {
    SAMTOOLS_FLAGSTAT(ch_bam)
}
The first line creates a nextflow Channel for our bam input and ensures it is a list.
The Channel.toList() part collects our files into a list, as both the .bam and .bai files must be passed together.
The params.bam global variable we set up previously is used to supply the paths to our sample data.
The new workflow {} section declares the main workflow entry point.
When we run this file, nextflow will look for this section and run the workflow contained within.
In our case, the workflow only contains a single task, which runs the SAMTOOLS_FLAGSTAT process defined below the workflow section. The single SAMTOOLS_FLAGSTAT input is being passed data from the ch_bam channel we declared at the top of the file.
Running Our Workflow
Ensure you are in the translated/ working directory, where nextflow.config and samtools_flagstat.nf reside.
cd translated
To run the workflow using our sample data, we can now write the following command:
nextflow run samtools_flagstat.nf
Nextflow will automatically check if there is a nextflow.config file in the working directory, and if so will use that to configure itself. Our inputs are supplied in nextflow.config alongside the dsl2 & singularity config, so it should run without issue.
Once completed, we can check the outputs/ folder to view our results.
If everything went well, there should be a file called 2895499223_sorted.bam.flagstat with the following contents:
2495 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
1 + 0 supplementary
0 + 0 duplicates
2480 + 0 mapped (99.40% : N/A)
2494 + 0 paired in sequencing
1247 + 0 read1
1247 + 0 read2
2460 + 0 properly paired (98.64% : N/A)
2464 + 0 with itself and mate mapped
15 + 0 singletons (0.60% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
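Rather than comparing the report by eye, individual counts can be pulled out of the flagstat file and checked against the values above. The function below is a small sketch using awk. flagstat_count is a hypothetical helper, and it assumes each line has the form "<passed> + <failed> <description>" as in the output shown.

```shell
#!/bin/sh
# Print the QC-passed count (first column) of the first flagstat line
# whose description starts with the given label.
# Arguments: <flagstat file> <label, e.g. "mapped" or "in total">
flagstat_count() {
    awk -v label="$2" '
        {
            # Strip the leading "<passed> + <failed> " to get the description.
            desc = $0
            sub(/^[0-9]+ \+ [0-9]+ /, "", desc)
            # Match only when the description begins with the label.
            if (index(desc, label) == 1) { print $1; exit }
        }
    ' "$1"
}

# Usage, from the translated/ directory:
#   flagstat_count outputs/2895499223_sorted.bam.flagstat "mapped"
#   flagstat_count outputs/2895499223_sorted.bam.flagstat "in total"
```

For the sample data these two calls should report 2480 and 2495, matching the listing above.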
Conclusion
In this tutorial we explored how to translate a simple CWL tool to a Nextflow process.
If you ran into difficulty, you can check the tutorial1/final/ folder.
This folder contains the nextflow files we created in this tutorial as a reference.