Strelka (Somatic)¶
strelka_somatic
· 1 contributor · 2 versions
Usage: configureStrelkaSomaticWorkflow.py [options]
Version: 2.9.10
This script configures Strelka somatic small variant calling. You must specify an alignment file (BAM or CRAM) for each sample of a matched tumor-normal pair. Configuration will produce a workflow run script which can execute the workflow on a single node or through sge and resume any interrupted execution.
Quickstart¶
from janis_core import WorkflowBuilder
from janis_bioinformatics.tools.illumina.strelkasomatic.strelkasomatic import StrelkaSomatic_2_9_10

wf = WorkflowBuilder("myworkflow")

# Replace each None with a workflow input or the output of an upstream step.
wf.step(
    "strelka_somatic_step",
    StrelkaSomatic_2_9_10(
        normalBam=None,
        tumorBam=None,
        reference=None,
    )
)

# Step outputs are referenced through the workflow object (wf.<step id>.<output>).
wf.output("configPickle", source=wf.strelka_somatic_step.configPickle)
wf.output("script", source=wf.strelka_somatic_step.script)
wf.output("stats", source=wf.strelka_somatic_step.stats)
wf.output("indels", source=wf.strelka_somatic_step.indels)
wf.output("snvs", source=wf.strelka_somatic_step.snvs)
OR
- Install Janis
- Ensure Janis is configured to work with Docker or Singularity.
- Ensure all reference files are available:
Note
More information about these inputs is available below.
- Generate user input files for strelka_somatic:
# user inputs
janis inputs strelka_somatic > inputs.yaml
inputs.yaml
normalBam: normalBam.bam
reference: reference.fasta
tumorBam: tumorBam.bam
- Run strelka_somatic with:
janis run [...run options] \
--inputs inputs.yaml \
strelka_somatic
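The `inputs.yaml` file shown above is a flat mapping of input names to file paths, so it can also be generated from a script. The following is a minimal sketch using only the Python standard library; the helper name `render_inputs_yaml` is hypothetical (it is not part of Janis), and it assumes every value is a plain string that needs no YAML quoting.

```python
from __future__ import annotations


def render_inputs_yaml(inputs: dict[str, str]) -> str:
    """Render a flat mapping as `key: value` lines, sorted by key."""
    return "".join(f"{key}: {value}\n" for key, value in sorted(inputs.items()))


# Placeholder paths, matching the example inputs.yaml above.
example_inputs = {
    "normalBam": "normalBam.bam",
    "tumorBam": "tumorBam.bam",
    "reference": "reference.fasta",
}

yaml_text = render_inputs_yaml(example_inputs)
print(yaml_text, end="")
```

The printed text can be redirected to `inputs.yaml` and passed to `janis run --inputs inputs.yaml strelka_somatic`.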
Information¶
ID: | strelka_somatic |
---|---|
URL: | No URL to the documentation was provided |
Versions: | 2.9.10, 2.9.9 |
Container: | michaelfranklin/strelka:2.9.10 |
Authors: | Michael Franklin |
Citations: | None |
Created: | 2019-05-27 |
Updated: | 2019-10-14 |
Outputs¶
name | type | documentation |
---|---|---|
configPickle | File | |
script | File | |
stats | tsv | A tab-delimited report of various internal statistics from the variant calling process: runtime information accumulated for each genome segment (excluding auxiliary steps such as BAM indexing and VCF merging) and indel candidacy statistics. |
indels | Gzipped<VCF> | |
snvs | Gzipped<VCF> | |
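Each output in the table corresponds to a fixed path inside the Strelka run directory. As a rough sketch, the mapping below copies the relative paths from the WDL `output` section further down this page; the function name `expected_outputs` is hypothetical, and `generated` is the run directory the wrapper falls back to when `rundir` is not set.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Relative location of each output inside the run directory,
# copied from the WDL `output` section of this page.
OUTPUT_PATHS = {
    "configPickle": "runWorkflow.py.config.pickle",
    "script": "runWorkflow.py",
    "stats": "results/stats/runStats.tsv",
    "indels": "results/variants/somatic.indels.vcf.gz",
    "snvs": "results/variants/somatic.snvs.vcf.gz",
}


def expected_outputs(rundir: str = "generated") -> dict[str, str]:
    """Map each output name to its path under the given run directory."""
    return {name: str(PurePosixPath(rundir) / rel) for name, rel in OUTPUT_PATHS.items()}


paths = expected_outputs()
print(paths["snvs"])
```

The two gzipped VCF outputs additionally carry a `.tbi` index next to them, as the WDL and CWL definitions show.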
Additional configuration (inputs)¶
name | type | prefix | position | documentation |
---|---|---|---|---|
normalBam | IndexedBam | --normalBam= | 1 | Normal sample BAM or CRAM file. (no default) |
tumorBam | IndexedBam | --tumourBam= | 1 | (--tumorBam) Tumor sample BAM or CRAM file. [required] (no default) |
reference | FastaFai | --referenceFasta= | 1 | samtools-indexed reference fasta file [required] |
rundir | Optional<Filename> | --runDir= | 1 | Name of directory to be created where all workflow scripts and output will be written. Each analysis requires a separate directory. (default: StrelkaSomaticWorkflow) |
region | Optional<Array<String>> | --region | 1 | Limit the analysis to one or more genome region(s) for debugging purposes. If this argument is provided multiple times the union of all specified regions will be analyzed. All regions must be non-overlapping to get a meaningful result. Examples: '--region chr20' (whole chromosome), '--region chr2:100-2000 --region chr3:2500-3000' (two regions). If this option is specified (one or more times) together with the 'callRegions' BED file, then all region arguments will be intersected with the callRegions BED track. |
config | Optional<File> | --config= | 1 | Provide a configuration file to override defaults in global config file (/opt/strelka/bin/configureStrelkaSomaticWorkflow.py.ini) |
outputcallableregions | Optional<Boolean> | --outputCallableRegions | 1 | Output a bed file describing somatic callable regions of the genome |
indelCandidates | Optional<Array<Gzipped<VCF>>> | --indelCandidates= | 1 | Specify a VCF of candidate indel alleles. These alleles are always evaluated but only reported in the output when they are inferred to exist in the sample. The VCF must be tabix indexed. All indel alleles must be left-shifted/normalized, any unnormalized alleles will be ignored. This option may be specified more than once, multiple input VCFs will be merged. (default: None) |
forcedgt | Optional<Array<Gzipped<VCF>>> | --forcedGT= | 1 | Specify a VCF of candidate alleles. These alleles are always evaluated and reported even if they are unlikely to exist in the sample. The VCF must be tabix indexed. All indel alleles must be left-shifted/normalized, any unnormalized allele will trigger a runtime error. This option may be specified more than once, multiple input VCFs will be merged. Note that for any SNVs provided in the VCF, the SNV site will be reported (and for gVCF, excluded from block compression), but the specific SNV alleles are ignored. (default: None) |
targeted | Optional<Boolean> | --targeted | 1 | Set options for other targeted input: note in particular that this flag turns off high-depth filters |
exome | Optional<Boolean> | --exome | 1 | Set options for exome: note in particular that this flag turns off high-depth filters |
callRegions | Optional<Gzipped<bed>> | --callRegions= | 1 | Optionally provide a bgzip-compressed/tabix-indexed BED file containing the set of regions to call. No VCF output will be provided outside of these regions. The full genome will still be used to estimate statistics from the input (such as expected depth per chromosome). Only one BED file may be specified. (default: call the entire genome) |
noisevcf | Optional<Gzipped<VCF>> | --noiseVcf= | 1 | Noise vcf file (submit argument multiple times for more than one file) |
scansizemb | Optional<Integer> | --scanSizeMb= | 1 | Maximum sequence region size (in megabases) scanned by each task during genome variant calling. (default: 12) |
callmemmb | Optional<Integer> | --callMemMb= | 1 | Set variant calling task memory limit (in megabytes). It is not recommended to change the default in most cases, but this might be required for a sample of unusual depth. |
retaintempfiles | Optional<Boolean> | --retainTempFiles | 1 | Keep all temporary files (for workflow debugging) |
disableevs | Optional<Boolean> | --disableEVS | 1 | Disable empirical variant scoring (EVS). |
reportevsfeatures | Optional<Boolean> | --reportEVSFeatures | 1 | Report all empirical variant scoring features in VCF output. |
snvscoringmodelfile | Optional<File> | --snvScoringModelFile= | 1 | Provide a custom empirical scoring model file for SNVs (default: /opt/strelka/share/config/somaticSNVScoringModels.json) |
indelscoringmodelfile | Optional<File> | --indelScoringModelFile= | 1 | Provide a custom empirical scoring model file for indels (default: /opt/strelka/share/config/somaticIndelScoringModels.json) |
mode | Optional<String> | --mode | 3 | (-m MODE) select run mode (local|sge) |
queue | Optional<String> | --queue | 3 | (-q QUEUE) specify scheduler queue name |
memGb | Optional<String> | --memGb | 3 | (-g MEMGB) gigabytes of memory available to run workflow -- only meaningful in local mode, must be an integer (default: Estimate the total memory for this node for local mode, 'unlimited' for sge mode) |
quiet | Optional<Boolean> | --quiet | 3 | Don't write any log output to stderr (but still write to workspace/pyflow.data/logs/pyflow_log.txt) |
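The prefixes in the table determine how each position-1 input is rendered on the `configureStrelkaSomaticWorkflow.py` command line: prefixes ending in `=` are joined directly to their value, `--region` is repeated once per region, and Boolean inputs become bare flags. The sketch below illustrates this for a handful of inputs only. The function `build_configure_args` and its parameter names are hypothetical, and it returns an argument list rather than running anything.

```python
from __future__ import annotations


def build_configure_args(
    normal_bam: str,
    tumor_bam: str,
    reference: str,
    rundir: str = "generated",  # fallback used by the wrapper when rundir is unset
    regions: list[str] | None = None,
    exome: bool = False,
    call_regions: str | None = None,
) -> list[str]:
    """Assemble the configuration-stage arguments for a subset of the inputs."""
    args = [
        "configureStrelkaSomaticWorkflow.py",
        f"--normalBam={normal_bam}",
        f"--tumourBam={tumor_bam}",
        f"--referenceFasta={reference}",
        f"--runDir={rundir}",
    ]
    # --region is given once per region; Strelka analyzes their union.
    for region in regions or []:
        args += ["--region", region]
    # Boolean inputs are emitted as bare flags only when true.
    if exome:
        args.append("--exome")
    if call_regions is not None:
        args.append(f"--callRegions={call_regions}")
    return args


configure_args = build_configure_args(
    "normal.bam", "tumor.bam", "ref.fasta", regions=["chr20"], exome=True
)
print(" ".join(configure_args))
```

Returning a list rather than a single string keeps each argument separate, which avoids shell-quoting issues if the list is later passed to `subprocess.run`.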
Workflow Description Language¶
version development

task strelka_somatic {
  input {
    Int? runtime_cpu
    Int? runtime_memory
    Int? runtime_seconds
    Int? runtime_disks
    File normalBam
    File normalBam_bai
    File tumorBam
    File tumorBam_bai
    File reference
    File reference_fai
    String? rundir
    Array[String]? region
    File? config
    Boolean? outputcallableregions
    Array[File]? indelCandidates
    Array[File]? indelCandidates_tbi
    Array[File]? forcedgt
    Array[File]? forcedgt_tbi
    Boolean? targeted
    Boolean? exome
    File? callRegions
    File? callRegions_tbi
    File? noisevcf
    File? noisevcf_tbi
    Int? scansizemb
    Int? callmemmb
    Boolean? retaintempfiles
    Boolean? disableevs
    Boolean? reportevsfeatures
    File? snvscoringmodelfile
    File? indelscoringmodelfile
    String? mode
    String? queue
    String? memGb
    Boolean? quiet
  }
  command <<<
    set -e
     \
      'configureStrelkaSomaticWorkflow.py' \
      --normalBam='~{normalBam}' \
      --tumourBam='~{tumorBam}' \
      --referenceFasta='~{reference}' \
      --runDir='~{select_first([rundir, "generated"])}' \
      ~{if (defined(region) && length(select_first([region])) > 0) then "--region '" + sep("' --region '", select_first([region])) + "'" else ""} \
      ~{if defined(config) then ("--config='" + config + "'") else ""} \
      ~{if (defined(outputcallableregions) && select_first([outputcallableregions])) then "--outputCallableRegions" else ""} \
      ~{if (defined(indelCandidates) && length(select_first([indelCandidates])) > 0) then "--indelCandidates='" + sep("' --indelCandidates='", select_first([indelCandidates])) + "'" else ""} \
      ~{if (defined(forcedgt) && length(select_first([forcedgt])) > 0) then "--forcedGT='" + sep("' --forcedGT='", select_first([forcedgt])) + "'" else ""} \
      ~{if (defined(targeted) && select_first([targeted])) then "--targeted" else ""} \
      ~{if (defined(exome) && select_first([exome])) then "--exome" else ""} \
      ~{if defined(callRegions) then ("--callRegions='" + callRegions + "'") else ""} \
      ~{if defined(noisevcf) then ("--noiseVcf='" + noisevcf + "'") else ""} \
      ~{if defined(scansizemb) then ("--scanSizeMb=" + scansizemb) else ''} \
      ~{if defined(callmemmb) then ("--callMemMb=" + callmemmb) else ''} \
      ~{if select_first([retaintempfiles, false]) then "--retainTempFiles" else ""} \
      ~{if (defined(disableevs) && select_first([disableevs])) then "--disableEVS" else ""} \
      ~{if (defined(reportevsfeatures) && select_first([reportevsfeatures])) then "--reportEVSFeatures" else ""} \
      ~{if defined(snvscoringmodelfile) then ("--snvScoringModelFile='" + snvscoringmodelfile + "'") else ""} \
      ~{if defined(indelscoringmodelfile) then ("--indelScoringModelFile='" + indelscoringmodelfile + "'") else ""} \
      ;~{select_first([rundir, "generated"])}/runWorkflow.py \
      ~{if defined(select_first([mode, "local"])) then ("--mode " + select_first([mode, "local"])) else ''} \
      ~{if defined(queue) then ("--queue " + queue) else ''} \
      ~{if defined(memGb) then ("--memGb " + memGb) else ''} \
      ~{if (defined(quiet) && select_first([quiet])) then "--quiet" else ""} \
      --jobs ~{select_first([runtime_cpu, 4])}
  >>>
  runtime {
    cpu: select_first([runtime_cpu, 4, 1])
    disks: "local-disk ~{select_first([runtime_disks, 20])} SSD"
    docker: "michaelfranklin/strelka:2.9.10"
    duration: select_first([runtime_seconds, 86400])
    memory: "~{select_first([runtime_memory, 4, 4])}G"
    preemptible: 2
  }
  output {
    File configPickle = (select_first([rundir, "generated"]) + "/runWorkflow.py.config.pickle")
    File script = (select_first([rundir, "generated"]) + "/runWorkflow.py")
    File stats = (select_first([rundir, "generated"]) + "/results/stats/runStats.tsv")
    File indels = (select_first([rundir, "generated"]) + "/results/variants/somatic.indels.vcf.gz")
    File indels_tbi = (select_first([rundir, "generated"]) + "/results/variants/somatic.indels.vcf.gz") + ".tbi"
    File snvs = (select_first([rundir, "generated"]) + "/results/variants/somatic.snvs.vcf.gz")
    File snvs_tbi = (select_first([rundir, "generated"]) + "/results/variants/somatic.snvs.vcf.gz") + ".tbi"
  }
}
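The WDL command runs in two stages: the configuration script writes `runWorkflow.py` into the run directory, and that script is then executed with the position-3 options. As an illustrative sketch of the second stage only, the function below mirrors the defaults visible in the task (`generated` for the run directory, `local` for the mode, 4 for `--jobs`). The name `build_run_args` and its parameters are hypothetical.

```python
from __future__ import annotations


def build_run_args(
    rundir: str = "generated",
    mode: str = "local",
    jobs: int = 4,
    queue: str | None = None,
    mem_gb: int | None = None,
    quiet: bool = False,
) -> list[str]:
    """Assemble the arguments for the generated runWorkflow.py script."""
    args = [f"{rundir}/runWorkflow.py", "--mode", mode]
    # Optional values are only passed when they were provided.
    if queue is not None:
        args += ["--queue", queue]
    if mem_gb is not None:
        args += ["--memGb", str(mem_gb)]
    if quiet:
        args.append("--quiet")
    # --jobs falls back to 4 in the task when runtime_cpu is not provided.
    args += ["--jobs", str(jobs)]
    return args


run_args = build_run_args()
print(" ".join(run_args))
```

With no overrides this yields `generated/runWorkflow.py --mode local --jobs 4`, which matches what the task emits when none of the optional inputs are set.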
Common Workflow Language¶
#!/usr/bin/env cwl-runner
class: CommandLineTool
cwlVersion: v1.2
label: Strelka (Somatic)
doc: |-
  Usage: configureStrelkaSomaticWorkflow.py [options]
  Version: 2.9.10
  This script configures Strelka somatic small variant calling.
  You must specify an alignment file (BAM or CRAM) for each sample of a matched tumor-normal pair.
  Configuration will produce a workflow run script which can execute the workflow on a single node or through
  sge and resume any interrupted execution.

requirements:
- class: ShellCommandRequirement
- class: InlineJavascriptRequirement
- class: DockerRequirement
  dockerPull: michaelfranklin/strelka:2.9.10
inputs:
- id: normalBam
  label: normalBam
  doc: Normal sample BAM or CRAM file. (no default)
  type: File
  secondaryFiles:
  - pattern: .bai
  inputBinding:
    prefix: --normalBam=
    position: 1
    separate: false
- id: tumorBam
  label: tumorBam
  doc: (--tumorBam) Tumor sample BAM or CRAM file. [required] (no default)
  type: File
  secondaryFiles:
  - pattern: .bai
  inputBinding:
    prefix: --tumourBam=
    position: 1
    separate: false
- id: reference
  label: reference
  doc: ' samtools-indexed reference fasta file [required]'
  type: File
  secondaryFiles:
  - pattern: .fai
  inputBinding:
    prefix: --referenceFasta=
    position: 1
    separate: false
- id: rundir
  label: rundir
  doc: |-
    Name of directory to be created where all workflow scripts and output will be written. Each analysis requires a separate directory. (default: StrelkaSomaticWorkflow)
  type:
  - string
  - 'null'
  default: generated
  inputBinding:
    prefix: --runDir=
    position: 1
    separate: false
- id: region
  label: region
  doc: |-
    Limit the analysis to one or more genome region(s) for debugging purposes. If this argument is provided multiple times the union of all specified regions will be analyzed. All regions must be non-overlapping to get a meaningful result. Examples: '--region chr20' (whole chromosome), '--region chr2:100-2000 --region chr3:2500-3000' (two regions). If this option is specified (one or more times) together with the 'callRegions' BED file, then all region arguments will be intersected with the callRegions BED track.
  type:
  - type: array
    inputBinding:
      prefix: --region
    items: string
  - 'null'
  inputBinding:
    position: 1
- id: config
  label: config
  doc: |-
    provide a configuration file to override defaults in global config file (/opt/strelka/bin/configureStrelkaSomaticWorkflow.py.ini)
  type:
  - File
  - 'null'
  inputBinding:
    prefix: --config=
    position: 1
    separate: false
- id: outputcallableregions
  label: outputcallableregions
  doc: Output a bed file describing somatic callable regions of the genome
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: --outputCallableRegions
    position: 1
    separate: true
- id: indelCandidates
  label: indelCandidates
  doc: |-
    Specify a VCF of candidate indel alleles. These alleles are always evaluated but only reported in the output when they are inferred to exist in the sample. The VCF must be tabix indexed. All indel alleles must be left-shifted/normalized, any unnormalized alleles will be ignored. This option may be specified more than once, multiple input VCFs will be merged. (default: None)
  type:
  - type: array
    inputBinding:
      prefix: --indelCandidates=
      separate: false
    items: File
  - 'null'
  inputBinding:
    position: 1
- id: forcedgt
  label: forcedgt
  doc: |-
    Specify a VCF of candidate alleles. These alleles are always evaluated and reported even if they are unlikely to exist in the sample. The VCF must be tabix indexed. All indel alleles must be left-shifted/normalized, any unnormalized allele will trigger a runtime error. This option may be specified more than once, multiple input VCFs will be merged. Note that for any SNVs provided in the VCF, the SNV site will be reported (and for gVCF, excluded from block compression), but the specific SNV alleles are ignored. (default: None)
  type:
  - type: array
    inputBinding:
      prefix: --forcedGT=
      separate: false
    items: File
  - 'null'
  inputBinding:
    position: 1
- id: targeted
  label: targeted
  doc: |-
    Set options for other targeted input: note in particular that this flag turns off high-depth filters
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: --targeted
    position: 1
    separate: true
- id: exome
  label: exome
  doc: |-
    Set options for exome: note in particular that this flag turns off high-depth filters
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: --exome
    position: 1
    separate: true
- id: callRegions
  label: callRegions
  doc: |-
    Optionally provide a bgzip-compressed/tabix-indexed BED file containing the set of regions to call. No VCF output will be provided outside of these regions. The full genome will still be used to estimate statistics from the input (such as expected depth per chromosome). Only one BED file may be specified. (default: call the entire genome)
  type:
  - File
  - 'null'
  secondaryFiles:
  - pattern: .tbi
  inputBinding:
    prefix: --callRegions=
    position: 1
    separate: false
- id: noisevcf
  label: noisevcf
  doc: Noise vcf file (submit argument multiple times for more than one file)
  type:
  - File
  - 'null'
  secondaryFiles:
  - pattern: .tbi
  inputBinding:
    prefix: --noiseVcf=
    position: 1
    separate: false
- id: scansizemb
  label: scansizemb
  doc: |-
    Maximum sequence region size (in megabases) scanned by each task during genome variant calling. (default: 12)
  type:
  - int
  - 'null'
  inputBinding:
    prefix: --scanSizeMb=
    position: 1
    separate: false
- id: callmemmb
  label: callmemmb
  doc: |-
    Set variant calling task memory limit (in megabytes). It is not recommended to change the default in most cases, but this might be required for a sample of unusual depth.
  type:
  - int
  - 'null'
  inputBinding:
    prefix: --callMemMb=
    position: 1
    separate: false
- id: retaintempfiles
  label: retaintempfiles
  doc: Keep all temporary files (for workflow debugging)
  type: boolean
  default: false
  inputBinding:
    prefix: --retainTempFiles
    position: 1
    separate: true
- id: disableevs
  label: disableevs
  doc: Disable empirical variant scoring (EVS).
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: --disableEVS
    position: 1
    separate: true
- id: reportevsfeatures
  label: reportevsfeatures
  doc: ' Report all empirical variant scoring features in VCF output.'
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: --reportEVSFeatures
    position: 1
    separate: true
- id: snvscoringmodelfile
  label: snvscoringmodelfile
  doc: |2-
     Provide a custom empirical scoring model file for SNVs (default: /opt/strelka/share/config/somaticSNVScoringModels.json)
  type:
  - File
  - 'null'
  inputBinding:
    prefix: --snvScoringModelFile=
    position: 1
    separate: false
- id: indelscoringmodelfile
  label: indelscoringmodelfile
  doc: |2-
     Provide a custom empirical scoring model file for indels (default: /opt/strelka/share/config/somaticIndelScoringModels.json)
  type:
  - File
  - 'null'
  inputBinding:
    prefix: --indelScoringModelFile=
    position: 1
    separate: false
- id: mode
  label: mode
  doc: (-m MODE) select run mode (local|sge)
  type: string
  default: local
  inputBinding:
    prefix: --mode
    position: 3
    shellQuote: false
- id: queue
  label: queue
  doc: (-q QUEUE) specify scheduler queue name
  type:
  - string
  - 'null'
  inputBinding:
    prefix: --queue
    position: 3
    shellQuote: false
- id: memGb
  label: memGb
  doc: |2-
     (-g MEMGB) gigabytes of memory available to run workflow -- only meaningful in local mode, must be an integer (default: Estimate the total memory for this node for local mode, 'unlimited' for sge mode)
  type:
  - string
  - 'null'
  inputBinding:
    prefix: --memGb
    position: 3
    shellQuote: false
- id: quiet
  label: quiet
  doc: |-
    Don't write any log output to stderr (but still write to workspace/pyflow.data/logs/pyflow_log.txt)
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: --quiet
    position: 3
    shellQuote: false

outputs:
- id: configPickle
  label: configPickle
  type: File
  outputBinding:
    glob: $((inputs.rundir + "/runWorkflow.py.config.pickle"))
    outputEval: $((inputs.rundir.basename + "/runWorkflow.py.config.pickle"))
    loadContents: false
- id: script
  label: script
  type: File
  outputBinding:
    glob: $((inputs.rundir + "/runWorkflow.py"))
    outputEval: $((inputs.rundir.basename + "/runWorkflow.py"))
    loadContents: false
- id: stats
  label: stats
  doc: |-
    A tab-delimited report of various internal statistics from the variant calling process: Runtime information accumulated for each genome segment, excluding auxiliary steps such as BAM indexing and vcf merging. Indel candidacy statistics
  type: File
  outputBinding:
    glob: $((inputs.rundir + "/results/stats/runStats.tsv"))
    outputEval: $((inputs.rundir.basename + "/results/stats/runStats.tsv"))
    loadContents: false
- id: indels
  label: indels
  doc: ''
  type: File
  secondaryFiles:
  - pattern: .tbi
  outputBinding:
    glob: $((inputs.rundir + "/results/variants/somatic.indels.vcf.gz"))
    outputEval: $((inputs.rundir.basename + "/results/variants/somatic.indels.vcf.gz"))
    loadContents: false
- id: snvs
  label: snvs
  doc: ''
  type: File
  secondaryFiles:
  - pattern: .tbi
  outputBinding:
    glob: $((inputs.rundir + "/results/variants/somatic.snvs.vcf.gz"))
    outputEval: $((inputs.rundir.basename + "/results/variants/somatic.snvs.vcf.gz"))
    loadContents: false
stdout: _stdout
stderr: _stderr

arguments:
- position: 0
  valueFrom: configureStrelkaSomaticWorkflow.py
- position: 2
  valueFrom: $(";{rundir}/runWorkflow.py".replace(/\{rundir\}/g, inputs.rundir))
  shellQuote: false
- prefix: --jobs
  position: 3
  valueFrom: $([inputs.runtime_cpu, 4].filter(function (inner) { return inner != null })[0])
  shellQuote: false

hints:
- class: ToolTimeLimit
  timelimit: |-
    $([inputs.runtime_seconds, 86400].filter(function (inner) { return inner != null })[0])
id: strelka_somatic
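The JavaScript expressions used for `--jobs` and `timelimit` in the CWL above follow a "first non-null value" pattern: they filter a list of candidates and take the first survivor, so the literal at the end acts as the default. This is the same idea as `select_first` in the WDL version. A minimal Python sketch of that pattern follows; the function is written here for illustration and is not part of Janis or any CWL runner.

```python
from __future__ import annotations

from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def select_first(candidates: Sequence[Optional[T]]) -> T:
    """Return the first candidate that is not None."""
    for value in candidates:
        if value is not None:
            return value
    # Unlike the JavaScript idiom, which would yield undefined here,
    # this sketch fails loudly when every candidate is missing.
    raise ValueError("all candidates are None")


# runtime_cpu not provided: fall back to 4 jobs.
jobs = select_first([None, 4])
# runtime_seconds provided: it takes precedence over the 86400 default.
time_limit = select_first([3600, 86400])
print(jobs, time_limit)
```

Note that a value of `0` or an empty string still counts as "provided" in this sketch, since only `None` is skipped; this matches the `!= null` test in the CWL expressions.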