Vardict (Somatic)

vardict_somatic · 1 contributor · 5 versions

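from janis_core import WorkflowBuilder
from import VarDictSomatic_1_7_0  # module path is missing from the source page

wf = WorkflowBuilder("myworkflow")
# Replace each None with a workflow input or an upstream step output
wf.step("vardict_somatic_step", VarDictSomatic_1_7_0(tumorBam=None, normalBam=None, intervals=None, reference=None, tumorName=None, normalName=None))
wf.output("out", source=wf.vardict_somatic_step.out)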


  1. Install Janis
  2. Ensure Janis is configured to work with Docker or Singularity.
  3. Ensure all reference files are available:


More information about these inputs is available below.

  4. Generate user input files for vardict_somatic:
# user inputs
janis inputs vardict_somatic > inputs.yaml
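
This produces an inputs.yaml similar to the following; fill in the <value> placeholders and point the file inputs at your own data:
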
intervals: intervals.bed
normalBam: normalBam.bam
normalName: <value>
reference: reference.fasta
tumorBam: tumorBam.bam
tumorName: <value>
  5. Run vardict_somatic with:
janis run [options] \
    --inputs inputs.yaml \
    vardict_somatic

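Every <value> placeholder in inputs.yaml has to be replaced before the run can succeed. The following is a minimal pre-flight check, assuming the flat `key: value` layout shown above; the `missing_inputs` helper is hypothetical and not part of Janis, and it is not a general YAML parser.

```python
# Hypothetical pre-flight check for the inputs.yaml produced by `janis inputs`.
# Assumes a flat "key: value" layout; nested YAML is not handled.
REQUIRED = {"intervals", "normalBam", "normalName", "reference", "tumorBam", "tumorName"}


def missing_inputs(yaml_text: str) -> set[str]:
    """Return required keys that are absent or still hold the <value> placeholder."""
    values = {}
    for line in yaml_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments such as "# user inputs"
        if not line or ":" not in line:
            continue
        key, _, value = line.partition(":")
        values[key.strip()] = value.strip()
    return {k for k in REQUIRED if values.get(k, "<value>") in ("", "<value>")}


example = """# user inputs
intervals: intervals.bed
normalBam: normalBam.bam
normalName: <value>
reference: reference.fasta
tumorBam: tumorBam.bam
tumorName: <value>
"""
print(sorted(missing_inputs(example)))  # ['normalName', 'tumorName']
```

For anything beyond this flat layout, a real YAML parser (for example PyYAML) would be the safer choice.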

URL: No URL to the documentation was provided
Versions: 1.7.0, 1.6.0, 1.5.8, 1.5.7, 1.5.6
Authors: Michael Franklin


Outputs

name type documentation
out VCF  

Additional configuration (inputs)

name type prefix position documentation
tumorBam IndexedBam     The indexed tumor BAM file
normalBam IndexedBam     The indexed normal BAM file
intervals bed   2  
reference FastaFai -G 1 The reference fasta. Should be indexed (.fai). Defaults to: /ngs/reference_data/genomes/Hsapiens/hg19/seq/hg19.fa
tumorName String     The sample name to be used directly. Will overwrite -n option
normalName String     The normal sample name to use with the -b option
alleleFreqThreshold Optional<Float>     The threshold for allele frequency, default: 0.05 or 5%
outputFilename Optional<Filename> > 6  
indels3prime Optional<Boolean> -3 1 Indicate to move indels to 3-prime if alternative alignment can be achieved.
amplicon Optional<Float> -a 1 Indicate it’s amplicon-based calling. Reads that don’t map to the amplicon will be skipped. A read pair is considered to belong to the amplicon if its edges are less than int bp from the amplicon and the overlap fraction is at least float. Default: 10:0.95
minReads Optional<Integer> -B 1 The minimum # of reads to determine strand bias, default 2
chromNamesAreNumbers Optional<Boolean> -C 1 Indicate the chromosome names are just numbers, such as 1, 2, not chr1, chr2
chromColumn Optional<Integer> -c 1 The column for chromosome
debug Optional<Boolean> -D 1 Debug mode. Will print some error messages and append full genotype at the end.
splitDelimeter Optional<String> -d 1 The delimiter for split region_info, default to tab "\t"
geneEndCol Optional<Integer> -E 1 The column for region end, e.g. gene end
segEndCol Optional<Integer> -e 1 The column for segment ends in the region, e.g. exon ends
filter Optional<String> -F 1 The hexadecimal flag used to filter reads, as in samtools. Default: 0x500 (filter secondary alignments and duplicates). Use -F 0 to turn it off.
geneNameCol Optional<Integer> -g 1 The column for gene name, or segment annotation
printHeaderRow Optional<Boolean> -h 1 Print a header row describing columns
indelSize Optional<Integer> -I 1 The indel size. Default: 120bp
outputSplice Optional<Boolean> -i 1 Output splicing read counts
performLocalRealignment Optional<Integer> -k 1 Indicate whether to perform local realignment. Default: 1. Set to 0 to disable it. For Ion or PacBio, 0 is recommended.
minMatches Optional<Integer> -M 1 The minimum matches for a read to be considered. If, after soft-clipping, the matched bp is less than INT, then the read is discarded. It’s meant for PCR based targeted sequencing where there’s no insert and the matching is only the primers. Default: 0, or no filtering
maxMismatches Optional<Integer> -m 1 If set, reads with mismatches more than INT will be filtered and ignored. Gaps are not counted as mismatches. Valid only for bowtie2/TopHat or BWA aln followed by sampe. BWA mem is calculated as NM - Indels. Default: 8, or reads with more than 8 mismatches will not be used.
regexSampleName Optional<String> -n 1 The regular expression to extract sample name from BAM filenames. Default to: /([^\/\._]+?)_[^\/]*.bam/
mapq Optional<String> -O 1 The minimum mean mapping quality (MapQ) of the reads supporting a variant for it to be considered valid. Default: no filtering
qratio Optional<Float> -o 1 The Qratio of (good_quality_reads)/(bad_quality_reads+0.5). The quality is defined by -q option. Default: 1.5
readPosition Optional<Float> -P 1 The read position filter. If the mean position of the variant in the reads is less than the specified value, it’s considered a false positive. Default: 5
pileup Optional<Boolean> -p 1 Do pileup regardless of the frequency
minMappingQual Optional<Integer> -Q 1 If set, reads with mapping quality less than INT will be filtered and ignored
phredScore Optional<Integer> -q 1 The phred score for a base to be considered a good call. Default: 25 (for Illumina). For PGM, set it to ~15, as PGM tends to underestimate base quality.
region Optional<String> -R 1 The region of interest. In the format of chr:start-end. If end is omitted, then a single position. No BED is needed.
minVariantReads Optional<Integer> -r 1 The minimum # of variant reads, default 2
regStartCol Optional<Integer> -S 1 The column for region start, e.g. gene start
segStartCol Optional<Integer> -s 1 The column for segment starts in the region, e.g. exon starts
minReadsBeforeTrim Optional<Integer> -T 1 Trim bases after [INT] bases in the reads
removeDuplicateReads Optional<Boolean> -t 1 Indicate to remove duplicated reads. Only one pair with same start positions will be kept
threads Optional<Integer> -th 1 Number of threads.
freq Optional<Integer> -V 1 The lowest frequency in the normal sample allowed for a putative somatic mutation. Defaults to 0.05
vcfFormat Optional<Boolean> -v 1 VCF format output
vs Optional<String> -VS 1 [STRICT | LENIENT | SILENT] How strict to be when reading a SAM or BAM: STRICT - throw an exception if something looks wrong. LENIENT - Emit warnings but keep going if possible. SILENT - Like LENIENT, only don’t emit warning messages. Default: LENIENT
bp Optional<Integer> -X 1 Extension in bp to look for mismatches after an insertion or deletion. Default: 3 bp, i.e. only calls when they’re within 3 bp.
extensionNucleotide Optional<Integer> -x 1 The number of nucleotides to extend for each segment, default: 0
yy Optional<Boolean> -y 1 <No content>
downsamplingFraction Optional<Integer> -Z 1 The downsampling fraction, e.g. 0.7 means roughly 70% downsampling. Default: no downsampling. Use with caution: the downsampling is random and non-reproducible.
zeroBasedCoords Optional<Integer> -z 1 0/1 Indicate whether coordinates are zero-based, as IGV uses. Default: 1 for BED file or amplicon BED file. Use 0 to turn it off. When using the -R option, it’s set to 0
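The prefix column determines how each optional input reaches VarDict: a Boolean contributes its bare flag only when true, any other value contributes the flag followed by the value, and an unset input contributes nothing. The sketch below mirrors that rule for a handful of the inputs above; the `PREFIXES` subset and the `vardict_flags` helper are illustrative only and are not part of Janis.

```python
# Illustrative subset of the input-name -> VarDict prefix mapping from the table above.
PREFIXES = {
    "indels3prime": "-3",
    "minReads": "-B",
    "filter": "-F",
    "removeDuplicateReads": "-t",
    "minVariantReads": "-r",
    "vcfFormat": "-v",
}


def vardict_flags(options: dict) -> list[str]:
    """Render options as VarDict arguments, following the wrapper's rules."""
    args = []
    for name, prefix in PREFIXES.items():
        value = options.get(name)
        if value is None:
            continue  # unset optional input: nothing is emitted
        if isinstance(value, bool):
            if value:
                args.append(prefix)  # true Boolean: bare flag
            continue  # false Boolean: omitted entirely
        args.extend([prefix, str(value)])  # valued option: flag, then value


    return args


print(vardict_flags({"minReads": 3, "filter": "0x500", "vcfFormat": True, "indels3prime": False}))
# ['-B', '3', '-F', '0x500', '-v']
```

The Boolean check comes before the generic case because in Python `bool` is a subclass of `int`, so `True` would otherwise be rendered as `-v True`.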

Workflow Description Language

version development

task vardict_somatic {
  input {
    Int? runtime_cpu
    Int? runtime_memory
    Int? runtime_seconds
    Int? runtime_disks
    File tumorBam
    File tumorBam_bai
    File normalBam
    File normalBam_bai
    File intervals
    File reference
    File reference_fai
    String tumorName
    String normalName
    Float? alleleFreqThreshold
    String? outputFilename
    Boolean? indels3prime
    Float? amplicon
    Int? minReads
    Boolean? chromNamesAreNumbers
    Int? chromColumn
    Boolean? debug
    String? splitDelimeter
    Int? geneEndCol
    Int? segEndCol
    String? filter
    Int? geneNameCol
    Boolean? printHeaderRow
    Int? indelSize
    Boolean? outputSplice
    Int? performLocalRealignment
    Int? minMatches
    Int? maxMismatches
    String? regexSampleName
    String? mapq
    Float? qratio
    Float? readPosition
    Boolean? pileup
    Int? minMappingQual
    Int? phredScore
    String? region
    Int? minVariantReads
    Int? regStartCol
    Int? segStartCol
    Int? minReadsBeforeTrim
    Boolean? removeDuplicateReads
    Int? threads
    Int? freq
    Boolean? vcfFormat
    String? vs
    Int? bp
    Int? extensionNucleotide
    Boolean? yy
    Int? downsamplingFraction
    Int? zeroBasedCoords
  }
  command <<<
    set -e
    VarDict \
      -G ~{reference} \
      ~{if (defined(indels3prime) && select_first([indels3prime])) then "-3" else ""} \
      ~{if defined(amplicon) then ("-a " + amplicon) else ''} \
      ~{if defined(minReads) then ("-B " + minReads) else ''} \
      ~{if (defined(chromNamesAreNumbers) && select_first([chromNamesAreNumbers])) then "-C" else ""} \
      ~{if defined(chromColumn) then ("-c " + chromColumn) else ''} \
      ~{if (defined(debug) && select_first([debug])) then "-D" else ""} \
      ~{if defined(splitDelimeter) then ("-d " + splitDelimeter) else ''} \
      ~{if defined(geneEndCol) then ("-E " + geneEndCol) else ''} \
      ~{if defined(segEndCol) then ("-e " + segEndCol) else ''} \
      ~{if defined(filter) then ("-F " + filter) else ''} \
      ~{if defined(geneNameCol) then ("-g " + geneNameCol) else ''} \
      ~{if (defined(printHeaderRow) && select_first([printHeaderRow])) then "-h" else ""} \
      ~{if defined(indelSize) then ("-I " + indelSize) else ''} \
      ~{if (defined(outputSplice) && select_first([outputSplice])) then "-i" else ""} \
      ~{if defined(performLocalRealignment) then ("-k " + performLocalRealignment) else ''} \
      ~{if defined(minMatches) then ("-M " + minMatches) else ''} \
      ~{if defined(maxMismatches) then ("-m " + maxMismatches) else ''} \
      ~{if defined(regexSampleName) then ("-n " + regexSampleName) else ''} \
      ~{if defined(mapq) then ("-O " + mapq) else ''} \
      ~{if defined(qratio) then ("-o " + qratio) else ''} \
      ~{if defined(readPosition) then ("-P " + readPosition) else ''} \
      ~{if (defined(pileup) && select_first([pileup])) then "-p" else ""} \
      ~{if defined(minMappingQual) then ("-Q " + minMappingQual) else ''} \
      ~{if defined(phredScore) then ("-q " + phredScore) else ''} \
      ~{if defined(region) then ("-R " + region) else ''} \
      ~{if defined(minVariantReads) then ("-r " + minVariantReads) else ''} \
      ~{if defined(regStartCol) then ("-S " + regStartCol) else ''} \
      ~{if defined(segStartCol) then ("-s " + segStartCol) else ''} \
      ~{if defined(minReadsBeforeTrim) then ("-T " + minReadsBeforeTrim) else ''} \
      ~{if (defined(removeDuplicateReads) && select_first([removeDuplicateReads])) then "-t" else ""} \
      -th ~{select_first([threads, runtime_cpu, 1])} \
      ~{if defined(freq) then ("-V " + freq) else ''} \
      ~{if (defined(vcfFormat) && select_first([vcfFormat])) then "-v" else ""} \
      ~{if defined(vs) then ("-VS " + vs) else ''} \
      ~{if defined(bp) then ("-X " + bp) else ''} \
      ~{if defined(extensionNucleotide) then ("-x " + extensionNucleotide) else ''} \
      ~{if (defined(yy) && select_first([yy])) then "-y" else ""} \
      ~{if defined(downsamplingFraction) then ("-Z " + downsamplingFraction) else ''} \
      ~{if defined(zeroBasedCoords) then ("-z " + zeroBasedCoords) else ''} \
      -b '~{sep("|", [tumorBam, normalBam])}' \
      -N '~{tumorName}' \
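      ~{if defined(alleleFreqThreshold) then ("-f " + alleleFreqThreshold) else ''} \
      ~{intervals} \
      | testsomatic.R \
      | var2vcf_paired.pl \
      -N '~{sep("|", [tumorName, normalName])}' \
      ~{if defined(alleleFreqThreshold) then ("-f " + alleleFreqThreshold) else ''} \
      > ~{select_first([outputFilename, "generated.vardict.vcf"])}
  >>>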
  runtime {
    cpu: select_first([runtime_cpu, 4, 1])
    disks: "local-disk ~{select_first([runtime_disks, 20])} SSD"
    docker: "michaelfranklin/vardict:1.7.0"
    duration: select_first([runtime_seconds, 86400])
    memory: "~{select_first([runtime_memory, 8, 4])}G"
    preemptible: 2
  }
  output {
    File out = select_first([outputFilename, "generated.vardict.vcf"])
  }
}

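In the WDL command, the tumor and normal BAMs reach VarDict as a single pipe-separated `-b` argument, tumor first, built with `sep("|", ...)` and protected by single quotes so the shell does not treat `|` as a pipe. The sketch below reproduces the first-stage arguments in Python, with `shlex.quote` playing the role of those single quotes. The function name and file names are hypothetical; the sketch also adds `-f` only when a threshold is supplied, on the assumption that an empty `-f` would otherwise consume the next argument.

```python
import shlex


def vardict_stage_one(reference, tumor_bam, normal_bam, tumor_name, intervals, af=None):
    """Build the VarDict part of the somatic pipeline (before `| testsomatic.R`)."""
    args = ["VarDict", "-G", reference]
    args += ["-b", f"{tumor_bam}|{normal_bam}"]  # tumor first, then normal
    args += ["-N", tumor_name]
    if af is not None:
        args += ["-f", str(af)]  # pass -f only when a threshold was given
    args.append(intervals)  # the BED file is the final positional argument
    # shlex.quote leaves safe tokens alone and single-quotes anything containing "|"
    return " ".join(shlex.quote(a) for a in args)


print(vardict_stage_one("ref.fa", "tumor.bam", "normal.bam", "TUMOR", "regions.bed", 0.05))
# VarDict -G ref.fa -b 'tumor.bam|normal.bam' -N TUMOR -f 0.05 regions.bed
```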
Common Workflow Language

#!/usr/bin/env cwl-runner
class: CommandLineTool
cwlVersion: v1.2
label: Vardict (Somatic)
doc: ''
requirements:
- class: ShellCommandRequirement
- class: InlineJavascriptRequirement
- class: DockerRequirement
  dockerPull: michaelfranklin/vardict:1.7.0
inputs:
- id: tumorBam
  label: tumorBam
  doc: The indexed BAM file
  type: File
  secondaryFiles:
  - pattern: .bai
- id: normalBam
  label: normalBam
  doc: The indexed BAM file
  type: File
  secondaryFiles:
  - pattern: .bai
- id: intervals
  label: intervals
  type: File
  inputBinding:
    position: 2
    shellQuote: false
- id: reference
  label: reference
  doc: |-
    The reference fasta. Should be indexed (.fai). Defaults to: /ngs/reference_data/genomes/Hsapiens/hg19/seq/hg19.fa
  type: File
  secondaryFiles:
  - pattern: .fai
  inputBinding:
    prefix: -G
    position: 1
    shellQuote: false
- id: tumorName
  label: tumorName
  doc: The sample name to be used directly.  Will overwrite -n option
  type: string
- id: normalName
  label: normalName
  doc: The normal sample name to use with the -b option
  type: string
- id: alleleFreqThreshold
  label: alleleFreqThreshold
  doc: 'The threshold for allele frequency, default: 0.05 or 5%'
  type:
  - float
  - 'null'
- id: outputFilename
  label: outputFilename
  type:
  - string
  - 'null'
  default: generated.vardict.vcf
  inputBinding:
    prefix: '>'
    position: 6
    shellQuote: false
- id: indels3prime
  label: indels3prime
  doc: Indicate to move indels to 3-prime if alternative alignment can be achieved.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: '-3'
    position: 1
    shellQuote: false
- id: amplicon
  label: amplicon
  doc: |-
    Indicate it's amplicon based calling.  Reads that don't map to the amplicon will be skipped.  A read pair is considered belonging  to the amplicon if the edges are less than int bp to the amplicon, and overlap fraction is at least float.  Default: 10:0.95
  type:
  - float
  - 'null'
  inputBinding:
    prefix: -a
    position: 1
    shellQuote: false
- id: minReads
  label: minReads
  doc: 'The minimum # of reads to determine strand bias, default 2'
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -B
    position: 1
    shellQuote: false
- id: chromNamesAreNumbers
  label: chromNamesAreNumbers
  doc: Indicate the chromosome names are just numbers, such as 1, 2, not chr1, chr2
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -C
    position: 1
    shellQuote: false
- id: chromColumn
  label: chromColumn
  doc: The column for chromosome
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -c
    position: 1
    shellQuote: false
- id: debug
  label: debug
  doc: Debug mode. Will print some error messages and append full genotype at the end.
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -D
    position: 1
    shellQuote: false
- id: splitDelimeter
  label: splitDelimeter
  doc: "The delimiter for split region_info, default to tab \"\t\""
  type:
  - string
  - 'null'
  inputBinding:
    prefix: -d
    position: 1
    shellQuote: false
- id: geneEndCol
  label: geneEndCol
  doc: The column for region end, e.g. gene end
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -E
    position: 1
    shellQuote: false
- id: segEndCol
  label: segEndCol
  doc: The column for segment ends in the region, e.g. exon ends
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -e
    position: 1
    shellQuote: false
- id: filter
  label: filter
  doc: |-
    The hexadecimal flag used to filter reads, as in samtools. Default: 0x500 (filter secondary alignments and duplicates). Use -F 0 to turn it off.
  type:
  - string
  - 'null'
  inputBinding:
    prefix: -F
    position: 1
    shellQuote: false
- id: geneNameCol
  label: geneNameCol
  doc: The column for gene name, or segment annotation
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -g
    position: 1
    shellQuote: false
- id: printHeaderRow
  label: printHeaderRow
  doc: Print a header row describing columns
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -h
    position: 1
    shellQuote: false
- id: indelSize
  label: indelSize
  doc: 'The indel size.  Default: 120bp'
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -I
    position: 1
    shellQuote: false
- id: outputSplice
  label: outputSplice
  doc: Output splicing read counts
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -i
    position: 1
    shellQuote: false
- id: performLocalRealignment
  label: performLocalRealignment
  doc: |-
    Indicate whether to perform local realignment.  Default: 1.  Set to 0 to disable it. For Ion or PacBio, 0 is recommended.
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -k
    position: 1
    shellQuote: false
- id: minMatches
  label: minMatches
  doc: |-
    The minimum matches for a read to be considered. If, after soft-clipping, the matched bp is less than INT, then the read is discarded. It's meant for PCR based targeted sequencing where there's no insert and the matching is only the primers. Default: 0, or no filtering
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -M
    position: 1
    shellQuote: false
- id: maxMismatches
  label: maxMismatches
  doc: |-
    If set, reads with mismatches more than INT will be filtered and ignored. Gaps are not counted as mismatches. Valid only for bowtie2/TopHat or BWA aln followed by sampe. BWA mem is calculated as NM - Indels. Default: 8, or reads with more than 8 mismatches will not be used.
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -m
    position: 1
    shellQuote: false
- id: regexSampleName
  label: regexSampleName
  doc: |-
    The regular expression to extract sample name from BAM filenames. Default to: /([^\/\._]+?)_[^\/]*.bam/
  type:
  - string
  - 'null'
  inputBinding:
    prefix: -n
    position: 1
    shellQuote: false
- id: mapq
  label: mapq
  doc: |-
    The minimum mean mapping quality (MapQ) of the reads supporting a variant for it to be considered valid. Default: no filtering
  type:
  - string
  - 'null'
  inputBinding:
    prefix: -O
    position: 1
    shellQuote: false
- id: qratio
  label: qratio
  doc: |-
    The Qratio of (good_quality_reads)/(bad_quality_reads+0.5). The quality is defined by -q option.  Default: 1.5
  type:
  - float
  - 'null'
  inputBinding:
    prefix: -o
    position: 1
    shellQuote: false
- id: readPosition
  label: readPosition
  doc: |-
    The read position filter. If the mean position of the variant in the reads is less than the specified value, it's considered a false positive. Default: 5
  type:
  - float
  - 'null'
  inputBinding:
    prefix: -P
    position: 1
    shellQuote: false
- id: pileup
  label: pileup
  doc: Do pileup regardless of the frequency
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -p
    position: 1
    shellQuote: false
- id: minMappingQual
  label: minMappingQual
  doc: If set, reads with mapping quality less than INT will be filtered and ignored
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -Q
    position: 1
    shellQuote: false
- id: phredScore
  label: phredScore
  doc: |-
    The phred score for a base to be considered a good call. Default: 25 (for Illumina). For PGM, set it to ~15, as PGM tends to underestimate base quality.
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -q
    position: 1
    shellQuote: false
- id: region
  label: region
  doc: |-
    The region of interest.  In the format of chr:start-end.  If end is omitted, then a single position.  No BED is needed.
  type:
  - string
  - 'null'
  inputBinding:
    prefix: -R
    position: 1
    shellQuote: false
- id: minVariantReads
  label: minVariantReads
  doc: 'The minimum # of variant reads, default 2'
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -r
    position: 1
    shellQuote: false
- id: regStartCol
  label: regStartCol
  doc: The column for region start, e.g. gene start
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -S
    position: 1
    shellQuote: false
- id: segStartCol
  label: segStartCol
  doc: The column for segment starts in the region, e.g. exon starts
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -s
    position: 1
    shellQuote: false
- id: minReadsBeforeTrim
  label: minReadsBeforeTrim
  doc: Trim bases after [INT] bases in the reads
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -T
    position: 1
    shellQuote: false
- id: removeDuplicateReads
  label: removeDuplicateReads
  doc: |-
    Indicate to remove duplicated reads.  Only one pair with same start positions will be kept
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -t
    position: 1
    shellQuote: false
- id: threads
  label: threads
  doc: Threads count.
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -th
    position: 1
    valueFrom: |-
      $([inputs.runtime_cpu, 4, 1].filter(function (inner) { return inner != null })[0])
    shellQuote: false
- id: freq
  label: freq
  doc: |-
    The lowest frequency in the normal sample allowed for a putative somatic mutation. Defaults to 0.05
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -V
    position: 1
    shellQuote: false
- id: vcfFormat
  label: vcfFormat
  doc: VCF format output
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -v
    position: 1
    shellQuote: false
- id: vs
  label: vs
  doc: |-
    [STRICT | LENIENT | SILENT] How strict to be when reading a SAM or BAM: STRICT   - throw an exception if something looks wrong. LENIENT  - Emit warnings but keep going if possible. SILENT      - Like LENIENT, only don't emit warning messages. Default: LENIENT
  type:
  - string
  - 'null'
  inputBinding:
    prefix: -VS
    position: 1
    shellQuote: false
- id: bp
  label: bp
  doc: |-
    Extension in bp to look for mismatches after an insertion or deletion. Default: 3 bp, i.e. only calls when they're within 3 bp.
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -X
    position: 1
    shellQuote: false
- id: extensionNucleotide
  label: extensionNucleotide
  doc: 'The number of nucleotides to extend for each segment, default: 0'
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -x
    position: 1
    shellQuote: false
- id: yy
  label: yy
  doc: <No content>
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: -y
    position: 1
    shellQuote: false
- id: downsamplingFraction
  label: downsamplingFraction
  doc: |-
    The downsampling fraction, e.g. 0.7 means roughly 70% downsampling. Default: no downsampling. Use with caution: the downsampling is random and non-reproducible.
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -Z
    position: 1
    shellQuote: false
- id: zeroBasedCoords
  label: zeroBasedCoords
  doc: |-
    0/1  Indicate whether coordinates are zero-based, as IGV uses.  Default: 1 for BED file or amplicon BED file. Use 0 to turn it off. When using the -R option, it's set to 0
  type:
  - int
  - 'null'
  inputBinding:
    prefix: -z
    position: 1
    shellQuote: false
outputs:
- id: out
  label: out
  type: File
  outputBinding:
    glob: $(inputs.outputFilename)
    loadContents: false
stdout: _stdout
stderr: _stderr

baseCommand: VarDict
arguments:
- position: 3
  valueFrom: '| testsomatic.R |'
  shellQuote: false
- position: 4
  valueFrom: var2vcf_paired.pl
  shellQuote: false
- prefix: -b
  position: 1
  valueFrom: $([inputs.tumorBam, inputs.normalBam].join("|"))
  shellQuote: true
- prefix: -N
  position: 1
  valueFrom: $(inputs.tumorName)
  shellQuote: true
- prefix: -N
  position: 5
  valueFrom: $([inputs.tumorName, inputs.normalName].join("|"))
  shellQuote: true
- prefix: -f
  position: 5
  valueFrom: $(inputs.alleleFreqThreshold)
  shellQuote: false
- prefix: -f
  position: 1
  valueFrom: $(inputs.alleleFreqThreshold)
  shellQuote: false
hints:
- class: ToolTimeLimit
  timelimit: |-
    $([inputs.runtime_seconds, 86400].filter(function (inner) { return inner != null })[0])
id: vardict_somatic
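CWL builds the command line by sorting `arguments` and `inputBinding` entries by `position` (entries without one default to 0). That is why, in the tool above, the position-1 options come before the intervals file at position 2, and the `>` redirect at position 6 comes last. The following is a simplified sketch with a hypothetical, abbreviated binding list; the CWL specification also defines secondary sort keys for ties, which this stable sort does not reproduce.

```python
# Simplified model of CWL command-line assembly: (position, tokens) pairs are
# sorted by position only; Python's sort is stable, so ties keep list order.
bindings = [
    (1, ["-G", "ref.fa"]),
    (2, ["regions.bed"]),
    (1, ["-b", "'tumor.bam|normal.bam'"]),
    (6, [">", "generated.vardict.vcf"]),
]


def assemble(base_command, bindings):
    """Concatenate the base command with binding tokens ordered by position."""
    tokens = [base_command]
    for _, parts in sorted(bindings, key=lambda b: b[0]):
        tokens.extend(parts)
    return " ".join(tokens)


print(assemble("VarDict", bindings))
# VarDict -G ref.fa -b 'tumor.bam|normal.bam' regions.bed > generated.vardict.vcf
```

The redirect only works because its binding sets `shellQuote: false` and the tool declares `ShellCommandRequirement`, which makes the runner pass the assembled line through a shell.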