Manta¶

manta · 1 contributor · 2 versions

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs. Manta discovers, assembles and scores large-scale SVs, medium-sized indels and large insertions within a single efficient workflow. The method is designed for rapid analysis on standard compute hardware: NA12878 at 50x genomic coverage is analyzed in less than 20 minutes on a 20 core server, and most WGS tumor/normal analyses can be completed within 2 hours. Manta combines paired and split-read evidence during SV discovery and scoring to improve accuracy, but does not require split-reads or successful breakpoint assemblies to report a variant in cases where there is strong evidence otherwise.

It provides scoring models for germline variants in small sets of diploid samples and somatic variants in matched tumor/normal sample pairs. There is experimental support for analysis of unmatched tumor samples as well. Manta accepts input read mappings from BAM or CRAM files and reports all SV and indel inferences in VCF 4.1 format. See the user guide for a full description of capabilities and limitations.

Quickstart¶

from janis_bioinformatics.tools.illumina.manta.manta import Manta_1_5_0

wf = WorkflowBuilder("myworkflow")

wf.step(
    "manta_step",
    Manta_1_5_0(
        bam=None,
        reference=None,
    )
)
wf.output("python", source=manta_step.python)
wf.output("pickle", source=manta_step.pickle)
wf.output("candidateSV", source=manta_step.candidateSV)
wf.output("candidateSmallIndels", source=manta_step.candidateSmallIndels)
wf.output("diploidSV", source=manta_step.diploidSV)
wf.output("alignmentStatsSummary", source=manta_step.alignmentStatsSummary)
wf.output("svCandidateGenerationStats", source=manta_step.svCandidateGenerationStats)
wf.output("svLocusGraphStats", source=manta_step.svLocusGraphStats)
wf.output("somaticSVs", source=manta_step.somaticSVs)

OR

Install Janis
Ensure Janis is configured to work with Docker or Singularity.
Ensure all reference files are available:

Note

More information about these inputs are available below.

Generate user input files for manta:

# user inputs
janis inputs manta > inputs.yaml

inputs.yaml

bam: bam.bam
reference: reference.fasta

Run manta with:

janis run [...run options] \
    --inputs inputs.yaml \
    manta

Information¶

ID:	`manta`
URL:	https://github.com/Illumina/manta
Versions:	1.5.0, 1.4.0
Container:	michaelfranklin/manta:1.5.0
Authors:	Michael Franklin
Citations:	Chen, X. et al. (2016) Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics, 32, 1220-1222. doi:10.1093/bioinformatics/btv710
DOI:	doi:10.1093/bioinformatics/btv710
Created:	2019-02-12
Updated:	2019-02-19

Outputs¶

name	type	documentation
python	File
pickle	File
candidateSV	Gzipped<VCF>
candidateSmallIndels	Gzipped<VCF>
diploidSV	Gzipped<VCF>
alignmentStatsSummary	File
svCandidateGenerationStats	tsv
svLocusGraphStats	tsv
somaticSVs	Optional<Gzipped<VCF>>

Additional configuration (inputs)¶

name	type	prefix	position	documentation
bam	IndexedBam	–bam	1	FILE Normal sample BAM or CRAM file. May be specified more than once, multiple inputs will be treated as each BAM file representing a different sample. [optional] (no default)
reference	FastaFai	–referenceFasta	1	samtools-indexed reference fasta file [required]
config	Optional<File>	–config	1	provide a configuration file to override defaults in global config file (/opt/conda/share/manta-1.2.1-0/bin/configManta.py.ini)
runDir	Optional<Filename>	–runDir	1	Run script and run output will be written to this directory [required] (default: MantaWorkflow)
tumorBam	Optional<IndexedBam>	–tumorBam	1	Tumor sample BAM or CRAM file. Only up to one tumor bam file accepted. [optional=null]
exome	Optional<Boolean>	–exome	1	Set options for WES input: turn off depth filters
rna	Optional<Boolean>	–rna	1	Set options for RNA-Seq input. Must specify exactly one bam input file
unstrandedRNA	Optional<Boolean>	–unstrandedRNA	1	Set if RNA-Seq input is unstranded: Allows splice-junctions on either strand
outputContig	Optional<Boolean>	–outputContig	1	Output assembled contig sequences in VCF file
callRegions	Optional<Gzipped<bed>>	–callRegions	1	Optionally provide a bgzip-compressed/tabix-indexed BED file containing the set of regions to call. No VCF output will be provided outside of these regions. The full genome will still be used to estimate statistics from the input (such as expected depth per chromosome). Only one BED file may be specified. (default: call the entire genome)
mode	Optional<String>	–mode	3	(-m) select run mode (local\|sge)
quiet	Optional<Boolean>	–quiet	3	Don’t write any log output to stderr (but still write to workspace/pyflow.data/logs/pyflow_log.txt)
queue	Optional<String>	–queue	3	(-q) specify scheduler queue name
memgb	Optional<Integer>	–memGb	3	(-g) gigabytes of memory available to run workflow – only meaningful in local mode, must be an integer (default: Estimate the total memory for this node for local mode, ‘unlimited’ for sge mode)
maxTaskRuntime	Optional<String>	–maxTaskRuntime	3	(format: hh:mm:ss) Specify scheduler max runtime per task, argument is provided to the ‘h_rt’ resource limit if using SGE (no default)

Workflow Description Language¶

version development

task manta {
  input {
    Int? runtime_cpu
    Int? runtime_memory
    Int? runtime_seconds
    Int? runtime_disks
    File? config
    File bam
    File bam_bai
    String? runDir
    File reference
    File reference_fai
    File? tumorBam
    File? tumorBam_bai
    Boolean? exome
    Boolean? rna
    Boolean? unstrandedRNA
    Boolean? outputContig
    File? callRegions
    File? callRegions_tbi
    String? mode
    Boolean? quiet
    String? queue
    Int? memgb
    String? maxTaskRuntime
  }
  command <<<
    set -e
     \
      configManta.py \
      ~{if defined(config) then ("--config " + config) else ''} \
      --bam ~{bam} \
      --runDir ~{select_first([runDir, "generated"])} \
      --referenceFasta ~{reference} \
      ~{if defined(tumorBam) then ("--tumorBam " + tumorBam) else ''} \
      ~{if (defined(exome) && select_first([exome])) then "--exome" else ""} \
      ~{if (defined(rna) && select_first([rna])) then "--rna" else ""} \
      ~{if (defined(unstrandedRNA) && select_first([unstrandedRNA])) then "--unstrandedRNA" else ""} \
      ~{if (defined(outputContig) && select_first([outputContig])) then "--outputContig" else ""} \
      ~{if defined(callRegions) then ("--callRegions " + callRegions) else ''} \
      ;~{select_first([runDir, "generated"])}/runWorkflow.py \
      ~{if defined(select_first([mode, "local"])) then ("--mode " + select_first([mode, "local"])) else ''} \
      ~{if (defined(quiet) && select_first([quiet])) then "--quiet" else ""} \
      ~{if defined(queue) then ("--queue " + queue) else ''} \
      ~{if defined(memgb) then ("--memGb " + memgb) else ''} \
      ~{if defined(maxTaskRuntime) then ("--maxTaskRuntime " + maxTaskRuntime) else ''} \
      -j ~{select_first([runtime_cpu, 4])}
  >>>
  runtime {
    cpu: select_first([runtime_cpu, 4, 1])
    disks: "local-disk ~{select_first([runtime_disks, 20])} SSD"
    docker: "michaelfranklin/manta:1.5.0"
    duration: select_first([runtime_seconds, 86400])
    memory: "~{select_first([runtime_memory, 4, 4])}G"
    preemptible: 2
  }
  output {
    File python = (select_first([runDir, "generated"]) + "/runWorkflow.py")
    File pickle = (select_first([runDir, "generated"]) + "/runWorkflow.py.config.pickle")
    File candidateSV = (select_first([runDir, "generated"]) + "/results/variants/candidateSV.vcf.gz")
    File candidateSV_tbi = (select_first([runDir, "generated"]) + "/results/variants/candidateSV.vcf.gz") + ".tbi"
    File candidateSmallIndels = (select_first([runDir, "generated"]) + "/results/variants/candidateSmallIndels.vcf.gz")
    File candidateSmallIndels_tbi = (select_first([runDir, "generated"]) + "/results/variants/candidateSmallIndels.vcf.gz") + ".tbi"
    File diploidSV = (select_first([runDir, "generated"]) + "/results/variants/diploidSV.vcf.gz")
    File diploidSV_tbi = (select_first([runDir, "generated"]) + "/results/variants/diploidSV.vcf.gz") + ".tbi"
    File alignmentStatsSummary = (select_first([runDir, "generated"]) + "/results/stats/alignmentStatsSummary.txt")
    File svCandidateGenerationStats = (select_first([runDir, "generated"]) + "/results/stats/svCandidateGenerationStats.tsv")
    File svLocusGraphStats = (select_first([runDir, "generated"]) + "/results/stats/svLocusGraphStats.tsv")
    File? somaticSVs = (select_first([runDir, "generated"]) + "/results/variants/somaticSV.vcf.gz")
    File? somaticSVs_tbi = if defined((select_first([runDir, "generated"]) + "/results/variants/somaticSV.vcf.gz")) then ((select_first([runDir, "generated"]) + "/results/variants/somaticSV.vcf.gz") + ".tbi") else None
  }
}

Common Workflow Language¶

#!/usr/bin/env cwl-runner
class: CommandLineTool
cwlVersion: v1.2
label: Manta
doc: |-
  Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads.
  It is optimized for analysis of germline variation in small sets of individuals and somatic
  variation in tumor/normal sample pairs. Manta discovers, assembles and scores large-scale SVs,
  medium-sized indels and large insertions within a single efficient workflow. The method is
  designed for rapid analysis on standard compute hardware: NA12878 at 50x genomic coverage is
  analyzed in less than 20 minutes on a 20 core server, and most WGS tumor/normal analyses
  can be completed within 2 hours. Manta combines paired and split-read evidence during SV
  discovery and scoring to improve accuracy, but does not require split-reads or successful
  breakpoint assemblies to report a variant in cases where there is strong evidence otherwise.

  It provides scoring models for germline variants in small sets of diploid samples and somatic
  variants in matched tumor/normal sample pairs. There is experimental support for analysis of
  unmatched tumor samples as well. Manta accepts input read mappings from BAM or CRAM files and
  reports all SV and indel inferences in VCF 4.1 format. See the user guide for a full description
  of capabilities and limitations.

requirements:
- class: ShellCommandRequirement
- class: InlineJavascriptRequirement
- class: DockerRequirement
  dockerPull: michaelfranklin/manta:1.5.0

inputs:
- id: config
  label: config
  doc: |-
    provide a configuration file to override defaults in global config file (/opt/conda/share/manta-1.2.1-0/bin/configManta.py.ini)
  type:
  - File
  - 'null'
  inputBinding:
    prefix: --config
    position: 1
    shellQuote: false
- id: bam
  label: bam
  doc: |-
    FILE Normal sample BAM or CRAM file. May be specified more than once, multiple inputs will be treated as each BAM file representing a different sample. [optional] (no default)
  type: File
  secondaryFiles:
  - pattern: .bai
  inputBinding:
    prefix: --bam
    position: 1
    shellQuote: false
- id: runDir
  label: runDir
  doc: |-
    Run script and run output will be written to this directory [required] (default: MantaWorkflow)
  type:
  - string
  - 'null'
  default: generated
  inputBinding:
    prefix: --runDir
    position: 1
    shellQuote: false
- id: reference
  label: reference
  doc: samtools-indexed reference fasta file [required]
  type: File
  secondaryFiles:
  - pattern: .fai
  inputBinding:
    prefix: --referenceFasta
    position: 1
    shellQuote: false
- id: tumorBam
  label: tumorBam
  doc: |-
    Tumor sample BAM or CRAM file. Only up to one tumor bam file accepted. [optional=null]
  type:
  - File
  - 'null'
  secondaryFiles:
  - pattern: .bai
  inputBinding:
    prefix: --tumorBam
    position: 1
    shellQuote: false
- id: exome
  label: exome
  doc: 'Set options for WES input: turn off depth filters'
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: --exome
    position: 1
    shellQuote: false
- id: rna
  label: rna
  doc: Set options for RNA-Seq input. Must specify exactly one bam input file
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: --rna
    position: 1
    shellQuote: false
- id: unstrandedRNA
  label: unstrandedRNA
  doc: 'Set if RNA-Seq input is unstranded: Allows splice-junctions on either strand'
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: --unstrandedRNA
    position: 1
    shellQuote: false
- id: outputContig
  label: outputContig
  doc: Output assembled contig sequences in VCF file
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: --outputContig
    position: 1
    shellQuote: false
- id: callRegions
  label: callRegions
  doc: |-
    Optionally provide a bgzip-compressed/tabix-indexed BED file containing the set of regions to call. No VCF output will be provided outside of these regions. The full genome will still be used to estimate statistics from the input (such as expected depth per chromosome). Only one BED file may be specified. (default: call the entire genome)
  type:
  - File
  - 'null'
  secondaryFiles:
  - pattern: .tbi
  inputBinding:
    prefix: --callRegions
    position: 1
    shellQuote: false
- id: mode
  label: mode
  doc: (-m) select run mode (local|sge)
  type: string
  default: local
  inputBinding:
    prefix: --mode
    position: 3
    shellQuote: false
- id: quiet
  label: quiet
  doc: |-
    Don't write any log output to stderr (but still write to workspace/pyflow.data/logs/pyflow_log.txt)
  type:
  - boolean
  - 'null'
  inputBinding:
    prefix: --quiet
    position: 3
    shellQuote: false
- id: queue
  label: queue
  doc: (-q) specify scheduler queue name
  type:
  - string
  - 'null'
  inputBinding:
    prefix: --queue
    position: 3
    shellQuote: false
- id: memgb
  label: memgb
  doc: |-
    (-g) gigabytes of memory available to run workflow -- only meaningful in local mode, must be an integer (default: Estimate the total memory for this node for local  mode, 'unlimited' for sge mode)
  type:
  - int
  - 'null'
  inputBinding:
    prefix: --memGb
    position: 3
    shellQuote: false
- id: maxTaskRuntime
  label: maxTaskRuntime
  doc: |-
    (format: hh:mm:ss) Specify scheduler max runtime per task, argument is provided to the 'h_rt' resource limit if using SGE (no default)
  type:
  - string
  - 'null'
  inputBinding:
    prefix: --maxTaskRuntime
    position: 3
    shellQuote: false

outputs:
- id: python
  label: python
  type: File
  outputBinding:
    glob: $((inputs.runDir + "/runWorkflow.py"))
    outputEval: $((inputs.runDir.basename + "/runWorkflow.py"))
    loadContents: false
- id: pickle
  label: pickle
  type: File
  outputBinding:
    glob: $((inputs.runDir + "/runWorkflow.py.config.pickle"))
    outputEval: $((inputs.runDir.basename + "/runWorkflow.py.config.pickle"))
    loadContents: false
- id: candidateSV
  label: candidateSV
  type: File
  secondaryFiles:
  - pattern: .tbi
  outputBinding:
    glob: $((inputs.runDir + "/results/variants/candidateSV.vcf.gz"))
    outputEval: $((inputs.runDir.basename + "/results/variants/candidateSV.vcf.gz"))
    loadContents: false
- id: candidateSmallIndels
  label: candidateSmallIndels
  type: File
  secondaryFiles:
  - pattern: .tbi
  outputBinding:
    glob: $((inputs.runDir + "/results/variants/candidateSmallIndels.vcf.gz"))
    outputEval: $((inputs.runDir.basename + "/results/variants/candidateSmallIndels.vcf.gz"))
    loadContents: false
- id: diploidSV
  label: diploidSV
  type: File
  secondaryFiles:
  - pattern: .tbi
  outputBinding:
    glob: $((inputs.runDir + "/results/variants/diploidSV.vcf.gz"))
    outputEval: $((inputs.runDir.basename + "/results/variants/diploidSV.vcf.gz"))
    loadContents: false
- id: alignmentStatsSummary
  label: alignmentStatsSummary
  type: File
  outputBinding:
    glob: $((inputs.runDir + "/results/stats/alignmentStatsSummary.txt"))
    outputEval: $((inputs.runDir.basename + "/results/stats/alignmentStatsSummary.txt"))
    loadContents: false
- id: svCandidateGenerationStats
  label: svCandidateGenerationStats
  type: File
  outputBinding:
    glob: $((inputs.runDir + "/results/stats/svCandidateGenerationStats.tsv"))
    outputEval: $((inputs.runDir.basename + "/results/stats/svCandidateGenerationStats.tsv"))
    loadContents: false
- id: svLocusGraphStats
  label: svLocusGraphStats
  type: File
  outputBinding:
    glob: $((inputs.runDir + "/results/stats/svLocusGraphStats.tsv"))
    outputEval: $((inputs.runDir.basename + "/results/stats/svLocusGraphStats.tsv"))
    loadContents: false
- id: somaticSVs
  label: somaticSVs
  type:
  - File
  - 'null'
  secondaryFiles:
  - pattern: .tbi
  outputBinding:
    glob: $((inputs.runDir + "/results/variants/somaticSV.vcf.gz"))
    outputEval: $((inputs.runDir.basename + "/results/variants/somaticSV.vcf.gz"))
    loadContents: false
stdout: _stdout
stderr: _stderr
arguments:
- position: 0
  valueFrom: configManta.py
  shellQuote: false
- position: 2
  valueFrom: $(";{runDir}/runWorkflow.py".replace(/\{runDir\}/g, inputs.runDir))
  shellQuote: false
- prefix: -j
  position: 3
  valueFrom: $([inputs.runtime_cpu, 4].filter(function (inner) { return inner != null
    })[0])
  shellQuote: false

hints:
- class: ToolTimeLimit
  timelimit: |-
    $([inputs.runtime_seconds, 86400].filter(function (inner) { return inner != null })[0])
id: manta