GATK4: GetFilterMutectCalls

Gatk4FilterMutectCalls · 1 contributor · 6 versions

Filter variants in a Mutect2 VCF callset.

FilterMutectCalls applies filters to the raw output of Mutect2. Parameters are contained in M2FiltersArgumentCollection and described in https://github.com/broadinstitute/gatk/tree/master/docs/mutect/mutect.pdf. To filter based on sequence-context artifacts, specify the --orientation-bias-artifact-priors [artifact priors tar.gz file] argument one or more times. This input is generated by LearnReadOrientationModel.
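Because --orientation-bias-artifact-priors may be repeated, a direct GATK invocation passes the flag once per priors file. The Janis wrapper exposes this only as a single optional readOrientationModel file, so the following is a minimal sketch, assuming you call the gatk launcher yourself, of building that command line. The function name is hypothetical; only flags named on this page are used.

```python
from typing import List, Optional, Sequence


def filter_mutect_calls_args(
    vcf: str,
    reference: str,
    output: str,
    orientation_priors: Sequence[str] = (),
    stats: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `gatk FilterMutectCalls` (hypothetical helper)."""
    args = ["gatk", "FilterMutectCalls", "-V", vcf, "-R", reference, "-O", output]
    if stats is not None:
        # Stats file written by Mutect2 alongside the unfiltered VCF.
        args += ["--stats", stats]
    for priors in orientation_priors:
        # One flag per LearnReadOrientationModel tar.gz (e.g. one per tumor sample).
        args += ["--orientation-bias-artifact-priors", priors]
    return args
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where `gatk` is on the PATH.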

If given a --contamination-table file, e.g. the output of CalculateContamination, the tool additionally filters on contamination fractions. This argument may be specified with a table for one or more tumor samples. Alternatively, provide a numerical fraction to filter with the --contamination argument. FilterMutectCalls can also be given one or more --tumor-segmentation files, which are also output by CalculateContamination.
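A minimal sketch of assembling the CalculateContamination-derived arguments: one --contamination-table per tumor sample plus any --tumor-segmentation files. The function name is hypothetical. The numeric --contamination alternative is deliberately left out; check the GATK tool documentation for the exact argument name in your GATK version before adding it.

```python
from typing import List, Sequence


def contamination_args(
    contamination_tables: Sequence[str] = (),
    segmentation_tables: Sequence[str] = (),
) -> List[str]:
    """Return contamination-related FilterMutectCalls arguments (hypothetical helper)."""
    args: List[str] = []
    for table in contamination_tables:
        # Each tumor sample's CalculateContamination table gets its own flag.
        args += ["--contamination-table", table]
    for seg in segmentation_tables:
        # Segment tables are the second output of CalculateContamination.
        args += ["--tumor-segmentation", seg]
    return args
```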

Quickstart

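from janis_core import WorkflowBuilder
from janis_bioinformatics.tools.gatk4.filtermutectcalls.versions import Gatk4FilterMutectCalls_4_1_8

wf = WorkflowBuilder("myworkflow")

wf.step(
    "gatk4filtermutectcalls_step",
    Gatk4FilterMutectCalls_4_1_8(
        # Replace None with workflow inputs or upstream step outputs
        vcf=None,
        reference=None,
    )
)
# Steps are addressed as attributes of the workflow builder
wf.output("out", source=wf.gatk4filtermutectcalls_step.out)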

Alternatively, run the tool directly with the Janis CLI:

  1. Install Janis.
  2. Ensure Janis is configured to work with Docker or Singularity.
  3. Ensure all reference files are available.

Note

More information about these inputs is available below.

  4. Generate a user input file for Gatk4FilterMutectCalls:
# user inputs
janis inputs Gatk4FilterMutectCalls > inputs.yaml

inputs.yaml

reference: reference.fasta
vcf: vcf.vcf.gz

  5. Run Gatk4FilterMutectCalls with:
janis run [...run options] \
    --inputs inputs.yaml \
    Gatk4FilterMutectCalls
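Step 3 asks that all reference files be available. The CWL description of this tool lists the files that must sit next to each primary input as secondaryFiles patterns: .fai, .amb, .ann, .bwt, .pac, .sa and ^.dict for the reference, and .tbi for the VCF. A minimal pre-flight sketch (not part of Janis; function names are hypothetical) that applies the CWL caret rule, where each leading ^ removes one extension before the rest of the pattern is appended, and reports missing files:

```python
import os
from typing import Iterable, List

# Patterns copied from the CWL secondaryFiles of this tool.
REFERENCE_PATTERNS = [".fai", ".amb", ".ann", ".bwt", ".pac", ".sa", "^.dict"]
VCF_PATTERNS = [".tbi"]


def resolve_secondary(path: str, pattern: str) -> str:
    """Apply one CWL secondaryFiles pattern to a primary file path."""
    while pattern.startswith("^"):
        path = os.path.splitext(path)[0]  # drop the last extension, if any
        pattern = pattern[1:]
    return path + pattern


def missing_secondary_files(path: str, patterns: Iterable[str]) -> List[str]:
    """List the expected secondary files that do not exist on disk."""
    expected = [resolve_secondary(path, p) for p in patterns]
    return [f for f in expected if not os.path.exists(f)]
```

For example, `missing_secondary_files("reference.fasta", REFERENCE_PATTERNS)` returns an empty list only when reference.fasta.fai, the BWA index files and reference.dict are all present.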

Information

ID: Gatk4FilterMutectCalls
URL: https://software.broadinstitute.org/gatk/documentation/tooldocs/4.1.2.0/org_broadinstitute_hellbender_tools_walkers_mutect_Mutect2.php
Versions: 4.1.8.1, 4.1.7.0, 4.1.6.0, 4.1.4.0, 4.1.3.0, 4.1.2.0
Container: broadinstitute/gatk:4.1.8.1
Authors: Hollizeck Sebastian
Citations: TBD
Created: 2019-09-09
Updated: 2019-09-09

Outputs

name    type            documentation
out     Gzipped<VCF>    VCF containing the filtered calls, with a .tbi index alongside

Additional configuration (inputs)

name                  type                     prefix                              position  documentation
vcf                   Gzipped<VCF>             -V                                            VCF to be filtered
reference             FastaWithIndexes         -R                                            Reference sequence file
javaOptions           Optional<Array<String>>                                                Additional JVM options, appended to --java-options
compression_level     Optional<Integer>                                                      Compression level for all compressed files created (e.g. BAM and VCF), passed to the JVM as -Dsamjdk.compress_level. Default value: 2.
contaminationTable    Optional<File>           --contamination-table                         Tables containing contamination information.
segmentationFile      Optional<File>           --tumor-segmentation                          Tables containing tumor segments' minor allele fractions for germline hets emitted by CalculateContamination
statsFile             Optional<File>           --stats                                       The Mutect stats file output by Mutect2
readOrientationModel  Optional<File>           --orientation-bias-artifact-priors            One or more .tar.gz files containing tables of prior artifact probabilities for the read orientation filter model, one table per tumor sample
outputFilename        Optional<Filename>       -O                                  2         Output VCF name; defaults to the input VCF's basename with a .vcf.gz extension

Workflow Description Language

version development

task Gatk4FilterMutectCalls {
  input {
    Int? runtime_cpu
    Int? runtime_memory
    Int? runtime_seconds
    Int? runtime_disks
    Array[String]? javaOptions
    Int? compression_level
    File? contaminationTable
    File? segmentationFile
    File? statsFile
    File? readOrientationModel
    File vcf
    File vcf_tbi
    File reference
    File reference_fai
    File reference_amb
    File reference_ann
    File reference_bwt
    File reference_pac
    File reference_sa
    File reference_dict
    String? outputFilename
  }
  command <<<
    set -e
    gatk FilterMutectCalls \
      --java-options '-Xmx~{((select_first([runtime_memory, 16, 4]) * 3) / 4)}G ~{if (defined(compression_level)) then ("-Dsamjdk.compress_level=" + compression_level) else ""} ~{sep(" ", select_first([javaOptions, []]))}' \
      ~{if defined(contaminationTable) then ("--contamination-table '" + contaminationTable + "'") else ""} \
      ~{if defined(segmentationFile) then ("--tumor-segmentation '" + segmentationFile + "'") else ""} \
      ~{if defined(statsFile) then ("--stats '" + statsFile + "'") else ""} \
      ~{if defined(readOrientationModel) then ("--orientation-bias-artifact-priors '" + readOrientationModel + "'") else ""} \
      -V '~{vcf}' \
      -R '~{reference}' \
      -O '~{select_first([outputFilename, "~{basename(vcf, ".vcf.gz")}.vcf.gz"])}'
  >>>
  runtime {
    cpu: select_first([runtime_cpu, 1, 1])
    disks: "local-disk ~{select_first([runtime_disks, 20])} SSD"
    docker: "broadinstitute/gatk:4.1.8.1"
    duration: select_first([runtime_seconds, 86400])
    memory: "~{select_first([runtime_memory, 16, 4])}G"
    preemptible: 2
  }
  output {
    File out = select_first([outputFilename, "~{basename(vcf, ".vcf.gz")}.vcf.gz"])
    File out_tbi = select_first([outputFilename, "~{basename(vcf, ".vcf.gz")}.vcf.gz"]) + ".tbi"
  }
}
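The WDL task sizes the JVM heap at three quarters of the task memory (runtime_memory, defaulting to 16 GB) using integer division, then appends the optional compression flag and any javaOptions. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and unlike the WDL template it drops empty parts instead of leaving extra spaces.

```python
from typing import Optional, Sequence


def java_options(
    runtime_memory: Optional[int] = None,
    compression_level: Optional[int] = None,
    extra: Optional[Sequence[str]] = None,
) -> str:
    """Rebuild the --java-options value used by the WDL task (hypothetical helper)."""
    memory_gb = runtime_memory if runtime_memory is not None else 16
    heap_gb = (memory_gb * 3) // 4  # WDL Int / Int is integer division
    parts = [f"-Xmx{heap_gb}G"]
    if compression_level is not None:
        parts.append(f"-Dsamjdk.compress_level={compression_level}")
    parts.extend(extra or [])  # user-supplied javaOptions, in order
    return " ".join(parts)
```

With the defaults this gives -Xmx12G, leaving a quarter of the container memory outside the Java heap, presumably as headroom for non-heap JVM memory.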

Common Workflow Language

#!/usr/bin/env cwl-runner
class: CommandLineTool
cwlVersion: v1.2
label: 'GATK4: GetFilterMutectCalls'
doc: |-
  Filter variants in a Mutect2 VCF callset.

  FilterMutectCalls applies filters to the raw output of Mutect2. Parameters are contained in M2FiltersArgumentCollection and described in https://github.com/broadinstitute/gatk/tree/master/docs/mutect/mutect.pdf. To filter based on sequence context artifacts, specify the --orientation-bias-artifact-priors [artifact priors tar.gz file] argument one or more times. This input is generated by LearnReadOrientationModel.

  If given a --contamination-table file, e.g. results from CalculateContamination, the tool will additionally filter on contamination fractions. This argument may be specified with a table for one or more tumor samples. Alternatively, provide a numerical fraction to filter with the --contamination argument. FilterMutectCalls can also be given one or more --tumor-segmentation files, which are also output by CalculateContamination.

requirements:
- class: ShellCommandRequirement
- class: InlineJavascriptRequirement
- class: DockerRequirement
  dockerPull: broadinstitute/gatk:4.1.8.1

inputs:
- id: javaOptions
  label: javaOptions
  type:
  - type: array
    items: string
  - 'null'
- id: compression_level
  label: compression_level
  doc: |-
    Compression level for all compressed files created (e.g. BAM and VCF). Default value: 2.
  type:
  - int
  - 'null'
- id: contaminationTable
  label: contaminationTable
  doc: Tables containing contamination information.
  type:
  - File
  - 'null'
  inputBinding:
    prefix: --contamination-table
- id: segmentationFile
  label: segmentationFile
  doc: |-
    Tables containing tumor segments' minor allele fractions for germline hets emitted by CalculateContamination
  type:
  - File
  - 'null'
  inputBinding:
    prefix: --tumor-segmentation
- id: statsFile
  label: statsFile
  doc: The Mutect stats file output by Mutect2
  type:
  - File
  - 'null'
  inputBinding:
    prefix: --stats
- id: readOrientationModel
  label: readOrientationModel
  doc: |-
    One or more .tar.gz files containing tables of prior artifact probabilities for the read orientation filter model, one table per tumor sample
  type:
  - File
  - 'null'
  inputBinding:
    prefix: --orientation-bias-artifact-priors
- id: vcf
  label: vcf
  doc: vcf to be filtered
  type: File
  secondaryFiles:
  - pattern: .tbi
  inputBinding:
    prefix: -V
- id: reference
  label: reference
  doc: Reference sequence file
  type: File
  secondaryFiles:
  - pattern: .fai
  - pattern: .amb
  - pattern: .ann
  - pattern: .bwt
  - pattern: .pac
  - pattern: .sa
  - pattern: ^.dict
  inputBinding:
    prefix: -R
- id: outputFilename
  label: outputFilename
  type:
  - string
  - 'null'
  default: generated.vcf.gz
  inputBinding:
    prefix: -O
    position: 2
    valueFrom: $(inputs.vcf.basename.replace(/.vcf.gz$/, "")).vcf.gz

outputs:
- id: out
  label: out
  doc: vcf containing filtered calls
  type: File
  secondaryFiles:
  - pattern: .tbi
  outputBinding:
    glob: $(inputs.vcf.basename.replace(/.vcf.gz$/, "")).vcf.gz
    loadContents: false
stdout: _stdout
stderr: _stderr

baseCommand:
- gatk
- FilterMutectCalls
arguments:
- prefix: --java-options
  position: -1
  valueFrom: |-
    $("-Xmx{memory}G {compression} {otherargs}".replace(/\{memory\}/g, (([inputs.runtime_memory, 16, 4].filter(function (inner) { return inner != null })[0] * 3) / 4)).replace(/\{compression\}/g, (inputs.compression_level != null) ? ("-Dsamjdk.compress_level=" + inputs.compression_level) : "").replace(/\{otherargs\}/g, [inputs.javaOptions, []].filter(function (inner) { return inner != null })[0].join(" ")))

hints:
- class: ToolTimeLimit
  timelimit: |-
    $([inputs.runtime_seconds, 86400].filter(function (inner) { return inner != null })[0])
id: Gatk4FilterMutectCalls
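When outputFilename is not set, the WDL task derives the -O value from the input VCF (select_first plus basename(vcf, ".vcf.gz")) and expects a .tbi index next to the result; the CWL glob uses the same basename-derived name. A minimal Python sketch mirroring the WDL logic (function names are hypothetical, not part of Janis):

```python
import os
from typing import Optional


def output_vcf_name(vcf_path: str, output_filename: Optional[str] = None) -> str:
    """Resolve the -O value the way the WDL task does."""
    if output_filename is not None:
        return output_filename  # an explicit outputFilename wins
    base = os.path.basename(vcf_path)  # WDL basename() drops the directory
    if base.endswith(".vcf.gz"):
        base = base[: -len(".vcf.gz")]  # and strips the given suffix
    return base + ".vcf.gz"


def output_index_name(vcf_name: str) -> str:
    """Name of the tabix index expected next to the output VCF."""
    return vcf_name + ".tbi"
```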