GATK4: Gather VCFs¶
Gatk4GatherVcfs
· 1 contributor · 4 versions
GatherVcfs (Picard)
Gathers multiple VCF files from a scatter operation into a single VCF file. Input files must be supplied in genomic order and must not have events at overlapping positions.
Quickstart¶
from janis_bioinformatics.tools.gatk4.gathervcfs.versions import Gatk4GatherVcfs_4_1_4 wf = WorkflowBuilder("myworkflow") wf.step( "gatk4gathervcfs_step", Gatk4GatherVcfs_4_1_4( vcfs=None, ) ) wf.output("out", source=gatk4gathervcfs_step.out)
OR
- Install Janis
- Ensure Janis is configured to work with Docker or Singularity.
- Ensure all reference files are available:
Note
More information about these inputs are available below.
- Generate user input files for Gatk4GatherVcfs:
# user inputs
janis inputs Gatk4GatherVcfs > inputs.yaml
inputs.yaml
vcfs:
- vcfs_0.vcf
- vcfs_1.vcf
- Run Gatk4GatherVcfs with:
janis run [...run options] \
--inputs inputs.yaml \
Gatk4GatherVcfs
Information¶
ID: | Gatk4GatherVcfs |
---|---|
URL: | https://software.broadinstitute.org/gatk/documentation/tooldocs/4.0.12.0/picard_vcf_GatherVcfs.php |
Versions: | 4.1.4.0, 4.1.3.0, 4.1.2.0, 4.0.12.0 |
Container: | broadinstitute/gatk:4.1.4.0 |
Authors: | Michael Franklin |
Citations: | See https://software.broadinstitute.org/gatk/documentation/article?id=11027 for more information |
Created: | 2018-05-01 |
Updated: | 2019-05-01 |
Outputs¶
name | type | documentation |
---|---|---|
out | VCF |
Additional configuration (inputs)¶
name | type | prefix | position | documentation |
---|---|---|---|---|
vcfs | Array<VCF> | –INPUT | [default: []] (-I) Input VCF file(s). | |
javaOptions | Optional<Array<String>> | |||
compression_level | Optional<Integer> | Compression level for all compressed files created (e.g. BAM and VCF). Default value: 2. | ||
outputFilename | Optional<Filename> | –OUTPUT | [default: null] (-O) Output VCF file. | |
argumentsFile | Optional<Array<File>> | –arguments_file | [default: []] read one or more arguments files and add them to the command line | |
compressionLevel | Optional<Integer> | –COMPRESSION_LEVEL | [default: 5] Compression level for all compressed files created (e.g. BAM and VCF). | |
createIndex | Optional<Boolean> | –CREATE_INDEX | [default: TRUE] Whether to create a BAM index when writing a coordinate-sorted BAM file. | |
createMd5File | Optional<Boolean> | –CREATE_MD5_FILE | [default: FALSE] Whether to create an MD5 digest for any BAM or FASTQ files created. | |
ga4ghClientSecrets | Optional<File> | –GA4GH_CLIENT_SECRETS | [default: client_secrets.json] Google Genomics API client_secrets.json file path. | |
maxRecordsInRam | Optional<Integer> | –MAX_RECORDS_IN_RAM | [default: 500000] When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed. | |
quiet | Optional<Boolean> | –QUIET | [default: FALSE] Whether to suppress job-summary info on System.err. | |
referenceSequence | Optional<File> | –REFERENCE_SEQUENCE | [default: null] Reference sequence file. | |
tmpDir | Optional<String> | –TMP_DIR | [default: []] One or more directories with space available to be used by this program for temporary storage of working files | |
useJdkDeflater | Optional<Boolean> | –USE_JDK_DEFLATER | [default: FALSE] (-use_jdk_deflater) Use the JDK Deflater instead of the Intel Deflater for writing compressed output | |
useJdkInflater | Optional<Boolean> | –USE_JDK_INFLATER | [default: FALSE] (-use_jdk_inflater) Use the JDK Inflater instead of the Intel Inflater for reading compressed input | |
validationStringency | Optional<String> | –VALIDATION_STRINGENCY | [default: STRICT] Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. | |
verbosity | Optional<Boolean> | –VERBOSITY | [default: INFO] Control verbosity of logging. |
Workflow Description Language¶
version development
task Gatk4GatherVcfs {
input {
Int? runtime_cpu
Int? runtime_memory
Int? runtime_seconds
Int? runtime_disks
Array[String]? javaOptions
Int? compression_level
Array[File] vcfs
String? outputFilename
Array[File]? argumentsFile
Int? compressionLevel
Boolean? createIndex
Boolean? createMd5File
File? ga4ghClientSecrets
Int? maxRecordsInRam
Boolean? quiet
File? referenceSequence
String? tmpDir
Boolean? useJdkDeflater
Boolean? useJdkInflater
String? validationStringency
Boolean? verbosity
}
command <<<
set -e
gatk GatherVcfs \
--java-options '-Xmx~{((select_first([runtime_memory, 8, 4]) * 3) / 4)}G ~{if (defined(compression_level)) then ("-Dsamjdk.compress_level=" + compression_level) else ""} ~{sep(" ", select_first([javaOptions, []]))}' \
~{if length(vcfs) > 0 then "--INPUT '" + sep("' --INPUT '", vcfs) + "'" else ""} \
--OUTPUT '~{select_first([outputFilename, "generated.gathered.vcf"])}' \
~{if (defined(argumentsFile) && length(select_first([argumentsFile])) > 0) then "--arguments_file '" + sep("' '", select_first([argumentsFile])) + "'" else ""} \
~{if defined(compressionLevel) then ("--COMPRESSION_LEVEL " + compressionLevel) else ''} \
~{if (defined(createIndex) && select_first([createIndex])) then "--CREATE_INDEX" else ""} \
~{if (defined(createMd5File) && select_first([createMd5File])) then "--CREATE_MD5_FILE" else ""} \
~{if defined(ga4ghClientSecrets) then ("--GA4GH_CLIENT_SECRETS '" + ga4ghClientSecrets + "'") else ""} \
~{if defined(maxRecordsInRam) then ("--MAX_RECORDS_IN_RAM " + maxRecordsInRam) else ''} \
~{if (defined(quiet) && select_first([quiet])) then "--QUIET" else ""} \
~{if defined(referenceSequence) then ("--REFERENCE_SEQUENCE '" + referenceSequence + "'") else ""} \
~{if defined(select_first([tmpDir, "/tmp"])) then ("--TMP_DIR '" + select_first([tmpDir, "/tmp"]) + "'") else ""} \
~{if (defined(useJdkDeflater) && select_first([useJdkDeflater])) then "--USE_JDK_DEFLATER" else ""} \
~{if (defined(useJdkInflater) && select_first([useJdkInflater])) then "--USE_JDK_INFLATER" else ""} \
~{if defined(validationStringency) then ("--VALIDATION_STRINGENCY '" + validationStringency + "'") else ""} \
~{if (defined(verbosity) && select_first([verbosity])) then "--VERBOSITY" else ""}
>>>
runtime {
cpu: select_first([runtime_cpu, 1, 1])
disks: "local-disk ~{select_first([runtime_disks, 20])} SSD"
docker: "broadinstitute/gatk:4.1.4.0"
duration: select_first([runtime_seconds, 86400])
memory: "~{select_first([runtime_memory, 8, 4])}G"
preemptible: 2
}
output {
File out = select_first([outputFilename, "generated.gathered.vcf"])
}
}
Common Workflow Language¶
#!/usr/bin/env cwl-runner
class: CommandLineTool
cwlVersion: v1.2
label: 'GATK4: Gather VCFs'
doc: |-
GatherVcfs (Picard)
Gathers multiple VCF files from a scatter operation into a single VCF file.
Input files must be supplied in genomic order and must not have events at overlapping positions.
requirements:
- class: ShellCommandRequirement
- class: InlineJavascriptRequirement
- class: DockerRequirement
dockerPull: broadinstitute/gatk:4.1.4.0
inputs:
- id: javaOptions
label: javaOptions
type:
- type: array
items: string
- 'null'
- id: compression_level
label: compression_level
doc: |-
Compression level for all compressed files created (e.g. BAM and VCF). Default value: 2.
type:
- int
- 'null'
- id: vcfs
label: vcfs
doc: '[default: []] (-I) Input VCF file(s).'
type:
type: array
inputBinding:
prefix: --INPUT
items: File
inputBinding: {}
- id: outputFilename
label: outputFilename
doc: '[default: null] (-O) Output VCF file.'
type:
- string
- 'null'
default: generated.gathered.vcf
inputBinding:
prefix: --OUTPUT
- id: argumentsFile
label: argumentsFile
doc: '[default: []] read one or more arguments files and add them to the command
line'
type:
- type: array
items: File
- 'null'
inputBinding:
prefix: --arguments_file
- id: compressionLevel
label: compressionLevel
doc: |-
[default: 5] Compression level for all compressed files created (e.g. BAM and VCF).
type:
- int
- 'null'
inputBinding:
prefix: --COMPRESSION_LEVEL
- id: createIndex
label: createIndex
doc: |-
[default: TRUE] Whether to create a BAM index when writing a coordinate-sorted BAM file.
type:
- boolean
- 'null'
inputBinding:
prefix: --CREATE_INDEX
- id: createMd5File
label: createMd5File
doc: |-
[default: FALSE] Whether to create an MD5 digest for any BAM or FASTQ files created.
type:
- boolean
- 'null'
inputBinding:
prefix: --CREATE_MD5_FILE
- id: ga4ghClientSecrets
label: ga4ghClientSecrets
doc: |-
[default: client_secrets.json] Google Genomics API client_secrets.json file path.
type:
- File
- 'null'
inputBinding:
prefix: --GA4GH_CLIENT_SECRETS
- id: maxRecordsInRam
label: maxRecordsInRam
doc: |-
[default: 500000] When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed.
type:
- int
- 'null'
inputBinding:
prefix: --MAX_RECORDS_IN_RAM
- id: quiet
label: quiet
doc: '[default: FALSE] Whether to suppress job-summary info on System.err.'
type:
- boolean
- 'null'
inputBinding:
prefix: --QUIET
- id: referenceSequence
label: referenceSequence
doc: '[default: null] Reference sequence file.'
type:
- File
- 'null'
inputBinding:
prefix: --REFERENCE_SEQUENCE
- id: tmpDir
label: tmpDir
doc: |-
[default: []] One or more directories with space available to be used by this program for temporary storage of working files
type: string
default: /tmp
inputBinding:
prefix: --TMP_DIR
- id: useJdkDeflater
label: useJdkDeflater
doc: |-
[default: FALSE] (-use_jdk_deflater) Use the JDK Deflater instead of the Intel Deflater for writing compressed output
type:
- boolean
- 'null'
inputBinding:
prefix: --USE_JDK_DEFLATER
- id: useJdkInflater
label: useJdkInflater
doc: |-
[default: FALSE] (-use_jdk_inflater) Use the JDK Inflater instead of the Intel Inflater for reading compressed input
type:
- boolean
- 'null'
inputBinding:
prefix: --USE_JDK_INFLATER
- id: validationStringency
label: validationStringency
doc: |-
[default: STRICT] Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.
type:
- string
- 'null'
inputBinding:
prefix: --VALIDATION_STRINGENCY
- id: verbosity
label: verbosity
doc: '[default: INFO] Control verbosity of logging.'
type:
- boolean
- 'null'
inputBinding:
prefix: --VERBOSITY
outputs:
- id: out
label: out
type: File
outputBinding:
glob: generated.gathered.vcf
loadContents: false
stdout: _stdout
stderr: _stderr
baseCommand:
- gatk
- GatherVcfs
arguments:
- prefix: --java-options
position: -1
valueFrom: |-
$("-Xmx{memory}G {compression} {otherargs}".replace(/\{memory\}/g, (([inputs.runtime_memory, 8, 4].filter(function (inner) { return inner != null })[0] * 3) / 4)).replace(/\{compression\}/g, (inputs.compression_level != null) ? ("-Dsamjdk.compress_level=" + inputs.compression_level) : "").replace(/\{otherargs\}/g, [inputs.javaOptions, []].filter(function (inner) { return inner != null })[0].join(" ")))
hints:
- class: ToolTimeLimit
timelimit: |-
$([inputs.runtime_seconds, 86400].filter(function (inner) { return inner != null })[0])
id: Gatk4GatherVcfs