Hap.py validation¶
happy_validator
· 1 contributor · 1 version
- usage: Haplotype Comparison
- [-h] [-v] [-r REF] [-o REPORTS_PREFIX] [–scratch-prefix SCRATCH_PREFIX] [–keep-scratch] [-t {xcmp,ga4gh}] [-f FP_BEDFILE] [–stratification STRAT_TSV] [–stratification-region STRAT_REGIONS] [–stratification-fixchr] [-V] [-X] [–no-write-counts] [–output-vtc] [–preserve-info] [–roc ROC] [–no-roc] [–roc-regions ROC_REGIONS] [–roc-filter ROC_FILTER] [–roc-delta ROC_DELTA] [–ci-alpha CI_ALPHA] [–no-json] [–location LOCATIONS] [–pass-only] [–filters-only FILTERS_ONLY] [-R REGIONS_BEDFILE] [-T TARGETS_BEDFILE] [-L] [–no-leftshift] [–decompose] [-D] [–bcftools-norm] [–fixchr] [–no-fixchr] [–bcf] [–somatic] [–set-gt {half,hemi,het,hom,first}] [–gender {male,female,auto,none}] [–preprocess-truth] [–usefiltered-truth] [–preprocessing-window-size PREPROCESS_WINDOW] [–adjust-conf-regions] [–no-adjust-conf-regions] [–unhappy] [-w WINDOW] [–xcmp-enumeration-threshold MAX_ENUM] [–xcmp-expand-hapblocks HB_EXPAND] [–threads THREADS] [–engine {xcmp,vcfeval,scmp-somatic,scmp-distance}] [–engine-vcfeval-path ENGINE_VCFEVAL] [–engine-vcfeval-template ENGINE_VCFEVAL_TEMPLATE] [–scmp-distance ENGINE_SCMP_DISTANCE] [–logfile LOGFILE] [–verbose | –quiet] [_vcfs [_vcfs …]]
- positional arguments:
- _vcfs Two VCF files.
Quickstart¶
from janis_bioinformatics.tools.illumina.happy.versions import HapPyValidator_0_3_9 wf = WorkflowBuilder("myworkflow") wf.step( "happy_validator_step", HapPyValidator_0_3_9( truthVCF=None, compareVCF=None, reference=None, ) ) wf.output("extended", source=happy_validator_step.extended) wf.output("summary", source=happy_validator_step.summary) wf.output("metrics", source=happy_validator_step.metrics) wf.output("vcf", source=happy_validator_step.vcf) wf.output("runinfo", source=happy_validator_step.runinfo) wf.output("rocOut", source=happy_validator_step.rocOut) wf.output("indelLocations", source=happy_validator_step.indelLocations) wf.output("indelPassLocations", source=happy_validator_step.indelPassLocations) wf.output("snpLocations", source=happy_validator_step.snpLocations) wf.output("snpPassLocations", source=happy_validator_step.snpPassLocations)
OR
- Install Janis
- Ensure Janis is configured to work with Docker or Singularity.
- Ensure all reference files are available:
Note
More information about these inputs are available below.
- Generate user input files for happy_validator:
# user inputs
janis inputs happy_validator > inputs.yaml
inputs.yaml
compareVCF: compareVCF.vcf
reference: reference.fasta
truthVCF: truthVCF.vcf
- Run happy_validator with:
janis run [...run options] \
--inputs inputs.yaml \
happy_validator
Information¶
ID: | happy_validator |
---|---|
URL: | No URL to the documentation was provided |
Versions: | v0.3.9 |
Container: | pkrusche/hap.py:v0.3.9 |
Authors: | Michael Franklin |
Citations: | None |
Created: | 2019-05-15 |
Updated: | 2019-05-15 |
Outputs¶
name | type | documentation |
---|---|---|
extended | csv | |
summary | csv | |
metrics | File | |
vcf | Gzipped<VCF> | |
runinfo | jsonFile | |
rocOut | File | |
indelLocations | File | |
indelPassLocations | File | |
snpLocations | File | |
snpPassLocations | File |
Additional configuration (inputs)¶
name | type | prefix | position | documentation |
---|---|---|---|---|
truthVCF | VCF | 1 | ||
compareVCF | VCF | 2 | ||
reference | FastaWithIndexes | –reference | (-r) Specify a reference file. | |
reportPrefix | Optional<Filename> | –report-prefix | (-o) Filename prefix for report output. | |
intervals | Optional<bed> | –target-regions | (-T) Restrict analysis to given (dense) regions (using -T in bcftools). | |
version | Optional<Boolean> | –version | (-v) Show version number and exit. | |
scratchPrefix | Optional<String> | –scratch-prefix | Directory for scratch files. | |
keepScratch | Optional<String> | –keep-scratch | Filename prefix for scratch report output. Annotation format in input VCF file. | |
falsePositives | Optional<bed> | –false-positives | (-f) False positive / confident call regions (.bed or .bed.gz). Calls outside these regions will be labelled as UNK. | |
stratification | Optional<tsv> | –stratification | Stratification file list (TSV format – first column is region name, second column is file name). | |
stratificationRegion | Optional<String> | –stratification-region | Add single stratification region, e.g. –stratification-region TEST:test.bed | |
stratificationFixchr | Optional<String> | –stratification-fixchr | Add chr prefix to stratification files if necessary | |
writeVcf | Optional<Boolean> | –write-vcf | (-V) Write an annotated VCF. | |
writeCounts | Optional<Boolean> | –write-counts | (-X) Write advanced counts and metrics. | |
noWriteCounts | Optional<Boolean> | –no-write-counts | Do not write advanced counts and metrics. | |
outputVtc | Optional<Boolean> | –output-vtc | Write VTC field in the final VCF which gives the counts each position has contributed to. | |
preserveInfo | Optional<Boolean> | –preserve-info | When using XCMP, preserve and merge the INFO fields in truth and query. Useful for ROC computation. | |
roc | Optional<String> | –roc | Select a feature to produce a ROC on (INFO feature, QUAL, GQX, …). | |
noRoc | Optional<Boolean> | –no-roc | Disable ROC computation and only output summary statistics for more concise output. | |
rocRegions | Optional<String> | –roc-regions | Select a list of regions to compute ROCs in. By default, only the ‘*’ region will produce ROC output (aggregate variant counts). | |
rocFilter | Optional<String> | –roc-filter | Select a filter to ignore when making ROCs. | |
rocDelta | Optional<Integer> | –roc-delta | Minimum spacing between ROC QQ levels. | |
ciAlpha | Optional<Integer> | –ci-alpha | Confidence level for Jeffrey’s CI for recall, precision and fraction of non-assessed calls. | |
noJson | Optional<Boolean> | –no-json | Disable JSON file output. | |
passOnly | Optional<Boolean> | –pass-only | Keep only PASS variants. | |
restrictRegions | Optional<Boolean> | –restrict-regions | (-R) Restrict analysis to given (sparse) regions (using -R in bcftools). | |
leftshift | Optional<Boolean> | –leftshift | (-L) Left-shift variants safely. | |
noLeftshift | Optional<Boolean> | –no-leftshift | Do not left-shift variants safely. | |
decompose | Optional<Boolean> | –decompose | Decompose variants into primitives. This results in more granular counts. | |
noDecompose | Optional<Boolean> | –no-decompose | (-D) Do not decompose variants into primitives. | |
bcftoolsNorm | Optional<Boolean> | –bcftools-norm | Enable preprocessing through bcftools norm -c x -D (requires external preprocessing to be switched on). | |
fixchr | Optional<Boolean> | –fixchr | Add chr prefix to VCF records where necessary (default: auto, attempt to match reference). | |
noFixchr | Optional<Boolean> | –no-fixchr | Do not add chr prefix to VCF records (default: auto, attempt to match reference). | |
bcf | Optional<Boolean> | –bcf | Use BCF internally. This is the default when the input file is in BCF format already. Using BCF can speed up temp file access, but may fail for VCF files that have broken headers or records that don’t comply with the header. | |
somatic | Optional<Boolean> | –somatic | Assume the input file is a somatic call file and squash all columns into one, putting all FORMATs into INFO + use half genotypes (see also –set-gt). This will replace all sample columns and replace them with a single one. This is used to treat Strelka somatic files Possible values for this parameter: half / hemi / het / hom / half to assign one of the following genotypes to the resulting sample: 1 | 0/1 | 1/1 | ./1. This will replace all sample columns and replace them with a single one. | |
setGT | Optional<Boolean> | –set-gt | This is used to treat Strelka somatic files Possible values for this parameter: half / hemi / het / hom / half to assign one of the following genotypes to the resulting sample: 1 | 0/1 | 1/1 | ./1. This will replace all sample columns and replace them with a single one. | |
gender | Optional<String> | –gender | ({male,female,auto,none}) Specify gender. This determines how haploid calls on chrX get treated: for male samples, all non-ref calls (in the truthset only when running through hap.py) are given a 1/1 genotype. | |
preprocessTruth | Optional<Boolean> | –preprocess-truth | Preprocess truth file with same settings as query (default is to accept truth in original format). | |
usefilteredTruth | Optional<Boolean> | –usefiltered-truth | Use filtered variant calls in truth file (by default, only PASS calls in the truth file are used) | |
preprocessingWindowSize | Optional<Boolean> | –preprocessing-window-size | Preprocessing window size (variants further apart than that size are not expected to interfere). | |
adjustConfRegions | Optional<Boolean> | –adjust-conf-regions | Adjust confident regions to include variant locations. Note this will only include variants that are included in the CONF regions already when viewing with bcftools; this option only makes sure insertions are padded correctly in the CONF regions (to capture these, both the base before and after must be contained in the bed file). | |
noAdjustConfRegions | Optional<Boolean> | –no-adjust-conf-regions | Do not adjust confident regions for insertions. | |
noHaplotypeComparison | Optional<Boolean> | –no-haplotype-comparison | (–unhappy) Disable haplotype comparison (only count direct GT matches as TP). | |
windowSize | Optional<Integer> | –window-size | (-w) Minimum distance between variants such that they fall into the same superlocus. | |
xcmpEnumerationThreshold | Optional<Integer> | –xcmp-enumeration-threshold | Enumeration threshold / maximum number of sequences to enumerate per block. | |
xcmpExpandHapblocks | Optional<String> | –xcmp-expand-hapblocks | Expand haplotype blocks by this many basepairs left and right. | |
threads | Optional<Integer> | –threads | Number of threads to use. Comparison engine to use. | |
engine | Optional<String> | –engine | {xcmp,vcfeval,scmp-somatic,scmp-distance} Comparison engine to use. | |
engineVcfevalTemplate | Optional<String> | –engine-vcfeval-template | Vcfeval needs the reference sequence formatted in its own file format (SDF – run rtg format -o ref.SDF ref.fa). You can specify this here to save time when running hap.py with vcfeval. If no SDF folder is specified, hap.py will create a temporary one. | |
scmpDistance | Optional<Integer> | –scmp-distance | For distance-based matching, this is the distance between variants to use. | |
logfile | Optional<Filename> | –logfile | Write logging information into file rather than to stderr | |
verbose | Optional<Boolean> | –verbose | Raise logging level from warning to info. | |
quiet | Optional<Boolean> | –quiet | Set logging level to output errors only. |
Workflow Description Language¶
version development
task happy_validator {
input {
Int? runtime_cpu
Int? runtime_memory
Int? runtime_seconds
Int? runtime_disks
File truthVCF
File compareVCF
String? reportPrefix
File reference
File reference_fai
File reference_amb
File reference_ann
File reference_bwt
File reference_pac
File reference_sa
File reference_dict
File? intervals
Boolean? version
String? scratchPrefix
String? keepScratch
File? falsePositives
File? stratification
String? stratificationRegion
String? stratificationFixchr
Boolean? writeVcf
Boolean? writeCounts
Boolean? noWriteCounts
Boolean? outputVtc
Boolean? preserveInfo
String? roc
Boolean? noRoc
String? rocRegions
String? rocFilter
Int? rocDelta
Int? ciAlpha
Boolean? noJson
Boolean? passOnly
Boolean? restrictRegions
Boolean? leftshift
Boolean? noLeftshift
Boolean? decompose
Boolean? noDecompose
Boolean? bcftoolsNorm
Boolean? fixchr
Boolean? noFixchr
Boolean? bcf
Boolean? somatic
Boolean? setGT
String? gender
Boolean? preprocessTruth
Boolean? usefilteredTruth
Boolean? preprocessingWindowSize
Boolean? adjustConfRegions
Boolean? noAdjustConfRegions
Boolean? noHaplotypeComparison
Int? windowSize
Int? xcmpEnumerationThreshold
String? xcmpExpandHapblocks
Int? threads
String? engine
String? engineVcfevalTemplate
Int? scmpDistance
String? logfile
Boolean? verbose
Boolean? quiet
}
command <<<
set -e
/opt/hap.py/bin/hap.py \
--report-prefix '~{select_first([reportPrefix, "generated"])}' \
--reference '~{reference}' \
~{if defined(intervals) then ("--target-regions '" + intervals + "'") else ""} \
~{if (defined(version) && select_first([version])) then "--version" else ""} \
~{if defined(scratchPrefix) then ("--scratch-prefix '" + scratchPrefix + "'") else ""} \
~{if defined(keepScratch) then ("--keep-scratch '" + keepScratch + "'") else ""} \
~{if defined(falsePositives) then ("--false-positives '" + falsePositives + "'") else ""} \
~{if defined(stratification) then ("--stratification '" + stratification + "'") else ""} \
~{if defined(stratificationRegion) then ("--stratification-region '" + stratificationRegion + "'") else ""} \
~{if defined(stratificationFixchr) then ("--stratification-fixchr '" + stratificationFixchr + "'") else ""} \
~{if (defined(writeVcf) && select_first([writeVcf])) then "--write-vcf" else ""} \
~{if (defined(writeCounts) && select_first([writeCounts])) then "--write-counts" else ""} \
~{if (defined(noWriteCounts) && select_first([noWriteCounts])) then "--no-write-counts" else ""} \
~{if (defined(outputVtc) && select_first([outputVtc])) then "--output-vtc" else ""} \
~{if (defined(preserveInfo) && select_first([preserveInfo])) then "--preserve-info" else ""} \
~{if defined(roc) then ("--roc '" + roc + "'") else ""} \
~{if (defined(noRoc) && select_first([noRoc])) then "--no-roc" else ""} \
~{if defined(rocRegions) then ("--roc-regions '" + rocRegions + "'") else ""} \
~{if defined(rocFilter) then ("--roc-filter '" + rocFilter + "'") else ""} \
~{if defined(rocDelta) then ("--roc-delta " + rocDelta) else ''} \
~{if defined(ciAlpha) then ("--ci-alpha " + ciAlpha) else ''} \
~{if (defined(noJson) && select_first([noJson])) then "--no-json" else ""} \
~{if (defined(passOnly) && select_first([passOnly])) then "--pass-only" else ""} \
~{if (defined(restrictRegions) && select_first([restrictRegions])) then "--restrict-regions" else ""} \
~{if (defined(leftshift) && select_first([leftshift])) then "--leftshift" else ""} \
~{if (defined(noLeftshift) && select_first([noLeftshift])) then "--no-leftshift" else ""} \
~{if (defined(decompose) && select_first([decompose])) then "--decompose" else ""} \
~{if (defined(noDecompose) && select_first([noDecompose])) then "--no-decompose" else ""} \
~{if (defined(bcftoolsNorm) && select_first([bcftoolsNorm])) then "--bcftools-norm" else ""} \
~{if (defined(fixchr) && select_first([fixchr])) then "--fixchr" else ""} \
~{if (defined(noFixchr) && select_first([noFixchr])) then "--no-fixchr" else ""} \
~{if (defined(bcf) && select_first([bcf])) then "--bcf" else ""} \
~{if (defined(somatic) && select_first([somatic])) then "--somatic" else ""} \
~{if (defined(setGT) && select_first([setGT])) then "--set-gt" else ""} \
~{if defined(gender) then ("--gender '" + gender + "'") else ""} \
~{if (defined(preprocessTruth) && select_first([preprocessTruth])) then "--preprocess-truth" else ""} \
~{if (defined(usefilteredTruth) && select_first([usefilteredTruth])) then "--usefiltered-truth" else ""} \
~{if (defined(preprocessingWindowSize) && select_first([preprocessingWindowSize])) then "--preprocessing-window-size" else ""} \
~{if (defined(adjustConfRegions) && select_first([adjustConfRegions])) then "--adjust-conf-regions" else ""} \
~{if (defined(noAdjustConfRegions) && select_first([noAdjustConfRegions])) then "--no-adjust-conf-regions" else ""} \
~{if (defined(noHaplotypeComparison) && select_first([noHaplotypeComparison])) then "--no-haplotype-comparison" else ""} \
~{if defined(windowSize) then ("--window-size " + windowSize) else ''} \
~{if defined(xcmpEnumerationThreshold) then ("--xcmp-enumeration-threshold " + xcmpEnumerationThreshold) else ''} \
~{if defined(xcmpExpandHapblocks) then ("--xcmp-expand-hapblocks '" + xcmpExpandHapblocks + "'") else ""} \
~{if defined(select_first([threads, select_first([runtime_cpu, 1])])) then ("--threads " + select_first([threads, select_first([runtime_cpu, 1])])) else ''} \
~{if defined(engine) then ("--engine '" + engine + "'") else ""} \
~{if defined(engineVcfevalTemplate) then ("--engine-vcfeval-template '" + engineVcfevalTemplate + "'") else ""} \
~{if defined(scmpDistance) then ("--scmp-distance " + scmpDistance) else ''} \
--logfile '~{select_first([logfile, "generated--log.txt"])}' \
~{if (defined(verbose) && select_first([verbose])) then "--verbose" else ""} \
~{if (defined(quiet) && select_first([quiet])) then "--quiet" else ""} \
'~{truthVCF}' \
'~{compareVCF}'
>>>
runtime {
cpu: select_first([runtime_cpu, 2, 1])
disks: "local-disk ~{select_first([runtime_disks, 20])} SSD"
docker: "pkrusche/hap.py:v0.3.9"
duration: select_first([runtime_seconds, 86400])
memory: "~{select_first([runtime_memory, 8, 4])}G"
preemptible: 2
}
output {
File extended = (select_first([reportPrefix, "generated"]) + ".extended.csv")
File summary = (select_first([reportPrefix, "generated"]) + ".summary.csv")
File metrics = (select_first([reportPrefix, "generated"]) + ".metrics.json.gz")
File vcf = (select_first([reportPrefix, "generated"]) + ".vcf.gz")
File vcf_tbi = (select_first([reportPrefix, "generated"]) + ".vcf.gz") + ".tbi"
File runinfo = (select_first([reportPrefix, "generated"]) + ".runinfo.json")
File rocOut = (select_first([reportPrefix, "generated"]) + ".roc.all.csv.gz")
File indelLocations = (select_first([reportPrefix, "generated"]) + ".roc.Locations.INDEL.csv.gz")
File indelPassLocations = (select_first([reportPrefix, "generated"]) + ".roc.Locations.INDEL.PASS.csv.gz")
File snpLocations = (select_first([reportPrefix, "generated"]) + ".roc.Locations.SNP.csv.gz")
File snpPassLocations = (select_first([reportPrefix, "generated"]) + ".roc.Locations.SNP.PASS.csv.gz")
}
}
Common Workflow Language¶
#!/usr/bin/env cwl-runner
class: CommandLineTool
cwlVersion: v1.2
label: Hap.py validation
doc: |-
usage: Haplotype Comparison
[-h] [-v] [-r REF] [-o REPORTS_PREFIX]
[--scratch-prefix SCRATCH_PREFIX] [--keep-scratch]
[-t {xcmp,ga4gh}] [-f FP_BEDFILE]
[--stratification STRAT_TSV]
[--stratification-region STRAT_REGIONS]
[--stratification-fixchr] [-V] [-X]
[--no-write-counts] [--output-vtc]
[--preserve-info] [--roc ROC] [--no-roc]
[--roc-regions ROC_REGIONS]
[--roc-filter ROC_FILTER] [--roc-delta ROC_DELTA]
[--ci-alpha CI_ALPHA] [--no-json]
[--location LOCATIONS] [--pass-only]
[--filters-only FILTERS_ONLY] [-R REGIONS_BEDFILE]
[-T TARGETS_BEDFILE] [-L] [--no-leftshift]
[--decompose] [-D] [--bcftools-norm] [--fixchr]
[--no-fixchr] [--bcf] [--somatic]
[--set-gt {half,hemi,het,hom,first}]
[--gender {male,female,auto,none}]
[--preprocess-truth] [--usefiltered-truth]
[--preprocessing-window-size PREPROCESS_WINDOW]
[--adjust-conf-regions] [--no-adjust-conf-regions]
[--unhappy] [-w WINDOW]
[--xcmp-enumeration-threshold MAX_ENUM]
[--xcmp-expand-hapblocks HB_EXPAND]
[--threads THREADS]
[--engine {xcmp,vcfeval,scmp-somatic,scmp-distance}]
[--engine-vcfeval-path ENGINE_VCFEVAL]
[--engine-vcfeval-template ENGINE_VCFEVAL_TEMPLATE]
[--scmp-distance ENGINE_SCMP_DISTANCE]
[--logfile LOGFILE] [--verbose | --quiet]
[_vcfs [_vcfs ...]]
positional arguments:
_vcfs Two VCF files.
requirements:
- class: ShellCommandRequirement
- class: InlineJavascriptRequirement
- class: DockerRequirement
dockerPull: pkrusche/hap.py:v0.3.9
inputs:
- id: truthVCF
label: truthVCF
type: File
inputBinding:
position: 1
- id: compareVCF
label: compareVCF
type: File
inputBinding:
position: 2
- id: reportPrefix
label: reportPrefix
doc: (-o) Filename prefix for report output.
type:
- string
- 'null'
default: generated
inputBinding:
prefix: --report-prefix
- id: reference
label: reference
doc: (-r) Specify a reference file.
type: File
secondaryFiles:
- pattern: .fai
- pattern: .amb
- pattern: .ann
- pattern: .bwt
- pattern: .pac
- pattern: .sa
- pattern: ^.dict
inputBinding:
prefix: --reference
- id: intervals
label: intervals
doc: (-T) Restrict analysis to given (dense) regions (using -T in bcftools).
type:
- File
- 'null'
inputBinding:
prefix: --target-regions
- id: version
label: version
doc: (-v) Show version number and exit.
type:
- boolean
- 'null'
inputBinding:
prefix: --version
- id: scratchPrefix
label: scratchPrefix
doc: Directory for scratch files.
type:
- string
- 'null'
inputBinding:
prefix: --scratch-prefix
- id: keepScratch
label: keepScratch
doc: Filename prefix for scratch report output. Annotation format in input VCF file.
type:
- string
- 'null'
inputBinding:
prefix: --keep-scratch
- id: falsePositives
label: falsePositives
doc: |-
(-f) False positive / confident call regions (.bed or .bed.gz). Calls outside these regions will be labelled as UNK.
type:
- File
- 'null'
inputBinding:
prefix: --false-positives
- id: stratification
label: stratification
doc: |2-
Stratification file list (TSV format -- first column is region name, second column is file name).
type:
- File
- 'null'
inputBinding:
prefix: --stratification
- id: stratificationRegion
label: stratificationRegion
doc: Add single stratification region, e.g. --stratification-region TEST:test.bed
type:
- string
- 'null'
inputBinding:
prefix: --stratification-region
- id: stratificationFixchr
label: stratificationFixchr
doc: ' Add chr prefix to stratification files if necessary'
type:
- string
- 'null'
inputBinding:
prefix: --stratification-fixchr
- id: writeVcf
label: writeVcf
doc: (-V) Write an annotated VCF.
type:
- boolean
- 'null'
inputBinding:
prefix: --write-vcf
- id: writeCounts
label: writeCounts
doc: (-X) Write advanced counts and metrics.
type:
- boolean
- 'null'
inputBinding:
prefix: --write-counts
- id: noWriteCounts
label: noWriteCounts
doc: Do not write advanced counts and metrics.
type:
- boolean
- 'null'
inputBinding:
prefix: --no-write-counts
- id: outputVtc
label: outputVtc
doc: |-
Write VTC field in the final VCF which gives the counts each position has contributed to.
type:
- boolean
- 'null'
inputBinding:
prefix: --output-vtc
- id: preserveInfo
label: preserveInfo
doc: |-
When using XCMP, preserve and merge the INFO fields in truth and query. Useful for ROC computation.
type:
- boolean
- 'null'
inputBinding:
prefix: --preserve-info
- id: roc
label: roc
doc: Select a feature to produce a ROC on (INFO feature, QUAL, GQX, ...).
type:
- string
- 'null'
inputBinding:
prefix: --roc
- id: noRoc
label: noRoc
doc: |-
Disable ROC computation and only output summary statistics for more concise output.
type:
- boolean
- 'null'
inputBinding:
prefix: --no-roc
- id: rocRegions
label: rocRegions
doc: |2-
Select a list of regions to compute ROCs in. By default, only the '*' region will produce ROC output (aggregate variant counts).
type:
- string
- 'null'
inputBinding:
prefix: --roc-regions
- id: rocFilter
label: rocFilter
doc: ' Select a filter to ignore when making ROCs.'
type:
- string
- 'null'
inputBinding:
prefix: --roc-filter
- id: rocDelta
label: rocDelta
doc: ' Minimum spacing between ROC QQ levels.'
type:
- int
- 'null'
inputBinding:
prefix: --roc-delta
- id: ciAlpha
label: ciAlpha
doc: |-
Confidence level for Jeffrey's CI for recall, precision and fraction of non-assessed calls.
type:
- int
- 'null'
inputBinding:
prefix: --ci-alpha
- id: noJson
label: noJson
doc: Disable JSON file output.
type:
- boolean
- 'null'
inputBinding:
prefix: --no-json
- id: passOnly
label: passOnly
doc: Keep only PASS variants.
type:
- boolean
- 'null'
inputBinding:
prefix: --pass-only
- id: restrictRegions
label: restrictRegions
doc: (-R) Restrict analysis to given (sparse) regions (using -R in bcftools).
type:
- boolean
- 'null'
inputBinding:
prefix: --restrict-regions
- id: leftshift
label: leftshift
doc: (-L) Left-shift variants safely.
type:
- boolean
- 'null'
inputBinding:
prefix: --leftshift
- id: noLeftshift
label: noLeftshift
doc: Do not left-shift variants safely.
type:
- boolean
- 'null'
inputBinding:
prefix: --no-leftshift
- id: decompose
label: decompose
doc: Decompose variants into primitives. This results in more granular counts.
type:
- boolean
- 'null'
inputBinding:
prefix: --decompose
- id: noDecompose
label: noDecompose
doc: (-D) Do not decompose variants into primitives.
type:
- boolean
- 'null'
inputBinding:
prefix: --no-decompose
- id: bcftoolsNorm
label: bcftoolsNorm
doc: |-
Enable preprocessing through bcftools norm -c x -D (requires external preprocessing to be switched on).
type:
- boolean
- 'null'
inputBinding:
prefix: --bcftools-norm
- id: fixchr
label: fixchr
doc: |-
Add chr prefix to VCF records where necessary (default: auto, attempt to match reference).
type:
- boolean
- 'null'
inputBinding:
prefix: --fixchr
- id: noFixchr
label: noFixchr
doc: |-
Do not add chr prefix to VCF records (default: auto, attempt to match reference).
type:
- boolean
- 'null'
inputBinding:
prefix: --no-fixchr
- id: bcf
label: bcf
doc: |-
Use BCF internally. This is the default when the input file is in BCF format already. Using BCF can speed up temp file access, but may fail for VCF files that have broken headers or records that don't comply with the header.
type:
- boolean
- 'null'
inputBinding:
prefix: --bcf
- id: somatic
label: somatic
doc: |-
Assume the input file is a somatic call file and squash all columns into one, putting all FORMATs into INFO + use half genotypes (see also --set-gt). This will replace all sample columns and replace them with a single one. This is used to treat Strelka somatic files Possible values for this parameter: half / hemi / het / hom / half to assign one of the following genotypes to the resulting sample: 1 | 0/1 | 1/1 | ./1. This will replace all sample columns and replace them with a single one.
type:
- boolean
- 'null'
inputBinding:
prefix: --somatic
- id: setGT
label: setGT
doc: |-
This is used to treat Strelka somatic files Possible values for this parameter: half / hemi / het / hom / half to assign one of the following genotypes to the resulting sample: 1 | 0/1 | 1/1 | ./1. This will replace all sample columns and replace them with a single one.
type:
- boolean
- 'null'
inputBinding:
prefix: --set-gt
- id: gender
label: gender
doc: |-
({male,female,auto,none}) Specify gender. This determines how haploid calls on chrX get treated: for male samples, all non-ref calls (in the truthset only when running through hap.py) are given a 1/1 genotype.
type:
- string
- 'null'
inputBinding:
prefix: --gender
- id: preprocessTruth
label: preprocessTruth
doc: |-
Preprocess truth file with same settings as query (default is to accept truth in original format).
type:
- boolean
- 'null'
inputBinding:
prefix: --preprocess-truth
- id: usefilteredTruth
label: usefilteredTruth
doc: |-
Use filtered variant calls in truth file (by default, only PASS calls in the truth file are used)
type:
- boolean
- 'null'
inputBinding:
prefix: --usefiltered-truth
- id: preprocessingWindowSize
label: preprocessingWindowSize
doc: |2-
Preprocessing window size (variants further apart than that size are not expected to interfere).
type:
- boolean
- 'null'
inputBinding:
prefix: --preprocessing-window-size
- id: adjustConfRegions
label: adjustConfRegions
doc: |2-
Adjust confident regions to include variant locations. Note this will only include variants that are included in the CONF regions already when viewing with bcftools; this option only makes sure insertions are padded correctly in the CONF regions (to capture these, both the base before and after must be contained in the bed file).
type:
- boolean
- 'null'
inputBinding:
prefix: --adjust-conf-regions
- id: noAdjustConfRegions
label: noAdjustConfRegions
doc: ' Do not adjust confident regions for insertions.'
type:
- boolean
- 'null'
inputBinding:
prefix: --no-adjust-conf-regions
- id: noHaplotypeComparison
label: noHaplotypeComparison
doc: (--unhappy) Disable haplotype comparison (only count direct GT matches as
TP).
type:
- boolean
- 'null'
inputBinding:
prefix: --no-haplotype-comparison
- id: windowSize
label: windowSize
doc: |-
(-w) Minimum distance between variants such that they fall into the same superlocus.
type:
- int
- 'null'
inputBinding:
prefix: --window-size
- id: xcmpEnumerationThreshold
label: xcmpEnumerationThreshold
doc: ' Enumeration threshold / maximum number of sequences to enumerate per block.'
type:
- int
- 'null'
inputBinding:
prefix: --xcmp-enumeration-threshold
- id: xcmpExpandHapblocks
label: xcmpExpandHapblocks
doc: ' Expand haplotype blocks by this many basepairs left and right.'
type:
- string
- 'null'
inputBinding:
prefix: --xcmp-expand-hapblocks
- id: threads
label: threads
doc: Number of threads to use. Comparison engine to use.
type:
- int
- 'null'
inputBinding:
prefix: --threads
valueFrom: |-
$([inputs.runtime_cpu, 2, 1].filter(function (inner) { return inner != null })[0])
- id: engine
label: engine
doc: ' {xcmp,vcfeval,scmp-somatic,scmp-distance} Comparison engine to use.'
type:
- string
- 'null'
inputBinding:
prefix: --engine
- id: engineVcfevalTemplate
label: engineVcfevalTemplate
doc: |2-
Vcfeval needs the reference sequence formatted in its own file format (SDF -- run rtg format -o ref.SDF ref.fa). You can specify this here to save time when running hap.py with vcfeval. If no SDF folder is specified, hap.py will create a temporary one.
type:
- string
- 'null'
inputBinding:
prefix: --engine-vcfeval-template
- id: scmpDistance
label: scmpDistance
doc: ' For distance-based matching, this is the distance between variants to use.'
type:
- int
- 'null'
inputBinding:
prefix: --scmp-distance
- id: logfile
label: logfile
doc: Write logging information into file rather than to stderr
type:
- string
- 'null'
default: generated--log.txt
inputBinding:
prefix: --logfile
- id: verbose
label: verbose
doc: Raise logging level from warning to info.
type:
- boolean
- 'null'
inputBinding:
prefix: --verbose
- id: quiet
label: quiet
doc: Set logging level to output errors only.
type:
- boolean
- 'null'
inputBinding:
prefix: --quiet
outputs:
- id: extended
label: extended
type: File
outputBinding:
glob: $((inputs.reportPrefix + ".extended.csv"))
outputEval: $((inputs.reportPrefix.basename + ".extended.csv"))
loadContents: false
- id: summary
label: summary
type: File
outputBinding:
glob: $((inputs.reportPrefix + ".summary.csv"))
outputEval: $((inputs.reportPrefix.basename + ".summary.csv"))
loadContents: false
- id: metrics
label: metrics
type: File
outputBinding:
glob: $((inputs.reportPrefix + ".metrics.json.gz"))
outputEval: $((inputs.reportPrefix.basename + ".metrics.json.gz"))
loadContents: false
- id: vcf
label: vcf
type: File
secondaryFiles:
- pattern: .tbi
outputBinding:
glob: $((inputs.reportPrefix + ".vcf.gz"))
outputEval: $((inputs.reportPrefix.basename + ".vcf.gz"))
loadContents: false
- id: runinfo
label: runinfo
type: File
outputBinding:
glob: $((inputs.reportPrefix + ".runinfo.json"))
outputEval: $((inputs.reportPrefix.basename + ".runinfo.json"))
loadContents: false
- id: rocOut
label: rocOut
type: File
outputBinding:
glob: $((inputs.reportPrefix + ".roc.all.csv.gz"))
outputEval: $((inputs.reportPrefix.basename + ".roc.all.csv.gz"))
loadContents: false
- id: indelLocations
label: indelLocations
type: File
outputBinding:
glob: $((inputs.reportPrefix + ".roc.Locations.INDEL.csv.gz"))
outputEval: $((inputs.reportPrefix.basename + ".roc.Locations.INDEL.csv.gz"))
loadContents: false
- id: indelPassLocations
label: indelPassLocations
type: File
outputBinding:
glob: $((inputs.reportPrefix + ".roc.Locations.INDEL.PASS.csv.gz"))
outputEval: $((inputs.reportPrefix.basename + ".roc.Locations.INDEL.PASS.csv.gz"))
loadContents: false
- id: snpLocations
label: snpLocations
type: File
outputBinding:
glob: $((inputs.reportPrefix + ".roc.Locations.SNP.csv.gz"))
outputEval: $((inputs.reportPrefix.basename + ".roc.Locations.SNP.csv.gz"))
loadContents: false
- id: snpPassLocations
label: snpPassLocations
type: File
outputBinding:
glob: $((inputs.reportPrefix + ".roc.Locations.SNP.PASS.csv.gz"))
outputEval: $((inputs.reportPrefix.basename + ".roc.Locations.SNP.PASS.csv.gz"))
loadContents: false
stdout: _stdout
stderr: _stderr
baseCommand: /opt/hap.py/bin/hap.py
arguments: []
hints:
- class: ToolTimeLimit
timelimit: |-
$([inputs.runtime_seconds, 86400].filter(function (inner) { return inner != null })[0])
id: happy_validator