Making sense of the output
dEploid writes its output as text files whose names start with the prefix given with the -o flag.
Log file recording the dEploid version, input file paths, parameters used and proportion estimates at the final iteration.
Log likelihood of the MCMC chain.
MCMC updates of the proportion estimates.
Haplotypes at the final iteration, in a plain text file.
When -vcfOut is turned on, haplotypes at the final iteration are saved in VCF format.
When -exportPostProb is turned on, the posterior probabilities of strain [i] at the final iteration are saved.
When -ibd is used, ‘DEploid’ first learns the number of strains and their proportions with an identity-by-descent model (‘DEploid-IBD’). It then fixes the number of strains and their proportions, and trains the haplotypes using the original DEploid algorithm (‘DEploid-classic’). The staged outputs are labelled “.ibd” and “.classic” respectively, appended to the prefix.
When -best is used, ‘DEploid-BEST’ executes the deconvolution algorithms in an optimised sequence to best report the number of strains, proportions and haplotypes. First, the program (‘DEploid-Lasso’) learns the number of strains with an optimised reference panel; “.chooseK” is appended to the prefix for this output (NOTE: likelihood is not tracked in this case). Next, it (‘DEploid-IBD’) fixes the number of strains and tunes the strain proportions with an identity-by-descent model; “.ibd” is appended to the prefix for this output. Finally, the program (‘DEploid-Lasso’) fixes the number of strains and proportions, and uses the optimised reference panel again to train and report the haplotypes; “.final” is appended to the prefix for this output. When -vcfOut is applied, only the final haplotypes are written in VCF format.
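Because the staged runs mark their files with a label appended to the prefix, it can help to sort the files of one run by stage before inspecting them. The following is a minimal sketch of that idea, not part of dEploid itself: the labels come from the description above, while the helper name `group_by_stage`, the "unstaged" bucket and the example file names are hypothetical and assume the label sits directly after the prefix.

```python
from collections import defaultdict

# Stage labels as described in the text; everything else here is illustrative.
STAGE_LABELS = ("chooseK", "ibd", "classic", "final")


def group_by_stage(prefix, filenames):
    """Group file names by the stage label that directly follows the prefix."""
    groups = defaultdict(list)
    for name in filenames:
        if not name.startswith(prefix + "."):
            continue  # not an output of this run
        rest = name[len(prefix) + 1:]
        stage = rest.split(".", 1)[0]
        # Files without a recognised stage label go into one shared bucket.
        key = stage if stage in STAGE_LABELS else "unstaged"
        groups[key].append(name)
    return dict(groups)


# Hypothetical file names, for illustration only.
example_files = [
    "run1.chooseK.hap",
    "run1.ibd.hap",
    "run1.final.hap",
    "run1.hap",
    "other.hap",
]
grouped = group_by_stage("run1", example_files)
for stage, names in grouped.items():
    print(stage, names)
```

In practice the list of names could come from a directory listing; files that belong to another prefix are skipped rather than guessed at.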
Example of output interpretation
Example 1. Standard deconvolution output
$ ./dEploid -vcf data/exampleData/PG0390-C.eg.vcf.gz \
    -plaf data/exampleData/labStrains.eg.PLAF.txt \
    -noPanel -o PG0390-CNopanel -seed 1
$ utilities/interpretDEploid.r -vcf data/exampleData/PG0390-C.eg.vcf.gz \
    -plaf data/exampleData/labStrains.eg.PLAF.txt \
    -dEprefix PG0390-CNopanel \
    -o PG0390-CNopanel -ring
The top three figures are the same as the figures shown in the data example, with the small addition of the inferred WSAF, marked in blue in the top right figure.
- The bottom left figure shows the history of the relative proportion changes along the MCMC chain.
- The middle figure shows the correlation between the expected and observed allele frequencies in the sample.
- The right figure shows the changes in the MCMC likelihood.
This panel figure shows all within-sample allele frequencies across all 14 chromosomes. Expected and observed WSAF are marked in blue and red respectively.
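To make the comparison of expected and observed WSAF concrete, here is a minimal sketch under two stated assumptions: the observed WSAF at a site is the alternative allele count divided by the total read count, and the expected WSAF is the proportion-weighted sum of the strain alleles (0 for reference, 1 for alternative). The function names and all numbers are toy values chosen for illustration; they are not taken from dEploid output.

```python
def observed_wsaf(ref_counts, alt_counts):
    """Observed WSAF per site: alt / (ref + alt); None where there is no coverage."""
    values = []
    for ref, alt in zip(ref_counts, alt_counts):
        total = ref + alt
        values.append(alt / total if total > 0 else None)
    return values


def expected_wsaf(proportions, haplotypes):
    """Expected WSAF per site: sum over strains of proportion * allele (0 or 1)."""
    return [
        sum(p * allele for p, allele in zip(proportions, site))
        for site in haplotypes
    ]


# Toy example: two strains at four sites (assumed values).
proportions = [0.75, 0.25]
haplotypes = [[0, 0], [1, 0], [0, 1], [1, 1]]  # one row per site, one column per strain
ref_counts = [40, 10, 30, 0]
alt_counts = [0, 30, 10, 40]

expected = expected_wsaf(proportions, haplotypes)
observed = observed_wsaf(ref_counts, alt_counts)
for exp, obs in zip(expected, observed):
    print(exp, obs)
```

When the inferred proportions and haplotypes describe the sample well, the two values agree closely at each site, which is what the correlation figure summarises.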
Example 2. Haplotype painting from a given panel
dEploid can take its output haplotypes and calculate, for each deconvoluted strain, the posterior probabilities with respect to the reference panel. In this example, the reference panel includes four lab strains: 3D7 (red), Dd2 (dark orange), HB3 (orange) and 7G8 (yellow).
$ ./dEploid -vcf data/exampleData/PG0390-C.eg.vcf.gz \
    -plaf data/exampleData/labStrains.eg.PLAF.txt \
    -panel data/exampleData/labStrains.eg.panel.txt \
    -o PG0390-CPanel -seed 1 -k 3
$ ./dEploid -vcf data/exampleData/PG0390-C.eg.vcf.gz \
    -plaf data/exampleData/labStrains.eg.PLAF.txt \
    -panel data/exampleData/labStrains.eg.panel.txt \
    -o PG0390-CPanel \
    -painting PG0390-CPanel.hap \
    -initialP 0.8 0 0.2 -k 3
$ utilities/interpretDEploid.r -vcf data/exampleData/PG0390-C.eg.vcf.gz \
    -plaf data/exampleData/labStrains.eg.PLAF.txt \
    -dEprefix PG0390-CPanel \
    -o PG0390-CPanel -ring
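One way to read a painting is to ask, at each site, which panel strain has the highest posterior probability. The sketch below illustrates only that reading. It assumes the posterior probabilities for one deconvoluted strain are already loaded as rows of four numbers, one per panel strain in the order listed above; the layout, the helper name `most_probable_panel_strain` and the toy values are assumptions, not the format dEploid writes.

```python
# Panel strains named in the text; the column order is an assumption.
PANEL = ["3D7", "Dd2", "HB3", "7G8"]


def most_probable_panel_strain(posteriors, panel=PANEL):
    """Return, for each site, the panel strain with the highest posterior probability."""
    labels = []
    for row in posteriors:
        best = max(range(len(panel)), key=lambda j: row[j])
        labels.append(panel[best])
    return labels


# Toy posterior probabilities for three sites (assumed values).
toy_posteriors = [
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.70, 0.10, 0.10],
    [0.20, 0.20, 0.50, 0.10],
]
labels = most_probable_panel_strain(toy_posteriors)
print(labels)
```

Runs of consecutive sites with the same label correspond to segments painted in a single colour.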
Example 3. Deconvolution followed by IBD painting
In addition to lab-mixed samples, here we show an example of dEploid deconvoluting the field sample PD0577-C.