Genome Polarisation with diemr
Chapter 1 Introduction
Genome polarisation is a genome-painting approach built on the likelihood-based diagnostic index expectation maximisation (diem) algorithm (Baird et al. 2023). It identifies which alleles of single-nucleotide variants (SNVs) belong to which side of a barrier to gene flow, co-estimating the assignment of individuals to a barrier side and the diagnosticity of each marker, that is, how consistently individuals on one side are homozygous for the allele associated with that side.
By inferring which parts of the genome correspond to each parental lineage, genome polarisation provides a direct view of how barriers to gene flow shape genomic architecture. It can detect and quantify hybridisation, distinguish introgressed from non-introgressed regions, and reveal how species boundaries evolve during speciation and diversification. Compared with methods such as STRUCTURE, ADMIXTURE, or PCA, which summarise population structure statistically, genome polarisation explicitly identifies the genomic segments that define the divergence between taxa or lineages.
The diagnostic index computed by diem highlights markers that are most informative for the primary axis of genetic differentiation. These diagnostic loci can then be used to describe patterns of hybridisation, assess barrier strength, or visualise the genomic distribution of ancestry.
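The ideas of polarity and diagnosticity can be made concrete with a toy calculation in base R. The sketch below is not the diem algorithm and does not use the diemr package: the genotype matrix, the barrier-side assignments, and the names `consistency` and `flip` are all hypothetical, and the sides are assumed to be known rather than co-estimated. It only shows how the labelling of a marker's alleles changes how consistently individuals match their own side.

```r
# Toy genotypes (hypothetical data): rows are individuals, columns are markers.
# Each entry counts copies of an arbitrarily labelled allele (0, 1 or 2).
genotypes <- matrix(
  c(0, 0, 2,
    0, 0, 2,
    0, 1, 2,
    2, 2, 0,
    2, 2, 0,
    2, 1, 2),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("ind", 1:6), paste0("snv", 1:3))
)

# Assumed barrier side of each individual (diem would estimate this).
side <- c("A", "A", "A", "B", "B", "B")

# Homozygous genotype expected on each side under the current allele labelling.
expected <- ifelse(side == "A", 0, 2)

# Fraction of individuals per marker that are homozygous for their side's allele.
consistency <- colMeans(genotypes == expected)

# The same fraction after swapping the two allele labels of every marker.
flipped <- colMeans(genotypes == (2 - expected))

# Markers whose labels should be swapped to agree with the side assignment.
flip <- flipped > consistency

print(round(consistency, 2))
print(flip)
```

In this toy dataset the first marker matches the side assignment in every individual, the second is weakened by two heterozygotes, and the third matches well only once its allele labels are swapped. Choosing that orientation for each marker is, informally, what polarising a marker means.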
This book provides a step-by-step guide to performing genome polarisation analyses in R using the diemr package, also available from CRAN. The package includes functions for input validation, file-format conversion, visualisation, and diagnostic summaries, with examples based on typical genomic datasets such as variant call format (VCF) files and SNP matrices. By the end of the book, users will be able to run the complete analysis workflow, interpret its outputs, and generate graphical representations of genome polarisation.
The diem algorithm itself is also implemented in Mathematica and Python, allowing reproducibility and interoperability across analytical environments.