Chapter 6 Visualisation of polarised genotypes

Natália Martínková

Visualisation is a key step in interpreting the results of genome polarisation. The diemr package provides several functions for graphical inspection of polarised genotypes, including plotPolarized for genome-wide patterns, plotMarkerAxis for adding chromosome information, and plotDeFinetti for individual genotype composition summaries.

6.1 Plotting polarised genotypes

The function plotPolarized displays genotypes in either a rectangular or a circular (iris-style) layout. In the genotype matrix, rows represent individuals and columns represent single-nucleotide variants (SNVs). Colours correspond to genotype states: by default, purple and green mark the two homozygote classes (0 and 2), yellow marks heterozygotes (1), and white indicates missing data (_).
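The mapping from genotype states to colours can be sketched in base R as a named lookup vector. This is a minimal illustration, not the plotPolarized code. The colour names follow the defaults described above, and assigning purple to 0 and green to 2 follows the order of the text. The object names and the toy matrix are hypothetical.

```r
# Hypothetical lookup table, not taken from diemr: genotype state -> colour.
# Purple for 0 and green for 2 is assumed from the order in the text.
stateColours <- c(
  "0" = "purple",  # homozygote, one side
  "1" = "yellow",  # heterozygote
  "2" = "green",   # homozygote, other side
  "_" = "white"    # missing data
)

# Hypothetical toy genotypes: 3 individuals (rows) x 4 SNVs (columns)
toy <- matrix(
  c("0", "0", "1", "_",
    "0", "1", "2", "2",
    "1", "2", "2", "2"),
  nrow = 3, byrow = TRUE
)

# Look up a colour for every cell; as.vector() and matrix() both work in
# column-major order, so the original matrix shape is restored
toyColours <- matrix(unname(stateColours[as.vector(toy)]), nrow = nrow(toy))
toyColours
```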

The polarised genotypes must first be imported with the correct site polarity. The polarity is available either in memory, in the element res$markerPolarity of the diem output (Chapter 2), or in the column newPolarity of the obligatory output file MarkerDiagnosticsWithOptimalPolarities.txt (Chapter 5.1.1).
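Changing the polarity of a marker swaps its two homozygote codes, 0 and 2, while heterozygotes (1) and missing data (_) stay as they are. The helper flipPolarity below is hypothetical and not part of diemr. It is a minimal sketch of what a logical changePolarity vector does to the data, assuming genotypes stored as a character matrix with individuals in rows and markers in columns.

```r
# Hypothetical helper, not a diemr function: for every marker (column)
# flagged TRUE, swap the homozygote codes "0" and "2".
# Heterozygotes ("1") and missing data ("_") are left unchanged.
flipPolarity <- function(genotypes, changePolarity) {
  stopifnot(is.logical(changePolarity),
            ncol(genotypes) == length(changePolarity))
  out <- genotypes
  for (j in which(changePolarity)) {
    g <- genotypes[, j]     # read from the original to avoid double flips
    out[g == "0", j] <- "2"
    out[g == "2", j] <- "0"
  }
  out
}

# Hypothetical toy data: 2 individuals x 4 markers
toy <- matrix(c("0", "1", "2", "_",
                "2", "2", "0", "1"), nrow = 2, byrow = TRUE)
flipped <- flipPolarity(toy, changePolarity = c(TRUE, FALSE, TRUE, FALSE))
flipped
```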

# Import polarised genotypes
gen <- importPolarized(
  file = system.file("extdata", "data7x10.txt", package = "diemr"),
  changePolarity = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
  ChosenInds = 1:7
)

# Calculate individual hybrid indices from the imported genotypes 
HI <- apply(gen, 1, function(x) pHetErrOnStateCount(sStateCount(x)))[1, ]
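For diploid data, the hybrid index of an individual is commonly defined as the proportion of its alleles that come from the side coded 2, ignoring missing data: HI = (n1 + 2 n2) / (2 (n0 + n1 + n2)), where nk is the number of genotypes in state k. The sketch below computes this directly in base R. We assume, without having checked the package source, that it corresponds to the first row returned by pHetErrOnStateCount. The function name hybridIndex and the toy data are hypothetical.

```r
# Hypothetical function, not part of diemr: diploid hybrid index as the
# proportion of alleles from the side coded "2"; missing "_" is ignored.
hybridIndex <- function(x) {
  n0 <- sum(x == "0")
  n1 <- sum(x == "1")
  n2 <- sum(x == "2")
  (n1 + 2 * n2) / (2 * (n0 + n1 + n2))
}

# Hypothetical toy genotypes: 3 individuals x 4 markers
toy <- matrix(c("0", "0", "1", "_",
                "0", "1", "2", "2",
                "2", "2", "2", "_"), nrow = 3, byrow = TRUE)

# One hybrid index per individual (row)
toyHI <- apply(toy, 1, hybridIndex)
toyHI
# [1] 0.1666667 0.6250000 1.0000000
```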

6.1.1 Rectangular plot

The rectangular layout is useful for viewing the order of markers along the genome and for comparing individual hybrid indices along the vertical axis. This is the default layout; it requires the matrix of polarised genotypes and a numeric vector of individual hybrid indices.

plotPolarized(genotypes = gen, HI = HI)

Individuals are plotted in order of increasing hybrid index, from bottom to top. A colour change from the bottom to the top of the plot therefore reveals the genomic gradient between the parental sides.
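The ordering of the rectangular layout can be imitated with base R graphics. This is a simplified sketch, not plotPolarized itself: it sorts a hypothetical genotype matrix by hybrid index and draws it with image(), leaving missing data unpainted on the white background. The colours follow the defaults described above.

```r
# Hypothetical toy genotypes: 3 individuals x 4 markers
toy <- matrix(c("2", "2", "2", "1",
                "0", "0", "1", "_",
                "0", "1", "2", "2"), nrow = 3, byrow = TRUE)

# Diploid hybrid index per individual, ignoring missing data (assumed definition)
toyHI <- apply(toy, 1, function(x) {
  (sum(x == "1") + 2 * sum(x == "2")) / (2 * sum(x != "_"))
})

ord <- order(toyHI)                  # lowest hybrid index first
sorted <- toy[ord, , drop = FALSE]

# Encode states as integers for image(); "_" becomes NA and is not painted
codes <- matrix(suppressWarnings(as.integer(sorted)), nrow = nrow(sorted))

# image() draws z[i, j] at (x[i], y[j]); transposing puts markers on the
# x-axis and individuals on the y-axis, with the first row at the bottom
image(
  x = seq_len(ncol(codes)), y = seq_len(nrow(codes)), z = t(codes),
  col = c("purple", "yellow", "green"), breaks = c(-0.5, 0.5, 1.5, 2.5),
  xlab = "SNV", ylab = "Individual (ordered by hybrid index)"
)
```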

6.1.1.1 Adding chromosome information

If the input data were generated from a VCF file, marker positions are already available. The file *-includedSites.txt produced by vcf2diem stores the physical coordinates of each marker, linking plotted markers to their genomic locations. It can be used to draw chromosome names and position ticks beneath the genotype matrix.
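Chromosome boundaries and label positions can be derived from the chromosome name of each plotted marker. The helper chromLayout below is hypothetical; its element names mirror those of the axisInfo list returned by plotMarkerAxis, but how the package computes them is our assumption. Each marker occupies one unit on the x-axis, so boundaries fall at half-integer positions and each name sits midway between its two boundaries.

```r
# Hypothetical helper, not the plotMarkerAxis implementation.
# chrom: chromosome name of each plotted marker, in plotting order;
# marker j is assumed to be drawn at x = j.
chromLayout <- function(chrom) {
  runs <- rle(as.character(chrom))            # consecutive markers per chromosome
  breaks <- c(0, cumsum(runs$lengths)) + 0.5  # boundaries between marker cells
  list(
    CHROMbreaks = breaks,
    CHROMnamesPos = (head(breaks, -1) + tail(breaks, -1)) / 2,  # midpoints
    CHROMnames = runs$values
  )
}

# Made-up example: 8 markers on 5 contigs
layout <- chromLayout(c("c1", "c2", "c2", "c2", "c2", "c3", "c4", "c5"))
layout
```

With markers distributed as 1, 4, 1, 1 and 1 per contig, the sketch gives breaks 0.5, 1.5, 5.5, 6.5, 7.5 and 8.5 and name positions 1.0, 3.5, 6.0, 7.0 and 8.0, the same values as in the axisInfo printout in this section.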

The following example uses a snippet of SNV data on genomic diversity in Myotis bats (Harazim et al. 2021).

# Prepare input data
myo <- system.file("extdata", "myotis.vcf", package = "diemr")
vcf2diem(SNP = myo, filename = "myo")
inds <- 1:14

# Polarise genotypes assuming diploid data
res <- diem("myo-001.txt", ChosenInds = inds, ploidy = FALSE)

# Import polarised genotypes
gen <- importPolarized(
  file = "myo-001.txt",
  changePolarity = res$markerPolarity,
  ChosenInds = inds
)

# Calculate hybrid indices
HI <- apply(gen, 1, function(x) pHetErrOnStateCount(sStateCount(x)))[1, ]

# Plot polarised genotypes with the marker axis 
plotPolarized(
  genotypes = gen,
  HI = HI,
  addMarkerAxis = TRUE,
  includedSites = "myo-includedSites.txt",
  tickDist = 100
)

If the default marker axis needs adjusting, as here, where the accession numbers overlap, call plotMarkerAxis directly. It returns an axisInfo list whose labels can be modified before the axis is redrawn. plotMarkerAxis also accepts additional graphical arguments.

# Draw plot without axis
plotPolarized(
  genotypes = gen,
  HI = HI,
  addMarkerAxis = FALSE,
  xlab = ""
)

# Draw marker axis, and capture axis information
axisInfo <- plotMarkerAxis(includedSites = "myo-includedSites.txt", tickDist = 100)

The axisInfo list now contains five elements with plotting data. Let’s shorten the contig names and convert the tick labels to base pairs (by default, ticks are labelled in millions of base pairs). The modified list can then be passed back to plotMarkerAxis directly, without recalculating the positions of the axis elements from the *-includedSites.txt file.

print(axisInfo)
# $CHROMbreaks
# [1] 0.5 1.5 5.5 6.5 7.5 8.5
# 
# $CHROMnamesPos
# [1] 1.0 3.5 6.0 7.0 8.0
# 
# $CHROMnames
# [1] "KE212673.1" "KE222443.1" "KE222801.1" "KE224361.1" "KE227386.1"
# 
# $ticksPos
# [1] 1.5 6.5 7.5
# 
# $ticksNames
# [1] 1e-04 1e-04 1e-04

# Update list elements
axisInfo$CHROMnames <- c("con1", "con2", "con3", "con4", "con5")
axisInfo$ticksNames <- c(100, 100, 100)

# Plot polarised genotypes with updated marker axis labels
plotPolarized(
  genotypes = gen,
  HI = HI,
  addMarkerAxis = FALSE,
  xlab = ""
)
plotMarkerAxis(axisInfo = axisInfo)
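Instead of typing the replacement labels by hand, they can be derived from the existing list elements. The sketch below uses a stand-in list built from the values printed above (only the two elements being modified), so it runs without diemr. With real data, apply the two assignments to the axisInfo returned by plotMarkerAxis.

```r
# Stand-in for the plotMarkerAxis output, copied from the printed axisInfo
# (hypothetical subset: only the elements modified below)
axisInfo <- list(
  CHROMnames = c("KE212673.1", "KE222443.1", "KE222801.1",
                 "KE224361.1", "KE227386.1"),
  ticksNames = c(1e-04, 1e-04, 1e-04)
)

# Short contig labels generated from the number of contigs
axisInfo$CHROMnames <- paste0("con", seq_along(axisInfo$CHROMnames))

# Tick labels are in millions of base pairs; multiply by 1e6 and round
# to avoid floating-point residue in the labels
axisInfo$ticksNames <- round(axisInfo$ticksNames * 1e6)

axisInfo$CHROMnames  # "con1" "con2" "con3" "con4" "con5"
axisInfo$ticksNames  # 100 100 100
```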

6.2 Circular (iris) plot

The circular layout displays the same information radially, emphasising the continuity of ancestry along chromosomes. Individuals are ordered from the centre (lowest hybrid index) to the periphery (highest).
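The geometry of an iris-style layout can be sketched by mapping markers to angles and individuals to concentric rings. This is a simplified model under our own assumptions, not the plotPolarized implementation: marker j of m is placed at angle 2*pi*(j - 0.5)/m, and the individual with the k-th lowest hybrid index is placed on ring k, so the lowest hybrid index lies closest to the centre. The function irisCoordinates and its innerRadius argument are hypothetical.

```r
# Hypothetical function, not part of diemr: polar and Cartesian coordinates
# for every (marker, individual) cell of a simplified iris layout.
irisCoordinates <- function(nMarkers, HI, innerRadius = 1) {
  angle <- 2 * pi * (seq_len(nMarkers) - 0.5) / nMarkers  # one angle per marker
  ring <- rank(HI, ties.method = "first")                 # 1 = lowest hybrid index
  radius <- innerRadius + ring - 1                        # centre -> periphery
  grid <- expand.grid(marker = seq_len(nMarkers), individual = seq_along(HI))
  grid$angle <- angle[grid$marker]
  grid$radius <- radius[grid$individual]
  grid$x <- grid$radius * cos(grid$angle)
  grid$y <- grid$radius * sin(grid$angle)
  grid
}

# Hypothetical example: 4 markers, 3 individuals
coords <- irisCoordinates(nMarkers = 4, HI = c(0.9, 0.1, 0.5))
head(coords)
```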

plotPolarized(genotypes = gen, HI = HI, type = "circular")
plotMarkerAxis(axisInfo = axisInfo, labels.facing = "in", major.tick.length = 1)

The marker axis in circular plots is drawn analogously to rectangular plots (Chapter 6.1.1.1), but chromosome names and tick marks are shown as separate elements. Chromosome names appear in an inner track with alternating white and grey shading, while tick marks showing distances along each chromosome are drawn on the outer rim of the genotype rings.

This layout is compact for displaying genomes of many individuals and highlights symmetry or asymmetry between parental contributions. However, circular plotting of large datasets (more than \(10^5\) markers) can be slow. To monitor progress, set showProgress = TRUE in plotPolarized. This prints the percentage of plotted individuals to the standard output during rendering.

References

Harazim, Markéta, Lubomír Piálek, Jiri Pikula, Veronika Seidlová, Jan Zukal, Erik Bachorec, Tomáš Bartonička, Tomasz Kokurewicz, and Natália Martínková. 2021. “Associating Physiological Functions with Genomic Variability in Hibernating Bats.” Evolutionary Ecology 35: 291–308. https://doi.org/10.1007/s10682-020-10096-4.