25 May 2022

Haplotype-resolved petunia assembly and a pangenomics approach to analyze pangenome diversity

Conventional genome assemblies collapse the genetic information of a diploid or polyploid individual into a single-haplotype representation. Such a representation is useful for some downstream analyses, such as variant discovery, but it also has disadvantages. Most importantly, it omits the information on the alternative haplotype(s), which might be important for traits of interest. This is particularly inconvenient for species that harbor large genetic diversity, as well as for traits that tend to be associated with complex regions (e.g. resistance-related genes).

To overcome the limitations of a conventional (haploidized) assembly, we generated a haplotype-resolved assembly of the most important ornamental nightshade, petunia (Petunia hybrida), using the latest PacBio HiFi sequencing technology combined with a Phase Genomics Hi-C kit for scaffolding. The haplotype-resolved assembly comprises two sets of seven chromosomes (2n = 14), with each haplotype approximately 1.3 Gb in size, as well as a chloroplast and a mitochondrion assembly. Remarkably, the PacBio HiFi data in combination with Hi-C achieved higher contiguity than the gold-standard trio-binning approach, which uses sequencing data from the parents of the sequenced P. hybrida individual. Neither assembly showed evidence of haplotype switches.
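
Contiguity comparisons like the one above are commonly summarized with the N50 statistic: the length of the contig or scaffold at which the cumulative length, counted from the longest sequence downward, first reaches half of the total assembly size. The post does not state which metric was used, so the sketch below is an illustration under that assumption, and the function name and example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical example lengths (not taken from the petunia assembly).
example_lengths = [10, 8, 6, 4, 2]
print(n50(example_lengths))
```

A higher N50 for the same total assembly size means that the sequence is contained in fewer, longer pieces.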

Based on the assembly statistics, the haplotype-resolved genome assembly was a remarkable success. However, a genome assembly is only as valuable as the information that we can derive from it. We therefore annotated the petunia assembly and integrated it into a pangenome with two publicly available (haploidized) genome assemblies, of P. axillaris and P. inflata. Using the graph-based pangenomics toolkit PanTools, we analyzed gene presence/absence polymorphisms and found species-specific regions as well as larger structural variants. For this analysis, having both haplotypes of the highly heterozygous P. hybrida sample provides crucial information on candidate genes for a number of traits.
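
To make the idea of gene presence/absence polymorphisms concrete, the sketch below classifies groups of homologous genes as core (present in every genome), unique (present in exactly one) or accessory (anything in between). It is a minimal illustration of the concept in plain Python, not the PanTools implementation or API; the genome labels, group identifiers and function name are hypothetical.

```python
def classify_groups(groups, genomes):
    """Classify gene groups by presence across genomes.

    groups  -- dict mapping a group id to the set of genome names in
               which at least one member gene was found
    genomes -- iterable with the names of all genomes in the pangenome
    """
    all_genomes = set(genomes)
    classes = {}
    for group_id, present_in in groups.items():
        present = set(present_in) & all_genomes
        if not present:
            classes[group_id] = "absent"
        elif present == all_genomes:
            classes[group_id] = "core"
        elif len(present) == 1:
            classes[group_id] = "unique"
        else:
            classes[group_id] = "accessory"
    return classes


# Hypothetical labels: two P. hybrida haplotypes plus two haploidized assemblies.
genomes = ["hybrida_hap1", "hybrida_hap2", "axillaris", "inflata"]
groups = {
    "group_1": {"hybrida_hap1", "hybrida_hap2", "axillaris", "inflata"},
    "group_2": {"hybrida_hap1", "axillaris"},
    "group_3": {"hybrida_hap2"},
}
print(classify_groups(groups, genomes))
```

In this toy input, group_3 is found on only one of the two P. hybrida haplotypes, which is the kind of difference a haploidized assembly cannot represent.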

This work was presented by our colleague Bart Nijland at the AGBT Ag meeting in San Diego on April 4, 2022, and by Peter van Dam at Plant Genomes Online on April 28, 2022.

Other innovations

Genotyping array design

Tailored marker selection allowing for genotyping regardless of species, genome size and ploidy level.

14 Mar. 2021

Association analysis

Advanced statistical models are needed to increase the power to discover (new) associations between genetic and phenotypic variation.

1 Mar. 2021

Lab automation

With increasing sample numbers comes the need for automating the most valuable and most often applied protocols at Genetwister.

14 Mar. 2021

Marker discovery

Discovery of markers in all plant species by next-generation sequencing for accelerated plant breeding.

28 Feb. 2021

Long Read Sequencing

Using state-of-the-art long-read sequencing technologies, Genetwister builds high-quality genome assemblies of novel species and obtains long-range genomic information such as haplotypes and structural variants.

14 Mar. 2021

Experimental Design

Appropriate experimental design and data analysis of breeding trials are essential. Genetwister offers assistance with experimental design, taking into account all practical considerations of the breeding program.

1 Mar. 2021

High-throughput sequencing and genotyping

With increasing sample numbers and marker panel sizes, it is valuable to increase the throughput of sequencing and genotyping procedures. Genetwister develops cost-effective methods and applies existing ones to enable high-throughput genomics.

4 Feb. 2021