Genome Duplication and Immune Expansion: Comparative Dynamics of NBS Disease Resistance Genes in Diploid and Polyploid Plants

Ellie Ward, Dec 02, 2025

Abstract

This article comprehensively explores the expansion and evolutionary dynamics of Nucleotide-Binding Site (NBS) disease resistance genes in diploid versus polyploid plants. Drawing from recent genomic studies across diverse species, we examine how whole-genome duplication events shape the repertoire, architecture, and functional diversification of this critical gene family. The scope encompasses foundational principles of NBS gene classification, methodologies for genome-wide identification and analysis, troubleshooting complexities in polyploid genomes, and validation through comparative genomics and expression profiling. Aimed at researchers and scientists in genetics and drug development, this review synthesizes current evidence to elucidate the genetic trade-offs between diploid and polyploid strategies for pathogen resistance, offering insights for future crop improvement and biomedical analogies.

The Plant Immune Repertoire: Understanding NBS Gene Diversity and Ploidy-Based Expansion

Plants have evolved a sophisticated, two-layered innate immune system to defend against diverse pathogens [1]. The first layer, Pattern-Triggered Immunity (PTI), is initiated by cell-surface receptors that recognize conserved microbial patterns. The second layer, Effector-Triggered Immunity (ETI), is mediated by intracellular resistance (R) proteins that detect specific pathogen effector proteins, activating a stronger immune response often accompanied by a hypersensitive response (HR) and programmed cell death (PCD) [1]. The Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR or NLR) gene family constitutes the largest and most comprehensively studied class of these R proteins, with approximately 80% of cloned R genes belonging to this family [1] [2]. These intracellular immune receptors function as specialized sensors that directly or indirectly recognize pathogen-encoded effectors, triggering robust defense signaling cascades [3].

Domain Architecture and Classification of NBS-LRR Proteins

NBS-LRR proteins are modular, typically comprising three core domains: a variable N-terminal domain, a central Nucleotide-Binding Site (NBS) domain, and a C-terminal Leucine-Rich Repeat (LRR) domain [3] [2].

Protein Domains and Functions

N-Terminal Domain: This domain determines the major NBS-LRR subclasses. The two primary types are the Toll/Interleukin-1 Receptor (TIR) homology region (TNL) and the Coiled-Coil (CC) motif (CNL). A third, smaller subclass features an N-terminal Resistance to Powdery Mildew 8 (RPW8) domain (RNL) [1] [2]. The N-terminal domain is primarily involved in downstream signal transduction [3].
Central NBS Domain: Also known as the NB-ARC domain, this is a conserved nucleotide-binding adaptor shared by APAF-1, plant R proteins, and CED-4. It contains characteristic motifs for binding and hydrolyzing ATP/GTP, functioning as an ADP-ATP switch system that regulates the protein's ON/OFF state [1] [3] [2].
C-Terminal LRR Domain: This domain adopts a slender, arc-shaped structure that maximizes surface area for protein-protein interactions [4]. It is primarily involved in pathogen recognition and also interacts with the NBS domain to maintain the protein in an auto-inhibited state before pathogen perception [3].

Classification and Genomic Distribution

NBS-LRR genes are classified based on their domain composition. Proteins with all three domains (N-terminus, NBS, LRR) are termed "typical," while those lacking one or more domains (e.g., NBS-only, TIR-NBS, CC-NBS) are "atypical" and often function as adaptors or regulators [1] [5]. The table below summarizes the classification and distribution of NBS-LRR genes across various plant species.

Table 1: Classification and Count of NBS-LRR Genes in Selected Plant Species

Species	Total NBS Genes	TNL	CNL	RNL	Atypical	Reference
Arabidopsis thaliana	149	83	51	4	58	[6]
Salvia miltiorrhiza	196	2	75	1	118	[1]
Nicotiana benthamiana	156	5	25	4	122	[5]
Oryza sativa (Rice)	505	0	Predominant	Limited	Not Specified	[1] [2]
Solanum tuberosum (Potato)	447	Not Specified	Not Specified	Not Specified	Not Specified	[1]

The number of NBS-LRR genes varies significantly between species, reflecting adaptations to different pathogenic environments [3]. Notably, TNL genes are absent in monocots like rice and wheat, while they are present in eudicots like Arabidopsis [1] [3] [2].

Molecular Mechanisms of Pathogen Recognition and Signaling

NBS-LRR proteins activate defense responses upon detection of pathogen effectors. The current model involves switching from an ADP-bound (OFF) state to an ATP-bound (ON) state, triggering a conformational change that activates downstream signaling [5].

Recognition Strategies

NBS-LRR proteins employ two primary strategies for pathogen sensing:

Direct Recognition: The LRR domain of the NBS-LRR protein physically interacts with the pathogen effector. The first validated case was the interaction between the LRR domain of the rice R protein Pi-ta and the fungal effector Avr-Pita [4] [1].
Indirect Recognition (Guard Hypothesis): The NBS-LRR protein "guards" a host protein that is the target of a pathogen effector. The NBS-LRR protein detects the modification of this host target by the effector and subsequently activates defense signaling [6].

Signaling and Downstream Defense Activation

Upon activation, the N-terminal domain transduces the defense signal. TNL and CNL proteins often initiate distinct but converging signaling pathways. Key downstream events include:

Transcriptional Reprogramming: Activation of defense-related genes.
Hypersensitive Response (HR): Localized programmed cell death at the infection site to restrict pathogen spread [4] [6].
Systemic Acquired Resistance (SAR): Establishment of long-lasting, broad-spectrum resistance throughout the plant.

Recent studies show that PTI and ETI can act synergistically to enhance plant immunity [1].

Evolution and Genomic Architecture of NBS-LRR Genes

Evolutionary Origins and Dynamics

NBS-LRR genes are ancient, with origins predating the emergence of land plants. The central NB domain is found in proteins from bacteria, protists, and algae, where it is associated with domains like WD40 or TPR [3]. The recombination of the NB domain with the LRR domain is believed to have occurred in the ancestors of green plants, creating the core structure of the NLR immune receptor [3]. This gene family has been shaped by a continuous "arms race" with rapidly evolving pathogens, leading to extraordinary diversification [4] [3].

The evolution of NBS-LRR genes is characterized by birth-and-death dynamics, where new genes are created by duplication, and existing genes are lost or become pseudogenes [3]. These genes are under strong diversifying selection, particularly in the LRR region, where positive selection acts on solvent-exposed residues to alter recognition specificities [4]. This allows plants to keep pace with evolving pathogen effectors.

Genomic Organization and the Context of Ploidy

A hallmark of NBS-LRR genes is their non-random clustered organization in plant genomes [6] [3]. These clusters can be homogeneous (containing similar NLR types) or heterogeneous (containing different NLR classes or even other receptor types like RLPs and RLKs) [3]. This arrangement facilitates the generation of new resistance specificities through mechanisms such as unequal crossing-over, gene conversion, and ectopic recombination [4] [6].

This genomic architecture bears directly on NBS-LRR gene expansion in diploid versus tetraploid plants. Polyploidization, or whole-genome duplication (WGD), is a major evolutionary force that provides a reservoir of duplicated genes, including NBS-LRRs [2]. In the allotetraploid genome of cotton (Gossypium hirsutum), for example, the evolution of NBS-LRR sequences after separation from its diploid parents (G. raimondii and G. arboreum) was influenced by "polyploidisation, natural and artificial selection, hybrid necrosis, duplication and recombination" [7]. These processes can lead to the shedding of redundant genes and the evolution of new ones, shaping the disease resistance profile of the polyploid. Comparative analyses suggest that the NBS-LRR repertoire in tetraploid cotton evolved through the gradual accumulation of mutations and positive selection, leading to a slow rate of divergence from its diploid progenitors [7]. The interplay between WGD and small-scale duplications (e.g., tandem duplications) is a key driver of the complex and dynamic NLR repertoires observed in plants today [2].

Experimental Protocols for NBS-LRR Gene Identification and Functional Analysis

Genome-Wide Identification and Phylogenetic Analysis

Objective: To identify all NBS-LRR encoding genes in a plant genome and determine their evolutionary relationships [1] [2] [5].

Methodology:

Sequence Retrieval: Obtain the complete genome sequence and annotation file for the target species.
HMMER Search: Use the HMMER software package (e.g., hmmsearch) with the hidden Markov model (HMM) of the NBS (NB-ARC) domain (Pfam: PF00931) to scan the proteome. Use a strict E-value cutoff (e.g., < 1e-20) to identify candidate genes [6] [5].
Domain Validation: Submit the candidate protein sequences to domain databases (Pfam, SMART, CDD) to confirm the presence and completeness of the NBS domain and identify associated domains (TIR, CC, LRR, RPW8) [5].
Phylogenetic Tree Construction:
- Perform multiple sequence alignment of the NBS domains using tools like ClustalW or MAFFT [5].
- Construct a phylogenetic tree using a maximum-likelihood method (e.g., in MEGA7, or the approximate maximum-likelihood approach of FastTree) and assess node reliability with bootstrap support (e.g., 1000 replicates) or, for FastTree, its SH-like local support values [5].
- Classify genes into clades (CNL, TNL, RNL) based on their phylogenetic grouping and domain composition [1] [5].
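
The HMMER search step above can be sketched as a small post-processing script. This is a minimal sketch, assuming the search was run with hmmsearch and its `--tblout` option against the PF00931 profile; the gene identifiers and scores in the sample rows are hypothetical, and the 1e-20 threshold simply mirrors the example cutoff given in the protocol.

```python
def filter_tblout(lines, max_evalue=1e-20):
    """Return {target_name: evalue} for hits passing the full-sequence E-value cutoff.

    Expects lines in the whitespace-delimited table format written by
    `hmmsearch --tblout`, where column 1 is the target name and column 5
    is the full-sequence E-value. Comment lines start with '#'.
    """
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target = fields[0]
        evalue = float(fields[4])
        if evalue < max_evalue:
            # Keep the best (smallest) E-value if a target appears more than once.
            if target not in hits or evalue < hits[target]:
                hits[target] = evalue
    return hits


# Hypothetical rows for illustration only; real output comes from hmmsearch.
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias ...
geneA.1 - NB-ARC PF00931 3.2e-65 218.4 0.1 4.1e-65 218.0 0.1 1.0 1 0 0 1 1 1 1 hypothetical protein
geneB.1 - NB-ARC PF00931 7.5e-12 44.9 0.0 9.8e-12 44.5 0.0 1.1 1 0 0 1 1 1 1 hypothetical protein
geneC.1 - NB-ARC PF00931 1.4e-31 108.2 0.3 2.0e-31 107.7 0.3 1.0 1 0 0 1 1 1 1 hypothetical protein
""".splitlines()

candidates = filter_tblout(SAMPLE_TBLOUT)
print(sorted(candidates))
```

Candidates retained at this stage would still go through the domain validation step before classification.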

Functional Validation via Virus-Induced Gene Silencing (VIGS)

Objective: To determine the functional role of a specific NBS-LRR gene in disease resistance [2].

Methodology:

Vector Construction: Clone a 200-500 base pair fragment of the candidate NBS-LRR gene into a VIGS vector (e.g., TRV-based pYL156 vector).
Plant Infiltration: Introduce the recombinant VIGS vector into Agrobacterium tumefaciens and infiltrate the bacteria into the leaves of resistant plants. Control plants are infiltrated with an empty vector.
Phenotypic Analysis:
- After successful gene silencing (knockdown), challenge the plants with the target pathogen.
- Monitor for a loss of resistance phenotype (e.g., increased disease symptoms or higher viral titer) in the silenced plants compared to controls.
- Use qRT-PCR to confirm the reduction in transcript levels of the target NBS-LRR gene and to quantify pathogen biomass [2].
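
The qRT-PCR confirmation step can be illustrated with a short calculation. The source does not state which quantification method was used; the sketch below assumes the widely used 2^-ΔΔCt approach with roughly 100% primer efficiency, and all Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative transcript level of a target gene by the 2^-ddCt method.

    Each argument is a mean Ct value; 'ref' is a reference (housekeeping) gene.
    Assumes near-100% amplification efficiency for both primer pairs.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: silenced plants vs. empty-vector controls.
silenced_level = relative_expression(
    ct_target_sample=27.0, ct_ref_sample=18.0,
    ct_target_control=25.0, ct_ref_control=18.0,
)
print(silenced_level)  # 0.25, i.e. about a 75% reduction in transcript level
```

A value well below 1 in silenced plants relative to controls is what one would expect if the knockdown worked; the cutoff for "successful" silencing is a judgment call for the experimenter.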

Table 2: Key Reagents and Resources for NBS-LRR Research

Reagent/Resource	Function/Application	Example Tools/Databases
HMM Profile (PF00931)	Identifies the conserved NBS domain in protein sequences during HMMER searches.	Pfam Database [5]
Genome Databases	Source of genomic sequences and annotations for gene identification and comparative analysis.	NCBI, Phytozome, Plaza [2]
Domain Databases	Validates the presence and structure of protein domains (TIR, CC, LRR, NBS).	SMART, Conserved Domain Database (CDD), Pfam [5]
VIGS Vectors	Allows transient silencing of target genes to assess function in plant-pathogen interactions.	TRV (Tobacco Rattle Virus)-based vectors [2]
Cis-Element Analysis Tools	Identifies potential regulatory elements in promoter regions of NBS-LRR genes.	PlantCARE Database [5]

NBS-LRR genes are the cornerstone of the plant innate immune system, providing a highly adaptable and diverse arsenal against pathogens. Their unique domain architecture, coupled with a dynamic genomic organization that is significantly influenced by evolutionary pressures including polyploidy, allows for rapid adaptation. Understanding the molecular mechanisms of NBS-LRR function and evolution, especially in the context of ploidy, is crucial for developing future strategies in crop improvement and disease resistance breeding. The experimental frameworks and resources outlined here provide a foundation for ongoing research into this critical gene family.

Nucleotide-binding site-leucine-rich repeat (NBS-LRR) genes represent the largest family of plant disease resistance (R) genes, encoding intracellular immune receptors that play a crucial role in pathogen detection and defense activation [8]. These genes are fundamental to the plant immune system, enabling recognition of diverse pathogens including fungi, bacteria, and viruses [9]. During plant evolution, NBS-LRR genes have undergone significant expansion, creating complex families that vary considerably between species—a diversity particularly evident when comparing diploid and tetraploid plants [10]. The genomic architecture and evolutionary dynamics of these genes are therefore essential for understanding plant-pathogen interactions and developing disease-resistant crops.

Plant NBS-LRR proteins are modular in structure and can be classified into distinct subfamilies based on their N-terminal domains: TNL (Toll/Interleukin-1 receptor-NBS-LRR), CNL (Coiled-Coil-NBS-LRR), and RNL (RPW8-NBS-LRR) [11] [9]. All three subfamilies share a central nucleotide-binding site (NBS) domain and C-terminal leucine-rich repeats (LRRs), but differ in their N-terminal signaling domains, which dictates their specific functions in immune signaling [12]. The distribution and expansion of these subfamilies vary dramatically across plant genomes, with important implications for disease resistance mechanisms in both diploid and polyploid species [13] [10].

Structural Domains and Classification

Core Architectural Components

All functional NBS-LRR proteins contain three fundamental domains that work in concert to mediate pathogen recognition and immune activation:

N-terminal Domain: Serves as the signaling module and defines the primary subfamily classification. TNLs contain a TIR (Toll/Interleukin-1 Receptor) domain, CNLs feature a Coiled-Coil (CC) domain, while RNLs possess an RPW8 (Resistance to Powdery Mildew 8) domain [11] [9].
Central NBS (NB-ARC) Domain: Functions as a molecular switch by binding and hydrolyzing nucleotides (ATP/GTP). This domain contains several conserved motifs (P-loop, kinase-2, kinase-3a, GLPL, and MHDL) that facilitate nucleotide-dependent conformational changes [14] [15].
C-terminal LRR Domain: Composed of multiple leucine-rich repeats that form a solenoid structure involved in protein-protein interactions, primarily responsible for direct or indirect pathogen effector recognition [8].

Table 1: Core Domains of Plant NBS-LRR Proteins

Domain	Structural Features	Functional Role	Conserved Motifs
TIR	Flavodoxin-like fold with 5 α-helices surrounding 5 β-strands [12]	Signal transduction; self-association for signaling complex formation [12]	Defined surfaces on αA, αD, and αE helices for interaction [12]
CC	Largely helical structure; specific architecture debated [12]	Signal transduction; some involved in effector perception [12]	EDVID motif in the CC-EDVID class [12]
RPW8	Compact helical domain [9]	Downstream signal transduction; helper function [11] [9]	Shared similarity with RPW8 proteins [9]
NBS (NB-ARC)	STAND family ATPase; nucleotide-binding pocket [11]	Molecular switch; ADP/ATP exchange triggers activation [8] [15]	P-loop, kinase-2, kinase-3a, GLPL, MHDL [14]
LRR	Solenoid structure with parallel β-sheet lining inner surface [8]	Effector recognition; autoinhibition in resting state [8]	Variable leucine-rich repeats determine specificity [8]

Classification System and Domain Architectures

Beyond the three major subfamilies (TNL, CNL, RNL), NBS-encoding genes are further categorized based on their domain combinations, resulting in eight distinct structural types as systematically identified across multiple plant genomes [10]:

Table 2: Classification of NBS-Encoding Genes Based on Domain Architecture

Gene Type	Domain Structure	Representative Species and Counts	Functional Notes
TNL	TIR-NBS-LRR	G. barbadense: 44 genes [10]	Sensor NLRs; direct pathogen detection [11]
CNL	CC-NBS-LRR	G. hirsutum: 165 genes [10]	Sensor NLRs; direct or indirect pathogen detection [11]
RNL	RPW8-NBS-LRR	G. barbadense: 9 genes [10]	Helper NLRs; signal amplification [11]
TN	TIR-NBS	G. raimondii: 14 genes [10]	Truncated forms; potential regulatory functions
CN	CC-NBS	G. hirsutum: 89 genes [10]	Truncated forms; potential regulatory functions
RN	RPW8-NBS	G. barbadense: 2 genes [10]	Truncated forms; potential regulatory functions
NL	NBS-LRR	G. barbadense: 210 genes [10]	Lack defined N-terminal domain
N	NBS	G. barbadense: 171 genes [10]	Minimal structure; possible signaling components
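
The eight structural types in the table follow mechanically from which domains are detected in each protein. As a minimal sketch, assuming domain calls have already been made by the Pfam and coiled-coil searches, the mapping could look like this. The precedence order used when more than one N-terminal domain is found (TIR, then CC, then RPW8) is an assumption for illustration, not a rule from the cited studies.

```python
def classify_nbs_gene(domains):
    """Map a set of detected domains to one of the eight NBS gene types.

    `domains` is an iterable of labels drawn from
    {"TIR", "CC", "RPW8", "NBS", "LRR"}. Returns None if no NBS domain
    is present, since such a protein is not an NBS-encoding gene.
    """
    found = set(domains)
    if "NBS" not in found:
        return None
    # Assumed precedence if several N-terminal domains are detected.
    if "TIR" in found:
        prefix = "T"
    elif "CC" in found:
        prefix = "C"
    elif "RPW8" in found:
        prefix = "R"
    else:
        prefix = ""
    suffix = "L" if "LRR" in found else ""
    return prefix + "N" + suffix


for example in (["TIR", "NBS", "LRR"], ["CC", "NBS"], ["NBS", "LRR"], ["NBS"]):
    print(example, "->", classify_nbs_gene(example))
```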

Diagram 1: NBS-LRR protein domain architecture showing the three major subfamilies with their characteristic N-terminal domains (TIR, CC, RPW8), central NBS domain, and C-terminal LRR domain.

Genomic Distribution and Expansion Patterns

Variation Across Plant Genomes

The distribution of NBS gene subfamilies exhibits remarkable variation across plant species, reflecting different evolutionary paths and adaptation to distinct pathogen pressures:

Table 3: Comparative Distribution of NBS Gene Subfamilies Across Plant Species

Plant Species	Ploidy	Total NBS	TNL	CNL	RNL	Key Features
Arabidopsis thaliana	Diploid	~200	Present	Present	Present	Model dicot with all subfamilies [14]
Helianthus annuus (Sunflower)	Diploid	352	77	100	13	All chromosomes; clusters on chromosome 13 [14]
Akebia trifoliata	Diploid	73	19	50	4	Low NBS count; CNL-dominated [9]
Dioscorea rotundata (Yam)	Diploid	167	0	166	1	Monocot; lacks TNL genes [13]
Gossypium hirsutum (Cotton)	Allotetraploid	588	5	165	6	Inherited mainly from G. arboreum [10]
Gossypium barbadense (Cotton)	Allotetraploid	682	44	143	9	Inherited mainly from G. raimondii [10]
Solanum lycopersicum (Tomato)	Diploid	238	~58	~87	~13	Clustered distribution [16]

Impact of Polyploidization on NBS Gene Expansion

Whole genome duplication (polyploidization) events have profoundly influenced NBS gene evolution, creating opportunities for functional diversification:

Allotetraploid Cotton: Comparative analysis of diploid and allotetraploid cotton species reveals asymmetric evolution of NBS-encoding genes. G. hirsutum inherited more NBS genes from its A-genome diploid progenitor (G. arboreum), while G. barbadense inherited more from its D-genome progenitor (G. raimondii) [10]. This asymmetric inheritance correlates with disease resistance patterns, as G. raimondii and G. barbadense show stronger resistance to Verticillium wilt compared to G. arboreum and G. hirsutum [10].
Differential Subfamily Expansion: Across the diploid and tetraploid cotton species compared, TNL genes show the most dramatic proportional changes: their share of the NBS repertoire is about 7-fold higher in G. raimondii and G. barbadense than in G. arboreum and G. hirsutum [10]. This suggests TNLs may play significant roles in specific disease resistances, particularly against Verticillium wilt.
Genomic Clustering: NBS genes are typically distributed non-randomly across chromosomes, often forming gene clusters [14] [16]. In sunflower, one-third of NBS gene clusters are located on a single chromosome (chromosome 13) [14], while in tomato, approximately 58% of NBS genes form multiple gene clusters [16]. This clustered organization facilitates the generation of sequence diversity through recombination and unequal crossing over.
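
The "about 7-fold" figure for TNL proportions can be checked against the counts in Table 3. The sketch below uses only the two allotetraploid rows, because the table does not list counts for the diploid progenitors; the variable and function names are illustrative.

```python
# Counts taken from Table 3 (total NBS genes and TNL genes per species).
counts = {
    "G. hirsutum": {"total": 588, "TNL": 5},
    "G. barbadense": {"total": 682, "TNL": 44},
}


def tnl_fraction(species):
    """Share of the NBS repertoire made up of TNL genes."""
    entry = counts[species]
    return entry["TNL"] / entry["total"]


fold = tnl_fraction("G. barbadense") / tnl_fraction("G. hirsutum")
print(f"G. hirsutum TNL share:   {tnl_fraction('G. hirsutum'):.2%}")
print(f"G. barbadense TNL share: {tnl_fraction('G. barbadense'):.2%}")
print(f"Fold difference:         {fold:.1f}")
```

With these counts the TNL share is roughly 0.85% in G. hirsutum and 6.45% in G. barbadense, a difference of about 7.6-fold, which is consistent with the reported trend.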

Functional Mechanisms in Plant Immunity

Sensor and Helper NLR Paradigm

Recent structural and functional studies have clarified the distinct roles of different NLR subfamilies in plant immunity:

Sensor NLRs (TNLs and CNLs): Function primarily in pathogen recognition through either direct effector binding or indirect monitoring of host proteins [11] [8]. Upon activation, they undergo conformational changes that enable the formation of oligomeric complexes called resistosomes [11] [15].
Helper NLRs (RNLs): Comprise two lineages, NRG1 (N requirement gene 1) and ADR1 (activated disease resistance 1) [9]. RNLs function downstream of sensor NLRs, transducing immune signals and amplifying defense responses [11]. They are essential for TNL signaling, with NRG1 proteins acting specifically in TNL signal transduction [13].

Activation Mechanisms and Signaling Pathways

NBS-LRR proteins employ sophisticated molecular mechanisms for pathogen perception and immune activation:

Direct Recognition: Some NBS-LRR proteins physically bind pathogen effectors through their LRR domains. Examples include the rice Pi-ta protein binding to the Magnaporthe grisea effector AVR-Pita, and flax L proteins interacting with the rust fungus AvrL567 effectors [8].
Indirect Recognition (Guard Hypothesis): Many NBS-LRR proteins monitor the integrity of host "guardee" proteins that are targeted by pathogen effectors. The Arabidopsis RIN4 protein is guarded by multiple NLRs (RPM1, RPS2) and is modified by different bacterial effectors (AvrRpm1, AvrB, AvrRpt2) [8].
Integrated Decoy Domains: Some NLRs incorporate additional domains that mimic authentic pathogen targets, enabling direct effector recognition while avoiding manipulation of true host targets [11].

Diagram 2: NLR signaling paradigm showing sensor NLRs (TNLs, CNLs) detecting pathogen effectors and signaling through helper NLRs (RNLs) to activate defense responses.

Experimental Approaches and Methodologies

Genome-Wide Identification Protocols

Standardized bioinformatics pipelines have been established for comprehensive identification and classification of NBS-encoding genes:

Sequence Retrieval: Obtain complete genome sequences and annotated protein datasets from relevant databases (Phytozome, NCBI) [14].
Domain Identification: Perform HMMER searches using the hidden Markov model of the NB-ARC domain (PF00931) as the query, with a permissive E-value cutoff of 1.0 [9]. Additional domain identification includes:
- TIR domain: PF01582
- RPW8 domain: PF05659
- LRR domain: PF08191
- CC domains: Identified using Coiled-coil prediction tools with threshold 0.5 [9] [17]
Classification and Validation: Classify genes based on domain architecture and validate using Pfam database (E-value 10^-4) to confirm presence of conserved NBS domain [9].
Genomic Distribution Mapping: Map chromosomal locations and identify gene clusters (tandemly arranged homologous genes) and singletons (isolated genes) [14] [9].
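
The mapping step can be sketched as a grouping of genes by chromosomal position. The text defines clusters as tandemly arranged homologous genes but gives no distance criterion, so the 200 kb maximum gap used here is a hypothetical stand-in, as are the gene names and coordinates; homology between neighbours is not checked in this simplified version.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=200_000):
    """Split NBS genes into clusters and singletons by genomic proximity.

    `genes` is a list of (gene_id, chromosome, start) tuples. Genes on the
    same chromosome whose start positions lie within `max_gap` bp of the
    previous gene are chained into one group. Groups of two or more genes
    are reported as clusters; the rest are singletons.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    clusters, singletons = [], []
    for chrom in sorted(by_chrom):
        group, prev_start = [], None
        for start, gene_id in sorted(by_chrom[chrom]):
            if group and start - prev_start > max_gap:
                # Gap too large: close the current group and start a new one.
                (clusters if len(group) > 1 else singletons).append(group)
                group = []
            group.append(gene_id)
            prev_start = start
        (clusters if len(group) > 1 else singletons).append(group)
    return clusters, [group[0] for group in singletons]


# Hypothetical coordinates for illustration.
example_genes = [
    ("nbs1", "chr1", 100_000), ("nbs2", "chr1", 180_000), ("nbs3", "chr1", 900_000),
    ("nbs4", "chr2", 50_000), ("nbs5", "chr2", 120_000), ("nbs6", "chr2", 260_000),
]
clusters, singletons = find_clusters(example_genes)
print(clusters)    # [['nbs1', 'nbs2'], ['nbs4', 'nbs5', 'nbs6']]
print(singletons)  # ['nbs3']
```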

The Scientist's Toolkit: Essential Research Reagents

Table 4: Key Research Reagents and Resources for NBS Gene Studies

Reagent/Resource	Function/Application	Example Usage
HMMER Suite	Hidden Markov Model profiling for domain identification	Identifying NB-ARC domains in protein sequences [14] [9]
Pfam Database	Curated database of protein families and domains	Verifying NBS domain presence (PF00931) [9]
Coiled-coil Prediction Tools	Computational prediction of coiled-coil domains	Identifying CC domains in CNL proteins [9] [17]
Phytozome/NCBI Databases	Genomic data repositories	Retrieving genome sequences and annotations [14]
MEME Suite	Motif-based sequence analysis tools	Identifying conserved motifs in NBS domains [9]
RNA-seq Data	Transcriptome profiling	Analyzing expression patterns of NBS genes [9] [13]

Evolutionary Perspectives in Diploid vs. Tetraploid Plants

The evolutionary dynamics of NBS genes differ significantly between diploid and polyploid plants, with important implications for disease resistance:

Differential Selection Pressure: Analysis of homologous NBS genes in tomato revealed that most experience purifying selection (Ka/Ks < 1), conserving their functional roles while allowing for diversification [16].
Expansion Mechanisms: Tandem and dispersed duplications are the primary forces driving NBS gene expansion. In Akebia trifoliata, these mechanisms produced 33 and 29 genes respectively [9], while in yam, tandem duplication served as the major force for cluster arrangement despite no whole-genome duplication [13].
Subfamily-Specific Evolutionary Patterns: TNL genes show the most dramatic variation between species, being completely absent in monocots [13], while showing 7-fold proportional differences between cotton species [10]. This suggests distinct evolutionary constraints and adaptive trajectories for each subfamily.
Expression Divergence: NBS genes typically show low basal expression with tissue-specific patterns [14] [9] [13]. In Dioscorea rotundata, tubers and leaves display relatively higher NBS gene expression than stems and flowers [13], while in Akebia trifoliata, certain NBS genes show increased expression during later fruit development stages in rind tissues [9].
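
The Ka/Ks criterion mentioned above reduces to a simple comparison once Ka and Ks have been estimated for a gene pair. The following sketch shows that interpretation step only; the tolerance band around 1 and the example values are hypothetical, and estimating Ka and Ks themselves requires a dedicated codon-substitution tool.

```python
def selection_regime(ka, ks, tolerance=0.05):
    """Interpret a Ka/Ks ratio for a pair of homologous genes.

    Ka/Ks < 1 suggests purifying selection, > 1 suggests positive
    selection, and values near 1 are consistent with neutral evolution.
    `tolerance` is an assumed width for the 'near 1' band.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if ratio < 1 - tolerance:
        return "purifying"
    if ratio > 1 + tolerance:
        return "positive"
    return "neutral"


# Hypothetical (Ka, Ks) estimates for three gene pairs.
pairs = {"pair_1": (0.02, 0.10), "pair_2": (0.30, 0.20), "pair_3": (0.10, 0.10)}
for name, (ka, ks) in pairs.items():
    print(name, round(ka / ks, 2), selection_regime(ka, ks))
```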

The structural and functional diversification of TNL, CNL, and RNL subfamilies represents a sophisticated plant immune strategy that has evolved through complex genomic mechanisms. The comparative analysis between diploid and tetraploid species reveals asymmetric evolution of NBS genes, with profound implications for disease resistance. The distinctive distribution patterns—where TNLs are absent in monocots, CNLs dominate in most species, and RNLs are consistently rare but conserved—highlight both evolutionary constraints and adaptive flexibility.

Understanding these genomic dynamics provides crucial insights for crop improvement strategies, particularly in leveraging wild relatives and polyploidization events to enhance disease resistance. Future research focusing on the signaling mechanisms of resistosome formation and the precise roles of helper NLRs will further illuminate this sophisticated plant immune system, enabling more targeted approaches to developing durable disease resistance in agricultural crops.

Ploidy is the number of complete sets of chromosomes in a cell, a fundamental aspect of genomic architecture across the plant kingdom [18]. While diploid organisms possess two sets of chromosomes (2n), one from each parent, polyploid organisms contain more than two sets, a condition widespread in plants [19]. This section distinguishes between two primary types of polyploidy: autopolyploidy, which involves chromosome set duplication within a single species, and allopolyploidy, which results from hybridization between different species followed by chromosome doubling [20]. Understanding these distinctions is crucial, as ploidy level significantly influences fundamental genetic behaviors, including chromosome pairing during meiosis, gene expression patterns, and the potential for adaptive evolution [21]. Recent research, particularly in the field of plant-pathogen interactions, has highlighted how these different genomic configurations can shape the evolution of key gene families, such as the nucleotide-binding site-leucine-rich repeat (NBS-LRR) genes that constitute the plant immune system's primary defense arsenal [2] [22].

Core Concepts and Definitions

The Ploidy Terminology Framework

A clear understanding of ploidy requires precise terminology, which is summarized in the table below.

Table 1: Key Terminology in Ploidy Analysis

Term	Symbol	Definition
Basic Chromosome Number	x	The number of chromosomes in a single, complete set (genome) [18] [19].
Haploid Number	n	The number of chromosomes found in a gamete. In diploids, n = x; in polyploids, n is a multiple of x [18].
Somatic Number	2n	The total number of chromosomes in a somatic cell [19].
Monoploid	1x	An organism or cell with one set of chromosomes [18].
Diploid	2n=2x	An organism or cell with two sets of chromosomes, one from each parent [18].
Autopolyploid	e.g., 4x, 6x	An organism with multiple chromosome sets derived from a single species [20].
Allopolyploid	e.g., 4x, 6x	An organism with chromosome sets derived from two or more different progenitor species [20].
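
The relationship between x, n, and 2n in the table can be made concrete with a short worked example. The chromosome counts below (rice, 2n = 2x = 24; bread wheat, 2n = 6x = 42) are widely reported values rather than figures from the sources cited here, and the function name is illustrative.

```python
def chromosome_numbers(somatic_count, ploidy_level):
    """Derive x (basic number) and n (gametic number) from 2n and ploidy.

    `somatic_count` is the chromosome number of a somatic cell (2n) and
    `ploidy_level` is the number of chromosome sets (2 for diploid,
    4 for tetraploid, 6 for hexaploid, ...).
    """
    if somatic_count % ploidy_level != 0 or somatic_count % 2 != 0:
        raise ValueError("somatic count must be divisible by ploidy level and by 2")
    return {
        "x": somatic_count // ploidy_level,  # chromosomes per single set
        "n": somatic_count // 2,             # chromosomes in a gamete
        "2n": somatic_count,
    }


rice = chromosome_numbers(24, 2)    # diploid: n equals x
wheat = chromosome_numbers(42, 6)   # hexaploid: n is a multiple of x
print("rice ", rice)
print("wheat", wheat)
```

For rice this gives x = n = 12, while for hexaploid bread wheat it gives x = 7 and n = 21, illustrating that n equals x only in diploids.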

Mechanisms of Ploidy Change

Polyploidization occurs through distinct mechanisms, leading to the formation of autopolyploids or allopolyploids. The following diagram visualizes these pathways and their key outcomes.

Figure 1: Pathways of Autopolyploid and Allopolyploid Formation.

Natural and Artificial Induction: These mechanisms can occur naturally through errors in cell division or be induced artificially in the laboratory. The chemical colchicine is widely used to disrupt spindle fiber formation during mitosis, preventing chromosome segregation and leading to genome doubling [19]. This is a key tool for creating polyploids for research or breeding.

Distinguishing Between Diploid, Autopolyploid, and Allopolyploid Genomes

The genomic structure and meiotic behavior of diploid, autopolyploid, and allopolyploid organisms dictate their genetic characteristics and research applications.

Table 2: Comparative Characteristics of Diploid and Polyploid Genomes

Feature	Diploid	Autopolyploid	Allopolyploid
Genomic Constitution	Two homologous sets (AA)	Multiple homologous sets from one species (e.g., AAAA)	Multiple homoeologous sets from different species (e.g., AABB) [20]
Meiotic Pairing	Bivalents (pairs)	Multivalents or random bivalents [21]	Preferential bivalent pairing (like diploids) [19]
Heterozygosity Level	Standard	Can be high due to polysomic inheritance [20]	Fixed heterozygosity from divergent genomes [21]
Genetic Segregation	Disomic (2:2)	Polysomic (complex, e.g., 2:2 or 3:1) [21]	Disomic (2:2)
Fertility	Normal	Often reduced due to meiotic instability [19] [21]	Typically restored after genome doubling [19]
Example Crops	Maize, Rice	Potato, Alfalfa [20]	Bread Wheat, Cotton, Canola [19]

Key Genetic Consequences:

Gene Dosage and Expression: Polyploidy increases gene copy number, which can lead to a dosage effect and higher expression levels. However, this relationship is not always proportional and can lead to non-additive gene expression and epigenetic changes, especially in allopolyploids [21].
Cell Size and Biomass: Polyploids often exhibit larger cell sizes, which can translate to increased organ size and biomass, a trait frequently exploited in agriculture [20].
Masking of Deleterious Alleles: The redundancy of multiple gene copies can mask the effects of recessive deleterious mutations, providing a buffer against genetic load [21].

Ploidy in Action: NBS Gene Expansion in Plant Immunity

The connection between ploidy and the evolution of disease resistance is a key area of modern plant research. Nucleotide-binding site (NBS) genes, which encode major plant immune receptors, provide a powerful model for studying this interaction.

NBS-LRR Genes and Effector-Triggered Immunity

NBS-LRR proteins are modular, typically consisting of a variable N-terminal domain (TIR, CC, or RPW8), a central NBS domain that binds nucleotides, and a C-terminal LRR domain responsible for pathogen recognition [2] [22]. They mediate effector-triggered immunity (ETI), a robust defense response activated when a specific pathogen effector is recognized [22].

Experimental Analysis of NBS Genes in Polyploid Plants

Recent studies have leveraged genomic and transcriptomic approaches to investigate how polyploidy influences the NBS gene family. The following diagram outlines a generalized workflow for such an analysis.

Figure 2: Workflow for Comparative NBS Gene Analysis.

Detailed Methodologies:

Genome-Wide Identification and Classification:
- Method: Retrieve genome assemblies from public databases like NCBI or Phytozome [2] [23]. Use profile hidden Markov models (HMMs), such as those from the Pfam database (e.g., PF00931 for NB-ARC), to scan proteomes with a strict e-value cutoff (e.g., 1.1e-50) [2]. Genes containing the NB-ARC domain are classified as NBS genes and further categorized based on their domain architecture (TNL, CNL, RNL, etc.) [2] [22].
Evolutionary and Duplication Analysis:
- Method: Utilize tools like OrthoFinder with the DIAMOND sequence aligner and MCL clustering algorithm to group NBS genes into orthogroups (OGs) across species [2]. Analyze duplication patterns by determining the ratio of tandemly duplicated genes versus segmentally/whole-genome duplicated genes. Calculate nonsynonymous (Ka) to synonymous (Ks) substitution rates to infer selection pressure on NBS genes [23].
Expression Profiling under Stress:
- Method: Extract RNA-seq data (e.g., FPKM values) from databases or conduct new experiments on polyploid plants under biotic (pathogen) and abiotic (drought, salt) stresses. Compare expression in tolerant versus susceptible accessions and different tissues. Validate key findings using quantitative reverse-transcription PCR (qRT-PCR) on cDNA samples with gene-specific primers [2] [23].
Functional Validation using VIGS:
- Method: To confirm the role of a specific NBS gene, use Virus-Induced Gene Silencing (VIGS). Clone a fragment of the candidate NBS gene (e.g., GaNBS from cotton) into a VIGS vector (e.g., TRV-based). Inoculate resistant plants and subsequently challenge them with the pathogen. A significant increase in disease symptoms or viral titer in silenced plants compared to controls demonstrates the gene's functional role in resistance [2].
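
The classification step described in the identification method above can be illustrated with a short sketch. It assumes that each gene already has a list of domain labels from a prior annotation step. The label strings ("TIR", "CC", "RPW8", "NB-ARC", "LRR"), the function name, and the rule that TIR takes precedence over CC and RPW8 when several N-terminal domains co-occur are illustrative assumptions, not details taken from the cited studies.

```python
# Minimal sketch: map a gene's domain labels to an NBS class name.
# Domain label strings and the precedence rule are assumptions for illustration.

def classify_nbs(domains):
    """Return a class such as TNL, CNL, RNL, TN, CN, RN, NL or N; None if no NB-ARC."""
    present = set(domains)
    if "NB-ARC" not in present:
        return None  # not an NBS gene under the NB-ARC-based definition
    if "TIR" in present:
        prefix = "T"
    elif "CC" in present:
        prefix = "C"
    elif "RPW8" in present:
        prefix = "R"
    else:
        prefix = ""
    suffix = "L" if "LRR" in present else ""
    return prefix + "N" + suffix


# Hypothetical genes with hypothetical domain annotations.
examples = {
    "geneA": ["TIR", "NB-ARC", "LRR"],
    "geneB": ["CC", "NB-ARC"],
    "geneC": ["NB-ARC", "LRR"],
    "geneD": ["LRR"],
}
classes = {gene: classify_nbs(doms) for gene, doms in examples.items()}
print(classes)
```

In practice the many non-canonical architectures reported in large surveys would need additional rules; this sketch covers only the classical classes.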

Key Findings: Ploidy's Impact on NBS Gene Repertoires

Research comparing diploid and polyploid species has revealed several key trends:

Expansion and Diversification: Polyploid species often exhibit significant expansion of their NBS gene repertoires. A 2024 study identified over 12,000 NBS genes across 34 plant species, noting that allopolyploid crops like wheat can harbor thousands of these genes [2]. This expansion is frequently driven by tandem duplications and retained segmental duplications from the polyploidization event itself [2] [23].
Subgenome Dominance in Allopolyploids: In allopolyploids, NBS genes can be asymmetrically retained or lost between the different subgenomes, a pattern commonly discussed in the context of subgenome dominance. This asymmetry can create a more complex and diverse reservoir of resistance specificities than is found in the diploid progenitors [2].
Contraction in Some Lineages: Not all polyploids show expansion. Some orchid species, despite being ancient polyploids, have relatively small NBS repertoires, indicating lineage-specific evolutionary paths involving significant gene loss after polyploidization [22].

Table 3: Key Research Reagents and Resources for Ploidy and NBS Gene Studies

Reagent / Resource	Function / Application
Colchicine	A chemical used to induce polyploidy by inhibiting spindle formation during mitosis, leading to chromosome doubling [19].
Pfam HMM Profiles (e.g., NB-ARC)	Curated protein family models used for the bioinformatic identification of NBS domain-containing genes from genomic data [2].
OrthoFinder Software	A tool for comparative genomics that identifies orthologous groups of genes across multiple species, crucial for tracing NBS gene evolution [2].
VIGS Vectors (e.g., TRV1, TRV2)	Viral vectors used for Virus-Induced Gene Silencing to rapidly knock down gene expression in plants for functional validation of candidate NBS genes [2].
Haplotype-Resolved Genome Assemblies	High-quality reference genomes that distinguish between parental subgenomes. Essential for accurately cataloging and studying genes in allopolyploids [24].

Distinguishing between diploid, autopolyploid, and allopolyploid genomes is fundamental to understanding plant evolution, genetics, and breeding. The distinct meiotic behaviors and genomic interactions in these systems have direct consequences for the expansion and contraction of key gene families. Research into NBS-LRR genes has demonstrated that polyploidy, particularly allopolyploidy, serves as a major engine for generating diversity in the plant immune system. The combination of advanced genomic sequencing, sophisticated bioinformatic analyses, and functional genetic tools continues to unravel how ploidy level shapes a plant's capacity to adapt to biotic stresses, offering novel insights for future crop improvement strategies.

The nucleotide-binding site-leucine-rich repeat (NBS-LRR) gene family represents one of the most dynamic and evolutionarily plastic components of plant genomes, serving as the cornerstone of the plant innate immune system. These genes encode intracellular receptors that recognize pathogen effectors and initiate effector-triggered immunity, playing a crucial role in plant survival and adaptation. The expansion and contraction of NBS gene families across plant lineages reveal fascinating evolutionary narratives, driven by the relentless pressure of co-evolving pathogens. This technical guide examines the documented patterns of NBS gene expansion, from extreme copy number proliferation to remarkable genomic scarcity, providing researchers with comprehensive insights into the mechanisms, methodologies, and implications of this genetic dynamism within the context of diploid and tetraploid plant research.

Extreme Diversity in NBS Gene Repertoire Across Plant Lineages

Quantitative Analysis of NBS Gene Distribution

The copy number of NBS-LRR genes varies dramatically across plant species, reflecting diverse evolutionary paths and adaptive strategies. In the species surveyed below, this variation spans nearly three orders of magnitude, from a handful of genes to over a thousand copies per genome.

Table 1: Documented NBS-LRR Gene Counts Across Plant Species

Plant Species	Family/Type	Ploidy	NBS-LRR Count	TNL	CNL	RNL	Key References
Eucalyptus grandis	Woody angiosperm	Diploid	1,215	760	455	-	[25]
Hordeum vulgare (Barley)	Cereal crop	Diploid	467	-	-	-	[26]
Solanum tuberosum (Potato)	Solanaceae	Tetraploid	361	-	-	-	[26]
Triticum urartu	Wheat progenitor	Diploid	146	-	-	-	[26]
Glycine max (Soybean)	Legume	Paleopolyploid	103	-	-	-	[26]
Oryza sativa (Rice)	Cereal crop	Diploid	159	-	-	-	[26]
Arabidopsis thaliana	Brassicaceae	Diploid	51	-	-	-	[26]
Passiflora edulis (Purple)	Passion fruit	Diploid	25	-	25	-	[26]
Physcomitrella patens	Moss	Haploid	~25	-	-	-	[2]
Selaginella moellendorffii	Lycophyte	Diploid	~2	-	-	-	[2]

The data reveal several striking patterns. First, Eucalyptus grandis contains an exceptionally high number of NBS-LRR genes (>1,200), representing over 1% of its protein-coding genes [25]. Second, early-diverging land plants such as mosses and lycophytes maintain minimal NBS-LRR repertoires (approximately 25 and 2 genes, respectively), suggesting that major expansion occurred later in land-plant evolution, after these lineages diverged [2]. Third, among angiosperms, no clear correlation exists between genome size or organismal complexity and NBS-LRR gene number, indicating lineage-specific expansion and contraction events.

Subfamily Distribution and Evolutionary Implications

The distribution between the two major NBS-LRR subfamilies—TNL (TIR-NBS-LRR) and CNL (CC-NBS-LRR)—also shows significant variation across plant lineages:

Table 2: NBS-LRR Subfamily Distribution Across Select Species

Species	Total NBS-LRR	TNL Count	CNL Count	TNL:CNL Ratio	Notable Features
Eucalyptus grandis	1,215	760	455	1.67:1	Higher TNL proportion than other woody species [25]
Nine Solanaceae species	819	182	583	0.31:1	CNL dominance; 54 RNL genes identified [27]
Lathyrus sativus (Grass pea)	274	124	150	0.83:1	Balanced distribution [28]
Arabidopsis thaliana	51	-	-	-	Reference for comparative studies [26]

Monocots, particularly cereals, notably lack TNL genes, suggesting a fundamental evolutionary divergence in immune receptor architecture between monocots and eudicots. The elevated TNL ratio in Eucalyptus grandis compared to other woody species represents a distinctive evolutionary pathway that warrants further investigation [25].
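
The TNL:CNL ratios in Table 2 follow directly from the reported counts. The short sketch below recomputes them as a consistency check; the dictionary layout and function name are illustrative, and the counts are copied from the table.

```python
# Recompute the TNL:CNL ratios from the counts listed in Table 2.
counts = {
    "Eucalyptus grandis": (760, 455),
    "Nine Solanaceae species": (182, 583),
    "Lathyrus sativus": (124, 150),
}


def tnl_cnl_ratio(tnl, cnl):
    """Ratio of TNL to CNL genes, rounded to two decimals."""
    return round(tnl / cnl, 2)


ratios = {name: tnl_cnl_ratio(tnl, cnl) for name, (tnl, cnl) in counts.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio}:1")

# The Solanaceae total is the sum of TNL, CNL and the 54 RNL genes noted in the table.
solanaceae_total = 182 + 583 + 54
```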

Genomic Architecture and Arrangement of NBS Genes

Physical Clustering as a Hallmark of NBS Genomic Organization

A predominant characteristic of NBS-LRR genes across plant genomes is their non-random distribution, with a significant majority organized into physical clusters. In Eucalyptus grandis, 76% of NBS-LRR genes are arranged in clusters of three or more genes, with only 24% existing as singletons [25]. Similar clustering patterns are observed across angiosperms, suggesting this organization provides evolutionary advantages.
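
A cluster count of this kind depends on how a cluster is defined. The sketch below is one possible operationalization: it groups NBS genes on the same chromosome when consecutive start positions lie within a maximum gap, and keeps groups of three or more. The 200 kb gap, the function name, and the toy coordinates are assumptions for illustration; the source specifies only the "three or more genes" criterion.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=200_000, min_size=3):
    """Group genes into clusters by proximity.

    genes: iterable of (gene_id, chromosome, start) tuples.
    max_gap: assumed maximum distance between neighbouring genes (hypothetical value).
    Returns a list of clusters, each a list of gene ids in positional order.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        run, prev_start = [], None
        for start, gene_id in sorted(by_chrom[chrom]):
            # A large gap closes the current run of neighbouring genes.
            if prev_start is not None and start - prev_start > max_gap:
                if len(run) >= min_size:
                    clusters.append(run)
                run = []
            run.append(gene_id)
            prev_start = start
        if len(run) >= min_size:
            clusters.append(run)
    return clusters


# Toy coordinates, not real annotations.
toy = [
    ("g1", "chr1", 100_000), ("g2", "chr1", 150_000), ("g3", "chr1", 220_000),
    ("g4", "chr1", 5_000_000),
    ("g5", "chr2", 10_000), ("g6", "chr2", 900_000),
]
clusters = find_clusters(toy)
clustered_fraction = sum(len(c) for c in clusters) / len(toy)
print(clusters, clustered_fraction)
```

Published analyses may instead define clusters by the number of intervening non-NBS genes, so the clustered fraction is only comparable between studies that share a definition.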

These clusters frequently reside in the terminal regions of chromosomes, as documented in Solanaceae species, where NBS-LRR genes predominantly localize to chromosome ends [27]. This distribution positions them in recombinationally active regions, potentially facilitating more rapid generation of diversity through unequal crossing over and gene conversion.

Figure 1: Evolutionary dynamics of NBS gene cluster formation and diversification. Tandem duplication events followed by pathogen-driven selection generate clusters that serve as diversity reservoirs for novel recognition specificities.

Expression Hotspots and Functional Correlations

Physical clustering correlates with coordinated expression patterns, forming expression hotspots within genomes. Research on Eucalyptus grandis challenged with fungal pathogens (Chrysoporthe austroafricana) and insect pests (Leptocybe invasa) revealed that specific NBS-LRR clusters show differential expression in resistant versus susceptible genotypes [25]. These expression hotspots frequently include incomplete NBS-LRR genes (lacking full domain structures), suggesting potential roles in immune signaling or regulation.

The transcriptional activity within these clusters often extends beyond annotated complete genes, with intergenic regions and partial genes showing significant expression, indicating possible epigenetic coordination or the presence of unannotated functional elements within these genomic regions.

Mechanisms Driving NBS Gene Expansion and Diversification

Whole Genome Duplication vs. Small-Scale Duplication

NBS-LRR gene family expansion occurs through two primary mechanisms: whole genome duplication (WGD) and small-scale duplication (SSD), each contributing differently to gene repertoire evolution.

Table 3: Duplication Mechanisms in NBS-LRR Gene Expansion

Mechanism	Genomic Scale	Impact on NBS Genes	Documented Examples
Whole Genome Duplication (WGD)	Entire genome	Creates complete duplicate sets; subsequent fractionation	Solanaceae: Recent WGT shaped NBS-LRR distribution [27]
Tandem Duplication	Localized region	Rapid expansion of specific subfamilies; clustered arrangement	Eucalyptus grandis: Defense gene enrichment in tandem arrays [25]
Segmental Duplication	Chromosomal segments	Duplicates gene blocks; maintains linkage relationships	Passion fruit: 17 segmental duplication gene pairs identified [26]
Transposition-Mediated	Single gene	Disperses genes to new genomic contexts	Potential mechanism for singleton distribution

In Solanaceae, whole genome triplication (WGT) has significantly influenced NBS-LRR family expansion, with the most recent WGT event shaping the current gene distribution [27]. Following polyploidization, most duplicated genes are lost through fractionation, but NBS-LRR genes often show retention biases, potentially due to the adaptive advantage of maintaining diverse immune receptors.

Birth-and-Death Evolution and Diversifying Selection

The NBS-LRR gene family evolves through a birth-and-death process where new genes are created by duplication, and existing genes are lost through pseudogenization or deletion. This dynamic process, coupled with diversifying selection particularly in LRR domains involved in pathogen recognition, generates the extensive diversity observed in NBS-LRR repertoires.

This evolutionary model explains why closely related species can have dramatically different NBS-LRR gene complements and why orthologous relationships are often difficult to establish across species boundaries. The high turnover rate creates species-specific NBS-LRR landscapes shaped by their unique pathogen exposure histories.

Research Methodologies for NBS Gene Identification and Analysis

Computational Identification Pipeline

Standardized bioinformatics approaches have been developed for comprehensive identification and classification of NBS-LRR genes across plant genomes.

Figure 2: Standardized workflow for genome-wide identification and analysis of NBS-LRR genes, integrating domain validation, phylogenetic classification, and evolutionary analysis.

Key Experimental Protocols:

Initial Identification: Perform BLASTp searches using reference NBS-LRR sequences (e.g., from Arabidopsis thaliana) against target proteomes with E-value cutoff < 1e-5 and alignment length >500 bp [25]. Concurrently, conduct HMMER searches using NB-ARC domain (PF00931) profiles from Pfam database [28] [29].
Domain Validation: Verify candidate sequences using:
- PfamScan (E-value < 1e-4) for NB-ARC domain confirmation [25]
- NCBI Conserved Domain Database (CDD) for domain architecture [28]
- SMART database and InterPro for additional domain validation [30]
- Paircoil2 for coiled-coil domain prediction [26]
Subfamily Classification: Construct phylogenetic trees using:
- Multiple sequence alignment with MUSCLE or ClustalW [28] [29]
- Maximum likelihood analysis with RAxML or MEGA software [28] [29]
- Bootstrap validation (typically 1000 replicates) [30]
- Reference sequences from established NBS-LRR subfamilies
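
The HMMER part of the initial identification step can be sketched as a filter over the per-sequence table that hmmsearch writes with its --tblout option. In that format, comment lines start with "#" and fields are whitespace-separated, with the target name in the first column and the full-sequence E-value and score in the fifth and sixth. The protein names, scores, and function name below are toy values invented for illustration, and the two cutoffs simply mirror the lenient and strict thresholds mentioned in this article.

```python
# Toy text in the layout of an hmmsearch --tblout file (values are invented).
toy_tblout = """\
# target name accession query name accession E-value score bias E-value score bias exp reg clu ov env dom rep inc description
prot1 - NB-ARC PF00931 2.3e-60 201.5 0.1 3.0e-60 201.1 0.1 1.0 1 0 0 1 1 1 1 hypothetical protein
prot2 - NB-ARC PF00931 4.0e-08 30.2 0.0 5.1e-08 29.9 0.0 1.1 1 0 0 1 1 1 1 -
prot3 - NB-ARC PF00931 0.02 10.1 0.0 0.03 9.8 0.0 1.0 1 0 0 1 1 1 1 -
"""


def parse_tblout(lines, max_evalue=1e-5):
    """Return {target: (evalue, score)} for hits at or below max_evalue."""
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # The first 18 fields are fixed; the free-text description is the remainder.
        fields = line.split(maxsplit=18)
        target, evalue, score = fields[0], float(fields[4]), float(fields[5])
        if evalue <= max_evalue:
            # Keep the best (lowest) E-value if a target appears more than once.
            if target not in hits or evalue < hits[target][0]:
                hits[target] = (evalue, score)
    return hits


lenient = parse_tblout(toy_tblout.splitlines(), max_evalue=1e-5)
strict = parse_tblout(toy_tblout.splitlines(), max_evalue=1.1e-50)
print(sorted(lenient), sorted(strict))
```

Candidates that pass this filter would still go through the domain validation and subfamily classification steps listed above.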

Evolutionary and Expression Analysis

Evolutionary Dynamics:
- Calculate synonymous (Ks) and non-synonymous (Ka) substitution rates using Ka/Ks calculator in TBtools [30]
- Identify duplication events (tandem, segmental, WGD) using MCScanX [30] [26]
- Analyze syntenic relationships across related species [29] [30]
Expression Profiling:
- Extract RNA-seq data from public databases (NCBI, species-specific databases) [2] [28]
- Analyze differential expression under biotic and abiotic stresses
- Validate key candidates using qRT-PCR with stress treatments [28] [26]
- Apply machine learning approaches (e.g., Random Forest) to identify multi-stress responsive genes [26]
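
For the evolutionary dynamics step, the Ka/Ks ratio is conventionally read as follows: values below 1 suggest purifying selection, values near 1 suggest neutral evolution, and values above 1 suggest positive (diversifying) selection. The sketch below applies that reading to precomputed Ka and Ks values; the tolerance band around 1, the function name, and the toy gene pairs are assumptions for illustration. Estimating Ka and Ks themselves requires a dedicated tool such as the calculator mentioned above.

```python
def selection_mode(ka, ks, tol=0.1):
    """Interpret a Ka/Ks ratio; tol is a hypothetical band around 1 treated as neutral."""
    if ks <= 0:
        return "undefined"  # the ratio cannot be computed without synonymous change
    ratio = ka / ks
    if ratio > 1 + tol:
        return "positive"
    if ratio < 1 - tol:
        return "purifying"
    return "neutral"


# Toy duplicate pairs with invented (Ka, Ks) values.
pairs = {
    ("nbs1", "nbs2"): (0.05, 0.50),
    ("nbs3", "nbs4"): (0.60, 0.30),
    ("nbs5", "nbs6"): (0.20, 0.20),
}
modes = {pair: selection_mode(ka, ks) for pair, (ka, ks) in pairs.items()}
for pair, mode in modes.items():
    print(pair, mode)
```

Gene-wide ratios can mask diversifying selection confined to a few LRR codons, so a gene-level "purifying" call does not exclude positive selection at individual sites.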

Table 4: Essential Research Reagents for NBS Gene Studies

Reagent/Resource	Specifications	Application	Example Implementation
Reference Genomes	High-quality, annotated assemblies from Phytozome, NCBI, species-specific databases	Identification of candidate NBS-LRR genes	E. grandis v2.0; S. lycopersicum SL4.0 [27] [25]
Domain Databases	Pfam (PF00931), CDD, SMART, InterPro	Domain validation and architecture analysis	Confirming NB-ARC, TIR, CC, LRR domains [28] [30]
HMMER Software	Version 3.3.2; e-value cutoff 1e-5 to 1e-20	Hidden Markov Model-based gene identification	Building species-specific HMMs for NB-ARC domain [30] [25]
Phylogenetic Tools	MEGA, RAxML, OrthoFinder	Evolutionary classification and orthogroup analysis	Subfamily classification (TNL, CNL, RNL) [28] [27]
Synteny Analysis	MCScanX, TBtools, CIRCOS	Visualization of genomic relationships	Identifying duplication events and collinearity [30] [26]
Expression Data	RNA-seq libraries, qPCR primers	Expression validation under stress conditions	Differential expression analysis in passion fruit under CMV and cold stress [26]

The documented patterns of NBS gene expansion reveal a complex evolutionary landscape shaped by both genomic constraints and pathogen pressures. From the striking scarcity in basal lineages to the extreme copy number in angiosperms like Eucalyptus grandis, these patterns underscore the dynamic nature of plant immune system evolution. The prevalence of physical clustering and the retention of duplicated genes in polyploid genomes highlight the importance of genomic architecture in facilitating rapid adaptation.

For researchers engaged in diploid and tetraploid plant research, these patterns offer both challenges and opportunities. The extensive variation in NBS gene content complicates direct orthology transfers between species, yet provides a rich reservoir of genetic diversity for crop improvement. Understanding the mechanisms driving NBS gene expansion—from whole genome duplications to tandem amplifications—enables more targeted approaches for mining resistance genes and engineering durable disease resistance in crop plants.

As genomic technologies advance, particularly long-read sequencing and pan-genome construction, our understanding of NBS gene expansion patterns will continue to improve. These advances should offer new insights into the evolutionary arms race between plants and their pathogens and facilitate the development of more resilient crop varieties through molecular breeding and biotechnology.

Nucleotide-binding site (NBS) genes constitute one of the largest families of plant disease resistance (R) genes and play a critical role in effector-triggered immunity [2]. The proliferation and evolution of these genes are fundamentally driven by duplication mechanisms, primarily tandem duplication (TD) and whole-genome duplication (WGD) [2]. Understanding the relative contributions of these mechanisms is essential for deciphering plant-pathogen co-evolution and has significant implications for crop improvement strategies. This review synthesizes current knowledge on how TD and WGD shape the NBS gene repertoire in plants, with particular emphasis on differences observed between diploid and tetraploid genomes.

The expansion of NBS genes represents a genomic response to relentless pathogen pressure. These genes encode intracellular immune receptors that detect pathogen effectors and initiate defense responses [31]. The "birth-and-death" evolution model, characterized by repeated gene duplication and loss, generates the diversity necessary for recognizing rapidly evolving pathogens [32]. Recent genomic analyses across diverse plant taxa have revealed that both small-scale (TD) and large-scale (WGD) duplication events contribute significantly to this evolutionary arms race, though their impacts differ substantially in terms of genomic organization, evolutionary trajectory, and functional consequences [2] [33].

Molecular Characterization and Classification of NBS Genes

Structural Architecture and Domain Organization

NBS-encoding genes typically exhibit a modular structure consisting of three fundamental domains: an N-terminal signaling domain, a central nucleotide-binding adaptor (NB-ARC or NBS) domain, and C-terminal leucine-rich repeats (LRRs) [2] [31]. The N-terminal domain falls into two major categories—Toll/Interleukin-1 receptor (TIR) or coiled-coil (CC)—defining two principal classes of NBS-LRR genes: TNLs and CNLs [10] [31]. A third subclass features an N-terminal RPW8 domain (RNLs) [2].

Comprehensive genomic surveys have identified remarkable diversity in NBS domain architecture. A recent study analyzing 12,820 NBS-domain-containing genes across 34 plant species classified them into 168 distinct structural classes, encompassing both classical patterns (NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR) and species-specific configurations (TIR-NBS-TIR-Cupin1-Cupin1, TIR-NBS-Prenyltransf, Sugar_tr-NBS) [2]. This architectural diversity underscores the dynamic evolutionary history of this gene family.

Genomic Distribution and Organization

NBS genes are typically distributed non-randomly across plant genomes, frequently forming dense clusters [10]. This clustered organization is particularly conducive to tandem duplication and sequence diversification through non-allelic homologous recombination. Comparative genomics has revealed that the distribution of NBS-encoding genes among chromosomes is both nonrandom and uneven, with certain genomic regions serving as "hotspots" for NBS gene proliferation [10].

Two distinct evolutionary patterns have been characterized for NBS genes: Type I genes exist as multiple paralogs that evolve rapidly through frequent gene conversion events, while Type II genes typically have fewer paralogs, evolve more slowly, and display conservation across populations, often varying through presence/absence polymorphisms [31]. This dichotomy reflects different evolutionary strategies for generating diversity while maintaining essential immune functions.

Tandem Duplication and NBS Gene Expansion

Mechanisms and Evolutionary Drivers

Tandem duplication involves the localized replication of genomic segments, resulting in paralogous genes arranged in series along chromosomes. For NBS genes, this process is often mediated by sequences that promote duplication, such as long tandem repeats [32]. Recent research suggests that duplication-inducing elements can form cooperative associations with arms-race genes such as NBS genes, in which both partners benefit from the association [32].

This cooperative relationship provides an evolutionary advantage by creating redundant gene copies that can freely explore mutational space without adverse selective consequences [32]. The physical association between NBS genes and duplication-prone genomic regions is non-random, suggesting natural selection has favored lineages where this configuration enhances the generation of diversity for pathogen recognition [32].

Functional and Ecological Significance

Tandem duplication of NBS genes represents a convergent genomic adaptation to biotic stress, particularly soil microbial pressures [33]. A comprehensive study of 205 Archaeplastida genomes revealed that TD-derived genes are enriched in enzymatic catalysis and biotic stress responses, with TD frequency correlating strongly with microbial exposure [33]. This pattern is further supported by observations that plant lineages transitioning to reduced-microbe environments (aquatic, parasitic, halophytic, or carnivorous lifestyles) consistently exhibit decreased TD frequency [33].

The functional specialization of tandemly duplicated NBS genes is influenced by their mode of regulatory preservation. TD genes often maintain broad expression patterns across cell types due to retention of ancestral cis-regulatory elements [34]. However, they also frequently exhibit asymmetric divergence, where one copy maintains broad expression while its paralog specializes in few cell types—a hallmark of functional compartmentalization [34].

Table 1: Comparative Analysis of NBS Genes in Diploid and Tetraploid Cotton Species

Species	Ploidy	Total NBS Genes	CNL (%)	TNL (%)	NL (%)	Notable Features
G. arboreum	Diploid	246	32.52%	2.03%	21.54%	Lower TNL percentage
G. raimondii	Diploid	365	29.32%	13.70%	24.38%	Higher TNL percentage
G. hirsutum	Tetraploid	588	28.06%	0.85%	26.19%	Inherited more NBS genes from G. arboreum
G. barbadense	Tetraploid	682	20.97%	6.45%	30.79%	Inherited more NBS genes from G. raimondii

Whole Genome Duplication and NBS Gene Evolution

Genomic Consequences of Polyploidization

Whole genome duplication creates complete genomic redundancies by doubling chromosome sets, providing raw material for evolutionary innovation. In allopolyploids like cotton, WGD results in complex patterns of NBS gene retention and loss. Comparative genomics of diploid and tetraploid cotton species reveals that allotetraploids possess approximately twice the number of NBS genes compared to their diploid progenitors [10]. However, this inheritance is often asymmetric, with tetraploid species retaining more NBS genes from one progenitor than the other [10].

The case of Gossypium hirsutum and G. barbadense illustrates this pattern well. G. hirsutum inherited more NBS-encoding genes from G. arboreum, while G. barbadense inherited more from G. raimondii [10]. This asymmetric evolution has functional consequences for disease resistance, as G. raimondii and G. barbadense are more resistant to Verticillium wilt, whereas G. arboreum and G. hirsutum are more susceptible [10]. The data suggest that TNL genes specifically may play a significant role in disease resistance to Verticillium wilt [10].
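
The percentages in the cotton table can be converted back to approximate gene counts to make the TNL contrast concrete. The sketch below does this arithmetic and also compares each tetraploid total with the mean of the two diploid totals. The totals and TNL percentages are copied from the table; the dictionary layout, the function name, and the use of rounding to recover counts are illustrative assumptions.

```python
# Totals and TNL percentages copied from the cotton comparison table.
cotton = {
    "G. arboreum": {"ploidy": "diploid", "total": 246, "tnl_pct": 2.03},
    "G. raimondii": {"ploidy": "diploid", "total": 365, "tnl_pct": 13.70},
    "G. hirsutum": {"ploidy": "tetraploid", "total": 588, "tnl_pct": 0.85},
    "G. barbadense": {"ploidy": "tetraploid", "total": 682, "tnl_pct": 6.45},
}


def approx_count(total, pct):
    """Approximate gene count implied by a percentage of the total."""
    return round(total * pct / 100)


tnl_counts = {sp: approx_count(d["total"], d["tnl_pct"]) for sp, d in cotton.items()}

diploid_totals = [d["total"] for d in cotton.values() if d["ploidy"] == "diploid"]
diploid_mean = sum(diploid_totals) / len(diploid_totals)
fold_vs_diploid = {
    sp: round(d["total"] / diploid_mean, 2)
    for sp, d in cotton.items()
    if d["ploidy"] == "tetraploid"
}
print(tnl_counts)
print(fold_vs_diploid)
```

On these numbers, the two more resistant species carry roughly 44 to 50 TNL genes, against about 5 in each of the two more susceptible species, and each tetraploid holds about twice the average diploid total.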

Evolutionary Trajectories Following WGD

After WGD, NBS genes undergo several possible evolutionary fates: deletion, hypofunctionalization, subfunctionalization, neofunctionalization, or maintenance under dosage balance constraints [34]. Spatial transcriptomics studies reveal that WGD-derived paralogs typically exhibit more preserved expression profiles across cell types compared to small-scale duplicates [34]. This preservation is linked to retention of ancestral transcription factor binding sites in promoters and enhancers [34].

WGD-derived NBS genes frequently maintain central roles as hubs within coexpression networks, consistent with the preferential retention of essential, dosage-sensitive genes following WGD events [34]. The functional constraints on NBS genes following WGD are further illustrated by the phenomenon of "compensatory drift," where one copy evolves toward lower expression while its paralog evolves toward higher expression, thereby maintaining the overall total ancestral expression level [34].

Comparative Analysis in Diploid and Tetraploid Systems

Genomic Dynamics in Polyploid Plants

The comparison between diploid and tetraploid genomes reveals distinct patterns of NBS gene evolution. In tetraploids, the interaction between duplicated genomes can lead to subgenome dominance, as observed in the allotetraploid Acorus calamus, where subgenome B shows dominance over subgenome A [35]. This asymmetric evolution influences the retention and expression of NBS genes, potentially shaping pathogen response profiles.

Polyploidization events are frequently associated with massive gene loss followed by large expansions through gene duplications—an evolutionary scenario termed "less, but more" [36]. This pattern involves an initial reduction in gene family numbers followed by duplication of the surviving members, potentially leading to evolutionary innovations [36]. For NBS genes, this could enable rapid adaptation to new pathogen pressures while maintaining core immune functions.

Expression Divergence and Regulatory Evolution

Ploidy-dependent differences in NBS gene expression have significant implications for disease resistance. Expression profiling in cotton has revealed differential regulation of specific NBS orthogroups (OG2, OG6, OG15) in tolerant versus susceptible varieties under biotic stress [2]. Genetic variation analyses between susceptible (Coker 312) and tolerant (Mac7) Gossypium hirsutum accessions identified distinct variant profiles, with Mac7 exhibiting 6,583 unique variants in NBS genes compared to 5,173 in Coker 312 [2].

Functional validation through virus-induced gene silencing (VIGS) of GaNBS (OG2) in resistant cotton demonstrated its critical role in limiting virus titer, confirming the functional significance of NBS gene expansion in disease resistance [2]. Protein-ligand and protein-protein interaction analyses further revealed strong interactions between specific NBS proteins and ADP/ATP, as well as core proteins of the cotton leaf curl disease virus [2].

Table 2: Experimental Approaches for Studying NBS Gene Duplication

Method	Application	Key Insights	Example Studies
Genome-wide identification & classification	Cataloging NBS genes across species	Reveals architectural diversity and species-specific innovations	[2]
Orthogroup analysis	Tracing evolutionary relationships	Identifies core conserved vs. lineage-specific NBS genes	[2]
Synteny analysis	Determining gene origins and losses	Uncovers asymmetric evolution in polyploids	[10]
Spatial transcriptomics	Mapping expression divergence in tissues	Shows how duplication mechanism affects expression evolution	[34]
Virus-Induced Gene Silencing (VIGS)	Functional validation of specific NBS genes	Confirms role in pathogen resistance	[2]

Experimental Methodologies and Research Tools

Genomic and Bioinformatics Approaches

The identification and classification of NBS genes relies on sophisticated bioinformatics pipelines. HMMER-based searches using PFAM models (e.g., NB-ARC domain PF00931) with stringent e-value cutoffs (1.1e-50) effectively identify NBS-domain-containing genes from genomic sequences [2]. Subsequent domain architecture analysis using tools like PfamScan enables comprehensive classification of NBS genes into structural categories [2].

Evolutionary analyses employ orthology inference tools such as OrthoFinder with DIAMOND for sequence similarity searches and MCL for clustering [2]. Phylogenetic reconstruction using maximum likelihood methods (FastTreeMP) with bootstrap validation provides insights into evolutionary relationships [2]. Synteny analysis further elucidates genomic conservation and rearrangement of NBS genes across species [10].
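
Once orthogroups have been inferred, a common next step is to summarize which species contribute genes to each group. The sketch below reads a small table laid out like OrthoFinder's Orthogroups.tsv (one row per orthogroup, one tab-separated column per species, gene identifiers separated by commas) and labels each group. The species column names, gene identifiers, and the three labels ("core", "shared", "lineage-specific") are assumptions chosen for illustration.

```python
import csv
import io

# Toy table in the layout of an Orthogroups.tsv file (identifiers are invented).
toy_tsv = (
    "Orthogroup\tG_arboreum\tG_raimondii\tG_hirsutum\n"
    "OG0000000\tGa1, Ga2\tGr1\tGh1, Gh2, Gh3\n"
    "OG0000001\t\t\tGh4, Gh5\n"
    "OG0000002\tGa3\t\tGh6\n"
)


def orthogroup_profile(handle):
    """Return {orthogroup: (label, {species: gene_count})}."""
    reader = csv.reader(handle, delimiter="\t")
    species = next(reader)[1:]
    profile = {}
    for row in reader:
        # Pad short rows so every species has a cell.
        cells = row[1:] + [""] * (len(species) - len(row) + 1)
        counts = {
            sp: len([g for g in cell.split(",") if g.strip()])
            for sp, cell in zip(species, cells)
        }
        n_present = sum(1 for c in counts.values() if c > 0)
        if n_present == len(species):
            label = "core"
        elif n_present == 1:
            label = "lineage-specific"
        else:
            label = "shared"
        profile[row[0]] = (label, counts)
    return profile


profile = orthogroup_profile(io.StringIO(toy_tsv))
for og, (label, counts) in profile.items():
    print(og, label, counts)
```

The per-species counts in each orthogroup are also the natural input for spotting expansions, for example a group with many more members in a tetraploid than in either diploid.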

Functional Validation Techniques

Functional characterization of NBS genes involves multiple experimental approaches. Virus-induced gene silencing (VIGS) enables transient knockdown of candidate NBS genes to assess their role in disease resistance [2]. Protein-ligand and protein-protein interaction studies through molecular docking analyses reveal interactions between NBS proteins and pathogen effectors [2].

Expression profiling using RNA-seq under various biotic and abiotic stresses identifies differentially regulated NBS genes [2]. Spatial transcriptomics at cell-type resolution provides unprecedented insights into expression divergence following duplication events [34]. These integrated approaches bridge the gap between genomic identification and functional validation of NBS genes.
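
As a minimal illustration of how expression values are compared between conditions, the sketch below computes a log2 fold change from FPKM-like values with a pseudocount to avoid division by zero. The pseudocount of 1, the threshold of 1 (a two-fold change), the function name, and the toy values are all assumptions. A real analysis would use replicates and a statistical framework rather than a single ratio.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of treated to control expression, with an assumed pseudocount."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Toy (treated, control) values; gene names are hypothetical.
fpkm = {
    "nbs_up": (31.0, 3.0),
    "nbs_down": (1.0, 7.0),
    "nbs_flat": (5.0, 5.0),
}
lfc = {gene: log2_fold_change(t, c) for gene, (t, c) in fpkm.items()}

threshold = 1.0  # hypothetical cutoff: at least a two-fold change
responsive = sorted(gene for gene, value in lfc.items() if abs(value) >= threshold)
print(lfc, responsive)
```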

Diagram 1: Evolutionary framework of NBS gene proliferation in polyploid plants

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 3: Essential Research Reagents for NBS Gene Analysis

Reagent/Material	Application	Function	Example Use Cases
PacBio HiFi reads	Genome assembly	Provides long, accurate reads for resolving repetitive regions	Assembling complex NBS clusters [37] [35]
Hi-C library kits	Chromosome scaffolding	Enables chromatin interaction mapping	Physical map reconstruction [35]
HMMER/Pfam databases	NBS gene identification	Identifies NB-ARC domains in genomic sequences	Genome-wide NBS annotation [2]
OrthoFinder pipeline	Evolutionary analysis	Clusters genes into orthogroups	Comparative genomics across species [2]
VIGS vectors	Functional validation	Enables transient gene silencing	Testing NBS gene function in disease resistance [2]
Spatial transcriptomics platforms	Expression mapping	Resolves gene expression at cell-type resolution	Analyzing duplicate gene expression divergence [34]

The proliferation of NBS genes in plant genomes is driven by an interplay between tandem duplication and whole genome duplication, each contributing distinct evolutionary dynamics. Tandem duplication enables rapid, localized expansion of specific NBS families in response to pathogen pressure, while whole genome duplication provides genomic redundancies that undergo complex processes of subfunctionalization and neofunctionalization over evolutionary time.

In polyploid plants, the interaction between these duplication mechanisms creates unique opportunities for pathogen resistance evolution. The asymmetric evolution of NBS genes in tetraploids, with preferential retention from specific diploid progenitors, can determine disease resistance outcomes. Understanding these evolutionary forces provides crucial insights for crop improvement strategies, particularly for enhancing disease resistance through manipulation of NBS gene content and diversity.

Future research leveraging emerging technologies like spatial transcriptomics and pangenomics will further elucidate how duplication mechanisms shape the evolutionary landscape of NBS genes. These insights will be critical for developing climate-resilient crops with enhanced and durable disease resistance.

Decoding Complex Genomes: Methods for Profiling NBS Genes in Polyploid Systems

This technical guide provides a comprehensive overview of genome-wide identification pipelines for Nucleotide-Binding Site (NBS) genes, focusing on the integrated use of HMMER, Pfam, and custom Hidden Markov Model (HMM) profiles. Within the context of plant genomics, the expansion and evolution of NBS-encoding genes—one of the largest families of disease resistance (R) genes—exhibit distinct patterns between diploid and tetraploid species. This whitepaper details the bioinformatics methodologies that enable researchers to systematically identify, classify, and characterize these genes, thereby facilitating a deeper understanding of plant immunity mechanisms and supporting the development of disease-resistant crops. The guide is structured to serve the needs of researchers, scientists, and drug development professionals engaged in comparative genomics and plant pathogen resistance studies.

NBS-encoding genes constitute a major class of plant resistance (R) genes that play a critical role in effector-triggered immunity (ETI), providing protection against diverse pathogens including viruses, bacteria, and fungi [2] [38]. These genes typically encode proteins characterized by a conserved nucleotide-binding site (NBS) domain and often C-terminal leucine-rich repeat (LRR) domains. Based on their N-terminal domains, NBS-LRR genes are primarily classified into two major subfamilies: TNL (TIR-NBS-LRR) and CNL (CC-NBS-LRR) [38] [26]. The NBS domain itself is part of the larger NB-ARC (Apaf-1, R proteins, and CED-4) domain, which contains conserved motifs including P-loop, kinase-2, kinase-3a, GLPL, and MHDL that function in nucleotide binding and hydrolysis [39].

In plant genomes, NBS-encoding genes represent one of the largest and most variable gene families. Comparative genomic analyses have revealed striking differences in the size and composition of NBS gene repertoires across plant species. For instance, a recent study identified 12,820 NBS-domain-containing genes across 34 plant species, ranging from mosses to monocots and dicots, showcasing the extensive diversification of this gene family throughout plant evolution [2]. The expansion of NBS genes is driven primarily by gene duplication events, including both whole-genome duplication (WGD) and small-scale duplications (SSD) such as tandem, segmental, and transposon-mediated duplications [40] [2].

The evolutionary dynamics of NBS genes become particularly intriguing when comparing diploid and tetraploid plant species. Allotetraploid cotton species (e.g., Gossypium hirsutum and G. barbadense) possess approximately twice the number of NBS genes compared to their diploid progenitors (G. arboreum and G. raimondii), suggesting that polyploidization events contribute significantly to NBS gene expansion [39]. However, this expansion is not uniform across NBS gene types. Research has revealed asymmetric evolution of NBS-encoding genes in allotetraploid cottons, with G. hirsutum inheriting more NBS genes from its G. arboreum progenitor, while G. barbadense inherited more from its G. raimondii progenitor [39]. This asymmetric distribution may explain differential disease resistance patterns observed between these species, particularly regarding resistance to Verticillium wilt [39].

Core Components of Genome-Wide Identification Pipelines

HMMER: The Foundation of Profile HMM Searches

HMMER represents one of the most widely used software packages for sensitive homology detection based on profile Hidden Markov Models (HMMs) [41] [42]. This open-source tool employs probabilistic models to capture evolutionary information from multiple sequence alignments, enabling the detection of distant homologies that might be missed by pairwise sequence comparison methods like BLAST [41] [42]. The core HMMER workflow typically involves two main steps: model building using hmmbuild and database searching using hmmscan or hmmsearch [41] [42].

The fundamental advantage of HMMER in genome-wide identification pipelines lies in its ability to detect remote homologs through its sophisticated modeling of position-specific scores, insertion probabilities, and deletion probabilities [41]. Unlike simple pairwise methods, profile HMMs incorporate evolutionary information from entire protein families, making them particularly suitable for identifying divergent members of large gene families like NBS-encoding genes [41] [42]. When comparing performance, studies have shown that using the default options and parameters, SAM (another profile HMM package) consistently produces better models than HMMER when starting from identical alignments, though HMMER is typically between one and three times faster when searching databases larger than 2000 sequences [41].

Table 1: Key HMMER Components for NBS Gene Identification

Program	Function	Application in NBS Gene Identification
`hmmbuild`	Constructs HMM profiles from multiple sequence alignments	Building custom HMM profiles for NBS domains
`hmmscan`	Searches protein sequences against HMM database	Identifying NBS domains in proteome datasets
`hmmsearch`	Searches HMM profile against sequence database	Finding additional homologs of NBS genes
`hmmalign`	Creates multiple sequence alignment	Aligning identified NBS gene sequences
`jackhmmer`	Iterative sequence search	Detecting distant NBS gene homologs

Pfam Database: A Curated Repository of Protein Domains

Pfam represents a comprehensive collection of protein domain families, each represented by multiple sequence alignments and HMM profiles [43] [42]. As a core component of the InterPro database, Pfam provides standardized, curated models for thousands of protein domains, including those relevant to NBS gene identification [43] [42]. The NB-ARC domain (Pfam accession PF00931) serves as the primary Pfam model for identifying NBS-encoding genes in plant genomes [2] [39].

Recent advances in protein structure prediction, particularly through AlphaFold2, have enabled new investigations into the structural variability of Pfam domains [43]. Studies have revealed that many Pfam families contain between 20% and 40% of members with no assigned regular secondary structures, demonstrating significant within-family structural variability that may have implications for functional predictions [43]. This structural diversity presents both challenges and opportunities for NBS gene annotation, as the NB-ARC domain itself may exhibit structural variations that influence protein function.

The standard protocol for Pfam domain annotation involves using tools like InterProScan, which integrates multiple domain databases including Pfam, to comprehensively annotate protein sequences [43] [2]. For NBS gene identification, researchers typically perform domain architecture analysis to classify identified genes into subfamilies (e.g., CNL, TNL, RNL) based on the presence of additional domains such as TIR (PF01582), CC, or LRR (PF00560, PF07723, PF07725, PF12799, PF13306, etc.) [38] [39].

Custom HMM Profiles: Enhancing Specificity and Sensitivity

While Pfam provides general domain models, custom HMM profiles offer the advantage of tailored specificity for particular gene families or phylogenetic clades [2] [39]. The development of custom HMM profiles is particularly valuable for studying NBS gene evolution in diploid and tetraploid plants, where lineage-specific variations may not be fully captured by generic models.

The construction of custom HMM profiles typically begins with the compilation of a high-quality training set of known NBS sequences, preferably from closely related species [39]. These sequences are aligned using multiple sequence alignment tools such as MAFFT or MUSCLE, with careful attention to alignment quality as this represents the most critical factor affecting HMM performance [41]. The resulting alignment serves as input for hmmbuild to generate the custom profile [41] [42].
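The align-then-build sequence described above can be sketched as a pair of command-line calls driven from Python. This is a minimal illustration, not the pipeline used in the cited studies: the file names are hypothetical, MAFFT's `--auto` mode is an assumed choice, and the `mafft`, `hmmbuild`, and `hmmsearch` executables are assumed to be installed and on the PATH.

```python
import subprocess


def mafft_cmd(training_fasta):
    """Command that aligns the training sequences; MAFFT writes the alignment to stdout."""
    return ["mafft", "--auto", str(training_fasta)]


def hmmbuild_cmd(hmm_out, alignment):
    """Command that builds a profile HMM from a multiple sequence alignment."""
    return ["hmmbuild", str(hmm_out), str(alignment)]


def hmmsearch_cmd(hmm, proteome, domtbl_out, evalue=1e-4, cpus=4):
    """Command that searches a proteome with the profile and saves a per-domain table."""
    return [
        "hmmsearch",
        "--cpu", str(cpus),
        "-E", str(evalue),
        "--domtblout", str(domtbl_out),
        str(hmm),
        str(proteome),
    ]


def build_custom_profile(training_fasta, alignment_out, hmm_out):
    """Run MAFFT then hmmbuild; raises CalledProcessError if either tool fails."""
    with open(alignment_out, "w") as handle:
        subprocess.run(mafft_cmd(training_fasta), stdout=handle, check=True)
    subprocess.run(hmmbuild_cmd(hmm_out, alignment_out), check=True)
```

Keeping the command construction separate from execution makes it easy to log or inspect the exact calls before running them. The alignment should still be inspected manually before `hmmbuild` is run, as the text recommends.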

Custom HMM profiles have demonstrated particular utility in comparative studies of NBS genes across cotton species. For example, in a comprehensive analysis of four Gossypium species, custom HMM profiles enabled the identification of 246, 365, 588, and 682 NBS-encoding genes in G. arboreum, G. raimondii, G. hirsutum, and G. barbadense, respectively [39]. These custom models facilitated the detection of species-specific variations in NBS gene architectures and distributions, revealing asymmetric evolution patterns between diploid and tetraploid species.

Table 2: Comparison of NBS Gene Subfamilies in Diploid and Tetraploid Cotton Species

NBS Type	G. arboreum (Diploid)	G. raimondii (Diploid)	G. hirsutum (Tetraploid)	G. barbadense (Tetraploid)
CN	17.89%	10.68%	16.67%	11.29%
CNL	32.52%	29.32%	31.12%	29.03%
N	23.98%	16.99%	21.77%	17.01%
NL	5.69%	11.78%	6.80%	11.73%
TN	1.22%	7.95%	1.70%	8.36%
TNL	1.63%	12.05%	2.38%	12.61%
RN	8.54%	8.22%	9.69%	7.92%
RNL	8.54%	3.01%	9.86%	2.05%

Integrated Workflow for NBS Gene Identification

Pipeline Architecture and Implementation

The integrated workflow for genome-wide identification of NBS genes combines HMMER, Pfam, and custom HMM profiles in a systematic pipeline that ensures comprehensive detection and accurate classification. A robust implementation of this pipeline, as demonstrated in recent studies of NBS genes in tung trees and passion fruit, typically follows a multi-stage process [38] [26].

The initial stage involves data acquisition and preprocessing, where proteome or genome sequences are obtained from relevant databases. For the identification of NBS-encoding genes, the PfamScan script with the Pfam-A.hmm model is commonly employed using a stringent e-value cutoff (typically 1.1e-50) to ensure specificity [2] [39]. All genes containing the NB-ARC domain are initially considered putative NBS genes and subjected to further analysis.
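As a rough sketch of this filtering step, the snippet below parses HMMER's per-domain tabular output (`--domtblout`) and keeps proteins with an NB-ARC hit below the cutoff. Two assumptions are made here: the threshold is applied to the independent domain E-value (the text does not say which E-value the cited studies used), and the sample lines are invented for illustration.

```python
from typing import Iterable, List, NamedTuple, Set


class DomainHit(NamedTuple):
    protein: str
    domain: str
    i_evalue: float  # independent E-value of this domain
    ali_from: int
    ali_to: int


def parse_domtblout(lines: Iterable[str], program: str = "hmmsearch") -> List[DomainHit]:
    """Parse --domtblout rows; target/query roles swap between hmmsearch and hmmscan."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        f = line.split(maxsplit=22)  # 22 fixed columns plus a free-text description
        target, query = f[0], f[3]
        protein, domain = (target, query) if program == "hmmsearch" else (query, target)
        hits.append(DomainHit(protein, domain, float(f[12]), int(f[17]), int(f[18])))
    return hits


def putative_nbs(hits: Iterable[DomainHit], domain: str = "NB-ARC",
                 max_evalue: float = 1.1e-50) -> Set[str]:
    """Proteins with at least one hit to the given domain at or below the cutoff."""
    return {h.protein for h in hits if h.domain == domain and h.i_evalue <= max_evalue}


# Invented example rows (hypothetical gene IDs and scores).
SAMPLE = [
    "# target name accession tlen query name accession qlen ...",
    "geneA - 900 NB-ARC PF00931.25 252 1e-80 270.0 0.1 1 1 1e-83 2e-80 269.0 0.1 2 250 180 430 179 432 0.95 hypothetical protein",
    "geneB - 400 NB-ARC PF00931.25 252 1e-20 70.0 0.1 1 1 1e-23 1e-20 69.0 0.1 5 120 30 150 28 152 0.90 hypothetical protein",
]
```

With the sample rows, only `geneA` passes the stringent cutoff; relaxing `max_evalue` would also admit `geneB`, which illustrates how strongly the threshold shapes the candidate list.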

The subsequent stage involves comprehensive domain architecture analysis using InterProScan or similar tools, which provides additional domain annotations beyond the core NB-ARC domain [2] [38]. This step is crucial for classifying NBS genes into subfamilies based on the presence of N-terminal domains (TIR, CC, RPW8) and C-terminal domains (LRR) [38] [39]. The classification system typically groups similar domain architectures into the same classes, enabling systematic comparison across species [2].
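The architecture-based grouping can be illustrated with a small function that maps a set of detected domains to the class labels used in the cotton subfamily table (CN, CNL, N, NL, RN, RNL, TN, TNL). The domain labels and the precedence order when several N-terminal domains co-occur (TIR, then CC, then RPW8) are assumptions made for this sketch, not rules taken from the cited studies.

```python
from typing import Set


def classify_nbs(domains: Set[str]) -> str:
    """Return an NBS class label from a set of domain names found in one protein.

    Expected labels (assumed): "NB-ARC", "TIR", "CC", "RPW8", "LRR".
    """
    if "NB-ARC" not in domains:
        raise ValueError("not an NBS gene: NB-ARC domain missing")
    # N-terminal prefix; the precedence order is an assumption of this sketch.
    if "TIR" in domains:
        prefix = "T"
    elif "CC" in domains:
        prefix = "C"
    elif "RPW8" in domains:
        prefix = "R"
    else:
        prefix = ""
    suffix = "L" if "LRR" in domains else ""  # C-terminal LRR present or absent
    return f"{prefix}N{suffix}"
```

For example, `classify_nbs({"CC", "NB-ARC", "LRR"})` returns `"CNL"`, and a protein carrying only the NB-ARC domain is labelled `"N"`. Unusual species-specific architectures would need additional handling beyond this scheme.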

Diagram 1: Integrated workflow for NBS gene identification showing the sequential stages from data collection through bioinformatics analysis to biological interpretation.

Classification and Orthology Analysis

Following identification, NBS genes are classified based on their domain architectures into established categories such as CN, CNL, N, NL, RN, RNL, TN, and TNL [39]. This classification provides the foundation for comparative analyses between diploid and tetraploid species. Orthology analysis using tools like OrthoFinder with the DIAMOND algorithm for sequence similarity searches and the MCL clustering algorithm for gene grouping enables the identification of orthogroups (OGs) across species [2]. This approach has revealed core orthogroups (e.g., OG0, OG1, OG2) that are conserved across multiple species, as well as unique orthogroups specific to particular lineages [2].
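The distinction between core and lineage-specific orthogroups can be sketched as a simple partition of an orthogroup table. The in-memory structure below (orthogroup ID mapped to per-species gene lists) and the gene IDs are hypothetical; real OrthoFinder output would first need to be read into this shape.

```python
from typing import Dict, List


def partition_orthogroups(orthogroups: Dict[str, Dict[str, List[str]]],
                          species: List[str]) -> Dict[str, List[str]]:
    """Split orthogroups into core (all species), unique (one species), and shared (the rest)."""
    result = {"core": [], "unique": [], "shared": []}
    for og_id, members in orthogroups.items():
        present = [sp for sp in species if members.get(sp)]
        if len(present) == len(species):
            result["core"].append(og_id)
        elif len(present) == 1:
            result["unique"].append(og_id)
        elif present:
            result["shared"].append(og_id)
    return result


# Toy example with hypothetical gene identifiers.
SPECIES = ["G. arboreum", "G. raimondii", "G. hirsutum"]
TOY_OGS = {
    "OG0": {"G. arboreum": ["Ga01"], "G. raimondii": ["Gr01"], "G. hirsutum": ["Gh01", "Gh02"]},
    "OG1": {"G. hirsutum": ["Gh03"]},
    "OG2": {"G. arboreum": ["Ga02"], "G. hirsutum": ["Gh04"]},
}
```

In the toy data, `OG0` is core, `OG1` is unique to one species, and `OG2` is shared by a subset.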

Evolutionary analysis typically involves constructing phylogenetic trees using maximum likelihood methods implemented in tools like FastTreeMP with bootstrap support [2]. These phylogenetic analyses, coupled with synteny analysis, help elucidate the evolutionary relationships between NBS genes in diploid and tetraploid species. For example, studies in cotton have demonstrated that the TIR-NBS genes of G. barbadense are closely related to those of G. raimondii, providing insights into the asymmetric evolution of NBS genes in allotetraploid species [39].

Expression and Functional Validation

Bioinformatic predictions require validation through expression analysis and functional studies. Transcriptomic analyses using RNA-seq data from various tissues and stress conditions help identify NBS genes with potentially important biological roles [2] [26]. For instance, expression profiling in cotton has revealed the putative upregulation of specific orthogroups (OG2, OG6, OG15) in different tissues under various biotic and abiotic stresses in susceptible and tolerant plants [2].

Functional validation often employs virus-induced gene silencing (VIGS) to demonstrate the role of candidate NBS genes in disease resistance [2] [38]. For example, silencing of GaNBS (OG2) in resistant cotton resulted in increased susceptibility, supporting a role for this gene in defense responses [2]. Similarly, in tung trees, VIGS experiments confirmed that Vm019719 confers resistance to Fusarium wilt in V. montana, while its ortholog in susceptible V. fordii (Vf11G0978) contains a promoter deletion that renders it ineffective [38].

Table 3: Research Reagent Solutions for NBS Gene Studies

Reagent/Resource	Function	Application Example
HMMER Software Suite	Profile HMM construction and searching	Identifying NBS domain-containing genes in proteomes [41] [42]
Pfam Database	Curated protein domain families	Annotating NB-ARC and associated domains [43] [42]
InterProScan	Integrated protein domain annotation	Comprehensive domain architecture analysis [43] [2]
AlphaFold2 Database	Predicted protein structures	Assessing structural variability of NBS domains [43]
OrthoFinder	Orthogroup inference	Identifying conserved NBS gene families across species [2]
VIGS Vectors	Virus-induced gene silencing	Functional validation of NBS gene candidates [2] [38]
RNA-seq Databases	Transcriptome data	Expression profiling of NBS genes under stress [2] [26]

Case Studies in Diploid and Tetraploid Plants

Cotton Species: Asymmetric Evolution of NBS Genes

Comparative genomics analyses of four Gossypium species have provided remarkable insights into the evolution of NBS genes in diploid versus tetraploid plants [39]. The diploid species G. arboreum (A-genome) and G. raimondii (D-genome) contain 246 and 365 NBS-encoding genes, respectively, while the allotetraploid species G. hirsutum (AD-genome) and G. barbadense (AD-genome) contain 588 and 682 NBS genes, respectively [39]. This nearly twofold increase in NBS gene numbers in tetraploids suggests preservation and potential expansion following polyploidization.
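A quick way to read these counts is to compare each tetraploid total with the sum of the two diploid totals, which is what simple additivity would predict. The sketch below uses only the counts quoted above; treating the present-day diploids as exact stand-ins for the ancestral progenitors is a simplifying assumption.

```python
# NBS gene counts as quoted in the text [39].
NBS_COUNTS = {
    "G. arboreum": 246,
    "G. raimondii": 365,
    "G. hirsutum": 588,
    "G. barbadense": 682,
}
DIPLOIDS = ("G. arboreum", "G. raimondii")


def retention_ratio(tetraploid: str) -> float:
    """Tetraploid NBS count divided by the additive expectation from both diploids."""
    additive = sum(NBS_COUNTS[d] for d in DIPLOIDS)
    return NBS_COUNTS[tetraploid] / additive


for name in ("G. hirsutum", "G. barbadense"):
    print(f"{name}: {retention_ratio(name):.2f} of the additive expectation")
```

A ratio below 1 is consistent with net loss after polyploidization and a ratio above 1 with net gain, although annotation differences between assemblies can also shift these numbers.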

Strikingly, the distribution of NBS gene types differs significantly between the diploid progenitors and is reflected in their tetraploid descendants. G. arboreum possesses a larger proportion of CN, CNL, and N genes, while G. raimondii has higher proportions of NL, TN, and TNL genes [39]. This bias is maintained in the allotetraploids, with G. hirsutum resembling G. arboreum in its NBS profile, and G. barbadense resembling G. raimondii [39]. The most dramatic difference is observed in TNL genes, which are approximately seven times more abundant in G. raimondii and G. barbadense compared to G. arboreum and G. hirsutum [39].
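The resemblance between each tetraploid and one diploid can be checked directly against the subfamily percentages in the cotton table earlier in this guide. The sketch below uses the sum of absolute differences across classes as a distance; that metric is an illustrative choice rather than the method of the cited study, and the values are proportions, not gene counts.

```python
CLASSES = ["CN", "CNL", "N", "NL", "TN", "TNL", "RN", "RNL"]

# Percentages transcribed from the cotton subfamily table, in CLASSES order.
PROFILE = {
    "G. arboreum":   [17.89, 32.52, 23.98, 5.69, 1.22, 1.63, 8.54, 8.54],
    "G. raimondii":  [10.68, 29.32, 16.99, 11.78, 7.95, 12.05, 8.22, 3.01],
    "G. hirsutum":   [16.67, 31.12, 21.77, 6.80, 1.70, 2.38, 9.69, 9.86],
    "G. barbadense": [11.29, 29.03, 17.01, 11.73, 8.36, 12.61, 7.92, 2.05],
}
DIPLOIDS = ("G. arboreum", "G. raimondii")


def profile_distance(a: str, b: str) -> float:
    """Sum of absolute percentage-point differences across all classes."""
    return sum(abs(x - y) for x, y in zip(PROFILE[a], PROFILE[b]))


def closest_progenitor(tetraploid: str) -> str:
    """Diploid whose subfamily profile is nearest to the tetraploid's."""
    return min(DIPLOIDS, key=lambda d: profile_distance(tetraploid, d))


def tnl_fold(high: str, low: str) -> float:
    """Ratio of TNL proportions between two species."""
    i = CLASSES.index("TNL")
    return PROFILE[high][i] / PROFILE[low][i]
```

On these numbers, `closest_progenitor("G. hirsutum")` returns `"G. arboreum"` and `closest_progenitor("G. barbadense")` returns `"G. raimondii"`. The TNL proportion ratio is about 7.4 between the two diploids and about 5.3 between the two tetraploids.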

This asymmetric evolution of NBS genes has functional implications for disease resistance. G. raimondii and G. barbadense display greater resistance to Verticillium wilt compared to G. arboreum and G. hirsutum, suggesting that TNL genes may play a significant role in resistance to this pathogen [39]. These findings highlight how allopolyploidization can lead to divergent evolutionary trajectories for disease resistance genes in different tetraploid lineages.

Tung Trees: NBS Gene Diversification and Fusarium Wilt Resistance

Comparative analysis of the diploid tung tree species Vernicia fordii (susceptible to Fusarium wilt) and Vernicia montana (resistant) has revealed significant differences in their NBS-LRR gene complements [38]. V. fordii contains 90 NBS-LRR genes, while V. montana possesses 149, with the latter exhibiting greater architectural diversity including TIR-NBS-LRR genes that are absent in V. fordii [38].

Notably, V. montana contains 12 NBS-LRR genes with TIR domains (8.1% of its total), while V. fordii completely lacks TIR-type NBS-LRR genes [38]. This discrepancy suggests that loss of TNL genes in V. fordii may contribute to its susceptibility to Fusarium wilt. Furthermore, V. montana displays four types of LRR domains (LRR1, LRR3, LRR4, LRR8), while V. fordii has only two (LRR3, LRR8), indicating additional domain loss in the susceptible species [38].

Functional analysis identified the orthologous gene pair Vf11G0978-Vm019719 as a potential determinant of resistance differences [38]. While Vm019719 shows upregulated expression in V. montana following infection, its ortholog in V. fordii (Vf11G0978) shows downregulated expression [38]. Molecular characterization revealed that Vm019719 is activated by VmWRKY64, while the promoter of Vf11G0978 contains a deletion in the W-box element that likely impairs its responsiveness [38]. This case study illustrates how integrated bioinformatics and experimental approaches can pinpoint specific genetic variations underlying differential disease resistance.

Diagram 2: Evolutionary relationships and NBS gene inheritance patterns between diploid and tetraploid cotton species, showing asymmetric evolution of NBS gene types and their association with disease resistance phenotypes.

Technical Considerations and Best Practices

Parameter Optimization and Threshold Selection

The sensitivity and specificity of NBS gene identification pipelines depend critically on appropriate parameter selection. For HMMER-based searches, the e-value threshold represents one of the most important parameters. Studies of NBS genes typically employ stringent e-value cutoffs (e.g., 1.1e-50) to minimize false positives while maintaining sensitivity [2] [39]. This stringent threshold is justified by the highly conserved nature of the NB-ARC domain and the need to distinguish true NBS genes from distant homologs or pseudogenes.

When building custom HMM profiles, multiple sequence alignment quality is paramount [41]. The alignment should include representative sequences from the target phylogenetic range, with careful manual inspection to ensure proper alignment of conserved motifs. For NBS genes, special attention should be paid to the P-loop, kinase-2, kinase-3a, GLPL, and MHDL motifs, as proper alignment of these regions is essential for constructing accurate profiles [39].

For orthology analysis, inflation parameters in the MCL algorithm significantly impact orthogroup detection. Testing multiple inflation values (typically 1.5-4.0) and comparing results can help identify optimal parameters for specific datasets [2]. Additionally, incorporating domain architecture information alongside sequence similarity can improve orthogroup accuracy, as genes with similar domain architectures are more likely to share common functions [2].

Handling Structural Variability and Domain Diversity

The structural variability observed in Pfam domains presents both challenges and opportunities for NBS gene annotation [43]. Recent analyses of AlphaFold2-predicted structures have revealed that many Pfam families contain substantial structural diversity, with 20-40% of members lacking regular secondary structures in certain families [43]. This variability may be particularly relevant for NBS genes, as flexible regions are often involved in signal transduction and conformational changes.

To address this variability, researchers can employ structural clustering approaches using tools like Foldseek to identify structurally distinct subgroups within NBS gene families [43]. Agglomerative clustering with TM-score thresholds (e.g., 0.6) can group structurally similar domains while distinguishing divergent variants [43]. This structural information complements sequence-based analyses and may help identify functionally important subfamilies.
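Threshold-based grouping of this kind can be sketched in pure Python. The version below is a simplification: it performs single-linkage clustering (any pair at or above the threshold joins two groups), which is only one flavour of agglomerative clustering and may not match the linkage used in the cited work. The pairwise TM-scores are assumed to have been computed beforehand, and the identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def cluster_by_tm(ids: List[str], tm_scores: Dict[Tuple[str, str], float],
                  threshold: float = 0.6) -> List[List[str]]:
    """Single-linkage clusters: structures linked by any chain of pairs with TM-score >= threshold."""
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        # Follow parent pointers to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in tm_scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: Dict[str, List[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())
```

Single linkage can chain loosely related structures into one cluster, so average or complete linkage may be preferable when subgroups need to be compact.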

The diversity of domain architectures presents another analytical challenge. Beyond the major classes (CNL, TNL, RNL), numerous species-specific architectures have been identified, including TIR-NBS-TIR-Cupin1-Cupin1, TIR-NBS-Prenyltransf, and Sugar_tr-NBS in some species [2]. Comprehensive classification systems should accommodate this diversity while maintaining consistent nomenclature to facilitate cross-study comparisons.

Genome-wide identification of NBS genes in multiple species represents a computationally intensive task, particularly for large plant genomes with abundant gene duplicates. Performance considerations become especially important when analyzing tetraploid genomes, which often approach or exceed 1-2 Gb in size with over 40,000 genes [40] [39].

HMMER3 offers significant performance improvements over earlier versions, but searches against large databases can still require substantial computational resources [41] [42]. Parallelization using GNU Parallel or similar tools can distribute searches across multiple cores, dramatically reducing processing time [43]. For very large datasets, pre-filtering using faster tools like BLAST with conservative thresholds can reduce the search space before applying more sensitive HMMER searches [41].
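One simple way to prepare a proteome for parallel searching is to split its FASTA records into roughly equal chunks, one per worker. The helper names below are hypothetical and the sketch covers only the splitting; dispatching each chunk to a separate search process is left to whichever job runner is in use.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped sequence lines."""
    records: List[Record] = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_round_robin(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Distribute records over n_chunks lists so that chunk sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    return [records[i::n_chunks] for i in range(n_chunks)]
```

Round-robin assignment balances record counts but not total sequence length; sorting by length before splitting would even out the workload further.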

Memory usage represents another important consideration, particularly for orthology analysis of large gene families across multiple genomes. OrthoFinder implementations with DIAMOND instead of BLAST can reduce memory requirements while maintaining sensitivity [2]. For extremely large analyses, disk-based or distributed computing approaches may be necessary to handle intermediate files and results.

The integration of HMMER, Pfam, and custom HMM profiles in genome-wide identification pipelines has revolutionized our understanding of NBS gene expansion in diploid and tetraploid plants. These bioinformatic approaches have revealed asymmetric evolution of NBS genes in allopolyploids, with different tetraploid lineages preferentially retaining NBS genes from different diploid progenitors [39]. This asymmetric evolution has functional consequences, influencing disease resistance profiles and adaptation to pathogen pressures.

Future developments in this field will likely include more sophisticated integration of structural information from AlphaFold2 predictions to refine domain annotations and functional predictions [43]. Machine learning approaches, particularly Random Forest models, show promise for identifying multi-stress responsive NBS genes based on integrated sequence, structural, and expression features [26]. Additionally, the increasing availability of pan-genome resources will enable more comprehensive surveys of NBS gene diversity within and between species, moving beyond single reference genomes to capture the full spectrum of variation.

As these methodologies continue to evolve, they will further illuminate the complex evolutionary dynamics of plant immune genes and provide valuable resources for marker-assisted breeding and genetic engineering of disease-resistant crops. The pipeline described in this guide provides a robust foundation for these future investigations, enabling researchers to systematically characterize NBS genes across the spectrum of plant diversity.

This technical guide provides a comprehensive framework for conducting structural and phylogenetic analyses of plant nucleotide-binding site (NBS) genes, with particular emphasis on investigating gene family expansion in diploid versus tetraploid plants. The NBS gene family represents the largest class of plant disease resistance (R) genes, encoding proteins containing nucleotide-binding site and leucine-rich repeat (NBS-LRR) domains that play critical roles in pathogen recognition and defense activation [44] [17]. Understanding the evolutionary dynamics of these genes across ploidy levels is essential for unraveling the genetic basis of disease resistance and developing improved crop varieties through targeted breeding strategies.

Recent studies have demonstrated that whole-genome duplication (WGD) events, which generate tetraploids from diploid progenitors, trigger significant genomic and transcriptomic changes that can influence resistance gene evolution [45] [46]. Polyploidization has been shown to alter growth patterns, cell wall composition, and transcriptional networks, potentially creating novel genetic material for evolutionary innovation [46]. This guide integrates methodologies from multiple contemporary studies to establish robust protocols for analyzing domain architecture and phylogenetic relationships within the context of ploidy variation.

Structural Analysis of NBS Genes: Domain Architecture Determination

Domain Identification and Classification

Structural analysis begins with the comprehensive identification of protein domains within NBS genes. The standard workflow involves multiple bioinformatic tools to detect characteristic domains:

NB-ARC Domain (PF00931): The conserved nucleotide-binding adaptor shared by APAF-1, R proteins, and CED-4 serves as the fundamental signature for NBS gene identification [17] [47]. This domain is typically identified using Hidden Markov Model (HMM) profiles from Pfam with an E-value cutoff of 10^-4.
Leucine-Rich Repeats (LRR): The C-terminal LRR domain (PF08191) is responsible for pathogen recognition and specificity [44] [17]. Detection requires SMART protein motif analysis to improve identification accuracy beyond basic Pfam searches.
N-terminal Domains: Classification of NBS genes into subfamilies depends on N-terminal domain presence:
- TIR (Toll/Interleukin-1 Receptor) domain (PF01582): Characteristic of TNL subfamily
- CC (Coiled-Coil) domain: Identified using COILS with a threshold of 0.5, characteristic of CNL subfamily [17]
- RPW8 domain (PF05659): Characteristic of RNL subfamily [17]

Table 1: NBS Gene Classification Based on Domain Architecture

Class	Domain Structure	Representative Motifs	Functional Role
CNL	CC-NBS-LRR	P-loop, RNBS-A-non-TIR, Kinase-2, RNBS-B, RNBS-C, GLPL [44]	Pathogen recognition, defense signaling
TNL	TIR-NBS-LRR	P-loop, RNBS-A-TIR, Kinase-2, RNBS-B, GLPL [44]	Signal recognition and transduction
RNL	RPW8-NBS-LRR	P-loop, Kinase-2, RNBS-B, GLPL [17]	Defense signal transduction
NL	NBS-LRR	P-loop, Kinase-2, RNBS-B, GLPL [44]	Minimal recognition unit
N	NBS-only	P-loop, Kinase-2, RNBS-B [44]	Degenerated resistance function

Motif Conservation Analysis

Beyond domain architecture, conserved motifs within the NBS domain provide critical insights into functional evolution. Six core motifs have been identified across NBS genes:

P-loop/kin1a (GxGKT/S): Involved in ATP/GTP binding
RNBS-A (VLxLVVGxISTND): Distinguishes TNL (RWKKVFVLDDVW) from nTNL (VLLEVIGCISNTND) subfamilies [44]
Kinase-2 (KxPRVLIVLDDVW): Catalytic core region
RNBS-B (NGSRIVTTTRENKV): Structural maintenance
RNBS-C (LLxLENGWKxLRD): Functional specificity
GLPL (CxGLPLA): Structural motif [44]

These motifs exhibit subfamily-specific conservation patterns, most clearly at RNBS-A, where the TNL and nTNL consensus sequences listed above differ [44]. Motif analysis should be performed using multiple sequence alignment with MEGA or ClustalW, followed by visualization in specialized tools like GeneDoc to identify lineage-specific variations.
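For a first-pass scan before alignment-based analysis, the shorter consensus motifs can be turned into regular expressions. The sketch below encodes only the P-loop and GLPL patterns exactly as written in the list above, treating `x` as any residue; the toy sequence is invented, and real NBS domains often deviate from these strict consensus strings.

```python
import re
from typing import Dict, List

# Consensus strings from the motif list, with "x" translated to "." (any residue).
MOTIF_PATTERNS = {
    "P-loop": re.compile(r"G.GK[TS]"),
    "GLPL": re.compile(r"C.GLPLA"),
}


def find_motifs(sequence: str) -> Dict[str, List[int]]:
    """Return 0-based start positions of each non-overlapping motif match."""
    seq = sequence.upper()
    return {name: [m.start() for m in pattern.finditer(seq)]
            for name, pattern in MOTIF_PATTERNS.items()}


# Invented toy sequence containing one copy of each motif.
TOY_SEQ = "MAAGMGGVGKTTLAQAAACGGLPLAVV"
```

Here `find_motifs(TOY_SEQ)` reports the P-loop at position 6 and GLPL at position 18. Profile-based tools such as MEME are better suited to degenerate motifs that a fixed pattern would miss.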

Experimental Protocol: Structural Characterization Workflow

Step 1: Data Acquisition and Preprocessing

Retrieve genome sequences and annotation files from NCBI, Phytozome, or species-specific databases
Extract protein coding sequences and translate to amino acid sequences
Create a local database of all putative proteins

Step 2: Domain Identification

Perform HMMER search against Pfam database (NB-ARC domain: PF00931)
Conduct complementary BLASTP search using reference NBS sequences (E-value < 10^-5)
Remove redundant sequences, retaining the longest isoform per gene
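The redundancy-removal step can be sketched as follows. The mapping from isoform to gene is an assumption: here the gene ID is taken to be everything before the final dot in the isoform ID (for example `Gene1.2` belongs to `Gene1`), a convention that varies between annotation sources and should be adapted accordingly.

```python
from typing import Dict


def longest_isoforms(proteins: Dict[str, str]) -> Dict[str, str]:
    """Keep one protein per gene: the longest isoform (first one seen wins ties)."""
    best: Dict[str, tuple] = {}
    for isoform_id, seq in proteins.items():
        gene_id = isoform_id.rsplit(".", 1)[0]  # assumed naming convention
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (isoform_id, seq)
    return {isoform_id: seq for isoform_id, seq in best.values()}
```

For instance, given `Gene1.1` (5 residues), `Gene1.2` (8 residues), and `Gene2.1`, the function retains `Gene1.2` and `Gene2.1`.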

Step 3: Architecture Classification

Identify TIR domains using Pfam (PF01582)
Detect CC domains using COILS with threshold 0.5
Identify LRR domains using SMART and Pfam (PF08191)
Classify genes into CNL, TNL, RNL, NL, or N categories

Step 4: Motif Analysis

Perform multiple sequence alignment using MUSCLE or ClustalW
Identify conserved motifs manually or using MEME suite
Validate motif conservation across subfamilies and ploidy levels

Phylogenetic Analysis: Reconstructing Evolutionary Relationships

Phylogenetic Tree Construction

Phylogenetic analysis of NBS genes provides insights into evolutionary history, duplication events, and selective pressures. The standard approach involves:

Sequence Alignment and Model Selection

Extract NB-ARC domain sequences from all identified NBS genes
Perform multiple sequence alignment using MUSCLE or ClustalW with default parameters
Select best-fit substitution model using ModelTest or ProtTest (Jones-Taylor-Thornton model for amino acids) [29]

Tree Building Methods

Maximum Likelihood: Implement using RAxML or MEGA with 1000 bootstrap replicates [47] [29]
Bayesian Inference: Perform using MrBayes with Markov Chain Monte Carlo (MCMC) sampling

Tree Calibration

Estimate divergence times using synonymous substitution rates (Ks) between paralogs and orthologs
Calculate Ka/Ks ratios to identify positive selection (Ka/Ks > 1) or purifying selection (Ka/Ks < 1) [47]
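Both calculations reduce to short formulas once Ka and Ks have been estimated by a tool such as MEGA or PAML. The sketch below classifies a gene pair by its Ka/Ks ratio and converts Ks to an approximate divergence time with the commonly used relation T = Ks / (2r). The substitution rate r is lineage-specific and is not given in the text; the value used in the example is a placeholder assumption only.

```python
def selection_regime(ka: float, ks: float) -> str:
    """Label the selection regime from Ka and Ks using the thresholds in the text."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    omega = ka / ks
    if omega > 1:
        return "positive"
    if omega < 1:
        return "purifying"
    return "neutral"


def divergence_time_mya(ks: float, rate_per_site_per_year: float) -> float:
    """Approximate divergence time in million years: T = Ks / (2 * r)."""
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return ks / (2.0 * rate_per_site_per_year) / 1e6


# Example with a placeholder rate (assumed, not taken from the cited studies).
EXAMPLE_RATE = 1.5e-8  # synonymous substitutions per site per year
print(selection_regime(ka=0.02, ks=0.10))
print(round(divergence_time_mya(ks=0.03, rate_per_site_per_year=EXAMPLE_RATE), 2))
```

Pairwise Ka/Ks averages over the whole sequence, so a ratio below 1 does not rule out positive selection at individual sites; site-specific models are needed for that. Very small or saturated Ks values also make the ratio and the time estimate unreliable.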

Table 2: Evolutionary Parameters for NBS Gene Phylogenetics

Parameter	Calculation Method	Interpretation	Application in Ploidy Studies
Ks (Synonymous substitutions)	MEGA v6.06, PAML4 package [47]	Measures neutral evolutionary rate, estimates duplication times	Compare duplication rates between diploids and tetraploids
Ka (Nonsynonymous substitutions)	MEGA v6.06, PAML4 package [47]	Measures functional constraint	Assess functional divergence post-polyploidization
Ka/Ks ratio	Ka/Ks	Identifies selection pressure: <1 purifying, =1 neutral, >1 positive [47]	Detect selection differences in polyploid lineages
Bootstrap value	RAxML, MEGA (1000 replicates) [29]	Measures node support in phylogenetic trees	Validate evolutionary relationships across ploidy levels

Evolutionary Analysis in Diploid-Tetraploid Systems

Comparative analysis of NBS genes across ploidy levels reveals distinctive evolutionary patterns:

Gene Family Expansion Mechanisms

Tandem Duplications: Primary driver of NBS gene clusters, frequently observed in specific chromosomal regions [44] [47]
Dispersed Duplications: Contribute to singleton NBS genes distributed across genomes [17]
Whole-Genome Duplication: Polyploidization creates immediate gene copy number increases, followed by selective retention or loss [45] [46]

Studies in Fragaria species demonstrated that lineage-specific duplications occurred before species divergence, with NBS-LRR genes forming 184 gene families across six species [47]. Tetraploids exhibit significant transcriptomic alterations, with 92 differentially expressed genes associated with elevated leaf potassium in neo-tetraploid Arabidopsis [45].

Subfamily-Specific Evolutionary Rates

TNL genes show significantly higher Ks and Ka/Ks values compared to non-TNL genes, indicating more rapid evolution and stronger diversifying selection [47]. Monocots frequently show TNL depletion, with complete absence observed in orchids and other species [48].

Experimental Protocol: Phylogenetic Analysis Workflow

Step 1: Sequence Preparation

Extract NB-ARC domain sequences from all NBS genes
Add outgroup sequences from related species for rooting
Perform multiple sequence alignment using MUSCLE

Step 2: Model Selection

Test substitution models using ModelTest or ProtTest
Select best-fit model based on Akaike/Bayesian Information Criterion

Step 3: Tree Construction

Build maximum likelihood tree with 1000 bootstrap replicates
Run Bayesian inference with MCMC (1,000,000 generations)
Combine methods for robust phylogenetic hypothesis

Step 4: Evolutionary Analysis

Calculate Ka, Ks, and Ka/Ks ratios using MEGA or PAML
Test for positive selection using site-specific models
Estimate divergence times from Ks values

Table 3: Essential Research Reagents for NBS Gene Analysis

Reagent/Resource	Specifications	Application	Example Sources
Pfam Database	HMM profiles for protein domains (NB-ARC: PF00931) [17]	Domain identification and architecture classification	pfam.xfam.org
COILS Server	Coiled-coil prediction with threshold 0.5 [17]	CC domain identification in CNL and RNL genes	embnet.vital-it.ch/software/COILS
SMART Database	Protein domain annotation with improved LRR detection [47]	Comprehensive domain architecture analysis	smart.embl-heidelberg.de
MEGA Software	Molecular Evolutionary Genetics Analysis, version 6+ [29]	Phylogenetic tree construction, Ka/Ks calculation	megasoftware.net
PAML Package	Phylogenetic Analysis by Maximum Likelihood, version 4 [47]	Detection of positive selection, evolutionary rate analysis	abacus.gene.ucl.ac.uk/software/paml
ClustalW/MUSCLE	Multiple sequence alignment algorithms [29]	Preparing sequences for phylogenetic analysis	ebi.ac.uk/Tools/clustalw2
GENECONV	Sequence exchange detection with permutation tests [47]	Identifying gene conversion events	math.wustl.edu/~sawyer/mbprogs

Structural and phylogenetic analyses provide powerful complementary approaches for investigating NBS gene expansion in diploid and tetraploid plants. Integration of domain architecture characterization with evolutionary relationship reconstruction reveals how whole-genome duplication events shape resistance gene repertoire and function. The protocols and methodologies outlined in this guide establish a robust framework for comparative genomic studies aimed at understanding the evolutionary consequences of polyploidization on plant immunity systems. Future research directions should incorporate three-dimensional protein structure prediction and single-cell transcriptomics to further elucidate structure-function relationships in polyploid plants, potentially accelerating the development of disease-resistant crop varieties through manipulation of ploidy states.

Nucleotide-binding site (NBS) domain genes represent one of the largest superfamilies of plant disease resistance (R) genes, encoding proteins crucial for pathogen recognition and defense activation [2]. These genes exhibit remarkable structural diversity, encompassing classical architectures such as NBS, NBS-LRR, TIR-NBS, and TIR-NBS-LRR, alongside species-specific patterns [2]. A hallmark of NBS-encoding genes across plant genomes is their uneven genomic distribution, frequently organized in clustered arrangements at chromosome ends rather than being randomly dispersed [17]. This spatial organization has significant implications for understanding plant-pathogen co-evolution and the mechanisms driving resistance gene diversification, particularly in the context of ploidy variation between diploid and tetraploid plants [49].

The investigation of NBS gene distribution patterns provides critical insights into evolutionary dynamics, including the role of tandem duplications and whole-genome duplications (WGD) in expanding and reshaping the resistance gene repertoire [2] [50]. Studies across numerous plant species reveal that NBS genes are often concentrated in specific genomic regions, with this clustering facilitating rapid evolution and generating novel resistance specificities through unequal crossing over and gene conversion [17]. For comparative studies of NBS gene expansion in diploid versus tetraploid plants, chromosomal mapping and cluster analysis serve as foundational methodologies for visualizing and quantifying these distribution patterns, enabling researchers to trace evolutionary history and identify candidate genes for functional validation.

Methodological Framework for Chromosomal Mapping and Cluster Analysis

Identification and Annotation of NBS-Encoding Genes

The initial and crucial step in chromosomal mapping involves the comprehensive identification and accurate annotation of NBS-encoding genes within a genome assembly. The standard methodology utilizes Hidden Markov Model (HMM) profiles of conserved domains to scan predicted protein sequences, followed by additional validation steps to confirm domain architecture [2] [17].

Table 1: Key Bioinformatics Tools for NBS Gene Identification

Tool/Database	Primary Function	Key Parameters	Application Example
PfamScan.pl HMM	Domain search using HMM models	e-value cutoff (e.g., 1.1e-50), Pfam-A.hmm model [2]	Initial screening for NB-ARC domain (PF00931) [2] [17]
NCBI Conserved Domain Database (CDD)	Domain identification and classification	e-value threshold (e.g., 10^-4) [17]	Verification of TIR (PF01582), RPW8 (PF05659), LRR (PF08191) domains [17]
Coiled-coil prediction tools	Identification of coiled-coil (CC) domains	Threshold value ≥ 0.5 [17]	Classification of CNL subfamily members [17]
BLASTP	Sequence homology searches	e-value ~1.0 [17]	Supplemental identification of NBS protein homologs [17]

The typical workflow begins with using HMMER software with the Pfam NB-ARC domain (PF00931) profile to scan the entire proteome [2] [17]. Candidate sequences are then analyzed using the NCBI CDD and the Simple Modular Architecture Research Tool (SMART) to identify associated N-terminal domains such as TIR, CC, or RPW8, and C-terminal LRR repeats [17]. This multi-step process ensures accurate classification of NBS genes into subfamilies such as TNL (TIR-NBS-LRR), CNL (CC-NBS-LRR), and RNL (RPW8-NBS-LRR) [17].
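
The classification step can be sketched as a small lookup over the domains detected for each candidate protein. This is a minimal illustration, not the procedure of any cited study: the function name, the domain labels, and the priority order used when more than one N-terminal domain is reported are all assumptions.

```python
def classify_nbs(domains: set[str]) -> str:
    """Assign an NBS gene to a subfamily from its detected domains.

    `domains` holds labels such as "NB-ARC", "TIR", "CC", "RPW8", "LRR".
    """
    if "NB-ARC" not in domains:
        return "not NBS"
    has_lrr = "LRR" in domains
    # Check N-terminal domains in a fixed priority order (an assumption).
    for nterm, prefix in (("TIR", "T"), ("RPW8", "R"), ("CC", "C")):
        if nterm in domains:
            return f"{prefix}NL" if has_lrr else f"{prefix}N"
    return "NL" if has_lrr else "N"
```

For example, a protein with NB-ARC, TIR, and LRR hits is labeled TNL, while one with only NB-ARC and CC hits is labeled CN, a truncated architecture lacking the LRR region.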

Chromosomal Mapping and Cluster Definition

Once identified, NBS genes are mapped to their physical chromosomal locations using genome annotation files (GFF3 or GTF format). The physical position of each gene on the chromosome is extracted, and genes are visualized along the chromosomes using specialized bioinformatics software.
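
Extracting positions from an annotation file might look like the sketch below, which reads the nine tab-separated GFF3 columns and keeps `gene` features whose `ID` attribute is in a set of NBS gene identifiers. The function name is hypothetical, and real annotations may store identifiers under other attributes or feature types, so the matching rule is an assumption.

```python
def gene_positions(gff3_lines, wanted_ids):
    """Return {gene_id: (chrom, start, end)} for genes in `wanted_ids`."""
    positions = {}
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        gene_id = attrs.get("ID")
        if gene_id in wanted_ids:
            positions[gene_id] = (cols[0], int(cols[3]), int(cols[4]))
    return positions
```

The function accepts any iterable of lines, so an open file handle can be passed directly.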

A critical aspect of distribution analysis involves defining gene clusters. A common operational definition considers NBS genes to be clustered if they are located within a specified physical distance on a chromosome. For example, in Akebia trifoliata, researchers defined NBS genes as clustered if the distance between adjacent NBS genes was less than 200 kilobases [17]. This analysis revealed that 41 of 64 mapped NBS genes (64%) were located in clusters, while the remaining 23 genes were singletons, with most clusters situated at chromosome ends [17].
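
The distance-based definition can be sketched as a single pass over position-sorted genes. The 200 kb default follows the Akebia trifoliata example above; measuring the gap from the end of one gene to the start of the next is an assumption, as is the function name.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=200_000):
    """Group genes on each chromosome when adjacent genes lie within max_gap bp.

    `genes` is an iterable of (gene_id, chrom, start, end) tuples.
    Returns (clusters, singletons): a list of gene-id lists and a list of ids.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, gene_id))

    clusters, singletons = [], []

    def close(group):
        ids = [g[2] for g in group]
        if len(ids) > 1:
            clusters.append(ids)
        else:
            singletons.extend(ids)

    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom])
        group = [ordered[0]]
        for gene in ordered[1:]:
            # Gap from the end of the previous gene to the start of this one.
            if gene[0] - group[-1][1] < max_gap:
                group.append(gene)
            else:
                close(group)
                group = [gene]
        close(group)
    return clusters, singletons
```

The share of clustered genes is then the number of genes in `clusters` divided by the number of mapped genes.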

Diagram: Workflow from gene identification to chromosomal mapping and cluster analysis.

Comparative Analysis of NBS Gene Distribution in Diploid and Tetraploid Plants

The expansion of NBS-encoding genes in plant genomes occurs primarily through duplication events, with both small-scale duplications (SSD) and whole-genome duplications (WGD) playing significant roles [2]. The distribution patterns and cluster characteristics often differ markedly between diploid and polyploid species, providing insights into the evolutionary dynamics of resistance gene repertoires.

Distribution Patterns in Diploid Plants

In diploid species, NBS genes consistently show non-random distribution patterns. A study of the diploid Akebia trifoliata revealed that NBS genes are "unevenly distributed on 14 chromosomes, most of which were assigned to the chromosome ends" [17]. This telomeric bias in distribution was accompanied by a high proportion of genes (64%) organized in clusters, with tandem and dispersed duplications identified as the main forces for NBS expansion in this species [17].

Similar patterns have been observed in other diploid species within the Rosaceae family. For instance, the diploid Asian pear (Pyrus bretschneideri) possesses 338 NBS-encoding genes, while the diploid European pear (P. communis) has 412 genes, with this difference attributed primarily to proximal duplications [51]. Phylogenetic analysis of these pear genomes revealed numerous species-specific clades and genes, suggesting independent expansion events following species divergence [51].

Distribution Patterns in Tetraploid Plants

Tetraploid species often exhibit more complex NBS gene distributions resulting from the combination of multiple genomes. In a haplotype-resolved genome assembly of the autotetraploid Actinidia arguta (hardy kiwifruit), researchers identified distinct NBS-LRR gene complements across the four haplotypes [49]. This complex genomic architecture provides opportunities for subfunctionalization and neofunctionalization of resistance genes following polyploidization.

The process of allopolyploidization can create particularly interesting distribution patterns. In the recently formed allopolyploid Acanthus tetraploideus, homeologous sequences were preferentially clustered with its two parental diploids in a roughly 1:1 ratio [52]. This merging of divergent genomes creates immediate cluster diversity and establishes the foundation for subsequent reorganization of resistance gene arrays.

Table 2: Comparative NBS Gene Statistics in Diploid and Tetraploid Plants

Plant Species	Ploidy	Total NBS Genes	% of Genome	Clustering Pattern	Main Expansion Mechanism
Apple (Malus domestica) [50]	Diploid	1,303	2.05%	Extreme expansion	Tandem duplication
Asian Pear (P. bretschneideri) [51]	Diploid	338	~0.07%	Clustered, uneven	Proximal duplication
European Pear (P. communis) [51]	Diploid	412	~0.08%	Clustered, uneven	Proximal duplication
Strawberry (Fragaria vesca) [50]	Diploid	346	1.05%	Clustered	Tandem duplication
Akebia trifoliata [17]	Diploid	73	Not reported	64% in clusters	Tandem and dispersed
Actinidia arguta [49]	Autotetraploid	Varies by haplotype	Not reported	Complex, haplotype-specific	Whole-genome duplication

Evolutionary Implications of Distribution Patterns

The uneven distribution of NBS genes in plant genomes has profound evolutionary implications, particularly in the context of plant-pathogen co-evolution. Cluster formation at chromosome ends creates genomic environments conducive to rapid evolution, as telomeric regions typically exhibit higher recombination rates [17]. This facilitates the generation of novel resistance specificities through mechanisms such as unequal crossing over and gene conversion, enabling plants to keep pace with rapidly evolving pathogens.

Comparative analyses between diploid and tetraploid species reveal different evolutionary trajectories for NBS gene arrays. In diploid plants, the "birth-and-death" evolution model predominates, where genes undergo tandem duplication followed by differential survival or loss [51]. In tetraploids, the interplay between whole-genome duplication and subsequent diploidization processes creates more complex evolutionary dynamics, including potential functional diversification among homeologs [49] [52].

Population genetic analyses in pear species have demonstrated that NBS genes frequently show signatures of positive selection, with approximately 15.79% of orthologous gene pairs between Asian and European pears exhibiting Ka/Ks ratios >1 [51]. This pattern of adaptive evolution appears to differ between diploid and tetraploid systems, reflecting their distinct genomic architectures and evolutionary constraints.

Table 3: Research Reagent Solutions for Chromosomal Mapping and Cluster Analysis

Reagent/Resource	Function/Application	Specific Examples/Notes
High-Quality Genome Assembly	Foundation for gene mapping and identification	Haplotype-resolved assemblies preferred for polyploids [49]
HMM Profile Databases	Identification of conserved NBS domains	Pfam NB-ARC domain (PF00931) [2] [17]
Genome Annotation Files	Chromosomal mapping and position analysis	GFF3/GTF format files from sequenced genomes [17]
Orthology Analysis Tools	Evolutionary and comparative genomics	OrthoFinder for orthogroup inference [2] [53]
Multiple Sequence Alignment Tools	Phylogenetic analysis and motif identification	MAFFT for accurate alignment [2]
Phylogenetic Tree Building Software	Evolutionary relationship inference	FastTreeMP with bootstrap validation [2]
Visualization Platforms	Chromosomal distribution mapping	Custom scripts for generating chromosome maps [17]

Chromosomal mapping and cluster analysis provide powerful methodologies for visualizing the uneven genomic distribution of NBS-encoding genes, revealing patterns that reflect the evolutionary history and selective pressures shaping plant immune systems. The distinct distribution characteristics observed between diploid and tetraploid plants—ranging from the tight clusters driven by tandem duplications in diploids to the complex homeologous relationships in tetraploids—highlight the diverse evolutionary paths available for resistance gene expansion.

These distribution patterns are not merely structural curiosities but have functional consequences for disease resistance mechanisms and adaptive potential. As genomic technologies advance, particularly for complex polyploid genomes, more refined analyses of NBS gene distribution will continue to enhance our understanding of plant-pathogen co-evolution and facilitate the identification of candidate genes for crop improvement programs. The methodological framework outlined in this guide provides a foundation for such investigations, enabling researchers to decipher the complex genomic architecture underlying plant immunity.

A foundational question in genomics is how changes in gene copy number translate into functional phenotypic output through the intermediary of transcriptomics. This relationship is pivotal for understanding the mechanisms of evolution, adaptation, and disease resistance in plants. The gene balance hypothesis (GBH) posits that there is selection on gene copy number to preserve the stoichiometric balance among interacting proteins, which presupposes that gene product abundance is governed by gene dosage [54]. This review frames this central question within the specific context of Nucleotide-Binding Site-Leucine Rich Repeat (NBS-LRR) gene expansion, a major class of plant disease resistance genes, comparing evolutionary patterns between diploid and tetraploid plants. The expansion and retention of these genes are critically influenced by whole-genome duplication (WGD) events, and transcriptomic profiling provides the key to linking their copy number to their functional role in plant immunity.

Core Concepts: Gene Dosage and Transcriptional Responses

The Gene Balance Hypothesis

The GBH provides a framework for predicting the retention and loss of genes following duplication events. It predicts a fitness cost to disrupting the stoichiometric balance of proteins involved in coordinated interaction networks, such as protein complexes and signaling cascades [54]. Whole-genome duplication (WGD) duplicates every gene in the network simultaneously, preserving this balance, and purifying selection subsequently acts to retain these genes together during the diploidization process. In contrast, small-scale duplications (SSD), including tandem duplications, disrupt this balance and are often purged from the genome, a pattern known as reciprocal retention [54]. For the GBH to operate, a change in gene copy number must be "felt" at the transcript level; it necessitates that gene expression changes in response to copy number alteration and that these changes are coordinated across genes within a balanced network [54].

NBS-LRR Genes and Plant Immunity

NBS-LRR genes are the largest class of plant R genes and are central to the plant's effector-triggered immunity (ETI) system [2] [55]. These genes are highly diverse and can be classified into subfamilies based on their N-terminal domains, primarily TIR-NBS-LRR (TNL) and CC-NBS-LRR (CNL) [55]. Their copy number varies dramatically across plant species. For instance, a 2024 study identified 12,820 NBS-domain-containing genes across 34 plant species, uncovering significant diversity and several novel domain architectures [2]. This expansion is driven by several evolutionary mechanisms, with WGD being a major contributor [55].

Table 1: Key Characteristics of NBS-LRR Gene Subfamilies

Subfamily	N-Terminal Domain	Downstream Signaling	Representative Role
TNL	TIR (Toll/Interleukin-1 Receptor)	Distinct from CNL; often requires helper genes	Defense against biotrophic pathogens
CNL	CC (Coiled-Coil)	Distinct from TNL; involves Ca²⁺ influx	Defense against various pathogens
RNL	RPW8 (Resistance to Powdery Mildew 8)	Acts as a common signaling component	Helper in TNL and CNL signaling networks

Experimental Approaches and Methodologies

Linking gene copy number to functional output requires a multi-faceted experimental approach, from genome-wide identification to functional validation.

Genome-Wide Identification and Evolutionary Analysis

1. Identification of NBS-Encoding Genes: The standard methodology involves scanning plant proteomes for the presence of a Nucleotide-Binding Site (NBS) or NB-ARC domain. This is typically performed using tools like PfamScan with a hidden Markov model (HMM) profile of the NB-ARC domain (e.g., PF00931) at a stringent e-value cutoff (e.g., 1.1e-50) [2]. Subsequent filtering for genes containing additional domains like LRR, TIR, or CC allows for the classification of genes into NBS-LRR subfamilies [55].

2. Orthogroup and Phylogenetic Analysis: To trace evolutionary relationships, putative NBS genes from multiple species are clustered into orthogroups (OGs) using algorithms like OrthoFinder with tools like DIAMOND for sequence similarity and MCL for clustering [2]. This identifies core orthogroups (conserved across species) and lineage-specific expansions. Multiple sequence alignment of protein sequences with MAFFT followed by maximum-likelihood phylogenetic tree construction with FastTree or IQ-TREE helps visualize these evolutionary relationships [2] [55].
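
Once orthogroups are available, separating core from lineage-specific groups is a matter of counting the species represented in each. The sketch below assumes the orthogroup table has already been parsed into a nested dictionary; the function name, the input layout, and the three labels are illustrative choices rather than OrthoFinder output conventions.

```python
def classify_orthogroups(orthogroups, all_species):
    """Label each orthogroup by how many species contribute genes.

    `orthogroups` maps an orthogroup id to {species: [gene ids]}.
    """
    labels = {}
    total = len(set(all_species))
    for og_id, members in orthogroups.items():
        # Count only species that contribute at least one gene.
        present = {sp for sp, genes in members.items() if genes}
        if len(present) == total:
            labels[og_id] = "core"
        elif len(present) == 1:
            labels[og_id] = "lineage-specific"
        else:
            labels[og_id] = "shared"
    return labels
```

A stricter analysis might require a minimum gene count per species before calling a group core.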

3. Duplication and Synteny Analysis: The contribution of different duplication mechanisms (WGD vs. SSD) to the NBS-LRR repertoire is assessed using synteny analysis. Tools like MCScanX are used to identify collinear genomic blocks within and between species, pinpointing NBS-LRR genes derived from WGD events [55]. The analysis of allele-specific loss in polyploids can further reveal the selective pressures acting on these duplicates.

Diagram 1: Workflow for linking gene copy number to function.

Transcriptomic Profiling and Expression Analysis

1. RNA-Sequencing (RNA-seq) for Expression Quantification: Transcriptome sequencing is the cornerstone for measuring functional output. For studies on polyploids, RNA-seq must be performed on the polyploid and its diploid progenitors under controlled conditions and relevant stresses. The expression level of each gene is quantified, typically as FPKM (Fragments Per Kilobase of transcript per Million mapped reads) or TPM (Transcripts Per Million) [2].
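
For reference, TPM can be computed from raw counts and transcript lengths in two steps: normalize each count by transcript length in kilobases, then rescale so the values sum to one million. The sketch below is a minimal pure-Python version with a hypothetical function name; dedicated quantification tools also correct for effective length and other biases, which this example omits.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to TPM given transcript lengths in base pairs."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must have the same size")
    # Reads per kilobase of transcript.
    rpk = [c / (l / 1000.0) for c, l in zip(counts, lengths_bp)]
    scale = sum(rpk) / 1e6
    if scale == 0:
        return [0.0] * len(counts)
    return [r / scale for r in rpk]
```

Because TPM values sum to the same total in every sample, they are easier to compare across samples than FPKM.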

2. Analysis of Homeolog Expression Bias: In allopolyploids, which result from hybridization and genome doubling, the gene copies inherited from each progenitor (homeologs) can be distinguished based on single nucleotide polymorphisms (SNPs). Using the RNA-seq data, the total expression and the relative contribution of each homeolog to the total expression are quantified. This reveals whether one homeolog is preferentially expressed (expression bias), a common phenomenon in allopolyploids like Acanthus tetraploideus, where 22.87% of genes exhibited biased homeolog expression [52].
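
A simple way to express homeolog bias is the log2 ratio of reads assigned to each subgenome copy. The sketch below is illustrative only: the function name, the pseudocount, the minimum read depth, and the twofold threshold are all assumptions, and published analyses typically rely on a statistical test across replicates rather than a fixed cutoff.

```python
import math


def homeolog_bias(count_a: int, count_b: int, min_reads: int = 10, min_log2: float = 1.0):
    """Return (log2 ratio A/B, label) for one homeolog pair.

    A pseudocount of 1 avoids division by zero; pairs with too few reads
    are reported as "low coverage" rather than called biased or balanced.
    """
    if count_a + count_b < min_reads:
        return None, "low coverage"
    log2_ratio = math.log2((count_a + 1) / (count_b + 1))
    if log2_ratio >= min_log2:
        return log2_ratio, "A-biased"
    if log2_ratio <= -min_log2:
        return log2_ratio, "B-biased"
    return log2_ratio, "balanced"
```

The genome-wide bias rate is then the fraction of adequately covered pairs labeled A-biased or B-biased.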

3. Differential Expression Analysis: To understand the functional response of NBS-LRR genes, their expression is profiled in susceptible versus tolerant plant accessions under biotic stress. For example, in cotton leaf curl disease (CLCuD), expression profiling revealed the putative upregulation of specific orthogroups (OG2, OG6, OG15) in different tissues [2]. Software like DESeq2 or edgeR is used to identify genes that are differentially expressed between conditions.

Functional Validation

1. Virus-Induced Gene Silencing (VIGS): VIGS is a powerful reverse-genetics tool to rapidly assess gene function. For instance, the silencing of GaNBS (a gene from orthogroup OG2) in resistant cotton demonstrated its putative role in reducing the titer of the cotton leaf curl disease virus, thereby validating its function [2].

2. Protein-Ligand and Protein-Protein Interaction Studies: Computational models can predict the interaction between NBS proteins and pathogen effectors. For example, molecular docking studies showed a strong interaction of putative NBS proteins with ADP/ATP and different core proteins of the cotton leaf curl disease virus, providing mechanistic insight into their function [2].

Table 2: Key Reagent Solutions for Transcriptomic and Functional Studies

Research Reagent / Tool	Function / Application	Key Feature
Pfam & InterProScan	Protein domain annotation and identification of the NB-ARC domain.	Curated HMM profiles for precise domain detection.
OrthoFinder	Clustering of genes into orthogroups across species.	Infers evolutionary relationships and gene families.
MCScanX	Intra- and inter-species synteny and collinearity analysis.	Identifies WGD and tandem duplication events.
DESeq2 / edgeR	Statistical analysis of differential gene expression from RNA-seq.	Models count data and controls for false discoveries.
VIGS Vectors (e.g., TRV-based)	Functional validation through transient gene silencing.	Rapid, transient knockdown without stable transformation.
VT3D / Circos	Visualization of transcriptomic data and genomic relationships.	Intuitive exploration of spatial and numerical data.

Key Findings in Diploid and Tetraploid Plants

Research has revealed distinct patterns of NBS-LRR evolution and expression in diploids versus polyploids, offering insights into the link between copy number and function.

Evolutionary Dynamics and Gene Retention

Comparative genomic analyses across multiple species show that the number of NBS-LRR genes is not correlated with genome size or total gene count but is significantly influenced by whole-genome duplication [55]. For example, in sugarcane, WGD is the primary driver of its large NBS-LRR repertoire. Furthermore, studies show that NBS-LRR genes are often retained in duplicate after WGD due to purifying selection, which aligns with the GBH, as their products often function in interconnected networks and pathways [54] [55].

Transcriptomic Responses to Genome Dosage Change

A critical test of the GBH is measuring transcriptomic responses immediately after genome duplication. Studies on synthetic autopolyploids of Arabidopsis show that while individual gene dosage responses are highly variable, genes putatively involved in dosage-balance-sensitive groups (e.g., certain GO terms, metabolic pathways) exhibit smaller and more coordinated dosage responses than dosage-insensitive genes [54]. This coordinated response is consistent with selective constraints to maintain stoichiometric balance.

Homeolog Expression Bias in Allopolyploids

In natural allopolyploids, transcriptomic asymmetry is a key feature. The recent allopolyploid mangrove Acanthus tetraploideus demonstrates that homeolog expression bias is widespread but attenuated compared to an in silico mix of its diploid parents' transcriptomes [52]. While 67.66% of genes showed bias in the synthetic mix, only 22.87% were biased in the natural tetraploid, indicating a post-polyploidization reconfiguration of expression. This reconfiguration involves both the retention of parental expression legacy and the emergence of novel expression patterns, potentially contributing to adaptation [52].

Diagram 2: Logical flow from duplication to functional output.

Expression in Disease Resistance

Transcriptomic studies in a disease context highlight the functional relevance of NBS-LRR expansion. In modern sugarcane cultivars, which are complex polyploids, a greater proportion of differentially expressed NBS-LRR genes in response to disease were derived from the wild species S. spontaneum than from S. officinarum, indicating that S. spontaneum contributes disproportionately to disease resistance [55]. This demonstrates how the merger of divergent genomes in a polyploid can create novel functional output by combining regulatory and coding sequences from different progenitors.

Table 3: Transcriptomic Responses to Ploidy and Stress

Study System	Ploidy Manipulation / Condition	Key Transcriptomic Finding	Reference
Arabidopsis accessions	Synthetic autopolyploidy	Dosage-balance-sensitive gene groups show smaller, more coordinated expression changes.	[54]
Gossypium hirsutum (Cotton)	CLCuD infection	Orthogroups OG2, OG6, OG15 show putative upregulation in tolerant and susceptible plants.	[2]
Acanthus tetraploideus (Mangrove)	Natural allopolyploid	22.87% of genes show homeolog expression bias, attenuated from the parental mix (67.66%).	[52]
Sugarcane cultivars	Multiple disease infections	More DE NBS-LRR genes originate from the S. spontaneum subgenome than expected.	[55]

Visualization of Transcriptomic Data

Effective visualization is crucial for interpreting complex transcriptomic datasets. For genomic data, Circos plots are well suited to displaying relationships, such as the location of NBS-LRR genes on chromosomes and their connections through duplication events [56]. For expression data, heatmaps are the standard way to display gene expression patterns across multiple samples or conditions. Volcano plots visualize the relationship between the magnitude of expression change (log2 fold-change) and its statistical significance (-log10(p-value)) in differential expression analyses [56]. With the emergence of 3D spatially resolved transcriptomics, tools like VT3D allow for the projection of gene expression onto any 2D plane or the creation of interactive 3D models, enabling the exploration of gene expression patterns within the context of tissue architecture [57].
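
The coordinates of a volcano plot follow directly from differential expression results. The sketch below converts (gene, log2 fold-change, p-value) records into plot-ready points; the function name and both cutoffs are assumptions, and adjusted p-values from DESeq2 or edgeR would normally be used in place of raw ones.

```python
import math


def volcano_points(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Turn (gene, log2FC, p-value) records into volcano-plot coordinates."""
    points = []
    for gene, log2fc, pval in results:
        # The y axis is -log10(p); a p-value of zero maps to infinity.
        y = -math.log10(pval) if pval > 0 else float("inf")
        significant = abs(log2fc) >= lfc_cutoff and pval < p_cutoff
        points.append((gene, log2fc, y, significant))
    return points
```

The resulting tuples can be passed to any plotting library, with the boolean flag used to color significant genes.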

Transcriptomics provides the critical empirical link between the expansion of gene families, such as NBS-LRR, and their functional output in plant immunity and adaptation. The evidence demonstrates that the relationship between gene copy number and transcript abundance is not a simple 1:1 correlation but is modulated by complex regulatory mechanisms, including those enforcing gene balance and those generating novel expression patterns in polyploids. The interplay of whole-genome duplication, which provides the genetic raw material, and transcriptomic reconfiguration, which refines its functional output, is a powerful force in the evolution of disease resistance in plants. Future research, leveraging increasingly sophisticated genomic, transcriptomic, and visualization tools, will continue to unravel the precise mechanisms by which gene copy number is translated into a functional phenotype.

Functional genomics in polyploid plants presents unique challenges due to genomic complexity, gene redundancy, and the difficulties in transforming these species. This technical guide explores Virus-Induced Gene Silencing (VIGS) as a powerful tool for functional gene validation within the context of nucleotide-binding site (NBS) gene expansion in diploid versus tetraploid plants. We provide a comprehensive analysis of VIGS methodology, including optimized protocols for polyploid systems, data interpretation frameworks, and integration strategies with multi-omics approaches. The document serves as an essential resource for researchers investigating the evolutionary dynamics of disease resistance genes in complex plant genomes.

Virus-Induced Gene Silencing (VIGS) has emerged as a transformative technology for functional genomics in plants, particularly for species recalcitrant to stable genetic transformation. As a transient, sequence-specific post-transcriptional gene silencing method, VIGS utilizes recombinant viral vectors to trigger systemic suppression of endogenous plant gene expression, leading to observable phenotypic changes that enable rapid gene function characterization [58]. The foundation of VIGS was established in 1995 when Kumagai et al. used a Tobacco mosaic virus vector carrying a phytoene desaturase (PDS) gene fragment to induce silencing, resulting in characteristic photo-bleaching phenotypes [58]. Since this pioneering work, VIGS has been adapted for diverse plant species, with vectors based on various viruses including Tobacco Rattle Virus (TRV), Bean Pod Mottle Virus (BPMV), and Cotton Leaf Crumple Virus (CLCrV) expanding its applications [58].

Polyploidy, the possession of multiple sets of chromosomes, is a common phenomenon in flowering plants that provides evolutionary advantages but complicates functional genetic studies. The presence of homeologous gene copies (duplicates inherited from the different progenitor genomes) in polyploid genomes can lead to functional redundancy, where silencing a single gene may not produce observable phenotypes due to compensation by other copies [10]. This is particularly relevant for NBS-encoding genes, which constitute the largest family of plant disease resistance (R) genes and have undergone significant expansion in polyploid species [10] [2]. Comparative genomic analyses have revealed that allotetraploid cotton species (G. hirsutum and G. barbadense) possess nearly twice the number of NBS-encoding genes compared to their diploid progenitors (G. arboreum and G. raimondii), demonstrating how polyploidization events dramatically reshape the R-gene repertoire [10]. Understanding the functional divergence and specialization of these expanded gene families requires sophisticated validation tools like VIGS that can overcome the challenges posed by polyploid genomes.

NBS Gene Expansion in Diploid vs. Tetraploid Plants: A Comparative Framework

The evolution of NBS-encoding genes following polyploidization events reveals complex patterns of gene retention, loss, and functional diversification. Comparative analyses between diploid and tetraploid cotton species provide compelling evidence for asymmetric evolution of NBS-encoding genes, where allotetraploids inherit different proportions of R-genes from their diploid progenitors [10]. In Gossypium species, G. hirsutum inherited more NBS-encoding genes from the A-genome diploid G. arboreum, while G. barbadense inherited more from the D-genome diploid G. raimondii [10]. This asymmetric distribution correlates with differential disease resistance, as G. raimondii and G. barbadense show stronger resistance to Verticillium wilt compared to the more susceptible G. arboreum and G. hirsutum [10].

Table 1: NBS-Encoding Gene Distribution in Diploid and Tetraploid Cotton Species

Species	Ploidy	Total NBS Genes	CNL	TNL	RNL	Other
G. arboreum (A2)	Diploid	246	124 (50.4%)	7 (2.8%)	3 (1.2%)	112 (45.5%)
G. raimondii (D5)	Diploid	365	146 (40.0%)	64 (17.5%)	4 (1.1%)	151 (41.4%)
G. hirsutum (AD1)	Allotetraploid	588	254 (43.2%)	5 (0.9%)	7 (1.2%)	322 (54.8%)
G. barbadense (AD2)	Allotetraploid	682	235 (34.5%)	55 (8.1%)	11 (1.6%)	381 (55.9%)
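
The percentages in Table 1 and the fold differences between tetraploids and diploids can be recomputed from the class counts. The sketch below copies the counts from the table; the dictionary and function names are hypothetical.

```python
# NBS gene counts per class, copied from Table 1 above.
NBS_COUNTS = {
    "G. arboreum": {"CNL": 124, "TNL": 7, "RNL": 3, "Other": 112},
    "G. raimondii": {"CNL": 146, "TNL": 64, "RNL": 4, "Other": 151},
    "G. hirsutum": {"CNL": 254, "TNL": 5, "RNL": 7, "Other": 322},
    "G. barbadense": {"CNL": 235, "TNL": 55, "RNL": 11, "Other": 381},
}


def class_shares(species):
    """Percentage of each NBS class within one species, rounded to 0.1."""
    counts = NBS_COUNTS[species]
    total = sum(counts.values())
    return {cls: round(100 * n / total, 1) for cls, n in counts.items()}


def fold_change(tetraploid, diploid):
    """Ratio of total NBS genes in a tetraploid to those in a diploid."""
    return sum(NBS_COUNTS[tetraploid].values()) / sum(NBS_COUNTS[diploid].values())
```

Calling `fold_change` for each tetraploid and diploid pairing gives a quick view of how the repertoire size compares with either progenitor.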

Beyond cotton, similar expansion patterns are observed across diverse polyploid systems. In wheat (allohexaploid), 580 complete ORF candidate NBS-encoding genes were identified, with balanced distribution across the three sub-genomes but uneven chromosomal distribution, with approximately 22% localized on homeologous group 7 chromosomes [59]. The diversification of NBS genes following polyploidization involves both whole-genome duplication and small-scale duplication mechanisms, with tandem duplications playing a particularly significant role in species-specific amplification of certain NBS classes [60] [61].

Structural analysis of NBS genes reveals significant variation in domain architecture between diploids and polyploids. In Brassica species, which underwent whole-genome triplication after divergence from Arabidopsis thaliana, NBS-encoding genes show distinct evolutionary patterns, with rapid deletion or loss of NBS-encoding homologous gene pairs on triplicated regions, followed by species-specific gene amplification through tandem duplication [60]. This dynamic evolutionary landscape underscores the importance of functional validation tools capable of resolving the contributions of individual homeologous copies in polyploid species.

VIGS Methodology for Polyploid Plants

Viral Vector Systems and Selection Criteria

Choosing appropriate viral vectors is fundamental to successful VIGS implementation in polyploid plants. Different vector systems offer distinct advantages and limitations that must be considered in the context of polyploid genomics:

Tobacco Rattle Virus (TRV) has emerged as one of the most versatile VIGS vectors, particularly for dicotyledonous plants. The bipartite genome organization of TRV requires two vectors: TRV1 encodes replicase proteins, movement protein, and a weak RNA interference suppressor, ensuring virus replication and systemic spread; TRV2 contains the capsid protein gene and a multiple cloning site for inserting target gene fragments [58]. TRV-based VIGS has been successfully established in soybean, where it achieved 65-95% silencing efficiency through Agrobacterium tumefaciens-mediated infection of cotyledon nodes [62]. The broad host range, efficient systemic movement, and mild symptomology of TRV make it particularly valuable for polyploid species [58].

Bean Pod Mottle Virus (BPMV) is widely adopted for legumes, especially soybean, but frequently relies on particle bombardment, which can induce leaf phenotypic alterations that interfere with accurate phenotypic evaluation [62]. This limitation is particularly problematic in polyploid systems where subtle phenotypic changes may be significant.

Species-specific vectors offer advantages for particular plant families. Apple Latent Spherical Virus (ALSV) has been used in soybean functional studies, while Cotton Leaf Crumple Virus (CLCrV) is valuable for Gossypium species [58]. For polyploid plants, vector selection must consider the ability to target multiple homeologous copies simultaneously and achieve systemic silencing across different tissue types.

Table 2: Comparison of Viral Vectors for VIGS in Polyploid Plants

Vector	Virus Type	Host Range	Advantages	Limitations
TRV	RNA virus	Broad, especially Solanaceae	Mild symptoms, efficient systemic movement, targets meristems	Bipartite system requires two vectors
BPMV	RNA virus	Legumes, especially soybean	Well-established for soybean	Often requires particle bombardment, can cause leaf symptoms
ALSV	RNA virus	Legumes, Rosaceae	Mild symptoms, broad host range	Less established protocols
CLCrV	DNA virus	Malvaceae, especially cotton	Species-specific efficiency	Limited to compatible hosts

Optimization of VIGS Protocols for Polyploid Species

Successful implementation of VIGS in polyploid plants requires protocol optimization to address challenges posed by genomic complexity and redundancy. The Agrobacterium-mediated infection method has been significantly improved for soybean, where conventional methods (misting and direct injection) showed low efficiency due to thick leaf cuticles and dense trichomes [62]. An optimized approach involves:

Explant Preparation: Sterilized soybeans are soaked in sterile water until swollen, then longitudinally bisected to obtain half-seed explants [62]. This technique exposes vulnerable meristematic tissues for efficient Agrobacterium infection.

Infection Procedure: Fresh explants are immersed for 20-30 minutes (optimal duration) in Agrobacterium tumefaciens GV3101 suspensions containing either pTRV1 or pTRV2-GFP derivatives [62]. The sterile tissue culture-based procedure achieves transformation efficiencies exceeding 80%, reaching up to 95% for specific cultivars like Tianlong 1 [62].

Efficiency Evaluation: Fluorescence microscopy at the infection sites reveals successful transduction, with longitudinal sections showing initial infiltration of 2-3 cell layers before gradual spread to deeper cells [62]. Transverse sections demonstrate that more than 80% of cells exhibit successful infiltration, indicating high infection efficiency [62].

For polyploid plants specifically, additional optimization parameters include:

Insert Design: Designing constructs that target conserved regions across homeologous genes to achieve simultaneous silencing of multiple copies, or designing specific constructs to target individual copies.
Agroinoculum Concentration: Optimizing optical density (OD600) to balance silencing efficiency with plant health, typically between 0.3 and 2.0 depending on species and vector.
Environmental Factors: Controlling temperature (18-22°C), humidity, and photoperiod to enhance silencing efficiency and stability.
Developmental Stage: Selecting appropriate plant growth stages (often 1-2 leaf stages) for inoculation to maximize systemic silencing.
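Adjusting an Agrobacterium culture to a chosen OD600 is a simple dilution calculation. The sketch below assumes that OD600 scales linearly with cell density over the working range, which is only approximately true for dense cultures; the function name and the example values are hypothetical.

```python
def dilution_volumes(stock_od600, target_od600, final_volume_ml):
    """Return (culture_ml, buffer_ml) needed to reach the target OD600.

    Uses C1 * V1 = C2 * V2 and assumes OD600 is linear in cell density.
    """
    if target_od600 <= 0 or final_volume_ml <= 0:
        raise ValueError("target OD600 and final volume must be positive")
    if stock_od600 < target_od600:
        raise ValueError("stock culture is more dilute than the target")
    culture_ml = final_volume_ml * target_od600 / stock_od600
    return culture_ml, final_volume_ml - culture_ml


# Example: dilute a stock at OD600 2.0 to 50 ml at OD600 1.0.
culture_ml, buffer_ml = dilution_volumes(2.0, 1.0, 50.0)
```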

Experimental Design and Workflow

The following diagram illustrates the complete VIGS workflow for functional validation of NBS genes in polyploid plants:

Target Gene Selection and Fragment Design

For polyploid plants, target selection requires comprehensive bioinformatic analysis to identify all homeologous copies of the target NBS gene. Genome databases, synteny maps, and phylogenetic analyses should be employed to catalog gene family members and identify conserved versus divergent regions. Effective fragment design should:

Target conserved regions to silence multiple homeologs simultaneously
Avoid off-target silencing by performing specificity checks against the entire transcriptome
Optimize fragment length (typically 200-500 bp) for efficient processing and silencing
Consider GC content and secondary structure that may affect silencing efficiency
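The design criteria above can be sketched as a window scan over aligned homeolog sequences. This is a minimal illustration rather than a validated design tool: the function names, the 300 bp default window, and the 35-60% GC range are assumptions chosen for the example, and a real design would add the transcriptome-wide off-target check listed above.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def best_conserved_window(aligned, window=300, gc_range=(0.35, 0.60)):
    """Find the most conserved window across aligned homeologs.

    aligned: dict of name -> aligned sequence (equal lengths, '-' for gaps).
    Returns (start, identity) for the best window whose GC content, measured
    on the first sequence, lies in `gc_range`, or None if no window qualifies.
    """
    seqs = [s.upper() for s in aligned.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # 1 where every homeolog carries the same base and no gap, else 0.
    conserved = [int(len(set(col)) == 1 and col[0] != "-") for col in zip(*seqs)]

    best = None
    for start in range(length - window + 1):
        ref = seqs[0][start:start + window]
        if not gc_range[0] <= gc_fraction(ref) <= gc_range[1]:
            continue
        identity = sum(conserved[start:start + window]) / window
        if best is None or identity > best[1]:
            best = (start, identity)
    return best
```

To design a homeolog-specific construct instead, the same scan could be inverted to pick the window with the lowest identity.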

Experimental Controls and Replication

Robust experimental design is particularly crucial in polyploid systems where phenotypic effects may be subtle due to genetic redundancy. Essential controls include:

Empty vector controls (pTRV:empty) to account for effects of viral infection
Non-silenced wild-type plants under identical growth conditions
Positive silencing controls using marker genes like phytoene desaturase (PDS) that produce visible phenotypes (photo-bleaching)
Multiple biological replicates (minimum 8-12 plants per construct) to account for plant-to-plant variation in silencing efficiency
Technical replicates for molecular analyses to ensure result reliability

Molecular Mechanisms and Signaling Pathways

The following diagram illustrates the molecular mechanism of VIGS and its interaction with plant defense signaling:

VIGS operates through the plant's endogenous RNA silencing machinery, specifically the post-transcriptional gene silencing (PTGS) pathway [58]. When a recombinant virus containing a fragment of a plant gene infects the host, the viral RNA is recognized by the plant's defense system, triggering a sequence-specific degradation process that also targets complementary endogenous mRNAs [58]. The core mechanism involves:

Double-stranded RNA (dsRNA) Formation: Viral replication intermediates or secondary structures form dsRNA molecules, which are recognized as pathogen-associated molecular patterns by the plant immune system.

Dicer-like Enzyme Processing: Cellular Dicer-like (DCL) enzymes cleave long dsRNA molecules into 21-24 nucleotide small interfering RNAs (siRNAs), with the size depending on the specific DCL enzyme involved [58].

RISC Assembly and Targeting: These siRNAs are incorporated into an RNA-induced silencing complex (RISC), which uses the siRNA as a guide to identify and cleave complementary viral and endogenous mRNA molecules [58].

Systemic Spread: The silencing signal amplifies and moves systemically through the plant, potentially targeting all homeologous copies of the gene of interest in different subgenomes.

In polyploid plants, this mechanism must overcome the challenge of genetic redundancy. Successful silencing of multiple homeologous copies requires sufficient sequence similarity for cross-silencing or the design of multiple constructs targeting different copies. The efficiency of systemic silencing spread is particularly important for reaching all tissues where target genes are expressed.
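Whether one insert is likely to cross-silence several homeologs can be roughly gauged by counting how many of its 21-nt subsequences occur unchanged in each homeolog transcript. The sketch below is a crude proxy under stated assumptions: it considers only exact matches on the sense strand, whereas real siRNA targeting tolerates some mismatches, and the function names are hypothetical.

```python
def kmers(seq, k=21):
    """Set of all k-length subsequences of `seq` (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_sirna_fraction(insert, homeologs, k=21):
    """Fraction of the insert's k-mers found exactly in each homeolog.

    insert: VIGS fragment sequence.
    homeologs: dict of name -> transcript sequence.
    Returns dict of name -> fraction in [0, 1].
    """
    insert_kmers = kmers(insert, k)
    if not insert_kmers:
        raise ValueError("insert is shorter than k")
    return {
        name: len(insert_kmers & kmers(seq, k)) / len(insert_kmers)
        for name, seq in homeologs.items()
    }
```

A fraction near 1.0 for every copy suggests the construct may silence all homeologs, while a low fraction for one copy flags a candidate for a separate construct.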

Data Interpretation and Validation in Polyploid Systems

Molecular Validation of Silencing Efficiency

Comprehensive validation of silencing efficiency is crucial in polyploid plants to confirm reduction of all target homeologs. Effective approaches include:

Quantitative RT-PCR: Design primers that either specifically amplify individual homeologs or target conserved regions to measure total expression reduction. Specific primer design requires identification of unique sequence variants in each homeolog.

Western Blotting: When suitable antibodies are available, protein-level analysis provides functional confirmation of reduced target expression.

Phenotypic Validation: For NBS genes, functional validation involves pathogen challenge assays to confirm compromised resistance responses in silenced plants.

In soybean VIGS systems, silencing efficiency typically ranges from 65% to 95%, as demonstrated by significant reductions in target gene expression and clear phenotypic changes when genes such as GmPDS, GmRpp6907, and GmRPT4 were silenced [62].
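Silencing efficiency from qRT-PCR data is commonly expressed as the percentage reduction in relative expression. The sketch below uses the widely used 2^-ΔΔCt calculation, which is an assumption here since the cited work's exact quantification method is not given in this review; it also assumes roughly 100% amplification efficiency for both primer pairs. The function name and Ct values are hypothetical.

```python
def silencing_efficiency(ct_target_silenced, ct_ref_silenced,
                         ct_target_control, ct_ref_control):
    """Percent reduction in target expression by the 2^-ddCt method.

    Each argument is a mean Ct value; the reference gene normalizes input.
    """
    delta_silenced = ct_target_silenced - ct_ref_silenced
    delta_control = ct_target_control - ct_ref_control
    relative_expression = 2 ** -(delta_silenced - delta_control)
    return (1 - relative_expression) * 100


# Example: the target Ct rises by 2 cycles relative to the reference gene,
# so relative expression is 0.25 and the efficiency is 75%.
efficiency = silencing_efficiency(26.0, 18.0, 24.0, 18.0)
```

With homeolog-specific primers, the same calculation can be repeated per copy to check whether all subgenome copies were reduced.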

Addressing Challenges in Polyploid Systems

Polyploid plants present specific challenges for VIGS experiments that require specialized approaches:

Functional Redundancy: When silencing single genes in multigene families fails to produce phenotypes, consider simultaneous silencing of multiple family members using constructs targeting conserved regions.

Differential Expression Patterns: Homeologous genes may exhibit divergent expression patterns in different tissues or developmental stages, requiring comprehensive analysis across multiple conditions.

Compensatory Mechanisms: Other genes may compensate for silenced homeologs, potentially masking phenotypic effects. Time-course experiments can help identify early phenotypes before compensation occurs.

The Scientist's Toolkit: Essential Research Reagents

Table 3: Key Research Reagents for VIGS in Polyploid Plants

Reagent Category	Specific Examples	Function and Application
Viral Vectors	pTRV1, pTRV2, BPMV vectors	Delivery of target gene fragments to trigger silencing
Agrobacterium Strains	GV3101, LBA4404	Mediate plant transformation through T-DNA transfer
Selection Markers	Kanamycin, Rifampicin	Selection of transformed Agrobacterium and plant cells
Visual Markers	GFP, PDS	Visual assessment of silencing efficiency and spread
Enzymes for Molecular Cloning	Restriction enzymes, Ligases	Vector construction and target fragment insertion
qRT-PCR Reagents	SYBR Green, specific primers	Quantification of silencing efficiency across homeologs
Pathogen Isolates	Species-specific pathogens	Functional assessment of silenced NBS disease resistance

Case Study: VIGS Application in Cotton NBS Gene Validation

The application of VIGS for functional validation of NBS genes in cotton demonstrates its power in polyploid systems. In a recent study, silencing of specific NBS genes in resistant cotton (GaNBS from OG2) through VIGS demonstrated their putative role in virus resistance, as evidenced by increased viral titers in silenced plants [2]. This approach validated the function of specific NBS genes in disease resistance while also illustrating how VIGS can be used to dissect complex resistance mechanisms in polyploids.

Comparative analysis of NBS genes in diploid and allotetraploid cotton species revealed significant differences in TNL gene proportions, with G. raimondii (diploid) and G. barbadense (allotetraploid) possessing 13.70% and 6.45% TNL genes respectively, compared to only 2.03% in G. arboreum and 0.85% in G. hirsutum [10]. This uneven distribution suggests preferential retention or expansion of specific NBS classes following polyploidization, with potential functional implications for disease resistance specificity.

Integration with Multi-Omics Approaches

VIGS serves as a critical validation tool within comprehensive functional genomics pipelines integrating multi-omics data. In polyploid plants, this integration is particularly valuable for:

Linking Genomic and Transcriptomic Data: VIGS can validate predictions from comparative genomic analyses regarding functional divergence between homeologous genes.

Connecting Expression Patterns with Function: Tissue-specific or condition-specific expression patterns identified in transcriptomic studies can be functionally tested using VIGS.

Validating Proteomic and Metabolomic Networks: VIGS of regulatory genes can help establish causal relationships in protein and metabolic networks.

Recent advances in VIGS technology include integration with high-throughput phenotyping, CRISPR/Cas9 systems for validation of editing targets, and single-cell transcriptomics to resolve cell-type-specific functions of silenced genes.

VIGS has established itself as an indispensable tool for functional validation of NBS genes and other important gene families in polyploid plants. Its ability to overcome transformation barriers, rapidly assess gene function, and simultaneously target multiple homeologous copies makes it particularly valuable for dissecting the complex genetic architecture of polyploid genomes. The continued refinement of viral vectors, delivery methods, and validation approaches will further enhance VIGS applications in polyploid species.

Future developments in VIGS technology will likely focus on increasing specificity and efficiency, expanding host range, and improving temporal control over silencing induction. Integration with emerging technologies like single-cell sequencing, spatial transcriptomics, and advanced phenotyping platforms will provide unprecedented resolution in functional genomics studies. For researchers investigating NBS gene expansion in diploid versus tetraploid plants, VIGS offers a powerful approach to functionally validate evolutionary hypotheses and connect genomic changes with phenotypic outcomes in plant-pathogen interactions.

Navigating Analytical Challenges in Polyploid NBS Gene Research

Sequence Assembly and Annotation Hurdles in Highly Duplicated Polyploid Genomes

Highly duplicated polyploid genomes present a formidable challenge for sequence assembly and annotation, profoundly impacting research on nucleotide-binding site (NBS) gene expansion in diploid versus tetraploid plants. The coexistence of multiple homologous subgenomes and the extensive presence of repetitive elements complicate the reconstruction of accurate genome sequences, potentially obscuring genuine NBS gene diversification patterns. This technical review examines the specific bottlenecks introduced by polyploidy throughout genomic workflows, evaluates current technological and algorithmic solutions, and provides detailed experimental frameworks for studying NBS gene evolution across ploidy levels. By integrating recent advances in sequencing technologies with specialized bioinformatic approaches, we present a comprehensive strategy to navigate the complexities of polyploid genomes, enabling more accurate characterization of the link between genome duplication and disease resistance gene expansion.

Polyploidy, the condition of possessing more than two complete sets of chromosomes, represents a widespread evolutionary phenomenon in plants that drives genomic novelty and adaptation. Research comparing diploid and tetraploid organisms has revealed that genome doubling often induces substantial morphological and physiological changes, including altered leaf morphology and enhanced stress tolerance [63]. However, this genomic complexity creates significant obstacles for sequencing projects. Unlike diploid genomes with essentially two copies of each chromosome, polyploid genomes contain multiple homologous subgenomes with high sequence similarity. This homology makes it extremely challenging to distinguish between true genetic variation across subgenomes and assembly artifacts, particularly in repetitive regions where NBS resistance genes are frequently located [9] [2].

The study of NBS gene expansion in diploid versus tetraploid plants is particularly dependent on high-quality genome assemblies. NBS genes encode proteins containing nucleotide-binding sites and C-terminal leucine-rich repeats that constitute the largest family of plant resistance (R) genes [9]. These genes are vital for plant defense against pathogens, and their expansion through duplication events is considered a key mechanism in the evolution of disease resistance. In tetraploid plants, the immediate doubling of all genetic material provides raw material for NBS gene family expansion and functional diversification [2]. However, accurately resolving these often-tandemly duplicated genes in assembly outputs remains technically challenging, potentially leading to underestimation of gene family sizes or misannotation of paralogous relationships.

Technical Hurdles in Polyploid Genome Assembly

Repetitive Sequence Complexity

Repetitive DNA sequences constitute a substantial portion of eukaryotic genomes: repeats account for 25–50% of typical mammalian genomes and often for higher percentages in plants [64]. These repetitive elements can be broadly classified into two categories based on their genomic arrangement:

Table 1: Categories of Repetitive Sequences Complicating Polyploid Assembly

Category	Subtype	Unit Length	Genomic Features	Impact on Assembly
Tandem Repeats	Microsatellites	<5 bp	Short tandem repetitions; most frequent type	Fragment assembly; misassembly of paralogous regions
	Minisatellites	>5 bp	Tandem repetitions; relatively rare	Create identical overlaps between distinct loci
	Centromeric satellites	100-5000 bp	Alpha-satellite and Satellite II/III; span Mb regions	Prevent complete chromosome assembly
	Telomeric repeats	CCCTAA/TTAGGG motifs	300-8000 precise motifs; span 2-50 kb	Limit end-resolution of chromosomes
Interspersed Repeats	DNA transposons	Variable	~5% of human genome; inactive fossils in mammals	Cause misjoins between unrelated genomic regions
	RNA transposons (Retrotransposons)	Variable	LINEs, SINEs, SVA elements; remain active	Create complex, nested repeat structures

In polyploid genomes, the challenge of repetitive sequences is compounded by the presence of highly similar repeats across different subgenomes. Tandem repeats are particularly problematic because their repetitive nature means that sequence reads originating from different genomic locations appear identical, making it impossible to determine their correct placement in the assembly [64]. This issue is exacerbated in NBS gene regions, which frequently reside in repetitive-rich genomic neighborhoods and often form clustered arrays with sequence similarity between functional genes and pseudogenes [9].
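The placement ambiguity described above can be illustrated with a toy sequence. In this minimal sketch (the sequence and the function name are invented for illustration), a read that lies entirely inside a tandem repeat matches at several positions, while a read anchored in unique flanking sequence matches only once.

```python
def placements(read, genome):
    """Return every 0-based start position where `read` matches `genome` exactly."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        # Search again from the next base so overlapping matches are found.
        start = genome.find(read, start + 1)
    return hits


# Toy genome: unique flank, six copies of an "AG" repeat unit, unique flank.
genome = "GATTACA" + "AG" * 6 + "CCGGTTAA"

inside_repeat = placements("AGAGAG", genome)   # several equally good placements
anchored = placements("ACAAGAG", genome)       # a single placement
```

This is why reads longer than the repeat array, which reach unique sequence on both sides, are needed to place such regions unambiguously.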

Algorithmic Limitations with Homologous Sequences

Early genome assembly strategies relied on clone-by-clone sequencing and overlap-layout-consensus (OLC) algorithms, which were successful for assembling the first human and mouse genomes [65]. However, most contemporary short-read assemblers use de Bruijn graph approaches that break reads into shorter k-mers before assembly. While computationally efficient, these methods struggle with the high levels of heterozygosity and repetitive elements characteristic of polyploid genomes [65] [66].

When applied to polyploid genomes, these algorithms frequently collapse homologous regions from different subgenomes into single consensus sequences, thereby erasing important structural and sequence variations that may have functional significance [66]. This "haplotype collapse" problem is particularly detrimental for studying NBS gene families, as it can obscure recent gene duplications and homogenize sequence variations that are crucial for understanding the evolutionary trajectory of disease resistance genes in polyploids.
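Haplotype collapse follows directly from how a de Bruijn graph stores sequence: every distinct k-mer is a single node, regardless of how many subgenomes contributed it. The sketch below is a toy illustration with invented sequences and hypothetical function names; it only measures how many k-mer nodes two homeologous sequences share, and does not build or traverse a full graph.

```python
def de_bruijn_nodes(sequences, k):
    """Set of distinct k-mers (graph nodes) across all input sequences."""
    nodes = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            nodes.add(seq[i:i + k])
    return nodes


def collapsed_fraction(seq_a, seq_b, k):
    """Share of all k-mer nodes that are common to both homeologs.

    Shared nodes are stored once, so paths through them cannot be
    attributed to a subgenome without additional information.
    """
    nodes_a = de_bruijn_nodes([seq_a], k)
    nodes_b = de_bruijn_nodes([seq_b], k)
    return len(nodes_a & nodes_b) / len(nodes_a | nodes_b)
```

The closer this fraction is to 1.0, the fewer subgenome-specific nodes remain to keep the two copies apart in the assembly.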

Diagram 1: Polyploid genome assembly challenges. The high sequence similarity between subgenomes leads to graph complexities that result in three primary error types in final assemblies.

Scaffolding and Gap Problems

The presence of extensive repetitive regions in polyploid genomes leads to significant fragmentation during initial contig assembly. Scaffolding methods that use paired-end reads or long-range information often fail to correctly connect contigs across repetitive stretches that are longer than the read or insert size [66]. This results in assemblies with thousands of gaps, particularly in pericentromeric and subtelomeric regions where NBS genes are frequently located [64].

In polyploid genomes, the scaffolding problem is exacerbated because repetitive sequences may be conserved across subgenomes, making it difficult to determine which subgenome a particular contig belongs to. This issue directly impacts the study of NBS gene expansion, as these genes are often arranged in complex clusters with variable copy numbers between subgenomes. Without accurate scaffolding, researchers cannot determine whether NBS gene duplications occurred before or after polyploidization, nor can they accurately associate specific NBS genes with particular subgenomes [2].

Impact on NBS Gene Characterization in Polyploid Plants

Gene Copy Number Ambiguity

The accurate determination of gene copy number is essential for understanding NBS gene expansion in polyploid plants. However, assembly fragmentation and haplotype collapse can lead to significant underestimation of true NBS gene numbers. Comparative studies have shown that plant genomes contain highly variable numbers of NBS genes, ranging from just a few in some bryophytes to over 2,000 in wheat [2]. This variation reflects both biological differences and technical challenges in assembly and annotation.

In tetraploid plants, the immediate duplication of the entire genome provides a rich substrate for NBS gene family expansion. Recent analyses have identified 12,820 NBS-domain-containing genes across 34 plant species, with these genes classified into 168 distinct domain architecture patterns [2]. The research revealed that NBS genes in polyploid plants often show species-specific structural patterns and complex arrangements that are difficult to resolve with standard assembly approaches. When assemblies fragment within NBS gene clusters, annotation pipelines may fail to identify complete genes or may incorrectly merge adjacent genes into artificial chimeras.

Expression Analysis Complications

Incomplete assemblies directly impact transcriptomic studies of NBS genes in diploid versus tetraploid plants. RNA-seq analysis requires a reference genome for read alignment and transcript quantification, and assembly errors can lead to misinterpretation of expression patterns. In a comparative transcriptome study of diploid and tetraploid Miscanthus lutarioriparius under drought stress, researchers found that the number of differentially expressed genes in diploid plants was much higher than in tetraploid, suggesting tetraploids may require fewer transcriptional changes due to pre-adaptation mechanisms [67]. However, such conclusions depend heavily on the completeness of the reference assembly.

If NBS genes are missing or fragmented in the reference genome, their expression cannot be accurately quantified. This is particularly problematic for studies comparing expression between diploids and tetraploids, where missing paralogs in the assembly could create the false impression of NBS gene family contraction or reduced expression. Additionally, without chromosome-scale assemblies, researchers cannot determine whether expression differences are linked to specific genomic contexts or subgenomes, limiting understanding of NBS gene regulation in polyploids.

Methodological Solutions and Experimental Frameworks

Sequencing Technology Selection

Choosing appropriate sequencing technologies is critical for overcoming polyploid assembly challenges. No single technology currently provides the perfect solution, necessitating hybrid approaches that leverage the complementary strengths of multiple platforms:

Table 2: Sequencing Technologies for Polyploid Genome Assembly

Technology	Read Length	Advantages	Limitations	Application to NBS Genes
Illumina (Short-read)	50-300 bp	High base accuracy; low cost; high throughput	Insufficient for resolving repeats; haplotype collapse	Base-level polishing; variant calling; expression analysis
PacBio HiFi	10-25 kb	High accuracy long reads; resolves complex regions	Higher DNA input requirements; moderate cost	Spanning repetitive NBS clusters; phasing haplotypes
Oxford Nanopore	Up to hundreds of kb	Extremely long reads; direct epigenetic detection	Higher error rate; requires specialized analysis	Scaffolding; resolving satellite repeats near centromeres
Hi-C	N/A	Chromosome-scale scaffolding; subgenome assignment	Does not provide sequence content	Anchoring NBS clusters to chromosomes; subgenome assignment
Optical Mapping	N/A	Physical map validation; detecting misassemblies	Limited resolution; specialized equipment	Validating NBS cluster organization; checking assembly structure

Long-read sequencing technologies have demonstrated remarkable effectiveness in resolving complex genomic regions. Pacific Biosciences (PacBio) Single Molecule Real-Time Sequencing and Oxford Nanopore Technologies (ONT) can generate reads tens of kilobases long, often spanning entire repetitive elements and providing the connectivity information needed to correctly assemble through repetitive regions [65] [66]. These technologies have been instrumental in assembling previously intractable regions, including complex NBS gene clusters.

Specialized Assembly Algorithms for Polyploid Genomes

Modern genome assemblers have evolved to better handle the complexities of polyploid genomes through several key approaches:

Diploid-aware assembly: Tools such as FALCON and Canu incorporate specialized algorithms that preserve haplotype differences during assembly, rather than collapsing them into single consensus sequences [65]. These tools use an overlap-layout-consensus approach that is more suitable for long, error-prone reads and can maintain separate assembly paths for highly similar haplotypes.
Hybrid assembly strategies: Combining the base-level accuracy of short reads with the connectivity of long reads enables both high accuracy and improved contiguity. Tools such as SPAdes and MaSuRCA implement sophisticated hybrid approaches that leverage both de Bruijn graphs for accurate contig formation and overlap graphs for scaffolding [68].
Trio-binning and genetic mapping: Using sequence data from parental lines helps assign sequences to specific subgenomes in allopolyploids. This approach was successfully used in the assembly of complex plant genomes such as wheat and cotton, enabling researchers to distinguish between homeologous chromosomes from different subgenomes [66].
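The idea behind assigning sequences with parental data can be sketched with parent-specific k-mers: a read is binned to whichever parent contributes more k-mers that are absent from the other parent. This is a simplified illustration, not the implementation of any published tool; real pipelines derive k-mers from parental read sets and use much longer k, and the function names and the tiny default k=5 are assumptions for the toy example.

```python
def kmer_set(seq, k):
    """Set of all k-length subsequences of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_read(read, parent_a_seq, parent_b_seq, k=5):
    """Assign a read to parent 'A' or 'B' using parent-specific k-mers."""
    kmers_a = kmer_set(parent_a_seq, k)
    kmers_b = kmer_set(parent_b_seq, k)
    only_a = kmers_a - kmers_b   # k-mers diagnostic for parent A
    only_b = kmers_b - kmers_a   # k-mers diagnostic for parent B

    read_kmers = kmer_set(read, k)
    score_a = len(read_kmers & only_a)
    score_b = len(read_kmers & only_b)
    if score_a > score_b:
        return "A"
    if score_b > score_a:
        return "B"
    return "unassigned"          # no diagnostic k-mers, or a tie
```

Reads from regions that are identical between progenitors stay unassigned, which is the expected behavior for highly conserved NBS domains.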

Diagram 2: Recommended workflow for polyploid genome assembly focused on NBS gene characterization. The multi-platform approach provides complementary data types to overcome specific challenges.

NBS Gene Identification and Validation Pipeline

Accurate annotation of NBS genes in assembled genomes requires specialized approaches:

Domain-based identification: The standard method for identifying NBS genes involves searching for the NB-ARC domain (PF00931) using tools such as PfamScan with a conservative e-value threshold (1.1e-50) [2]. Additional domains (TIR, RPW8, LRR) are then identified to classify NBS genes into subfamilies (TNL, CNL, RNL).
Transcriptome integration: Incorporating RNA-seq data from multiple tissues and stress conditions significantly improves NBS gene annotation. The expression evidence helps validate gene models and may reveal condition-specific isoforms. In tetraploid birch, transcriptome analysis revealed that NBS genes were generally expressed at low levels, with a subset showing relatively high expression during later development in specific tissues [63].
Orthogroup analysis: Tools such as OrthoFinder enable the clustering of NBS genes into orthogroups across species, facilitating evolutionary comparisons between diploid and tetraploid plants. Recent studies have identified 603 orthogroups of NBS genes, with some core groups conserved across species and others specific to particular lineages [2].
Functional validation: Virus-induced gene silencing (VIGS) provides an efficient method for validating NBS gene function. In one study, silencing of GaNBS (OG2) in resistant cotton increased virus titers, supporting a putative role for this NBS gene in disease resistance [2].
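The domain-based step of this pipeline can be sketched as a filter and a classification rule applied to a table of domain hits. The example below is a hedged illustration: the hit tuples, the simplified domain labels (including "CC", which is often predicted with separate coiled-coil tools rather than Pfam), and the function name are assumptions, and only the NB-ARC e-value threshold is taken from the text above.

```python
NB_ARC_EVALUE = 1.1e-50  # NB-ARC threshold quoted in the text


def classify_nbs(hits, nb_arc_evalue=NB_ARC_EVALUE):
    """Classify genes with a confident NB-ARC hit by their other domains.

    hits: iterable of (gene_id, domain_label, evalue) tuples (hypothetical
    format; real domain-search output would need parsing first).
    Returns dict of gene_id -> class such as 'TNL', 'CNL', 'RNL', 'NL', 'N'.
    """
    domains = {}
    passed = set()
    for gene, domain, evalue in hits:
        domains.setdefault(gene, set()).add(domain)
        if domain == "NB-ARC" and evalue <= nb_arc_evalue:
            passed.add(gene)

    classes = {}
    for gene in passed:
        found = domains[gene]
        if "TIR" in found:
            prefix = "T"
        elif "RPW8" in found:
            prefix = "R"
        elif "CC" in found:
            prefix = "C"
        else:
            prefix = ""
        classes[gene] = prefix + "N" + ("L" if "LRR" in found else "")
    return classes
```

Genes whose NB-ARC hit misses the threshold are dropped entirely, so relaxing the cutoff is one way to test how sensitive family-size estimates are to it.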

Research Reagent Solutions for Polyploid NBS Gene Studies

Table 3: Essential Research Reagents and Tools for Polyploid NBS Gene Analysis

Category	Specific Tools/Reagents	Application	Technical Notes
Sequencing Technologies	PacBio Revio, Oxford Nanopore PromethION	Long-read sequencing for complex regions	HiFi reads recommended for base accuracy; ultra-long reads for scaffolding
Assembly Software	FALCON, Canu, HiCanu, Verkko	Diploid-aware assembly	Use specialized modes for polyploids; adjust parameters for expected heterozygosity
Repeat Annotation	ULTRA, TRF, tantan	Identification of tandem repeats	ULTRA provides improved sensitivity for decayed repeats
NBS Gene Identification	PfamScan, NLR-Annotator	Domain-based gene classification	Use custom HMM profiles for specific plant families
Expression Validation	RNA-seq, qPCR primers	Expression analysis across conditions	Include multiple tissues and stress treatments
Functional Validation	VIGS vectors, CRISPR-Cas9	Gene function validation	Optimize for specific plant species; include appropriate controls
Comparative Genomics	OrthoFinder, MCScanX	Evolutionary analysis across ploidy levels	Identify orthogroups specific to polyploids

Case Study: NBS Genes in Diploid vs. Tetraploid Cotton

A comprehensive study of NBS genes in Gossypium species provides an illustrative example of the challenges and solutions for studying NBS gene expansion in polyploid plants. Researchers identified NBS genes in diploid and tetraploid cotton species and analyzed their diversification, expression, and function [2].

The research revealed that tetraploid cotton contains a larger repertoire of NBS genes compared to its diploid progenitors, with significant expansion in specific orthogroups. Expression profiling showed that certain NBS orthogroups (OG2, OG6, and OG15) were upregulated in different tissues under various biotic and abiotic stresses in both susceptible and tolerant cotton varieties. Genetic variation analysis between susceptible (Coker 312) and tolerant (Mac7) Gossypium hirsutum accessions identified unique variants in NBS genes of both accessions, with more in Mac7 (6,583 variants) than in Coker 312 (5,173 variants) [2].
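Orthogroup-level expansion of the kind reported here can be summarized by comparing the tetraploid gene count in each orthogroup with the combined count of its diploid progenitors. The sketch below is illustrative only: the function name, species labels, and any counts passed to it are placeholders, not data from the cited study.

```python
def expansion_ratio(counts, tetraploid, progenitors):
    """Observed tetraploid count divided by the summed progenitor count.

    counts: dict of orthogroup -> {species_label: gene_count}.
    Returns dict of orthogroup -> ratio, or None when the progenitors
    have no members (the orthogroup is tetraploid-specific).
    """
    ratios = {}
    for orthogroup, per_species in counts.items():
        expected = sum(per_species.get(sp, 0) for sp in progenitors)
        observed = per_species.get(tetraploid, 0)
        ratios[orthogroup] = observed / expected if expected else None
    return ratios
```

A ratio near 1.0 is consistent with simple additivity after genome merger, values above 1.0 suggest post-polyploidization expansion, and values below 1.0 suggest gene loss or collapsed copies in the assembly.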

This case study highlights the importance of high-quality genome assemblies for accurate NBS gene annotation and comparative analysis. The researchers employed a combination of sequencing technologies and specialized bioinformatic tools to overcome the challenges posed by the complex polyploid cotton genome, enabling insights into the relationship between genome duplication, NBS gene expansion, and disease resistance.

The assembly and annotation of highly duplicated polyploid genomes remain formidable challenges in genomics, with significant implications for understanding NBS gene expansion in diploid versus tetraploid plants. Current technologies, particularly long-read sequencing and diploid-aware assembly algorithms, have dramatically improved our ability to resolve complex genomic regions, but significant hurdles remain.

Future progress will likely come from several directions: continued improvements in sequencing technology that provide even longer reads with higher accuracy; development of specialized algorithms that can better handle the complexities of polyploid genomes; and integration of multiple data types (genetic maps, Hi-C, optical mapping) to validate and improve assemblies. For researchers studying NBS gene expansion, adopting a multi-platform sequencing approach, implementing rigorous validation methods, and maintaining awareness of the limitations of genome assemblies will be crucial for generating accurate biological insights.

As these technical challenges are overcome, we will gain an increasingly precise understanding of how genome duplication drives the expansion and diversification of disease resistance genes in plants, ultimately facilitating the development of crops with enhanced and durable resistance to pathogens.

Allopolyploidy, the evolutionary process resulting from hybridization between different species followed by whole-genome duplication, has been a fundamental force in shaping plant evolution and domestication. This phenomenon presents a unique genomic puzzle: the merged genomes, termed subgenomes, coexist and interact within a single nucleus, leading to complex evolutionary trajectories. The study of these homeologous contributions—tracking which genetic elements originate from which progenitor—is crucial for understanding the genetic basis of traits such as disease resistance, environmental adaptation, and yield. Within the context of nucleotide-binding site (NBS) gene expansion, this tracking becomes particularly significant as these genes constitute one of the largest plant resistance gene families and exhibit dynamic evolution following polyploidization. Research across multiple allopolyploid systems reveals that NBS-encoding genes often undergo rapid diversification after genome merger and doubling, with significant implications for disease resistance profiles in polyploid crops [69]. The resolution of parental subgenome contributions not only illuminates evolutionary history but also empowers modern crop improvement efforts by identifying valuable genetic resources from progenitor species.

Evolutionary Dynamics and Subgenome Dominance

Following allopolyploidization, the merged genomes do not contribute equally to the evolutionary success of the new species. Extensive research has revealed the phenomenon of subgenome dominance, where one parental genome tends to retain more genes and exhibit higher expression levels than the other. In Brassica carinata, analysis of resistance gene analogs (RGAs) showed uneven duplication patterns between the B and C subgenomes, indicating subgenome dominance in this allotetraploid species [70]. Similarly, genomic investigations of all five Gossypium allopolyploid species demonstrated that subgenomes experienced evolutionary rate heterogeneities, with the D homeologs generally acquiring substitution mutations more rapidly than the A homeologs in most lineages [71].

However, not all allopolyploids exhibit pronounced subgenome dominance. Recent genomic analysis of Coffea arabica revealed that its two subgenomes (derived from C. canephora and C. eugenioides) show largely conserved genome structures with "no obvious global subgenome dominance" [72]. This harmonious coexistence suggests diverse evolutionary outcomes following polyploidization events. The fractionation process—where one copy of a duplicated gene is lost—also varies among allopolyploids. Arabica coffee shows only ~5% reversion of BUSCO genes to the diploid state since its allotetraploid origin, with fractionation occurring mostly in pericentromeric regions [72].

Table 1: Evolutionary Patterns in Different Allopolyploid Systems

Allopolyploid System	Subgenome Dominance	Key Evolutionary Observations	NBS Gene Dynamics
Brassica carinata	Evident in RGA duplication patterns	65.2% of RGAs affected by gene duplication events; intergenomic and intragenomic duplications identified	2,570 RGAs predicted; extensive expansion observed relative to progenitors [70]
Gossypium species	Differential evolutionary rates between subgenomes	D homeologs generally evolve faster than A homeologs; transposable element exchange between subgenomes	Asymmetric inheritance affects disease resistance; TNL genes important for Verticillium wilt resistance [10] [71]
Coffea arabica	No obvious global dominance	Only ~5% BUSCO genes reverted to diploid state; harmonious subgenome coexistence	NBS-specific dynamics not reported in the cited work; general gene retention patterns observed [72]
Brassica napus	Differential NBS gene retention	Greater diversification of NBS genes in C genome post-polyploidization; birth and death of NBS genes via non-homologous recombination	464 putatively functional NBS genes identified; co-localization with disease resistance QTLs [69]

Methodological Approaches for Subgenome Resolution

Bioinformatics Tools and Workflows

The complex task of distinguishing homeologous contributions requires specialized bioinformatics tools designed to handle the challenges of polyploid genomes. Several sophisticated approaches have been developed, each with specific strengths and applications:

AlloSHP represents a significant advancement as a command-line tool specifically designed to detect and extract single homeologous polymorphisms (SHPs) without requiring full genome assembly of the allopolyploid. This tool integrates three main algorithms—WGA, VCF2ALIGNMENT, and VCF2SYNTENY—and enables evolutionary analysis of allopolyploids by mapping sequences against known or putative diploid progenitor genomes. The key advantage of AlloSHP is its ability to work with resequencing data rather than requiring complete genome assembly, making it applicable to studies involving multiple accessions or populations [73].

CAPG employs a likelihood-based approach to weight read alignments against both subgenomic references and genotype individual allopolyploids from whole-genome resequencing data. This tool reports variant calls in VCF format with statistical support measures, classifying sites as homeologous SNPs, allelic SNPs within the subgenome, or invariant. CAPG has been validated in allotetraploid species such as peanut and cotton [73].

PhyloSD offers a different approach by integrating three sequential algorithms that enable subgenome identification even when one or more diploid progenitors are unknown (so-called "ghost" or "orphan" subgenomes). This pipeline is particularly valuable for systems where progenitors may be extinct or unidentified. Unlike AlloSHP and CAPG, PhyloSD requires gene or coding sequence assemblies from both diploid and polyploid species to infer gene trees [73].

Table 2: Bioinformatics Tools for Resolving Homeologous Contributions

Tool	Methodological Foundation	Data Requirements	Key Advantages	Limitations
AlloSHP [73]	Detection of single homeologous polymorphisms (SHPs) through simultaneous mapping and syntenic alignment	VCF file and reference genomes of diploid progenitors	No allopolyploid genome assembly required; works with resequencing data; preserves SNP positional traceability	SHPs restricted to syntenic regions; heterozygous sites excluded; requires progenitor references
CAPG [73]	Likelihood-based weighting of read alignments against subgenomic references	Whole-genome resequencing data; reference sequences from both subgenomes	Reports statistical support measures; handles heterozygous positions; validated in peanut and cotton	Requires reference sequences with known alignments in homologous regions
PhyloSD [73]	Integration of three algorithms for computational filtering, homeolog labeling, and subgenome assignment	Gene and/or CDS assemblies from diploid and polyploid species	Can identify "ghost" subgenomes without known progenitors; applicable to various ploidy levels	Requires gene assemblies rather than raw reads; computational complexity
PolyCat [73]	SNP-tolerant mapping using GSNAP to minimize mapping efficiency bias	NGS data from allopolyploids; single diploid reference genome	No allopolyploid assembly required; only one reference genome needed	Limited by genomic density of homeo-SNPs; potential mapping bias
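The three site categories that CAPG is described as reporting (homeologous SNP, allelic SNP within a subgenome, invariant) can be illustrated with a toy classifier. This is only a sketch of the category logic for a single individual, assuming genotypes have already been called separately for each subgenome. It is not CAPG's likelihood model, and the function name and input layout are hypothetical.

```python
def classify_site(geno_a, geno_b):
    """Classify one aligned site from per-subgenome diploid genotypes.

    geno_a, geno_b: tuples of two alleles, e.g. ("A", "A"), for the
    A and B subgenomes of one allotetraploid individual (assumed input).
    """
    alleles_a, alleles_b = set(geno_a), set(geno_b)
    # Heterozygosity inside either subgenome is allelic variation.
    if len(alleles_a) > 1 or len(alleles_b) > 1:
        return "allelic_snp"
    # Both subgenomes homozygous but fixed for different alleles.
    if alleles_a != alleles_b:
        return "homeologous_snp"
    return "invariant"


if __name__ == "__main__":
    for a, b in [(("A", "A"), ("G", "G")),
                 (("A", "G"), ("G", "G")),
                 (("C", "C"), ("C", "C"))]:
        print(a, b, classify_site(a, b))
```

In practice the difficulty lies in assigning reads to the correct subgenome before any such classification, which is the problem the tools in Table 2 address.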

Experimental and Functional Validation Approaches

Beyond computational prediction, experimental validation is crucial for confirming homeologous contributions and their functional significance. Recent advances in genome editing have opened new possibilities for functional validation:

Homeolog-specific gene editing using CRISPR/Cas9 technology represents a breakthrough for functionally testing the contributions of individual subgenomes. This approach has been successfully demonstrated in several polyploid systems. In Tragopogon mirus, researchers developed a homeolog-specific editing platform that successfully knocked out targeted homeologs of MYB10 and DFR genes without editing the other homeolog, achieving editing efficiencies of 35.7% and 45.5% respectively [74]. Similar approaches have been implemented in hexaploid wheat and tetraploid cotton, enabling precise manipulation of gene dosage to study its phenotypic consequences [74].
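To make the idea of homeolog-specific targeting concrete, the sketch below scans two pre-aligned homeolog sequences for candidate guide sites that should favor one copy. It assumes an SpCas9-style 20-nt protospacer followed by an NGG PAM on the forward strand, gap-free alignment of equal length, and an arbitrary mismatch threshold. None of these details come from the cited study, and the function name is hypothetical.

```python
def homeolog_specific_sites(target, other, min_mismatches=2):
    """Find candidate sites in `target` that discriminate against `other`.

    Returns a list of (start, protospacer, mismatches, pam_lost) tuples.
    A site is kept when the other homeolog has at least `min_mismatches`
    differences in the protospacer, or lacks the GG of the PAM.
    """
    if len(target) != len(other):
        raise ValueError("sequences must be aligned to equal length")
    target, other = target.upper(), other.upper()
    hits = []
    for start in range(len(target) - 22):
        # PAM occupies positions start+20 .. start+22 (N, G, G).
        if target[start + 21:start + 23] != "GG":
            continue
        proto_t = target[start:start + 20]
        proto_o = other[start:start + 20]
        mismatches = sum(a != b for a, b in zip(proto_t, proto_o))
        pam_lost = other[start + 21:start + 23] != "GG"
        if mismatches >= min_mismatches or pam_lost:
            hits.append((start, proto_t, mismatches, pam_lost))
    return hits


if __name__ == "__main__":
    homeolog_a = "A" * 20 + "TGG"
    homeolog_b = "CC" + "A" * 18 + "TGG"
    print(homeolog_specific_sites(homeolog_a, homeolog_b))
```

A real design workflow would also search the reverse strand and screen candidates for off-targets genome-wide. This sketch covers only the discrimination step between the two homeologs.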

Comparative genomic approaches integrate multiple data types to validate subgenome contributions. For example, in Coffea arabica, researchers combined chromosome-level assemblies of the allopolyploid and its diploid progenitors with whole-genome resequencing data of wild and cultivated accessions. This integrated approach enabled both the identification of homeologous contributions and the analysis of their historical diversification during domestication [72].

Expression analysis of homeologs provides functional insights beyond sequence identification. Studies in Gossypium allopolyploids have revealed that subgenome-specific evolutionary trajectories are accompanied by gene-family diversification and homeolog expression divergence among polyploid lineages [71]. Such expression data helps identify which subgenome contributions are functionally relevant in specific tissues or conditions.

Case Studies in Major Crop Systems

Brassica: Dynamic Evolution of Disease Resistance Genes

The Brassica genus provides excellent examples of how allopolyploidization shapes the evolution of disease resistance genes. Genomic analysis of Brassica carinata (BBCC) revealed 2,570 resistance gene analogs (RGAs), with 65.2% affected by gene duplication events classified as either intergenomic or intragenomic duplications [70]. The contrasting patterns of these duplications between the B and C subgenomes provide evidence for subgenome dominance in this species. Comparative analysis with its diploid progenitors, B. nigra and B. oleracea, demonstrated conservation of genomic features while revealing that B. carinata RGAs have undergone extensive expansion [70].

In Brassica napus (AACC), genome-wide comparison identified 464 putatively functional NBS-encoding genes, unevenly distributed across the genome in clusters [69]. Interestingly, while the An-subgenome of B. napus possessed similar numbers of NBS-encoding genes (191) to the Ar genome of B. rapa (202), the Cn genome of B. napus contained many more genes (273) than the B. oleracea Co genome (146), suggesting greater diversification of NBS-encoding genes in the C genome after B. napus formation [69]. This asymmetric evolution has functional consequences, as 204 of these NBS-encoding genes were located within resistance quantitative trait locus (QTL) intervals against major diseases including blackleg, clubroot, and Sclerotinia stem rot [69].
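The counts above can be checked with a few lines of arithmetic. The sketch uses only the numbers reported in [69]; treating the present-day B. rapa and B. oleracea genomes as stand-ins for the actual progenitors is a simplifying assumption.

```python
# NBS-encoding gene counts reported for B. napus subgenomes and the
# corresponding diploid genomes [69].
NBS_COUNTS = {"An": 191, "Cn": 273, "Ar": 202, "Co": 146}
IN_QTL = 204  # NBS genes located within resistance QTL intervals


def pct_change(derived, progenitor):
    """Percent change of a subgenome count relative to the diploid genome."""
    return round(100 * (derived - progenitor) / progenitor, 1)


total_napus = NBS_COUNTS["An"] + NBS_COUNTS["Cn"]
a_change = pct_change(NBS_COUNTS["An"], NBS_COUNTS["Ar"])
c_change = pct_change(NBS_COUNTS["Cn"], NBS_COUNTS["Co"])
qtl_share = round(100 * IN_QTL / total_napus, 1)

print(f"Total in B. napus: {total_napus}")   # 464
print(f"An vs Ar: {a_change}%")              # -5.4
print(f"Cn vs Co: {c_change}%")              # 87.0
print(f"Share in QTL intervals: {qtl_share}%")  # 44.0
```

The A subgenome count is roughly 5% below that of B. rapa, whereas the C subgenome count is about 87% above that of B. oleracea, which is the asymmetry the text describes.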

Gossypium: Asymmetric Evolution and Disease Resistance

Cotton species (Gossypium) offer another compelling system for studying homeologous contributions. Genomic analysis of five allopolyploid cotton species revealed that despite conservation in gene content and synteny, the subgenomes have diversified through subgenomic transposon exchanges, evolutionary rate heterogeneities, and positive selection between homeologs [71]. These differential evolutionary trajectories correlate with disease resistance patterns, particularly for Verticillium wilt.

Comparative analysis of NBS-encoding genes in diploid and allotetraploid cotton species showed asymmetric evolution, with G. hirsutum inheriting more NBS-encoding genes from G. arboreum, while G. barbadense inherited more from G. raimondii [10]. This asymmetric inheritance helps explain why G. raimondii and G. barbadense show greater resistance to Verticillium wilt, while G. arboreum and G. hirsutum are more susceptible [10]. The study further suggested that TNL genes specifically may play a significant role in disease resistance to Verticillium wilt in G. raimondii and G. barbadense [10].

Coffea arabica: Limited Fractionation and Harmonious Coexistence

The recent genome assembly of Coffea arabica and its diploid progenitors provides insights into a different evolutionary path. Unlike Brassica and Gossypium, C. arabica shows no obvious global subgenome dominance and limited fractionation since its allopolyploid origin [72]. The two subgenomes (derived from C. canephora and C. eugenioides) exhibit high structural conservation, with only ~5% of BUSCO genes having reverted to the diploid state [72].

Syntenic comparisons revealed that genomic excision events, removing one or several genes at a time in similar proportions across the two subgenomes, have been the main driving force in genome fragmentation [72]. The Arabica allopolyploidy event did not significantly affect the rate of genome fractionation, which remained roughly constant when comparing deletions in progenitor species versus Arabica subgenomes after the event [72]. This evolutionary pattern more closely follows the 'harmonious coexistence' model observed in some Arabidopsis hybrids rather than the dominant-fractionation model seen in other allopolyploids.

Table 3: Essential Research Reagents and Resources for Subgenome Tracking Studies

Category	Specific Tools/Reagents	Function/Application	Example Use Cases
Bioinformatics Tools	AlloSHP, CAPG, PhyloSD, PolyCat	Detection and analysis of homeologous contributions; phylogenetic reconstruction; variant calling	Evolutionary analysis of allopolyploid complexes; population genomics studies [73]
Genome References	Diploid progenitor genomes; allopolyploid assemblies	Reference sequences for read mapping; synteny analysis; variant identification	Comparative genomics; identification of subgenome-specific markers [70] [72] [71]
Sequencing Technologies	PacBio HiFi; Oxford Nanopore; Illumina; Hi-C	Genome assembly; variant detection; chromatin interaction mapping	Chromosome-scale assemblies; structural variant identification; haplotype phasing [72] [71]
Genome Editing Systems	CRISPR/Cas9; homeolog-specific guides	Functional validation; gene dosage studies; trait manipulation	Testing phenotypic effects of specific homeologs; understanding gene retention patterns [74]
Expression Analysis	RNA-seq; qPCR; expression atlases	Homeolog expression divergence; subgenome dominance assessment	Identifying biased expression patterns; functional characterization of homeologs [71]

Implications for Crop Improvement

Understanding homeologous contributions has direct applications in crop improvement, particularly for enhancing disease resistance. The co-localization of NBS-encoding genes with known disease resistance QTLs in Brassica napus demonstrates how tracking subgenome origins can identify candidate genes for marker-assisted selection [69]. Similarly, the asymmetric evolution of NBS-encoding genes in Gossypium species provides insights for transferring resistance traits between cotton varieties [10].
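Co-localization of this kind reduces to an interval-overlap test between gene coordinates and QTL intervals on the same chromosome. The sketch below shows that test on toy data; all identifiers and coordinates are hypothetical and do not come from [69].

```python
def genes_in_qtl(genes, qtls):
    """Map each gene to the QTL intervals it overlaps.

    genes: list of (gene_id, chrom, start, end)
    qtls:  list of (qtl_id, chrom, start, end)
    Coordinates are inclusive. Genes with no overlap are omitted.
    """
    hits = {}
    for gene_id, g_chrom, g_start, g_end in genes:
        for qtl_id, q_chrom, q_start, q_end in qtls:
            same_chrom = g_chrom == q_chrom
            overlaps = g_start <= q_end and q_start <= g_end
            if same_chrom and overlaps:
                hits.setdefault(gene_id, []).append(qtl_id)
    return hits


if __name__ == "__main__":
    # Toy data: chromosome prefixes A/C stand for the two subgenomes.
    genes = [("nbs1", "A01", 1000, 4000),
             ("nbs2", "C03", 50000, 53000),
             ("nbs3", "C03", 900000, 903000)]
    qtls = [("qtl_blackleg", "A01", 500, 2000),
            ("qtl_clubroot", "C03", 40000, 60000)]
    print(genes_in_qtl(genes, qtls))
```

Because chromosome names in a subgenome-resolved assembly carry the subgenome label, the same output also shows which parental genome contributes each candidate.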

The development of homeolog-specific gene editing systems in polyploid plants enables precise manipulation of agronomic traits without the limitations of traditional breeding. Successful examples in Tragopogon, wheat, and cotton demonstrate the feasibility of modifying specific homeologs to optimize gene dosage effects while maintaining desired traits from the other subgenome [74]. This approach is particularly valuable for manipulating disease resistance genes, where specific NBS gene homeologs may contribute differentially to pathogen recognition and defense activation.

Furthermore, understanding subgenome evolution informs strategies for wild relative introgression. Genomic studies in Gossypium have shown that recombination suppression in cultivated polyploids correlates with DNA hypermethylation and can be overcome by wild introgression [71]. This approach allows breeders to access valuable genetic diversity from wild relatives while maintaining the superior agricultural traits of cultivated varieties.

The resolution of homeologous contributions in allopolyploids has transformed from a theoretical challenge to a tractable research program with powerful tools and methodologies. The integration of bioinformatics approaches like AlloSHP with experimental validation through homeolog-specific editing provides a comprehensive framework for tracking parental subgenomes. Within the context of NBS gene expansion, these approaches reveal dynamic and often asymmetric evolutionary trajectories that significantly impact disease resistance profiles in polyploid crops. As these methodologies continue to advance, they promise to further illuminate the complex genomic interactions following polyploidization and accelerate the development of improved crop varieties with enhanced resilience to biotic stresses. The ongoing research in model systems like Brassica, Gossypium, and Coffea provides both fundamental insights into polyploid evolution and practical strategies for crop improvement.

The study of large gene families represents a significant computational and biological challenge in the field of genomics, particularly in plant species with complex genomes. Gene families such as the Nucleotide-Binding Site Leucine-Rich Repeat (NLR) family can contain thousands of members with diverse domain architectures and functional specializations. This challenge is further compounded in polyploid species, where genome duplication events create additional copies of genes that undergo complex evolutionary trajectories. Research on diploid and tetraploid cotton species (Gossypium spp.) has revealed substantial expansion and diversification of NLR genes, with recent studies identifying 12,820 NBS-domain-containing genes across 34 plant species, classified into 168 distinct domain architecture classes [2].

The analysis of these expansive gene families requires sophisticated computational approaches for identification, classification, and curation. When comparing NBS gene expansion in diploid versus tetraploid plants, effective data management strategies become paramount. Studies comparing wild cotton diploids have demonstrated that different species employ divergent transcriptional cascades in response to environmental stresses like drought, highlighting the functional consequences of gene family diversification [75]. This technical guide provides a comprehensive framework for managing data from large gene families, with specific applications to NLR genes in cotton species, enabling researchers to extract meaningful biological insights from these complex datasets.

Foundational Concepts: NBS Gene Family Organization and Variation

The NLR gene family in plants is characterized by a modular domain structure typically consisting of three core components: an N-terminal domain (TIR, CC, or RPW8), a central NB-ARC/NACHT domain, and a C-terminal Leucine-Rich Repeat (LRR) region [2]. This basic architecture shows remarkable diversification across plant species, with expansions primarily occurring in flowering plants. Bryophytes like Physcomitrella patens possess relatively small NLR repertoires of approximately 25 genes, while surveyed angiosperm genomes can contain thousands of NLRs [2].

In cotton species, NBS genes exhibit substantial structural variation, including both classical patterns (NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR) and species-specific structural patterns (TIR-NBS-TIR-Cupin1-Cupin1, TIR-NBS-Prenyltransf, Sugar_tr-NBS) [2]. Comparative analyses of diploid and tetraploid chrysanthemum have revealed that organellar genome structure is generally conserved despite ploidy differences, though tetraploid accessions can contain unique sequences and previously undescribed open reading frames in their mitogenomes [37].

Table 1: NBS Gene Family Characteristics Across Plant Species

Species/Group	Genome Type	Approximate NBS Gene Count	Notable Features
Gossypium hirsutum (Cotton)	Tetraploid (AD1)	2,012 (based on wheat comparison)	Extensive diversification; response to CLCuD
Bryophytes	Diploid	~25	Small, ancestral NLR repertoires
Angiosperms	Various	Up to thousands	Substantial gene expansion
Land plants (34 species surveyed)	Various	12,820 across 34 species	168 domain architecture classes

Computational Workflow for Gene Family Identification and Classification

Gene Identification and Domain Architecture Analysis

The initial step in managing large gene family data involves comprehensive identification of family members across species. For NBS gene identification, researchers have successfully employed PfamScan with the NB-ARC domain (PF00931) HMM profile using a conservative e-value cutoff of 1.1e-50 to ensure specificity [2]. This approach allows for the extraction of all genes containing the characteristic nucleotide-binding site domain from genomic datasets.
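As a rough illustration of the filtering step, the sketch below parses hits laid out like HMMER's tabular output (the format written by hmmsearch with --tblout, where the fifth whitespace-separated column is the full-sequence E-value) and keeps genes at or below the 1.1e-50 cutoff. The mock table, gene names, and function names are hypothetical, and the mock rows are trimmed to the first seven columns. PfamScan writes a different layout, so a real pipeline would need a matching parser.

```python
# Mock hits in a tblout-like layout (hypothetical genes and scores).
MOCK_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias
gene_001       -          NB-ARC      PF00931    3.2e-75  250.1  0.1
gene_002       -          NB-ARC      PF00931    4.0e-12   45.3  0.0
gene_003       -          NB-ARC      PF00931    8.9e-51  172.4  0.2
"""


def parse_tblout(text):
    """Return {target: best full-sequence E-value} from tblout-style text."""
    best = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        best[target] = min(evalue, best.get(target, float("inf")))
    return best


def filter_hits(best, max_evalue=1.1e-50):
    """Keep targets whose best E-value is at or below the cutoff."""
    return sorted(g for g, e in best.items() if e <= max_evalue)


if __name__ == "__main__":
    print(filter_hits(parse_tblout(MOCK_TBLOUT)))
```

With this cutoff only gene_001 and gene_003 are retained. A weak hit such as gene_002 would need a looser threshold and independent confirmation.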

Following identification, domain architecture classification provides critical insights into functional diversification. A systematic classification approach groups genes with similar domain architectures into the same classes, enabling comparative analysis across species [2]. This method revealed significant diversity in NBS domain architectures among land plants, from classical structures to species-specific configurations. The resulting classification system facilitates evolutionary studies and functional comparisons by grouping genes with potentially similar molecular functions.
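One simple way to implement this kind of grouping is to order each gene's domain hits by position, join the domain names into an architecture string, and collect genes that share the same string. The sketch below assumes domain hits are already available as (gene, domain, start) records. The labels and gene names are illustrative, and the exact rules used in [2] may differ.

```python
from collections import defaultdict


def architecture_classes(domain_hits):
    """Group genes by ordered domain architecture.

    domain_hits: iterable of (gene_id, domain_name, start_position).
    Returns {architecture_string: sorted list of gene_ids}.
    Repeated domains are kept, so TIR-NBS-TIR stays distinct from TIR-NBS.
    """
    per_gene = defaultdict(list)
    for gene_id, domain, start in domain_hits:
        per_gene[gene_id].append((start, domain))

    classes = defaultdict(list)
    for gene_id, hits in per_gene.items():
        # Sort by start coordinate to get N-terminal to C-terminal order.
        arch = "-".join(domain for _, domain in sorted(hits))
        classes[arch].append(gene_id)
    return {arch: sorted(ids) for arch, ids in classes.items()}


if __name__ == "__main__":
    hits = [("g1", "NBS", 200), ("g1", "TIR", 10), ("g1", "LRR", 600),
            ("g2", "NBS", 50),
            ("g3", "TIR", 5), ("g3", "NBS", 180), ("g3", "LRR", 550)]
    print(architecture_classes(hits))
```

Counting the keys of the returned dictionary gives the number of architecture classes in a dataset.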

Table 2: Key Bioinformatics Tools for Gene Family Analysis

Tool Name	Primary Function	Application in NBS Gene Analysis
PfamScan/HMMER	Protein domain identification	NB-ARC domain detection using Pfam-A HMM models
OrthoFinder	Orthogroup inference	Clustering of NBS genes across multiple species
DIAMOND	Sequence similarity searches	Rapid comparison of NBS protein sequences
MCL	Clustering algorithm	Gene family sub-group identification
MAFFT	Multiple sequence alignment	Alignment of NBS protein sequences for phylogeny
FastTreeMP	Phylogenetic inference	Construction of gene trees for NBS genes

Orthology Inference and Evolutionary Analysis

To understand the evolutionary relationships within large gene families, orthology inference provides a critical framework. The application of OrthoFinder to NBS gene datasets has identified 603 orthogroups with both core (widely distributed) and unique (species-specific) patterns [2]. This analysis revealed evidence of tandem duplication events, a key mechanism driving NLR family expansion in plants.
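The core versus unique distinction can be expressed as a rule on how many species are represented in each orthogroup. The sketch below assumes orthogroup membership is available as (species, gene) pairs. The 90% threshold for "core" is an arbitrary illustration rather than the criterion used in [2], and the orthogroup contents are made up.

```python
def classify_orthogroups(orthogroups, n_species, core_fraction=0.9):
    """Label each orthogroup as 'core', 'unique', or 'intermediate'.

    orthogroups: {og_id: list of (species, gene_id)}
    n_species:   total number of species in the analysis
    core_fraction: assumed share of species required to call a group core
    """
    labels = {}
    for og_id, members in orthogroups.items():
        species = {sp for sp, _ in members}
        if len(species) == 1:
            labels[og_id] = "unique"        # species-specific
        elif len(species) >= core_fraction * n_species:
            labels[og_id] = "core"          # widely distributed
        else:
            labels[og_id] = "intermediate"
    return labels


if __name__ == "__main__":
    ogs = {
        "OG_a": [("sp1", "g1"), ("sp2", "g2"), ("sp3", "g3")],
        "OG_b": [("sp1", "g4"), ("sp1", "g5")],
        "OG_c": [("sp1", "g6"), ("sp2", "g7")],
    }
    print(classify_orthogroups(ogs, n_species=3))
```

A unique orthogroup with several members from one species, such as OG_b here, is a natural starting point when looking for lineage-specific tandem duplicates.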

The evolutionary analysis workflow typically involves:

Sequence similarity searching using DIAMOND for fast comparison of protein sequences
Clustering with the MCL algorithm to identify related sequences
Orthogroup inference using DendroBLAST to establish evolutionary relationships
Multiple sequence alignment with MAFFT for conserved motif identification
Phylogenetic tree construction using maximum likelihood methods in FastTreeMP with bootstrap support [2]

This pipeline enables researchers to distinguish between orthologous genes (derived from speciation events) and paralogous genes (derived from duplication events), providing insights into the evolutionary forces shaping gene family expansion in diploid versus tetraploid plants.

Diagram 1: Bioinformatics workflow for gene family analysis

Advanced Filtering and Curation Strategies

Criteria-Based Gene Selection Framework

Building on principles adapted from human genomic newborn screening programs, plant gene family curation can benefit from structured criteria for prioritizing biologically significant genes. The Screen4Care project developed a six-criteria framework for gene-disease pair selection that can be adapted for plant gene family curation [76]:

Treatability/Actionability: Existence of known interventions or biological significance
Clinical Validity: Strong evidence of gene-phenotype relationship (adapted for plant gene-function relationship)
Age of Onset: Early manifestation in development (for plant developmental traits)
Disease Severity: Impact on fitness or agricultural value
Penetrance: Consistency of phenotype expression
Technical Feasibility: Reliability of detection methods

In the Screen4Care project, application of this framework to 484 initial gene-disease pairs resulted in a final curated set of 245 genes after scoring and expert review [76]. This represents a rigorous approach to reducing false positives and focusing on the most biologically relevant candidates.

Expression-Based Filtering and Functional Annotation

Integration of transcriptomic data provides a powerful filtering criterion for prioritizing genes within large families. Studies in cotton species have employed RNA-seq analysis under various conditions to identify NBS genes with dynamic expression patterns. For example, research on diploid cotton species (G. arboreum, G. stocksii, and G. bickii) under drought stress revealed significant variation in responsive genes, with 3,052 up-regulated and 2,532 down-regulated genes in G. bickii alone, accounting for approximately 13% of the predicted proteome [75].
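Expression-based filtering of this kind usually means intersecting a differential expression table with the list of family members and applying fold-change and significance thresholds. The sketch below shows that step on toy data. The thresholds (|log2FC| of at least 1, adjusted p below 0.05) are common conventions assumed here, not values taken from [75], and all gene identifiers are hypothetical.

```python
def stress_responsive_nbs(de_rows, nbs_ids, min_abs_log2fc=1.0, max_padj=0.05):
    """Split differentially expressed NBS genes into up and down lists.

    de_rows: iterable of (gene_id, log2_fold_change, adjusted_p_value)
    nbs_ids: set of gene IDs previously identified as NBS genes
    """
    up, down = [], []
    for gene_id, log2fc, padj in de_rows:
        if gene_id not in nbs_ids:
            continue  # not a member of the gene family
        if padj >= max_padj or abs(log2fc) < min_abs_log2fc:
            continue  # fails significance or effect-size threshold
        (up if log2fc > 0 else down).append(gene_id)
    return up, down


if __name__ == "__main__":
    de_table = [("nbs_1", 2.3, 0.001), ("nbs_2", -1.7, 0.01),
                ("nbs_3", 0.4, 0.002), ("other_1", 3.0, 0.0001),
                ("nbs_4", 1.5, 0.2)]
    family = {"nbs_1", "nbs_2", "nbs_3", "nbs_4"}
    print(stress_responsive_nbs(de_table, family))
```

Keeping up- and down-regulated genes separate mirrors how the drought-response counts above are reported.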

The functional annotation of curated gene sets can be enhanced through tools like the Database for Annotation, Visualization, and Integrated Discovery (DAVID), which provides comprehensive functional annotation tools to understand the biological meaning behind large gene lists [77]. DAVID integrates multiple sources of functional annotations and can identify enriched biological themes, particularly Gene Ontology terms, and cluster redundant annotation terms.

Diagram 2: Gene filtering and curation pipeline

Experimental Validation and Functional Characterization

Expression Profiling Across Species and Conditions

Comparative analysis of gene expression across diploid and tetraploid cotton species provides insights into the functional consequences of gene family expansion. Research has demonstrated that NBS gene expression patterns cluster more closely by species than by treatment conditions, emphasizing species-specific regulatory mechanisms [75]. For instance, hierarchical clustering analysis of orthologous gene groups revealed that expression patterns in each species under normal and stress conditions showed closer relationships with one another than with patterns of other species subjected to similar conditions [75].

Orthogroup-based expression analysis has identified conserved regulatory modules across species. In cotton NBS genes, OG2, OG6, and OG15 showed putative upregulation across different tissues under various biotic and abiotic stresses in both susceptible and tolerant accessions [2]. This conservation suggests core functions maintained across species despite overall diversification.

Functional Validation Using Genetic Tools

The ultimate test of gene family curation strategies comes from experimental validation of candidate genes. Several approaches have proven effective for functional characterization of NBS genes:

Virus-Induced Gene Silencing (VIGS) has been successfully employed to validate NBS gene function in resistant cotton. Silencing of GaNBS (OG2) demonstrated its putative role in controlling virus titer, supporting its importance in disease response pathways [2].

Genetic variation analysis between susceptible (Coker 312) and tolerant (Mac7) Gossypium hirsutum accessions identified numerous unique variants in NBS genes, with 6,583 variants in Mac7 versus 5,173 variants in Coker 312 [2]. These variants provide candidate polymorphisms underlying functional differences in disease response.

Protein interaction studies through protein-ligand and protein-protein interaction assays have revealed strong interactions between putative NBS proteins and ADP/ATP as well as core proteins of the cotton leaf curl disease virus [2], providing mechanistic insights into NBS protein function.

Table 3: Expression Analysis of NBS Genes in Cotton Under Stress

Species	Ploidy	Condition	Up-regulated DEGs	Down-regulated DEGs	Key Pathways Enriched
G. bickii	Diploid	Drought stress	3,052	2,532	Protein phosphorylation, dephosphorylation
G. arboreum	Diploid	Drought stress	4,484	Not specified	Response to auxin
G. stocksii	Diploid	Drought stress	2,147	Not specified	Ethylene & salicylic acid signaling
G. hirsutum	Tetraploid	Drought stress	Increasing over time	Increasing over time	Hormone signal transduction, photosynthesis

The Scientist's Toolkit: Essential Research Reagents and Solutions

Table 4: Research Reagent Solutions for Gene Family Analysis

Reagent/Resource	Function	Application Example
Pfam-A HMM models	Protein domain identification	NB-ARC domain detection in NBS genes
OrthoFinder	Orthogroup inference across species	Identifying core & species-specific NBS orthogroups
DAVID Bioinformatics	Functional annotation of gene lists	GO term enrichment for curated NBS genes
RNA-seq datasets	Expression profiling	Identifying stress-responsive NBS genes
VIGS vectors	Functional gene validation	Testing role of GaNBS in virus resistance
Franklin/VarSome tools	Variant interpretation	Classifying sequence variants in NBS genes
CottonFGD database	Species-specific genomic data	Accessing cotton NBS gene information

The management of data from large gene families requires an integrated approach combining sophisticated computational methods with rigorous experimental validation. The strategies outlined in this guide—from initial identification through domain analysis, orthology inference, expression-based filtering, and functional validation—provide a comprehensive framework for studying complex gene families like the NBS family in plants. The application of these methods to diploid and tetraploid cotton species has revealed both conserved and divergent evolutionary patterns, with implications for understanding plant immunity and stress response.

Future directions in gene family analysis will likely incorporate long-read sequencing to resolve complex genomic regions, single-cell transcriptomics to understand cell-type-specific expression patterns, and machine learning approaches to predict function from sequence and expression features. As these technologies mature, the strategies for filtering, classification, and curation will continue to evolve, enabling deeper insights into the functional significance of gene family expansion in plant evolution and adaptation.

Addressing Transcriptomic Asymmetry and Homeolog Expression Bias in Functional Studies

Whole-genome duplication, either within a species (autopolyploidy) or through hybridization between species (allopolyploidy), is a fundamental force in plant evolution and crop domestication. A key consequence of polyploidization is transcriptomic asymmetry, a phenomenon describing the non-equal expression of duplicated genes (homeologs) derived from different progenitor genomes. In the context of studying NBS (Nucleotide-Binding Site) gene expansion, understanding these expression dynamics is crucial for linking genetic changes to functional outcomes in disease resistance. This technical guide provides a comprehensive framework for addressing transcriptomic asymmetry and homeolog expression bias in functional studies, with specific application to NBS gene research in diploid and tetraploid plants.

The merger of two diverged genomes in allopolyploids creates intricate regulatory interactions that result in homeolog expression bias (the relative contribution of each homeolog to the transcriptome) and expression level dominance (where the total expression of both homeologs matches that of one progenitor) [78]. For researchers investigating the expansion of NBS disease resistance genes, these transcriptional complexities present both challenges and opportunities for understanding how polyploid plants achieve enhanced pathogen resistance through duplicated gene networks.

Core Concepts and Definitions

Transcriptomic Asymmetry in Polyploid Plants

Transcriptomic asymmetry encompasses the unequal expression patterns between homeologous genes in polyploids. Recent studies on mangrove shrubs (Acanthus tetraploideus) revealed that approximately 22.87% of genes exhibited biased homeolog expression, with parental genetic legacy substantially influencing the reconfiguration of homeolog expression in the derived tetraploid [79]. This asymmetry arises from both immediate "transcriptome shock" following allopolyploidization and subsequent post-polyploid evolutionary processes that reshape gene expression networks.

Homeolog Expression Bias and Expression Level Dominance

Homeolog expression bias refers to the unequal contribution of the two homeologs to the total transcript pool, while expression level dominance describes the phenomenon where the total expression level of a homeolog pair in an allopolyploid matches that of only one of the two diploid parents [78]. Research in cotton has demonstrated that genome-wide expression level dominance can be biased toward one progenitor genome in diploid hybrids and natural allopolyploids, with the direction sometimes reversing in synthetic allopolyploids [78].
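The two definitions can be made concrete with a small numerical sketch. Bias is computed here as a log2 ratio of the two homeolog expression values, and expression level dominance is called when the summed polyploid expression is close to one parent but not the other. The pseudocount, the twofold bias threshold, and the 20% tolerance are assumptions chosen for illustration. Published analyses rely on replicated data and statistical tests rather than fixed tolerances.

```python
import math


def homeolog_bias(expr_a, expr_b, pseudocount=1.0, min_abs_log2=1.0):
    """Return (log2 ratio A/B, label) for one homeolog pair."""
    ratio = math.log2((expr_a + pseudocount) / (expr_b + pseudocount))
    if ratio >= min_abs_log2:
        return ratio, "A-biased"
    if ratio <= -min_abs_log2:
        return ratio, "B-biased"
    return ratio, "balanced"


def expression_level_dominance(total_poly, parent_a, parent_b, tol=0.2):
    """Compare summed homeolog expression with each diploid parent."""
    def close(x, y):
        return abs(x - y) <= tol * max(x, y)

    if close(parent_a, parent_b):
        return "parents_equal"   # dominance is undefined in this case
    if close(total_poly, parent_a):
        return "A-dominant"
    if close(total_poly, parent_b):
        return "B-dominant"
    return "other"               # additive or transgressive patterns


if __name__ == "__main__":
    print(homeolog_bias(30, 10))
    print(expression_level_dominance(100, 98, 40))
```

The two measures are independent: a pair can show expression level dominance toward one parent while its homeologs contribute equally, or the reverse.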

Table 1: Key Terminology in Polyploid Transcriptomics

Term	Definition	Research Significance
Homeolog	Homologous genes derived from different progenitor genomes in a polyploid	Fundamental unit of analysis in polyploid gene expression studies
Homeolog Expression Bias	Relative contribution of homeologs to the transcriptome	Reveals regulatory divergence and subfunctionalization
Expression Level Dominance	Total expression level of homeolog pair matches one progenitor	Indicates coordinated regulation and genome-wide dominance
Transcriptomic Asymmetry	Non-equal expression patterns between homeologous genes	Impacts phenotypic variation and adaptive potential
NBS Gene Expansion	Increase in nucleotide-binding site resistance genes through duplication	Provides raw material for evolution of disease resistance

Experimental Design and Methodologies

Genome-Wide Identification of NBS Genes

Comprehensive identification of NBS genes across diploid and tetraploid genomes forms the foundation for comparative transcriptomic studies. The methodology outlined below enables systematic characterization of this important gene family:

HMMER-based Domain Identification

Utilize HMMER software with the NB-ARC domain profile (PF00931) as query to scan proteomes
Apply a permissive initial E-value cutoff (1.0) to collect candidate NBS-containing genes
Verify identified genes against Pfam database (E-value 10^-4) to confirm NBS domain presence
Classify genes into subfamilies (CNL, TNL, RNL) using NCBI Conserved Domain Database and Coiled-coil prediction tools [38] [2]

Classification and Structural Analysis

Identify domain architecture using PfamScan with default parameters
Detect TIR (PF01582), RPW8 (PF05659), and LRR (PF08191) domains
Predict coiled-coil domains using Coiledcoil with threshold value of 0.5
Analyze gene structure, conserved motifs, and chromosomal distribution [17]

This approach successfully identified 239 NBS-LRR genes across two tung tree genomes (Vernicia fordii and Vernicia montana), with 90 in the susceptible V. fordii and 149 in the resistant V. montana, revealing fundamental differences in NBS gene composition between diploid species with varying resistance phenotypes [38].
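Once domains have been annotated, the subfamily assignment step reduces to a lookup on the set of domains found in each protein. The sketch below assumes domain labels have already been normalized to the short names "NB-ARC", "TIR", "RPW8", "CC", and "LRR"; those labels, the gene names, and the TIR > RPW8 > CC priority for proteins carrying more than one N-terminal domain are illustrative assumptions rather than the cited studies' rules.

```python
from collections import Counter
from typing import Iterable

def classify_nbs_gene(domains: Iterable[str]) -> str:
    """Return an architecture label such as TNL, CNL, RNL, CN, NL, or N."""
    found = set(domains)
    if "NB-ARC" not in found:
        return "non-NBS"
    # N-terminal domain decides the subfamily prefix (priority is an assumption).
    if "TIR" in found:
        prefix = "T"
    elif "RPW8" in found:
        prefix = "R"
    elif "CC" in found:
        prefix = "C"
    else:
        prefix = ""
    suffix = "L" if "LRR" in found else ""
    return f"{prefix}N{suffix}"

# Hypothetical annotations: gene -> domains detected by PfamScan / coiled-coil prediction
annotations = {
    "geneA": {"TIR", "NB-ARC", "LRR"},
    "geneB": {"CC", "NB-ARC", "LRR"},
    "geneC": {"RPW8", "NB-ARC", "LRR"},
    "geneD": {"NB-ARC"},
    "geneE": {"CC", "NB-ARC"},
}
classes = {gene: classify_nbs_gene(doms) for gene, doms in annotations.items()}
counts = Counter(classes.values())
print(classes)
print(counts)
```

Truncated architectures (for example CN or N) are kept as separate labels so that partial genes are counted rather than silently dropped.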

Transcriptomic Profiling Strategies

RNA-Seq Experimental Design

Sequence transcriptomes from multiple tissues, developmental stages, and stress conditions
Include biological replicates for statistical robustness (minimum n=3)
Consider temporal dynamics through time-series sampling
Account for tissue-specific expression patterns in experimental design [80] [81]

Recent research in citrus demonstrates the importance of sampling multiple tissues, with salt stress inducing distinct transcriptomic responses in leaves and roots of diploid and tetraploid genotypes [80]. Similarly, studies in wucai (Brassica campestris L.) revealed that differentially expressed genes between diploid and tetraploid plants showed stage-specific patterns across three developmental stages [81].

Library Preparation and Sequencing

Isolate high-quality RNA using TRIzol or column-based methods
Assess RNA integrity (RIN > 8.0) before library construction
Prepare stranded mRNA-seq libraries using polyA selection
Sequence on Illumina platforms with sufficient depth (≥30 million reads per sample)
Include both diploid and tetraploid genotypes under identical conditions [80]

Homeolog-Specific Expression Quantification

Bioinformatic Pipeline for Homeolog Resolution

Map RNA-Seq reads to a combined reference genome containing both subgenomes
Use alignment tools with high sensitivity (STAR, HISAT2)
Employ homeolog-specific read assignment tools (e.g., PolyCat, HomeoRoq)
Quantify expression using count-based methods (featureCounts)
Normalize using TMM or similar methods to account for compositional biases [78]

Statistical Analysis of Expression Bias

Test for significant homeolog expression bias using binomial tests
Assess expression level dominance through comparison with parental expression
Correct for multiple testing using FDR control (Benjamini-Hochberg)
Perform cluster analysis to identify co-regulated gene groups [79] [78]
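The binomial test and Benjamini-Hochberg steps listed above can be sketched in pure Python as follows. The sketch assumes that homeolog-specific read counts are already available per gene and that the null hypothesis is equal contribution from both homeologs (p = 0.5). The gene names and counts are hypothetical, and replicate-aware models such as the beta-binomial GLM mentioned in this guide would be preferable for real data.

```python
from math import comb

def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k successes in n trials, null p = 0.5."""
    if n == 0:
        return 1.0
    m = min(k, n - k)
    tail = sum(comb(n, i) for i in range(m + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)  # the null distribution is symmetric at p = 0.5

def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):          # walk from largest to smallest p-value
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def call_homeolog_bias(read_counts, alpha=0.05):
    """read_counts maps gene -> (A-homeolog reads, D-homeolog reads)."""
    genes = list(read_counts)
    pvals = [binomial_two_sided_p(a, a + d) for a, d in (read_counts[g] for g in genes)]
    results = {}
    for gene, q in zip(genes, benjamini_hochberg(pvals)):
        a, d = read_counts[gene]
        if q >= alpha:
            call = "unbiased"
        else:
            call = "A-biased" if a > d else "D-biased"
        results[gene] = (call, q)
    return results

# Hypothetical homeolog-specific read counts for three NBS genes
counts = {"NBS1": (90, 10), "NBS2": (48, 52), "NBS3": (5, 60)}
calls = call_homeolog_bias(counts)
for gene, (call, q) in calls.items():
    print(gene, call, f"{q:.3g}")
```

The exact tail sum is adequate for illustration but becomes slow at very high read depth, where a library implementation of the binomial test is the practical choice.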

In allopolyploid cotton, RNA-Seq analysis revealed that genome-wide expression level dominance was biased toward the A-genome in diploid hybrids and natural allopolyploids, while the direction reversed in synthetic allopolyploids, highlighting the dynamic nature of transcriptomic regulation following polyploidization [78].

Data Analysis and Visualization Framework

Quantitative Profiling of NBS Gene Expression

Analysis of NBS gene expression in polyploids requires specialized approaches to resolve homeolog-specific contributions. The following table summarizes key metrics and methods for quantifying expression patterns:

Table 2: Analytical Framework for NBS Gene Expression in Polyploids

Analysis Type	Key Metrics	Tools/Methods	Interpretation
Homeolog Expression Bias	Bias ratio, Statistical significance	Binomial test, Beta-binomial GLM	Direction and magnitude of homeolog preference
Expression Level Dominance	Dominance direction, Magnitude	ANOVA, Linear contrasts	Coordinated regulation across homeologs
Differential Expression	Fold-change, FDR	DESeq2, edgeR	Stress-responsive gene identification
Co-expression Networks	Module eigengenes, Connectivity	WGCNA, mutual rank	Regulatory relationships and hubs
Variant Analysis	SNP/InDel frequency, Impact	GATK, SnpEff	Structural and regulatory variation

Research in tung trees demonstrated the power of comparative analysis, revealing that the orthologous gene pair Vf11G0978-Vm019719 exhibited distinct expression patterns in susceptible (V. fordii) and resistant (V. montana) species, with the resistant ortholog showing upregulated expression during pathogen challenge [38].

Visualization Strategies for Transcriptomic Data

Effective visualization is essential for interpreting complex transcriptomic data. The workflow for analyzing NBS gene expression in polyploids is summarized in Diagram 1:

Diagram 1: Experimental workflow for polyploid NBS gene expression analysis.

Expression patterns across multiple samples and conditions are commonly displayed as a heatmap, as outlined in Diagram 2:

Diagram 2: Expression heatmap visualization concept for NBS genes.

Functional Validation of NBS Genes in Polyploid Systems

Virus-Induced Gene Silencing (VIGS) Protocols

VIGS provides a powerful approach for functional validation of NBS genes in polyploid plants. The methodology below has been successfully applied in resistant tung trees (Vernicia montana) to confirm the role of specific NBS genes in disease resistance:

VIGS Vector Construction

Clone 200-300 bp gene-specific fragment into TRV-based vector (pTRV1/pTRV2)
Select fragment with minimal off-target potential (BLAST against host genome)
Include empty vector and non-related gene inserts as controls
Verify construct by sequencing and restriction digestion [38]

Plant Inoculation and Phenotyping

Grow plants to 2-4 leaf stage under controlled conditions
Inoculate using Agrobacterium-mediated infiltration (OD600 = 0.5-1.0)
Monitor silencing efficiency by qRT-PCR 2-3 weeks post-inoculation
Challenge with pathogen and assess disease symptoms
Measure pathogen biomass by qPCR of pathogen-specific genes [38]
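Silencing efficiency from the qRT-PCR step is commonly expressed with the comparative Ct (2^-ΔΔCt) calculation. The sketch below assumes near-100% amplification efficiency for both the target and a reference gene; the Ct values and function names are hypothetical, and the source does not state which quantification method was used.

```python
def relative_expression(ct_target_silenced: float, ct_ref_silenced: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Relative target expression in silenced vs. control plants (2^-ddCt)."""
    delta_silenced = ct_target_silenced - ct_ref_silenced   # normalize to reference gene
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_silenced - delta_control)

def silencing_efficiency(rel_expr: float) -> float:
    """Percent reduction in transcript level relative to the empty-vector control."""
    return (1.0 - rel_expr) * 100.0

# Hypothetical mean Ct values: silenced plants vs. empty-vector controls
rel = relative_expression(26.0, 18.0, 24.0, 18.0)
print(rel, silencing_efficiency(rel))
```

The same calculation applies to pathogen biomass if a pathogen-specific amplicon is used as the target and a host gene as the reference.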

Application of this protocol in Vernicia montana demonstrated that Vm019719, a NBS-LRR gene activated by VmWRKY64, confers resistance to Fusarium wilt. Silencing of this gene compromised resistance, validating its functional role in disease defense [38].

Analysis of Regulatory Variants

Identification of regulatory variants affecting NBS gene expression represents a critical component of functional studies:

Promoter Analysis

Isolate promoter regions (1-2 kb upstream) of target NBS genes
Identify cis-regulatory elements using PLACE and PlantCARE databases
Compare sequence variation between diploid progenitors and homeologs
Test for presence/absence of specific regulatory motifs [38]

In susceptible Vernicia fordii, the orthologous counterpart (Vf11G0978) of the resistance gene Vm019719 exhibited an ineffective defense response due to a deletion in the promoter's W-box element, highlighting how regulatory variants can underlie expression differences and functional divergence [38].
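A presence/absence test for a regulatory motif can be sketched as a scan of both strands of the promoter. The example uses the W-box core consensus TTGAC(C/T), the binding site recognized by WRKY transcription factors. The promoter strings are short hypothetical sequences for illustration only, not the actual Vm019719 or Vf11G0978 promoters.

```python
import re

# Lookahead so that overlapping matches are also reported
WBOX = re.compile(r"(?=(TTGAC[CT]))")
MOTIF_LEN = 6
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_wbox(promoter: str):
    """Return sorted (start, strand) tuples for W-box cores; start is 0-based on the + strand."""
    seq = promoter.upper()
    hits = [(m.start(), "+") for m in WBOX.finditer(seq)]
    rc = reverse_complement(seq)
    # Convert minus-strand coordinates back to plus-strand start positions
    hits += [(len(seq) - m.start() - MOTIF_LEN, "-") for m in WBOX.finditer(rc)]
    return sorted(hits)

# Hypothetical promoter fragments: one with an intact W-box, one with the core deleted
resistant_like = "AAATTGACCAAA"
susceptible_like = "AAATTCAAA"
print(find_wbox(resistant_like))
print(find_wbox(susceptible_like))
```

In practice the scan would be run over the 1-2 kb upstream regions of each ortholog or homeolog and the hit lists compared position by position.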

Case Studies and Applications

NBS Gene Expansion and Expression in Polyploid Plants

Research in cotton provides compelling evidence for the dynamic evolution of NBS genes following polyploidization. A comprehensive analysis across land plants identified 12,820 NBS-domain-containing genes across 34 species, with several classical and species-specific structural patterns [2]. In tetraploid cottons, NBS genes exhibit complex expression patterns including homeolog expression bias and expression level dominance, contributing to novel resistance phenotypes.

Studies of Gossypium hirsutum accessions with varying susceptibility to cotton leaf curl disease (CLCuD) revealed substantial genetic variation in NBS genes, with the tolerant accession (Mac7) containing 6583 unique variants compared to 5173 in the susceptible variety (Coker312) [2]. This variation provides the raw material for evolutionary innovation in pathogen recognition and defense signaling.

Transcriptomic Asymmetry in Relation to Stress Adaptation

Recent investigations in citrus demonstrate how polyploidization enhances stress tolerance through transcriptomic reprogramming. Tetraploid citrus genotypes exhibit enhanced salt stress tolerance associated with upregulation of genes involved in sugar biosynthesis, transport management, cell wall remodeling, hormone signaling, enzyme regulation, and antioxidant metabolism [80]. Notably, salt stress induced overexpression of carbohydrate biosynthesis and cell wall remodeling-related genes specifically in tetraploid Cleopatra mandarin (CL4x), suggesting ploidy-specific transcriptional responses [80].

Similarly, in wucai (Brassica campestris L.), tetraploid plants exhibited enhanced photosynthetic capacity, with 36.76%, 34.48%, and 32.99% more chlorophyll a, chlorophyll b, and total chlorophyll than diploid plants, respectively [81]. These physiological advantages were underpinned by transcriptomic changes, with differentially expressed genes in tetraploids specifically enriched in starch and sucrose metabolism, pentose and glucuronate interconversions, and ascorbate and aldarate metabolism [81].

Research Reagent Solutions

Table 3: Essential Research Reagents for Polyploid NBS Gene Studies

Reagent/Tool	Specification	Application	Example Use
HMMER Software	Version 3.3.2	Domain-based identification of NBS genes	Identifying NB-ARC domains (PF00931) in proteomes [38] [2]
TRV VIGS Vectors	pTRV1, pTRV2	Functional validation of NBS genes	Silencing Vm019719 in Vernicia montana [38]
RNA-Seq Kit	Illumina TruSeq Stranded mRNA	Transcriptome library preparation	Profiling diploid and tetraploid citrus under salt stress [80]
OrthoFinder	Version 2.5.1	Evolutionary analysis of NBS genes	Orthogroup analysis across 34 plant species [2]
DESeq2	Version 1.40+	Differential expression analysis	Identifying salt-responsive genes in citrus polyploids [80]
Circos	Version 0.69+	Genomic data visualization	Visualizing NBS gene distribution across chromosomes [56]

The study of transcriptomic asymmetry and homeolog expression bias in polyploid plants provides fundamental insights into the evolutionary dynamics of duplicated genomes, with particular relevance for understanding the expansion and functional diversification of NBS disease resistance genes. The methodologies outlined in this technical guide—from genome-wide identification of NBS genes to functional validation using VIGS—provide a comprehensive framework for investigating these complex transcriptional patterns.

Future research directions should include single-cell transcriptomics to resolve homeolog expression at cellular resolution, spatial transcriptomics to understand tissue-specific bias, and integrated multi-omics approaches to connect transcriptional asymmetry with epigenetic regulation, protein abundance, and metabolic outputs. As these technologies advance, our understanding of how polyploid plants leverage transcriptomic asymmetry to enhance adaptive potential, particularly through the expansion and diversification of NBS gene families, will continue to deepen, providing novel strategies for crop improvement through manipulation of ploidy and gene expression networks.

Overcoming Limitations of Short-Read Sequencing for Complex NBS Gene Clusters

Nucleotide-binding site (NBS) genes constitute one of the largest families of plant disease resistance (R) genes, playing a crucial role in immune responses against pathogens [2]. These genes are characterized by a conserved NBS domain and are frequently organized in tandemly arrayed clusters across plant genomes, creating regions of high sequence homology that present significant challenges for genomic characterization [10] [2]. The evolutionary history of NBS-encoding genes reveals dynamic patterns of expansion and contraction, often through gene duplication and loss events, which contribute to their complex architecture [82]. In plant species, the distribution of NBS-encoding genes among chromosomes is nonrandom and uneven, with a strong tendency to form clusters [10]. This complex genomic architecture poses substantial obstacles for short-read sequencing technologies, which struggle to accurately resolve highly homologous regions due to mapping ambiguities [83].

The limitations of short-read sequencing become particularly problematic in the context of diploid versus tetraploid plant research, where distinguishing between homeologous loci in polyploid genomes adds another layer of complexity. Studies in cotton species have revealed that allotetraploid plants inherited NBS-encoding genes asymmetrically from their diploid progenitors, with G. hirsutum inheriting more genes from G. arboreum (A-genome) and G. barbadense inheriting more from G. raimondii (D-genome) [10]. This asymmetric evolution may explain differential disease resistance to pathogens like Verticillium wilt, highlighting the importance of accurately characterizing these gene families [10].

Technical Limitations of Short-Read Sequencing Platforms

Mapping and Assembly Challenges in Homologous Regions

Short-read sequencing technologies (e.g., Illumina) face fundamental limitations when applied to complex NBS gene clusters due to their limited read lengths relative to the size of repetitive regions and highly homologous sequences [83] [84]. When short reads are generated from homologous gene clusters, they often cannot be uniquely mapped to a reference genome, leading to mismapping, coverage gaps, and false variant calls [83]. This problem is exacerbated in polyploid genomes where homeologous genes further complicate accurate read assignment.

Research has demonstrated that homologous genomic regions significantly affect short-read mapping of genes, with the degree and length of homology being key factors impacting mapping success [83]. A study simulating 50 genomes from diverse populations identified widespread homology, with 525 matches of exonic regions to other genomic areas when applying stringent filters [83]. The study further identified 17 genes as particularly problematic for short-read mapping, with four genes (SMN1, SMN2, CBS, and CORO1A) exhibiting low-coverage regions within exons across all read lengths tested due to their high degree of similarity to other genomic regions [83]. These four are human loci rather than plant NBS genes, but the underlying problem, reads that match more than one near-identical copy, is the same one posed by tandemly duplicated NBS clusters.

Consequences for Gene Family Analysis

The technical limitations of short-read sequencing have direct implications for characterizing NBS gene family expansions:

Collapsed representations: Tandemly duplicated genes are often underrepresented or collapsed into consensus sequences in assemblies [84].
Incomplete gene models: Highly homologous regions result in fragmented gene models or missing genes [84].
Biased copy number estimation: Gene copy number variations (CNVs) are systematically underestimated due to mapping ambiguities [84].
Impaired phylogenetic inference: Inaccurate sequence assemblies lead to incorrect evolutionary relationships [2] [82].

Table 1: Impact of Read Length on Mapping Accuracy and Coverage in Homologous Gene Regions

Read Length (bp)	Correctly Mapped Reads (%)	Average Depth of Coverage	Standard Deviation of Coverage	Genes with Low Depth Regions (<20X)
75	>99%	Lower	Higher	43
100	>99%	Moderate	Moderate	43
150	>99%	Higher	Lower	43
250	>99%	Highest	Lowest	8

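Data adapted from the read-length simulation study in [83], which examined human genes; the values illustrate the general effect of read length in homologous regions rather than measurements on plant NBS genes. While all read lengths achieved >99% correctly mapped reads, longer reads improved coverage consistency and reduced the number of genes with low-depth regions.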
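The effect of read length on mapping ambiguity can be illustrated with a toy simulation. The sketch below builds a random 300 bp "gene" and a paralog that differs at two positions, then measures the fraction of error-free reads from the gene that also occur verbatim in the paralog and so cannot be assigned uniquely. The sequence, the two variant positions, and the read lengths are all arbitrary assumptions chosen for illustration.

```python
import random

def mutate(seq: str, positions) -> str:
    """Substitute the base at each position with the next base in ACGT order."""
    bases = "ACGT"
    out = list(seq)
    for p in positions:
        out[p] = bases[(bases.index(out[p]) + 1) % 4]
    return "".join(out)

def ambiguous_fraction(gene: str, paralog: str, read_len: int) -> float:
    """Fraction of gene-derived reads that match the paralog exactly."""
    paralog_reads = {paralog[i:i + read_len]
                     for i in range(len(paralog) - read_len + 1)}
    reads = [gene[i:i + read_len] for i in range(len(gene) - read_len + 1)]
    if not reads:
        return 0.0
    return sum(r in paralog_reads for r in reads) / len(reads)

rng = random.Random(0)                       # fixed seed for reproducibility
gene = "".join(rng.choice("ACGT") for _ in range(300))
paralog = mutate(gene, [100, 200])           # two diagnostic differences

for read_len in (20, 100, 150):
    print(read_len, round(ambiguous_fraction(gene, paralog, read_len), 3))
```

A read is only informative if it spans at least one position that distinguishes the copies, which is why ambiguity falls as reads lengthen and why recently duplicated copies with very few differences remain difficult even at 250 bp.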

Comparative genomic studies in Sapindaceae species revealed dramatic variation in NBS-encoding gene counts (X. sorbifolium: 180, A. yangbiense: 252, D. longan: 568), which could only be accurately resolved using approaches capable of distinguishing highly similar gene copies [82]. Similarly, research in Ipomoea species identified between 554-889 NBS-encoding genes across four species, with 76-90% of these genes occurring in clusters [61]. Such complex genomic arrangements are particularly challenging for short-read technologies.
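The proportion of genes occurring in clusters depends on how a cluster is defined. A minimal sketch, assuming a simple distance rule in which neighboring NBS genes on the same chromosome are joined when separated by no more than 200 kb: that threshold, the function names, and the coordinates are illustrative assumptions, since the clustering criterion of the cited studies is not given here.

```python
from collections import defaultdict

def cluster_nbs_genes(genes, max_gap=200_000):
    """Group genes given as (gene_id, chrom, start, end); return clusters of >= 2 genes."""
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, gene_id))
    clusters = []
    for chrom in sorted(by_chrom):
        current, current_end = [], None
        for start, end, gene_id in sorted(by_chrom[chrom]):
            if current and start - current_end <= max_gap:
                current.append(gene_id)
                current_end = max(current_end, end)   # handles nested or overlapping genes
            else:
                if len(current) >= 2:
                    clusters.append(current)
                current, current_end = [gene_id], end
        if len(current) >= 2:
            clusters.append(current)
    return clusters

def clustered_fraction(genes, max_gap=200_000):
    """Fraction of all genes that belong to a cluster."""
    if not genes:
        return 0.0
    clustered = sum(len(c) for c in cluster_nbs_genes(genes, max_gap))
    return clustered / len(genes)

# Hypothetical gene coordinates
genes = [
    ("g1", "chr1", 1_000, 5_000),
    ("g2", "chr1", 50_000, 54_000),
    ("g3", "chr1", 900_000, 905_000),
    ("g4", "chr2", 100, 4_000),
]
print(cluster_nbs_genes(genes))
print(clustered_fraction(genes))
```

Because collapsed assemblies merge tandem copies, the clustered fraction computed from a short-read assembly is likely to understate the true value.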

Advanced Methodologies for Resolving Complex NBS Clusters

Hybrid Assembly Approaches

The Alpaca pipeline (ALLPATHS and Celera Assembler) represents a sophisticated hybrid approach that combines 20X long-read coverage with approximately 50X short-insert and 50X long-insert short-read coverage [84]. This method leverages the complementary strengths of different sequencing technologies: long reads provide scaffold information spanning repetitive regions, while short reads contribute high base-level accuracy. The Alpaca workflow involves several key steps:

Long-read correction: Base-call-corrected long reads are used for contig formation to preclude collapse of tandem repeats [84].
Contig formation: Celera Assembler generates unitigs from Illumina short-insert paired ends [84].
Mapping and correction: Unitigs are mapped to raw long reads with Nucmer, and long read base calls are corrected with ECTools [84].
Consensus polishing: Highly accurate short reads are used to polish assembly consensus sequences [84].

In comparative assessments on the rice genome, Alpaca demonstrated superior performance compared to other assembly protocols, showing the most reference agreement and repeat capture [84]. When evaluated against the rice Nipponbare reference, Alpaca generated contigs with NG50 of 67 Kbp and scaffolds with NG50 of 255 Kbp, outperforming ALLPATHS-LG (21 Kbp and 192 Kbp, respectively) [84]. Most importantly, Alpaca provided 88% reference coverage at 99% identity, compared to 82% for ALLPATHS-LG, and reduced the alignment span excess (indicative of collapsed repeats) from 46 Kbp to 35 Kbp [84].
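NG50, the contiguity metric quoted above, is the length of the shortest contig in the set of longest contigs that together cover at least half of the genome size, as opposed to half of the assembly size used for N50. A small sketch with made-up contig lengths:

```python
def ng50(lengths, genome_size: int) -> int:
    """Contig length at which cumulative length first reaches half the genome size."""
    total = 0
    for length in sorted(lengths, reverse=True):
        total += length
        if total >= genome_size / 2:
            return length
    return 0  # the assembly covers less than half of the genome

def n50(lengths) -> int:
    """Same statistic, but relative to the total assembly length."""
    return ng50(lengths, sum(lengths))

# Hypothetical contig lengths (kbp) for a genome of 400 kbp
contigs = [80, 70, 50, 40, 30, 20]
print(ng50(contigs, 400), n50(contigs))
```

Because NG50 is anchored to the genome size, it allows fair comparison between assemblies of different total length, which is why it suits the Alpaca versus ALLPATHS-LG comparison.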

Figure 1: Hybrid sequencing workflow overcoming short-read limitations in complex NBS gene clusters.

Targeted Enrichment and Sequencing Strategies

For projects focusing specifically on NBS gene families, targeted sequencing approaches offer a cost-effective alternative to whole-genome sequencing. The BabyDetect study implemented a targeted gene panel sequencing workflow that incorporated strict quality control thresholds for sequencing, coverage, and contamination [85]. Key aspects of their approach included:

Panel design: A custom target panel covering 359-405 genes for 126-165 diseases was designed using Twist Bioscience technology [85].
Capture optimization: The panel design focused on coding regions and intron-exon boundaries (~50 base pairs from intronic borders), excluding deep intronic regions, promoters, UTRs, and homopolymeric regions to improve capture efficiency [85].
Library preparation: High-performing probes were selected for target enrichment, with approximately 1.5 Mb targeted for capture and sequencing [85].

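This targeted approach demonstrated that gene panel sequencing is feasible, accurate, and scalable for population screening [85]. BabyDetect is a human newborn screening program, in which "NBS" abbreviates newborn screening rather than nucleotide-binding site; its relevance here lies in the capture design and quality-control principles, which could in principle be adapted to panels targeting plant NBS gene families.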

Bioinformatic Improvements for Variant Calling

Specialized bioinformatic pipelines can partially mitigate limitations of short-read data for NBS gene analysis. The Humanomics pipeline (v3.15) utilizes multiple algorithms optimized for different aspects of variant detection [85]:

BWA-MEM for read mapping to a reference genome
elPrep for read filtering and duplicate removal
HaplotypeCaller for variant detection
GenotypeGVCFs for producing variant-calling format files

This pipeline specifically targets single-nucleotide polymorphisms (SNPs) and short insertions and deletions (indels) within exons or at intron-exon boundaries [85]. Such pipelines typically do not call copy-number variants (CNVs), large deletions, mosaicism, or other structural variants without additional validation [85].

Table 2: Experimental Protocols for Characterizing Complex NBS Gene Clusters

Method	Key Steps	Applications	Limitations
Hybrid Assembly (Alpaca)	1. 20X PacBio long-read coverage; 2. 50X short-insert & 50X long-insert Illumina reads; 3. Long-read correction with short reads; 4. Contig formation with Celera Assembler; 5. Scaffolding with ALLPATHS-LG [84]	De novo genome assembly; CNV detection in tandem arrays; Population structural variation studies	Higher computational requirements; More expensive than short-read only; Optimized for 20X long-read coverage
Targeted Enrichment Sequencing	1. Custom panel design (1.5 Mb target); 2. Probe-based capture (Twist Bioscience); 3. Illumina sequencing (2×75 bp or 2×100 bp); 4. Variant calling with specialized pipeline [85]	High-depth sequencing of specific gene families; Population screening; Clinical diagnostics	Limited to targeted regions; Design challenges for novel genes; Capture efficiency variability
Comparative Phylogenomics	1. HMM-based gene identification (Pfam NB-ARC domain); 2. OrthoFinder for orthogroup analysis; 3. Maximum likelihood phylogenetics; 4. Synteny analysis [2]	Evolutionary studies; Diversification patterns; Selection pressure analysis	Dependent on genome assembly quality; Computationally intensive for large families

Research Reagent Solutions for NBS Gene Studies

Table 3: Essential Research Reagents and Platforms for NBS Gene Analysis

Reagent/Platform	Specific Application	Function in Experimental Workflow
PacBio Long-Read Sequencing	Genome assembly spanning repetitive regions	Provides long reads (10-kb+) to connect tandem gene clusters and resolve haplotypes [84]
Illumina Short-Read Sequencing	High-accuracy base calling	Corrects long-read errors; provides high-confidence variant calls in unique regions [84]
Twist Bioscience Target Enrichment	Focused NBS gene capture	Enables deep sequencing of specific gene families; reduces costs compared to WGS [85]
QIAsymphony SP/ DNA Investigator Kit	Automated DNA extraction from dried spots	Standardizes nucleic acid isolation from precious samples (e.g., dried blood spots) [85]
BWA-MEM Aligner	Short-read mapping to reference	Aligns sequencing reads to reference genomes; handles small indels [85]
GATK HaplotypeCaller	Variant discovery	Identifies SNPs and indels using local de novo assembly [86]
OrthoFinder	Evolutionary analysis	Determines orthogroups and gene families across multiple species [2]

Evolutionary Insights from Resolved NBS Gene Clusters

Application of these advanced methodologies has revealed previously inaccessible patterns of NBS gene evolution in diploid and tetraploid plants. Genomic analyses in Gossypium species demonstrated that allotetraploid cotton inherited NBS-encoding genes asymmetrically from its diploid progenitors, with G. hirsutum inheriting more genes from G. arboreum (A-genome) while G. barbadense inherited more from G. raimondii (D-genome) [10]. This asymmetric evolution may explain differential disease resistance, as G. raimondii and G. barbadense show greater resistance to Verticillium wilt compared to G. arboreum and G. hirsutum [10].

Furthermore, studies have revealed that structural architectures, amino acid sequence similarities, and synteny of NBS-encoding genes were highest between G. arboreum and G. hirsutum, and between G. raimondii and G. barbadense, indicating distinct evolutionary trajectories following polyploidization [10]. The TNL subclass of NBS genes appears to have a significant role in disease resistance to Verticillium wilt in G. raimondii and G. barbadense, with the percentage of TNL genes being approximately 7 times higher in these species compared to their susceptible counterparts [10].

Figure 2: Evolutionary patterns of NBS genes in diploid and tetraploid cotton species, showing asymmetric inheritance contributing to differential disease resistance.

Comprehensive analyses across land plants have identified 12,820 NBS-domain-containing genes across 34 species, classified into 168 distinct classes with several novel domain architecture patterns [2]. Orthogroup analysis revealed 603 orthogroups with some core (e.g., OG0, OG1, OG2) and unique (e.g., OG80, OG82) orthogroups showing evidence of tandem duplications [2]. Expression profiling demonstrated putative upregulation of OG2, OG6, and OG15 orthogroups in different tissues under various biotic and abiotic stresses in cotton accessions with contrasting susceptibility to cotton leaf curl disease [2].

The limitations of short-read sequencing for complex NBS gene clusters are no longer insurmountable barriers to research. Hybrid approaches combining long-read and short-read technologies, complemented by advanced bioinformatic pipelines, now enable comprehensive characterization of these challenging genomic regions. The resulting insights have profound implications for understanding plant genome evolution, particularly the differential expansion of NBS gene families in diploid versus tetraploid plants and their contributions to disease resistance phenotypes.

Future developments in sequencing technologies, particularly improvements in long-read accuracy and read length, coupled with reduced costs, will further enhance our ability to resolve complex genomic regions. Additionally, emerging algorithms specifically designed for complex gene families and pangenome approaches will provide more comprehensive views of NBS gene diversity across plant populations. These advances will accelerate crop improvement programs by enabling precise manipulation of disease resistance genes and informed selection of optimal gene combinations for durable pathogen resistance.

For researchers investigating NBS gene expansion in diploid versus tetraploid plants, a strategic combination of hybrid sequencing for reference-quality assemblies followed by targeted sequencing for population-level studies represents the current gold-standard approach. This methodology successfully addresses the fundamental limitations of short-read sequencing while providing the comprehensive data needed to unravel the evolutionary dynamics of these critical plant immune genes.

Evidence and Evolution: Validating NBS Expansion Patterns Across Plant Lineages

The nucleotide-binding site-leucine-rich repeat (NBS-LRR) gene family constitutes one of the largest classes of plant disease resistance (R) genes, exhibiting remarkable diversity in size across plant lineages. This case study examines the extreme expansion of NBS genes in diploid apple (Malus domestica) compared to the limited numbers in cucurbit species, framing this divergence within broader patterns of R-gene evolution in diploid versus tetraploid plants. Through comparative genomics, phylogenetic analysis, and evaluation of evolutionary pressures, we elucidate the lineage-specific duplications and contrasting evolutionary trajectories that shape plant immune system architecture. Our analysis reveals that diploid Rosaceae species, particularly apple, have undergone significant NBS-LRR expansion through recent, lineage-specific duplications, while cucurbit genomes display extensive gene loss and limited diversification. These patterns provide crucial insights for researchers leveraging genomic approaches to enhance disease resistance in crop species.

Plant resistance (R) genes encoding nucleotide-binding site-leucine-rich repeat (NBS-LRR) proteins play a critical role in the innate immune system, mediating specific recognition of pathogen effectors and activation of defense responses [87]. The NBS-LRR gene family is divided into subclasses based on N-terminal domains: TIR-NBS-LRR (TNL) with Toll/interleukin-1 receptor domains, CC-NBS-LRR (CNL) with coiled-coil domains, and RPW8-NBS-LRR (RNL) with resistance to powdery mildew 8 domains [14]. All three subclasses are present in dicots, while monocots typically lack TNL genes [48].

NBS-LRR genes are evolving rapidly in plants, with significant variation in family size across species [88]. This diversity arises from dynamic processes of gene duplication and loss, driven by co-evolutionary arms races with pathogens [2]. Recent genome sequencing initiatives have enabled comparative analyses revealing striking disparities in NBS-LRR content between plant families. The Rosaceae, particularly diploid apple, exhibits extreme gene expansion, while cucurbit species show notably contracted NBS-LRR repertoires [87].

Understanding these divergent evolutionary patterns provides fundamental insights into plant-pathogen coevolution and informs strategies for engineering durable disease resistance in crops. This case study examines the genomic and evolutionary basis for NBS-LRR expansion in diploid apple versus contraction in cucurbits, with implications for R-gene discovery and breeding across plant lineages.

Comparative Analysis of NBS-LRR Gene Repertoires

Extreme Expansion in Diploid Apple

The diploid apple genome (Malus x domestica) harbors an extensive complement of NBS-LRR genes, and Maleae species are reported to follow an "early sharp expanding to abrupt shrinking" pattern [87]. Genome-wide analyses of Rosaceae species reveal dynamic evolution of NBS-LRR genes, with apple exhibiting one of the largest repertoires among documented species.

Table 1: NBS-LRR Gene Counts in Rosaceae Species

Species	Genome Type	Total NBS-LRR Genes	CNL Genes	TNL Genes	RNL Genes
Malus x domestica (Apple)	Diploid	~500-600 [88]	Not specified	Not specified	Not specified
Fragaria vesca (Strawberry)	Diploid	144 [47]	Not specified	Not specified	Not specified
Prunus persica (Peach)	Diploid	~150 [87]	Not specified	Not specified	Not specified
Rosa chinensis	Diploid	"Continuous expansion" pattern [87]	Not specified	Not specified	Not specified

The ANNA (Angiosperm NLR Atlas) database documents 91,291 NBS-LRR genes across 304 angiosperm genomes, comprising 70,737 CNL genes, 18,707 TNL genes, and 1,847 RNL genes [2]. These subclass counts are database-wide totals (they sum to 91,291) rather than apple-specific figures; within this survey, apple has one of the largest repertoires among the species covered. This expansion reflects frequent lineage-specific duplication events preceding species diversification within Rosaceae.

Contracted NBS-LRR Repertoires in Cucurbits

In contrast to apple, cucurbit species (cucumber, melon, and watermelon) exhibit significantly contracted NBS-LRR gene families. Frequent lineage-specific gene losses and deficient gene duplications dominate NBS-LRR evolution in Cucurbitaceae, resulting in low copy numbers [87].

Table 2: NBS-LRR Gene Counts in Cucurbit Species

Species	Genome Type	Total NBS-LRR Genes	Evolutionary Pattern
Cucumber	Diploid	~50-80	"Contracting" pattern [87]
Melon	Diploid	~50-80	"Contracting" pattern [87]
Watermelon	Diploid	~50-80	"Contracting" pattern [87]

The limited NBS-LRR diversity in cucurbits reflects a different evolutionary trajectory compared to Rosaceae, with gene loss outweighing duplication events. This contraction may influence host-pathogen interaction dynamics and disease resistance mechanisms in these species.

Evolutionary Patterns and Mechanisms

Divergent Evolutionary Trajectories

Comparative analyses across plant families reveal distinct evolutionary patterns for NBS-LRR genes:

Rosaceae Evolutionary Patterns:

Apple and other Maleae species: "Early sharp expanding to abrupt shrinking" pattern [87]
Rosa chinensis: "Continuous expansion" pattern [87]
Fragaria vesca: "Expansion followed by contraction, then further expansion" pattern [87]

Cucurbitaceae Evolutionary Pattern:

Cucumber, melon, watermelon: "Contracting" pattern dominated by frequent lineage losses and deficient gene duplications [87]

Other Plant Families:

Solanaceae: Variable patterns within family - "consistent expansion" in potato, "expansion followed by contraction" in tomato, "shrinking" in pepper [87]
Fabaceae: "Consistently expanding" pattern in Medicago truncatula, pigeon pea, common bean, and soybean [87]

Gene Duplication Mechanisms

NBS-LRR gene family expansion primarily occurs through duplication mechanisms:

Lineage-specific duplications occurring before species divergence significantly contribute to NBS-LRR expansion. In Fragaria species, phylogenetic analyses reveal extremely short branch lengths and shallow nodes, indicating recent duplication events [47]. Similar patterns are observed in apple, where numerous tandemly arranged NBS-LRR genes form complex clusters across the genome.

Selective Pressures and Functional Diversification

The evolution of NBS-LRR genes is driven by contrasting selective pressures:

Diversifying selection: Acts on LRR domains involved in pathogen recognition, promoting amino acid diversity to recognize evolving pathogen effectors [47]
Purifying selection: Maintains the conserved NBS domain structure required for nucleotide binding and hydrolysis [2]

Analyses of synonymous (Ks) and nonsynonymous (Ka) substitution rates reveal significantly higher Ka/Ks ratios for TNL genes compared to non-TNL genes in Fragaria, suggesting TNLs evolve more rapidly under stronger diversifying selection [47]. This differential evolution may contribute to subfamily-specific expansion patterns.
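
The interpretation of Ka/Ks ratios can be illustrated with a minimal sketch. The Ka and Ks values below are hypothetical and are not taken from [47], and the tolerance around 1 is an illustrative assumption. Whole-gene ratios are commonly below 1 even when individual sites are under diversifying selection, so comparisons between subfamilies are usually relative rather than absolute.

```python
def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks <= 0:
        return None
    return ka / ks


def selection_mode(ka, ks, tolerance=0.05):
    """Label a gene pair by its Ka/Ks ratio (tolerance is an assumed margin)."""
    omega = ka_ks_ratio(ka, ks)
    if omega is None:
        return "undefined"
    if omega > 1 + tolerance:
        return "diversifying"
    if omega < 1 - tolerance:
        return "purifying"
    return "neutral"


# Hypothetical (Ka, Ks) values for two paralogous gene pairs
pairs = {"TNL_pair": (0.30, 0.20), "CNL_pair": (0.05, 0.25)}
for name, (ka, ks) in pairs.items():
    print(name, round(ka_ks_ratio(ka, ks), 2), selection_mode(ka, ks))
```

In practice the Ka and Ks estimates would come from a dedicated tool such as MEGA, as described in the methods; the sketch only shows how the resulting ratio is read.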

Experimental Approaches for NBS-LRR Gene Identification

Genome-Wide Identification Pipeline

Detailed Methodologies

BLAST and HMMER Searches:

Use NB-ARC domain (PF00931) as query sequence for TBLASTN against whole-genome coding sequences with E-value ≤ 10⁻⁴ [47]
Perform HMMER search with NB-ARC HMM profile against whole-genome protein sequences using default parameters [87]
Merge hits from both approaches and eliminate redundancies
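
The merge step above can be sketched as follows. This is a minimal illustration, assuming BLAST tabular output (outfmt 6, E-value in the eleventh column) and an hmmsearch per-sequence table whose first column is the target name. It also assumes that coding-sequence and protein identifiers are shared between the two searches; the example records are hypothetical.

```python
def blast_hits(tabular_text, max_evalue=1e-4):
    """Subject IDs from BLAST tabular (outfmt 6) text that pass the E-value cutoff."""
    ids = set()
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        subject_id, evalue = fields[1], float(fields[10])
        if evalue <= max_evalue:
            ids.add(subject_id)
    return ids


def hmmer_hits(tblout_text):
    """Target names from an hmmsearch table (first whitespace-separated column)."""
    ids = set()
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.split()[0])
    return ids


def merge_candidates(blast_ids, hmmer_ids):
    """Union of both hit sets; a set removes redundant identifiers."""
    return sorted(blast_ids | hmmer_ids)


# Hypothetical example records
blast_example = (
    "NB-ARC\tgene1\t45.0\t250\t120\t5\t1\t250\t10\t760\t2e-30\t120\n"
    "NB-ARC\tgene2\t30.0\t80\t50\t2\t1\t80\t5\t245\t0.5\t30\n"
)
hmmer_example = (
    "# target name  accession  query name  accession  E-value  score  bias\n"
    "gene1 - NB-ARC PF00931 1e-40 140.2 0.1\n"
    "gene3 - NB-ARC PF00931 3e-25 90.1 0.0\n"
)
merged = merge_candidates(blast_hits(blast_example), hmmer_hits(hmmer_example))
print(merged)
```

Here gene2 is dropped because its E-value exceeds the 10⁻⁴ cutoff, and gene1 appears once even though both searches report it.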

Domain Validation and Classification:

Validate NB-ARC and LRR domains using Pfam (http://pfam.xfam.org/) and SMART (http://smart.embl-heidelberg.de/) [47]
Classify genes into TNL, CNL, or RNL subfamilies based on N-terminal domains (TIR, CC, or RPW8) using Pfam and COILS [47]
Confirm classification using conserved residues in kinase-2 motif (aspartate in TNLs, tryptophan in CNLs) [89]
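
The classification rules above can be expressed as a short sketch. The domain names are assumed to have been assigned upstream (for example by Pfam and COILS), the precedence order among N-terminal domains is an illustrative assumption, and the kinase-2 residue is assumed to have been extracted from the motif beforehand.

```python
def classify_nbs(domains):
    """Return an architecture label (e.g. TNL, CNL, RNL, TN, CN, NL, N) or None."""
    found = set(domains)
    if "NB-ARC" not in found:
        return None  # not an NBS-encoding gene
    if "TIR" in found:
        prefix = "T"
    elif "RPW8" in found:
        prefix = "R"
    elif "CC" in found:
        prefix = "C"
    else:
        prefix = ""
    suffix = "L" if "LRR" in found else ""
    return prefix + "N" + suffix


# Diagnostic kinase-2 residue: aspartate (D) in TNLs, tryptophan (W) in CNLs
KINASE2_RESIDUE = {"T": "D", "C": "W"}


def kinase2_agrees(label, residue):
    """True when the residue matches the subclass, or when no rule applies."""
    expected = KINASE2_RESIDUE.get(label[0])
    return expected is None or residue.upper() == expected


print(classify_nbs(["TIR", "NB-ARC", "LRR"]))  # TNL
print(classify_nbs(["CC", "NB-ARC"]))          # CN
print(kinase2_agrees("CNL", "D"))              # False: flags a conflict to review
```

A disagreement between the domain-based label and the kinase-2 residue does not settle the class on its own; it marks the gene for manual inspection.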

Evolutionary and Phylogenetic Analyses:

Identify gene families using all-versus-all BLASTN with coverage >60% and identity >60% [47]
Calculate nonsynonymous (Ka) and synonymous (Ks) substitution rates using MEGA software [47]
Construct phylogenetic trees with Maximum Likelihood method using FastTree or similar software [48]
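
The gene family step can be sketched as single-linkage clustering over BLAST pairs that pass the coverage and identity thresholds. Single linkage is an assumption here, since the text does not state how pairwise hits are assembled into families, and the gene names and values are hypothetical.

```python
def cluster_families(genes, pairs, min_identity=60.0, min_coverage=60.0):
    """Group genes linked by pairs with identity and coverage above the cutoffs."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the family representative, compressing the path
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b, identity, coverage in pairs:
        if a != b and identity > min_identity and coverage > min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    families = {}
    for g in genes:
        families.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in families.values())


# Hypothetical all-versus-all results: (gene, gene, % identity, % coverage)
genes = ["g1", "g2", "g3", "g4"]
pairs = [
    ("g1", "g2", 85.0, 90.0),
    ("g2", "g3", 72.0, 65.0),
    ("g3", "g4", 55.0, 80.0),  # fails the identity threshold
]
print(cluster_families(genes, pairs))  # [['g1', 'g2', 'g3'], ['g4']]
```

Single linkage can chain distantly related genes through intermediates, so stricter thresholds or a graph-clustering method such as MCL may be preferable for large families.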

The Scientist's Toolkit: Essential Research Reagents

Table 3: Key Research Reagents for NBS-LRR Gene Studies

Reagent/Resource	Function	Example Use
NB-ARC HMM Profile (PF00931)	Identification of NBS domains in protein sequences	Initial discovery of NBS-encoding genes [87]
Pfam Database	Protein family validation and domain architecture analysis	Confirming presence of NB-ARC, TIR, CC, RPW8, LRR domains [2]
COILS Software	Prediction of coiled-coil domains	Distinguishing CNL from other NBS-LRR subclasses [47]
MEME Suite	Motif-based sequence analysis	Identifying conserved motifs in NBS-LRR proteins [89]
OrthoFinder	Orthogroup inference and comparative genomics	Determining evolutionary relationships among NBS-LRR genes [2]
MEGA Software	Evolutionary genetics analysis	Calculating Ka/Ks ratios and phylogenetic reconstruction [47]
Plant Genomic DNA Kit	High-quality DNA extraction	Preparing templates for NBS-LRR gene amplification [89]

Implications for Crop Improvement and Disease Resistance

The divergent evolutionary patterns between apple and cucurbits have practical implications for crop improvement strategies. In apple, with its expanded NBS-LRR repertoire, resistance breeding can leverage naturally occurring R-gene diversity through marker-assisted selection and gene pyramiding [89]. The extensive duplication events have created a rich source of genetic variation for pathogen recognition specificities.

In cucurbits, with limited NBS-LRR diversity, alternative approaches may be necessary, including:

Interspecific hybridization to introduce R-genes from wild relatives
Engineered disease resistance using synthetic NBS-LRR genes
Focusing on non-NBS-LRR resistance mechanisms, such as receptor-like kinases or defense pathway engineering

Understanding these genomic differences helps researchers prioritize strategies based on the genetic architecture of target species. For species with expanded NBS-LRR families, mining natural diversity is often productive, while species with contracted families may benefit from transgenic approaches or manipulation of downstream signaling components.

This case study highlights the extreme divergence in NBS-LRR gene family evolution between diploid apple and cucurbit species. Apple exemplifies the "expansion" trajectory with lineage-specific duplications creating a large, diverse R-gene repertoire, while cucurbits demonstrate the "contraction" trajectory with limited diversity resulting from gene loss and deficient duplication. These patterns reflect distinct evolutionary responses to pathogen pressure and have profound implications for disease resistance mechanisms and breeding strategies.

Future research should focus on functional characterization of expanded NBS-LRR genes in apple to identify specificities against economically important pathogens, and development of innovative resistance strategies for cucurbits that compensate for their limited R-gene diversity. The continuing decline of sequencing costs and advancement of gene editing technologies will enable more comprehensive comparative studies and targeted manipulation of NBS-LRR genes across crop species.

The evolutionary history of the Brassicaceae family has been profoundly shaped by polyploidization events, which provide raw genetic material for diversification and adaptation. This technical review examines the post-polyploidization dynamics of Nucleotide-Binding Site (NBS)-encoding genes, the primary class of plant disease resistance (R) genes in Brassica species. Following the Brassiceae-lineage-specific whole-genome triplication (WGT) event approximately 15.9 million years ago, Brassica genomes underwent extensive diploidization through asymmetric gene loss, fractionation, and neofunctionalization. We synthesize current genomic evidence demonstrating how differential retention patterns, evolutionary rates, and functional diversification of NBS-encoding genes have contributed to pathogen resistance mechanisms in extant Brassica species. This analysis frames NBS gene evolution within the broader context of plant polyploid genomics, providing insights for crop improvement strategies in Brassica vegetables and oilseeds.

The Brassica genus represents a premier model system for studying the effects of polyploidy on genome evolution and gene family dynamics. Brassica species, including important vegetable and oilseed crops, have experienced recursive whole-genome duplication (WGD) events, with the most recent being a lineage-specific whole-genome triplication (WGT) that occurred after the divergence of the Arabidopsis and Brassica lineages [90] [91]. This WGT event was followed by a process of diploidization, involving massive but selective gene loss, genome rearrangement, and functional divergence of retained genes [91].

Among the various gene families affected by polyploidization, NBS-encoding genes represent a critical component of the plant innate immune system. These genes typically encode proteins containing a nucleotide-binding site (NBS) domain and often C-terminal leucine-rich repeats (LRRs), which function in pathogen recognition and defense signal transduction [10] [60]. Based on their N-terminal domains, NBS-encoding genes are classified into TNL (TIR-NBS-LRR), CNL (CC-NBS-LRR), and RNL (RPW8-NBS-LRR) subtypes [10] [2].

This review integrates findings from multiple genome-wide studies to elucidate the mechanisms governing NBS gene loss, retention, and neofunctionalization following polyploidization in Brassica, with implications for understanding disease resistance evolution in polyploid crops.

Genomic Context: Brassica Polyploid Evolution

Sequence of Polyploidization Events

Brassica species share two ancient paleopolyploidy events (α and β) with Arabidopsis, plus a more recent Brassiceae-lineage-specific WGT. The Brassica triplication event has been dated to approximately 15.9 million years ago (MYA), with subsequent speciation leading to diploid Brassica species (B. rapa, B. oleracea) around 4.6 MYA [91]. Genomic analysis reveals that the triplicated genomes experienced differential fractionation, leading to the establishment of three subgenomes with distinct gene retention patterns: LF (Least Fractionated), MF1 (Medium Fractionated), and MF2 (Most Fractionated) [91] [92].

Genomic Stability and Rearrangement

Comparative genomic analyses between B. rapa and B. oleracea reveal abundant genome rearrangement following WGT, resulting in complex mosaics of triplicated ancestral genomic blocks [91]. Despite these rearrangements, synteny analysis has demonstrated extensive collinearity between homologous genomic regions, enabling detailed studies of gene loss and retention patterns [92]. The extent of genome restructuring varies between Brassica species, with B. oleracea exhibiting greater transposable element accumulation compared to B. rapa [91].

NBS Gene Loss and Retention Patterns

Differential Retention Following Whole-Genome Triplication

Genome-wide analyses demonstrate that NBS-encoding genes were subject to substantial loss following the Brassica WGT. Using Arabidopsis thaliana as a reference, studies have examined the loss/retention of orthologous NBS-encoding loci in the tripled Brassica rapa genome, discovering differential loss/retention frequencies across syntenic regions [93].

Table 1: NBS-Encoding Gene Retention Patterns in Brassica Species

Species	Total NBS Genes	CNL	TNL	RNL	Other Types	Reference
B. oleracea	157	28.1%	2.0%	1.2%	68.7%	[60]
B. rapa	206	29.32%	13.70%	0.82%	56.16%	[60]
A. thaliana	167	Not specified	Not specified	Not specified	Not specified	[60]

The "Other Types" category includes NBS genes lacking complete domain structures (N, NL, TN, CN, etc.). The differential retention of TNL genes between B. oleracea and B. rapa is particularly noteworthy, suggesting species-specific evolutionary paths following their divergence from a common ancestor.

Evolutionary Patterns of Retained Loci

Research by Wu et al. (2014) classified retained NBS-encoding loci into three categories based on retention frequency: Class I (single locus retention), Class II (two retained loci), and Class III (three retained loci) [93]. These classes exhibit distinct evolutionary patterns:

Multi-loci classes (II and III): Show sharper expansions through tandem duplications, faster evolutionary rates, and greater potential for association with novel gene functions
Single-locus class (I): Demonstrates opposite patterns with limited expansion and slower evolutionary rates [93]

Phylogenetic analyses indicate that recombination and translocation events were common among multi-loci in B. rapa, contributing to their differential evolutionary patterns compared to single-loci [93].
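
The three retention classes can be illustrated with a minimal sketch. It assumes that each retained copy of a reference locus lies in a different subgenome (LF, MF1, or MF2), so that the number of subgenomes holding a syntenic copy gives the class; the locus names are hypothetical.

```python
from collections import Counter

CLASS_BY_COPIES = {1: "Class I", 2: "Class II", 3: "Class III"}


def retention_class(retained_subgenomes):
    """Map the subgenomes retaining a locus to Class I/II/III (None if all lost)."""
    return CLASS_BY_COPIES.get(len(set(retained_subgenomes)))


# Hypothetical A. thaliana reference loci and the subgenomes retaining a copy
retention = {
    "AtLocusA": ["LF"],
    "AtLocusB": ["LF", "MF1"],
    "AtLocusC": ["LF", "MF1", "MF2"],
    "AtLocusD": [],
}
classes = {locus: retention_class(subs) for locus, subs in retention.items()}
summary = Counter(c for c in classes.values() if c is not None)
print(classes)
print(dict(summary))
```

Tallying classes in this way gives the retention frequencies against which tandem expansion and evolutionary rate can then be compared.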

Asymmetric Evolution and Subgenome Bias

Following polyploidization, NBS-encoding genes exhibit asymmetric evolution between and within Brassica genomes. Comparative analysis of B. rapa and B. oleracea reveals differential gene loss and retention between subgenomes, with the LF subgenome generally retaining more genes compared to the MF1 and MF2 subgenomes [91]. This biased fractionation has implications for the genomic distribution of NBS-encoding genes and their associated functions.

Mechanisms of NBS Gene Diversification

Tandem Duplications

Following the initial post-polyploidization gene loss, NBS-encoding genes in Brassica species experienced species-specific gene amplification primarily through tandem duplication. This phenomenon has been particularly important for the expansion of specific NBS gene subfamilies after the divergence of B. rapa and B. oleracea [60]. The distribution of NBS-encoding genes among chromosomes is non-random and uneven, with genes frequently organized in clusters [10] [60].

Table 2: Duplication Patterns of NBS-Encoding Genes in Brassica Species

Species	Tandem Duplications	Segmental Duplications	Transposition Events	Key References
B. oleracea	Significant	Limited (post-WGT)	Evidence of TE-mediated	[60]
B. rapa	Significant	Limited (post-WGT)	Evidence of TE-mediated	[93] [60]
B. napus	Observed	Extensive from allopolyploidy	Not specified	[92]

Domain Architecture Variation

NBS-encoding genes in Brassica exhibit considerable diversity in their domain architectures. Beyond the canonical CNL, TNL, and RNL types, numerous variant forms have been identified, including:

TN (TIR-NBS)
CN (CC-NBS)
NL (NBS-LRR)
N (NBS-only) [10] [60]

This architectural diversity results from domain loss, fusion, and rearrangement events, potentially generating novel functions and specificities in the Brassica lineage following polyploidization.

Differential Expression and Alternative Splicing

Studies of NBS-encoding orthologous gene pairs between B. oleracea and B. rapa indicate differential expression patterns of retained copies [60]. Additionally, genome annotation of B. oleracea identified 13,032 genes producing alternative splicing variants, with intron retention and exon skipping as common mechanisms [91]. These regulatory mechanisms contribute to functional diversification of NBS-encoding genes following polyploidization.

Experimental Methodologies for NBS Gene Analysis

Genome-Wide Identification and Annotation

Protocol 1: Identification of NBS-Encoding Genes

Data Acquisition: Obtain genome assemblies and annotation files from relevant databases (BRAD, Bolbase, Phytozome) [60]
HMMER Search: Employ HMMER software (v3.0+) with Pfam NBS (NB-ARC) domain profile (PF00931) using "trusted cutoff" as threshold [10] [60]
Domain Confirmation: Identify additional domains (TIR, CC, RPW8, LRR) using HMMPfam, HMMSmart, and coiled-coil prediction tools (PAIRCOIL2, MARCOIL) with appropriate probability thresholds [60]
Manual Curation: Implement manual inspection and quality control to generate final set of NBS candidate genes [60]
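
The HMMER step of Protocol 1 might be invoked as sketched below. The file names are hypothetical placeholders; `--cut_tc` applies the trusted cutoffs stored in the Pfam profile and `--tblout` writes a per-sequence hit table. The sketch only builds and prints the command, so it runs without HMMER installed.

```python
import shlex


def hmmsearch_command(hmm_profile, protein_fasta, table_out):
    """Build an hmmsearch argument list that uses the profile's trusted cutoffs."""
    return [
        "hmmsearch",
        "--cut_tc",              # use the trusted cutoffs stored in the profile
        "--tblout", table_out,   # per-sequence hit table
        hmm_profile,
        protein_fasta,
    ]


# Hypothetical file names
cmd = hmmsearch_command("PF00931.hmm", "proteins.fa", "nbarc_hits.tbl")
print(shlex.join(cmd))
```

To execute the search, the list can be passed to `subprocess.run(cmd, check=True)` on a system where HMMER is available.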

Protocol 2: Evolutionary and Phylogenetic Analysis

Sequence Alignment: Perform multiple sequence alignment using CLUSTALW or MAFFT [60] [2]
Phylogenetic Reconstruction: Construct trees using maximum likelihood algorithms (FastTreeMP) with bootstrap validation [2]
Orthogroup Delineation: Identify orthogroups using OrthoFinder with DIAMOND for sequence similarity searches and MCL for clustering [2]
Synteny Analysis: Identify syntenic regions using MCscan with collinearity thresholds [10] [91]

Functional Validation Approaches

Protocol 3: Expression and Functional Analysis

Transcriptome Profiling: Analyze RNA-seq data from various tissues and stress conditions to determine expression patterns [60] [2]
qRT-PCR Validation: Design gene-specific primers and perform quantitative reverse-transcription PCR under pathogen challenge conditions [61]
VIGS Assays: Implement Virus-Induced Gene Silencing to test functional roles of candidate NBS genes in resistance [2]
Genetic Transformation: Conduct stable transformation for overexpression or knockout of target NBS genes

Figure 1: Experimental Workflow for NBS Gene Analysis. This diagram outlines the key methodological approaches for identifying, analyzing, and validating NBS-encoding genes in Brassica species.

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Research Reagents for NBS Gene Studies

Reagent/Resource	Function/Application	Example Sources/References
Genome Databases	Access to genomic sequences and annotations	BRAD, Bolbase, Phytozome, TAIR [60]
HMMER Software	Identification of NBS domains using profile hidden Markov models	HMMER v3.0+ with Pfam NBS domain (PF00931) [10] [60]
OrthoFinder	Orthogroup inference and comparative genomics	OrthoFinder v2.5.1 with DIAMOND and MCL [2]
RNA-seq Datasets	Expression analysis under various conditions	GEO accession numbers GSE43245, GSE42891 [60]
VIGS Vectors	Functional validation through gene silencing	TRV-based vectors for Brassica [2]
qRT-PCR Reagents	Expression validation of candidate NBS genes	SYBR Green, gene-specific primers [61]

Implications for Disease Resistance and Crop Improvement

Association with Disease Resistance Variation

The evolutionary dynamics of NBS-encoding genes following polyploidization have direct implications for disease resistance in Brassica crops. Comparative studies in cotton (Gossypium species) provide a parallel example, where asymmetric evolution of NBS-encoding genes helps explain differential resistance to Verticillium wilt [10]. Specifically, G. raimondii and G. barbadense possess higher proportions of TNL genes and demonstrate greater resistance compared to susceptible species with fewer TNL genes [10].

In Brassica, the retention and diversification of specific NBS gene classes have likely contributed to the evolution of pathogen recognition specificities. The concentration of NBS-encoding genes in clusters facilitates the generation of diversity through unequal crossing over and gene conversion, potentially enabling rapid adaptation to evolving pathogen populations [93] [92].

Breeding Applications

Understanding post-polyploidization NBS gene dynamics provides valuable insights for Brassica crop improvement:

Marker Development: Identification of evolutionary patterns informs selection of durable resistance alleles
Interspecific Hybridization: Knowledge of subgenome biases guides effective gene transfer between species
Pyramiding Strategies: Understanding genomic distribution aids in combining multiple R genes
Engineering Applications: Diversity mechanisms inspire synthetic biology approaches for novel resistance

The post-polyploidization dynamics of NBS-encoding genes in Brassica exemplify the complex interplay between genome duplication, gene family evolution, and functional specialization. The differential retention, asymmetric evolution, and diversification mechanisms documented in Brassica species highlight the importance of polyploidy as a source of genetic novelty for plant immunity.

Future research directions should include:

Pan-genome analyses to capture full NBS gene diversity across Brassica germplasm
Single-cell expression profiling to understand spatial regulation of retained paralogs
Structural biology approaches to elucidate functional consequences of sequence diversification
Advanced genome editing to test evolutionary hypotheses and engineer improved resistance

The investigation of NBS gene evolution in Brassica not only advances our understanding of plant genome plasticity but also provides practical knowledge for developing durable disease resistance in economically important crops.

Plant nucleotide-binding site (NBS) genes constitute a major line of defense against pathogens, with their expression and genetic variation playing a pivotal role in disease resistance. This technical guide delves into the comparative analysis of NBS orthologs in cotton accessions with varying susceptibility to cotton leaf curl disease (CLCuD) and Verticillium wilt. We present genomic and transcriptomic evidence demonstrating how divergent expression patterns and sequence variations in core orthogroups underlie differential disease responses. The systematic profiling of 12,820 NBS genes across 34 plant species revealed significant expansion in flowering plants, with 168 distinct domain architecture patterns identified. Our analysis specifically highlights the role of tandem duplications and species-specific structural variations in shaping the NBS repertoire of resistant and susceptible cotton genotypes, providing a framework for leveraging these genetic elements in resistance breeding programs.

The evolution of disease resistance in plants is intrinsically linked to the expansion and diversification of nucleotide-binding site (NBS) encoding genes. These genes represent one of the largest superfamilies of plant resistance (R) genes, playing crucial roles in pathogen recognition and defense activation [94]. In the context of cotton species, the divergence between diploid and tetraploid genomes has created a complex landscape for NBS gene evolution, with significant implications for disease resistance.

Plant genomes exhibit a remarkable abundance of duplicate genes: on average, 65% of annotated plant genes have a duplicate copy [40]. This prevalence stems primarily from whole-genome duplication (WGD) events, which have occurred multiple times over the past 200 million years of angiosperm evolution, in contrast to the more ancient WGD events in vertebrate lineages [40]. The tetraploid cotton species Gossypium hirsutum and G. barbadense originated from interspecific hybridization between A-genome species G. arboreum and D-genome species G. raimondii, resulting in significant expansion of their NBS gene repertoires [10].

Comparative genomic analyses have revealed striking asymmetries in NBS gene inheritance and evolution between tetraploid cotton species and their diploid progenitors. Allotetraploid cottons inherited NBS genes disproportionately from their diploid ancestors, with G. hirsutum inheriting more genes from G. arboreum, and G. barbadense inheriting more from G. raimondii [10]. This asymmetric evolution correlates with observed disease resistance patterns, as G. raimondii and G. barbadense demonstrate superior resistance to Verticillium wilt compared to the more susceptible G. arboreum and G. hirsutum [10].

Results: Genomic and Expression Divergence in NBS Orthologs

Genomic Architecture and Distribution of NBS Genes

Our genome-wide comparative analysis identified fundamental disparities in NBS gene composition across cotton species. The enumeration of NBS-encoding genes revealed 246 in G. arboreum, 365 in G. raimondii, 588 in G. hirsutum, and 682 in G. barbadense [10]. The distribution of these genes among chromosomes was nonrandom and uneven, with a strong tendency to form clusters, a characteristic arrangement for rapidly evolving gene families involved in plant-pathogen arms races [10].

Table 1: NBS Gene Distribution and Classification in Cotton Species

Species	Ploidy	Total NBS Genes	CNL (%)	TNL (%)	RNL (%)	N (%)	NL (%)
G. arboreum	Diploid (A)	246	32.52%	2.03%	1.22%	23.98%	21.54%
G. raimondii	Diploid (D)	365	29.32%	13.70%	0.82%	16.99%	24.38%
G. hirsutum	Allotetraploid (AD)	588	28.06%	0.85%	1.02%	28.57%	26.19%
G. barbadense	Allotetraploid (AD)	682	20.97%	6.45%	1.32%	25.07%	30.79%

Structural analysis revealed significant divergence in NBS gene architectures between resistant and susceptible genotypes. The TNL (TIR-NBS-LRR) subclass was particularly noteworthy, with G. raimondii and G. barbadense possessing substantially higher proportions of TNL genes (13.70% and 6.45%, respectively) compared to G. arboreum and G. hirsutum (2.03% and 0.85%, respectively) [10]. This distribution suggests TNL genes may play a significant role in Verticillium wilt resistance, which aligns with the observed resistance patterns in these species.
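
To make the proportions easier to compare, the percentages in Table 1 can be converted back to approximate gene counts. The counts below are back-calculated from the table totals and TNL percentages by rounding; they are not reported values.

```python
# (total NBS genes, TNL %) as listed in Table 1
TABLE = {
    "G. arboreum": (246, 2.03),
    "G. raimondii": (365, 13.70),
    "G. hirsutum": (588, 0.85),
    "G. barbadense": (682, 6.45),
}


def approx_count(total, percent):
    """Approximate gene count implied by a percentage of the total."""
    return round(total * percent / 100)


tnl_counts = {species: approx_count(t, p) for species, (t, p) in TABLE.items()}
for species, count in tnl_counts.items():
    print(f"{species}: ~{count} TNL genes")
```

Under this calculation, G. raimondii and G. barbadense carry roughly 50 and 44 TNL genes, whereas G. arboreum and G. hirsutum each carry about 5, despite G. hirsutum having more than twice as many NBS genes as G. arboreum.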

Orthogroup Analysis and Expression Profiling

Orthogroup (OG) analysis of NBS genes across 34 plant species identified 603 orthogroups, including both core (widely conserved) and unique (species-specific) orthogroups [94]. Expression profiling indicated putative upregulation of specific orthogroups (OG2, OG6, and OG15) in different tissues under various biotic and abiotic stresses in cotton accessions susceptible and tolerant to cotton leaf curl disease [94].

Genetic variation analysis between susceptible (Coker 312) and tolerant (Mac7) Gossypium hirsutum accessions identified unique variants in NBS genes, with Mac7 carrying 6,583 variants compared with 5,173 in Coker 312 [94]. The larger number of variants in the tolerant accession points to a potential role for sequence polymorphisms in conferring disease resistance.

Table 2: Expression Patterns of Key NBS Orthogroups in Cotton Disease Response

Orthogroup	Expression Pattern	Stress Conditions	Putative Function	Validation Approach
OG2	Upregulated in tolerant genotypes	Biotic (CLCuD) and abiotic stresses	Putative role in controlling virus titer	Function confirmed by VIGS
OG6	Differential expression in response to stresses	Biotic and abiotic stresses	Disease resistance signaling	Expression profiling
OG15	Tissue-specific expression patterns	Various biotic stresses	Pathogen recognition	Transcriptomic analysis

Functional Validation of NBS Genes in Disease Resistance

Functional studies using virus-induced gene silencing (VIGS) demonstrated the critical role of specific NBS genes in disease resistance. Silencing of GaNBS (OG2) in resistant cotton compromised its defense mechanism, indicating a putative role in controlling virus titer during cotton leaf curl disease [94]. This functional validation underscores the importance of specific orthogroups in mediating resistance responses.

Heterologous expression of the cotton NBS-LRR gene GbaNA1 in Arabidopsis thaliana conferred Verticillium wilt resistance and enabled the recovery of resistance in mutant lines that had lost the function of the GbaNA1 ortholog [95]. Investigations into the defense response mechanism revealed that GbaNA1 mediates resistance through enhanced production of reactive oxygen species (ROS) and potentiation of the ethylene signaling pathway [95]. Importantly, the G. hirsutum ortholog GhNA1 contains a premature termination that renders it non-functional, providing a molecular explanation for the susceptibility of certain cotton varieties to Verticillium wilt [95].

Methods: Experimental Protocols for NBS Gene Analysis

Genome-Wide Identification and Classification of NBS Genes

Protocol 1: Identification of NBS-Domain-Containing Genes

Data Collection: Obtain latest genome assemblies from publicly available databases (NCBI, Phytozome, Plaza) [94]. For cotton, include both diploid (G. arboreum, G. raimondii) and allotetraploid (G. hirsutum, G. barbadense) species.
Domain Screening: Use PfamScan.pl HMM search script with default e-value (1.1e-50) using background Pfam-A_hmm model to identify genes containing NB-ARC domains [94].
Filtering Criteria: Consider all genes having NB-ARC domain as NBS genes and filter for further analysis.
Architecture Classification: Identify additional associated decoy domains through domain architecture analysis following established classification methods [94]. Group similar domain-architecture-bearing genes into the same classes.
Comparative Analysis: Perform comprehensive comparison of classes among target species to identify conserved and species-specific structural patterns.

Protocol 2: Evolutionary Analysis and Orthogrouping

Sequence Comparison: Use OrthoFinder v2.5.1 package with DIAMOND tool for fast sequence similarity searches among NBS sequences [94].
Gene Clustering: Apply MCL clustering algorithm to identify gene families and orthogroups.
Phylogenetic Reconstruction: Perform multiple sequence alignment using MAFFT 7.0 and construct gene-based phylogenetic trees with the maximum likelihood algorithm in FastTreeMP, using 1,000 bootstrap replicates [94].
Duplication Analysis: Identify tandem and segmental duplications through genomic position analysis and synteny mapping.
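
The position-based part of the duplication analysis can be sketched as grouping NBS genes that lie close together on the same chromosome. The 200 kb gap is an illustrative assumption, since the protocol does not specify a distance threshold, and the coordinates are hypothetical. Physical clustering alone does not prove tandem duplication; sequence similarity within each cluster would still need to be checked.

```python
def find_clusters(genes, max_gap=200_000):
    """Group genes on the same chromosome whose starts are within max_gap bp.

    genes: iterable of (gene_id, chromosome, start). Returns clusters of 2+ genes.
    """
    clusters = []
    current = []
    for gene_id, chrom, start in sorted(genes, key=lambda g: (g[1], g[2])):
        # Close the running group on a chromosome change or an oversized gap
        if current and (chrom != current[-1][1] or start - current[-1][2] > max_gap):
            if len(current) > 1:
                clusters.append([g[0] for g in current])
            current = []
        current.append((gene_id, chrom, start))
    if len(current) > 1:
        clusters.append([g[0] for g in current])
    return clusters


# Hypothetical gene coordinates
genes = [
    ("nbs1", "A01", 100_000),
    ("nbs2", "A01", 150_000),
    ("nbs3", "A01", 900_000),
    ("nbs4", "A02", 120_000),
    ("nbs5", "A02", 300_000),
]
print(find_clusters(genes))  # [['nbs1', 'nbs2'], ['nbs4', 'nbs5']]
```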

Expression and Functional Characterization

Protocol 3: Transcriptomic Analysis of NBS Genes

Data Retrieval: Obtain RNA-seq data from specialized databases (IPF database, Cotton Functional Genomics Database, Cottongen database) and NCBI BioProjects [94].
Expression Quantification: Extract FPKM values using gene accessions as query IDs. Categorize data into:
- Tissue-specific (leaf, stem, flower, pollen, endosperm, seed)
- Abiotic stress-specific (dehydration, cold, drought, heat, dark, osmotic, salt, wounding)
- Biotic stress-specific (pathogen challenges) [94]
Differential Expression: Identify significantly differentially expressed NBS genes between susceptible and tolerant accessions under stress conditions.
Co-expression Analysis: Construct gene co-expression networks to identify potential regulatory relationships.
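
As a first-pass screen for the differential expression step, FPKM values from tolerant and susceptible accessions can be compared by log2 fold change. The pseudocount, the fold-change threshold, and the FPKM values below are illustrative assumptions. This is only a screen, not a significance test, which would require replicates and a dedicated statistical method.

```python
import math


def log2_fold_change(tolerant_fpkm, susceptible_fpkm, pseudocount=1.0):
    """log2 ratio of tolerant to susceptible FPKM, with a pseudocount to avoid zeros."""
    return math.log2((tolerant_fpkm + pseudocount) / (susceptible_fpkm + pseudocount))


def flag_genes(fpkm, min_abs_log2fc=1.0):
    """Return genes whose absolute log2 fold change meets the threshold."""
    flagged = {}
    for gene, (tolerant, susceptible) in fpkm.items():
        lfc = log2_fold_change(tolerant, susceptible)
        if abs(lfc) >= min_abs_log2fc:
            flagged[gene] = round(lfc, 2)
    return flagged


# Hypothetical FPKM values: gene -> (tolerant accession, susceptible accession)
fpkm = {"nbs_a": (31.0, 7.0), "nbs_b": (5.0, 5.5), "nbs_c": (0.0, 3.0)}
print(flag_genes(fpkm))  # {'nbs_a': 2.0, 'nbs_c': -2.0}
```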

Protocol 4: Functional Validation through Genetic Approaches

Virus-Induced Gene Silencing (VIGS):
- Design specific constructs targeting candidate NBS genes
- Infect resistant cotton plants with VIGS constructs
- Challenge silenced plants with pathogens
- Monitor disease development and quantify pathogen load [94]
Heterologous Expression:
- Clone full-length NBS genes into appropriate expression vectors
- Transform susceptible plant systems (Arabidopsis thaliana)
- Evaluate complementation of resistance phenotype
- Characterize defense responses (ROS production, defense gene expression) [95]

Diagram 1: Experimental workflow for comprehensive analysis of NBS genes in cotton, encompassing genomic identification, expression profiling, and functional validation.

Table 3: Key Research Reagent Solutions for NBS Gene Analysis

Category	Specific Tool/Reagent	Function/Application	Example Use Case
Bioinformatics Tools	HMMER 3.1b2 with Pfam NB-ARC domain (PF00931)	Identification of NBS-encoding genes in genome assemblies	Initial domain screening in cotton genomes [10]
Bioinformatics Tools	OrthoFinder v2.5.1 with DIAMOND	Orthogroup analysis and evolutionary relationships	Identifying core and species-specific orthogroups [94]
Bioinformatics Tools	MAFFT 7.0 & FastTreeMP	Multiple sequence alignment and phylogenetic reconstruction	Constructing NBS gene phylogenies [94]
Genomic Resources	Cotton genome assemblies (G. hirsutum, G. barbadense, G. arboreum, G. raimondii)	Reference sequences for comparative genomics	Evolutionary analysis between diploid and tetraploid cottons [10]
Genomic Resources	Cotton transcriptome databases (IPF, CottonFGD, Cottongen)	Expression data for diverse tissues and stress conditions	Expression profiling of NBS orthologs [94]
Functional Validation Tools	VIGS (Virus-Induced Gene Silencing) constructs	Transient silencing of candidate NBS genes	Functional testing of GaNBS (OG2) in cotton [94]
Functional Validation Tools	Heterologous expression systems (Arabidopsis thaliana)	Functional complementation assays	Validating GbaNA1 resistance function [95]
Pathogen Resources	Verticillium dahliae isolates	Fungal pathogen for resistance assays	Verticillium wilt resistance testing [95]
Pathogen Resources	Cotton leaf curl virus isolates	Viral pathogen for resistance screening	CLCuD response evaluation [94]

Discussion: Implications for Cotton Improvement and Disease Resistance

The comparative analysis of NBS orthologs in susceptible and tolerant cotton accessions provides compelling evidence for the role of specific NBS gene classes and expression patterns in disease resistance. The significant divergence in TNL gene representation between resistant and susceptible genotypes, with G. raimondii and G. barbadense possessing substantially higher proportions of TNL genes, suggests this subclass may be particularly important for Verticillium wilt resistance [10]. This finding is further supported by the observation that TNL genes generally exhibit higher evolutionary rates (Ka/Ks values) compared to non-TNL genes, indicating stronger selective pressures and potentially more rapid adaptation to pathogens [96].

The asymmetric evolution of NBS-encoding genes in allotetraploid cottons, with G. hirsutum inheriting more NBS genes from the susceptible G. arboreum and G. barbadense inheriting more from the resistant G. raimondii, provides a genomic explanation for their differential disease responses [10]. This inheritance pattern highlights the importance of considering progenitor contributions in polyploid crop improvement programs.

From a practical breeding perspective, the identification of core orthogroups (OG2, OG6, OG15) with differential expression in tolerant accessions under biotic stress provides valuable targets for marker-assisted selection [94]. The successful validation of GaNBS (OG2) function through VIGS demonstrates the potential of targeting specific orthologs for genetic engineering of resistant varieties.

Diagram 2: NBS-mediated defense signaling pathway. NBS receptor activation triggers conformational changes that initiate multiple defense responses, including ROS production and ethylene signaling, ultimately leading to disease resistance.

Future research directions should focus on elucidating the specific pathogen effectors recognized by these NBS orthologs and developing precision breeding strategies that pyramid multiple resistance orthologs to create durable resistance. The integration of functional haplotype analysis [97] with expression studies offers promising approaches for identifying superior NBS alleles for crop improvement. As genomic resources continue to expand, particularly for non-model plants with unique resistance profiles [17], our understanding of NBS gene evolution and function will continue to deepen, enabling more effective strategies for enhancing disease resistance in cotton and other crops.

This technical guide has synthesized current knowledge on NBS ortholog expression divergence in cotton, highlighting the complex evolutionary dynamics between diploid and tetraploid species and their implications for disease resistance. The integration of genomic, transcriptomic, and functional data provides a comprehensive framework for understanding how sequence variation, gene expression patterns, and specific NBS subclasses contribute to resistance mechanisms. The methodologies and resources outlined herein offer researchers a roadmap for conducting similar analyses in other crop systems, ultimately contributing to the development of more resistant varieties through informed breeding and genetic engineering strategies.

The study of subgenome dynamics in allopolyploid plants reveals fundamental evolutionary processes that govern genome organization and gene expression. This whitepaper examines the contrasting patterns of subgenome dominance and equivalence in two distinct allopolyploid systems: mangrove shrubs from the Acanthus genus and allopolyploid rice (Oryza). Focusing on Nucleotide-Binding Site (NBS) encoding genes—a critical class of disease resistance genes—we analyze how different allopolyploid lineages manage genomic conflicts following whole-genome duplication. Recent transcriptomic and genomic evidence demonstrates that while some allopolyploids exhibit pronounced subgenome dominance with biased gene expression and fractionation, others maintain balanced subgenome equivalence. These patterns have significant implications for understanding how polyploid plants adapt to environmental stresses and develop disease resistance mechanisms. The findings presented herein contribute to a broader thesis on NBS gene expansion in diploid versus tetraploid plants, offering insights for researchers investigating plant genomics, evolutionary biology, and disease resistance breeding.

Allopolyploidization, the process combining whole-genome duplication with interspecific hybridization, has been a major driving force in plant evolution and speciation. Most angiosperms have undergone at least one polyploidization event in their evolutionary history, with over 15% of extant angiosperm species being of recent polyploid origin [52]. When divergent genomes merge in a common nucleus, they undergo complex reorganization processes that can lead to two primary outcomes: subgenome dominance or subgenome equivalence.

Subgenome dominance occurs when one of the constituent genomes exerts greater influence on the transcriptome, exhibiting higher gene retention rates and expression levels, while the other subgenome experiences more gene loss and silencing [98] [39]. In contrast, subgenome equivalence describes systems where both subgenomes contribute more equally to the transcriptome without clear dominance patterns [52] [99]. The NBS-LRR gene family, which encodes numerous plant disease resistance (R) proteins, provides an excellent model for studying these dynamics due to its rapid evolution and importance in plant-pathogen interactions [39] [2] [100].

Subgenome Equivalence in Allotetraploid Mangroves

Genomic and Transcriptomic Evidence from Acanthus tetraploideus

The mangrove shrub Acanthus tetraploideus represents a compelling case of subgenome equivalence. Recent genomic analyses reveal that this tetraploid species originated from hybridization between the diploid species A. ilicifolius and A. ebracteatus, followed by whole-genome duplication [52]. Molecular dating indicates these diploid progenitors diverged approximately 9.59 million years ago (Mya), providing substantial evolutionary time for genomic differentiation before hybridization [52] [79].

Table 1: Genomic Features of Allotetraploid Acanthus tetraploideus and Its Diploid Progenitors

Feature	A. tetraploideus (Tetraploid)	A. ilicifolius (Diploid)	A. ebracteatus (Diploid)
Ploidy Level	4x	2x	2x
Phylogenetic Relationship	Hybrid descendant	Parental species	Parental species
Homeolog Clustering Ratio	~1:1 with both progenitors	N/A	N/A
Genes with Homeolog Expression Bias	22.87%	N/A	N/A
Nucleotide Sequence Similarity	High similarity to both progenitors	Reference	Reference

Transcriptomic analyses demonstrate that homeologous sequences in A. tetraploideus cluster preferentially with A. ilicifolius and A. ebracteatus in an approximately 1:1 ratio, indicating balanced contributions from both subgenomes [52]. High sequence similarity and shared homologous polymorphisms between the tetraploid and its putative diploid progenitors further support a recent allopolyploid origin without evident subgenome dominance [52] [79].

NBS Gene Expression Patterns in Equilibrated Subgenomes

Analysis of homeolog expression bias in A. tetraploideus reveals that only 22.87% of genes exhibit biased homeolog expression, significantly lower than the 67.66% observed in synthetic hybrids [52]. This general attenuation of homeolog expression divergence in natural tetraploids suggests evolutionary progression toward subgenome equilibration. The expression patterns show remarkable retention of parental expression dominance, where the transcriptional legacy of diploid progenitors is largely maintained in the derived tetraploid [52].

Notably, unbiased genes in A. tetraploideus are enriched in fundamental cellular processes, while genes with newly arisen expression bias often relate to chromosome dynamics and cell cycle regulation [52]. This functional partitioning may represent an adaptive mechanism for stabilizing polyploid genomes, supporting the species' establishment and long-term ecological success in challenging mangrove ecosystems.

Subgenome Dominance in Allotetraploid Cereals

Genomic Evidence from Allotetraploid Rice

In contrast to mangrove systems, allopolyploid cereals often exhibit clear subgenome dominance. Genomic studies of neo-tetraploid rice lines reveal complex reorganization patterns following whole-genome duplication [101]. Population structure analyses based on whole-genome resequencing data classify neo-tetraploid rice lines into distinct subpopulations, with specific clustering patterns reflecting their genomic relationships to indica and japonica subspecies [101].

Table 2: Genomic Variation in Neo-Tetraploid Rice Lines

Variation Type	Count in Tetraploid Rice	Comparative Features
Total SNPs	66.9 million (against MSU7 reference)	0.21-3.50 million variations per individual
Moderate-to-High Effect Variations	0.79 million (10.61% of total)	Affect protein coding sequences
Variation Density	501.01 variations per 100 Kb (avg. in neo-tetraploid rice lines, NTRs)	Lower diversity regions on Chr5 and Chr6
Specific Alleles	Novel SNP in HSP101 exon (named HSP101-1)	Conserved in all NTRs, absent in autotetraploid rice lines (ATRs) and databases

Genomic analyses of neo-tetraploid rice have identified specific genomic variations, including a novel SNP in the first exon of HSP101, a heat-inducible gene [101]. This allele, named HSP101-1, is conserved across all neo-tetraploid rice lines but absent in autotetraploid rice and public databases, indicating subgenome-specific evolutionary trajectories [101].

Asymmetric NBS Gene Evolution in Allotetraploid Cotton

Although cotton is not a cereal, studies of allotetraploid cotton (Gossypium species) provide a well-characterized example of subgenome-biased inheritance that is relevant to the dominance patterns discussed here. Genomic analyses reveal asymmetric evolution of NBS-encoding genes in allotetraploid cottons [39]. G. hirsutum inherited more NBS-encoding genes from its A-genome progenitor (G. arboreum), while G. barbadense inherited more from its D-genome progenitor (G. raimondii) [39].

Table 3: NBS Gene Distribution in Allotetraploid Cotton Species

NBS Gene Type	G. arboreum (A-genome)	G. raimondii (D-genome)	G. hirsutum (Allotetraploid)	G. barbadense (Allotetraploid)
CN/CNL/N Genes	Higher proportion (74.39%)	Lower proportion (56.99%)	Higher proportion (similar to A-genome)	Lower proportion (similar to D-genome)
TNL Genes	Lower proportion	~7x higher proportion	Lower proportion	Higher proportion
RN/RNL Genes	Relatively unchanged	Relatively unchanged	Relatively unchanged	Relatively unchanged

This asymmetric distribution correlates with disease resistance phenotypes. G. raimondii and G. barbadense, which possess higher proportions of TNL-type NBS genes, demonstrate greater resistance to Verticillium wilt compared to G. arboreum and G. hirsutum [39]. The TNL class shows the largest relative difference of any NBS type: its proportion is approximately 7-fold higher in G. raimondii than in G. arboreum, and a similar contrast separates G. barbadense from G. hirsutum, suggesting a significant role for TNL genes in subgenome-specific disease resistance [39].

Comparative Analysis: Mangrove vs. Cereal Systems

Factors Influencing Divergent Evolutionary Trajectories

The contrast between subgenome equivalence in mangroves and subgenome dominance in cereals reveals several influencing factors:

Evolutionary Age: Acanthus tetraploideus represents a relatively recent allopolyploid, where subgenome equilibration may still be ongoing [52]. In contrast, many cereal polyploids have undergone longer evolutionary periods allowing for dominance patterns to emerge.
Genomic Shock Response: The "transcriptome shock" following allopolyploidization triggers extensive reorganization [52] [99]. Mangroves appear to attenuate this shock through balanced expression, while cereals exhibit more asymmetric responses.
Ecological Pressures: Mangroves inhabit extreme environments with high salinity, hypoxia, and UV radiation [52] [102]. Maintaining genetic diversity through subgenome equivalence may enhance adaptive potential in these challenging ecosystems.
Breeding History: Cultivated cereals have undergone intensive artificial selection, potentially accelerating subgenome dominance through human-directed breeding practices [101] [100].

Implications for NBS Gene Evolution and Disease Resistance

The different subgenome dynamics have significant consequences for NBS gene evolution:

In balanced systems like Acanthus, NBS genes from both subgenomes remain available for plant defense, potentially broadening the spectrum of pathogen recognition [52]. In dominant systems like cotton, NBS gene repertoires become specialized according to their dominant subgenome inheritance patterns, potentially leading to lineage-specific resistance capabilities [39].

These patterns inform breeding strategies for crop improvement. Understanding subgenome dominance can guide selection for desirable resistance genes, while knowledge of equilibration mechanisms may help maintain genetic diversity in breeding programs.

Experimental Approaches and Methodologies

Genomic and Transcriptomic Workflows

Figure 1: Experimental workflow for analyzing subgenome dominance and NBS gene expression in allopolyploids

The Scientist's Toolkit: Essential Research Reagents and Tools

Table 4: Essential Research Reagents and Computational Tools for Subgenome Analysis

Tool/Reagent Category	Specific Examples	Function/Application
Sequencing Technologies	PacBio Sequel, Illumina HiSeq, Hi-C	Genome assembly, variant detection, chromatin interaction analysis [52] [98] [101]
Ploidy Verification	Flow cytometry, K-mer analysis (Smudgeplot, GenomeScope)	Confirmation of ploidy level, genome size estimation [52] [98]
Subgenome Assignment Tools	SubPhaser, Allo4D, Hi-C clustering	Distinguishing subgenomes in allopolyploids [98] [102]
Expression Analysis	RNA-seq, OrthoFinder, DIAMOND	Homeolog expression quantification, orthogroup clustering [52] [2]
Variant Detection	SnpEff, HMMER, custom pipelines	SNP, InDel, and structural variation identification [39] [101] [100]
NBS Gene Identification	PfamScan, NB-ARC domain HMM models	Annotation of NBS-LRR gene family members [39] [2] [100]

Detailed Methodological Protocols

Transcriptome Sequencing and Analysis for Homeolog Expression Bias

For transcriptome studies, researchers typically extract high-quality RNA from multiple biological replicates, followed by library preparation and Illumina sequencing [52]. The resulting reads are processed through quality control pipelines before mapping to reference genomes. For homeolog-specific expression analysis, specialized pipelines distinguish reads originating from different subgenomes based on single nucleotide polymorphisms [52] [79].

The analytical workflow involves:

Homeolog Identification: Phylotranscriptomic analysis clusters homeologous sequences with their respective progenitors [52].
Expression Quantification: Read counts are normalized (e.g., FPKM) for each homeolog [52] [2].
Bias Calculation: Homeolog expression bias is quantified using ratios of expression levels between subgenomes [52].
Statistical Testing: Significance thresholds (e.g., |log2(fold-change)| ≥ 1) identify genes with significant expression bias [52].
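
The bias calculation and thresholding steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the cited studies: the function name, the pseudocount, and the example expression values are all assumptions, and a real analysis would also use biological replicates and a formal statistical test instead of a fold-change cutoff alone.

```python
import math

def homeolog_bias(expr_a, expr_b, threshold=1.0, pseudocount=1.0):
    """Classify one homeolog pair by its log2 expression ratio.

    expr_a, expr_b: normalized expression (e.g. FPKM) of the copies
    assigned to subgenome A and subgenome B.
    pseudocount: hypothetical offset that avoids taking the log of zero.
    """
    log2_ratio = math.log2((expr_a + pseudocount) / (expr_b + pseudocount))
    if log2_ratio >= threshold:
        label = "biased_to_A"
    elif log2_ratio <= -threshold:
        label = "biased_to_B"
    else:
        label = "unbiased"
    return log2_ratio, label

# Hypothetical FPKM values for three homeolog pairs (subgenome A, subgenome B).
pairs = {"gene1": (30.0, 6.0), "gene2": (10.0, 12.0), "gene3": (0.0, 7.0)}
results = {gene: homeolog_bias(a, b) for gene, (a, b) in pairs.items()}

# Share of pairs whose |log2 ratio| reaches the threshold.
biased_fraction = sum(label != "unbiased" for _, label in results.values()) / len(results)
```

Applied genome-wide, `biased_fraction` corresponds to the kind of summary statistic reported above for A. tetraploideus (the percentage of genes with homeolog expression bias).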

NBS Gene Identification and Classification

Protocols for NBS gene analysis include:

HMMER Search: Genome/proteome scans using hidden Markov models (e.g., PF00931 NB-ARC domain) with stringent e-value cutoffs (e.g., 1.1e-50) [39] [2].
Domain Architecture Analysis: Identification of additional domains (TIR, CC, RPW8, LRR) using Pfam or InterProScan [39] [2].
Classification System: Categorization into structural classes (CN, CNL, TNL, RNL, etc.) based on domain combinations [39] [2].
Phylogenetic Analysis: Construction of gene trees to elucidate evolutionary relationships and origins [39].
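
The classification step can be illustrated with a small Python function that maps a set of detected domains to a structural class label. This is a sketch under stated assumptions: the domain names are simplified labels rather than Pfam identifiers, and the priority order used when more than one N-terminal domain is present (TIR, then CC, then RPW8) is a hypothetical choice, not a rule taken from the cited studies.

```python
def classify_nbs(domains):
    """Return a class label such as 'TNL', 'CN', or 'N' for one protein.

    domains: iterable of simplified domain names found in the protein,
    drawn from {"TIR", "CC", "RPW8", "NBS", "LRR"}.
    Returns None when no NBS domain is present.
    """
    found = set(domains)
    if "NBS" not in found:
        return None  # not an NBS-encoding gene

    # N-terminal domain decides the prefix (assumed priority order).
    if "TIR" in found:
        prefix = "T"
    elif "CC" in found:
        prefix = "C"
    elif "RPW8" in found:
        prefix = "R"
    else:
        prefix = ""

    # A C-terminal LRR domain adds the trailing 'L'.
    suffix = "L" if "LRR" in found else ""
    return prefix + "N" + suffix

examples = [
    {"TIR", "NBS", "LRR"},   # full-length TNL
    {"CC", "NBS"},           # truncated CN
    {"NBS"},                 # NBS only
    {"RPW8", "NBS", "LRR"},  # RNL
]
labels = [classify_nbs(e) for e in examples]  # ['TNL', 'CN', 'N', 'RNL']
```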

The investigation of subgenome dominance versus equivalence in allopolyploid plants reveals diverse evolutionary strategies for managing genomic conflicts following whole-genome duplication. Mangrove systems like Acanthus tetraploideus demonstrate subgenome equivalence with balanced contributions from both parental genomes, while cereal systems often exhibit subgenome dominance with asymmetric gene expression and evolution.

For NBS disease resistance genes, these dynamics significantly impact plant defense capabilities. Balanced systems maintain diverse resistance gene repertoires, while dominant systems develop specialized resistance profiles based on their dominant subgenome inheritance patterns. These findings advance our understanding of polyploid evolution and provide practical insights for crop improvement strategies, particularly in developing disease-resistant varieties through targeted manipulation of subgenome-specific genes.

Future research directions should include:

Expanded comparative genomics across diverse allopolyploid systems
Functional validation of subgenome-specific NBS genes through gene editing
Investigation of epigenetic mechanisms regulating subgenome expression
Integration of pan-genome approaches to capture full NBS gene diversity
Exploration of connections between subgenome architecture and environmental adaptation

These efforts will further elucidate the complex interplay between polyploid genome evolution and disease resistance mechanisms, ultimately contributing to more sustainable agricultural practices and enhanced crop resilience.

Nucleotide-binding site-leucine-rich repeat (NBS-LRR) genes constitute the cornerstone of the plant immune system, encoding a major class of disease resistance (R) proteins that detect diverse pathogens. The genomic architecture and evolutionary dynamics of these genes are fundamental to understanding plant-pathogen co-evolution. This review synthesizes recent genomic studies to explore the paradoxical existence of both conserved, syntenic regions and highly plastic, rapidly evolving genomic hotspots harboring NBS-encoding genes. Framed within the context of NBS gene expansion in diploid versus polyploid plants, we examine how selective pressures—including purifying selection, balancing selection, and tandem duplication events—sculpt these genomic landscapes. The analysis reveals that allopolyploid species often exhibit asymmetric evolution of NBS genes, with subgenomes from different diploid progenitors contributing unequally to disease resistance phenotypes. This comprehensive synthesis provides a framework for leveraging comparative genomics to identify durable resistance genes and accelerate crop improvement.

Plant genomes are dynamic entities where evolutionary forces create a mosaic of stable and plastic regions. Among the most variable components are nucleotide-binding site (NBS) encoding genes, which play a crucial role in plant immunity by recognizing pathogen effectors and initiating defense responses [103] [2]. These genes typically encode proteins with an N-terminal signaling domain (such as TIR, CC, or RPW8), a central NBS domain involved in nucleotide binding and activation, and a C-terminal LRR domain responsible for pathogen recognition [2] [60]. The NBS domain contains several conserved motifs including P-loop, kinase-2, kinase-3a, GLPL, and MHDL, which facilitate its role as a molecular switch in defense signaling [10].

The distribution of NBS-encoding genes across plant genomes is notably nonrandom and uneven, with genes frequently organized in clusters [103] [61]. This organization creates genomic regions with distinct evolutionary dynamics: some exhibit remarkable conservation across millions of years of evolution (conserved synteny), while others display exceptional plasticity, undergoing rapid expansion, contraction, and diversification. Understanding the tension between these conserved and plastic genomic regions is essential for deciphering the evolutionary arms race between plants and their pathogens.

This review examines the interplay between synteny and macroevolution in shaping NBS gene landscapes, with particular emphasis on differences between diploid and polyploid plants. The expansion of NBS genes in polyploid genomes—through mechanisms such as whole-genome duplication (WGD) and small-scale duplications (SSD)—creates unique opportunities for functional diversification and specialization that are not available to diploid species [2]. By synthesizing findings from diverse plant systems including sorghum, cotton, Brassica, and Ipomoea species, we aim to establish a comprehensive framework for understanding how evolutionary history and genomic context influence the structure and function of plant immune receptor repertoires.

Genomic Distribution and Organization of NBS Genes

Chromosomal Distribution and Clustering Patterns

Across plant genomes, NBS-encoding genes display non-random distribution patterns, consistently forming clusters on specific chromosomes. In sorghum, over 60% of NBS-encoding genes are located on just three chromosomes (SBI-02, SBI-05, and SBI-08), with approximately 68.7% organized in clusters [103]. Similar clustering patterns are observed in Ipomoea species, where 76.71-90.37% of NBS genes reside in clusters depending on the species [61]. This non-uniform distribution extends to cotton species, where NBS genes are distributed unevenly across chromosomes and tend to form clusters [10].

The tendency for NBS genes to cluster has significant functional implications. Clustered arrangements facilitate the emergence of new recognition specificities through mechanisms such as unequal crossing over and gene conversion, enabling plants to rapidly adapt to evolving pathogen populations. These clusters often represent genomic hotspots for disease resistance, as evidenced by the significant enrichment of NBS-encoding genes in regions containing fungal pathogen resistance quantitative trait loci (QTL) in sorghum [103].

Table 1: NBS Gene Distribution and Cluster Patterns Across Plant Species

Species	Ploidy	Total NBS Genes	Chromosomes with High Density	Genes in Clusters	Citation
Sorghum bicolor	Diploid	346	SBI-02, SBI-05, SBI-08	68.7%	[103]
Gossypium hirsutum	Allotetraploid	588	Not specified	Not specified	[10]
Gossypium barbadense	Allotetraploid	682	Not specified	Not specified	[10]
Ipomoea batatas	Hexaploid	889	Not specified	83.13%	[61]
Ipomoea trifida	Diploid	554	Not specified	76.71%	[61]
Ipomoea triloba	Diploid	571	Not specified	90.37%	[61]
Ipomoea nil	Diploid	757	Not specified	86.39%	[61]
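
The "Genes in Clusters" column depends on how a cluster is defined, and the criteria differ between studies. The sketch below shows one simple, hypothetical definition: two NBS genes on the same chromosome belong to a cluster if their start positions are no more than `max_gap` base pairs apart. The 200 kb default, the function name, and the example coordinates are assumptions for illustration only.

```python
from collections import defaultdict

def fraction_in_clusters(genes, max_gap=200_000):
    """Fraction of genes that lie within max_gap bp of a neighbouring gene.

    genes: list of (gene_id, chromosome, start_position) tuples
    with unique gene_id values.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    clustered = set()
    for positions in by_chrom.values():
        positions.sort()
        # Compare each gene with its immediate downstream neighbour.
        for (pos1, id1), (pos2, id2) in zip(positions, positions[1:]):
            if pos2 - pos1 <= max_gap:
                clustered.update((id1, id2))

    return len(clustered) / len(genes) if genes else 0.0

# Hypothetical coordinates: g1/g2 and g4/g5 are close neighbours, g3 is isolated.
genes = [
    ("g1", "chr2", 100_000),
    ("g2", "chr2", 150_000),
    ("g3", "chr2", 900_000),
    ("g4", "chr5", 50_000),
    ("g5", "chr5", 240_000),
]
share = fraction_in_clusters(genes)  # 4 of 5 genes are clustered
```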

Diversity of NBS Gene Architectures

NBS-encoding genes exhibit remarkable structural diversity, with variations in domain architecture leading to functional specialization. The major classes include:

TNL: TIR-NBS-LRR proteins, predominant in dicots
CNL: CC-NBS-LRR proteins, found in both monocots and dicots
RNL: RPW8-NBS-LRR proteins, functioning as "helper" NLRs in signal transduction [2] [61]

Additionally, numerous truncated variants exist, including TN, CN, NL, and N-type genes, which may fulfill specialized regulatory roles or act as decoys in defense signaling [103] [60]. The distribution of these architectural types varies significantly between species and is influenced by evolutionary history. For instance, comparative analysis of cotton species revealed that G. arboreum and G. hirsutum possess a greater proportion of CN, CNL, and N genes and a lower proportion of TNL genes compared to G. raimondii and G. barbadense [10]. This architectural variation contributes to differences in disease resistance profiles between species.

Table 2: NBS Gene Architecture Distribution in Cotton Species (%)

Gene Type	G. arboreum	G. raimondii	G. hirsutum	G. barbadense
CN	17.89	10.68	15.14	13.49
CNL	32.52	29.32	28.06	20.97
N	23.98	16.99	28.57	25.07
NL	21.54	24.38	26.19	30.79
TN	0.81	3.84	0.00	1.61
TNL	2.03	13.70	0.85	6.45
RN	0.00	0.27	0.17	0.29
RNL	1.22	0.82	1.02	1.32
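
As a worked check on the table above, the short Python snippet below sums the CN, CNL, and N percentages for each species and computes the ratio of TNL proportions between species. Only values from the table are used; the function names are illustrative.

```python
# Percentages copied from the table above (CN, CNL, N, and TNL rows only).
table = {
    "G. arboreum":   {"CN": 17.89, "CNL": 32.52, "N": 23.98, "TNL": 2.03},
    "G. raimondii":  {"CN": 10.68, "CNL": 29.32, "N": 16.99, "TNL": 13.70},
    "G. hirsutum":   {"CN": 15.14, "CNL": 28.06, "N": 28.57, "TNL": 0.85},
    "G. barbadense": {"CN": 13.49, "CNL": 20.97, "N": 25.07, "TNL": 6.45},
}

def cn_cnl_n_share(species):
    """Combined percentage of CN, CNL, and N genes for one species."""
    return round(sum(table[species][k] for k in ("CN", "CNL", "N")), 2)

def tnl_fold(species_a, species_b):
    """Ratio of the TNL percentage in species_a to that in species_b."""
    return round(table[species_a]["TNL"] / table[species_b]["TNL"], 2)

shares = {sp: cn_cnl_n_share(sp) for sp in table}
# {'G. arboreum': 74.39, 'G. raimondii': 56.99,
#  'G. hirsutum': 71.77, 'G. barbadense': 59.53}

diploid_fold = tnl_fold("G. raimondii", "G. arboreum")        # 6.75
tetraploid_fold = tnl_fold("G. barbadense", "G. hirsutum")    # 7.59
```

The combined CN/CNL/N share is higher in G. arboreum and G. hirsutum, while the TNL proportion is roughly 7-fold higher in G. raimondii and G. barbadense, matching the pattern described in the text.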

Evolutionary Dynamics of NBS Genes

Selection Pressures Acting on NBS Genes

NBS-encoding genes are subject to contrasting evolutionary pressures that shape their diversity and distribution. In sorghum, these genes show significantly higher diversity compared to non-NBS-encoding genes and are enriched in genomic regions under both purifying selection (through domestication and improvement) and balancing selection [103]. This paradoxical situation arises because different NBS genes, or even different domains within the same gene, experience distinct selective pressures:

Purifying selection: Acts on conserved signaling domains (NBS region) to maintain functional integrity
Balancing selection: Maintains diversity in LRR domains to recognize evolving pathogens
Positive selection: Drives rapid diversification in specific solvent-exposed residues involved in pathogen recognition
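
A common way to distinguish purifying from positive selection on coding sequence is the Ka/Ks ratio (nonsynonymous to synonymous substitution rate): values well below 1 suggest purifying selection, and values above 1 suggest positive selection. The sketch below encodes that rule of thumb in Python. The `tolerance` band around 1 is a hypothetical parameter, the input values are invented, and balancing selection is usually assessed with population-level statistics that this ratio does not capture.

```python
def selection_regime(ka, ks, tolerance=0.1):
    """Interpret a Ka/Ks ratio for one gene pair or domain.

    ka, ks: nonsynonymous and synonymous substitution rates.
    tolerance: hypothetical half-width of the band treated as near-neutral.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if ratio < 1 - tolerance:
        regime = "purifying"
    elif ratio > 1 + tolerance:
        regime = "positive"
    else:
        regime = "near-neutral"
    return ratio, regime

# Hypothetical rates for a conserved NBS region and a variable LRR region.
nbs_region = selection_regime(ka=0.02, ks=0.20)  # ratio 0.1, purifying
lrr_region = selection_regime(ka=0.30, ks=0.20)  # ratio 1.5, positive
```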

The type of biotic stress resistance QTL co-locating with NBS genes influences their diversity patterns, suggesting pathogen-specific evolutionary trajectories [103]. Furthermore, ancestral genes predating species divergence are more abundant in regions under selection than species-specific genes, indicating that evolutionarily ancient NBS genes may play fundamental roles in plant immunity [103].

Impact of Whole Genome Duplication and Tandem Duplication

The expansion of NBS gene families has been driven by both small-scale duplications (SSD), including tandem duplications, and whole genome duplication (WGD) events [2]. The relative contributions of these mechanisms vary across species and have profound implications for NBS gene evolution:

In Brassica species, after the whole-genome triplication (WGT) of the Brassica ancestor, NBS-encoding homologous gene pairs on triplicated regions were rapidly deleted or lost. However, species-specific gene amplification subsequently occurred through tandem duplication after the divergence of B. rapa and B. oleracea [60]. This "boom and bust" pattern following polyploidization, in which initial gene loss is followed by lineage-specific expansion, appears to be a common feature of NBS gene evolution in polyploids.

Similarly, in Ipomoea species, sweet potato (I. batatas) possesses more NBS genes (889) than its diploid relatives, with a higher proportion resulting from segmental duplications rather than tandem duplications [61]. This contrasts with the diploid Ipomoea species, where tandem duplication predominates, suggesting that the hexaploid nature of sweet potato has enabled different evolutionary trajectories for its NBS gene repertoire.

Figure 1: Evolutionary Dynamics of NBS Genes in Polyploid Plants. Polyploidization events trigger rapid gene loss followed by lineage-specific expansion and diversification through tandem duplication and diversifying selection.

Asymmetric Evolution in Allopolyploids

Allopolyploid species—formed through hybridization and genome doubling—provide particularly compelling insights into NBS gene evolution. In cotton, asymmetric evolution of NBS-encoding genes has been observed between subgenomes, with allotetraploid species inheriting different proportions of NBS genes from their diploid progenitors [10].

Sequence similarity and synteny analyses reveal that G. hirsutum inherited more NBS-encoding genes from the A-genome donor (G. arboreum), while G. barbadense inherited more NBS-encoding genes from the D-genome donor (G. raimondii) [10]. This asymmetric inheritance has functional consequences for disease resistance, as G. raimondii and G. barbadense are more resistant to Verticillium wilt, whereas G. arboreum and G. hirsutum are more susceptible [10]. The TNL class of NBS genes appears to play a significant role in this resistance difference, as they are more abundant in the resistant species.

This pattern of asymmetric evolution demonstrates that allopolyploid formation creates novel genomic contexts where selective pressures can act differently on homoeologous NBS genes, leading to subfunctionalization or neofunctionalization that expands the defensive capabilities of polyploid species compared to their diploid progenitors.

Synteny and Gene Cluster Conservation

Conserved Syntenic Blocks Harboring NBS Genes

Synteny—the conservation of gene order across related species—provides powerful evidence for functional constraint on genomic organization. Comparative genomic analyses have identified syntenic blocks harboring NBS genes that are conserved across millions of years of evolution [104]. These conserved syntenic blocks often contain arrays of highly conserved noncoding elements (HCNEs) clustered around developmental regulatory genes, forming genomic regulatory blocks (GRBs) [105].

In the context of NBS genes, conserved synteny indicates selective pressure to maintain gene linkage, potentially due to:

Coregulation of clustered NBS genes
Shared regulatory elements controlling expression
Functional interactions between neighboring genes

A study of four Ipomoea species identified 201 NBS-encoding orthologous genes forming syntenic gene pairs between species, indicating derivation from common ancestral genes [61]. The conservation of these syntenic relationships despite extensive genome reorganization highlights the functional importance of these genomic regions.

Genomic Regulatory Blocks and Bystander Genes

The concept of Genomic Regulatory Blocks (GRBs) helps explain the conservation of synteny around important regulatory genes, including some NBS genes [105]. GRBs are chromosomal segments spanned by highly conserved noncoding elements (HCNEs), their developmental regulatory target genes, and phylogenetically and functionally unrelated "bystander" genes.

Bystander genes are not under the control of the regulatory elements that define the GRB but are caught within the block due to the long-range nature of regulatory elements [105]. In teleost fishes, after whole-genome duplication, GRBs including HCNEs and target genes were often maintained in both copies, while bystander genes were typically lost from one GRB [105]. This selective retention demonstrates evolutionary pressure to maintain the integrity of these regulatory blocks.

While this phenomenon was initially characterized around developmental regulators, similar principles may apply to large clusters of NBS genes, particularly those showing conserved synteny across wide evolutionary distances. The maintenance of such gene clusters suggests coordinated regulation or functional interdependence that confers selective advantages.

Experimental Approaches and Methodologies

Genomic Identification and Annotation of NBS Genes

The identification and characterization of NBS-encoding genes relies on established bioinformatics workflows combining sequence similarity searches, domain identification, and manual curation:

Figure 2: Bioinformatics Workflow for NBS Gene Identification and Analysis. The pipeline illustrates the key steps in identifying and characterizing NBS-encoding genes from genome sequences, with essential bioinformatics tools for each stage.

Key steps in the workflow include:

Data acquisition: Genome assemblies and annotation data are downloaded from public databases such as NCBI, Phytozome, or species-specific databases [2] [60].
HMMER search: The Hidden Markov Model (HMM) profile for the NBS (NB-ARC) domain (PF00931) is used to scan predicted protein sequences with stringent e-value cutoffs (e.g., 1.1e-50) [2] [60].
Domain architecture analysis: Additional domains (TIR, CC, LRR, RPW8) are identified using HMMPfam, HMMSmart, and coiled-coil prediction tools (PAIRCOIL2, MARCOIL) [60].
Classification: Genes are classified into structural types (TNL, CNL, RNL, etc.) based on domain composition [10].
Manual curation: False positives are removed through manual inspection and sequence validation [60].
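
The HMMER search step can be followed by a simple filter on the tabular output. The sketch below assumes the search was run with hmmsearch and its `--tblout` option, in which each non-comment line is whitespace-delimited and the fifth column holds the full-sequence E-value. The example lines, gene names, and Pfam version suffix are hypothetical, and the cutoff simply reuses the example value quoted above.

```python
def filter_tblout(lines, max_evalue=1.1e-50):
    """Return {target_name: best E-value} for hits passing the cutoff.

    lines: iterable of text lines in hmmsearch --tblout format.
    """
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # The first 18 columns are fixed; the rest is a free-text description.
        fields = line.split(maxsplit=18)
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            # Keep the smallest E-value if a target appears more than once.
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits

# Hypothetical tblout content with one strong and one weak NB-ARC hit.
example = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "GeneA - NB-ARC PF00931.25 3.2e-78 265.1 0.1 4.0e-78 264.8 0.1 1.0 1 0 0 1 1 1 1 hypothetical protein",
    "GeneB - NB-ARC PF00931.25 2.0e-12 48.3 0.0 2.5e-12 48.0 0.0 1.1 1 0 0 1 1 1 1 hypothetical protein",
]
candidates = filter_tblout(example)  # only GeneA passes the 1.1e-50 cutoff
```

Candidates retained this way would still go through the domain architecture, classification, and manual curation steps listed above.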

Phylogenetic Footprinting and Comparative Genomics

Phylogenetic footprinting—the identification of functional elements through sequence conservation across species—has emerged as a powerful approach for detecting regulatory sequences associated with NBS genes [106]. This method leverages the principle that functional sequences evolve more slowly than non-functional DNA due to selective constraints.

The ConSite algorithm integrates phylogenetic footprinting with transcription-factor binding-site predictions, significantly improving specificity by reducing false-positive rates by approximately 85% compared to single-sequence analysis [106]. This approach is particularly valuable for identifying conserved regulatory elements that control the expression of NBS gene clusters.

Evolutionary-based gene cluster discovery algorithms like EvolClust have been used to identify ~35,000 cluster families across 882 eukaryotic species, enabling systematic analysis of gene order conservation [104]. These resources facilitate the identification of conserved syntenic blocks containing NBS genes and the inference of evolutionary events such as gene gain, loss, or horizontal transfer.

Expression Analysis and Functional Validation

Understanding the functional significance of NBS genes requires moving beyond genomic identification to expression analysis and experimental validation:

Expression profiling: RNA-seq data from different tissues, developmental stages, and stress conditions are used to analyze expression patterns of NBS genes [2] [61]. For example, analysis of susceptible and tolerant cotton accessions identified differentially expressed NBS genes in response to cotton leaf curl disease [2].

Virus-Induced Gene Silencing (VIGS): This technique enables functional characterization of candidate NBS genes by knocking down their expression and assessing changes in disease resistance phenotypes [2]. Silencing of GaNBS in resistant cotton demonstrated its role in virus tolerance [2].

Genetic variation analysis: Comparison of NBS genes between resistant and susceptible genotypes identifies sequence variants associated with disease resistance [2]. In cotton, tolerant accessions showed a greater number of unique variants in NBS genes compared to susceptible varieties [2].
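
Counting variants that are unique to one group of accessions amounts to a set difference. The sketch below assumes variants have already been called and are represented as (gene, position, reference allele, alternate allele) tuples; that representation, the function name, and the example records are hypothetical.

```python
from collections import Counter


def unique_variant_counts(focal, other):
    """Count, per gene, the variants present in `focal` but absent from `other`.

    Each variant is a (gene, position, ref, alt) tuple.
    """
    unique = set(focal) - set(other)
    return Counter(gene for gene, _pos, _ref, _alt in unique)


if __name__ == "__main__":
    # Hypothetical variant calls for two groups of accessions.
    tolerant = [
        ("NBS1", 10, "A", "G"),
        ("NBS1", 50, "C", "T"),
        ("NBS2", 5, "G", "A"),
    ]
    susceptible = [("NBS1", 10, "A", "G")]
    print(dict(unique_variant_counts(tolerant, susceptible)))
```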

Table 3: Essential Research Reagents and Resources for NBS Gene Analysis

Resource Category	Specific Examples	Function/Application	Reference
Genomic Databases	NCBI, Phytozome, Plaza, BRAD, Bolbase	Source of genome assemblies and annotations	[2] [60]
Domain Databases	Pfam (PF00931, PF01582)	HMM profiles for NBS and TIR domains	[60]
Bioinformatics Tools	HMMER v3.0, OrthoFinder v2.5.1, MAFFT 7.0, DIAMOND	Domain identification, orthogroup inference, multiple sequence alignment, sequence similarity search	[2]
Expression Databases	IPF Database, CottonFGD, Cottongen	RNA-seq data for expression profiling	[2]
Cluster Analysis	EvolClustDB	Database of evolutionarily conserved gene neighborhoods	[104]
Functional Validation	VIGS vectors, RNAi constructs	Gene silencing for functional characterization	[2]

The study of synteny and macroevolution in NBS gene regions reveals a complex interplay between conservation and plasticity in plant genomes. Conserved syntenic blocks maintain core regulatory architectures and gene linkages over evolutionary timescales, while plastic regions serve as hotbeds for innovation through rapid duplication and diversification. The tension between these forces enables plants to maintain essential immune functions while retaining the capacity to adapt to new pathogen challenges.

In the context of diploid versus polyploid plants, allopolyploid species exhibit unique evolutionary dynamics, including asymmetric evolution of NBS genes from different subgenomes and the emergence of novel resistance specificities through interactions between homoeologous genes. These phenomena contribute to the enhanced disease resistance often observed in polyploid crops and provide opportunities for crop improvement through strategic manipulation of NBS gene repertoires.

Future research directions should include:

Pan-genome analyses to capture the full diversity of NBS genes across individuals and populations
Single-cell expression profiling of NBS genes to understand cell-type-specific defense responses
Advanced protein structure prediction to link sequence diversity with molecular recognition capabilities
Synthetic biology approaches to engineer novel resistance specificities by combining domains from different NBS genes

As genomic technologies continue to advance, our ability to decipher the complex evolutionary patterns of NBS genes will improve, enabling more precise manipulation of disease resistance traits in crop plants. The integration of comparative genomics, functional studies, and evolutionary analysis provides a powerful framework for developing durable disease resistance strategies that can withstand rapidly evolving pathogen populations.

Conclusion

The expansion of NBS disease resistance genes is a dynamic and complex process that is profoundly influenced by ploidy. While polyploidization provides raw genetic material for innovation, it does not guarantee uniform NBS gene expansion; evolutionary trajectories are shaped by lineage-specific duplications, rapid gene loss, and transcriptional rewiring. Diploid species can harbor immense NBS families through tandem duplication, whereas polyploids show diverse fates, ranging from gene retention and subgenome dominance to functional divergence. Future research must leverage long-read sequencing and single-cell transcriptomics to resolve haplotype-specific NBS expression in polyploids. For biomedical and clinical research, understanding how plants balance expanded immune gene repertoires against autoimmunity risks offers a valuable evolutionary model for studying gene family regulation and the development of synthetic immune systems.