This article provides a comprehensive analysis of the evolution of Nucleotide-Binding Site (NBS) domain genes, the largest class of plant resistance (R) genes. We explore the foundational evolutionary trajectory of these genes from early land plants to angiosperms, highlighting major diversification events and lineage-specific adaptations. The piece details cutting-edge methodologies for NBS gene identification, from traditional HMM-based searches to novel deep learning tools, and addresses key challenges in their study, including annotation difficulties and transcriptional regulation. Finally, we present validation techniques and comparative genomic insights that reveal the functional roles of specific NBS genes and discuss the emerging implications of this knowledge for disease resistance breeding and its unexpected connections to biomedical research, particularly in understanding immune receptor functions.
The evolutionary history of land plants is marked by their continuous adaptation to a pathogen-rich environment. Central to this adaptation is the expansion and diversification of intracellular immune receptors encoded by the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family. These genes, which constitute the largest class of plant disease resistance (R) genes, have undergone remarkable genomic changes from the early non-vascular plants to the diverse flowering plants we see today. This whitepaper traces the trajectory of NBS-LRR gene expansion, leveraging recent genomic studies to quantify this phenomenon and explore its functional implications for plant immunity. The investigation of these genes is not merely an academic exercise; it provides a fundamental resource for understanding the molecular basis of disease resistance and informs future crop breeding strategies [1] [2].
The NBS-LRR gene family originated in the common ancestor of all green plants, with early divergence into different subclasses [1]. However, the scale of this gene family differs dramatically across the plant kingdom.
Table 1: Genomic Content of NBS-LRR Genes Across Representative Plant Species
| Plant Species | Group | Total NBS-LRR Genes | TNL | CNL | RNL | Key Reference |
|---|---|---|---|---|---|---|
| Physcomitrella patens | Moss | ~25 | Not Specified | Not Specified | Not Specified | [3] |
| Selaginella moellendorffii | Lycophyte | ~2 | Not Specified | Not Specified | Not Specified | [3] |
| Arabidopsis thaliana | Eudicot | 149-159 | 94-98 | 50-55 | (Included in total) | [4] [5] |
| Euryale ferox (Basal Angiosperm) | Angiosperm | 131 | 73 | 40 | 18 | [1] |
| Oryza sativa (Rice) | Monocot | 553-653 | 0 | 553-653 | (Included in total) | [4] |
| Secale cereale (Rye) | Monocot | 582 | 0 | 581 | 1 | [6] |
| Glycine max (Soybean) | Eudicot | 319 | Not Specified | Not Specified | Not Specified | [4] |
| Manihot esculenta (Cassava) | Eudicot | 228 | 34 | 128 | Not Specified | [2] |
The expansion of NBS-LRR genes has not been uniform across all plant lineages. Independent gene duplication and loss events have resulted in distinct evolutionary patterns, even among closely related species.
Angiosperm NBS-LRR genes are phylogenetically divided into three major subclasses, each with distinct structural and functional characteristics [6].
Table 2: Characteristics of the Major NBS-LRR Subclasses in Angiosperms
| Feature | TNL Subclass | CNL Subclass | RNL Subclass |
|---|---|---|---|
| N-Terminal Domain | TIR (Toll/Interleukin-1 Receptor) | CC (Coiled-Coil) | RPW8 (Resistance to Powdery Mildew 8) |
| Primary Function | Pathogen recognition ("Sensor") | Pathogen recognition ("Sensor") | Signal transduction ("Helper") |
| Presence in Monocots | Absent | Predominant | Rare |
| Presence in Dicots | Widespread | Widespread | Widespread |
| Key Signaling Component | EDS1 | Often NRC proteins | ADR1/NRG1 |
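The subclass assignment summarized in Table 2 can be sketched as a small rule-based classifier. This is a minimal illustration, not a published tool: the function name `classify_nbs_lrr`, the domain labels (`"TIR"`, `"CC"`, `"RPW8"`, `"NBS"`, `"LRR"`), and the `"unclassified"` fallback are all assumptions chosen for the example, and the input is assumed to be a list of domain labels already ordered from N-terminus to C-terminus.

```python
def classify_nbs_lrr(domains):
    """Assign a subclass from an N-to-C ordered list of domain labels.

    Labels are hypothetical shorthand for domain-search results:
    "TIR", "CC", "RPW8", "NBS", "LRR".
    """
    if "NBS" not in domains:
        return "not NBS"
    nbs_index = domains.index("NBS")
    n_terminal = domains[:nbs_index]        # everything before the NBS domain
    c_terminal = domains[nbs_index + 1:]    # everything after the NBS domain
    if "LRR" not in c_terminal:
        return "unclassified"               # e.g. truncated TN or N-only genes
    if "TIR" in n_terminal:
        return "TNL"
    if "RPW8" in n_terminal:                # checked before CC: RPW8 defines RNL
        return "RNL"
    if "CC" in n_terminal:
        return "CNL"
    return "unclassified"


examples = {
    "gene1": ["TIR", "NBS", "LRR"],
    "gene2": ["CC", "NBS", "LRR"],
    "gene3": ["RPW8", "NBS", "LRR"],
    "gene4": ["NBS", "LRR"],
}
subclasses = {gene: classify_nbs_lrr(d) for gene, d in examples.items()}
print(subclasses)
```

Testing for RPW8 before CC is a design choice for this sketch, so that a gene carrying both annotations is reported as a helper (RNL) rather than a sensor.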
A hallmark of NBS-LRR genes is their non-random genomic distribution, which has profound implications for their evolution.
Figure 1: Evolutionary Pathways of NBS-LRR Gene Expansion and Diversification. The diagram illustrates the divergence from an ancestral gene into major subclasses, followed by duplication mechanisms that create genomic clusters where recombination drives the evolution of new pathogen recognition capabilities.
A standard pipeline for the genome-wide identification and characterization of NBS-LRR genes has been established and refined across multiple studies [1] [2] [6]. The following protocol details the key steps.
Step 1: Data Acquisition
Step 2: Initial Candidate Identification using HMMER and BLAST
The hmmsearch tool from the HMMER package is typically used with a relaxed E-value threshold (e.g., 1.0) to cast a wide net [2] [6].
Step 3: Domain Verification and Subclassification
Candidate domains are verified with hmmscan against the Pfam database (E-value < 0.0001) or the NCBI Conserved Domain Database (CDD) [1] [6].
Step 4: Analysis of Genomic Distribution and Duplication
Step 5: Phylogenetic and Evolutionary Analysis
Figure 2: Experimental Workflow for Genome-Wide Identification of NBS-LRR Genes. The flowchart outlines the bioinformatic pipeline from data acquisition to final profile generation, highlighting key steps of candidate identification, domain verification, and evolutionary analysis.
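The E-value filtering used in candidate identification and domain verification can be sketched as follows. The example assumes HMMER's whitespace-delimited per-target table (`--tblout`), in which the first column is the target name and the fifth is the full-sequence E-value. The gene names and scores in `SAMPLE_TBLOUT` are invented toy data, and the helper names `parse_tblout` and `filter_hits` are hypothetical. Both thresholds are applied to the same toy table purely for illustration; in the protocol described above, the stringent cut-off belongs to a separate hmmscan run.

```python
# Toy stand-in for an hmmsearch --tblout file (values are invented).
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias
geneA  -  NB-ARC  PF00931.25  1.2e-45  150.3  0.1  2.0e-45  149.6  0.1  1.0  1  0  0  1  1  1  1  -
geneB  -  NB-ARC  PF00931.25  0.5      10.2   0.0  0.7      9.8    0.0  1.2  1  0  0  1  1  0  0  -
geneC  -  NB-ARC  PF00931.25  3.0e-06  25.0   0.0  4.0e-06  24.6   0.0  1.1  1  0  0  1  1  1  1  -
"""


def parse_tblout(text):
    """Return {target_name: best full-sequence E-value} from tblout text."""
    hits = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                        # skip blank and comment lines
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        hits[target] = min(evalue, hits.get(target, float("inf")))
    return hits


def filter_hits(hits, max_evalue):
    """Keep targets whose best E-value is at or below the threshold."""
    return sorted(name for name, e in hits.items() if e <= max_evalue)


hits = parse_tblout(SAMPLE_TBLOUT)
candidates = filter_hits(hits, 1.0)    # relaxed threshold: cast a wide net
verified = filter_hits(hits, 1e-4)     # stringent threshold: domain verification
print(candidates, verified)
```

Keeping the best E-value per target means a gene is retained if any of its hits passes the threshold, which is one reasonable convention among several.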
Table 3: Essential Resources for NBS-LRR Gene Family Research
| Resource/Solution | Function/Application | Example Tools/Databases |
|---|---|---|
| Genomic Databases | Source for genome sequences and annotations. | Phytozome, Ensembl Plants, NCBI Genome, GDR (Genome Database for Rosaceae) [7] [2] |
| HMMER Suite | Profile HMM-based sequence search for identifying NBS domains. | hmmsearch, hmmscan (Pfam model PF00931) [2] [6] |
| Domain Analysis Tools | Verification of protein domains and classification. | NCBI CDD, Pfam, Paircoil2 (for CC domains), MEME (for motif discovery) [1] [2] [6] |
| Orthology Analysis Software | Inferring gene families and evolutionary relationships. | OrthoFinder, DendroBLAST [3] |
| Phylogenetic Software | Reconstructing evolutionary history and ancestral states. | IQ-TREE (ModelFinder, UFBoot), MEGA, FastTree [1] [6] [3] |
| Expression Databases | Profiling gene expression under various conditions. | IPF Database, CottonFGD, NCBI BioProject [3] |
| Functional Validation Tools | Testing gene function in planta. | Virus-Induced Gene Silencing (VIGS) [3] |
The journey of NBS-LRR genes from compact repertoires in bryophytes to expansive, diverse families in angiosperms underscores their pivotal role in the evolutionary arms race between plants and their pathogens. This expansion, driven by varied duplication mechanisms and refined by natural selection, has equipped angiosperms with a sophisticated and adaptable immune system. The distinct evolutionary patterns observed across plant lineages—including contraction in specialized aquatic and parasitic species—reveal a complex interplay between genomic content, ecological adaptation, and life history. The continued application of standardized genomic and bioinformatic protocols, as outlined in this whitepaper, will be crucial for further elucidating the function of specific NBS-LRR genes. Ultimately, this knowledge serves as a cornerstone for future efforts in crop improvement and sustainable agriculture, enabling the development of disease-resistant plant varieties through informed breeding and biotechnological approaches.
The evolutionary history of Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) receptors reveals fundamental insights into plant immunity mechanisms that have diverged along monocot and dicot lineages. These intracellular immune receptors form a critical component of the plant innate immune system, enabling recognition of diverse pathogens through effector-triggered immunity [2]. The NBS-LRR gene family has undergone substantial lineage-specific evolution, culminating in the striking absence of entire receptor subclasses in certain plant families [9] [10]. This whitepaper examines the molecular basis and evolutionary implications of these divergent adaptations, focusing specifically on the loss of TNL genes in monocots and the subsequent functional diversification in both monocot and dicot lineages. Understanding these evolutionary trajectories provides crucial insights for plant immunity research and crop enhancement strategies.
The evolutionary split between monocots and dicots represents a fundamental divergence in angiosperm history, estimated to have occurred approximately 200 million years ago (with an uncertainty of about 40 million years) based on chloroplast DNA sequence analysis [11]. This temporal framework establishes the timeline for subsequent lineage-specific adaptations in immune gene families. Genomic analyses consistently place Acorales as the sister lineage to all other extant monocots, making it a critical taxon for understanding early monocot evolution and genomic architecture [12].
NLR genes encode a pivotal class of plant immune receptors that have undergone dynamic evolution through gene duplication, loss, and diversification. A novel classification system for angiosperm NLR genes, grounded in network analysis of microsynteny information, categorizes these genes into five distinct classes: CNL_A, CNL_B, CNL_C, TNL, and RNL [9]. This refined classification reveals the complex evolutionary history of NLR genes beyond the traditional grouping, enabling more precise tracking of lineage-specific adaptations.
Table 1: Classification of Plant NLR Genes
| Class | N-Terminal Domain | Distribution | Characteristics |
|---|---|---|---|
| TNL | TIR (Toll/Interleukin-1 Receptor) | Dicots only | Lost in monocots; specific signaling pathway |
| CNL_A | Coiled-Coil (CC) | Angiosperms | Further subdivided in new classification |
| CNL_B | Coiled-Coil (CC) | Angiosperms | Monocot-specific expansions |
| CNL_C | Coiled-Coil (CC) | Angiosperms | Distinct evolutionary trajectory |
| RNL | CC or other | Angiosperms | Helper NLRs |
The NBS domain itself can be divided into two major groups based on phylogenetic analysis. Group I NBS domains contain group-specific motifs that are always linked with TIR sequences at the N-terminus, while Group II NBS domains are always associated with putative coiled-coil domains in their N-terminus [10]. This fundamental division reflects deep evolutionary divergence in plant immune signaling pathways.
Diagram 1: Evolutionary trajectory of NLR genes following monocot-dicot divergence
Comprehensive genomic analyses have revealed that TNL family genes are conspicuously absent in monocot genomes [9] [10]. This pattern was initially identified through phylogenetic reconstruction of NBS domains, which demonstrated that Group I NBS domains (always associated with TIR sequences) are widely distributed in dicot species but undetectable in cereal genomes [10]. Experimental evidence further confirmed that Group I-specific NBS sequences could be readily amplified from dicot genomic DNA but not from cereal genomic DNA [10].
Recent synteny-informed classification provides a model explaining this extinction event, with compelling microsynteny evidence indicating a clear correspondence between non-TNLs in monocots and the extinct TNL subclass [9]. This suggests that specific genomic regions in monocots have undergone fundamental reorganization following TNL loss.
The loss of TNL genes in monocots represents a significant evolutionary event that has shaped subsequent immune receptor evolution. This extinction has potentially driven:
The absence of TNLs in monocots implies that their cognate signaling pathways have diverged from those in dicots, suggesting fundamental differences in how these major plant lineages perceive and respond to pathogens [10].
NLR genes typically exist in large multigene families and are often organized in genomic clusters, which facilitates their rapid evolution through recombination and gene conversion [13] [2]. Studies across multiple plant species have revealed that these clusters vary in size and complexity, with most containing closely related genes derived from recent common ancestors [2].
Two distinct patterns of evolution have been identified among NBS-LRR genes: Type I genes are often represented by multiple paralogs in a genome and evolve rapidly with frequent gene conversions, while Type II genes typically have fewer paralogs, evolve slowly, and experience rare gene conversion events [13]. This differential evolutionary rate contributes to the diverse repertoire of pathogen recognition capabilities across plant lineages.
Plants have evolved sophisticated regulatory mechanisms to control NBS-LRR gene expression, as high expression levels can be lethal to plant cells [13]. Diverse microRNAs (miRNAs) target NBS-LRRs in both eudicots and gymnosperms, creating a tight association between NBS-LRR diversity and miRNA regulation [13].
Table 2: miRNA Families Targeting NBS-LRR Genes
| miRNA Family | Target Site | Distribution | Evolutionary Origin |
|---|---|---|---|
| miR482/2118 | P-loop region | Gymnosperms to dicots | Prior to angiosperms |
| miR472 | Multiple sites | Specific lineages | Younger, lineage-specific |
| miR6019 | TIR-NBS-LRR | Dicots | Recent evolution |
| miR6020 | TNL genes | Dicots | Recent evolution |
The miRNAs typically target highly duplicated NBS-LRRs, while heterogeneous NBS-LRR families are rarely targeted by miRNAs in Poaceae and Brassicaceae genomes [13]. This suggests lineage-specific co-evolution between miRNAs and their NBS-LRR targets. New miRNAs periodically emerge from duplicated NBS-LRRs from different gene families, with most targeting the same conserved, encoded protein motif of NBS-LRRs, consistent with a model of convergent evolution [13].
Comprehensive identification of NBS-LRR genes in plant genomes involves multiple bioinformatic approaches:
These methods have been successfully applied to catalog NBS-LRR genes in numerous plant species, including cassava, where 228 NBS-LRR type genes and 99 partial NBS genes were identified, representing almost 1% of the total predicted genes [2].
Reconstructing evolutionary relationships among NLR genes requires:
These analyses have revealed that 63% of R genes in cassava occur in 39 clusters on chromosomes, with most clusters being homogeneous and containing NBS-LRRs derived from a recent common ancestor [2].
Diagram 2: Workflow for NLR gene identification and evolutionary analysis
Table 3: Essential Research Materials for NLR Gene Studies
| Reagent/Resource | Function/Application | Example Specifications |
|---|---|---|
| HMMER Suite | Hidden Markov Model searches for domain identification | v3 with cassava-specific NBS HMM (E-value < 0.01) |
| Pfam Databases | Conserved domain identification | NBS (NB-ARC) PF00931, TIR PF01582, LRR models |
| ClustalW | Multiple sequence alignment | Default parameters for NB-ARC domain alignment |
| MEGA Software | Phylogenetic tree estimation | Maximum Likelihood, Whelan and Goldman + freq. model |
| Paircoil2 | Coiled-coil domain prediction | P-score cut-off of 0.03 |
| Jalview | Alignment curation and visualization | Manual curation of poorly aligned regions |
| Phytozome | Genome annotations and resources | Cassava v4.1/v5.0 genome data |
The lineage-specific adaptations in plant NLR genes, particularly the loss of TNLs in monocots and their retention and diversification in dicots, exemplify the dynamic nature of plant immune system evolution. These divergent evolutionary trajectories have resulted in fundamentally different immune receptor repertoires between these major angiosperm lineages, with significant implications for plant-pathogen co-evolution. The emerging understanding of these patterns, facilitated by advanced genomic analyses and synteny-informed classification, provides crucial insights for future crop improvement strategies and enhances our fundamental knowledge of plant immunity evolution. Further comparative analyses across diverse plant lineages will continue to reveal the intricate interplay between genomic architecture, regulatory mechanisms, and pathogen pressure that has shaped the evolution of these critical immune receptors.
Plant immunity relies on a sophisticated innate immune system that actively protects against pathogen invasion [14]. A crucial component of this system involves intracellular immune receptors known as Nucleotide-binding and Leucine-rich Repeat (NLR) proteins, which mediate effector-triggered immunity (ETI) upon pathogen recognition [14] [15]. NLRs function as molecular switches that perceive pathogen effector proteins and initiate robust immune responses, typically accompanied by programmed cell death termed the hypersensitive response [14]. The structural classification and domain architecture diversity of these NLR proteins have evolved through constant evolutionary arms races with rapidly adapting pathogens, resulting in tremendous genetic innovation that makes NLR-encoding genes among the most diverse in plant genomes [14]. This technical guide examines the structural principles governing NLR function, their evolutionary trajectories across plant species, and the experimental frameworks for their investigation, providing a comprehensive resource for researchers studying plant immunity.
NLR proteins exhibit a conserved tripartite modular domain architecture that classifies them as STAND (Signal Transduction ATPases with Numerous Domains) proteins [14]. This canonical architecture consists of:
Table 1: Core Domains of Plant NLR Proteins
| Domain | Structural Type | Primary Function | Key Features |
|---|---|---|---|
| N-terminal | Signaling domain | Mediates cell death response | Determines NLR classification; most variable region |
| NB-ARC | Nucleotide-binding domain | Molecular switch via ADP/ATP exchange | Conserved in plants; controls activation state |
| LRR | Superstructure-forming repeats | Pathogen recognition & autoinhibition | Hypervariable; under positive selection |
The N-terminal signaling domains form the basis for classifying NLRs into distinct categories, with these classifications following the phylogeny of the NB-ARC domain, indicating a deep evolutionary origin [14]. Four main N-terminal domain types have been characterized in angiosperms:
In non-flowering plants, NLRs can carry additional N-terminal domain types, including α/β hydrolases and kinase domains, revealing even greater architectural diversity beyond flowering plants [14]. The recently generated RefPlantNLR collection of almost 500 experimentally validated NLRs illustrates the extensive structural diversity within this protein family [14].
Beyond the canonical tripartite structure, many NLRs have diversified into specialized proteins with additional non-canonical domains or degenerated features [14]. These include:
Figure 1: Domain Architecture Diversity in Plant NLR Proteins. NLRs display both canonical tripartite structures and various non-canonical variants with truncated forms, integrated domains, or degenerated features.
NLR genes represent one of the most dynamic and rapidly evolving gene families in plants, showing remarkable variation in copy number across species [14] [3]. Comparative genomic analyses reveal:
Large-scale comparative studies have identified 12,820 NBS-domain-containing genes across 34 species spanning from mosses to monocots and dicots, classified into 168 distinct domain architecture classes [3]. This comprehensive analysis reveals both classical patterns (NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR, etc.) and species-specific structural configurations (TIR-NBS-TIR-Cupin1-Cupin1, TIR-NBS-Prenyltransf, Sugar_tr-NBS, etc.) [3].
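Grouping genes into domain architecture classes amounts to ordering each gene's domain hits along the protein and joining the labels into a string such as `TIR-NBS-LRR`. The sketch below illustrates that idea only; it is not the method of the cited study. The function name `architecture`, the gene identifiers, and the coordinates are invented. Every hit is kept, including repeats, which is an assumption made so that architectures such as `TIR-NBS-TIR-Cupin1-Cupin1` can be represented.

```python
from collections import Counter


def architecture(domain_hits):
    """Build an architecture string from (start_position, domain_name) hits."""
    ordered = [name for _, name in sorted(domain_hits)]  # N- to C-terminal order
    return "-".join(ordered)


# Toy domain hits per gene (positions and labels are illustrative only).
genes = {
    "g1": [(10, "TIR"), (200, "NBS"), (600, "LRR")],
    "g2": [(180, "NBS"), (5, "TIR")],                    # unsorted on purpose
    "g3": [(15, "TIR"), (210, "NBS"), (580, "LRR")],
    "g4": [(30, "NBS")],
}

class_counts = Counter(architecture(hits) for hits in genes.values())
for arch, n in class_counts.most_common():
    print(arch, n)
```

The number of distinct keys in `class_counts` corresponds to the number of architecture classes in the toy data set.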
The dramatic expansion and diversification of NLR genes primarily result from three molecular mechanisms:
These duplication events are followed by intense positive selection, particularly in the LRR domains, enabling continuous adaptation to evolving pathogen effectors [15]. The "arms race" with pathogens subjects NLR genes to strong diversifying selection, resulting in rapid coevolution and neo-functionalization [15].
Table 2: Evolutionary Mechanisms Driving NLR Diversity in Plants
| Mechanism | Prevalence | Impact on NLR Repertoire | Examples |
|---|---|---|---|
| Tandem Duplication | Primary driver | Creates local clusters; rapid new allele generation | 53/288 NLRs in pepper [15] |
| Segmental Duplication | Significant contributor | Distributes NLRs across chromosomes | Widespread in eudicots [3] |
| Retrotransposition | Less common | Disperses NLRs throughout genome | Limited documentation |
| Positive Selection | Widespread in LRR domains | Enhances effector recognition | Hypervariable LRR regions [15] |
Beyond functioning as singleton receptors, NLRs increasingly operate in genetically linked pairs or complex networks with functionally specialized components [14] [18]. In these higher-order configurations:
The rice NLR pair Pik-1 and Pik-2 exemplifies this functional specialization, where Pik-1 acts as a sensor that binds AVR-Pik effectors through an integrated HMA domain, while Pik-2 functions as a helper NLR required for immune signaling activation [17]. This cooperative system demonstrates exquisite specificity, where matching pairs of allelic Pik NLRs mount effective immune responses, while mismatched pairs lead to autoimmune phenotypes [17].
Paired NLRs display diverse genomic architectures with varying functional constraints:
Figure 2: Evolution of NLR Systems from Singletons to Pairs and Networks. NLRs can function as individual receptors, specialized pairs, or complex networks with many-to-one and one-to-many sensor-helper connections.
Comparative analyses reveal significant variation in NLR abundance and architecture across plant families, independent of genome size [19] [3]:
Phylogenetic evidence indicates that NLR genes originated alongside their host species and underwent adaptive evolution that facilitated global colonization [20]. Several key evolutionary patterns emerge:
Table 3: NLR Repertoire Size Variation Across Plant Species
| Plant Species | Family | NLR Count | Notable Features |
|---|---|---|---|
| Arabidopsis thaliana | Brassicaceae | ~150 | Model for NLR studies [15] |
| Oryza sativa (rice) | Poaceae | ~500 | Well-characterized pairs [15] [17] |
| Capsicum annuum (pepper) | Solanaceae | 288 | High density on Chr09 [15] |
| Triticum aestivum (wheat) | Poaceae | >1,000 | High NLR content [14] |
| Asparagus officinalis | Asparagaceae | 27 | Domesticated reduction [16] |
| Asparagus setaceus | Asparagaceae | 63 | Wild relative with expanded repertoire [16] |
| Citrus species (average) | Rutaceae | ~160 | Diverse architectures [20] |
Comprehensive identification of NLR genes requires integrated computational approaches:
These methods typically identify candidate sequences containing NB-ARC domains, which are then validated for presence/completeness of N-terminal (TIR, CC, RPW8) and C-terminal (LRR) domains [15].
Multiple computational frameworks support evolutionary and functional characterization:
Several experimental methods confirm NLR function:
Table 4: Essential Research Reagents and Resources for NLR Studies
| Resource Category | Specific Tools | Function/Application | Key Features |
|---|---|---|---|
| Bioinformatics Tools | HMMER v3.3.2 [15] | Domain-based NLR identification | Hidden Markov Model searches |
| Bioinformatics Tools | NLR-Annotator v2.1 [20] | Automated NLR annotation | Standardized classification |
| Bioinformatics Tools | OrthoFinder v2.2.7 [16] | Orthogroup analysis | Evolutionary relationships |
| Bioinformatics Tools | MCScanX [15] | Synteny and duplication analysis | Identifies gene pairs and clusters |
| Databases | RefPlantNLR [14] | Curated NLR collection | ~500 experimentally validated NLRs |
| Databases | Pfam (PF00931) [15] | NB-ARC domain reference | Core domain identification |
| Databases | PlantCARE [15] [16] | cis-element prediction | Promoter analysis |
| Experimental Resources | VIGS vectors [3] | Functional validation | Gene silencing in plants |
| Experimental Resources | STRING database [15] | Protein interaction prediction | PPI network mapping |
| Experimental Resources | PhytoPath [15] | Pathogen effector data | Effector-NLR interaction studies |
The structural classification and domain architecture diversity of plant NLRs reflect continuous evolutionary innovation driven by plant-pathogen arms races. From canonical tripartite structures to specialized pairs and complex networks, NLR proteins exhibit remarkable architectural flexibility that enables specific pathogen recognition and robust immune activation. The evolutionary mechanisms of tandem duplication, positive selection, and domain integration have generated extensive NLR diversity across plant lineages, while maintaining core signaling functions through conserved NB-ARC domains. Methodological advances in genomic identification, phylogenetic analysis, and functional validation continue to reveal new dimensions of NLR structural diversity, providing insights for engineering disease resistance in crop species. Future research elucidating the structure-function relationships of non-canonical NLR architectures and their higher-order assemblies will further advance our understanding of plant immunity and its evolution.
In the ongoing evolutionary arms race between plants and their pathogens, resistance (R) genes represent a critical line of defense. Among these, genes containing Nucleotide-Binding Site (NBS) and Leucine-Rich Repeat (LRR) domains form the largest and most extensively studied family, playing a pivotal role in pathogen recognition and activation of immune responses [21] [22]. The evolution of these NBS-LRR genes is characterized by extraordinary diversification, driven primarily by tandem gene duplications and the formation of genetically linked gene clusters [21] [22]. This dynamic genomic architecture enables plants to rapidly generate novel recognition specificities, allowing them to keep pace with evolving pathogenic threats. The NBS domain, highly conserved and responsible for ATP/GTP binding and hydrolysis, serves as a molecular switch for immune signaling, while the hypervariable LRR domain determines pathogen recognition specificity [22]. This review synthesizes current understanding of how tandem duplications and gene cluster organization have shaped the evolutionary trajectory of NBS domain genes in land plants, providing a framework for future research and biotechnological applications.
Genomic studies across plant species have revealed consistent patterns in the distribution and organization of NBS-LRR genes. These genes are frequently organized into complex clusters within plant genomes, with significant variation in cluster size, composition, and chromosomal distribution.
Table 1: Genomic Distribution of NBS-LRR Genes and Clusters in Selected Plant Species
| Plant Species | Total NBS-LRR Genes Identified | Genes in Clusters (%) | Number of Clusters | Largest Cluster Size (Genes) | Chromosomal Distribution |
|---|---|---|---|---|---|
| Pepper (Capsicum annuum) | 252 | 54% (136 genes) | 47 | 8 genes (Chromosome 3) | All chromosomes, highest on Chr3 [22] |
| Barley (Hordeum vulgare) | 1,199 LDPRs* identified | Significant association | Data Not Specified | Data Not Specified | Primarily subtelomeric regions [23] |
| Arabidopsis (Arabidopsis thaliana) | 149 NBS-LRR genes | Data Not Specified | Data Not Specified | Data Not Specified | Genome-wide distribution [21] |
*LDPRs: Long-Duplication-Prone Regions
The pepper genome illustrates a typical organizational pattern, with 252 identified NBS-LRR genes unevenly distributed across all chromosomes [22]. Chromosome 3 emerges as a particular hotspot, containing the highest number of genes (38) and the largest cluster comprising eight genes [22]. Notably, 54% of all NBS-LRR genes in pepper reside within 47 physical clusters, underscoring the prevalence of this genomic arrangement [22]. Similarly, recent analysis of the exceptionally repetitive barley genome has identified 1,199 Long-Duplication-Prone Regions (LDPRs) that show statistically significant associations with pathogen defense genes, indicating that natural selection has favored lineages where arms-race genes fall within these duplication-prone genomic regions [23].
Table 2: Classification and Structural Diversity of NBS-LRR Genes in Pepper
| Gene Classification | Number of Genes | Percentage of Total | Key Structural Features |
|---|---|---|---|
| nTNL (non-TIR-NBS-LRR) | 248 | 98.4% | Dominant class in pepper; includes CC-NBS-LRR |
| TNL (TIR-NBS-LRR) | 4 | 1.6% | Minor class in pepper |
| Genes with CC domain | 48 | 19.0% | Facilitate protein-protein interactions |
| Genes lacking both CC and TIR domains | 200 | 79.4% | Highlight structural diversity |
| Gene Subclasses (Domain Structure) | 7 subclasses identified | - | N, NL, NLL, NN, NLN, NLNLN, TN |
The quantitative distribution reveals striking lineage-specific adaptations, with nTNL genes dramatically dominating over TNL genes in pepper (248 versus 4) [22]. This pattern reflects broader evolutionary trends observed across angiosperms, which show significant losses of TNL genes in monocots compared to dicots [22]. The structural classification further reveals six distinct nTNL subclasses based on domain architecture, with the NLNLN subclass represented by only a single gene, illustrating the diverse evolutionary trajectories possible within this gene family [22].
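As a quick consistency check, the percentages reported for pepper can be recomputed from the gene counts given in the tables (252 genes in total, 136 of them in clusters). The snippet below is only arithmetic on those published counts; the dictionary labels are shorthand chosen for this example.

```python
TOTAL_GENES = 252  # NBS-LRR genes reported for pepper

counts = {
    "nTNL": 248,
    "TNL": 4,
    "with CC domain": 48,
    "lacking CC and TIR": 200,
    "in clusters": 136,
}

# Share of the total, rounded to one decimal place as in the tables.
percentages = {label: round(100 * n / TOTAL_GENES, 1) for label, n in counts.items()}
for label, pct in percentages.items():
    print(f"{label}: {pct}%")

# The two nTNL sub-groups (with and without a CC domain) add up to the nTNL total.
assert counts["with CC domain"] + counts["lacking CC and TIR"] == counts["nTNL"]
```

The recomputed values agree with the tabulated 98.4%, 1.6%, 19.0%, and 79.4%, and with the 54% of genes located in clusters.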
Tandem gene duplications occur frequently through mechanisms such as non-allelic homologous recombination (NAHR) and replication slippage, creating arrays of closely related genes [23]. These duplication events provide the raw genetic material for evolutionary innovation through several pathways:
In barley, duplication-prone regions show a history of repeated long-distance dispersal to distant genomic sites, followed by local expansion by tandem duplication [23]. Often, the long tandemly duplicated motif differs between sites, suggesting these arise frequently throughout evolutionary history [23]. This dynamic creates a genomic environment where genes involved in arms races can form effectively cooperative associations with duplication-inducing sequences, representing an evolutionarily advantageous strategy at the lineage level [23].
The NBS-LRR gene family evolves through a birth-and-death process characterized by continuous cycles of gene duplication, functional diversification, and pseudogenization [21]. Strong positive selection acts primarily on the LRR domains, particularly on solvent-exposed residues involved in direct protein-protein interactions, reflecting continuous adaptation to recognize evolving pathogen effectors [21] [22]. This diversifying selection maximizes the repertoire of recognition specificities available to counter diverse pathogenic threats.
Gene clusters often include members from the same gene subfamily, but some clusters contain genes from different subfamilies, reflecting complex evolutionary histories [22]. In pepper, some clusters contain genes belonging to different subfamilies (CN, NL, and N) within the same cluster, indicating that non-homologous genes can become organized into functional units through genomic rearrangement [22]. This organizational pattern facilitates coordinated regulation and co-inheritance of functionally related genes, potentially enabling more rapid adaptive responses to pathogen pressure.
Experimental Workflow for NBS-LRR Gene Identification
The standard pipeline for comprehensive identification and characterization of NBS-LRR resistance genes involves multiple complementary approaches:
Sequence-Based Identification: Initial identification typically employs BLAST searches using known NBS domain sequences and Hidden Markov Model (HMM) searches against Pfam databases to identify conserved domains [22]. These searches target characteristic motifs including P-loop, RNBS-A, kinase-2, RNBS-B, RNBS-C, and GLPL motifs essential for ATP/GTP binding and resistance signaling [22].
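A rough first screen for one of these motifs, the P-loop, can be written as a regular expression. The pattern below is the generic Walker A consensus G-x(4)-G-K-[S/T], not an NBS-LRR-specific model. The sequence is an invented toy peptide and `find_p_loop` is a hypothetical helper name. A simple pattern like this will produce false positives and negatives compared with the profile- and motif-based tools named in the text.

```python
import re

# Generic Walker A / P-loop consensus: G, any four residues, G, K, then S or T.
P_LOOP = re.compile(r"G.{4}GK[ST]")


def find_p_loop(protein_seq):
    """Return (0-based start, matched residues) for each P-loop-like match."""
    return [(m.start(), m.group()) for m in P_LOOP.finditer(protein_seq.upper())]


toy_sequence = "MAAVGMGGVGKTTLAQ"   # invented peptide for illustration
print(find_p_loop(toy_sequence))
```

Matches are reported with 0-based start positions, so downstream code that expects 1-based protein coordinates would need to add one.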
Domain Architecture Analysis: Following identification, genes are classified based on their N-terminal domains using tools like COILS for coiled-coil domains and Pfam for TIR domains, categorizing them into TNL (TIR-NBS-LRR) or nTNL (non-TIR-NBS-LRR, including CNL) subfamilies [22].
Phylogenetic Reconstruction: A subset of conserved NBS domain sequences is selected for multiple sequence alignment and phylogenetic tree construction to elucidate evolutionary relationships and diversification patterns within the gene family [22].
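The domain architecture step above can be sketched as a simple lookup on the set of domains detected in each protein. This is a minimal illustration, not the cited pipeline: it assumes upstream tools (such as Pfam and COILS) have already produced domain labels, and the label strings "TIR", "CC", "NBS", "LRR" and both function names are hypothetical.

```python
def classify_nbs(domains):
    """Return a coarse architecture label (e.g. TNL, CNL, NL, CN, N) for one protein.

    domains: iterable of domain labels already detected upstream, using the
    hypothetical labels "TIR", "CC", "NBS", and "LRR".
    """
    found = set(domains)
    if "NBS" not in found:
        return None  # not an NBS-containing protein
    # Assumption: TIR takes precedence if both TIR and CC are reported.
    if "TIR" in found:
        prefix = "T"
    elif "CC" in found:
        prefix = "C"
    else:
        prefix = ""
    suffix = "L" if "LRR" in found else ""
    return prefix + "N" + suffix


def is_tnl_group(label):
    """True for TIR-containing labels; every other NBS label counts as nTNL."""
    return label is not None and label.startswith("T")


if __name__ == "__main__":
    for doms in (["TIR", "NBS", "LRR"], ["CC", "NBS", "LRR"], ["NBS", "LRR"], ["NBS"]):
        label = classify_nbs(doms)
        print(doms, label, "TNL group" if is_tnl_group(label) else "nTNL group")
```

Giving TIR precedence over CC is an arbitrary choice for the sketch; proteins reporting both domains would normally be flagged for manual inspection.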
Evolutionary Dynamics of Gene Clusters
The operational definition of a gene cluster varies, but typically involves two or more non-homologous genes in close genomic proximity that participate in a common biosynthetic or recognition pathway [24]. In practice, researchers often employ physical distance thresholds (e.g., genes within 200-500 kb) combined with functional relatedness criteria [22] [24]. Comparative genomic analyses across related species can further distinguish conserved clusters from lineage-specific arrangements, revealing evolutionary dynamics.
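A distance-threshold definition of this kind can be sketched as follows. The example is a minimal illustration under stated assumptions: genes are given as (gene_id, chromosome, start, end) tuples, the 200 kb default is taken from the lower bound of the range quoted above, and the function name and example coordinates are hypothetical. Functional relatedness criteria are not modeled.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=200_000, min_size=2):
    """Group genes on the same chromosome whose neighbors lie within max_gap bp.

    genes: iterable of (gene_id, chromosome, start, end) tuples.
    Returns a list of clusters, each a list of gene IDs in positional order.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        current, last_end = [], None
        for start, end, gene_id in sorted(by_chrom[chrom]):
            # Close the running group when the gap to the previous gene is too large.
            if current and start - last_end > max_gap:
                if len(current) >= min_size:
                    clusters.append(current)
                current = []
            last_end = max(last_end, end) if current else end
            current.append(gene_id)
        if len(current) >= min_size:
            clusters.append(current)
    return clusters


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    example = [
        ("g1", "chr1", 1_000, 5_000),
        ("g2", "chr1", 100_000, 104_000),
        ("g3", "chr1", 900_000, 905_000),
        ("g4", "chr2", 2_000, 6_000),
        ("g5", "chr2", 50_000, 53_000),
    ]
    print(find_clusters(example))
```

Singletons are dropped because a cluster is defined here as at least two genes; raising `max_gap` toward 500 kb merges more distant neighbors.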
Advanced genome assembly approaches are crucial for accurate characterization of these regions. Long-read sequencing technologies (PacBio SMRT, ONT) combined with chromosome conformation capture (Hi-C) techniques have dramatically improved the contiguity and completeness of genome assemblies, enabling resolution of complex repetitive regions characteristic of gene clusters [25]. The barley study (MorexV3 assembly) exemplifies how high-quality genome resources enable explicit testing of evolutionary hypotheses regarding duplication-selection dynamics [23].
Table 3: Essential Research Reagents and Resources for Resistance Gene Studies
| Reagent/Resource | Specific Examples | Application and Function |
|---|---|---|
| Genome Assemblies | Barley MorexV3, Pepper CM334 | Reference sequences for gene identification and synteny analysis [23] [22] |
| Software Tools | HMMER, Pfam, COILS, MEME, OrthoMCL | Domain identification, motif discovery, orthology assignment [22] |
| Sequencing Technologies | PacBio SMRT, Oxford Nanopore, Hi-C | Long-read sequencing for resolving repetitive regions; chromatin conformation for scaffolding [25] |
| Phylogenetic Software | MAFFT, MUSCLE, MrBayes, RAxML | Multiple sequence alignment and evolutionary inference [22] |
| Expression Analysis | RNA-seq, qPCR primers | Transcriptional profiling under pathogen challenge [22] |
The organization of NBS-LRR resistance genes into tandemly duplicated clusters represents a fundamental evolutionary strategy that enables land plants to maintain diverse and adaptable detection systems against rapidly evolving pathogens. The dynamic birth-and-death evolution observed in these gene families, driven by continuous cycles of duplication, diversification, and selection, creates a genomic environment conducive to rapid innovation in pathogen recognition.
Future research directions will likely focus on leveraging this understanding for crop improvement. The discovery that duplication-inducing elements effectively cooperate with arms-race genes suggests new approaches for targeted breeding or genome editing to enhance disease resistance [23]. As genomic technologies continue to advance, particularly in long-read sequencing and telomere-to-telomere assembly, our ability to resolve complex resistance gene clusters will improve, revealing new dimensions of plant-pathogen coevolution.
The comprehensive characterization of NBS-LRR gene clusters across diverse land plants will further illuminate the evolutionary principles governing immune gene diversification, potentially enabling predictive approaches to disease resistance breeding in agricultural systems.
The nucleotide-binding site (NBS) domain represents a fundamental component of plant immune receptors, constituting one of the largest and most diverse gene families in plant genomes. Within the context of land plant evolution, NBS-containing genes have undergone remarkable expansion and diversification, driven by constant evolutionary arms races with rapidly evolving pathogens [3]. These genes typically encode proteins containing a nucleotide-binding site domain and a leucine-rich repeat (LRR) domain, collectively known as NBS-LRR genes or NLR genes, which function as critical intracellular immune receptors responsible for recognizing pathogen effector proteins and initiating effector-triggered immunity (ETI) [26] [27].
The evolutionary history of NBS genes reveals a complex tapestry of gene duplication, loss, and divergence events that have shaped the resistance gene repertoire across different plant lineages. Recent studies have demonstrated that NBS genes originated in ancestral land plants, with bryophytes like Physcomitrella patens containing relatively small NLR repertoires of approximately 25 genes, while flowering plants have experienced substantial gene family expansion, resulting in hundreds to thousands of NBS genes [3]. This expansion has been facilitated by both whole-genome duplication (WGD) events and small-scale duplications (SSD), including tandem and segmental duplications [28] [3].
Phylogenetic analyses of NBS domains across diverse plant species have revealed distinct evolutionary patterns, with two major subclasses characterized by N-terminal Toll/Interleukin-1 Receptor (TIR) or coiled-coil (CC) domains, termed TNL and CNL genes, respectively [3] [27]. A third subclass containing RPW8 domains (RNL) has also been identified, primarily functioning in signal transduction within the immune system [3]. The comparative analysis of these NBS subfamilies across species boundaries provides crucial insights into both conserved evolutionary patterns and lineage-specific adaptations that have occurred throughout plant evolution.
The accurate identification of NBS-LRR genes within plant genomes requires a multi-step computational approach leveraging conserved protein domains and motif structures. The standard methodology involves:
Domain-Based HMMER Searches: Initial identification typically employs Hidden Markov Model (HMM) searches using HMMER software (v3.1b2 or later) with the PF00931 (NB-ARC) model from the PFAM database [28] [26]. This step identifies protein sequences containing the conserved nucleotide-binding domain. The search stringency is typically set with an E-value cutoff of 1 × 10⁻²⁰, though some studies apply less stringent thresholds (E-value < 0.01) followed by manual curation to identify divergent family members [26].
Domain Architecture Confirmation: Candidate sequences are subsequently scanned against additional domain databases to classify complete domain architectures. Key domains include:
Manual Curation and Validation: Automated predictions require manual verification to remove false positives (e.g., proteins with kinase domains but no NBS relationship) and to identify fragmented or misannotated genes through sequence extension and re-annotation [27]. This may involve extending gene models by 3 kb at both 5' and 3' ends to capture complete domain architectures.
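The flank extension mentioned in the curation step reduces to a small coordinate calculation. The sketch below assumes 1-based inclusive coordinates and a known chromosome length; the function name is hypothetical and the 3 kb default mirrors the value given above.

```python
def extend_gene_model(start, end, chrom_length, flank=3000):
    """Extend a gene model by `flank` bp on both sides, clamped to the chromosome.

    Coordinates are assumed to be 1-based and inclusive.
    """
    if not (1 <= start <= end <= chrom_length):
        raise ValueError("coordinates must satisfy 1 <= start <= end <= chrom_length")
    return max(1, start - flank), min(chrom_length, end + flank)


if __name__ == "__main__":
    print(extend_gene_model(10_000, 14_000, 1_000_000))  # (7000, 17000)
    print(extend_gene_model(1_500, 4_000, 5_000))        # (1, 5000)
```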
Based on domain architecture, NBS genes are classified into distinct subfamilies:
Comprehensive Classification System (8 categories):
Simplified Classification Systems:
Sequence Alignment: Multiple sequence alignment of the NB-ARC domain regions is performed using MUSCLE v3.8.31 or MAFFT 7.0 with default parameters [28] [3]. The NB-ARC domain is typically extracted by counting 250 amino acids after the P-loop motif, and sequences covering less than 90% of the full-length NB-ARC domain are excluded from analysis [26].
Tree Construction: Phylogenetic trees are typically inferred using maximum likelihood methods implemented in MEGA11 or FastTreeMP, with 1000 bootstrap replicates to assess node support [28] [3]. The Whelan and Goldman (WAG) + frequency substitution model is commonly employed; some studies instead use the Neighbor-Joining method with the Nei-Gojobori model [26] [28].
Orthogroup Analysis: OrthoFinder v2.5.1 with DIAMOND for sequence similarity searches and MCL clustering algorithm can identify orthogroups across multiple species, differentiating between core (conserved) and unique (lineage-specific) orthogroups [3].
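The NB-ARC extraction rule described under Sequence Alignment above can be sketched as follows. Two points are assumptions rather than details from the cited studies: the P-loop is located with the generic Walker A consensus `G.{4}GK[ST]`, and the 250-residue window is counted from the first residue of the motif. The function name is hypothetical.

```python
import re

# Assumption: generic Walker A (P-loop) consensus; published pipelines may use
# a plant-specific motif definition instead.
P_LOOP = re.compile(r"G.{4}GK[ST]")


def extract_nb_arc(seq, length=250, min_fraction=0.9):
    """Return the putative NB-ARC region, or None if it is missing or too short.

    The window starts at the first P-loop residue and spans up to `length`
    amino acids; regions shorter than min_fraction * length are rejected.
    """
    match = P_LOOP.search(seq.upper())
    if match is None:
        return None
    region = seq[match.start():match.start() + length]
    if len(region) < min_fraction * length:
        return None
    return region


if __name__ == "__main__":
    full = "M" * 10 + "GMGGVGKT" + "A" * 300   # synthetic sequence, not real data
    truncated = "GMGGVGKT" + "A" * 100
    print(len(extract_nb_arc(full)))           # 250
    print(extract_nb_arc(truncated))           # None
```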
Figure 1: Computational workflow for identification and phylogenetic analysis of NBS gene families in plants.
The number of NBS genes exhibits remarkable variation across plant species, reflecting differential evolutionary pressures and diversification histories. Recent comparative analyses of 34 plant species identified 12,820 NBS-domain-containing genes, which were classified into 168 distinct domain architecture classes, revealing both conserved structural patterns and species-specific innovations [3].
Table 1: NBS-LRR Gene Distribution Across Selected Plant Species
| Plant Species | Total NBS Genes | TNL Genes | CNL Genes | Other/Partial | Genome Reference |
|---|---|---|---|---|---|
| Akebia trifoliata | 73 | Not specified | Not specified | Not specified | [28] |
| Dioscorea rotundata | 167 | Not specified | Not specified | Not specified | [28] |
| Vitis vinifera | 352 | Not specified | Not specified | Not specified | [28] |
| Triticum aestivum | 2,151 | Not specified | Not specified | Not specified | [28] |
| Manihot esculenta (cassava) | 327 | 34 | 128 | 165 other/partial | [26] |
| Solanum tuberosum (potato) | 438 | 77 | 107 | 254 partial/other | [27] |
| Nicotiana tabacum | 603 | 73 | 224 | 306 other | [28] |
| Nicotiana sylvestris | 344 | 42 | 130 | 172 other | [28] |
| Nicotiana tomentosiformis | 279 | 40 | 112 | 127 other | [28] |
| Arabidopsis thaliana | ~150 | Not specified | Not specified | Not specified | [27] |
The data reveal substantial variation in NBS gene numbers, with early diverging land plants like the bryophyte Physcomitrella patens containing approximately 25 NLR genes, while angiosperm species typically possess hundreds to thousands of NBS genes [3]. Notably, the asterid species Solanum tuberosum (potato) contains 438 NB-LRR genes, while the closely related Nicotiana tabacum possesses 603 NBS genes, illustrating lineage-specific expansions even within the same family [28] [27].
NBS genes are frequently organized in clusters throughout plant genomes, a genomic architecture that facilitates rapid evolution through mechanisms such as unequal crossing over and gene conversion. In cassava, 63% of the 327 identified NBS-LRR genes occur in 39 clusters distributed across the chromosomes, with most clusters being homogeneous (containing NBS-LRRs derived from a recent common ancestor) [26]. Similarly, in potato, the majority of the 438 predicted NB-LRR genes are physically organized within 63 identified clusters, with 50 being homogeneous [27].
This clustering pattern is conserved across plant lineages, though cluster composition and complexity vary. Homogeneous clusters typically contain closely related genes of the same type (e.g., all TNL or all CNL), while heterogeneous clusters contain phylogenetically distant NBS-LRR genes, sometimes including both TNL and CNL genes [26]. The preferential location of NBS genes in clusters is thought to facilitate the generation of novel resistance specificities through recombination and diversifying selection.
The expansion of NBS gene families has been driven by multiple duplication mechanisms, with varying contributions across plant lineages:
Whole-Genome Duplication (WGD): Paleopolyploidization events have contributed significantly to NBS gene family expansion. The Solanum lineage has experienced two consecutive genome triplications: one ancient event shared with rosids and a more recent one specific to this lineage [30]. These triplications established the genomic context for neofunctionalization of genes controlling various traits, including disease resistance components.
Small-Scale Duplications (SSD): Tandem duplications represent a major mechanism for NBS gene expansion, particularly in response to pathogen pressure. Their relative contribution varies by lineage, however: comparative analyses of Nicotiana species attributed much of the NBS gene family expansion to whole-genome duplication, with 76.62% of NBS members in allotetraploid N. tabacum traceable to the parental genomes (N. sylvestris and N. tomentosiformis) [28].
Birth-and-Death Evolution: NBS gene families evolve through a process of birth-and-death evolution, where new genes are created by duplication and some duplicates are maintained while others are deleted or become pseudogenes [3]. This dynamic process generates substantial interspecific variation in NBS gene content and organization.
The evolution of NBS genes is characterized by contrasting selection pressures acting on different protein domains:
Diversifying Selection: LRR domains involved in pathogen recognition typically experience positive selection that increases polymorphism at specific residues, facilitating recognition of evolving pathogen effectors [26]. This diversifying selection is particularly pronounced in solvent-exposed residues of the LRR domain that directly interact with pathogen proteins.
Purifying Selection: The NBS domain responsible for nucleotide binding and activation signaling is predominantly under purifying selection that conserves structural and functional integrity [26]. Similarly, signaling domains such as TIR and CC domains experience stronger evolutionary constraints.
Lineage-Specific Selection Patterns: Comparative analyses between tomato and potato identified 18,320 orthologous gene pairs, with 138 (0.75%) showing significantly higher than average non-synonymous versus synonymous substitution rate ratios (ω), indicating diversifying selection, while 147 (0.80%) showed significantly lower than average ω, indicating purifying selection [30]. The proportions of genes under diversifying selection were higher than those observed in grass species, suggesting distinct evolutionary dynamics in Solanaceae.
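The proportions quoted for the tomato-potato comparison follow directly from the reported counts, and ω itself is simply the ratio of non-synonymous to synonymous substitution rates. The short sketch below reproduces that arithmetic; the function names are hypothetical, and the statistical test used in the cited study to call ω "significantly" high or low is not modeled here.

```python
def omega(ka, ks):
    """Ka/Ks ratio for one orthologous pair; None when Ks is zero (ratio undefined)."""
    if ks == 0:
        return None
    return ka / ks


def percent_of(count, total):
    """Share of `count` in `total`, as a percentage."""
    return 100.0 * count / total


if __name__ == "__main__":
    TOTAL_PAIRS = 18320  # orthologous pairs reported in the text
    print(f"high-omega pairs: {percent_of(138, TOTAL_PAIRS):.2f}%")  # 0.75%
    print(f"low-omega pairs:  {percent_of(147, TOTAL_PAIRS):.2f}%")  # 0.80%
```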
Figure 2: Evolutionary dynamics driving NBS gene family expansion and diversification in plants.
The NBS gene superfamily exhibits remarkable diversity in domain architecture, which correlates with functional specialization:
Classical Architectures: Most NBS genes conform to classical domain arrangements including TIR-NBS-LRR (TNL), CC-NBS-LRR (CNL), and RPW8-NBS-LRR (RNL). In cassava, among 228 full-length NBS-LRR genes, 34 contained TIR domains and 128 contained CC domains at their N-termini [26]. Similarly, in potato, 77 of 438 NB-LRR genes contain TIR-like domains, while 107 of the remaining non-TIR genes contain CC domains [27].
Non-Canonical Architectures: Recent comparative analyses have identified numerous non-canonical domain architectures, including TIR-NBS-TIR-Cupin1-Cupin1, TIR-NBS-Prenyltransf, and Sugar_tr-NBS, representing species-specific structural innovations [3]. These unusual architectures likely represent functional adaptations to specific pathogen pressures.
Lineage-Specific Patterns: Significant variation exists in the relative proportions of NBS gene subfamilies across plant lineages. Monocot species generally display reduced TNL representation compared to eudicots, despite the ancient origin of TNL genes predating the angiosperm-gymnosperm split [27] [29]. In the Asteraceae family (sunflower, lettuce, chicory), comparative analysis revealed distinct families of R-genes composed of genes related to both CC and TIR domain-containing NBS-LRR R-genes, with striking similarity in CC subfamily composition between closely related species (lettuce and chicory) [31].
NBS genes display complex expression patterns reflecting their functional specialization:
Constitutive vs. Induced Expression: Some NBS genes are constitutively expressed, providing constant surveillance, while others are induced only upon pathogen recognition. Expression profiling in cotton indicated putative upregulation of specific orthogroups (OG2, OG6, OG15) across tissues and under various biotic and abiotic stresses, in plants both susceptible and tolerant to cotton leaf curl disease [3].
Tissue-Specific Expression: RNA-seq analyses across multiple species reveal that NBS genes display tissue-specific expression patterns, with some genes preferentially expressed in roots, leaves, or reproductive tissues, potentially reflecting tissue-specific pathogen challenges [3].
Pseudogenization and Functional Loss: Not all NBS genes retain functionality; many represent pseudogenes resulting from frameshift mutations, deletions, or insertions. In cassava, 99 partial NBS genes were identified alongside 228 complete NBS-LRR genes, representing potential pseudogenes [26]. The proportion of pseudogenes varies substantially across lineages, reflecting different evolutionary histories and selection pressures.
Several experimental approaches are employed to validate the function of NBS genes identified through phylogenetic analyses:
Virus-Induced Gene Silencing (VIGS): VIGS has been successfully employed to validate NBS gene function. Silencing of GaNBS (OG2) in resistant cotton indicated a putative role in controlling virus titer, supporting its function in resistance to cotton leaf curl disease [3].
Heterologous Expression: Heterologous expression in model systems provides functional validation. For example, heterologous expression of a maize NBS-LRR gene improved resistance to Pseudomonas syringae in Arabidopsis thaliana [28]. Similarly, overexpression of a soybean TNL gene conferred broad-spectrum resistance to viral pathogens in soybean [28].
Differential Expression Analysis: RNA-seq datasets from infection time courses identify NBS genes responsive to specific pathogens. Analysis of tobacco responses to black shank (Phytophthora nicotianae) and bacterial wilt (Ralstonia solanacearum) identified numerous differentially expressed NBS genes, highlighting potential candidates for functional validation [28].
Analysis of genetic variation in NBS genes between resistant and susceptible genotypes provides evidence for functional importance:
Comparative Genomics: Comparison between susceptible (Coker 312) and tolerant (Mac7) Gossypium hirsutum accessions identified numerous unique variants in NBS genes, with Mac7 displaying 6,583 variants compared to 5,173 in Coker 312 [3]. This differential variation suggests association with resistance phenotypes.
Protein Interaction Studies: Protein-ligand and protein-protein interaction analyses demonstrate strong interaction of putative NBS proteins with ADP/ATP and different core proteins of the cotton leaf curl disease virus, providing mechanistic insights into recognition specificity [3].
Table 2: Essential Research Reagents and Resources for NBS Gene Analysis
| Resource Category | Specific Tools/Databases | Application in NBS Research | Reference |
|---|---|---|---|
| Genome Databases | Phytozome, NCBI Genome, Plaza | Access to genome assemblies and annotations | [3] [26] |
| Domain Databases | PFAM, NCBI CDD, SMART | Identification of NBS, TIR, LRR, CC domains | [28] [26] |
| Software Tools | HMMER v3.1b2, MUSCLE, MEGA11 | Domain search, alignment, phylogenetics | [28] [26] |
| Orthology Analysis | OrthoFinder v2.5.1, DIAMOND | Identification of orthogroups across species | [3] |
| Expression Databases | IPF Database, CottonFGD, NCBI SRA | Tissue-specific and stress-responsive expression | [3] [28] |
| Selection Pressure | KaKs_Calculator 2.0 | Calculation of Ka/Ks ratios | [28] |
Phylogenetic analysis of NBS genes across land plants has revealed complex evolutionary patterns characterized by both deeply conserved subfamilies and lineage-specific expansions. The NBS gene superfamily has evolved through a combination of whole-genome duplications, tandem duplications, and birth-and-death evolution, resulting in substantial variation in gene content across species. Structural diversification in domain architectures has generated specialized immune receptors adapted to recognize diverse pathogen effectors, while conserved NBS domains maintain core signaling functions across lineages.
The genomic organization of NBS genes into clusters facilitates rapid evolution through recombination and diversifying selection, particularly in residues involved in pathogen recognition. Comparative analyses across species boundaries have identified both core orthogroups conserved across angiosperms and lineage-specific innovations reflecting adaptation to distinct pathogen pressures. Functional validation through modern genomic tools has confirmed the role of specific NBS genes in disease resistance, providing potential targets for crop improvement.
Future research directions should include expanded comparative analyses incorporating more diverse plant lineages, particularly non-angiosperm species, to reconstruct the deep evolutionary history of plant immune receptors. Integration of structural biology approaches with phylogenetic analysis will further elucidate the molecular basis of pathogen recognition specificity. The continued development of pangenome resources for crop species and their wild relatives will empower more comprehensive surveys of NBS gene diversity, accelerating the discovery of novel resistance genes for agricultural applications.
The study of gene family evolution, particularly for disease resistance genes in plants, relies on a suite of sophisticated bioinformatics tools. Research on the evolution of Nucleotide-Binding Site (NBS) domain genes—a major class of plant disease resistance genes—exemplifies the powerful synergy between traditional sequence analysis methods and modern orthology inference platforms [3]. These genes are part of the larger NLR (Nucleotide-binding Leucine-Rich Repeat) family and are crucial for plant immune responses against pathogens [3]. Understanding their diversification from basal land plants like bryophytes to higher angiosperms requires comparative genomic analyses across diverse species, a process greatly accelerated by tools such as HMMER, BLAST, and OrthoFinder [3] [32]. This technical guide details the methodologies for identifying and classifying these genes, framing them within a broader evolutionary context and providing actionable experimental protocols for researchers.
Principle and Application: Hidden Markov Models are probabilistic models used for identifying distantly related protein sequences based on conserved domain architecture. In studies of NBS domain gene evolution, HMM searches are the preferred initial step for identifying candidate genes across entire proteomes due to their high sensitivity in detecting conserved protein domains [3].
Experimental Protocol:
Use the PfamScan.pl script or the hmmsearch tool from the HMMER package to scan the proteomes against the Pfam-A.hmm model library.
Table 1: Key Resources for HMM-based Gene Identification
| Resource/Tool | Function | Specifications |
|---|---|---|
| Pfam Database | Repository of protein family HMM models | Provides the NB-ARC (PF00931) and other domain models [3]. |
| HMMER Suite | Software for sequence homology searches | Includes hmmsearch for scanning sequences against a profile HMM database. |
| PfamScan Script | Utility for scanning sequences against Pfam HMMs | Often used with default parameters and a customized e-value cutoff [3]. |
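Once a search has been run, applying an E-value cutoff is a matter of filtering the tabular output. The sketch below assumes the whitespace-delimited per-target table that `hmmsearch --tblout` writes, in which the first column is the target name and the fifth is the full-sequence E-value. The function name, the example hit lines, and the default cutoff are illustrative assumptions.

```python
def parse_tblout(lines, max_evalue=1e-20):
    """Return {target_name: best full-sequence E-value} for hits passing the cutoff.

    lines: iterable of text lines in hmmsearch --tblout format.
    """
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue and evalue < hits.get(target, float("inf")):
            hits[target] = evalue
    return hits


if __name__ == "__main__":
    # Hypothetical hit lines, for illustration only.
    example = [
        "# target name accession query name accession E-value score bias ...",
        "prot1 - NB-ARC PF00931.25 3.2e-45 160.1 0.1 4.0e-45 159.8 0.1 1.0 1 0 0 1 1 1 1 -",
        "prot2 - NB-ARC PF00931.25 5.0e-03 12.4 0.0 6.1e-03 12.1 0.0 1.1 1 0 0 1 1 1 0 -",
    ]
    print(parse_tblout(example))                    # strict cutoff keeps prot1 only
    print(parse_tblout(example, max_evalue=0.01))   # lenient cutoff keeps both
```

A lenient cutoff followed by manual curation trades more false positives for better recovery of divergent family members.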
Principle and Application: BLAST (Basic Local Alignment Search Tool) and its accelerated alternative DIAMOND use heuristic algorithms to find regions of local similarity between sequences. They are fundamental for tasks requiring rapid, large-scale sequence comparison, such as building input for orthology inference or functional annotation.
Experimental Protocol:
Table 2: Comparison of Sequence Similarity Search Tools
| Tool | Primary Use Case | Speed | Typical E-value Cutoff |
|---|---|---|---|
| BLAST | Standard sequence similarity searches | Standard | 0.001 [33] |
| DIAMOND | Ultra-fast large-scale searches | 20,000x BLAST [34] | 0.001 [33] |
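The E-value cutoff in the table can be applied after the search by filtering tabular output. The sketch below assumes the default 12-column tab-separated format produced by BLAST+ with `-outfmt 6` (and by DIAMOND with `--outfmt 6`), in which the E-value and bit score are the eleventh and twelfth columns. The function name and the example rows are hypothetical.

```python
def filter_blast_hits(lines, max_evalue=1e-3):
    """Return (query, subject, evalue, bitscore) tuples for hits passing the cutoff.

    lines: iterable of tab-separated rows in default outfmt 6 column order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue <= max_evalue:
            hits.append((cols[0], cols[1], evalue, bitscore))
    return hits


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    rows = [
        "q1\ts1\t85.0\t200\t30\t0\t1\t200\t5\t204\t1e-50\t180.0",
        "q1\ts2\t30.0\t50\t35\t0\t1\t50\t10\t59\t0.5\t25.0",
    ]
    print(filter_blast_hits(rows))
```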
Principle and Application: OrthoFinder is a sophisticated phylogenomics tool that infers orthogroups (sets of genes descended from a single gene in the last common ancestor) and orthologs. It moves beyond simple similarity by incorporating gene tree inference, providing a robust evolutionary framework for comparative studies [34]. It has been benchmarked as one of the most accurate methods for ortholog inference [34].
Experimental Protocol:
Run OrthoFinder on a directory of proteome FASTA files (orthofinder -f [proteome_directory]). The default workflow uses DIAMOND for all-vs-all sequence searches [3] [34]. Key output files include:
Orthogroups.tsv and Orthogroups_Genes.tsv: Files listing all orthogroups and their constituent genes.
Gene_Duplication_Events.tsv: The location and timing of duplication events on the species tree.
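The Orthogroups.tsv output lends itself to simple downstream parsing. The sketch below assumes the usual layout of that file: a tab-separated table whose first column is the orthogroup ID, with one column per species holding comma-separated gene IDs. The "core", "unique", and "shared" labels are working definitions for this example (present in all species, in exactly one, or in some), not necessarily the criteria used in the cited study, and the function name is hypothetical.

```python
import csv
import io


def classify_orthogroups(tsv_text):
    """Label each orthogroup as 'core', 'unique', or 'shared' by species presence."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    species = header[1:]
    labels = {}
    for row in reader:
        if not row:
            continue
        cells = row[1:] + [""] * (len(species) - len(row[1:]))  # pad short rows
        present = [sp for sp, cell in zip(species, cells) if cell.strip()]
        if not present:
            continue  # orthogroup with no genes listed
        if len(present) == len(species):
            labels[row[0]] = "core"
        elif len(present) == 1:
            labels[row[0]] = "unique"
        else:
            labels[row[0]] = "shared"
    return labels


if __name__ == "__main__":
    # Hypothetical table with three species, for illustration only.
    text = (
        "Orthogroup\tsp_A\tsp_B\tsp_C\n"
        "OG0000000\ta1, a2\tb1\tc1\n"
        "OG0000001\ta3\t\t\n"
        "OG0000002\ta4\tb2\t\n"
    )
    print(classify_orthogroups(text))
```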
OrthoFinder Phylogenomic Workflow
A comprehensive study on the evolution of NBS domain genes in land plants provides a prime example of these tools working in concert [3]. The research aimed to understand the diversification of these immune receptors across 34 plant species, from mosses to dicots.
The study used PfamScan.pl with the NB-ARC domain HMM (PF00931) and a stringent e-value of 1.1e-50 to identify 12,820 NBS-domain-containing genes [3].
This integrated approach demonstrates how HMM searches provide the initial gene set, while OrthoFinder places these genes into an evolutionary context, identifying conserved and lineage-specific elements that have shaped plant immunity.
Table 3: Key Reagent Solutions for Evolutionary Genomics of Gene Families
| Category/Resource | Specific Tool / Database | Function in Research |
|---|---|---|
| Software & Pipelines | OrthoFinder [3] [34] | Phylogenomic orthology inference from protein sequences. |
| | HMMER / PfamScan [3] | Identification of genes based on protein domain content. |
| | DIAMOND [3] [34] | Ultra-fast sequence similarity search for large datasets. |
| | PlantTribes2 [35] | A specialized framework for gene family analysis in plants. |
| Databases | Pfam Database [3] | Curated collection of protein family HMM models. |
| | PLAZA [3] [35] | Platform for plant comparative genomics. |
| | NCBI Genome & BioProject [3] | Repository for genomic data and sequencing projects. |
| Computational Resources | Galaxy Workbench [35] | Web-based platform for accessible, reproducible bioinformatics. |
| | High-Performance Computing (HPC) | Essential for running genome-scale analyses in reasonable time. |
The identification of plant resistance (R) genes is crucial for understanding plant immunity and breeding disease-resistant crops. Traditional methods for R-gene identification face challenges due to gene diversity, complex genomic structures, and low sequence homology. This whitepaper presents PRGminer, a deep learning-based tool that revolutionizes the prediction and classification of R-genes. We examine PRGminer's architecture, performance metrics, and practical applications within the broader context of nucleotide-binding site (NBS) domain gene evolution in land plants. The tool achieves exceptional accuracy (95.72-98.75%) in identifying R-genes and classifying them into specific structural categories, providing researchers with an efficient solution for high-throughput R-gene discovery.
Plant resistance genes (R-genes) encode proteins that recognize pathogen effectors and activate robust immune responses through effector-triggered immunity (ETI) [36] [37]. This represents the second layer of plant defense, complementing the initial pathogen-associated molecular pattern (PAMP)-triggered immunity (PTI) [36] [37]. Among R-genes, the nucleotide-binding site leucine-rich repeat (NBS-LRR) family constitutes the largest class, with proteins characterized by modular domains including an N-terminal Toll/interleukin-1 receptor (TIR) or coiled-coil (CC) domain, a central NBS domain, and C-terminal LRR domains [3] [38] [22].
The evolutionary expansion of NBS-encoding genes across land plants reveals remarkable diversification. Recent studies identified 12,820 NBS-domain-containing genes across 34 species from mosses to monocots and dicots, classified into 168 distinct domain architecture patterns [3]. This diversity presents significant challenges for traditional gene annotation methods, which often produce incomplete and fragmented annotations due to the unique genomic structure of R-gene clusters [36] [37].
PRGminer implements a sophisticated two-phase deep learning approach for R-gene identification and classification. The tool addresses critical limitations of alignment-based methods, which often fail with sequences exhibiting low homology [36] [37].
The PRGminer workflow consists of two sequential phases:
Phase I: R-gene Identification
Phase II: R-gene Classification
Table 1: PRGminer Performance Metrics
| Phase | Training/Testing Method | Accuracy | MCC | Independent Testing Accuracy | Independent Testing MCC |
|---|---|---|---|---|---|
| Phase I | k-fold training/testing | 98.75% | 0.98 | 95.72% | 0.91 |
| Phase II | k-fold training/testing | 97.55% | 0.93 | 97.21% | 0.92 |
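For readers less familiar with the metrics in this table, accuracy and the Matthews correlation coefficient (MCC) are both computed from the entries of a binary confusion matrix. The sketch below uses the standard definitions; the function names are hypothetical and the counts are made up for illustration, not PRGminer results.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 by convention when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Made-up confusion matrix, for illustration only.
    tp, tn, fp, fn = 95, 90, 10, 5
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC stays informative when the two classes are imbalanced, which is one reason the two are commonly reported together.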
PRGminer was trained on comprehensive datasets derived from public databases including Phytozome, Ensembl Plants, and NCBI [36] [37]. The training data underwent rigorous processing:
Table 2: PRGminer Training Dataset Composition
| Dataset Component | Sequence Count | Description |
|---|---|---|
| Phase I - R-genes | 18,952 | Sequences with known R-gene domains |
| Phase I - Non-R-genes | 19,212 | Sequences without R-gene domains |
| Phase II - CNL | 1,883 | Coiled-coil-NBS-LRR sequences |
| Phase II - KIN | 8,591 | Kinase domain sequences |
| Phase II - Other classes | 8,478 | RLP, LECRK, RLK, LYK, TIR, TNL |
For feature representation, dipeptide composition gave the best performance, and the optimized computational pipeline reportedly processes large protein sequence datasets in approximately two minutes [39].
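Dipeptide composition encodes a protein as the relative frequency of each of the 400 possible ordered pairs of standard amino acids. The sketch below shows one common way to compute it; the normalization (dividing by the number of valid overlapping pairs) and the handling of non-standard residues are assumptions, and PRGminer's own implementation may differ in these details.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(seq):
    """Return a 400-element list of dipeptide frequencies in DIPEPTIDES order.

    Overlapping pairs containing a non-standard residue (e.g. X) are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


if __name__ == "__main__":
    vec = dipeptide_composition("ACAC")  # overlapping pairs: AC, CA, AC
    print(len(vec))
    print(vec[DIPEPTIDES.index("AC")], vec[DIPEPTIDES.index("CA")])
```

The fixed-length vector makes proteins of any length comparable as model input, at the cost of discarding positional information.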
Understanding PRGminer's significance requires examining the evolutionary landscape of NBS genes across plant species. Comparative genomic analyses reveal patterns of gene duplication, diversification, and loss that have shaped plant immune systems over millions of years.
NBS genes exhibit remarkable evolutionary dynamics across land plants:
Table 3: NBS-LRR Gene Distribution Across Selected Plant Species
| Plant Species | Total NBS Genes | CNL-type | TNL-type | Unique Features |
|---|---|---|---|---|
| Vernicia montana | 149 | 98 (65.8%) | 12 (8.1%) | 2 genes with both CC and TIR domains |
| Vernicia fordii | 90 | 49 (54.4%) | 0 | Complete absence of TIR domains |
| Capsicum annuum | 252 | 48 (19.0%) | 4 (1.6%) | 200 genes lack both CC and TIR domains |
| Dendrobium officinale | 74 | 10 (13.5%) | 0 | Representative of monocot TNL absence |
| Arabidopsis thaliana | 210 | 40 (19.0%) | Not specified | Reference eudicot genome |
NBS-LRR genes display distinctive genomic architectures that influence their evolution:
Beyond computational prediction, experimental validation remains essential for confirming R-gene function. Several methodologies have proven effective for characterizing NBS-LRR genes.
VIGS has emerged as a powerful technique for functional characterization of R-genes:
Transcriptomic analyses reveal R-gene expression patterns in response to biotic and abiotic stresses:
Table 4: Key Research Reagents and Resources for R-gene Studies
| Resource | Function/Application | Specific Examples |
|---|---|---|
| PRGminer | Deep learning-based R-gene prediction and classification | Webserver (https://kaabil.net/prgminer/) and standalone tool [36] [39] |
| VIGS Systems | Functional validation of candidate R-genes | Tobacco rattle virus-based vectors for rapid gene silencing [3] [38] |
| HMMER Software | Domain identification and sequence annotation | PfamScan with NB-ARC domain models (e-value 1.1e-50) [3] [38] |
| OrthoFinder | Evolutionary analysis and orthogroup identification | DIAMOND for sequence similarity, MCL for clustering [3] |
| RNA-seq Databases | Expression profiling under various conditions | IPF database, Cotton Functional Genomics Database, NCBI BioProjects [3] |
R-gene mediated immunity involves complex signaling pathways that translate pathogen recognition into defense responses. The following diagram illustrates key pathways in plant immunity, particularly focusing on NBS-LRR gene function:
The NBS-LRR proteins function as intracellular immune receptors that recognize pathogen effectors directly or indirectly through guard mechanisms [38] [22]. Key aspects include:
The following diagram illustrates an integrated experimental-computational workflow for R-gene identification and validation:
PRGminer and similar approaches have enabled significant advances in crop improvement programs:
PRGminer represents a significant advancement in R-gene prediction, leveraging deep learning to overcome limitations of traditional alignment-based methods. Its high accuracy (>95%) in both identification and classification phases demonstrates the power of computational approaches for decoding plant immune systems. When integrated with experimental validation techniques like VIGS and expression analysis, PRGminer provides researchers with a comprehensive toolkit for accelerating R-gene discovery.
The evolution of NBS domain genes across land plants reveals a dynamic history of gene family expansion, diversification, and specialization. Understanding these evolutionary patterns provides essential context for interpreting PRGminer predictions and guiding targeted breeding strategies. As plant genomics continues to advance, deep learning approaches like PRGminer will play increasingly important roles in bridging genomic information and practical crop improvement, ultimately contributing to enhanced food security and sustainable agricultural practices.
Plant immunity relies on a sophisticated surveillance system where Nucleotide-Binding Site (NBS) domain genes play a pivotal role. These genes, often constituting one of the largest resistance (R) gene families, encode intracellular receptors that mediate effector-triggered immunity (ETI), a robust defense layer activated upon pathogen recognition [41]. The NBS domain forms the core of these proteins, functioning as a molecular switch by binding and hydrolyzing ATP to activate downstream immune signaling [41]. The typical structure of these proteins, frequently referred to as NBS-LRR or NLR proteins, includes a conserved NBS domain coupled with C-terminal leucine-rich repeats (LRR) and variable N-terminal domains such as Toll/Interleukin-1 receptor (TIR) or coiled-coil (CC), leading to their classification as TNL or CNL subfamilies, respectively [22].
The evolution of NBS genes across land plants showcases remarkable diversification and adaptation. From the relatively small repertoires in ancestral lineages like bryophytes to the expansive families in flowering plants, NBS genes have undergone significant expansion, primarily through mechanisms like tandem duplications and whole-genome duplications [3]. This evolutionary trajectory has resulted in substantial structural and functional diversity, encompassing both classical domain architectures (e.g., NBS, NBS-LRR, TIR-NBS) and numerous species-specific patterns, reflecting ongoing adaptive evolution to diverse pathogen pressures [3]. Placing expression profiling within this evolutionary context is crucial for understanding how these genes confer resistance and how their regulation has been fine-tuned across different plant lineages.
A robust transcriptomic analysis of NBS genes under stress requires careful experimental design, spanning from plant material selection to computational processing. The workflow can be broadly divided into wet-lab procedures for generating gene expression data and computational methods for its analysis, often performed in an integrated manner.
The foundation of a reliable expression study lies in the selection of appropriate plant material, often including genotypes with contrasting resistance phenotypes. For instance, studies on cotton leaf curl disease (CLCuD) have utilized susceptible (Coker 312) and tolerant (Mac7) Gossypium hirsutum accessions, enabling the identification of genetic variations and expression differences associated with resistance [3]. Similarly, research on Dalbergia sissoo involved screening two hundred plants and selecting resistant individuals after inoculation with the fungal pathogen Ceratocystis dalbergiae to study dieback disease resistance [42].
Stress treatments should be designed to mimic natural pathogen encounters or environmental challenges. For biotic stress, this can involve inoculation with pathogens such as bacteria (Pseudomonas syringae), fungi (Fusarium graminearum), or viruses (begomoviruses) [3]. For abiotic stress, treatments may include dehydration, cold, drought, heat, osmotic stress, salt, or wounding [3]. Tissue should be collected at multiple time points after stress application to capture both early- and late-responding genes.
High-quality RNA extraction is a critical step. For tissues like roots or leaves, the CTAB (cetyltrimethylammonium bromide) method is widely used, often with modifications to optimize yield and purity [43]. The extracted RNA must be treated with DNase to remove genomic DNA contamination, and its quality and quantity should be assessed using spectrophotometry (e.g., NanoDrop), fluorometry (e.g., Qubit), and integrity analysis (e.g., TapeStation or agarose gel electrophoresis) [43].
For library preparation, the Illumina TruSeq Stranded Total RNA Library Prep Plant Kit is a common choice, which utilizes RiboZero beads to deplete ribosomal RNA, enriching for mRNA and other non-ribosomal transcripts [43]. The prepared libraries are then sequenced on high-throughput platforms like Illumina to generate short-read data (e.g., 150 bp paired-end reads), providing the raw data for subsequent transcriptome assembly and expression quantification.
For non-model organisms without a reference genome, de novo transcriptome assembly is the usual starting point. Where a high-quality genome exists for a phylogenetically close species, mapping reads to that genome is a valuable alternative. For example, in a study on Euterpe edulis, the Elaeis guineensis (oil palm) genome was used as a reference for mapping due to their phylogenetic proximity and the high quality of the oil palm genome assembly [43]. This cross-species mapping strategy can improve the precision of gene annotation and the identification of conserved genes.
The subsequent workflow for transcriptomic analysis, from raw data to biological insight, involves several key processing stages which can be visualized as follows:
Figure 1: Transcriptomic Data Analysis Workflow. This flowchart outlines the key computational steps for processing RNA-seq data to identify differentially expressed NBS genes.
Expression levels are typically quantified as Fragments Per Kilobase of transcript per Million mapped reads (FPKM), which normalizes for both gene length and sequencing depth, allowing for cross-sample comparisons [3]. These FPKM values can be retrieved from public databases such as the IPF database, the Cotton Functional Genomics Database (CottonFGD), and Cottongen, or calculated from raw sequencing data using the pipelines described by Zahra et al. [3]. Differential expression analysis is then performed to identify genes with statistically significant expression changes between stress conditions and controls, or between resistant and susceptible genotypes.
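The FPKM normalization described above can be written out as a short calculation. The following is a minimal sketch, not the pipeline used in [3]; the function names `fpkm` and `fpkm_table` and the example counts are hypothetical, and the total number of mapped fragments is assumed to be the sum of the per-gene counts supplied.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)
    The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def fpkm_table(counts, lengths):
    """Convert raw per-gene fragment counts into FPKM values.

    counts  : dict gene_id -> fragment count in one sample
    lengths : dict gene_id -> transcript length in bp
    Assumption: library size is the sum of the counts given here.
    """
    library_size = sum(counts.values())
    return {gene: fpkm(n, lengths[gene], library_size) for gene, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical example: 500 fragments on a 2 kb transcript, 10 M mapped fragments.
    print(fpkm(500, 2000, 10_000_000))
```

Because the value is divided by transcript length, a gene three times longer with three times as many fragments receives the same FPKM, which is the length correction the paragraph refers to.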
Before expression profiling, a comprehensive identification of NBS genes within the studied species is essential. The PfamScan.pl HMM search script is commonly employed with the Pfam-A_hmm model to screen for genes containing the NB-ARC domain (Pfam accession: PF00931) at a stringent e-value cutoff (1.1e-50, reported as the default setting in [3]) [3] [44]. Additional associated domains (e.g., TIR, CC, LRR) are identified to determine domain architecture, allowing for the classification of NBS genes into classes such as CNL, TNL, NL (NBS-LRR only), and other atypical types [3] [22].
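Once domain hits are available, the classification step reduces to a lookup on which domains co-occur with NB-ARC. The sketch below illustrates that logic only; it does not reproduce PfamScan output parsing, and the domain labels ("NB-ARC", "TIR", "CC", "RPW8", "LRR"), the function names, and the handling of genes carrying more than one N-terminal domain (letters are simply concatenated) are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed labels for N-terminal domains and the class letter each contributes.
N_TERMINAL = (("TIR", "T"), ("CC", "C"), ("RPW8", "R"))


def classify_nbs(domains):
    """Return a class label such as CNL, TNL, RNL, NL, TN, CN or N.

    domains : iterable of domain labels detected in one protein.
    Returns None when no NB-ARC domain is present (not an NBS gene).
    """
    found = set(domains)
    if "NB-ARC" not in found:
        return None
    prefix = "".join(letter for name, letter in N_TERMINAL if name in found)
    suffix = "L" if "LRR" in found else ""
    return prefix + "N" + suffix


def architectures(hits):
    """Group (gene_id, domain) hits by gene and classify each gene."""
    per_gene = defaultdict(set)
    for gene_id, domain in hits:
        per_gene[gene_id].add(domain)
    return {gene: classify_nbs(doms) for gene, doms in per_gene.items()}


if __name__ == "__main__":
    example_hits = [("g1", "CC"), ("g1", "NB-ARC"), ("g1", "LRR"), ("g2", "NB-ARC")]
    print(architectures(example_hits))
```

In practice the e-value filter would be applied when the hit list is built, so that only domains passing the chosen cutoff reach `architectures`.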
Orthogroup (OG) analysis using tools like OrthoFinder provides a deep evolutionary context. It clusters NBS genes from multiple species into orthogroups based on sequence similarity, helping identify core orthogroups (common across species) and unique ones (species-specific) [3]. This phylogenetic framework is crucial for understanding the conservation and divergence of NBS gene expression patterns.
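The distinction between core and unique orthogroups can be sketched as a simple presence test. This example does not call OrthoFinder or parse its output files; the input structure, the function name `split_orthogroups`, and the working definitions (core means present in every species analysed, unique means present in exactly one) are assumptions chosen for illustration.

```python
def split_orthogroups(orthogroups, all_species):
    """Separate orthogroups into core and species-specific sets.

    orthogroups : dict OG id -> dict species -> list of gene ids
    all_species : iterable of every species included in the analysis
    Returns (core, unique) as sorted lists of orthogroup ids.
    """
    species_set = set(all_species)
    core, unique = [], []
    for og_id, members in orthogroups.items():
        # A species counts as present only if it contributes at least one gene.
        present = {sp for sp, genes in members.items() if genes}
        if present == species_set:
            core.append(og_id)
        elif len(present) == 1:
            unique.append(og_id)
    return sorted(core), sorted(unique)


if __name__ == "__main__":
    ogs = {
        "OG2": {"sp_a": ["a1"], "sp_b": ["b1", "b2"], "sp_c": ["c1"]},
        "OG90": {"sp_a": [], "sp_b": ["b9"], "sp_c": []},
    }
    print(split_orthogroups(ogs, ["sp_a", "sp_b", "sp_c"]))
```

Orthogroups shared by some but not all species fall into neither list, which keeps the two categories strict.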
Once NBS genes are identified and expression values are obtained, researchers categorize expression data into tissue-specific, abiotic stress-specific, and biotic stress-specific profiles [3]. Heat maps are then generated to visualize the expression patterns of NBS genes, particularly focusing on core orthogroups across different tissues and stress conditions. For example, in cotton, expression profiling revealed the putative upregulation of orthogroups OG2, OG6, and OG15 in various tissues under biotic and abiotic stresses in both susceptible and tolerant plants [3].
The analysis often extends to promoter analysis, identifying cis-acting elements related to plant hormones (e.g., salicylic acid, methyl jasmonate) and abiotic stress in the upstream regions of NBS genes, which provides mechanistic insights into their regulation [41] [44]. Furthermore, co-expression analysis can link the expression of specific NBS genes with secondary metabolism pathways, suggesting a broader role in defense mechanisms beyond pathogen recognition [41].
Transcriptomic analysis identifies candidate genes, but their functional validation is a critical next step. Virus-Induced Gene Silencing (VIGS) is a powerful reverse-genetics approach to transiently knock down the expression of a candidate NBS gene in a resistant plant. For instance, silencing of GaNBS (from OG2) in resistant cotton indicated a putative role for this gene in limiting viral titer during cotton leaf curl disease [3]. This functional link between gene expression and disease outcome strengthens the case for the candidate gene.
Additional validation methods include quantitative real-time PCR (qPCR) to confirm the expression patterns of selected NBS genes observed in the RNA-seq data under specific stress conditions, such as salt stress [44]. Moreover, in silico analyses like protein-ligand and protein-protein interaction modeling can provide insights into molecular functions, such as the strong interaction of certain NBS proteins with ADP/ATP and viral proteins [3].
Successful execution of NBS gene expression profiling relies on a suite of specific reagents, tools, and databases. The following table summarizes key resources used in the featured methodologies.
Table 1: Research Reagent Solutions for NBS Gene Transcriptomic Analysis
| Item Name | Function/Application | Specific Example/Usage |
|---|---|---|
| CTAB Extraction Buffer | RNA extraction from plant tissues, particularly recalcitrant tissues like roots. | Used in RNA extraction from Euterpe edulis seedlings [43]. |
| Illumina TruSeq Stranded Total RNA Library Prep Plant Kit | Preparation of strand-specific RNA-seq libraries with ribosomal RNA depletion. | Library preparation for transcriptome sequencing [43]. |
| PfamScan & HMMER3 | Identification of conserved protein domains (e.g., NB-ARC PF00931) in candidate genes. | Screening for NBS-domain-containing genes [3] [36]. |
| OrthoFinder | Clustering of genes into orthogroups across multiple species for evolutionary analysis. | Identifying core and unique orthogroups of NBS genes [3]. |
| Degenerate Oligonucleotide Primers | Amplification of diverse NBS-LRR gene family members from transcriptomes when genomic data is lacking. | Probing the Dalbergia sissoo transcriptome under dieback stress [42]. |
| VIGS Vectors | Functional validation through transient gene silencing in plants. | Silencing GaNBS to confirm its role in virus resistance [3]. |
The diversity of NBS domain genes, a result of their complex evolution, can be systematically classified based on their domain composition. The following diagram illustrates the primary structural classes and their relationships.
Figure 2: Classification of Plant NBS Domain Genes. This chart outlines the classification of NBS genes into typical and atypical NLRs based on the presence of complete N-terminal and LRR domains, with further subdivision into specific subfamilies like CNL, TNL, and RNL.
Expression profiling and transcriptomic analysis have proven indispensable for unraveling the complex roles and evolutionary dynamics of NBS genes in plant stress responses. The integrated methodology—combining high-throughput RNA sequencing, sophisticated bioinformatic analyses for identification and orthogroup clustering, and functional validation through techniques like VIGS—provides a powerful framework to link gene expression with biological function. This approach has illuminated the diverse expression patterns of NBS genes across tissues and stresses, identified key regulatory candidates for breeding, and revealed species-specific evolutionary paths, such as the marked reduction of TNL genes in certain lineages like Salvia miltiorrhiza and monocots [41] [45]. As genomic resources continue to expand for non-model plants, the application of these standardized protocols will further deepen our understanding of how this critical gene family has evolved to underpin plant adaptation and immunity, ultimately informing strategies for developing more resilient crops.
Bulked Segregant RNA-Seq (BSR-Seq) represents a powerful fusion of traditional genetics and modern high-throughput sequencing, enabling researchers to rapidly pinpoint genetic loci controlling traits of interest. This method combines the principles of bulked segregant analysis with the analytical power of RNA sequencing, facilitating simultaneous mapping and candidate gene identification. Within the broader context of plant evolutionary biology, BSR-Seq has proven particularly valuable for studying the evolution of complex gene families, such as the nucleotide-binding site (NBS)-encoding genes that form the backbone of plant innate immunity. This technical guide examines BSR-Seq methodologies, applications, and its growing role in elucidating the evolutionary dynamics of disease resistance genes across land plants.
Plant resistance to pathogens is often governed by a sophisticated surveillance system based on nucleotide-binding site leucine-rich repeat (NBS-LRR) proteins, which recognize pathogen effectors and initiate defense responses [4]. The NBS gene family exhibits remarkable diversity across land plants, with significant differences in gene number, organization, and subclass distribution between bryophytes and vascular plants [32], as well as between monocots and dicots [4]. This diversity results from continuous evolutionary arms races between plants and their pathogens, driving rapid diversification of resistance genes through mechanisms such as tandem duplication, ectopic recombination, and positive selection [4].
Traditional methods for mapping resistance loci, including quantitative trait locus (QTL) mapping and positional cloning, are often low-throughput and time-consuming [46]. BSR-Seq emerged as an efficient alternative that accelerates gene identification by combining bulked segregant analysis with RNA sequencing [47] [48]. This approach enables researchers to simultaneously identify genetic markers linked to traits of interest and analyze global gene expression patterns, providing valuable insights into both the genetic location and potential function of candidate genes [48].
NBS-LRR genes represent one of the largest and most variable gene families in plant genomes, with significant implications for plant adaptation and evolution. Comparative genomic analyses reveal substantial variation in NBS-LRR gene numbers across species, ranging from approximately 50 in papaya (Carica papaya) to 653 in rice (Oryza sativa ssp. indica) [4]. This expansion results from both whole-genome duplication events and small-scale duplications, including tandem and segmental duplications [3].
Plant genomes typically organize NBS-LRR genes in clusters, which facilitates rapid evolution of new pathogen specificities through mechanisms such as tandem duplication and ectopic recombination [4]. The NBS domain itself contains several highly conserved motifs, including the P-loop and kinase-2 motifs, while the LRR regions evolve rapidly under diversifying selection, particularly at solvent-exposed residues involved in pathogen recognition [4].
Table 1: NBS-LRR Gene Family Size in Various Plant Species
| Plant Species | Total NBS-LRR Genes | TNL Genes | CNL Genes | Reference |
|---|---|---|---|---|
| Arabidopsis thaliana | 149-159 | 94-98 | 50-55 | [4] |
| Oryza sativa ssp. japonica | 553 | - | - | [4] |
| Glycine max (soybean) | 319 | - | - | [4] |
| Medicago truncatula | 333 | 156 | 177 | [4] |
| Brachypodium distachyon | 126 | 0 | 113 | [4] |
A striking evolutionary pattern in NBS-LRR gene distribution is the near-total absence of TIR-NBS-LRR (TNL) genes in monocotyledons, whereas they are present, and often abundant, in dicotyledons [4]. For example, the Brachypodium distachyon genome contains 126 NBS-LRR genes, of which 113 belong to the CC-NBS-LRR (CNL) subclass and none are TNLs [4]. In contrast, dicot species like Arabidopsis thaliana and soybean (Glycine max) contain both subclasses, sometimes with TNL genes outnumbering CNL genes [4].
Recent pangenome analyses of bryophytes have revealed they possess a substantially greater diversity of gene families than vascular plants, including higher numbers of unique and lineage-specific gene families [32]. This rich genetic toolkit, which includes novel immune receptors, likely contributed to their successful colonization of diverse habitats despite their structural simplicity.
BSR-Seq integrates bulk segregant analysis with transcriptome sequencing to identify genomic regions associated with specific phenotypes. The methodology involves creating two bulked RNA samples from segregating populations exhibiting contrasting phenotypes, followed by high-throughput sequencing and computational analysis to detect regions with significant allele frequency differences between bulks [48] [46].
The fundamental principle underlying BSR-Seq is that genetic markers completely linked to a causal gene will show significant differences in allele frequency between bulks, while unlinked markers will segregate randomly [46]. In practice, for a SNP completely linked to a recessive mutant, only one allele will be present in the mutant pool, while both alleles will be present in the non-mutant pool [48].
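The allele-frequency contrast behind this principle can be illustrated with a per-SNP calculation. Note that this sketch uses a simple allele-frequency difference (in the style of a SNP-index comparison), not the empirical Bayesian method of the original BSR-Seq publication [48]; the function names and the read counts are hypothetical.

```python
def allele_frequency(alt_reads, ref_reads):
    """Fraction of reads carrying the alternative (mutant-parent) allele."""
    depth = alt_reads + ref_reads
    if depth == 0:
        return None  # SNP not covered in this bulk
    return alt_reads / depth


def frequency_difference(mutant_bulk, nonmutant_bulk):
    """Allele-frequency difference between two bulks at one SNP.

    Each argument is a tuple (alt_reads, ref_reads).
    Values near 0 suggest an unlinked SNP; large absolute values suggest linkage.
    """
    f_mut = allele_frequency(*mutant_bulk)
    f_non = allele_frequency(*nonmutant_bulk)
    if f_mut is None or f_non is None:
        return None
    return f_mut - f_non


if __name__ == "__main__":
    # Hypothetical SNP fully linked to a recessive mutation in an F2:
    # the mutant bulk carries only the mutant allele, while the non-mutant bulk
    # (1 homozygous wild type : 2 heterozygotes) is expected to carry it in
    # about one third of reads.
    print(frequency_difference((20, 0), (10, 20)))
    # Hypothetical unlinked SNP: similar frequencies in both bulks.
    print(frequency_difference((15, 15), (14, 16)))
```

Real data add sampling noise from read depth and allele-specific expression, which is why the published methods use statistical models rather than a raw difference.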
The BSR-Seq workflow begins with developing a suitable segregating population, typically an F2 generation derived from crossing parents with contrasting phenotypes for the trait of interest [48] [49]. For example, in a study mapping root lodging resistance in maize, researchers crossed the lodging-resistant line CIMBL145 with the susceptible line CIMBL74 to generate an F2 population [50].
From the segregating population, individuals with extreme phenotypes are selected and divided into two pools. In the maize root lodging study, researchers selected 30 non-lodging plants and 30 complete-lodging plants from the F2 population to create resistant and susceptible bulks, respectively [50]. Similarly, in a soybean study investigating multifoliolate leaves, researchers selected 30 recombinant inbred lines with the highest multifoliolate frequencies and 30 with the lowest frequencies to create contrasting bulks [49].
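Selecting the two bulks amounts to ranking individuals by phenotype score and taking both tails. The sketch below assumes a single numeric score per individual; the function name `select_bulks`, the identifiers, and the scores are hypothetical, and the default bulk size of 30 mirrors the studies cited above.

```python
def select_bulks(phenotypes, bulk_size=30):
    """Pick the lowest- and highest-scoring individuals for bulking.

    phenotypes : dict individual id -> numeric phenotype score
    bulk_size  : number of individuals per bulk
    Returns (low_bulk, high_bulk) as lists of individual ids.
    """
    if bulk_size < 1:
        raise ValueError("bulk_size must be at least 1")
    if 2 * bulk_size > len(phenotypes):
        raise ValueError("population too small for two non-overlapping bulks")
    ranked = sorted(phenotypes, key=phenotypes.get)  # ascending by score
    return ranked[:bulk_size], ranked[-bulk_size:]


if __name__ == "__main__":
    scores = {"p1": 0.05, "p2": 0.90, "p3": 0.40, "p4": 0.75, "p5": 0.10, "p6": 0.55}
    low, high = select_bulks(scores, bulk_size=2)
    print(low, high)
```

Requiring non-overlapping bulks guards against the case where a small population would place the same individual in both pools.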
RNA is extracted from each bulk using standard protocols, such as Trizol reagent [50] or commercial RNA extraction kits [51]. The extracted RNA is then used to prepare sequencing libraries, which are sequenced using an appropriate platform (e.g., Illumina) [50] [48].
Sequencing depth is a critical consideration in BSR-Seq experimental design. In the original BSR-Seq study mapping the maize glossy3 gene, researchers obtained more than 13 million 75-bp single-end reads per bulk using one lane of an Illumina GAIIx flowcell [48]. Adequate sequencing depth ensures sufficient coverage for both SNP discovery and expression analysis.
The bioinformatics workflow for BSR-Seq involves multiple steps, including read alignment, variant calling, and association analysis:
Read Quality Control and Alignment: Sequencing reads are first processed to remove low-quality regions and adapter sequences [50]. The clean reads are then aligned to a reference genome using splice-aware aligners such as GSNAP [48] or HISAT2 [50]; BWA [46], a general-purpose aligner that is not splice-aware, has also been used.
Variant Calling: Single nucleotide polymorphisms (SNPs) are identified using variant callers such as GATK [50]. In the maize glossy3 study, this approach identified more than 64,000 high-confidence SNPs [48].
Association Analysis: Statistical methods are applied to identify SNPs showing significant allele frequency differences between bulks. The original BSR-Seq publication used an empirical Bayesian approach to estimate the probability of each SNP being in complete linkage disequilibrium with the causal gene [48]. Alternatively, the QTLseqr package can be used to identify associated genomic regions [50].
Differential Expression Analysis: RNA-Seq data also enable identification of differentially expressed genes between bulks, providing functional insights into candidate genes [48]. In the glossy3 study, 1,095 genes were differentially expressed between mutant and non-mutant pools [48].
Table 2: Key Bioinformatics Tools for BSR-Seq Analysis
| Analysis Step | Tools | Function | Reference |
|---|---|---|---|
| Read Alignment | HISAT2, GSNAP, BWA, Bowtie2 | Map sequencing reads to reference genome | [50] [48] [46] |
| Variant Calling | GATK, FreeBayes | Identify SNPs and indels from aligned reads | [50] [46] |
| Association Analysis | QTLseqr, Bayesian Methods | Detect SNPs with allele frequency differences between bulks | [50] [48] |
| Differential Expression | edgeR, DESeq2 | Identify differentially expressed genes between bulks | [48] |
BSR-Seq has been successfully applied to map resistance loci in crop species. In a 2025 study on maize root lodging resistance, researchers used BSR-Seq to identify eight QTLs associated with root architecture and lodging resistance [50]. From the F2 population of 580 plants derived from a cross between lodging-resistant (CIMBL145) and lodging-susceptible (CIMBL74) lines, they created resistant and susceptible bulks by pooling roots from 30 non-lodging and 30 complete-lodging plants, respectively [50].
The BSR-Seq analysis identified four major QTLs (qRLR1, qRLR4, qRLR5, and qRLR6), which were subsequently validated through chromosomal region-based association study (CRAS) and linkage mapping [50]. Within these QTL regions, researchers identified 306 candidate genes, including root development- and cell wall-related genes. Further association and haplotype analysis pinpointed ZmNRT5, encoding a nitrate transporter, as a strong candidate gene [50]. Expression analysis revealed significantly lower ZmNRT5 expression in the susceptible parent (CIMBL74) compared to the resistant parent (CIMBL145), supporting its role in root lodging resistance [50].
In a 2024 study, researchers integrated traditional QTL mapping with BSR-Seq to identify genetic loci controlling the multifoliolate leaf phenotype in soybean [49]. From a recombinant inbred line population of 407 lines, they selected 30 lines with the highest multifoliolate frequencies and 30 with the lowest frequencies to create contrasting bulks for BSR-Seq analysis [49].
The combined approach identified ten QTLs associated with the multifoliolate trait, including a major QTL (qMF-2-1) on chromosome 2 that explained more than 10% of the phenotypic variation [49]. BSR-Seq analysis revealed two candidate genes within the associated regions: Glyma.06G204300 encoding the transcription factor TCP5, and Glyma.06G204400 encoding LONGIFOLIA 2 (LNG2) [49]. Transcriptome analysis further indicated that stress-responsive genes were differentially expressed between high- and low-multifoliolate lines, suggesting potential interplay between genetic and environmental factors in regulating this trait [49].
Table 3: Essential Research Reagents for BSR-Seq Experiments
| Reagent/Resource | Function | Examples/Specifications |
|---|---|---|
| Segregating Population | Genetic mapping resource | F2, RILs, or other segregating populations from parents with contrasting phenotypes |
| RNA Extraction Kit | Isolation of high-quality RNA | Trizol reagent or commercial kits (e.g., Aidlab RNA extraction kit) |
| cDNA Synthesis Kit | Reverse transcription of RNA | PrimeScript RT reagent kit with gDNA Eraser |
| Sequencing Library Prep Kit | Preparation of sequencing libraries | Illumina-compatible library preparation kits |
| Reference Genome | Read alignment and variant calling | Species-specific reference genome assembly |
| Alignment Software | Mapping reads to reference | HISAT2, GSNAP, BWA, Bowtie2 |
| Variant Caller | SNP and indel identification | GATK, FreeBayes |
| QTL Analysis Tool | Association mapping | QTLseqr, Bayesian approaches |
BSR-Seq provides a powerful tool for studying the evolution of NBS domain genes by enabling rapid identification of resistance loci and their associated gene families. The approach has been particularly valuable for investigating how NBS-LRR genes evolve in response to pathogen pressure and how different evolutionary mechanisms shape resistance gene repertoires across plant lineages.
Recent studies have revealed that bryophytes possess a substantially larger diversity of gene families compared to vascular plants, including unique immune receptors that may contribute to their ecological success [32]. BSR-Seq can help functionally characterize these lineage-specific genes and elucidate their roles in plant immunity and adaptation.
Furthermore, BSR-Seq facilitates comparative evolutionary analyses by enabling efficient mapping of trait-associated loci, including resistance loci, across multiple species. For example, the identification of ZmNRT5 as a candidate for root lodging resistance in maize [50] and TCP5/LNG2 as regulators of leaf development in soybean [49] demonstrates how BSR-Seq can reveal both conserved and lineage-specific genetic mechanisms underlying plant traits.
BSR-Seq has emerged as a powerful methodology that effectively bridges sequence and function in genetic studies. By combining the mapping power of bulk segregant analysis with the comprehensive data generation of RNA sequencing, this approach accelerates the identification of candidate genes controlling important traits, including disease resistance. In the broader context of plant evolutionary biology, BSR-Seq provides valuable insights into the dynamics of gene family evolution, particularly for rapidly evolving systems like the NBS-LRR genes that govern plant-pathogen interactions. As sequencing technologies continue to advance and become more accessible, BSR-Seq is poised to play an increasingly important role in both basic research and crop improvement programs.
Plants are in a constant evolutionary arms race with a wide array of pathogens, leading to significant yield losses that threaten global food security. It is estimated that plant diseases and pest infestations cause a 20–30% annual reduction in global crop yields [52]. To combat this threat, plants have evolved a sophisticated two-layered immune system. The first layer, pattern-triggered immunity (PTI), relies on cell-surface receptors that detect conserved microbial patterns. The second layer, known as effector-triggered immunity (ETI), is primarily mediated by a large class of resistance (R) proteins encoded by nucleotide-binding site leucine-rich repeat (NBS-LRR or NLR) genes [41]. These intracellular immune receptors recognize specific pathogen-secreted effector proteins, triggering robust defense responses that often include a hypersensitive response and programmed cell death to restrict pathogen spread [41] [52].
The NBS domain, a conserved feature across this gene family, functions as a molecular switch by binding and hydrolyzing ATP, which is essential for activating downstream immune signaling [41]. The remarkable diversity of NBS-LRR genes, coupled with their ability to evolve rapidly, makes them a cornerstone of plant immunity and an invaluable resource for crop improvement programs. This technical guide explores the applications of NBS-LRR gene research in developing disease-resistant cultivars, framed within the broader context of land plant evolution.
The NBS-LRR gene family exhibits extraordinary evolutionary dynamics across the plant kingdom. Comparative genomic analyses reveal significant variation in the composition and expansion of NBS-LRR subfamilies among different plant lineages, reflecting their distinct evolutionary paths and host-pathogen co-evolution histories.
Recent studies on Salvia miltiorrhiza, an important medicinal plant, revealed a striking reduction in specific NBS-LRR subfamilies. Among 196 NBS-LRR genes identified, only 62 possessed complete N-terminal and LRR domains, with a notable reduction in TNL (TIR-NBS-LRR) and RNL (RPW8-NBS-LRR) subfamily members [41]. This pattern extends across the Salvia genus, with comparative analysis of five Salvia species (S. miltiorrhiza, S. bowleyana, S. divinorum, S. hispanica, and S. splendens) showing a complete absence of TNL subfamily members and only one or two RNL copies in each species—far fewer than observed in other angiosperms like Arabidopsis thaliana, Nicotiana tabacum, and Vitis vinifera [41].
This pattern of subfamily expansion and contraction is observed across plant lineages. Gymnosperms such as Pinus taeda have experienced significant expansion of the TNL subfamily, which comprises 89.3% of their typical NBS-LRRs. In contrast, monocotyledonous species including Oryza sativa (rice), Triticum aestivum (wheat), and Zea mays (maize) are reported to have completely lost the TNL and RNL subfamilies during their evolution [41]. These findings highlight the fluid nature of the NLR immune repertoire across plant evolution.
Table 1: NBS-LRR Gene Family Composition Across Plant Species
| Plant Species | Total NLRs | CNL | TNL | RNL | Other/Partial | Reference |
|---|---|---|---|---|---|---|
| Salvia miltiorrhiza | 196 | 61 | 0 | 1 | 134 | [41] |
| Nicotiana benthamiana | 156 | 25 | 5 | 4* | 122 | [53] |
| Triticum aestivum (Wheat) | ~460 | Predominant | 0 | 0 | Not specified | [52] |
| Arabidopsis thaliana | 207 | Not specified | Not specified | Not specified | Not specified | [41] |
| Oryza sativa (Rice) | 505 | Predominant | 0 | 0 | Not specified | [41] |
| 100 Plant Species (PlantNLRatlas) | 68,452 | 3,689 (full-length) | - | - | 64,763 (partial-length) | [54] |
Note (*): The 4 RNL-type proteins in N. benthamiana contain RPW8 domains but are distributed among N, CN, and NL subfamilies [53].
NBS-LRR proteins are classified based on their domain architecture into typical and atypical types. Typical NBS-LRR proteins contain three principal domains: an N-terminal domain, a central NBS domain, and a C-terminal LRR domain [41] [55]. The N-terminal domain determines classification into three major subfamilies: CNL (coiled-coil, CC), TNL (Toll/Interleukin-1 receptor, TIR), and RNL (RPW8).
Atypical NBS-LRR proteins lack complete domains and are further categorized based on specific domain deletions into subtypes such as N (NBS only), TN (TIR-NBS), CN (CC-NBS), and NL (NBS-LRR) [41]. Both CNL and TNL proteins serve as intracellular receptors in ETI, while RNL proteins act as helper nodes for signaling transduction [41] [53].
The identification and characterization of NBS-LRR genes have been revolutionized by computational biology approaches. Traditional methods relied on domain-based bioinformatics pipelines using tools like InterProScan, HMMER, and MEME to scan genomes for conserved domains and motifs [52]. However, recent advances in machine learning and deep learning have significantly enhanced prediction accuracy.
PRGminer represents a cutting-edge deep learning-based tool specifically designed for accurate prediction of resistance proteins. This tool operates in two phases: Phase I predicts input protein sequences as R-genes or non-R-genes, while Phase II classifies the predicted R-genes into eight different classes [36]. PRGminer achieves remarkable accuracy rates of 95.72% on independent testing in Phase I and 97.21% in Phase II, with Matthews correlation coefficient values of 0.91 and 0.92, respectively [36]. This demonstrates superior performance compared to traditional alignment-based methods, particularly for sequences with low homology.
Table 2: Computational Tools for NBS-LRR Gene Identification
| Tool Name | Approach | Input Data | Key Features | Reference |
|---|---|---|---|---|
| PRGminer | Deep Learning | Protein sequences | Two-phase classification; 95.72% accuracy | [36] |
| NLRtracker | Domain-based | Protein sequences | High sensitivity for plant proteomes | [55] |
| NLR-Annotator | Domain-based | Nucleotide sequences | Suitable for non-Linux users | [55] |
| PlantNLRatlas | Database | Pre-annotated genomes | 68,452 NLRs from 100 plant species | [54] |
| RefPlantNLR | Database | Experimentally validated | Curated collection of confirmed NLRs | [54] |
Large-scale datasets have been developed to support comparative investigations of NLRs across diverse plant taxa. The PlantNLRatlas represents one of the most comprehensive resources, containing 68,452 full-length and partial-length NLR genes identified across 100 high-quality plant genomes [56] [54]. This dataset includes 83 eudicots, 10 monocots, and 7 other plants representing 31 orders and 48 families, with an average of 685 NLRs per genome [54]. The extreme variation in NLR numbers between species—from 28 in coriander (Coriandrum sativum) to 3,428 in alfalfa (Medicago sativa)—highlights the diverse evolutionary paths of plant immune systems [54].
A step-by-step computational protocol has been developed for identifying evolutionarily conserved motifs in plant NLR proteins, which is essential for understanding their molecular functions [55]. This pipeline can be applied to identify molecular signatures that have remained conserved in the gene family over evolutionary time across plant species.
Figure 1: Workflow for Phylogenomic Analysis of Plant NLR Immune Receptors
The key steps in this protocol include:
Data Acquisition: Download protein sequences from reference genome databases. As a test dataset, proteomes from six representative plant species (Arabidopsis thaliana, Beta vulgaris, Solanum lycopersicum, Nicotiana benthamiana, Oryza sativa, and Hordeum vulgare) can be utilized [55].
NLR Annotation: Annotate NLRs from input protein sequence files using NLRtracker with the command: ./NLRtracker -s NLRtracker_input_protein.fasta -o NLRtracker_output [55]. NLRtracker demonstrates higher sensitivity compared to alternative tools and can detect functionally validated NLRs that might otherwise be missed.
Domain Sequence Parsing: Based on InterProScan results, parse domain sequences from corresponding protein sequences of each identified NB-LRR gene. For genes containing multiple domains, splice them in the order they appear in the gene sequence.
Multiple Sequence Alignment: Perform alignment using Clustal Omega with default parameters [55].
Phylogenetic Tree Construction: Build phylogenetic trees using FastTree with the parameter -lg [55].
Motif Prediction: Identify conserved sequence motifs using the MEME Suite, which can be implemented either through local installation or the online web server [55].
This pipeline has successfully identified conserved sequence motifs such as the MADA and EDVID motifs within the CC-NLR subfamily, providing insights into functionally important regions [55].
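The domain sequence parsing step of the protocol can be illustrated with a short sketch. It assumes that domain coordinates are 1-based and inclusive, as in InterProScan output; the function name `splice_domains` and the toy sequence are hypothetical and are not taken from the published protocol.

```python
def splice_domains(protein_seq, domain_hits):
    """Concatenate domain subsequences in N- to C-terminal order.

    `domain_hits` is a list of (start, end) pairs, 1-based and inclusive.
    Overlapping hits are not merged, so shared residues would be repeated.
    """
    ordered = sorted(domain_hits)
    return "".join(protein_seq[start - 1:end] for start, end in ordered)


# Toy sequence with two non-overlapping "domains", given out of order.
toy_protein = "ABCDEFGHIJKLMNOPQRST"
print(splice_domains(toy_protein, [(11, 15), (2, 5)]))
```

Sorting by start coordinate keeps the spliced sequence in the order in which the domains occur in the protein, matching the instruction in the protocol.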
The identification of Ym1, a wheat CC-NBS-LRR protein that confers resistance to wheat yellow mosaic virus (WYMV), exemplifies a sophisticated gene cloning approach [57]. The experimental workflow involved:
Figure 2: Map-Based Cloning Workflow for Wheat Ym1 Gene
Genetic Population Development: Create a double hybrid F1 cross using Yining Xiaomai (YNXM, the donor of Ym1), 2011I-78 (WYMV susceptible), and Chinese Spring ph1b mutant (WYMV susceptible) [57]. The ph1b mutation promotes homoeologous recombination, overcoming the challenge of recombination suppression between alien introgression fragments and their wheat counterparts.
Fine-Mapping: Identify plants heterozygous for Ym1 and homozygous for the ph1b mutant gene using diagnostic markers. Genotype 326 BC1F2 individuals with flanking markers InDelM41 and InDelM412 to screen for recombinants [57].
Candidate Gene Identification: Narrow the Ym1 locus to a physical interval flanked by markers 2ESTK2 and InDel_FA192, corresponding to a 5.6 Mbp region containing 65-73 annotated genes [57]. Use reciprocal BLAST sequence alignment to compare annotated genes in resistant and susceptible varieties.
Functional Validation: Validate the candidate gene through knockdown/knockout experiments that compromise WYMV resistance, and overexpression studies that enhance WYMV resistance in wheat [57].
This comprehensive approach led to the successful isolation of Ym1, which recognizes the WYMV coat protein and activates resistance by triggering hypersensitive responses [57].
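The recombinant screening in the fine-mapping step reduces to finding plants whose genotype calls differ between the two flanking markers. The sketch below uses the marker names from the text, but the plant IDs, the genotype codes ("A", "H", "B"), and the function name `find_recombinants` are assumptions made for illustration.

```python
def find_recombinants(genotypes, left_marker, right_marker):
    """Return IDs of plants whose calls differ at the two flanking markers."""
    recombinants = []
    for plant_id, calls in genotypes.items():
        left = calls.get(left_marker)
        right = calls.get(right_marker)
        if left is None or right is None:
            continue  # skip plants with a missing call
        if left != right:
            recombinants.append(plant_id)
    return recombinants


# Hypothetical genotype calls: "A" and "B" are the two homozygous classes, "H" is heterozygous.
calls = {
    "plant_001": {"InDelM41": "H", "InDelM412": "H"},
    "plant_002": {"InDelM41": "H", "InDelM412": "B"},
    "plant_003": {"InDelM41": "A"},
}
print(find_recombinants(calls, "InDelM41", "InDelM412"))
```

Recombinants found this way are the informative individuals for narrowing the interval with additional markers between the two flanks.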
Table 3: Essential Research Reagents for NBS-LRR Gene Characterization
| Reagent/Resource | Specification | Application | Example/Reference |
|---|---|---|---|
| Plant Materials | Resistant and susceptible cultivars; Mapping populations | Genetic analysis and phenotyping | BR 18-Terena wheat [58] |
| Pathogen Isolates | Characterized strains with known effectors | Disease assays and Avr-R interaction studies | MoT isolate BR32 [58] |
| Genotyping Markers | SSR, InDel, CAPS markers derived from SNPs | Genetic mapping and recombinant screening | MAT, MGM301, 1338.1.2, 1106.3.1 [59] |
| Cloning Vectors | pBluescript II SK(+), pSH75 (hygromycin resistance) | Gene cloning and transformation | [59] |
| HMM Profiles | NB-ARC (PF00931) from Pfam database | Initial identification of NBS domains | [53] |
| Software Tools | NLRtracker, MEME Suite, InterProScan | Bioinformatics analysis of NLR genes | [55] |
| Database Resources | PlantNLRatlas, RefPlantNLR | Comparative genomics and reference data | 68,452 NLRs from 100 plants [54] |
Wheat blast, caused by the Triticum pathotype of the fungus Magnaporthe oryzae (MoT), is a devastating disease that has spread from South America to Bangladesh and India, posing a global threat to wheat production [59] [58]. Several wheat resistance genes have been identified, including Rmg2, Rmg3, Rmg7, and Rmg8, with Rmg7 and Rmg8 providing resistance at both seedling and heading stages [58].
Notably, Rmg8 (on chromosome 2B in hexaploid wheat) and Rmg7 (on chromosome 2A in tetraploid wheat) both recognize the same avirulence gene AVR-Rmg8, suggesting these resistance genes are equivalent from a breeding perspective [59]. The corresponding avirulence gene AVR-Rmg8 was isolated from a wheat blast isolate through a map-based strategy, encoding a small protein containing a putative signal peptide [59].
The Brazilian wheat cultivar BR 18-Terena represents an important source of quantitative resistance to wheat blast, with genetic analysis revealing nine quantitative trait loci (QTL) associated with resistance at either seedling or heading stages [58]. This resistance is largely tissue-specific, with different QTL providing protection at different developmental stages, highlighting the complexity of breeding for comprehensive disease resistance [58].
The recent cloning of Ym1 represents a significant advancement in controlling wheat yellow mosaic virus (WYMV), a soil-borne disease that threatens over 2.2 million hectares of Chinese wheat growing areas and causes 30% to 70% yield reduction [57].
Ym1 encodes a typical CC-NBS-LRR type R protein that is specifically expressed in roots and induced upon WYMV infection [57]. The resistance mechanism involves Ym1 recognizing and interacting with the WYMV coat protein (CP), which leads to nucleocytoplasmic redistribution, a process that transitions Ym1 from an auto-inhibited to an activated state [57]. This activation subsequently elicits hypersensitive responses and establishes WYMV resistance, likely by blocking viral transmission from the root cortex into the stele and thereby preventing systemic movement to aerial tissues [57].
Ym1 has been introgressed from the sub-genome Xn or Xc of polyploid Aegilops species into common wheat, demonstrating the value of harnessing wild relatives for crop improvement [57]. This gene is the most widely used source of WYMV resistance in wheat breeding programs worldwide.
The identification and functional characterization of NBS-LRR genes have direct applications in marker-assisted selection (MAS) and genetic engineering for disease-resistant crop varieties. The findings from BR 18-Terena have enabled haplotype analysis of 100 Brazilian wheat cultivars, revealing that 11.0% already possess a BR 18-Terena-like haplotype for more than one of the identified heading stage QTL [58]. This facilitates targeted breeding efforts to combine multiple resistance QTL for more durable resistance.
Future perspectives in this field include:
Gene Pyramiding: Combining multiple NBS-LRR genes with different recognition specificities to develop cultivars with broader and more durable resistance [52].
Engineered NLRs: Modifying the LRR domains of NLR proteins to recognize new pathogen effectors, creating synthetic resistance genes [52].
Network Biology Approaches: Understanding how NLR proteins function within immune signaling networks, including interactions between helper NLRs and sensor NLRs [54].
Pan-NLRome Characterization: Leveraging multiple reference genomes for key crops to understand the complete diversity of NLR genes within species [54].
As genomic technologies continue to advance, the integration of evolutionary insights with functional characterization will accelerate the development of disease-resistant cultivars, reducing reliance on chemical pesticides and enhancing global food security. The rich diversity of NBS-LRR genes across land plants represents a vast resource for crop improvement that we are only beginning to tap systematically.
Plant nucleotide-binding site-leucine-rich repeat (NBS-LRR) genes constitute one of the largest and most diverse gene families involved in disease resistance, presenting substantial annotation challenges due to their complex genomic architecture. Recent studies have identified 12,820 NBS-domain-containing genes across 34 plant species, classified into 168 distinct classes with both classical and species-specific structural patterns [3]. The pepper (Capsicum annuum L.) genome alone contains 252 NBS-LRR resistance genes distributed unevenly across all chromosomes, with 54% forming 47 gene clusters driven by tandem duplications and genomic rearrangements [22]. In tung trees, comparative analysis revealed 239 NBS-LRR genes across two Vernicia species, with 90 in the Fusarium wilt-susceptible V. fordii and 149 in the resistant V. montana [60].
This remarkable diversity, coupled with the clustered organization of these genes, creates significant obstacles for accurate genome annotation. Standard annotation pipelines frequently fragment R-gene predictions due to their repetitive nature and sequence similarity, while their typically low expression levels complicate transcriptome-based annotation approaches [36]. The challenge is further compounded because R-genes can be mistaken for repetitive sequences, so that masking against public transposable element databases can hide them during genome annotation [36].
The NBS-LRR gene family exhibits extraordinary structural diversity, encompassing significant variations in domain architecture across plant species. These genes typically encode large proteins ranging from approximately 860 to 1,900 amino acids with at least four distinct domains: a variable amino-terminal domain, the NBS domain, the LRR region, and variable carboxy-terminal domains [61]. Based on their N-terminal domains, NBS-LRR genes are classified into two major subfamilies: TIR-NBS-LRR (TNL) and CC-NBS-LRR (CNL), the latter also referred to as non-TIR-NBS-LRR (nTNL) [22].
Table 1: NBS-LRR Gene Distribution in Selected Plant Species
| Plant Species | Total NBS-LRR Genes | TNL Genes | CNL/nTNL Genes | Gene Clusters | Reference |
|---|---|---|---|---|---|
| Capsicum annuum (pepper) | 252 | 4 | 248 | 47 clusters | [22] |
| Vernicia montana (tung tree) | 149 | 12 | 137 | Not specified | [60] |
| Vernicia fordii (tung tree) | 90 | 0 | 90 | Not specified | [60] |
| Arabidopsis thaliana | ~150 | ~62 | ~88 | Multiple | [61] |
| Oryza sativa (rice) | ~400 | 0 | ~400 | Multiple | [61] |
The distribution of these genes across genomes is highly non-random, with significant enrichment in specific genomic regions. For example, in pepper, chromosome 3 harbors the highest number of NBS-LRR genes (38), while chromosomes 2 and 6 contain the lowest number (5 each) [22]. This uneven distribution reflects the lineage-specific adaptations and evolutionary pressures that have shaped R-gene repertoires in different plant species.
NBS-LRR genes evolve through a birth-and-death process characterized by frequent gene duplications and losses, resulting in two distinct evolutionary patterns [61]. Type I genes evolve rapidly with frequent gene conversions and are often represented by multiple paralogs in a genome, while Type II genes evolve slowly with rare gene conversion events and typically have fewer paralogs [13]. This heterogeneous evolutionary rate creates substantial challenges for annotation pipelines, particularly when leveraging comparative genomics approaches.
The evolution of NBS-LRR genes is further complicated by their engagement with RNA silencing pathways. Multiple microRNA families target conserved regions within NBS-LRR transcripts, creating an additional layer of regulatory complexity that must be considered during annotation [13]. These miRNAs typically target highly duplicated NBS-LRRs, with duplicated genes from different families periodically giving birth to new miRNAs in a classic example of co-evolution [13].
Specialized bioinformatics pipelines have been developed to address the unique challenges of R-gene annotation. The nf-annotate pipeline represents a comprehensive approach that integrates multiple evidence sources for accurate R-gene prediction [62]. This pipeline employs a structured workflow that combines homology-based prediction, ab initio gene finding, and transcriptomic evidence to generate high-confidence annotations.
Table 2: Key Tools for R-gene Annotation and Their Applications
| Tool/Pipeline | Methodology | Key Features | Application Scope | Reference |
|---|---|---|---|---|
| nf-annotate | Integrated homology-based and evidence-driven | Combines InterProScan, MEME, MAST, Miniprot | Comprehensive R-gene annotation | [62] |
| BRAKER2 | Automated genome annotation with protein evidence | Integrates GeneMark-EP+ and AUGUSTUS | General eukaryotic annotation with R-gene capability | [63] |
| PRGminer | Deep learning-based prediction | Uses dipeptide composition and convolutional neural networks | R-gene identification and classification | [36] |
| OrthoFinder | Orthogroup inference | Uses DIAMOND and MCL clustering | Evolutionary analysis of R-gene families | [3] |
The nf-annotate pipeline implements several specialized subworkflows for R-gene annotation. The HRP (Homology-based R-gene Prediction) subworkflow begins with protein sequence extraction from genome annotations, followed by domain identification using InterProScan Pfam, NB-LRR extraction, motif analysis with MEME/MAST, and refinement through InterProScan Superfamily [62]. This comprehensive approach enables the identification of both canonical and non-canonical R-genes that might be missed by standard annotation pipelines.
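The Pfam-based extraction of NB-ARC-containing proteins can be illustrated in principle with a short parser for InterProScan tab-separated output. This is a sketch and not the nf-annotate implementation; it assumes the standard InterProScan TSV column layout, and the function name `nb_arc_hits` and the example rows are invented for illustration.

```python
import csv
import io

NB_ARC = "PF00931"  # Pfam accession for the NB-ARC domain


def nb_arc_hits(tsv_handle):
    """Collect (start, stop) coordinates of Pfam NB-ARC hits per protein.

    Assumes the standard InterProScan TSV layout: protein ID in column 1,
    analysis name in column 4, signature accession in column 5, and hit
    start/stop in columns 7 and 8.
    """
    hits = {}
    reader = csv.reader(tsv_handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 8:
            continue  # skip blank or truncated lines
        protein, analysis, signature = row[0], row[3], row[4]
        if analysis == "Pfam" and signature == NB_ARC:
            hits.setdefault(protein, []).append((int(row[6]), int(row[7])))
    return hits


# Made-up rows for illustration only (not real annotation output).
rows = [
    ["protA", "md5a", "900", "Pfam", "PF00931", "NB-ARC domain", "180", "460", "1e-50", "T", "01-01-2024"],
    ["protA", "md5a", "900", "Pfam", "PF01582", "TIR domain", "10", "150", "1e-30", "T", "01-01-2024"],
    ["protB", "md5b", "300", "Pfam", "PF01582", "TIR domain", "5", "140", "1e-28", "T", "01-01-2024"],
]
example_tsv = "\n".join("\t".join(r) for r in rows)
print(nb_arc_hits(io.StringIO(example_tsv)))
```

Proteins returned by such a filter would then be passed to motif analysis and further domain refinement, as described for the HRP subworkflow.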
Recent advances in deep learning have enabled the development of specialized tools like PRGminer, which implements a two-phase prediction approach for R-gene identification and classification [36]. In Phase I, the tool distinguishes R-genes from non-R-genes with 98.75% accuracy in k-fold testing and 95.72% on independent testing using dipeptide composition features. Phase II classifies the identified R-genes into eight different classes with an overall accuracy of 97.55% in k-fold testing and 97.21% on independent testing [36].
This deep learning approach offers significant advantages over traditional alignment-based methods, particularly for identifying R-genes with low sequence homology to known references. By extracting sequential and convolutional features from raw encoded protein sequences, PRGminer can recognize patterns indicative of R-genes that might be missed by BLAST-based or motif-based approaches [36].
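Dipeptide composition, the feature type mentioned above, is simply the relative frequency of each of the 400 possible amino acid pairs in a sequence. The sketch below shows one common way to compute it; whether PRGminer normalizes or handles ambiguous residues in exactly this way is an assumption, and the function name `dipeptide_composition` is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def dipeptide_composition(seq):
    """Return a dict mapping each of the 400 dipeptides to its frequency.

    Overlapping pairs are counted; pairs containing non-standard
    residues (for example X) are skipped.
    """
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return {pair: 0.0 for pair in counts}
    return {pair: n / total for pair, n in counts.items()}


features = dipeptide_composition("MAMA")
print(len(features), features["MA"], features["AM"])
```

The resulting fixed-length vector is independent of sequence length, which is what makes it convenient as input to a classifier.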
Functional validation of annotated R-genes typically employs a multi-faceted approach combining expression analysis, genetic variation studies, and functional characterization. A comprehensive protocol for validating NBS-LRR gene predictions includes the following key steps:
Expression Profiling: RNA-seq data from various tissues and stress conditions are analyzed to identify R-genes with responsive expression patterns. For example, in cotton, expression profiling revealed the putative upregulation of specific orthogroups (OG2, OG6, and OG15) in different tissues under various biotic and abiotic stresses in plants susceptible and tolerant to cotton leaf curl disease [3]. The retrieved FPKM values are categorized into tissue-specific, abiotic stress-specific, and biotic stress-specific expression profiles to identify context-dependent regulation.
Genetic Variation Analysis: Comparison of resistant and susceptible accessions identifies sequence variants potentially contributing to resistance phenotypes. In cotton, genetic variation analysis between susceptible (Coker 312) and tolerant (Mac7) Gossypium hirsutum accessions identified unique variants in the NBS genes of Mac7 (6,583 variants) and Coker 312 (5,173 variants) [3].
Protein Interaction Studies: Protein-ligand and protein-protein interaction assays validate the functional potential of annotated R-genes. Research has demonstrated strong interaction of some putative NBS proteins with ADP/ATP and different core proteins of the cotton leaf curl disease virus [3].
Functional Characterization through VIGS: Virus-induced gene silencing (VIGS) provides direct evidence of gene function. In resistant cotton, silencing of GaNBS (OG2) through VIGS demonstrated its putative role in controlling virus titer, supporting its functional importance in disease resistance [3].
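The accession comparison in the genetic variation step amounts to a set difference between two variant lists. The sketch below assumes each variant is represented as a (chromosome, position, reference allele, alternate allele) tuple; the example variants and the function name `unique_variants` are hypothetical and are not data from the cited study.

```python
def unique_variants(variants_a, variants_b):
    """Return (only_in_a, only_in_b) as sets of variant tuples."""
    set_a, set_b = set(variants_a), set(variants_b)
    return set_a - set_b, set_b - set_a


# Hypothetical variants: (chromosome, position, reference allele, alternate allele).
tolerant = [("A01", 1200, "G", "A"), ("A01", 5400, "T", "C"), ("D05", 880, "C", "T")]
susceptible = [("A01", 1200, "G", "A"), ("D05", 910, "A", "G")]

only_tolerant, only_susceptible = unique_variants(tolerant, susceptible)
print(len(only_tolerant), len(only_susceptible))
```

In practice the comparison would be restricted to variants falling inside annotated NBS genes before counting accession-specific variants.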
Table 3: Essential Research Reagents for R-gene Functional Characterization
| Reagent/Resource | Function/Application | Example Use Case | Reference |
|---|---|---|---|
| VIGS Vectors | Virus-induced gene silencing for functional validation | Silencing of GaNBS in cotton to confirm role in virus resistance | [3] |
| RNA-seq Libraries | Expression profiling under stress conditions | Identifying differentially expressed NBS-LRR genes in tolerant vs susceptible varieties | [3] |
| OrthoDB Protein Database | Source of protein sequences for homology-based annotation | Providing reference data for BRAKER2 automated annotation | [63] |
| Pfam Domain Databases | Domain identification and classification | Identifying NB-ARC domains in candidate R-genes | [3] |
| InterProScan | Integrated domain and motif prediction | Comprehensive domain architecture analysis in nf-annotate pipeline | [62] |
Figure: R-gene Annotation and Validation Workflow
Addressing the annotation challenges presented by complex R-gene clusters requires specialized bioinformatics approaches that integrate multiple evidence sources and leverage both homology-based and machine learning methods. The remarkable diversity of NBS-LRR genes, with 168 distinct domain architecture classes identified across land plants [3], necessitates moving beyond standard annotation pipelines to specialized workflows that account for their unique genomic organization and evolutionary dynamics.
Future directions in R-gene annotation will likely involve improved integration of long-read sequencing technologies to resolve complex cluster regions, enhanced deep learning models trained on expanding collections of validated R-genes, and more sophisticated evolutionary analysis tools to reconstruct the birth-and-death dynamics that shape these gene families. As these technical challenges are addressed, researchers will be better positioned to leverage the vast diversity of R-genes for crop improvement, with more than 450 R genes already cloned from 42 plant species [64] providing a foundation for engineering disease-resistant crops protected by genetics rather than pesticides.
The continued development of specialized tools and pipelines for R-gene annotation, coupled with experimental validation approaches, will be essential for unlocking the full potential of this remarkable gene family in crop protection and sustainable agriculture.
Nucleotide-binding site leucine-rich repeat (NBS-LRR) genes constitute the largest family of plant resistance (R) genes, encoding intracellular immune receptors that confer pathogen-specific immunity through effector-triggered immunity (ETI). However, high constitutive expression of NBS-LRR genes incurs significant fitness costs and can be lethal to plant cells, necessitating sophisticated regulatory mechanisms. This technical review examines the evolutionary dynamics and molecular mechanisms governing NBS-LRR expression, focusing on transcriptional regulation and post-transcriptional control mediated by diverse microRNAs (miRNAs). We explore how plants balance the benefits of pathogen recognition against the autotoxicity costs of NBS-LRR overexpression through co-evolutionary networks, with emphasis on the convergent evolution of miRNA families that target conserved NBS-LRR motifs. The comprehensive analysis presented herein integrates genomic, phylogenetic, and experimental perspectives to elucidate the complex regulatory landscape of plant immune receptors.
NBS-LRR genes encode STAND (signal-transduction ATPases with numerous domains) P-loop ATPases that function as central hubs in plant immunity, detecting polymorphic pathogen effectors and initiating robust defense responses [13]. These proteins typically contain three fundamental domains: an N-terminal coiled-coil (CC) or Toll/Interleukin-1 receptor (TIR) domain, a central nucleotide-binding site (NBS) domain that functions as a molecular switch, and a C-terminal leucine-rich repeat (LRR) domain responsible for pathogen recognition [13]. The NBS domain contains conserved motifs including the P-loop, RNBS-A, kinase-2, RNBS-B, RNBS-C, and GLPL, which are essential for ATP/GTP binding and hydrolysis during immune signaling [22].
Plant genomes maintain highly variable NBS-LRR repertoires ranging from under 100 to over 1,000 genes, with their sum in a host population defining the detection repertoire for polymorphic pathogen effectors [13]. Two distinct evolutionary patterns characterize NBS-LRR genes: type I genes feature multiple paralogs and rapid evolution with frequent gene conversions, while type II genes have fewer paralogs and evolve slowly with rare gene conversion events [13]. Most NBS-LRRs are organized in genomic clusters generated through tandem duplications and genomic rearrangements [22]. This expansion requires a balance in recognition capacity: enough diversity to detect evolving pathogens, without the autotoxicity that comes with overexpression. The fitness costs associated with NBS-LRR maintenance have driven the evolution of multilayer regulatory systems, with miRNA-mediated control representing a crucial mechanism for maintaining this balance [13].
The link between NBS-LRRs and their regulation by small RNAs traces back to gymnosperms, emerging more than 100 million years after the origin of NBS-LRR genes in early land plants like mosses and spike mosses [13]. Comprehensive analyses across land plants reveal that NBS-LRR genes have undergone significant expansion and contraction events throughout plant evolution, with lineage-specific adaptations reflected in their domain architecture and subfamily distribution [3] [41].
Table 1: Evolutionary Distribution of NBS-LRR Subfamilies Across Plant Lineages
| Plant Species/Lineage | Total NBS Genes | CNL Subfamily | TNL Subfamily | RNL Subfamily | Notable Characteristics |
|---|---|---|---|---|---|
| Arabidopsis thaliana (dicot) | 207 | ~70% | ~30% | Present | Balanced subfamily distribution |
| Oryza sativa (monocot) | 505 | ~100% | Absent | Absent | Complete TNL loss |
| Triticum aestivum (monocot) | >1,000 | ~100% | Absent | Absent | Complete TNL loss |
| Salvia miltiorrhiza (medicinal dicot) | 196 | 61 CNLs | Absent | 1 RNL | Near-complete TNL loss |
| Pinus taeda (gymnosperm) | 311 | ~10% | ~89% | ~1% | TNL dominance |
| Capsicum annuum (pepper) | 252 | 248 nTNLs | 4 TNLs | Present | Extreme nTNL dominance |
| Physcomitrella patens (moss) | ~25 | Mixed | Mixed | Mixed | Limited repertoire |
Comparative genomics reveals striking lineage-specific patterns in NBS-LRR evolution. Monocots, including Poaceae family members like rice and wheat, demonstrate complete loss of TNL genes, while most dicots maintain both CNL and TNL subfamilies [41] [65]. However, exceptions exist even within dicots, with species like Mimulus guttatus and Salvia miltiorrhiza showing near-complete TNL loss [41] [65]. These distribution patterns reflect deep evolutionary pressures that have shaped immune receptor repertoires, possibly influencing concomitant miRNA regulator evolution.
NBS-LRR genes exhibit non-random genomic distribution, with a large share of them organized in physical clusters. Pepper (Capsicum annuum) exemplifies this pattern, with 136 of 252 NBS-LRR genes (54%) forming 47 gene clusters [22]. Chromosome 3 contains the highest concentration with 10 clusters, including the largest 8-gene cluster, while chromosome 6 contains no clusters [22]. Cluster members typically belong to the same gene subfamily, though mixed-cluster organizations also occur, suggesting complex evolutionary histories involving local duplications and rearrangements [22].
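One simple way to call such clusters from gene coordinates is to group neighboring genes on the same chromosome that lie within a maximum distance of each other. The text does not state the criteria used in the pepper study, so the 200 kb gap, the minimum cluster size of two, the function name `find_clusters`, and the example coordinates below are all assumptions made for illustration.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=200_000, min_size=2):
    """Group genes into clusters of neighbors on the same chromosome.

    `genes` is a list of (gene_id, chromosome, start) tuples. Two adjacent
    genes join the same cluster when their starts are at most `max_gap` apart.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        current, prev_start = [], None
        for start, gene_id in sorted(by_chrom[chrom]):
            if prev_start is not None and start - prev_start > max_gap:
                if len(current) >= min_size:
                    clusters.append((chrom, current))
                current = []
            current.append(gene_id)
            prev_start = start
        if len(current) >= min_size:
            clusters.append((chrom, current))
    return clusters


# Hypothetical coordinates, not taken from any real genome.
example_genes = [
    ("g1", "chr1", 1_000), ("g2", "chr1", 50_000), ("g3", "chr1", 900_000),
    ("g4", "chr2", 10), ("g5", "chr2", 100_000), ("g6", "chr2", 150_000),
]
found = find_clusters(example_genes)
clustered = sum(len(members) for _, members in found)
print(found, f"{clustered / len(example_genes):.0%} of genes clustered")
```

The clustered fraction computed at the end is the same kind of statistic as the 54% (136 of 252) reported for pepper, although the value depends strongly on the chosen gap threshold.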
The correlation between cluster size and NBS-LRR numbers implies that tandem duplication represents a key mechanism for immune receptor diversification [65]. This clustering has profound implications for regulation, as duplicated NBS-LRRs from different gene families periodically give birth to new miRNAs, creating localized regulatory networks [13]. The birth of new miRNAs typically occurs through inverted duplication of target gene sequences, with subsequent mutations refining precursor processing and target specificity [13].
At least eight families of miRNAs targeting NBS-LRRs have been identified across plant species, with the miR482/2118 superfamily representing the most deeply conserved [13]. These miRNAs typically target highly duplicated NBS-LRRs, while families of heterogeneous NBS-LRRs are rarely targeted in Poaceae and Brassicaceae genomes [13]. The tight association between NBS-LRR diversity and miRNA regulation represents a co-evolutionary adaptation allowing plants to maintain expansive immune receptor repertoires while mitigating fitness costs.
Table 2: Characterized miRNA Families Targeting NBS-LRR Genes
| miRNA Family | Target Site | Conservation | Representative Functions |
|---|---|---|---|
| miR482/2118 | P-loop motif | Gymnosperms to dicots | Targets multiple NBS-LRR lineages; triggers phasiRNA production |
| miR5300 | P-loop motif | Specific lineages | Secondary layer of NBS-LRR regulation |
| miR6019 | TIR domain | Specific lineages | TNL-specific regulation |
| miR6020 | TIR domain | Specific lineages | TNL-specific regulation |
| miR7122 | Multiple sites | Specific lineages | Family-specific NBS-LRR regulation |
| miR2118-3p | LRR domains | Common bean | Differential expression during fungal infection |
| miR5374 | LRR domains | Common bean | Modulation during anthracnose infection |
| tae-miR1714 | LysM receptors | Wheat | Novel regulator targeting non-NBS-LRR immune receptor |
Most newly emerged miRNAs target the same conserved, encoded protein motifs of NBS-LRRs, particularly the P-loop region, consistent with convergent evolution [13]. This targeting strategy allows single miRNAs to regulate multiple NBS-LRR lineages, providing broad regulatory potential with minimal genetic investment. The conservation of these miRNAs from gymnosperms to dicots indicates they originated prior to the emergence of angiosperms [13].
Plant miRNAs typically exhibit extensive complementarity to their target sequences, enabling transcript cleavage or translational repression through Argonaute (AGO) protein-containing RISC complexes [66]. Two primary mechanisms govern miRNA-mediated regulation: transcript cleavage reduces specific mRNA levels, while translational repression decreases protein accumulation without substantial transcript reduction [66]. The binding accessibility of target sites within mRNA molecules significantly influences regulatory efficacy, with flanking sequences playing crucial roles in allowing or restricting miRNA access [13].
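The notion of extensive complementarity can be made concrete by counting mismatches between a miRNA and a candidate target site. The sketch below is deliberately simplified: it treats only Watson-Crick pairs as matches (G:U wobble pairs count as mismatches), requires equal lengths, and uses invented sequences; the function name `count_mismatches` is hypothetical and this is not the scoring scheme of any specific target prediction tool.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def count_mismatches(mirna, target_site):
    """Count non-Watson-Crick positions between a miRNA and a target site.

    Both sequences are RNA written 5' to 3'. The miRNA pairs antiparallel
    to the target, so the target is reversed before comparison.
    """
    if len(mirna) != len(target_site):
        raise ValueError("miRNA and target site must have the same length")
    paired = target_site[::-1]
    return sum(1 for m, t in zip(mirna, paired) if COMPLEMENT.get(m) != t)


# Invented 8-nt sequences for illustration.
print(count_mismatches("UGGACCAU", "AUGGUCCA"))  # perfect complement
print(count_mismatches("UGGACCAU", "AUGGUCCG"))  # one mismatch at the miRNA 5' end
```

Dedicated plant target predictors additionally weight mismatches by position and penalize gaps, which this sketch does not attempt.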
A specialized regulatory mechanism involves 22-nt miRNAs, which trigger the generation of phased secondary siRNAs (phasiRNAs) from their target mRNAs [13]. This amplification system creates a robust regulatory cascade, particularly effective for large gene families like NBS-LRRs. In this process, 22-nt miRNAs (often resulting from precursors containing asymmetrical bulges) initiate phased siRNA production from NBS-LRR transcripts, generating secondary silencing signals that reinforce the primary miRNA regulation [13].
Figure 1: miRNA-Mediated Regulatory Mechanisms for NBS-LRR Genes. miRNAs guide RISC complexes to complementary NBS-LRR mRNAs, leading to transcript cleavage or translation repression. Twenty-two nucleotide miRNAs can trigger phasiRNA production, creating an amplified silencing cascade.
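The "phased" nature of the secondary siRNAs can be illustrated by listing the consecutive windows that are in register with a miRNA cleavage site. The sketch assumes a phase length of 21 nt, which is typical for phasiRNAs derived from protein-coding transcripts but is not stated in the text, and it only enumerates windows downstream of the cleavage site; the function name `phase_registers` and the coordinates are hypothetical.

```python
def phase_registers(cleavage_pos, transcript_len, phase_len=21):
    """Return 1-based (start, end) windows in phase with a cleavage site.

    `cleavage_pos` is the 1-based coordinate of the first nucleotide
    downstream (3') of the cleavage site on the target transcript.
    Only complete windows that fit within the transcript are returned.
    """
    windows = []
    start = cleavage_pos
    while start + phase_len - 1 <= transcript_len:
        windows.append((start, start + phase_len - 1))
        start += phase_len
    return windows


# Hypothetical transcript of 170 nt cleaved so that position 101 is the first downstream base.
for window in phase_registers(cleavage_pos=101, transcript_len=170):
    print(window)
```

Small RNA reads whose 5' ends fall on these window starts more often than expected by chance are the usual signature used to call a phasiRNA-producing locus.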
Cutting-edge methodologies enable researchers to dissect the complex regulatory relationships between miRNAs and their NBS-LRR targets. The following integrated workflow represents state-of-the-art approaches for characterizing these interactions:
Figure 2: Integrated Workflow for miRNA-NBS-LRR Interaction Analysis. Comprehensive approach combining high-throughput sequencing, bioinformatic prediction, and experimental validation.
Step 1: High-Throughput Sequencing
Step 2: Bioinformatic Identification and Prediction
Step 3: Expression Correlation Analysis
Virus-Induced Gene Silencing (VIGS): The barley stripe mosaic virus (BSMV) VIGS system enables functional characterization of miRNA-NBS-LRR interactions in plants [67]. For miRNA silencing, engineer constructs expressing short tandem target mimics (STTMs) that sequester endogenous miRNAs. For overexpression, clone pre-miRNA sequences into viral vectors [67].
Dual-Luciferase and Fluorescent Reporter Assays: Validate direct miRNA-target interactions through heterologous expression systems.
Genetic and Transgenic Approaches
Table 3: Essential Research Reagents and Resources
| Reagent/Resource | Specific Application | Function and Utility | Example Products/Codes |
|---|---|---|---|
| miRNeasy Micro Kit | Small RNA extraction | Isolation of high-quality small RNAs from plant tissues | QIAGEN 217084 |
| NEBNext Small RNA Library Prep Set | sRNA library construction | Preparation of sequencing libraries from small RNAs | NEB E7330S |
| ExoQuick Plasma Kit | Exosome isolation | Extraction of circulating exosomes for cross-kingdom studies | SBI EXOQ5A-1 |
| BSMV VIGS System | Functional validation | Virus-induced gene silencing for miRNA and target characterization | Custom vectors |
| Dual-Luciferase Reporter System | Target validation | Quantitative measurement of miRNA-target interactions | Promega E1910 |
| Agrobacterium tumefaciens | Transient transformation | Delivery of genetic constructs into plant tissues | GV3101, LBA4404 |
| PRGminer | R-gene prediction | Deep learning-based identification of resistance genes | https://kaabil.net/prgminer/ |
| psRobot | miRNA target prediction | Bioinformatics tool for plant miRNA target identification | http://omicslab.genetics.ac.cn/psRobot/ |
The intricate regulatory networks controlling NBS-LRR expression represent evolutionary solutions to the fundamental challenge of maintaining effective immunity without autotoxicity. The co-evolution of NBS-LRR genes and their miRNA regulators has created a dynamic system that balances detection capacity against fitness costs, enabling plants to adapt to evolving pathogen pressures. The convergent evolution of miRNAs targeting conserved NBS-LRR motifs demonstrates the power of natural selection to arrive at similar regulatory solutions across plant lineages.
Future research directions should focus on several key areas: (1) exploring the potential for engineered miRNAs to enhance crop resistance without yield penalties, (2) investigating cross-kingdom RNA regulation as a potential mechanism for pathogen manipulation of host immunity, and (3) developing computational models that predict regulatory outcomes from miRNA-NBS-LRR interaction networks. The integration of multi-omics approaches with advanced gene editing technologies will further illuminate this complex regulatory landscape, potentially enabling precise manipulation of plant immunity for sustainable agriculture.
Understanding miRNA-mediated control of NBS-LRR expression extends beyond fundamental plant biology, offering practical applications in crop improvement and disease management. As climate change and global trade accelerate pathogen spread, leveraging these natural regulatory mechanisms may prove essential for developing durable resistance in crop plants.
The nucleotide-binding site-leucine-rich repeat (NBS-LRR) gene family constitutes the most prevalent class of disease resistance (R) genes in plants, playing a critical role in effector-triggered immunity. The evolutionary maintenance of these genes represents a fundamental trade-off between the fitness benefits of pathogen resistance and the costs associated with gene expression and function. This whitepaper synthesizes current research on the selective pressures acting on NBS-encoding genes, examining the molecular signatures of purifying and balancing selection, genomic distribution patterns, and the experimental methodologies used to characterize these evolutionary dynamics. Within the broader context of land plant evolution, understanding these mechanisms provides crucial insights into plant-pathogen co-evolution and informs strategies for developing durable disease resistance in crops.
Plants employ a sophisticated two-layered immune system to defend against pathogens. The first layer, pattern-triggered immunity (PTI), relies on cell-surface receptors that detect conserved microbial molecules. The second layer, effector-triggered immunity (ETI), is primarily mediated by NBS-LRR proteins that recognize pathogen-secreted effectors, often activating a hypersensitive response and programmed cell death [41]. Approximately 80% of functionally characterized R genes belong to the NBS-LRR family, making them a central component of the plant immune system [41].
The NBS domain serves as a molecular switch, binding and hydrolyzing ATP to activate downstream immune signaling, while the LRR domain is responsible for recognizing diverse effectors released by pathogens [41]. This gene family exhibits remarkable plasticity, with copy numbers varying significantly across plant species—from approximately 150 in Arabidopsis to almost 500 in rice [69]. This rapid copy number evolution is driven by repeated cycles of duplication, divergence, and eventual loss via pseudogene formation or deletion in response to diverse pathogenic pressures [69].
The maintenance of this extensive genetic arsenal involves significant fitness costs, creating an evolutionary trade-off that shapes NBS gene diversity within plant genomes. This whitepaper examines the mechanisms balancing these costs and benefits through integrated molecular, genomic, and ecological perspectives.
NBS-encoding genes display non-random distribution patterns across plant genomes, often clustering in specific chromosomal regions. In sorghum, over 60% of NBS-encoding genes are located on just three chromosomes (SBI-02, SBI-05, and SBI-08), with approximately 68.7% arranged in clustered configurations [69]. Similar clustering patterns occur in radish, where 72% of NBS-encoding genes form 48 clusters distributed across 24 crucifer blocks on chromosomes [70].
This clustered organization facilitates evolutionary plasticity through mechanisms such as tandem duplication. Comparative analyses reveal that NBS-LRR genes are significantly enriched in regions containing fungal pathogen resistance quantitative trait loci (QTL), highlighting their functional importance in disease resistance [69].
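The clustering percentages above depend on how a cluster is defined, and the text does not state the criterion used in the cited studies. The following is a minimal sketch that assumes a simple distance rule: neighbouring NBS genes on the same chromosome belong to one cluster if their start positions are no more than `max_gap` base pairs apart. The 200 kb default, the function names, and the toy coordinates are all illustrative assumptions.

```python
from collections import defaultdict

def find_clusters(genes, max_gap=200_000, min_size=2):
    """Group genes whose neighbouring start positions lie within max_gap bp.

    genes: iterable of (gene_id, chromosome, start) tuples.
    max_gap and min_size are assumed thresholds, not values from the cited studies.
    Returns a list of clusters, each a list of gene IDs in positional order.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom])
        current = [ordered[0]]
        for start, gene_id in ordered[1:]:
            if start - current[-1][0] <= max_gap:
                current.append((start, gene_id))
            else:
                if len(current) >= min_size:
                    clusters.append([g for _, g in current])
                current = [(start, gene_id)]
        if len(current) >= min_size:
            clusters.append([g for _, g in current])
    return clusters

def clustered_fraction(genes, **kwargs):
    """Fraction of all genes that fall inside any cluster."""
    genes = list(genes)
    clustered = sum(len(c) for c in find_clusters(genes, **kwargs))
    return clustered / len(genes) if genes else 0.0

# Toy coordinates (hypothetical), not real sorghum or radish annotations.
toy = [
    ("g1", "chr2", 100_000), ("g2", "chr2", 150_000), ("g3", "chr2", 900_000),
    ("g4", "chr5", 10_000), ("g5", "chr5", 60_000), ("g6", "chr5", 120_000),
]
print(find_clusters(toy))
print(clustered_fraction(toy))
```

Changing `max_gap` changes the reported percentage, so the threshold should be stated alongside any cross-species comparison of clustering.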
Table 1: NBS-LRR Gene Distribution Across Plant Species
| Plant Species | Total NBS Genes | Notable Distribution Features | Reference |
|---|---|---|---|
| Sorghum (Sorghum bicolor) | 346 | >60% on 3 chromosomes; 68.7% in clusters | [69] |
| Radish (Raphanus sativus) | 225 | 72% clustered in 48 groups across chromosomes | [70] |
| Salvia (Salvia miltiorrhiza) | 196 | 62 typical NLRs with complete domains | [41] |
| Vernicia montana | 149 | Higher numbers on Vmchr2, Vmchr7, Vmchr11 | [60] |
| Vernicia fordii | 90 | Higher numbers on Vfchr2, Vfchr3, Vfchr9 | [60] |
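NBS-LRR genes are classified based on their domain architecture, primarily according to their N-terminal domains: TNL (TIR-NBS-LRR, carrying a Toll/interleukin-1 receptor domain), CNL (CC-NBS-LRR, carrying a coiled-coil domain), and RNL (RPW8-NBS-LRR, carrying an RPW8 domain).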
Additional structural variations include partial genes lacking complete domains (e.g., TN, CN, NL) [70]. The distribution of these subfamilies varies significantly across plant lineages, with TNL subfamilies markedly reduced or absent in certain species. In Salvia miltiorrhiza, for instance, the 62 typical NLRs include 61 CNLs and only 1 RNL protein, with complete absence of TNL subfamilies [41]. Similarly, monocotyledonous species like rice have completely lost TNL and RNL subfamilies [41].
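The labels used above (TNL, CNL, RNL, and partial forms such as TN, CN, and NL) can be derived mechanically from the set of domains detected in each protein. The sketch below is one possible rule set, not the classification code of the cited studies; the domain names and the precedence TIR, then CC, then RPW8 when several N-terminal domains co-occur are assumptions.

```python
from collections import Counter

def classify_nbs(domains):
    """Return a subfamily label (e.g. 'TNL', 'CN', 'NL') or None if no NBS domain.

    domains: iterable of domain names detected in one protein, e.g.
    {"TIR", "NB-ARC", "LRR"}. The names and the precedence order are
    illustrative assumptions.
    """
    found = {d.upper() for d in domains}
    if "NBS" not in found and "NB-ARC" not in found:
        return None
    if "TIR" in found:
        prefix = "T"
    elif "CC" in found:
        prefix = "C"
    elif "RPW8" in found:
        prefix = "R"
    else:
        prefix = ""
    suffix = "L" if "LRR" in found else ""
    return f"{prefix}N{suffix}"

# Hypothetical proteins and their detected domains.
proteins = {
    "p1": ["TIR", "NB-ARC", "LRR"],
    "p2": ["CC", "NBS"],
    "p3": ["NBS", "LRR"],
    "p4": ["RPW8", "NBS", "LRR"],
    "p5": ["LRR"],
}
labels = {pid: classify_nbs(doms) for pid, doms in proteins.items()}
tally = Counter(label for label in labels.values() if label)
print(labels)
print(tally)
```

A tally of this kind is what populates the per-subfamily columns of a classification table; proteins without an NBS domain are excluded rather than counted as partial genes.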
Table 2: NBS-LRR Gene Classification Across Species
| Species | TNL | CNL | RNL | Partial/Other | Total | Reference |
|---|---|---|---|---|---|---|
| Raphanus sativus | 80 | 19 | 0 | 126 | 225 | [70] |
| Arabidopsis thaliana | 75 | - | - | 89 | 164 | [70] |
| Vernicia montana | 12 | 98 | - | 39 | 149 | [60] |
| Vernicia fordii | 0 | 49 | - | 41 | 90 | [60] |
| Sorghum bicolor | 0 | 24 | - | 322 | 346 | [69] |
NBS-encoding genes exhibit molecular signatures of contrasting evolutionary processes. In sorghum, these genes are significantly enriched in genomic regions under both purifying selection (evident during domestication and improvement) and balancing selection [69].
Purifying selection removes deleterious alleles. Its signatures include elevated differentiation between wild and cultivated groups, low nucleotide diversity, and negatively skewed allele frequency spectra. This selective pressure conserves essential resistance functions while eliminating costly variants.
Balancing selection maintains genetic variation within populations, potentially through frequency-dependent selection or heterozygote advantage. This process preserves diversity in resistance genes, enabling populations to respond to evolving pathogenic threats.
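One common way to summarize skew in the allele frequency spectrum is Tajima's D: an excess of rare variants gives negative values, while an excess of intermediate-frequency variants, as expected under balancing selection, gives positive values. The text does not say which statistic the cited sorghum analysis used, so the following is only an illustrative sketch of the standard formula. It assumes equal-length, gap-free aligned sequences, and the toy alignments are invented. Demographic history also shifts D, so a single value is not proof of selection.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences; None if no site varies."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    # S: number of segregating (polymorphic) alignment columns.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return None
    # pi: mean number of pairwise differences over all sequence pairs.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

# Toy alignments (hypothetical): only singletons vs. two balanced haplotypes.
rare_variants = ["TAAA", "ATAA", "AATA", "AAAT", "AAAA", "AAAA"]
balanced = ["AAAA"] * 3 + ["TTTT"] * 3
print(tajimas_d(rare_variants))  # negative: excess of rare alleles
print(tajimas_d(balanced))       # positive: intermediate-frequency alleles
```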
The diagram below illustrates how these selection pressures interact with NBS gene evolution:
The maintenance of NBS-LRR genes incurs significant fitness costs that drive evolutionary trade-offs. Several molecular mechanisms underlie these costs:
Resource Allocation Costs: NBS-LRR proteins are structurally complex, requiring substantial energetic resources for expression and maintenance. In radish, expression analyses revealed that 75 NBS-encoding genes contribute to resistance against Fusarium wilt, representing significant metabolic investment [70].
Autoimmunity Risks: Inappropriate activation of NBS-LRR-mediated immunity can lead to autoimmune responses, where the immune system attacks host tissues. This is particularly problematic under conditions without pathogenic challenge.
Pleiotropic Effects: Some NBS-LRR genes exhibit pleiotropic effects on plant development. The example cited here comes from trichome-based defense rather than from the NBS-LRR family itself: while most trichome development genes affect both leaf trichomes and root hairs, GL1 specifically influences trichome development without affecting root hairs [71].
Evidence from ecological studies demonstrates these fitness trade-offs. Research on trichome production in Arabidopsis halleri subsp. gemmifera revealed equivalent fitness of hairy and glabrous plants under natural herbivory, allowing their coexistence in contemporary populations [71]. However, under weak herbivory conditions, a fitness cost of trichome production became apparent, illustrating the context-dependent nature of these trade-offs [71].
HMMER-Based Domain Identification
The standard approach for comprehensive identification of NBS-encoding genes involves hidden Markov model (HMM) profiling:
This methodology successfully identified 225 NBS-encoding genes in radish [70], 196 in Salvia miltiorrhiza [41], and 239 across two Vernicia species [60].
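As an illustration of the HMM-based step, the sketch below filters the per-domain table that `hmmsearch --domtblout` writes when a protein set is searched with the NB-ARC profile (PF00931). It parses text rather than running HMMER, so it is self-contained. The E-value cutoff of 1e-5, the function name, and the sample rows (including the profile version suffix) are assumptions for demonstration; the cited studies may have used different thresholds.

```python
def parse_domtblout(text, accession_prefix="PF00931", max_evalue=1e-5):
    """Return {protein_id: best NB-ARC domain hit} from hmmsearch --domtblout text.

    Column layout (0-based) for hmmsearch output: 0 target (protein) name,
    4 query (profile) accession, 12 independent E-value, 17-18 alignment
    coordinates on the protein. The cutoff is an assumed value.
    """
    hits = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(maxsplit=22)  # the trailing description may contain spaces
        if len(fields) < 22:
            continue
        protein, accession = fields[0], fields[4]
        i_evalue = float(fields[12])
        if not accession.startswith(accession_prefix) or i_evalue > max_evalue:
            continue
        best = hits.get(protein)
        if best is None or i_evalue < best["i_evalue"]:
            hits[protein] = {
                "i_evalue": i_evalue,
                "ali_from": int(fields[17]),
                "ali_to": int(fields[18]),
            }
    return hits

# Invented rows in domtblout layout; not real search results.
SAMPLE = """\
# target name accession tlen query name accession qlen E-value score bias # of c-Evalue i-Evalue score bias from to from to from to acc description
geneA - 900 NB-ARC PF00931.25 252 1e-60 200.1 0.0 1 1 2e-63 1e-60 199.8 0.0 1 250 180 430 178 432 0.95 hypothetical protein
geneB - 300 NB-ARC PF00931.25 252 0.5 5.0 0.1 1 1 0.4 0.5 4.9 0.1 10 60 20 70 18 72 0.60 -
"""
candidates = parse_domtblout(SAMPLE)
print(sorted(candidates))
```

Note that `hmmscan` swaps the target and query columns, so the indices above apply only to `hmmsearch` output. Candidates passing this filter would still need domain-architecture confirmation against Pfam or the NCBI Conserved Domain Database.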
Polymorphism Analysis
To assess selection pressures on NBS-encoding genes:
In sorghum, this approach revealed significantly higher diversity in NBS-encoding genes compared to non-NBS genes, with enrichment in the upper 5% tail of the empirical distribution of nucleotide diversity [69].
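The diversity comparison can be sketched as two small steps: compute per-site nucleotide diversity (π) for each gene from aligned haplotypes, then flag genes in the upper tail of the genome-wide distribution. This is a simplified illustration, not the cited pipeline. It assumes gap-free alignments of equal length, and the gene names and values are invented.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Per-site pi: mean pairwise differences divided by alignment length."""
    if len(seqs) < 2:
        raise ValueError("need at least 2 sequences")
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)

def upper_tail(pi_by_gene, fraction=0.05):
    """Gene IDs in the top `fraction` of the empirical pi distribution."""
    ranked = sorted(pi_by_gene, key=pi_by_gene.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return set(ranked[:k])

# Hypothetical per-gene values; "nbs1" is an invented NBS gene ID.
pi_by_gene = {"nbs1": 0.020, "geneX": 0.002, "geneY": 0.004, "geneZ": 0.001}
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
print(upper_tail(pi_by_gene))
```

Enrichment would then be tested by comparing how often NBS genes fall in the flagged set relative to their share of all genes, for example with a hypergeometric or permutation test.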
Expression Analysis
In radish, this approach identified RsTNL03 (Rs093020) and RsTNL09 (Rs042580) as positive regulators of resistance to Fusarium oxysporum, while RsTNL06 (Rs053740) acted as a negative regulator [70].
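For the qRT-PCR validation step, relative expression is often computed with the 2^-ΔΔCt method. The text does not specify how the radish study quantified expression, so this is only a generic sketch; it assumes near-100% amplification efficiency for both the target and the reference gene, and the Ct values are invented.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene (treated vs. control) by the 2^-ddCt method.

    Assumes both primer pairs amplify with roughly 100% efficiency.
    """
    delta_treated = ct_target_treated - ct_ref_treated   # normalize to reference gene
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_treated - delta_control)

# Hypothetical Ct values: target Ct drops by 3 cycles after inoculation.
fold = relative_expression(22.0, 18.0, 25.0, 18.0)
print(fold)  # 8.0
```

An induced or repressed transcript is only a candidate; assigning a gene the role of positive or negative regulator requires functional evidence such as silencing or overexpression.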
Functional Validation via VIGS
Virus-induced gene silencing (VIGS) provides an efficient method for functional characterization:
This method demonstrated that Vm019719 confers resistance to Fusarium wilt in Vernicia montana [60]. The experimental workflow for functional characterization is summarized below:
Table 3: Essential Research Reagents for NBS-LRR Gene Studies
| Reagent/Method | Application | Key Features | Reference |
|---|---|---|---|
| HMMER Software with NB-ARC Profile (PF00931) | Genome-wide identification of NBS-encoding genes | Hidden Markov Model approach for comprehensive domain detection | [70] [41] |
| Virus-Induced Gene Silencing (VIGS) Systems | Functional characterization of candidate NBS-LRR genes | Rapid gene silencing without stable transformation | [60] |
| RNA-seq Transcriptome Profiling | Expression analysis of NBS-LRR genes under pathogen challenge | Genome-wide expression quantification | [70] [41] |
| qRT-PCR with Specific Primers | Validation of NBS-LRR gene expression patterns | High sensitivity and quantitative accuracy | [70] |
| Pfam and NCBI Conserved Domain Databases | Domain architecture classification | Curated domain models and annotations | [70] |
| Resequencing Panels (Wild, Landrace, Improved) | Selection pressure analysis | Polymorphism detection and diversity calculations | [69] |
The evolutionary maintenance of NBS-LRR genes represents a dynamic equilibrium between the imperative for pathogen recognition and the constraints of fitness costs. Evidence from diverse plant species reveals that this balance is achieved through contrasting selection pressures—purifying selection that conserves essential functions while minimizing costs, and balancing selection that maintains diversity for evolving pathogenic threats.
The genomic distribution of NBS-LRR genes in clusters, frequently enriched in disease resistance QTL regions, facilitates rapid evolution through tandem duplication and diversifying selection. The structural reduction of specific subfamilies (particularly TNL) in certain lineages further illustrates how evolutionary trajectories shape this gene family in response to selective constraints.
Future research directions should include:
Understanding these evolutionary dynamics provides a framework for developing disease-resistant crop varieties with optimized trade-offs between defense investment and agricultural productivity.
Functional validation of genes is a cornerstone of modern plant molecular biology, providing the foundational knowledge required for advanced breeding and genetic engineering. Within the context of studying the evolution of NBS domain genes in land plants—a major class of disease resistance genes—researchers require robust methodologies to link genetic sequences to biological functions [72] [3]. Virus-Induced Gene Silencing (VIGS) and Genetic Transformation have emerged as two powerful, yet distinct, strategies for this purpose. VIGS offers a rapid, transient silencing approach that exploits the plant's own antiviral defense mechanisms, while stable genetic transformation provides permanent genetic modification. This guide provides an in-depth technical comparison of these methodologies, detailing their protocols, applications, and integration, with a specific focus on validating the function of NBS domain genes involved in plant immunity and evolution [3] [73]. The choice between these strategies depends on research goals, time constraints, and the plant species under investigation, and their synergistic use can powerfully accelerate functional genomics research.
VIGS is a post-transcriptional gene silencing (PTGS) technique that utilizes a recombinant viral vector to trigger sequence-specific degradation of endogenous plant mRNAs [72] [74]. The process begins when an engineered virus containing a fragment of the plant target gene is introduced into the plant. The plant's cellular machinery replicates the viral RNA, forming double-stranded RNA (dsRNA) intermediates. These dsRNAs are recognized and cleaved by Dicer-like enzymes (DCL) into 21-24 nucleotide small interfering RNAs (siRNAs). The siRNAs are incorporated into an RNA-induced silencing complex (RISC), which uses the siRNA as a guide to identify and cleave complementary endogenous mRNA molecules, thereby silencing the target gene [72] [73]. The entire process is outlined in Figure 1.
Figure 1: Mechanism of Virus-Induced Gene Silencing (VIGS)
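Because silencing is guided by short siRNAs, an insert fragment that shares contiguous stretches with other transcripts may silence them as well, which matters for NBS genes with many similar paralogs. The sketch below counts k-mers shared between a candidate insert and every non-target transcript. Using exact 21-nt matches as the warning criterion is an assumption made here for illustration, not a rule taken from the cited protocols, and the sequences are toy examples with a shortened k.

```python
def kmers(seq, k=21):
    """Set of all k-length substrings of a sequence (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def off_target_hits(insert, transcripts, target_id, k=21):
    """Return {transcript_id: shared k-mer count} for non-target transcripts.

    insert: candidate VIGS fragment (sense strand).
    transcripts: dict of transcript_id -> mRNA sequence (sense strand).
    """
    insert_kmers = kmers(insert, k)
    report = {}
    for tid, seq in transcripts.items():
        if tid == target_id:
            continue
        shared = insert_kmers & kmers(seq, k)
        if shared:
            report[tid] = len(shared)
    return report

# Toy sequences (hypothetical); k=5 keeps the example readable.
insert = "ACGTACGGTTCA"
transcripts = {
    "target": "GGACGTACGGTTCAGG",
    "paralog": "TTTACGGTTTTT",
    "unrelated": "GGGGGGGGGG",
}
print(off_target_hits(insert, transcripts, "target", k=5))
```

An empty report does not guarantee specificity, since imperfectly matched siRNAs can still act; it only removes the most obvious candidates for redesign.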
The success of VIGS largely depends on the choice of viral vector. Different vectors are suited to different plant families and experimental needs. Table 1 summarizes the most commonly used VIGS vectors.
Table 1: Key Viral Vectors Used in VIGS
| Vector Name | Virus Type | Host Range/Applications | Key Features | References |
|---|---|---|---|---|
| Tobacco Rattle Virus (TRV) | RNA virus | Broad host range; Solanaceae (pepper, tomato, tobacco), Arabidopsis, soybean | Efficient systemic movement, mild symptoms, targets meristematic tissues | [72] [75] [74] |
| Bean Pod Mottle Virus (BPMV) | RNA virus | Soybean | High efficiency in legumes; often requires particle bombardment | [75] |
| Barley Stripe Mosaic Virus (BSMV) | RNA virus | Monocots (barley, wheat) | One of the few reliable vectors for cereal crops | [74] |
| Geminiviruses (CLCrV, ACMV) | DNA virus | Cotton, tomato | DNA-based vectors; useful for species recalcitrant to RNA vectors | [72] |
| Satellite Virus-Based Systems | DNA/RNA satellite | Tomato, etc. | Two-component system; strong silencing with minimal viral symptoms | [74] |
The following is a generalized TRV-based VIGS protocol, optimized for challenging species like soybean, which can be adapted for other plants including those used in NBS gene research [75].
Vector Construction:
Plant Preparation & Agroinfiltration:
Plant Growth and Phenotyping:
Stable genetic transformation involves the permanent integration of a foreign gene into the plant genome, enabling the study of gene function through overexpression, knockout, or knock-in modifications. This results in heritable changes that can be studied over multiple generations [76]. Recent breakthroughs aim to overcome the major bottleneck of plant transformation: the reliance on lengthy, genotype-dependent tissue culture processes.
Table 2: Key Genetic Transformation Methods
| Method | Principle | Key Applications | Advantages | Limitations |
|---|---|---|---|---|
| Agrobacterium-mediated | Uses A. tumefaciens to transfer T-DNA containing the gene of interest into the plant genome [76]. | Most dicots (tomato, tobacco, soybean); some monocots. | Relatively simple, low cost, typically single-copy insertions. | Genotype-dependent, often requires tissue culture, can be time-consuming. |
| Pollen-tube Pathway | Exogenous DNA is applied to the site of pollination and enters the fertilized egg via the pollen tube [76]. | Cotton, soybean, wheat. | Bypasses tissue culture; technically simple. | Efficiency can be low and variable; not universally applicable. |
| Tissue Culture-Free (Wound-Induced) | Activates the plant's innate wound-response and regeneration pathways directly on the parent plant [77] [78]. | Tomato, soybean, tobacco. | Dramatically speeds up process (weeks vs. months); avoids tissue culture; works with CRISPR. | Still being optimized for broad species application. |
A groundbreaking method developed by Patil et al. (2025) combines wound-induced regeneration with Agrobacterium delivery to accelerate the creation of transgenic plants [77] [78]. The workflow is depicted in Figure 2.
Figure 2: Tissue Culture-Free Transformation Workflow
Detailed Steps:
The evolution of the NBS-LRR gene family is characterized by extensive expansion, diversification, and tandem duplications, leading to large, variable repertoires in plant genomes [3] [13] [22]. Functional validation is crucial to understand the role of specific NBS genes in pathogen resistance and evolutionary adaptation.
For a comprehensive research program on NBS gene evolution, an integrated approach is most powerful. VIGS should be used for rapid, high-throughput preliminary screening of multiple candidate NBS genes identified from genomic studies. Promising candidates can then be subjected to more detailed, heritable functional analysis using stable genetic transformation (including CRISPR/Cas9 editing) to create permanent mutant lines for in-depth phenotypic and evolutionary analysis.
Research Reagent Solutions for Functional Validation
| Reagent / Tool | Function / Application | Examples & Notes |
|---|---|---|
| TRV-based VIGS Vectors | Induces transient gene silencing in a wide range of dicot plants. | pTRV1 and pTRV2 binary vectors; the insert is cloned into pTRV2 [72] [75]. |
| Gateway-Compatible Vectors | Allows rapid recombination-based cloning of target gene fragments. | Reduces time and increases throughput for vector construction. |
| Agrobacterium Strains | Delivery vehicle for genetic material into plant cells. | GV3101 is a common disarmed strain for VIGS and transformation [75]. |
| Marker Genes | Visual indicators of successful transformation or silencing. | GFP for tracking infection [75]; PDS for visualizing silencing (photobleaching) [72] [75]. |
| Wound-Response Plasmids | Enable tissue culture-free transformation. | Plasmids carrying WIND1 and IPT genes under wound-responsive promoters [77] [78]. |
| CRISPR/Cas9 Systems | For precise gene editing in stable transformation. | Used to create knockouts or precise modifications in NBS-LRR genes [78]. |
The functional validation of genes, particularly within large and complex families like the NBS-LRR genes, is critical for understanding plant immunity and evolution. VIGS stands out as a rapid, flexible, and powerful tool for initial, high-throughput functional screening. In parallel, advancements in stable genetic transformation, especially the new tissue culture-free methods, are breaking down long-standing technical barriers, enabling the faster creation of stable genetic lines for deeper analysis. By strategically combining these approaches, researchers can efficiently bridge the gap from genomic sequence to biological function, accelerating the pace of discovery in plant evolutionary genetics and the development of improved, disease-resistant crops.
The evolution of land plants is fundamentally linked to their ability to adapt to pathogenic threats, a process mediated significantly by nucleotide-binding site (NBS) domain genes. These genes encode one of the largest families of plant resistance (R) proteins and play a crucial role in effector-triggered immunity [3]. Current research has identified 12,820 NBS-domain-containing genes across 34 plant species, spanning from mosses to monocots and dicots, displaying remarkable structural diversity with 168 distinct domain architecture classes [3]. This diversity encompasses both classical patterns (NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR) and species-specific structural variations, underscoring the dynamic evolutionary history of this gene family.
Accurate genome assembly and annotation present particular challenges for NBS genes due to their characteristic genomic organization. These genes are often arranged in clusters of tandemly duplicated sequences, and their inherent similarity can lead to local genome assembly collapse and annotation problems [79]. Furthermore, standard annotation pipelines frequently misannotate NBS loci because their multiplicity of similar sequences causes issues with repeat masking, and they are often expressed at low levels, providing limited RNA-seq evidence for gene prediction [79]. These technical challenges necessitate specialized approaches for genome assembly and annotation when the research focus includes comprehensive characterization of NBS gene families.
The quality of genome assembly directly impacts the completeness and accuracy of NBS gene prediction. High-quality reference genomes are the cornerstone of modern genomics, yet error-free eukaryotic genome assembly remains challenging despite technological advances [80]. For NBS genes specifically, the combination of their repetitive nature, tandem duplication patterns, and sequence similarity creates obstacles for conventional assembly algorithms.
Long-read sequencing technologies from Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) theoretically solve many of these problems by spanning repetitive regions, thus providing larger "puzzle pieces" for assembly [81]. However, ONT sequencing presents unique challenges, as errors tend to accumulate and assembly statistics plateau as sequencing depth increases [81]. Robust experimental design is therefore essential, with evidence suggesting that eukaryotic genome assembly requires high-molecular-weight DNA extractions that increase read length, coupled with computational protocols that reduce error through pre-assembly correction and read selection [81].
Recent studies indicate that pure ONT sequencing and assembly outperform hybrid approaches, with contiguous assemblies achievable at sequencing coverage of >60× [81]. However, simply increasing sequencing depth is insufficient; pre-assembly filtering and read correction improve contiguity, while post-assembly polishing using short Illumina reads increases accuracy [81]. These findings highlight the importance of rigorous experimental design in obtaining assemblies suitable for comprehensive NBS gene family analysis.
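A quick arithmetic check helps when planning read selection: discarding short reads lowers depth, so it is worth confirming that coverage stays above the >60× level mentioned above. The sketch below is a simple illustration; the 5 kb length cutoff, the read lengths, and the genome size are invented numbers, not recommendations from the cited study.

```python
def filter_reads(read_lengths, min_length):
    """Keep only reads at least min_length bases long."""
    return [length for length in read_lengths if length >= min_length]

def sequencing_depth(read_lengths, genome_size):
    """Average coverage: total sequenced bases divided by genome size."""
    return sum(read_lengths) / genome_size

# Toy data set (hypothetical): 50 short reads and 40 long reads, 10 kb "genome".
reads = [1_000] * 50 + [20_000] * 40
genome_size = 10_000
raw_depth = sequencing_depth(reads, genome_size)
kept = filter_reads(reads, min_length=5_000)
kept_depth = sequencing_depth(kept, genome_size)
print(raw_depth, kept_depth)  # 85.0 80.0
```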
A particularly challenging aspect of assembling NBS genes involves "haplotypic duplications," where alleles in heterozygous regions are mistakenly assembled as paralogous genes [80]. This problem is especially pertinent for NBS genes in diploid or polyploid plant genomes, where false duplicates can create illusions of gene family expansions, leading to incorrect conclusions about genome evolution and functioning [80].
Specialized tools such as Mabs have been developed to optimize parameters of popular genome assemblers Hifiasm and Flye, creating assemblies with more accurately assembled genes [80]. Mabs employs a novel metric called AG (number of Accurately assembled Genes) that improves upon traditional BUSCO assessments by differentiating between true multicopy orthogroups (composed of paralogues) and false multicopy orthogroups (composed of uncollapsed alleles) based on coverage analysis [80]. This approach is particularly valuable for NBS gene research, where distinguishing true gene family expansions from assembly artifacts is essential for evolutionary studies.
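The coverage logic behind this distinction can be illustrated with a toy classifier. In a diploid assembly, a correctly assembled gene copy should have read depth close to the genome-wide median, whereas two uncollapsed alleles each attract roughly half of it. This sketch is not the Mabs implementation; the function name, the tolerance of 0.2, and the coverage values are assumptions chosen for demonstration.

```python
def classify_multicopy(copy_coverages, genome_median_coverage, tolerance=0.2):
    """Label a multicopy orthogroup from the read depth of its gene copies.

    Returns 'true_paralogs' if every copy sits near 1x the genome median,
    'uncollapsed_alleles' if every copy sits near 0.5x, else 'ambiguous'.
    The tolerance is an assumed value for a diploid genome.
    """
    ratios = [cov / genome_median_coverage for cov in copy_coverages]
    if all(abs(r - 1.0) <= tolerance for r in ratios):
        return "true_paralogs"
    if all(abs(r - 0.5) <= tolerance for r in ratios):
        return "uncollapsed_alleles"
    return "ambiguous"

# Hypothetical depths for three two-copy orthogroups, genome median = 60x.
print(classify_multicopy([58, 62], 60))  # true_paralogs
print(classify_multicopy([29, 33], 60))  # uncollapsed_alleles
print(classify_multicopy([30, 60], 60))  # ambiguous
```

For NBS clusters, the ambiguous class is likely to be common, because collapsed tandem repeats and mis-mapped reads distort depth in both directions.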
Table 1: Key Challenges in NBS Gene Assembly and Annotation
| Challenge | Impact on NBS Genes | Potential Solution |
|---|---|---|
| Tandem Repeats | Causes assembly collapse in clustered NBS regions [79] | Long-read sequencing to span repetitive regions [81] |
| Haplotypic Duplications | Alleles mistaken for paralogs, inflating gene counts [80] | Tools like Mabs with AG metric for parameter optimization [80] |
| Repeat Masking | NBS genes incorrectly masked as transposable elements [79] | Homology-based prediction (HRP) bypassing automated annotation [79] |
| Low Expression | Limited RNA-seq evidence for gene prediction [79] | Combined evidence approach using protein homology [79] |
Conventional genome annotation pipelines often fail to adequately predict NBS genes due to their complex genomic organization. Standard gene annotation tools that rely on automated gene prediction followed by protein motif/domain-based search (PDS) prove imprecise for NBS genes, as repeat masking prior to genome annotation often prevents comprehensive detection [79]. This has led to the development of specialized methods designed specifically for resistance gene annotation.
The full-length Homology-based R-gene Prediction (HRP) method represents a significant advance in NBS gene identification [79]. This approach uses a two-level homology search: first identifying an initial set of R-genes in the automated gene prediction using protein domains, then using these R-genes for full-length homology searches in the genome assembly. This strategy successfully addresses the complex genomic organization of NBS-LRR gene loci and has proven more effective than well-established methods like RenSeq [79].
In practical tests, HRP identified 363 NB-LRR genes in the tomato genome, including 103 of 105 novel genes previously identified by the manually curated RenSeq method [79]. The method's efficiency was further demonstrated in Beta vulgaris genomes, where it identified up to 45% more full-length NB-LRR genes compared to previous approaches [79]. HRP also proved valuable for R-gene allele mining, enabling identification of previously undiscovered Fom-2 homologs in five Cucurbita species genomes [79].
Figure 1: HRP Method Workflow for comprehensive NBS gene identification
Successful NBS gene annotation typically requires an integrated approach combining multiple evidence types. This includes de novo, homology, and transcriptome-based predictions [82]. For example, in the high-quality eggplant genome assembly, researchers used RNA from five different tissues (root, stem, leaf, flower, and fruit) for both next-generation transcriptome sequencing and full-length transcriptome sequencing, enabling prediction of 36,582 coding genes [82]. Such comprehensive transcriptomic data provides valuable supporting evidence for gene prediction, though it may be insufficient alone for lowly expressed NBS genes.
Additional quality assessment tools such as BUSCO (Benchmarking Universal Single-Copy Orthologs) help evaluate assembly completeness by assessing the presence of evolutionarily conserved single-copy genes [82]. For the eggplant genome, BUSCO evaluation showed that 2,190 homologous single-copy genes were assembled, representing 94.2% of all expected single-copy genes [82]. This metric provides a useful indicator of overall assembly quality, though specialized metrics like AG may be more appropriate for assessing gene families prone to haplotypic duplications.
Table 2: Comparison of NBS Gene Annotation Methods
| Method | Principle | Advantages | Limitations |
|---|---|---|---|
| Domain Search (PDS) | Searches for NBS domains in predicted proteins [79] | Standardized, works with any annotation | Misses fragmented/false genes; repeat masking issues [79] |
| RenSeq | Resistance gene enrichment & sequencing [79] | High-quality manual curation; targeted approach | Labor-intensive; requires specialized libraries [79] |
| HRP Method | Two-level homology using initial R-gene set [79] | Comprehensive; identifies full-length genes | Depends on initial gene set quality [79] |
| Combined Evidence | Integrates de novo, homology, transcriptome [82] | Multiple supporting evidence types | Resource-intensive; may miss low-expression genes [82] |
Based on current research, an optimized workflow for genome assembly targeting NBS gene characterization involves multiple stages with specific quality control checkpoints. The following protocol integrates best practices from recent studies to maximize the accuracy of NBS gene prediction.
Successful assembly begins with high-quality input DNA. For eukaryotic genomes, high-molecular-weight DNA extractions are critical, as they increase sequence read length, which is particularly beneficial for spanning repetitive NBS gene clusters [81]. Protocols should include verification of DNA quality through pulsed-field gel electrophoresis and quantification using fluorometric methods (e.g., Qubit) rather than spectrophotometry alone [81].
The protocol in [81] was developed for nematode samples and would need adapting to plant tissue: organisms are grown on specialized growth medium, harvested by centrifugation, and washed repeatedly until the supernatant is clear [81]. DNA extraction then utilizes a modified phenol-chloroform approach after flash-freezing in liquid nitrogen and proteinase K digestion [81]. Size selection using kits such as the Short Read Eliminator Kit from Circulomics Inc. further enhances read length by removing fragmented DNA [81].
Sequencing technology selection should be guided by research goals. Oxford Nanopore Technologies (ONT) offers advantages in versatility, low input DNA requirements, and cost, making it suitable for individual research laboratories [81]. ONT library preparation can be modified from standard protocols by replacing the first AmpureXP bead clean step with additional treatment with the Short Read Eliminator Kit, improving read length [81].
The assembly process should incorporate specialized tools and metrics designed to address challenges specific to gene families like NBS genes. The Mabs suite provides parameter optimization for Hifiasm and Flye assemblers, creating genome assemblies with more accurately assembled genes than default parameters in 5 out of 6 tested cases [80].
Figure 2: Optimized Assembly Pipeline for NBS Gene Research
Post-assembly processing should include decontamination steps using tools like Blobtools2 and SIDR, which utilize taxonomic assignment, read coverage depth, and GC content to identify non-target contigs [81]. SIDR employs ensemble-based machine learning to train models capable of discriminating target and contaminant contigs based on measured predictor variables, allowing assignment of probable taxonomic origin to contigs that lack BLAST identification [81].
Quality assessment should move beyond traditional metrics like N50, incorporating gene-specific assessments such as the AG metric that differentiates between true and false multicopy orthogroups based on coverage [80]. This is particularly relevant for NBS genes, which often exist in multicopy families and are prone to haplotypic duplication artifacts.
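For reference, N50 is the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly size. The minimal sketch below shows why it cannot detect haplotypic duplication: it depends only on contig lengths, not on what the contigs contain. The contig lengths are toy values.

```python
def n50(lengths):
    """Contig N50: length at which the running total reaches half the assembly."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length

# Toy contig lengths (hypothetical).
print(n50([2, 3, 4, 5, 6]))  # 5
```

An assembly that retains both alleles of a heterozygous NBS cluster as separate contigs can report the same or a better N50 than a correctly collapsed one, which is why a coverage-aware, gene-level metric is a useful complement.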
Table 3: Essential Research Reagents and Computational Tools
| Tool/Reagent | Function | Application in NBS Research |
|---|---|---|
| Circulomics SRE Kit | Size selection for long DNA fragments [81] | Increases read length for spanning NBS clusters |
| ONT SQK-LSK109 | Ligation sequencing kit [81] | Produces long reads for repetitive region assembly |
| Mabs Suite | Parameter optimizer for Hifiasm/Flye [80] | Reduces haplotypic duplications in gene families |
| HRP Pipeline | Homology-based R-gene prediction [79] | Identifies full-length NBS genes missed by annotation |
| BlobTools2 | Taxonomic identification of scaffolds [81] | Removes contamination from assembly |
| BUSCO/AG Metric | Assembly completeness assessment [80] | Evaluates gene space completeness accurately |
| Purge_dups | Haplotig removal tool [80] | Addresses allele duplication in assemblies |
Accurate genome assembly and annotation are fundamental to understanding the evolution of NBS domain genes in land plants. The structural diversity of these genes—with 168 distinct classes identified across land plants—reflects their dynamic evolutionary history and adaptation to diverse pathogenic challenges [3]. The development of specialized methods like the HRP annotation pipeline and assembly optimization tools such as Mabs represents significant advances in our ability to comprehensively characterize this important gene family.
Future directions in this field will likely involve even more integrated approaches, combining emerging sequencing technologies with improved computational methods. As genome assembly techniques continue to advance toward telomere-to-telomere resolution, opportunities will expand for studying complex genomic regions harboring NBS gene clusters. Similarly, machine learning approaches show promise for further improving gene prediction accuracy, particularly for challenging gene families with unique characteristics like NBS genes.
For researchers focusing on plant-pathogen coevolution, implementing the optimized workflows described in this guide will enable more accurate characterization of NBS gene families, leading to better understanding of plant immunity evolution and facilitating the development of disease-resistant crops through targeted breeding strategies. The continued refinement of these methods remains essential for advancing our knowledge of plant genome evolution and the molecular basis of disease resistance.
Virus-Induced Gene Silencing (VIGS) has emerged as a powerful reverse genetics tool for rapidly characterizing gene function in plants, particularly in species with complex genomes that pose challenges for stable transformation. This technique leverages the plant's innate RNA interference (RNAi) machinery, using recombinant viral vectors to trigger sequence-specific degradation of target endogenous mRNA transcripts, leading to transient gene knockdown and observable phenotypic changes [72]. The application of VIGS is especially valuable in the context of plant immunity research, where it enables direct functional testing of candidate resistance genes, including those encoding nucleotide-binding site (NBS) domain proteins [3].
The NBS gene family represents one of the largest and most diverse classes of plant resistance (R) genes, playing crucial roles in effector-triggered immunity (ETI) against various pathogens [41]. These genes exhibit remarkable structural diversity and evolutionary dynamics, with significant expansion observed across land plants from bryophytes to higher angiosperms [3]. Recent studies have identified numerous NBS-encoding genes with diverse domain architectures, including classical (NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR) and species-specific structural patterns [3]. However, the functional validation of these genes remains a critical step in understanding their roles in plant defense mechanisms.
This technical guide provides comprehensive methodologies for implementing VIGS assays to functionally validate NBS domain genes, with particular emphasis on experimental design, protocol optimization, and integration with evolutionary genomics frameworks. By bridging evolutionary insights with functional validation, researchers can effectively decipher the molecular mechanisms underlying plant immunity and accelerate the development of disease-resistant crops.
NBS domain genes constitute a major component of the plant immune system, with their evolution characterized by significant diversification and expansion across land plants. Comparative genomic analyses have revealed 12,820 NBS-domain-containing genes across 34 species ranging from mosses to monocots and dicots, classified into 168 distinct classes based on domain architecture patterns [3]. This diversity encompasses both classical configurations and species-specific structural patterns, highlighting the dynamic evolutionary history of this gene family.
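The evolutionary trajectory of NBS genes is shaped by several duplication mechanisms, including whole-genome duplication and small-scale (tandem, segmental, and transposon-mediated) duplications [3].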
Table 1: Evolutionary Distribution of NBS-LRR Genes Across Plant Species
| Plant Species | Total NBS Genes | CNL Subfamily | TNL Subfamily | RNL Subfamily | Key Evolutionary Features |
|---|---|---|---|---|---|
| Arabidopsis thaliana | ~207 [41] | ~61% [83] | ~35% [83] | ~4% [83] | Balanced subfamily distribution |
| Oryza sativa (rice) | ~505 [41] | ~100% | Lost | Lost | Complete absence of TNL genes |
| Solanum tuberosum (potato) | ~447 [41] | Majority [22] | Minority [22] | - | nTNL dominance |
| Salvia miltiorrhiza | 196 [41] | 61 (typical) | 2 | 1 | Severe reduction in TNL/RNL |
| Capsicum annuum (pepper) | 252-288 [22] [83] | 248 nTNL [22] | 4 [22] | - | Extreme nTNL dominance |
| Gossypium hirsutum (cotton) | Not specified; 12,820 is the total across all 34 species surveyed [3] | Multiple classes | Multiple classes | Multiple classes | Extensive diversification |
The functional implications of this evolutionary diversity are profound. NBS genes are often organized in clusters throughout plant genomes, with approximately 54% of pepper NBS-LRR genes forming 47 distinct clusters [22]. This genomic arrangement facilitates the rapid generation of new resistance specificities through unequal crossing over and gene conversion, enabling plants to keep pace with evolving pathogens. The integration of evolutionary analysis with functional validation through VIGS provides a powerful framework for identifying key genetic elements contributing to disease resistance in crop species.
VIGS operates through the plant's natural antiviral defense mechanism known as post-transcriptional gene silencing (PTGS). The process begins when recombinant viral vectors containing fragments of target plant genes are introduced into plant tissues. Once inside plant cells, these vectors replicate and produce double-stranded RNA (dsRNA) replication intermediates, which are recognized by the plant's RNAi machinery as foreign molecules [72].
Cellular Dicer-like (DCL) enzymes then process these dsRNA molecules into 21-24 nucleotide small interfering RNAs (siRNAs). These siRNAs are incorporated into an RNA-induced silencing complex (RISC), which uses the siRNA as a guide to identify and cleave complementary mRNA sequences, including both viral RNAs and endogenous transcripts sharing sequence similarity with the inserted fragment [72]. This results in targeted degradation of the corresponding plant mRNA, effectively reducing expression of the gene of interest and enabling functional characterization through observation of resulting phenotypes.
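Because the silencing signal is carried by short siRNAs, a candidate insert can be screened for stretches it shares with unintended transcripts. The sketch below is one simple way to do this, under stated assumptions: it checks only exact, same-strand 21-nt matches, whereas dedicated design tools also consider mismatches and other siRNA lengths. The function names and the transcript dictionary layout are hypothetical.

```python
def kmers(seq: str, k: int = 21) -> set:
    """All distinct k-length substrings of a sequence (RNA 'U' treated as 'T')."""
    seq = seq.upper().replace("U", "T")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def off_target_hits(fragment: str, transcripts: dict, target_id: str, k: int = 21) -> dict:
    """Count fragment k-mers shared exactly with each non-target transcript.

    transcripts: dict mapping transcript ID -> nucleotide sequence.
    Returns only transcripts with at least one shared k-mer.
    """
    frag_kmers = kmers(fragment, k)
    hits = {}
    for tid, seq in transcripts.items():
        if tid == target_id:
            continue  # matches to the intended target are expected
        shared = len(frag_kmers & kmers(seq, k))
        if shared:
            hits[tid] = shared
    return hits


# Toy example with k=4 so the result is easy to verify by eye.
toy_transcripts = {
    "geneA": "TTACGTTGCATT",   # intended target, contains the fragment
    "geneB": "GGGACGTTCCC",    # shares ACGT and CGTT with the fragment
    "geneC": "TTTTTTTT",       # shares nothing
}
print(off_target_hits("ACGTTGCA", toy_transcripts, target_id="geneA", k=4))
```

For NBS genes this kind of check matters because conserved motifs are shared among paralogs, so a fragment drawn from a conserved region may silence several family members at once.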
Several viral vectors have been developed for VIGS applications, with each offering distinct advantages and limitations:
Tobacco Rattle Virus (TRV) is one of the most widely used VIGS vectors due to its broad host range, efficient systemic movement, and mild symptom development [75] [72]. The TRV system utilizes a bipartite design with two plasmid vectors: TRV1 encodes replicase and movement proteins, while TRV2 contains the coat protein gene and a multiple cloning site for insertion of target gene fragments [72].
Bean Pod Mottle Virus (BPMV) has been successfully employed in soybean functional genomics, though it often requires particle bombardment for delivery and can induce leaf phenotypic alterations that complicate phenotypic evaluation [75].
Other viral vectors including Pea Early Browning Virus (PEBV), Soybean Yellow Common Mosaic Virus (SYCMV), Apple Latent Spherical Virus (ALSV), and Cucumber Mosaic Virus (CMV) have also been adapted for VIGS in various plant species [75].
The following diagram illustrates the molecular mechanism of TRV-mediated VIGS:
Effective VIGS relies on careful selection of target gene fragments. For NBS domain genes, researchers should identify specific regions that maximize silencing efficiency while minimizing off-target effects:
The following protocol details the construction of TRV-based VIGS vectors and preparation of Agrobacterium cultures:
Amplification of target fragment: Using high-fidelity DNA polymerase, amplify the selected gene fragment from cDNA with gene-specific primers containing appropriate restriction sites (e.g., EcoRI and XhoI) [75].
Vector ligation: Digest the pTRV2 vector with corresponding restriction enzymes and ligate the purified PCR product using standard molecular cloning techniques [75].
Transformation and sequence verification: Transform ligation products into E. coli DH5α competent cells, select positive colonies, and verify insert sequence through Sanger sequencing [84].
Agrobacterium transformation: Introduce verified recombinant plasmids and empty vector controls into Agrobacterium tumefaciens strain GV3101 through heat shock or electroporation [85].
Culture preparation: Inoculate single colonies of Agrobacterium harboring TRV1, TRV2-empty, and TRV2-target constructs into liquid LB media containing appropriate antibiotics (kanamycin 50 μg/mL, gentamicin 25 μg/mL) and grow overnight at 28°C with shaking [85].
Induction: Dilute cultures 1:10 in fresh LB media with antibiotics, 10 mM MES, and 200 μM acetosyringone, and grow until OD600 reaches 0.8-1.2. Harvest bacterial pellets by centrifugation and resuspend in induction buffer (10 mM MES, 10 mM MgCl2, 200 μM acetosyringone) to final OD600 of 0.5-1.5 [85]. Maintain at room temperature for 3-4 hours before infiltration.
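The final resuspension step amounts to a simple dilution calculation. The following sketch assumes that OD600 scales linearly with cell density (reasonable only when the reading is taken on a suitably diluted sample) and uses a hypothetical helper name; it is an illustration of the arithmetic, not part of the cited protocol.

```python
def dilution_to_target_od(measured_od: float, current_volume_ml: float, target_od: float):
    """Return (final volume, buffer to add) in mL to bring a suspension to target_od.

    Uses C1 * V1 = C2 * V2, assuming OD600 is proportional to cell density.
    """
    if target_od <= 0:
        raise ValueError("target_od must be positive")
    if measured_od < target_od:
        raise ValueError("suspension is already below the target OD; pellet and resuspend in less buffer")
    final_volume = measured_od * current_volume_ml / target_od
    return final_volume, final_volume - current_volume_ml


# A pellet resuspended in 5 mL reads OD600 = 3.0; the target is OD600 = 1.0.
final_ml, buffer_ml = dilution_to_target_od(3.0, 5.0, 1.0)
print(final_ml, buffer_ml)  # 15.0 10.0
```

Adjusting every culture (TRV1, TRV2-empty, TRV2-target) to the same final OD600 in this way helps keep inoculum dose comparable between control and test plants.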
Multiple infiltration methods have been developed for different plant species and tissue types:
Cotyledon infiltration: The most common method for dicot plants like cotton, tomato, and pepper. Make superficial wounds on the abaxial side of cotyledons from 7-10-day-old seedlings using a 25G needle, then infiltrate the Agrobacterium mixture with a needleless syringe until the tissue is fully saturated [85].
Pericarp cutting immersion: Particularly effective for recalcitrant tissues like Camellia drupifera capsules. Bisect explants and immerse fresh cut surfaces in Agrobacterium suspension for 20-30 minutes [84]. This method achieved ~94% infiltration efficiency in optimized systems [84].
Other methods: Direct injection, peduncle injection, and fruit-bearing shoot infusion can be effective for specific tissues and plant species [84].
Several factors significantly influence VIGS efficiency and must be optimized for each plant system:
Plant developmental stage: Optimal silencing effects vary with developmental stage. In Camellia drupifera capsules, maximum silencing efficiency for CdCRY1 (69.80%) and CdLAC15 (90.91%) was observed at early and mid developmental stages, respectively [84].
Agroinoculum concentration: OD600 values between 0.5 and 1.5 generally provide good results, although the optimal concentration is likely species-dependent [72].
Environmental conditions: Temperature, humidity, and photoperiod significantly impact VIGS efficiency. Most systems perform well at 20-23°C with 14:10 light:dark photoperiod and high humidity maintained immediately after infiltration [85] [72].
Co-cultivation period: Maintaining high humidity for 16-24 hours post-infiltration enhances Agrobacterium infection efficiency [85].
The following workflow diagram illustrates the complete VIGS experimental process:
Confirming successful gene silencing is crucial for interpreting VIGS results. Multiple molecular techniques provide complementary validation:
Reverse-transcription quantitative PCR (RT-qPCR): The gold standard for quantifying silencing efficiency at the transcript level. Proper reference gene selection is critical for accurate normalization. Studies in cotton have identified GhACT7 and GhPP2A1 as the most stable reference genes under VIGS conditions, while commonly used genes like GhUBQ7 and GhUBQ14 showed poor stability [85].
Protein-level analysis: Western blotting or specific immunoassays can confirm reduction of target protein levels, though antibodies are not always available for NBS domain proteins.
Visual markers: For optimized systems, GFP fluorescence can indicate successful infection and silencing distribution when using pTRV2-GFP derivatives [75].
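Silencing efficiency from RT-qPCR data is commonly expressed as relative transcript abundance in silenced versus empty-vector plants. The sketch below uses the widely used 2^-ΔΔCt calculation, which is an assumption here since the source does not name a quantification method; it also assumes roughly 100% amplification efficiency and normalizes against the mean Ct of two reference genes. The Ct values are invented for illustration.

```python
from statistics import mean


def relative_expression(target_ct_silenced, ref_cts_silenced,
                        target_ct_control, ref_cts_control):
    """Relative target expression (silenced vs control) by the 2^-ddCt method.

    ref_cts_*: Ct values of the reference genes in the same sample; averaging
    Ct values corresponds to a geometric mean of reference quantities.
    """
    dct_silenced = target_ct_silenced - mean(ref_cts_silenced)
    dct_control = target_ct_control - mean(ref_cts_control)
    return 2 ** -(dct_silenced - dct_control)


def silencing_efficiency(rel_expr: float) -> float:
    """Percentage reduction in transcript level relative to the control."""
    return (1.0 - rel_expr) * 100.0


# Hypothetical Ct values: two reference genes per sample.
rel = relative_expression(26.0, [20.0, 22.0], 24.0, [20.0, 22.0])
print(rel, silencing_efficiency(rel))  # 0.25 75.0
```

In practice each value would be the mean of technical replicates, and the calculation would be repeated across biological replicates before any statistical comparison.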
Functional validation of NBS domain genes typically involves challenging silenced plants with target pathogens and assessing disease responses:
Disease scoring: Quantitative assessment of disease symptoms, lesion size, pathogen proliferation, and hypersensitive response compared to control plants.
Biochemical assays: Measurement of defense-related compounds like reactive oxygen species, callose deposition, and pathogenesis-related (PR) protein expression.
Comparative analysis: Evaluate responses in susceptible versus resistant plant genotypes. For example, variant analysis of cotton NBS genes identified 5,173 variants in the susceptible accession Coker 312 and 6,583 in the tolerant accession Mac7 [3].
Table 2: Key Research Reagents for VIGS Experimental Workflow
| Reagent/Resource | Specifications | Function/Application | Considerations |
|---|---|---|---|
| TRV Vectors | pTRV1 (pYL192), pTRV2 (pYL156) | Bipartite vector system for VIGS | TRV1 encodes replication proteins; TRV2 for target insertion |
| Agrobacterium Strain | GV3101 | Delivery of TRV constructs to plant cells | Optimized for plant transformations |
| Antibiotics | Kanamycin (50 μg/mL), Gentamicin (25 μg/mL) | Selection for vector maintenance | Concentration critical for bacterial viability & selection |
| Induction Compounds | Acetosyringone (200 μM), MES (10 mM) | Induce vir genes; buffer pH | Essential for T-DNA transfer efficiency |
| Reference Genes | GhACT7, GhPP2A1 [85] | RT-qPCR normalization in cotton | Species-specific validation required |
| Positive Controls | TRV:PDS (photobleaching), TRV:CLA1 (albinism) [85] | System functionality assessment | Visual confirmation of silencing |
| Bioinformatics Tools | SGN VIGS Tool, Primer3, PlantCARE | Target selection, primer design, CRE analysis | Ensure specificity & effectiveness |
A comprehensive study demonstrates the application of VIGS for functional validation of NBS domain genes in cotton. Researchers identified 12,820 NBS-domain-containing genes across 34 plant species and classified them into 168 architectural classes [3]. Expression profiling revealed putative upregulation of specific orthogroups (OG2, OG6, OG15) in different tissues under various biotic and abiotic stresses in cotton accessions with contrasting responses to cotton leaf curl disease (CLCuD) [3].
Key experimental findings include:
This case study highlights the power of integrating evolutionary genomics with VIGS-mediated functional validation to identify key genetic elements contributing to disease resistance.
Low silencing efficiency: Optimize fragment design to ensure uniqueness and appropriate length. Adjust Agrobacterium density and infiltration method for specific tissues. Extend the incubation period before phenotypic assessment [84] [72].
Inconsistent silencing across plants: Standardize plant growth conditions, developmental stage at infiltration, and environmental parameters post-infiltration. Ensure uniform Agrobacterium culture preparation [72].
Non-specific phenotypes: Include multiple controls (empty vector, non-infiltrated, positive control) to distinguish target gene effects from viral symptoms or experimental artifacts [75] [85].
Poor systemic spread: Verify vector construction and Agrobacterium viability. Consider alternative infiltration methods or viral vectors better suited to the target species [84].
When applying VIGS to study NBS domain gene evolution, consider these specialized approaches:
Orthogroup-targeting: Design VIGS constructs to target conserved regions within specific orthogroups to assess functional conservation across species [3].
Lineage-specific genes: Include species-specific NBS genes in functional screens to identify novel resistance determinants that have emerged in particular lineages [41] [22].
Expression-correlated silencing: Prioritize NBS genes showing differential expression during pathogen challenge or between resistant and susceptible genotypes [3] [83].
Virus-Induced Gene Silencing represents a powerful approach for functionally validating NBS domain genes within an evolutionary framework. The integration of phylogenetic analyses with targeted functional studies enables researchers to identify key genetic elements governing plant immunity and understand how these systems have evolved across land plants. As genomic resources continue to expand for non-model species, VIGS provides an accessible, rapid, and cost-effective method for bridging sequence information with biological function, ultimately accelerating the development of disease-resistant crops through molecular breeding approaches.
The technical guidelines presented in this document provide a comprehensive framework for implementing VIGS assays to study NBS gene function, with emphasis on experimental design, protocol optimization, and integration with evolutionary genomics. By following these methodologies and considering the troubleshooting recommendations, researchers can effectively leverage this powerful technology to advance our understanding of plant immunity and its evolution.
This technical whitepaper synthesizes findings from a large-scale comparative genomic analysis of Nucleotide-Binding Site (NBS) domain genes across 34 plant species, from mosses to monocots and dicots. The study identified 12,820 NBS-domain-containing genes, classifying them into 168 distinct classes with both classical and novel domain architecture patterns. Evolutionary analyses revealed 603 orthogroups with core and lineage-specific expansions, while expression profiling demonstrated differential regulation under biotic and abiotic stresses. The research provides a comprehensive framework for understanding the evolutionary dynamics of plant immune receptor genes and their implications for disease resistance breeding.
Plant immunity relies on a sophisticated surveillance system where intracellular nucleotide-binding site leucine-rich repeat (NBS-LRR) proteins, also known as NLR proteins, function as critical immune receptors. These proteins recognize pathogen effectors and initiate effector-triggered immunity (ETI), providing plants with specific resistance against diverse pathogens [3]. NBS genes represent one of the largest and most variable gene families in plants, with repertoires ranging from fewer than 100 to over 1,000 members across different species [13]. This remarkable diversity stems from continuous evolutionary arms races with rapidly evolving pathogens, making the comparative analysis of NBS gene repertoires essential for understanding plant-pathogen coevolution.
The typical structure of an NBS-LRR protein includes three fundamental domains: an N-terminal domain (either Toll/Interleukin-1 receptor [TIR] or coiled-coil [CC]), a central nucleotide-binding site (NBS) domain, and a C-terminal leucine-rich repeat (LRR) region [22]. Based on their N-terminal domains, NLRs are classified into distinct subfamilies: CNLs (containing CC domains), TNLs (with TIR domains), and RNLs (featuring RPW8 domains) [86]. The NBS domain contains several conserved motifs—including the P-loop, RNBS-A, kinase-2, RNBS-B, RNBS-C, and GLPL—that are essential for ATP/GTP binding and hydrolysis, which activate downstream immune signaling [22].
Recent advances in sequencing technologies have enabled comprehensive comparative analyses of NBS gene repertoires across multiple plant species. This whitepaper examines the evolutionary patterns, structural diversification, and functional specialization of NBS genes across 34 plant species, providing insights into the genetic basis of disease resistance in plants.
The comprehensive analysis of 34 plant species covering lineages from mosses to monocots and dicots identified 12,820 NBS-domain-containing genes [3]. These genes were classified into 168 distinct classes based on their domain architecture, revealing significant diversity among plant species. The study discovered both classical structural patterns (NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR) and species-specific patterns (TIR-NBS-TIR-Cupin1-Cupin1, TIR-NBS-Prenyltransf, Sugar_tr-NBS) [3].
Table 1: NBS Gene Distribution Across Major Plant Lineages
| Plant Lineage | Number of Species Analyzed | Total NBS Genes Identified | Notable Structural Patterns |
|---|---|---|---|
| Bryophytes | Included | Not specified | Minimal NLR repertoires (~25 in Physcomitrella patens) |
| Lycophytes | Included | Not specified | Highly reduced NLR repertoires (~2 in Selaginella moellendorffii) |
| Monocots | Multiple | Not specified | Complete absence of TNL genes |
| Dicots | Multiple | Not specified | Both TNL and CNL subtypes present |
| Total | 34 | 12,820 | 168 domain architecture classes |
The research further demonstrated that the number of NBS-LRR genes varies substantially across plant genomes. For example, while bryophytes and lycophytes possess minimal NLR repertoires (approximately 25 in Physcomitrella patens and only 2 in Selaginella moellendorffii), extensive gene expansion has occurred in flowering plants [3]. This expansion is particularly pronounced in angiosperms, with some species harboring thousands of NBS-LRR genes.
Orthogroup (OG) analysis revealed 603 orthogroups across the examined species, with evidence of both core (widely conserved) and unique (lineage-specific) orthogroups [3]. Core orthogroups (OG0, OG1, OG2, etc.) represent evolutionarily conserved NBS genes maintained across multiple species, while unique orthogroups (OG80, OG82, etc.) are highly specific to particular lineages, likely reflecting species-specific pathogen pressures.
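The core versus unique distinction can be made operational from an orthogroup gene-count table of the kind OrthoFinder produces. The sketch below is illustrative only: the 80% presence threshold for "core", the "intermediate" label, the function name, and the toy species names are assumptions, not the criteria used in the cited study.

```python
def classify_orthogroups(counts: dict, n_species: int, core_fraction: float = 0.8) -> dict:
    """Label each orthogroup by how many species contribute at least one gene.

    counts: dict mapping orthogroup ID -> {species: gene count}.
    core_fraction is a hypothetical cutoff for calling an orthogroup 'core'.
    """
    labels = {}
    for og, per_species in counts.items():
        present = sum(1 for c in per_species.values() if c > 0)
        if present == 0:
            labels[og] = "absent"
        elif present == 1:
            labels[og] = "unique"
        elif present / n_species >= core_fraction:
            labels[og] = "core"
        else:
            labels[og] = "intermediate"
    return labels


# Toy count table for five hypothetical species.
toy_counts = {
    "OG0": {"sp1": 12, "sp2": 8, "sp3": 3, "sp4": 5, "sp5": 9},
    "OG40": {"sp1": 2, "sp2": 0, "sp3": 1, "sp4": 0, "sp5": 0},
    "OG80": {"sp1": 0, "sp2": 0, "sp3": 0, "sp4": 4, "sp5": 0},
}
print(classify_orthogroups(toy_counts, n_species=5))
```

A presence-based rule like this ignores copy number, so a follow-up step would typically examine per-species counts within core orthogroups to detect lineage-specific expansions.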
Tandem duplications were identified as a major mechanism driving the expansion and diversification of NBS gene repertoires. These duplication events frequently lead to the formation of gene clusters, with 54% of NBS-LRR genes in pepper (Capsicum annuum) forming 47 physical clusters across the genome [22]. Similar clustering patterns have been observed across diverse plant species, contributing to the rapid evolution of novel recognition specificities.
Table 2: NBS Gene Subfamily Distribution in Selected Species
| Plant Species | Total NBS Genes | CNL Subfamily | TNL Subfamily | RNL Subfamily | Atypical/Other |
|---|---|---|---|---|---|
| Arabidopsis thaliana | 210 | 40 | Not specified | Not specified | 170 |
| Dendrobium officinale | 74 | 10 | 0 | Not specified | 64 |
| Capsicum annuum | 252 | 2 (typical) | 4 | 1 (RN) | 245 |
| Salvia miltiorrhiza | 196 | 61 | 0 | 1 | 134 |
| Solanaceae (9 species) | 819 | 583 | 182 | 54 | Not specified |
| Asparagus officinalis | 27 | Not specified | Not specified | Not specified | Not specified |
The distribution of NBS gene subfamilies shows remarkable lineage-specific patterns. Monocots, including orchids and grasses, have completely lost TNL-type genes, while eudicots typically maintain both CNL and TNL subtypes [40]. For example, comprehensive analysis of six orchid species revealed a complete absence of TNL-type genes, consistent with the pattern observed in other monocots [40]. In medicinal plants like Salvia miltiorrhiza, researchers identified 196 NBS genes, with only 62 possessing complete N-terminal and LRR domains, and a notable reduction in TNL and RNL subfamily members [41].
NBS genes are distributed unevenly across plant chromosomes, with a strong tendency to cluster in specific genomic regions. In pepper, NBS-LRR genes are distributed across all chromosomes, with chromosome 3 harboring the highest number (38 genes) while chromosomes 2 and 6 contain the lowest (5 genes each) [22]. Similar clustering patterns have been observed across multiple species, with chromosomal termini often enriched with NBS-LRR genes [87].
These clusters frequently arise from tandem duplications and genomic rearrangements, creating hotspots for the evolution of novel resistance specificities. Analysis of the Solanaceae family revealed that whole genome duplication (WGD) has played a significant role in the expansion of NBS-LRR genes, with the most recent whole genome triplication (WGT) particularly impacting this gene family [87].
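Physical clustering of this kind can be detected from gene coordinates alone. The sketch below groups genes on the same chromosome whose start positions lie within a maximum gap of each other. The 200 kb gap, the minimum cluster size of two, and the function names are assumptions for illustration; published studies use varying cluster definitions, some of which also limit the number of intervening non-NBS genes.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=200_000, min_size=2):
    """Group genes into physical clusters by distance between neighbouring starts.

    genes: iterable of (gene_id, chromosome, start_position).
    max_gap and min_size are hypothetical parameters.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        current, prev_start = [], None
        for start, gene_id in sorted(by_chrom[chrom]):
            # A gap larger than max_gap closes the current group.
            if prev_start is not None and start - prev_start > max_gap:
                if len(current) >= min_size:
                    clusters.append(current)
                current = []
            current.append(gene_id)
            prev_start = start
        if len(current) >= min_size:
            clusters.append(current)
    return clusters


def clustered_fraction(genes, clusters):
    """Share of all genes that fall inside any cluster."""
    genes = list(genes)
    return sum(len(c) for c in clusters) / len(genes) if genes else 0.0


toy_genes = [
    ("g1", "chr1", 100_000), ("g2", "chr1", 150_000), ("g3", "chr1", 900_000),
    ("g4", "chr2", 50_000), ("g5", "chr2", 120_000), ("g6", "chr2", 260_000),
]
toy_clusters = find_clusters(toy_genes)
print(toy_clusters, clustered_fraction(toy_genes, toy_clusters))
```

Applied to a real annotation, the clustered fraction is the quantity reported in statements such as the proportion of pepper NBS-LRR genes residing in clusters, although the exact value depends on the cluster definition chosen.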
Comparative analyses between domesticated crops and their wild relatives have revealed that domestication has significantly impacted NBS gene repertoires. A study of 15 domesticated crop species and their wild relatives found that five crops—grapes, mandarins, rice, barley, and yellow sarson—exhibited significantly reduced immune receptor gene repertoires compared to their wild counterparts [88].
This pattern is particularly evident in asparagus, where domesticated Asparagus officinalis contains only 27 NLR genes, compared to 47 in its wild relative A. kiusianus and 63 in A. setaceus [86]. This contraction of the NLR repertoire during domestication is associated with increased disease susceptibility in the cultivated species. Orthologous gene analysis identified only 16 conserved NLR gene pairs between A. setaceus and A. officinalis, representing the NLR genes preserved during the domestication process [86].
The duration of domestication appears positively associated with the extent of immune receptor gene loss, suggesting that domestication imposes cumulative pressure on the maintenance of NBS gene repertoires, consistent with relaxed selection rather than strong cost-of-resistance effects [88].
The identification of NBS genes across multiple species typically follows a standardized bioinformatics workflow:
Figure 1: Workflow for genome-wide identification of NBS genes. Key steps include domain searches using multiple complementary methods followed by domain architecture validation.
The primary method for identifying NBS genes involves Hidden Markov Model (HMM) searches using the conserved NB-ARC domain (Pfam: PF00931) as a query. Researchers typically run the PfamScan.pl HMM search script against the background Pfam-A_hmm model with an e-value cutoff of 1.1e-50, which the source study reports as its default setting [3] [86]. All genes containing the NB-ARC domain are initially considered NBS genes and retained for further analysis.
Complementary BLAST searches provide additional validation. Local BLASTp analyses (BLAST+ v2.0) are conducted against reference NLR protein sequences from model plants like Arabidopsis thaliana, Oryza sativa, and other relevant species, applying a stringent E-value cutoff of 1e-10 [86]. Candidate sequences identified through both methods are extracted using bioinformatics tools like TBtools [86].
Protein domains are characterized using InterProScan and NCBI's Batch CD-Search, with sequences containing the NB-ARC domain (E-value ≤ 1e-5) retained as bona fide NLR genes [86]. Final classification is performed by querying the Pfam and PRGdb 4.0 databases, with genes categorized based on their complete domain architecture [86]. Classification follows established systems that place similar domain-architecture-bearing genes under the same classes [3].
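The filtering and classification logic described above can be sketched in a few lines once domain hits have been parsed into a simple structure. This is a minimal illustration under stated assumptions: the hit dictionaries, the simplified domain labels ("TIR", "CC", "RPW8", "NB-ARC", "LRR"), and the function names are hypothetical, and real pipelines handle many more architectures than the classical ones covered here.

```python
N_TERMINAL = {"TIR": "T", "CC": "C", "RPW8": "R"}


def retain_nbs_genes(hits, max_evalue=1e-5):
    """Gene IDs with an NB-ARC hit at or below the e-value cutoff.

    hits: iterable of dicts with keys 'gene', 'domain', 'evalue'.
    """
    return sorted({h["gene"] for h in hits
                   if h["domain"] == "NB-ARC" and h["evalue"] <= max_evalue})


def classify_nbs(domains):
    """Classify a gene from its domains listed in N- to C-terminal order.

    Returns labels such as 'TNL', 'CNL', 'RNL', 'CN', 'NL', or 'N';
    returns None if no NB-ARC domain is present.
    """
    if "NB-ARC" not in domains:
        return None
    nbs_index = domains.index("NB-ARC")
    prefix = ""
    for domain in domains[:nbs_index]:
        if domain in N_TERMINAL:
            prefix = N_TERMINAL[domain]
            break  # use the first recognised N-terminal domain
    suffix = "L" if "LRR" in domains[nbs_index + 1:] else ""
    return prefix + "N" + suffix


toy_hits = [
    {"gene": "geneA", "domain": "NB-ARC", "evalue": 1e-40},
    {"gene": "geneB", "domain": "NB-ARC", "evalue": 1e-3},   # fails the cutoff
    {"gene": "geneC", "domain": "LRR", "evalue": 1e-20},     # no NB-ARC hit
]
print(retain_nbs_genes(toy_hits))
print(classify_nbs(["TIR", "NB-ARC", "LRR"]), classify_nbs(["CC", "NB-ARC"]))
```

Keeping the retention step separate from the classification step mirrors the workflow in the text, where NB-ARC presence defines membership and the full architecture determines the class.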
Evolutionary analyses employ OrthoFinder v2.5.1 for orthogroup inference, utilizing the DIAMOND tool for fast sequence similarity searches and the MCL clustering algorithm for gene clustering [3]. Orthologs and orthogrouping are carried out with DendroBLAST, while multiple sequence alignment is performed using MAFFT 7.0 [3]. Gene-based phylogenetic trees are constructed using maximum likelihood algorithms implemented in FastTreeMP with 1000 bootstrap replicates [3].
For specific plant families, researchers often employ additional analyses. In the Solanaceae family, the DupGen_finder (v1.0) program is used to classify gene duplication types, including whole genome duplications (WGD), tandem duplications (TD), proximal duplications (PD), transposon-related duplications (TRD), and dispersed segmental duplications (DSD) [87].
Expression profiling of NBS genes involves analyzing RNA-seq data from various tissues under biotic and abiotic stresses. Researchers typically retrieve FPKM values from specialized databases such as the IPF database, CottonFGD, and Cottongen.
The RNA-seq data is categorized into three types: (1) tissue-specific (leaf, stem, flower, pollen, etc.), (2) abiotic stress-specific (dehydration, cold, drought, heat, etc.), and (3) biotic-stress specific (responses to various pathogens) expression profiling [3]. Data processing follows established transcriptomic pipelines, with final visualization using heatmaps to illustrate differential expression patterns.
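A common way to turn FPKM values into the differential expression patterns shown in such heatmaps is a log2 fold change between stressed and control samples. The sketch below is an assumption about how this step might look, not the pipeline used in the cited study: the pseudocount of 1, the twofold threshold, and the gene names are illustrative.

```python
import math


def log2_fold_change(treated_fpkm: float, control_fpkm: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of treated to control FPKM, with a pseudocount to avoid division by zero."""
    return math.log2((treated_fpkm + pseudocount) / (control_fpkm + pseudocount))


def upregulated(fpkm_pairs: dict, min_log2fc: float = 1.0) -> list:
    """Gene IDs whose log2 fold change meets the threshold.

    fpkm_pairs: dict mapping gene ID -> (treated FPKM, control FPKM).
    """
    return sorted(gene for gene, (treated, control) in fpkm_pairs.items()
                  if log2_fold_change(treated, control) >= min_log2fc)


# Hypothetical FPKM values for three NBS genes under a biotic stress.
toy_fpkm = {
    "nbs_1": (7.0, 1.0),    # (7+1)/(1+1) = 4  -> log2FC = 2
    "nbs_2": (3.0, 3.0),    # unchanged        -> log2FC = 0
    "nbs_3": (0.0, 5.0),    # reduced under stress
}
print(upregulated(toy_fpkm))
```

Fold changes computed this way are descriptive only; calling genes differentially expressed would additionally require replicate-aware statistics on count data.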
Table 3: Essential Research Reagents and Resources for NBS Gene Analysis
| Resource Category | Specific Tool/Resource | Primary Function | Application Context |
|---|---|---|---|
| Genome Databases | NCBI, Phytozome, Plaza | Access to genome assemblies and annotations | Initial data retrieval for comparative analyses |
| Domain Analysis | PfamScan, InterProScan, HMMER | Identification of conserved protein domains | NBS gene identification and classification |
| Orthogroup Analysis | OrthoFinder v2.5.1, DIAMOND | Orthogroup inference and sequence similarity search | Evolutionary analysis and conserved gene family identification |
| Phylogenetic Analysis | MAFFT, FastTreeMP, MEGA | Multiple sequence alignment and tree construction | Evolutionary relationship reconstruction |
| Expression Analysis | IPF Database, CottonFGD, Cottongen | Access to tissue-specific and stress-induced expression data | Expression profiling under different conditions |
| Functional Validation | VIGS (Virus-Induced Gene Silencing) | Functional characterization of candidate NBS genes | In planta validation of gene function |
| Specialized Tools | RGAugury, PRGminer | Prediction of resistance gene analogs | Genome-wide R gene identification |
Recent advances include the development of deep learning-based tools like PRGminer, which provides a comprehensive approach to identifying and classifying R-genes that outperforms previous methods in terms of efficacy and precision [36]. PRGminer operates in two phases: Phase I predicts input protein sequences as R-genes or non-R-genes, while Phase II classifies the predicted R-genes into eight different classes based on domain architecture [36]. This tool achieves an accuracy of 98.75% in k-fold training/testing and 95.72% on independent testing, representing a significant improvement over traditional alignment-based methods [36].
Plants use multiple mechanisms to control the transcript levels of NBS-LRR defense genes, because high expression of these genes can be lethal to plant cells [13]. Diverse miRNAs target NBS-LRRs in eudicots and gymnosperms, functioning as negative post-transcriptional regulators. NBS-LRR diversity and miRNAs are tightly associated, with miRNAs typically targeting highly duplicated NBS-LRRs [13].
The interaction between miRNAs and NBS-LRRs represents a co-evolutionary model where duplicated NBS-LRRs from different gene families periodically give birth to new miRNAs. Most newly emerged miRNAs target the same conserved, encoded protein motif of NBS-LRRs, particularly the P-loop region, consistent with a model of convergent evolution [13]. This regulatory system potentially allows plants to maintain extensive NLR repertoires without exhausting functional NLR loci, offsetting the fitness costs associated with NLR maintenance [3].
Figure 2: miRNA-mediated regulatory network for NBS-LRR genes. This co-evolutionary model illustrates how plants balance the benefits and costs of NBS-LRR defense genes.
Expression analyses demonstrate that NBS genes are specifically upregulated under various stress conditions. In cotton, expression profiling revealed putative upregulation of specific orthogroups (OG2, OG6, and OG15) in different tissues under various biotic and abiotic stresses, in accessions that are either susceptible or tolerant to cotton leaf curl disease (CLCuD) [3].
Similar patterns were observed in Dendrobium officinale, where transcriptome analysis following salicylic acid (SA) treatment identified 1,677 differentially expressed genes (DEGs), including six NBS-LRR genes that were significantly up-regulated [40]. One gene in particular, Dof020138, was closely associated with pathogen identification pathways, MAPK signaling pathways, plant hormone signal transduction pathways, and various biosynthetic and energy metabolism pathways [40].
Promoter analyses across multiple species have revealed an abundance of cis-acting elements in NBS genes related to plant hormones and abiotic stress, providing mechanistic insights into their stress-responsive expression patterns [41].
The comparative analysis of NBS genes across 34 plant species reveals distinct evolutionary patterns among major plant lineages. The minimal NLR repertoires in bryophytes and lycophytes suggest that substantial gene expansion occurred primarily in flowering plants [3]. This expansion has been driven by various mechanisms including whole-genome duplication (WGD) and small-scale duplications (SSD) encompassing tandem, segmental, and transposon-mediated duplications [3].
The complete absence of TNL-type genes in monocots, including orchids and grasses, represents a major lineage-specific evolutionary pattern [40]. This loss may have been driven by NRG1/SAG101 pathway deficiency in these lineages [40]. In contrast, most eudicots maintain both TNL and CNL subtypes, though with significant variation in their relative proportions. For example, in the Solanaceae family, analysis of nine species revealed 819 NBS-LRR genes, divided into 583 CNL, 182 TNL, and 54 RNL genes [87].
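To make the relative proportions concrete, the Solanaceae counts quoted above can be converted to percentages. This is plain arithmetic on the numbers in the text; the variable names are only for illustration.

```python
# Subfamily counts for nine Solanaceae species, as quoted in the text [87].
counts = {"CNL": 583, "TNL": 182, "RNL": 54}

total = sum(counts.values())  # should match the reported 819 genes
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {counts[name]} genes ({pct}%)")
```

On these figures CNL genes make up roughly seven in ten Solanaceae NBS-LRR genes, with TNLs a little over a fifth.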
The reduction of NBS gene repertoires in domesticated crops compared to their wild relatives has significant implications for disease resistance breeding. Studies show that domesticated asparagus (A. officinalis) not only has fewer NLR genes than its wild relatives but also that the majority of preserved NLR genes in the cultivated species demonstrate either unchanged or downregulated expression following fungal challenge [86]. This suggests that the increased disease susceptibility of domesticated crops is driven by both the contraction of NLR gene repertoire and the functional impairment of retained NLR genes—likely a consequence of artificial selection favoring yield and quality traits over disease resistance [86].
The identification of core orthogroups conserved across multiple species provides valuable candidates for broad-spectrum resistance breeding. These evolutionarily conserved NBS genes may recognize conserved pathogen patterns or play fundamental roles in immune signaling cascades. Conversely, species-specific NBS genes offer insights into lineage-specific pathogen pressures and potential sources of specialized resistance.
Functional validation through approaches like virus-induced gene silencing (VIGS), as demonstrated with GaNBS (OG2) in resistant cotton, provides critical evidence for the putative role of specific NBS genes in disease resistance [3]. The strong interaction of putative NBS proteins with ADP/ATP and different core proteins of the cotton leaf curl disease virus in protein-ligand and protein-protein interaction studies further supports their functional importance in pathogen recognition and signal transduction [3].
The comprehensive analysis of NBS gene repertoires across 34 plant species provides unprecedented insights into the evolution and diversification of plant immune receptor genes. The identification of 12,820 NBS-domain-containing genes classified into 168 architectural classes highlights the remarkable diversity of this gene family, while the discovery of 603 orthogroups reveals both conserved and lineage-specific evolutionary patterns.
The significant reduction of NBS gene repertoires in domesticated species underscores the importance of incorporating wild relatives in breeding programs to enhance disease resistance. The regulatory mechanisms controlling NBS gene expression, particularly miRNA-mediated regulation, represent crucial components of the plant immune system that balance effective pathogen defense with the fitness costs of maintaining these defense genes.
This comparative genomic framework provides a foundation for future research aimed at understanding plant adaptation mechanisms and offers valuable resources for developing disease-resistant crops through targeted breeding strategies. The integration of computational predictions with functional validation will be essential for translating these genomic insights into practical applications for crop improvement.
Plant immunity is fundamentally shaped by the evolution of specific gene families that enable recognition of pathogens. The Nucleotide-Binding Site (NBS)-Leucine-Rich Repeat (LRR) gene family represents one of the largest and most critical classes of plant resistance (R) genes, with approximately 80% of cloned R genes encoding proteins belonging to this family [22] [41]. These genes encode intracellular immune receptors that are central to the plant's effector-triggered immunity (ETI), which provides a robust, often race-specific defense response against adapting pathogens [89] [41].
The evolution of NBS-LRR genes across land plants reveals a dynamic history of gene expansion, diversification, and loss. Comparative genomic analyses across species from mosses to monocots and dicots have identified thousands of NBS-domain-containing genes with remarkable structural diversity [3]. These genes are typically classified based on their N-terminal domains into TIR-NBS-LRR (TNL), CC-NBS-LRR (CNL), and RPW8-NBS-LRR (RNL) subfamilies [3] [41]. Phylogenetic studies show significant variation in the prevalence of these subfamilies across plant lineages, with notable losses of TNL genes in monocots and specific dicot species [22] [41]. This evolutionary landscape provides the essential context for understanding species-specific resistance mechanisms, such as those deployed against powdery mildew in cannabis and pepper.
Cannabis sativa L., an economically important crop for medicinal, recreational, and industrial purposes, faces significant production challenges due to powdery mildew (PM) disease, primarily caused by the biotrophic fungus Golovinomyces ambrosiae [90]. The plant's defense against this pathogen involves two primary genetic mechanisms: NBS-LRR-mediated resistance and loss-of-susceptibility (mlo-based) resistance [89].
Research indicates that cannabis NBS-LRR genes function as intracellular immune receptors that recognize pathogen-secreted effector proteins, triggering a defense cascade that often includes a hypersensitive response (HR), characterized by localized cell death at infection sites, and the production of antimicrobial compounds [89] [90]. This recognition follows the gene-for-gene model, in which specific R gene products interact directly or indirectly with complementary pathogen avirulence (Avr) effectors [89].
Table 1: Key Characteristics of Cannabis Powdery Mildew Resistance Loci
| Locus Name | Chromosomal Location | Resistance Type | Key Features | Molecular Markers |
|---|---|---|---|---|
| PM1 | Chromosome 2 (NC_044375.1) | Qualitative (NLR-mediated) | First reported R-gene in cannabis; contains conserved NBS-LRR domains | Not specified in studies |
| PM2 | Chromosome 9 (NC_083609.1) | Qualitative (NLR-mediated) | Single dominant locus; induces localized hypersensitive response; suppresses pathogen sporulation | SNP markers developed from associated SNPs |
The novel powdery mildew resistance locus PM2 was recently identified and characterized through an integrated approach combining bulked segregant analysis with RNA sequencing (BSR-Seq) [90]. The experimental methodology encompassed several key stages:
Population Development: Researchers developed F1 mapping populations by crossing PM-resistant parents (W03 and N88) with a susceptible cultivar (AC). The inheritance pattern observed in segregating populations (1:1 resistant to susceptible ratio) indicated that PM resistance is controlled by a single dominant locus [90].
Phenotypic Screening: A large diversity panel of 510 cannabis genotypes was evaluated for PM susceptibility using clone assays. Plants were inoculated via "dusting" with fungal spores from sporulating infected leaves and maintained at 23°C with 80% relative humidity for 48 hours, then at 70% RH for the remainder of the trial. Disease scoring occurred at 4 weeks post-inoculation [90].
BSR-Seq and Genetic Mapping: Resistant and susceptible bulks from F1 populations were subjected to RNA sequencing. Analysis of SNPs identified a major region on chromosome 9 associated with PM resistance, which was designated as the PM2 locus [90].
Functional Characterization: Histochemical analyses revealed that PM2-induced resistance is mediated by a highly localized hypersensitive response in the epidermal and mesophyll cells of infected leaves. This response involves accumulation of reactive oxygen species (ROS), particularly hydrogen peroxide (H₂O₂), leading to programmed cell death that restricts pathogen growth and sporulation [90] [91].
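The 1:1 segregation reported in the population development step is typically checked with a chi-square goodness-of-fit test. The sketch below uses only the standard library; the plant counts are hypothetical and are not the counts observed in [90].

```python
import math

def chi_square_1to1(resistant: int, susceptible: int) -> tuple[float, float]:
    """Goodness-of-fit test of observed counts against a 1:1 expectation."""
    total = resistant + susceptible
    expected = total / 2
    chi2 = ((resistant - expected) ** 2 + (susceptible - expected) ** 2) / expected
    # With 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical F1 counts, chosen only to illustrate the calculation.
chi2, p = chi_square_1to1(resistant=52, susceptible=48)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

A p-value well above 0.05 means the observed counts give no reason to reject the 1:1 ratio expected when a resistant parent heterozygous at a single dominant locus is crossed to a susceptible one.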
The following diagram illustrates the experimental workflow for PM2 locus identification:
Comprehensive analysis of the pepper (Capsicum annuum L.) genome has identified 252 NBS-LRR resistance genes distributed unevenly across all chromosomes, with 54% (136 genes) forming 47 physical gene clusters [22]. These clusters are primarily driven by tandem duplications and genomic rearrangements, highlighting the dynamic evolution of resistance genes in pepper.
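Physical clusters of this kind are usually called by grouping neighbouring genes on the same chromosome. The following is a minimal sketch under stated assumptions: the 200 kb maximum gap, the two-gene minimum, and the gene coordinates are all hypothetical, and the cluster definition actually used in [22] is not given here.

```python
from collections import defaultdict

def find_clusters(genes, max_gap=200_000, min_size=2):
    """Group genes into clusters when neighbouring start positions on the same
    chromosome are at most max_gap bp apart.

    genes: iterable of (gene_id, chromosome, start_bp) tuples.
    Returns a list of (chromosome, [gene_ids]) with at least min_size members.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    clusters = []
    for chrom, items in by_chrom.items():
        items.sort()
        current = [items[0]]
        for start, gene_id in items[1:]:
            if start - current[-1][0] <= max_gap:
                current.append((start, gene_id))
            else:
                if len(current) >= min_size:
                    clusters.append((chrom, [g for _, g in current]))
                current = [(start, gene_id)]
        if len(current) >= min_size:
            clusters.append((chrom, [g for _, g in current]))
    return clusters

# Hypothetical gene coordinates.
toy_genes = [
    ("g1", "Chr3", 100_000), ("g2", "Chr3", 180_000), ("g3", "Chr3", 250_000),
    ("g4", "Chr3", 5_000_000), ("g5", "Chr5", 40_000),
]
clusters = find_clusters(toy_genes)
clustered_share = sum(len(ids) for _, ids in clusters) / len(toy_genes)
```

The same share calculation applied to the reported pepper figures gives 136 / 252, or about 54%.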
Phylogenetic and structural analyses reveal a striking dominance of the non-TIR-NBS-LRR (nTNL) subfamily, which comprises 248 genes, over the TIR-NBS-LRR (TNL) subfamily, represented by only 4 genes [22]. This distribution reflects lineage-specific adaptations and evolutionary pressures. Structural characterization identified six conserved motifs within the NBS domain (P-loop, RNBS-A, kinase-2, RNBS-B, RNBS-C, and GLPL) that are essential for ATP/GTP binding and resistance signaling [22].
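As an illustration of motif-level annotation, the P-loop can be located with the widely used Walker A consensus G-x(4)-G-K-[S/T]. This is a simplified sketch: the regular expression is the generic consensus rather than the motif model used in [22], and the peptide is a made-up example.

```python
import re

# Generic Walker A (P-loop) consensus: G, any four residues, G, K, then S or T.
WALKER_A = re.compile(r"G.{4}GK[ST]")

def find_p_loops(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each P-loop hit."""
    return [(m.start() + 1, m.group()) for m in WALKER_A.finditer(protein.upper())]

# Hypothetical peptide containing one P-loop-like stretch.
toy_peptide = "MAEAVLSVGMGGVGKTTLAQ"
p_loops = find_p_loops(toy_peptide)
```

A consensus pattern like this is permissive, so hits are best treated as candidates to confirm against a domain model of the full NBS region.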
Table 2: Distribution and Classification of NBS-LRR Genes in Pepper
| Chromosome | Total NBS-LRR Genes | Gene Clusters | Notable Features |
|---|---|---|---|
| Chr 3 | 38 | 10 (largest: 8 genes) | Highest gene density and cluster number |
| Chr 2, 6 | 5 each | 0 (Chr 6) | Lowest gene count |
| Chr 12 | Not specified | Not specified | Highest subfamily diversity (TN, NL, NN, NLN, N) |
| Chr 0 (unassigned) | 31 | Not specified | Exclusively NB-ARC genes |
| Total Genome | 252 | 47 clusters | 136 genes (54%) in clusters |
While pepper is susceptible to various diseases including phytophthora blight and root-knot nematodes, powdery mildew caused by Leveillula taurica significantly impacts pepper development and growth [92]. Recent research has employed bulked segregant analysis combined with DNA re-sequencing (BSA-seq) to map resistance genes:
Population Development: An F₂ segregating population was constructed by crossing the highly resistant material "NuMex Suave Red" with the extremely susceptible material "c89" [92].
BSA-seq and QTL Mapping: BSA-seq analysis identified a major quantitative trait locus (QTL) located on chromosome 5 (7.20-11.75 Mb) associated with powdery mildew resistance [92].
Fine Mapping: Using InDel and KASP molecular markers developed from the QTL region, researchers refined the candidate interval to 64.86 kb, encompassing five genes [92].
Candidate Gene Identification: Among the genes in the mapped interval, the ubiquitin-conjugating enzyme E2 gene (Capana05g000392) showed significantly upregulated expression in multiple resistant materials. A single nucleotide polymorphism (A/G) at position 241 of the coding sequence causes an amino acid difference (M/V) between the susceptible and resistant parents, making this gene a strong candidate for powdery mildew resistance in pepper [92].
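The reported SNP and amino acid change are internally consistent, which a short calculation shows. Position 241 is the first base of codon 81, and because methionine has a single codon (ATG) in the standard genetic code, an A/G difference at that base corresponds to ATG versus GTG, that is, M versus V. Function and variable names below are illustrative, and which allele belongs to the resistant parent is not assumed.

```python
# Only the two codons needed for this check, from the standard genetic code.
CODON_TABLE_SUBSET = {"ATG": "M", "GTG": "V"}

def codon_context(cds_position: int) -> tuple[int, int]:
    """Return (1-based codon number, 1-based position within that codon)."""
    codon_number = (cds_position - 1) // 3 + 1
    offset = (cds_position - 1) % 3 + 1
    return codon_number, offset

def substitute(codon: str, offset: int, base: str) -> str:
    """Replace the base at the 1-based offset within a codon."""
    return codon[: offset - 1] + base + codon[offset:]

codon_number, offset = codon_context(241)
a_allele_codon = "ATG"  # the only codon for methionine
g_allele_codon = substitute(a_allele_codon, offset, "G")

print(codon_number, offset,
      CODON_TABLE_SUBSET[a_allele_codon], CODON_TABLE_SUBSET[g_allele_codon])
```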
The following diagram illustrates the NBS-LRR mediated immune signaling pathway in plants:
The comparative analysis of NBS-LRR genes in cannabis and pepper reveals distinct evolutionary trajectories within the broader context of land plant evolution. Pepper demonstrates a remarkable expansion of nTNL genes (248 genes) with near-complete loss of TNL representatives (only 4 genes), while cannabis maintains functional TNL-type resistance genes as evidenced by the PM1 and PM2 loci [22] [90]. This pattern aligns with the observed differential evolution of NBS-LRR subfamilies across angiosperms, where some lineages show significant reduction or loss of specific subfamilies [41].
Both species exhibit clustered genomic arrangements of NBS-LRR genes, with pepper showing particularly high clustering (54% of genes in clusters). These clusters often arise from tandem duplications and generate reservoirs of genetic diversity for pathogen recognition [22]. The evolution of these gene families is driven by a combination of whole-genome duplications and small-scale duplications, including tandem, segmental, and transposon-mediated events [3].
The characterization of specific resistance loci and the comprehensive profiling of NBS-LRR gene families in cannabis and pepper provide valuable resources for molecular breeding programs:
Marker-Assisted Selection: The development of genetic markers for PM2 in cannabis and the identification of the Capana05g000392 polymorphism in pepper enable efficient tracking of resistance traits in breeding populations [92] [90].
Pyramiding Strategies: The identification of multiple resistance mechanisms (NLR-based and mlo-based) in cannabis allows for pyramiding different types of resistance genes to develop more durable resistance [89].
Engineering Broad-Spectrum Resistance: Understanding the structural and functional diversity of NBS-LRR genes facilitates engineering approaches, such as modifying LRR domains to alter recognition specificities [22] [36].
Harnessing Natural Diversity: The extensive natural variation in NBS-LRR genes across germplasm resources provides a foundation for identifying novel resistance specificities through eco-tilling and genome-wide association studies [22] [3].
Table 3: Research Reagent Solutions for Studying Powdery Mildew Resistance
| Reagent/Resource | Function/Application | Example Use in Case Studies |
|---|---|---|
| BSR-Seq (Bulked Segregant RNA-Seq) | Identification and mapping of resistance loci by combining transcriptome data with genetic analysis | Mapping of PM2 locus in cannabis [90] |
| BSA-seq (Bulked Segregant Analysis with sequencing) | QTL mapping using DNA sequencing of pooled extremes from a segregating population | Identification of major QTL on chromosome 5 in pepper [92] |
| VIGS (Virus-Induced Gene Silencing) | Functional validation of candidate genes through targeted silencing | Validation of NBS-LRR gene function in tung tree and cotton [3] [38] |
| Genetic Markers (SNPs, InDels) | Tracking resistance alleles in breeding programs | SNP markers for PM2 introgression in cannabis [90] |
| HMMER Software | Identification of NBS-domain-containing genes in genome sequences | Comprehensive identification of NBS-LRR genes in pepper and tung tree [22] [38] |
| PRGminer | Deep learning-based prediction and classification of resistance genes | High-throughput identification of R genes in newly sequenced genomes [36] |
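Bulked segregant methods such as those in the table are often analysed with a ΔSNP-index: the alternate-allele read fraction in the resistant bulk minus that in the susceptible bulk, with linked regions showing values far from zero. The sketch below assumes this statistic for illustration; the exact statistics used in [90] and [92] are not stated here, and all read counts are hypothetical.

```python
# Each record: (position_bp, alt_reads_resistant, depth_resistant,
#               alt_reads_susceptible, depth_susceptible). Hypothetical values.
snps = [
    (1_000_000, 10, 20, 11, 20),
    (8_000_000, 19, 20, 4, 20),
    (9_500_000, 18, 20, 5, 20),
    (20_000_000, 9, 20, 10, 20),
]

def delta_snp_index(record) -> float:
    """SNP-index of the resistant bulk minus SNP-index of the susceptible bulk."""
    _, alt_r, depth_r, alt_s, depth_s = record
    return alt_r / depth_r - alt_s / depth_s

deltas = {rec[0]: round(delta_snp_index(rec), 2) for rec in snps}

# The SNP with the largest absolute difference marks the strongest candidate signal.
peak_position = max(deltas, key=lambda pos: abs(deltas[pos]))
```

In practice the index is averaged over sliding windows and compared with a simulated confidence interval rather than read off single SNPs.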
The case studies of powdery mildew resistance in cannabis and pepper exemplify the evolutionary innovation of NBS domain genes in plant immunity. While both species utilize NBS-LRR genes as central components of their defense arsenals, they exhibit distinct genomic distributions and evolutionary histories of these gene families. The characterization of specific resistance loci and candidate genes (PM1 and PM2 in cannabis; Capana05g000392 in pepper) provides not only insights into molecular mechanisms of disease resistance but also practical tools for crop improvement. As research continues to unravel the complex evolutionary dynamics of NBS genes across land plants, these findings contribute to a broader understanding of plant-pathogen co-evolution and the development of sustainable disease management strategies through molecular breeding and genetic engineering.
Plant immunity relies heavily on a sophisticated intracellular surveillance system mediated by nucleotide-binding site leucine-rich repeat (NBS-LRR) proteins. These molecular sentinels detect pathogen-derived effector molecules and initiate robust defense responses, culminating in effector-triggered immunity (ETI). The central mechanism underlying this pathogen recognition lies in specific protein-ligand interactions between plant NBS-LRR receptors and their cognate pathogen effectors. Understanding the structural basis and specificity of these interactions provides crucial insights into plant defense mechanisms and evolutionary adaptation. Within the broader context of NBS domain gene evolution in land plants, these recognition interfaces represent dynamic evolutionary battlefields shaped by continuous host-pathogen co-evolution. This technical guide examines the molecular principles governing these specific interactions, the experimental methodologies for their characterization, and their evolutionary significance across plant lineages.
NBS-LRR proteins constitute one of the largest and most diverse gene families in plants, with complex domain architecture that facilitates their recognition and signaling functions.
Table 1: Major Domains of Plant NBS-LRR Proteins and Their Functions
| Domain | Structural Features | Primary Functions | Conserved Motifs |
|---|---|---|---|
| N-terminal | TIR or CC configuration | Signaling initiation; protein-protein interactions | TIR motifs (TNLs); CC motif (CNLs) |
| NBS (NB-ARC) | STAND family ATPase | Molecular switch; nucleotide-dependent conformational changes | P-loop, kinase-2, RNBS-A, RNBS-B, RNBS-C, RNBS-D |
| LRR | Tandem repeats forming β-sheet/α-helical structure | Effector recognition; molecular binding | Variable solvent-exposed residues; diversifying selection |
NBS-LRR genes are ancient in origin, found in non-vascular plants, gymnosperms, and angiosperms, with wide variation in copy number across species—from fewer than 100 to over 1,000 genes per genome [13] [61]. They evolve through a birth-and-death process characterized by frequent gene duplications, unequal crossing-over, and diversifying selection, particularly in the LRR region [61] [94]. Two evolutionary patterns are observed: type I genes evolve rapidly with frequent gene conversions, while type II genes evolve slowly with rare gene conversion events [13] [61].
Plant NBS-LRR proteins employ distinct molecular strategies for pathogen detection, ranging from direct binding to indirect surveillance mechanisms.
Direct recognition involves physical interaction between the NBS-LRR protein and the pathogen effector. Key evidence comes from several well-characterized systems:
In direct recognition, the LRR domain typically serves as the primary binding interface, with specificity determined by polymorphic residues in the solvent-exposed β-sheets. This creates a highly variable molecular surface capable of recognizing diverse pathogen ligands.
The guard hypothesis proposes that NBS-LRR proteins monitor ("guard") host cellular components that are modified by pathogen effectors. Key examples include:
This indirect mechanism allows plants to monitor a limited number of host targets while detecting multiple effectors that converge on the same cellular components, providing an efficient surveillance strategy.
Figure 1: Direct vs. Indirect Effector Recognition Mechanisms
The evolutionary arms race between plants and pathogens has shaped the diversification of NBS-LRR genes and their recognition specificities across land plants.
NBS-LRR genes exhibit distinctive evolutionary patterns across plant lineages:
Table 2: Evolutionary Patterns of NBS-LRR Genes Across Plant Lineages
| Plant Group | NBS-LRR Repertoire | Distinctive Features | Evolutionary Mechanisms |
|---|---|---|---|
| Bryophytes (e.g., Physcomitrella patens) | ~25 NLR genes | Minimal repertoire | Ancient origins; limited diversification |
| Cereals/Monocots | Hundreds of genes; highly variable | Absence of TNL subclass | Lineage-specific loss; CNL expansion |
| Dicots | Hundreds to thousands of genes | Both TNL and CNL subclasses | Birth-and-death evolution; tandem duplications |
| Tung Trees (Vernicia species) | 90-149 genes | Domain loss events in susceptible species | Differential selection; promoter evolution |
Plants have evolved sophisticated regulatory mechanisms to manage the expression of NBS-LRR genes, balancing effective defense with autoimmunity costs:
Characterizing protein-ligand interactions in NBS-LRR effector recognition requires multidisciplinary approaches and carefully controlled experiments.
The pioneering study on the potato Rx NBS-LRR protein established a powerful framework for domain interaction analysis:
Experimental Workflow:
Key Findings:
Figure 2: Experimental Workflow for Rx Functional Complementation Studies
Multiple biochemical and genetic approaches are employed to characterize recognition interfaces:
While no full-length plant NBS-LRR structures are available, several approaches provide structural insights:
Table 3: Essential Research Reagents and Resources for Studying NBS-LRR Recognition
| Reagent/Resource | Specifications | Research Application | Example Implementation |
|---|---|---|---|
| HMMER Software | HMMER v3.1b2 with Pfam models (NBS: pfam00931) | Identification of NBS-domain containing genes from genomic data | Identification of 274 NBS-LRR genes in grass pea genome [96] |
| Agrobacterium Transformation System | Agrobacterium tumefaciens strains GV3101, LBA4404 | Transient expression in N. benthamiana for functional assays | Rx domain complementation assays [95] |
| Epitope Tag Systems | HA, FLAG, GFP tags for protein detection | Protein localization, interaction studies, and immunoprecipitation | HA-tagged Rx domains for co-immunoprecipitation [95] |
| VIGS Vectors | TRV-based (Tobacco Rattle Virus) vectors | Functional validation through targeted gene silencing | Validation of Vm019719 function in tung tree Fusarium resistance [38] |
| Yeast Two-Hybrid System | GAL4-based or split-ubiquitin systems | Protein-protein interaction mapping | Direct interaction between L and AvrL567 proteins [93] |
| OrthoFinder Pipeline | OrthoFinder v2.5.1 with DIAMOND and MCL | Evolutionary analysis and orthogroup identification | Clustering of 12,820 NBS genes into 603 orthogroups [3] |
Protein-ligand interactions governing effector recognition specificity in plant NBS-LRR proteins represent a sophisticated molecular interface shaped by evolutionary arms races. The structural modularity of NBS-LRR proteins, with distinct signaling and recognition domains, enables both direct and indirect detection mechanisms while maintaining conserved activation pathways. The evolutionary dynamics of these genes—characterized by birth-and-death evolution, lineage-specific expansions, and regulatory adaptations—highlight their central role in plant-pathogen coevolution across land plants. Experimental approaches combining functional complementation, interaction mapping, and genomic analyses continue to reveal the intricate molecular logic underlying these recognition specificities. As research progresses, integrating structural biology with evolutionary genomics promises to unlock new strategies for engineering durable disease resistance in crop plants, informed by natural diversity and recognition mechanisms refined over millions of years of plant evolution.
The nucleotide-binding site (NBS) domain gene family represents a cornerstone of the plant immune system, encoding intracellular receptors responsible for pathogen recognition and activation of defense responses [3]. These genes, often characterized by their canonical NBS-leucine-rich repeat (LRR) architecture, play a pivotal role in effector-triggered immunity (ETI), enabling plants to detect pathogen effector proteins and initiate robust defense cascades [41]. The evolutionary dynamics of NBS genes, driven by duplication events and selective pressures, have resulted in substantial diversification across land plants, creating complex repertoires that underlie species-specific adaptation to pathogens [3].
Orthogroup analysis has emerged as a powerful computational framework for elucidating evolutionary relationships among genes across multiple species. By clustering genes into groups descended from a single gene in the last common ancestor of the species being considered, this approach enables systematic identification of core conserved genes and species-specific innovations [97] [34]. Applied to NBS domain genes, orthogroup analysis reveals fundamental insights into the evolutionary mechanisms that have shaped plant immunity from early land plants to modern angiosperms. This technical guide examines the methodology, findings, and implications of orthogroup analysis in decoding the complex evolutionary history of NBS genes across the plant kingdom.
Orthogroup inference represents a critical bioinformatics workflow for comparative genomics, with several sophisticated algorithms available for large-scale analyses. OrthoFinder has established itself as a benchmark tool, employing a comprehensive phylogenetic approach that infers orthogroups, gene trees, the rooted species tree, and gene duplication events [34]. The algorithm utilizes DIAMOND for rapid sequence similarity searches, followed by clustering with the MCL algorithm and phylogenetic analysis using DendroBLAST [3] [34]. Benchmarking through the Quest for Orthologs initiative has demonstrated OrthoFinder's superior accuracy in ortholog inference, outperforming other methods by 3-30% on standardized tests [34].
For projects involving hundreds or thousands of genomes, FastOMA provides a scalable alternative with linear time complexity, enabling analysis of thousands of eukaryotic genomes within 24 hours [98]. This method leverages OMAmer for k-mer-based placement of sequences into hierarchical orthologous groups (HOGs), followed by taxonomy-guided resolution of nested gene families [98]. While maintaining high precision comparable to OMA, FastOMA dramatically reduces computational requirements through innovative algorithms that avoid all-against-all sequence comparisons [98].
Visualization and interpretation of orthogroup results are facilitated by tools like OrthoBrowser, which creates interactive static websites for exploring phylogenies, gene trees, multiple sequence alignments, and synteny relationships [97]. This platform enhances accessibility to complex orthogroup datasets, enabling researchers to identify, interact with, and share information about gene families of interest without requiring advanced computational expertise [97].
A standardized workflow for NBS gene orthogroup analysis encompasses multiple stages of data processing and validation (Figure 1). The initial step involves identification of NBS domain-containing genes across target species using Hidden Markov Model (HMM) searches with Pfam domain models (e.g., NB-ARC domain PF00931) at stringent e-value thresholds (e.g., 1.1e-50) [3]. Subsequent domain architecture classification categorizes genes into structural classes (NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR) and identifies species-specific patterns through systematic domain annotation [3] [41].
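The e-value filtering step can be sketched as a small parser for HMMER's per-sequence tabular output (the file written by `hmmsearch --tblout`), where the first column is the target name and the fifth is the full-sequence E-value. The sample rows and gene names are hypothetical, and the default threshold simply reuses the 1.1e-50 example from the text.

```python
# Hypothetical rows in hmmsearch --tblout layout (comment lines start with '#').
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias  ...
geneA - NB-ARC PF00931.25 3.2e-80 270.1 0.1 4.0e-80 269.8 0.1 1.0 1 0 0 1 1 1 1 hypothetical protein
geneB - NB-ARC PF00931.25 2.5e-12  45.3 0.0 3.1e-12  45.0 0.0 1.1 1 0 0 1 1 1 1 hypothetical protein
geneC - NB-ARC PF00931.25 1.0e-51 175.6 0.2 1.4e-51 175.2 0.2 1.0 1 0 0 1 1 1 1 hypothetical protein
"""

def filter_hits(tblout_text: str, max_evalue: float = 1.1e-50) -> dict[str, float]:
    """Keep targets whose full-sequence E-value is at or below max_evalue."""
    hits = {}
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            hits[target] = evalue
    return hits

nbs_candidates = filter_hits(SAMPLE_TBLOUT)
```

Such a strict cutoff trades sensitivity for specificity, so truncated or divergent NBS genes may need a second, more permissive pass.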
Figure 1: Workflow for Orthogroup Analysis of NBS Genes
The classification system differentiates between typical NBS-LRR proteins (containing both N-terminal and LRR domains) and atypical forms with truncated architectures [41]. Genes are further subdivided based on N-terminal domain presence into TIR-NBS-LRR (TNL), CC-NBS-LRR (CNL), and RPW8-NBS-LRR (RNL) subclasses [41] [22]. This comprehensive structural characterization provides the foundation for meaningful orthogroup inference and evolutionary interpretation.
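The naming scheme described above maps directly onto a small rule set. The sketch below is one possible encoding: the domain labels are assumed to come from an upstream annotation step, and the precedence among N-terminal domains (TIR, then RPW8, then CC) is an assumption made for illustration.

```python
def classify_nbs(domains: set[str]) -> str:
    """Derive a class label (e.g. TNL, CNL, RNL, NL, N) from annotated domains."""
    if "NB-ARC" not in domains:
        return "not NBS"
    # Assumed precedence when more than one N-terminal domain is annotated.
    if "TIR" in domains:
        prefix = "T"
    elif "RPW8" in domains:
        prefix = "R"
    elif "CC" in domains:
        prefix = "C"
    else:
        prefix = ""
    suffix = "L" if "LRR" in domains else ""
    return prefix + "N" + suffix

# Hypothetical domain annotations for four genes.
examples = {
    "gene1": {"TIR", "NB-ARC", "LRR"},
    "gene2": {"CC", "NB-ARC", "LRR"},
    "gene3": {"NB-ARC"},
    "gene4": {"RPW8", "NB-ARC", "LRR"},
}
labels = {gene: classify_nbs(doms) for gene, doms in examples.items()}
```

Typical genes come out as three-letter labels, while truncated architectures yield shorter ones such as N, NL, TN, or CN.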
Comprehensive analysis of NBS genes across 34 plant species spanning evolutionary lineages from bryophytes to higher eudicots has identified 12,820 NBS-domain-containing genes, illuminating patterns of gene family expansion and structural diversification [3]. These genes distribute into 168 distinct domain architecture classes, encompassing both canonical patterns (NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR) and numerous species-specific configurations (TIR-NBS-TIR-Cupin1-Cupin1, TIR-NBS-Prenyltransf, Sugar_tr-NBS) [3]. The remarkable diversity in domain architecture underscores the dynamic evolutionary processes that have shaped NBS gene repertoires through domain shuffling, duplication, and functional innovation.
Orthogroup analysis of these sequences resolved 603 distinct orthogroups (OGs), comprising both core orthogroups (OG0, OG1, OG2) conserved across multiple lineages and unique orthogroups (OG80, OG82) restricted to specific species [3]. The prevalence of tandem duplications within these orthogroups highlights a key mechanism for NBS gene family expansion and adaptation to rapidly evolving pathogen pressures [3]. These findings align with similar patterns observed in taxon-specific studies, where pepper (Capsicum annuum) genomes revealed 252 NBS-LRR genes with 54% physically clustered in 47 genomic regions, and Salvia miltiorrhiza exhibited 196 NBS-domain-containing genes [41] [22].
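The distinction between core and unique orthogroups comes down to how many species contribute members. A minimal sketch follows; the orthogroup names, species, genes, and the rule that "core" means present in every species are all assumptions for illustration, not the criteria of [3].

```python
def classify_orthogroups(orthogroups, all_species, core_fraction=1.0):
    """Label each orthogroup by its species coverage.

    orthogroups: dict of orthogroup id -> list of (species, gene_id) members.
    core_fraction: share of species required for the 'core' label (assumed).
    """
    labels = {}
    for og, members in orthogroups.items():
        species_present = {species for species, _ in members}
        if len(species_present) == 1:
            labels[og] = "species-specific"
        elif len(species_present) >= core_fraction * len(all_species):
            labels[og] = "core"
        else:
            labels[og] = "shared"
    return labels

# Hypothetical orthogroup membership across three species.
species = ["sp1", "sp2", "sp3"]
toy_orthogroups = {
    "OG_a": [("sp1", "a1"), ("sp2", "a2"), ("sp3", "a3"), ("sp3", "a4")],
    "OG_b": [("sp1", "b1"), ("sp2", "b2")],
    "OG_c": [("sp3", "c1"), ("sp3", "c2")],
}
og_labels = classify_orthogroups(toy_orthogroups, species)
```

Lowering `core_fraction` relaxes the definition, which matters when some genome assemblies are incomplete.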
Comparative analysis across diverse plant lineages reveals striking variation in NBS gene subfamily composition and evolutionary trajectories (Table 1). Eudicot species generally maintain both CNL and TNL subfamilies, though with substantial lineage-specific differences in relative proportions [41] [22]. Monocot species, including Oryza sativa and Triticum aestivum, exhibit a complete absence of TNL genes, representing a major lineage-specific loss [41]. Gymnosperms such as Pinus taeda display a contrasting pattern, with a dramatic expansion of the TNL subfamily, which comprises 89.3% of typical NBS-LRR genes [41].
Table 1: Evolutionary Distribution of NBS-LRR Gene Subfamilies Across Plant Lineages
| Plant Species/Lineage | Total NBS Genes | CNL Subfamily | TNL Subfamily | RNL Subfamily | Notable Evolutionary Patterns |
|---|---|---|---|---|---|
| Arabidopsis thaliana | 207 [41] | ~61% [41] | ~36% [41] | ~3% [41] | Balanced subfamily representation |
| Oryza sativa (Rice) | 505 [41] | 100% [41] | 0% [41] | 0% [41] | Complete loss of TNL and RNL |
| Salvia miltiorrhiza | 196 [41] | 75 CC-domain [41] | 2 [41] | 1 [41] | Marked reduction in TNL/RNL |
| Capsicum annuum | 252 [22] | 48 CC-domain [22] | 4 [22] | 1 RNL-like [22] | Dominance of nTNL (98.4%) |
| Pinus taeda | 311 [41] | ~10.7% [41] | ~89.3% [41] | - | Dramatic TNL expansion |
| Early Land Plants | ~25 [3] | Limited repertoire | Limited repertoire | Limited repertoire | Ancestral compact NLR repertoire |
The medicinal plant Salvia miltiorrhiza exemplifies particularly extreme subfamily distribution, with only 2 TNL and 1 RNL genes identified among 196 NBS-domain-containing genes [41]. This pattern extends across the Salvia genus, with comparative analysis of five Salvia species revealing complete absence of TNL subfamily members and minimal RNL representation [41]. These findings suggest distinct evolutionary pressures in certain lineages that have driven the contraction or loss of specific NBS subfamilies, possibly compensated by expansion and functional diversification of remaining subfamilies.
Functional analysis of NBS orthogroups through transcriptomic profiling provides critical insights into their roles in plant stress responses. Examination of orthogroup expression patterns across different tissues and stress conditions has identified putative functional specialization among conserved orthogroups [3]. OG2, OG6, and OG15 demonstrate particular significance, showing upregulated expression in various tissues under diverse biotic and abiotic stresses in cotton species with contrasting susceptibility to cotton leaf curl disease (CLCuD) [3].
These expression patterns suggest that core orthogroups may represent fundamental components of plant immune signaling networks, while species-specific orthogroups potentially contribute to specialized adaptation to lineage-specific pathogen pressures. Integration of expression data with orthogroup classification enables prioritization of candidate genes for functional validation and elucidates how evolutionary conservation correlates with functional importance in plant immunity.
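Prioritization by expression is often based on a log2 fold change between stressed and control samples. The sketch below is a simplified illustration: the orthogroup names, expression values, pseudocount, and the cutoff of 1 (a twofold increase) are all hypothetical choices, and no statistical testing is included.

```python
import math

def log2_fold_change(stressed: float, control: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of stressed to control expression, with a pseudocount to avoid log(0)."""
    return math.log2((stressed + pseudocount) / (control + pseudocount))

# Hypothetical mean expression per orthogroup: (control, stressed).
expression = {
    "OG_x": (7.0, 31.0),
    "OG_y": (15.0, 15.0),
    "OG_z": (3.0, 1.0),
}

fold_changes = {og: log2_fold_change(s, c) for og, (c, s) in expression.items()}
upregulated = sorted(og for og, lfc in fold_changes.items() if lfc >= 1.0)
```

Real analyses would work from replicate-level counts with a dedicated differential expression method before ranking orthogroups.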
Comparative analysis of genetic variation between susceptible (Coker 312) and tolerant (Mac7) Gossypium hirsutum accessions revealed substantial differences in NBS gene sequences, with 6,583 unique variants in the tolerant genotype compared to 5,173 in the susceptible line [3]. This disparity in genetic variation suggests potential mechanisms underlying resistance specificity and highlights the possible contribution of genotype-specific NBS gene variation to pathogen recognition capabilities.
Protein-ligand and protein-protein interaction studies further demonstrated strong binding affinity between putative NBS proteins and ADP/ATP, consistent with the nucleotide-binding function of the NBS domain [3]. Importantly, interaction assays also identified specific associations between NBS proteins and core components of the cotton leaf curl disease virus, providing mechanistic insights into recognition specificity and resistance protein function [3].
Direct experimental validation is a critical step in moving from orthogroup-based functional predictions to established gene function. Virus-induced gene silencing (VIGS) of GaNBS (OG2) in resistant cotton demonstrated its essential role in viral pathogen response, with silenced plants showing increased virus titers and compromised resistance [3]. This result supports OG2 as a resistance-associated orthogroup, although confirming that the function is conserved across its members would require testing additional genes and species.
The integration of orthogroup analysis with functional validation provides a powerful framework for prioritizing candidate genes for detailed mechanistic studies. This approach efficiently bridges computational predictions with experimental confirmation, accelerating the identification of evolutionarily conserved resistance genes with potential applications in crop improvement programs.
Table 2: Essential Computational Tools and Reagents for NBS Orthogroup Analysis
| Tool/Reagent | Category | Primary Function | Application in NBS Analysis |
|---|---|---|---|
| OrthoFinder [34] | Software | Phylogenetic orthology inference | Core orthogroup identification across species |
| FastOMA [98] | Software | Scalable orthology inference | Large-scale analyses (1000+ genomes) |
| OMAmer [98] | Software | k-mer-based sequence placement | Rapid homology detection and grouping |
| OrthoBrowser [97] | Software | Results visualization and exploration | Interactive orthogroup data exploration |
| Pfam HMM Models [3] | Database | Protein domain annotation | NBS domain identification (NB-ARC domain) |
| DIAMOND [3] | Software | Sequence similarity search | Rapid all-against-all sequence comparison |
| VIGS Vectors [3] | Experimental | Virus-induced gene silencing | Functional validation of NBS gene function |
| RNA-seq Libraries [3] | Experimental | Transcriptome profiling | Expression analysis of NBS orthogroups |
Orthogroup analysis has fundamentally advanced our understanding of NBS gene evolution in land plants, revealing both conserved principles and lineage-specific innovations in plant immune receptor repertoires. The identification of core orthogroups underscores the evolutionary conservation of essential immune signaling components, while species-specific orthogroups highlight the dynamic adaptation of plant immunity to diverse pathogen pressures. The integration of computational orthology inference with experimental validation provides a powerful framework for deciphering the complex evolutionary history and functional diversity of NBS genes.
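The distinction between core and species-specific orthogroups can be made concrete with a short sketch. It assumes an OrthoFinder-style tab-separated table in which the first column is named `Orthogroup` and each remaining column lists one species' member genes. The labels used here ("core" for presence in every species, "species-specific" for presence in exactly one) are working definitions chosen for illustration, and the toy table is hypothetical.

```python
import csv
import io


def classify_orthogroups(tsv_text):
    """Label each orthogroup by how many species contribute at least one gene."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    species = [col for col in reader.fieldnames if col != "Orthogroup"]

    classes = {}
    for row in reader:
        # A species is "present" if its cell contains any gene ID
        present = [sp for sp in species if (row[sp] or "").strip()]
        if len(present) == len(species):
            label = "core"
        elif len(present) == 1:
            label = "species-specific"
        else:
            label = "shared"
        classes[row["Orthogroup"]] = label
    return classes


# Hypothetical three-species table with placeholder gene IDs
toy_table = (
    "Orthogroup\tsp1\tsp2\tsp3\n"
    "OG0000000\tg1, g2\tg3\tg4\n"
    "OG0000001\tg5\t\t\n"
    "OG0000002\tg6\tg7\t\n"
)

classes = classify_orthogroups(toy_table)
print(classes)
```

For large species sets, a strict presence-in-all criterion is often too stringent because of annotation gaps, so a presence threshold (for example, a chosen fraction of species) may be a more practical definition of a core orthogroup.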
Future research directions will benefit from the expanding genomic resources for non-model plant species and continued refinement of orthology inference methods capable of processing thousands of genomes. The application of structural phylogenomics, integrating protein structure prediction with orthogroup analysis, promises to enhance resolution of deep evolutionary relationships among NBS genes. Furthermore, leveraging orthogroup classifications to inform comparative functional studies across species will illuminate how evolutionary conservation and divergence translate to immune receptor function, ultimately enabling strategic manipulation of NBS genes for crop improvement and sustainable agriculture.
The evolution of NBS domain genes represents a remarkable story of plant adaptation, characterized by dynamic expansion, diversification, and sophisticated regulatory mechanisms. From their origins in early land plants to their specialized functions in modern crops, these genes have evolved through tandem duplications, whole-genome events, and lineage-specific adaptations, including the notable loss of TNL subfamilies in monocots. The development of advanced computational tools, particularly deep learning approaches, has revolutionized our ability to identify and classify these complex genes, while functional studies continue to reveal their crucial roles in pathogen recognition and defense signaling. The balance plants maintain between robust immunity and the fitness costs of NBS gene expression, often mediated by miRNAs, highlights the sophistication of this evolutionary arms race. For biomedical and clinical research, understanding the molecular mechanisms of plant NBS genes offers unexpected insights into principles of innate immunity that may inform human immune receptor studies. Future work should focus on harnessing this knowledge to develop novel disease resistance strategies in crops, exploring potential applications in synthetic biology, and further investigating the conserved evolutionary principles that may bridge plant and animal immune systems.