Evolution and Innovation: How NBS Domain Genes Shape Plant Immunity and Biomedical Potential

Genesis Rose Nov 27, 2025

Abstract

This article provides a comprehensive analysis of the evolution of Nucleotide-Binding Site (NBS) domain genes, the largest class of plant resistance (R) genes. We explore the foundational evolutionary trajectory of these genes from early land plants to angiosperms, highlighting major diversification events and lineage-specific adaptations. The piece details cutting-edge methodologies for NBS gene identification, from traditional HMM-based searches to novel deep learning tools, and addresses key challenges in their study, including annotation difficulties and transcriptional regulation. Finally, we present validation techniques and comparative genomic insights that reveal the functional roles of specific NBS genes and discuss the emerging implications of this knowledge for disease resistance breeding and its unexpected connections to biomedical research, particularly in understanding immune receptor functions.

From Ancient Origins to Modern Diversification: Tracing the Evolutionary Path of NBS Genes

The Genomic Expansion of NBS-LRR Genes from Bryophytes to Angiosperms

The evolutionary history of land plants is marked by their continuous adaptation to a pathogen-rich environment. Central to this adaptation is the expansion and diversification of intracellular immune receptors encoded by the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family. These genes, which constitute the largest class of plant disease resistance (R) genes, have undergone remarkable genomic changes from the early non-vascular plants to the diverse flowering plants we see today. This whitepaper traces the trajectory of NBS-LRR gene expansion, leveraging recent genomic studies to quantify this phenomenon and explore its functional implications for plant immunity. The investigation of these genes is not merely an academic exercise; it provides a fundamental resource for understanding the molecular basis of disease resistance and informs future crop breeding strategies [1] [2].

Evolutionary Trajectory and Genomic Distribution

From Simple Bryophyte Systems to Complex Angiosperm Repertoires

The NBS-LRR gene family originated in the common ancestor of all green plants, with early divergence into different subclasses [1]. However, the scale of this gene family differs dramatically across the plant kingdom.

Bryophytes and Lycophytes: Early-diverging land plant lineages such as the moss Physcomitrella patens and the lycophyte Selaginella moellendorffii possess relatively small NLR repertoires. Physcomitrella patens has approximately 25 NLRs, while Selaginella moellendorffii has a mere 2 NLRs [3]. This suggests that the large repertoires observed today arose mainly within flowering plants, after they diverged from these earlier lineages.
Angiosperms: In contrast, flowering plants typically harbor hundreds of NBS-LRR genes. A recent angiosperm NLR atlas (ANNA) that includes over 300 angiosperm genomes has identified more than 90,000 NLR genes, comprising 18,707 TNLs, 70,737 CNLs, and 1,847 RNLs [3]. Another study analyzing 34 plant species identified 12,820 NBS-domain-containing genes, which were classified into 168 different domain architecture classes, highlighting the extensive diversification within this gene family [3].
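As a quick arithmetic check on the atlas figures quoted above, the subclass counts can be summed and turned into shares. This is only a sketch: the variable names are illustrative, and the counts are simply those stated in the text.

```python
# Subclass counts reported in the text for the angiosperm NLR atlas.
anna_counts = {"TNL": 18_707, "CNL": 70_737, "RNL": 1_847}

# Total across the three subclasses and each subclass's share of that total.
total = sum(anna_counts.values())
shares = {name: count / total for name, count in anna_counts.items()}

print(f"total NLRs: {total:,}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The total comes to 91,291, consistent with "more than 90,000", and CNLs account for roughly three quarters of the catalogued genes.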

Table 1: Genomic Content of NBS-LRR Genes Across Representative Plant Species

Plant Species	Group	Total NBS-LRR Genes	TNL	CNL	RNL	Key Reference
Physcomitrella patens	Moss	~25	Not Specified	Not Specified	Not Specified	[3]
Selaginella moellendorffii	Lycophyte	~2	Not Specified	Not Specified	Not Specified	[3]
Arabidopsis thaliana	Eudicot	149-159	94-98	50-55	(Included in total)	[4] [5]
Euryale ferox (Basal Angiosperm)	Angiosperm	131	73	40	18	[1]
Oryza sativa (Rice)	Monocot	553-653	0	553-653	(Included in total)	[4]
Secale cereale (Rye)	Monocot	582	0	581	1	[6]
Glycine max (Soybean)	Eudicot	319	Not Specified	Not Specified	Not Specified	[4]
Manihot esculenta (Cassava)	Eudicot	228	34	128	(Not Specified)	[2]

Patterns of Gene Family Evolution

The expansion of NBS-LRR genes has not been uniform across all plant lineages. Independent gene duplication and loss events have resulted in distinct evolutionary patterns, even among closely related species.

Mechanisms of Expansion: Gene duplication acts as the primary engine for NBS-LRR expansion. This occurs through both whole-genome duplication (WGD) and small-scale duplications (SSD), the latter including tandem, segmental, and transposon-mediated duplications [3]. In the basal angiosperm Euryale ferox, segmental duplications were a major mechanism for the expansion of CNL and TNL subclasses, but not for RNL genes, which appear to have expanded via ectopic duplications [1].
Lineage-Specific Patterns: Studies of specific plant families reveal dynamic evolutionary histories.
- In the Rosaceae (e.g., apple, strawberry, peach), different species exhibit patterns of "first expansion and then contraction," "continuous expansion," or "early sharp expanding to abrupt shrinking" [7].
- In the Solanaceae (e.g., potato, tomato, pepper), potato shows "consistent expansion," tomato shows "expansion followed by contraction," and pepper displays a "shrinking" pattern [7].
- Aquatic, parasitic, and carnivorous plants have consistently undergone NLR gene contraction, suggesting that adaptations to these specialized lifestyles are associated with a reduction in the need for a large, diverse immune receptor repertoire [8].

Subclass Diversification and Genomic Architecture

Divergence of NBS-LRR Subclasses

Angiosperm NBS-LRR genes are phylogenetically divided into three major subclasses, each with distinct structural and functional characteristics [6].

TNL (TIR-NBS-LRR): Characterized by an N-terminal Toll/Interleukin-1 Receptor (TIR) domain. TNLs are completely absent from cereal genomes, suggesting a loss in the monocot lineage after its divergence from the dicots [5] [6]. The common ancestor of the three Nymphaeaceae species (a basal angiosperm group) already possessed a significant number of TNLs [1].
CNL (CC-NBS-LRR): Characterized by an N-terminal Coiled-Coil (CC) domain. CNLs are found in both monocots and dicots, and they often represent the vast majority of NBS-LRRs in monocot species, as seen in rye where 581 of 582 genes are CNLs [6].
RNL (RPW8-NBS-LRR): Characterized by an N-terminal Resistance to Powdery Mildew 8 (RPW8) domain. This is a smaller, more ancient subclass that can be divided into ADR1 and NRG1 clades. RNLs do not typically function as pathogen sensors but act as "helper NBS-LRRs" (hNLRs) that transduce immune signals downstream of TNL and CNL activation [1].

Table 2: Characteristics of the Major NBS-LRR Subclasses in Angiosperms

Feature	TNL Subclass	CNL Subclass	RNL Subclass
N-Terminal Domain	TIR (Toll/Interleukin-1 Receptor)	CC (Coiled-Coil)	RPW8 (Resistance to Powdery Mildew 8)
Primary Function	Pathogen recognition ("Sensor")	Pathogen recognition ("Sensor")	Signal transduction ("Helper")
Presence in Monocots	Absent	Predominant	Rare
Presence in Dicots	Widespread	Widespread	Widespread
Key Signaling Component	EDS1	Often NRC proteins	ADR1/NRG1

Genomic Organization and Cluster Evolution

A hallmark of NBS-LRR genes is their non-random genomic distribution, which has profound implications for their evolution.

Clustering and Tandem Duplications: NBS-LRR genes are frequently clustered in plant genomes, a result of both segmental and tandem duplications [4] [5]. For example, in cassava, 63% of its 327 R genes are organized into 39 clusters on its chromosomes [2]. Similarly, in Euryale ferox, 87 out of 131 genes are located in 18 multigene clusters [1]. These clusters are often homogeneous, containing genes from the same subclass and derived from a recent common ancestor, which facilitates rapid evolution of novel pathogen specificities through mechanisms like unequal crossing-over and gene conversion [4] [2].
Singleton Genes: A significant number of NBS-LRR genes exist as singletons, scattered throughout the genome outside of dense clusters. In Euryale ferox, 44 of its 131 genes were singletons [1].

Figure 1: Evolutionary Pathways of NBS-LRR Gene Expansion and Diversification. The diagram illustrates the divergence from an ancestral gene into major subclasses, followed by duplication mechanisms that create genomic clusters where recombination drives the evolution of new pathogen recognition capabilities.

Methodologies for Genome-Wide NBS-LRR Analysis

A standard pipeline for the genome-wide identification and characterization of NBS-LRR genes has been established and refined across multiple studies [1] [2] [6]. The following protocol details the key steps.

Protocol: Identification and Classification of NBS-LRR Genes

Step 1: Data Acquisition

Download the whole genome sequence assembly and its corresponding annotation file (typically in GFF3 or GTF format) from a public database such as Phytozome, Ensembl Plants, or a species-specific genome portal [7] [2].

Step 2: Initial Candidate Identification using HMMER and BLAST

Perform a Hidden Markov Model (HMM) search against the annotated protein sequences of the target species using the HMM profile for the NB-ARC domain (Pfam: PF00931). The hmmsearch tool from the HMMER package is typically used with a relaxed E-value threshold (e.g., 1.0) to cast a wide net [2] [6].
In parallel, conduct a BLASTp search against the same protein dataset, using the NB-ARC domain sequence underlying the HMM profile or known NBS-LRR proteins as the query, also with a relaxed E-value (e.g., 1.0) [1] [6].
Merge the hits from both methods and remove redundant entries.
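The merge in Step 2 can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the cited studies: it assumes the HMMER results were saved with `hmmsearch --tblout` and the BLAST results in the default tabular layout (`-outfmt 6`), and the function names and sample rows are hypothetical.

```python
def parse_hmmsearch_tblout(text, max_evalue=1.0):
    """Collect target IDs from hmmsearch --tblout text at or below the E-value cutoff."""
    hits = set()
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        # Column 1 is the target name; column 5 is the full-sequence E-value.
        if float(fields[4]) <= max_evalue:
            hits.add(fields[0])
    return hits


def parse_blast_tabular(text, max_evalue=1.0):
    """Collect subject IDs from BLAST tabular (-outfmt 6) text at or below the cutoff."""
    hits = set()
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Default outfmt 6: column 2 is sseqid; column 11 is the E-value.
        if float(fields[10]) <= max_evalue:
            hits.add(fields[1])
    return hits


def merge_candidates(hmm_hits, blast_hits):
    """Union of both hit sets; using sets removes redundant entries."""
    return sorted(hmm_hits | blast_hits)


# Made-up example rows (the tblout rows are abbreviated to the first columns).
HMM_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias
geneA.1  -  NB-ARC  PF00931.25  1.2e-50  170.1  0.1
geneB.1  -  NB-ARC  PF00931.25  0.5  12.3  0.0
geneX.1  -  NB-ARC  PF00931.25  3.0  8.0  0.0
"""
BLAST_TAB = (
    "query1\tgeneB.1\t35.0\t200\t120\t5\t1\t200\t10\t210\t1e-20\t95.0\n"
    "query1\tgeneC.1\t28.0\t150\t100\t4\t1\t150\t5\t155\t0.8\t30.1\n"
)

merged = merge_candidates(
    parse_hmmsearch_tblout(HMM_TBLOUT), parse_blast_tabular(BLAST_TAB)
)
print(merged)
```

Keeping the cutoffs as parameters makes it easy to rerun the merge with a different threshold before the stricter verification in Step 3.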

Step 3: Domain Verification and Subclassification

Subject the non-redundant candidate sequences to a second, more stringent round of domain verification using hmmscan against the Pfam database (E-value < 0.0001) or the NCBI Conserved Domain Database (CDD) [1] [6].
Confirm the presence of the NBS domain and identify associated N-terminal and C-terminal domains:
- TIR domain: Use Pfam model PF01582.
- Coiled-Coil (CC) domain: Use prediction tools like Paircoil2 with a P-score cutoff (e.g., 0.03), as it is not always identifiable by Pfam [2].
- RPW8 domain: Use Pfam model PF05659.
- LRR domains: Use models such as PF00560, PF07723, PF07725, and PF12799 [2].
Classify genes into TNL, CNL, or RNL subclasses based on the presence of these N-terminal domains.
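The subclassification rule in Step 3 can be expressed as a small function. The sketch below assumes that each candidate already has a set of Pfam accessions from the verification step and a separate yes/no result from a coiled-coil predictor. The function name, the shorthand labels for genes lacking an LRR or N-terminal domain, and the order in which N-terminal domains are checked are assumptions made for illustration.

```python
NBS = "PF00931"   # NB-ARC
TIR = "PF01582"
RPW8 = "PF05659"
LRR_MODELS = {"PF00560", "PF07723", "PF07725", "PF12799"}


def classify_nbs_gene(pfam_hits, has_coiled_coil=False):
    """Return a label such as 'TNL', 'CNL', 'RNL', 'TN', 'NL' or 'N'.

    Returns None when no NB-ARC domain is present. The check order
    (TIR, then RPW8, then coiled-coil) is an assumption of this sketch.
    """
    hits = set(pfam_hits)
    if NBS not in hits:
        return None
    if TIR in hits:
        prefix = "T"
    elif RPW8 in hits:
        prefix = "R"
    elif has_coiled_coil:
        prefix = "C"
    else:
        prefix = ""
    suffix = "L" if hits & LRR_MODELS else ""
    return prefix + "N" + suffix


print(classify_nbs_gene({"PF01582", "PF00931", "PF00560"}))            # TIR + NBS + LRR
print(classify_nbs_gene({"PF00931", "PF12799"}, has_coiled_coil=True))  # CC + NBS + LRR
print(classify_nbs_gene({"PF05659", "PF00931"}))                        # RPW8 + NBS, no LRR
```

Passing the coiled-coil call as a separate flag mirrors the protocol, where CC domains come from a dedicated predictor rather than from Pfam.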

Step 4: Analysis of Genomic Distribution and Duplication

Extract the genomic locations of all confirmed NBS-LRR genes from the annotation file.
Identify gene clusters. A common definition is two or more NBS-LRR genes located within a 250 kb window of each other [1] [6].
Classify genes not falling within any cluster as "singletons."
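One way to apply the 250 kb cluster definition from Step 4 is to sort genes along each chromosome and chain neighbours that lie close together. This is a sketch under one interpretation of that definition, in which a gene joins a cluster when it starts within 250 kb of the end of the preceding gene. The cited studies may implement the window differently, and the function and gene names are hypothetical.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=250_000):
    """Split genes into clusters (two or more chained genes) and singletons.

    genes: iterable of (gene_id, chromosome, start, end) tuples.
    Neighbouring genes on the same chromosome are chained when the next gene
    starts no more than max_gap bases after the current group ends.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, gene_id))

    clusters, singletons = [], []

    def close_group(group):
        if len(group) >= 2:
            clusters.append(group)
        elif group:
            singletons.append(group[0])

    for chrom in sorted(by_chrom):
        group, group_end = [], None
        for start, end, gene_id in sorted(by_chrom[chrom]):
            if group and start - group_end > max_gap:
                close_group(group)
                group = []
            group.append(gene_id)
            # Track the furthest end seen so overlapping genes are handled.
            group_end = end if len(group) == 1 else max(group_end, end)
        close_group(group)
    return clusters, singletons


# Made-up coordinates for illustration.
example_genes = [
    ("g1", "chr1", 100_000, 104_000),
    ("g2", "chr1", 200_000, 205_000),
    ("g3", "chr1", 900_000, 903_000),
    ("g4", "chr2", 50_000, 55_000),
    ("g5", "chr2", 290_000, 296_000),
]
clusters, singletons = find_clusters(example_genes)
print(clusters, singletons)
```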

Step 5: Phylogenetic and Evolutionary Analysis

Extract the amino acid sequences of the conserved NBS domain from all genes.
Perform a multiple sequence alignment using tools like ClustalW or MAFFT.
Construct a phylogenetic tree using maximum likelihood methods (e.g., with IQ-TREE) after selecting the best-fit model with ModelFinder [1] [6].
Reconcile the gene tree with the species tree to infer ancestral NBS-LRR lineages and deduce gene duplication and loss events [7] [6].
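The first part of Step 5, extracting the NBS domain from each protein before alignment, can be sketched as follows. The domain coordinates are assumed to be 1-based and inclusive, for example as reported in a per-domain hit table. The sequences, coordinates, and function names here are invented for illustration.

```python
def extract_domain(sequence, start, end):
    """Return the subsequence for 1-based, inclusive domain coordinates."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError(f"coordinates {start}-{end} fall outside the sequence")
    return sequence[start - 1:end]


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text with wrapped lines."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy protein sequences and hypothetical NBS domain coordinates.
proteins = {"geneA.1": "MKTAYIAKQRGVGKTTLAQ", "geneB.1": "MSSGGIGKTTLLKEVF"}
nbs_coords = {"geneA.1": (11, 19), "geneB.1": (4, 12)}

domain_records = [
    (f"{gene}_NBS", extract_domain(proteins[gene], *nbs_coords[gene]))
    for gene in sorted(nbs_coords)
]
fasta_text = to_fasta(domain_records)
print(fasta_text)
```

The resulting FASTA text is the kind of input a multiple sequence aligner would take in the next step.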

Figure 2: Experimental Workflow for Genome-Wide Identification of NBS-LRR Genes. The flowchart outlines the bioinformatic pipeline from data acquisition to final profile generation, highlighting key steps of candidate identification, domain verification, and evolutionary analysis.

The Scientist's Toolkit: Key Research Reagents and Solutions

Table 3: Essential Resources for NBS-LRR Gene Family Research

Resource/Solution	Function/Application	Example Tools/Databases
Genomic Databases	Source for genome sequences and annotations.	Phytozome, Ensembl Plants, NCBI Genome, GDR (Genome Database for Rosaceae) [7] [2]
HMMER Suite	Profile HMM-based sequence search for identifying NBS domains.	`hmmsearch`, `hmmscan` (Pfam model PF00931) [2] [6]
Domain Analysis Tools	Verification of protein domains and classification.	NCBI CDD, Pfam, Paircoil2 (for CC domains), MEME (for motif discovery) [1] [2] [6]
Orthology Analysis Software	Inferring gene families and evolutionary relationships.	OrthoFinder, DendroBLAST [3]
Phylogenetic Software	Reconstructing evolutionary history and ancestral states.	IQ-TREE (ModelFinder, UFBoot), MEGA, FastTree [1] [6] [3]
Expression Databases	Profiling gene expression under various conditions.	IPF Database, CottonFGD, NCBI BioProject [3]
Functional Validation Tools	Testing gene function in planta.	Virus-Induced Gene Silencing (VIGS) [3]

The journey of NBS-LRR genes from compact repertoires in bryophytes to expansive, diverse families in angiosperms underscores their pivotal role in the evolutionary arms race between plants and their pathogens. This expansion, driven by varied duplication mechanisms and refined by natural selection, has equipped angiosperms with a sophisticated and adaptable immune system. The distinct evolutionary patterns observed across plant lineages, including contraction in aquatic, parasitic, and carnivorous species, reveal a complex interplay between genomic content, ecological adaptation, and life history. The continued application of standardized genomic and bioinformatic protocols, as outlined in this whitepaper, will be crucial for further elucidating the function of specific NBS-LRR genes. Ultimately, this knowledge serves as a cornerstone for future efforts in crop improvement and sustainable agriculture, enabling the development of disease-resistant plant varieties through informed breeding and biotechnological approaches.

The evolutionary history of Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) receptors reveals fundamental insights into plant immunity mechanisms that have diverged along monocot and dicot lineages. These intracellular immune receptors form a critical component of the plant innate immune system, enabling recognition of diverse pathogens through effector-triggered immunity [2]. The NBS-LRR gene family has undergone substantial lineage-specific evolution, culminating in the striking absence of entire receptor subclasses in certain plant families [9] [10]. This whitepaper examines the molecular basis and evolutionary implications of these divergent adaptations, focusing specifically on the loss of TNL genes in monocots and the subsequent functional diversification in both monocot and dicot lineages. Understanding these evolutionary trajectories provides crucial insights for plant immunity research and crop enhancement strategies.

Evolutionary Background and Phylogenetic Context

Monocot-Dicot Divergence

The evolutionary split between monocots and dicots represents a fundamental divergence in angiosperm history, estimated to have occurred approximately 200 million years ago (with an uncertainty of about 40 million years) based on chloroplast DNA sequence analysis [11]. This temporal framework establishes the timeline for subsequent lineage-specific adaptations in immune gene families. Genomic analyses consistently place Acorales as the sister lineage to all other extant monocots, making it a critical taxon for understanding early monocot evolution and genomic architecture [12].

NLR Gene Classification and Evolution

NLR genes encode a pivotal class of plant immune receptors that have undergone dynamic evolution through gene duplication, loss, and diversification. A novel classification system for angiosperm NLR genes, grounded in network analysis of microsynteny information, categorizes these genes into five distinct classes: CNL_A, CNL_B, CNL_C, TNL, and RNL [9]. This refined classification reveals the complex evolutionary history of NLR genes beyond the traditional grouping, enabling more precise tracking of lineage-specific adaptations.

Table 1: Classification of Plant NLR Genes

Class	N-Terminal Domain	Distribution	Characteristics
TNL	TIR (Toll/Interleukin-1 Receptor)	Dicots only	Lost in monocots; specific signaling pathway
CNL_A	Coiled-Coil (CC)	Angiosperms	Further subdivided in new classification
CNL_B	Coiled-Coil (CC)	Angiosperms	Monocot-specific expansions
CNL_C	Coiled-Coil (CC)	Angiosperms	Distinct evolutionary trajectory
RNL	RPW8-type CC (CC_R)	Angiosperms	Helper NLRs

The NBS domain itself can be divided into two major groups based on phylogenetic analysis. Group I NBS domains contain group-specific motifs that are always linked with TIR sequences at the N-terminus, while Group II NBS domains are always associated with putative coiled-coil domains in their N-terminus [10]. This fundamental division reflects deep evolutionary divergence in plant immune signaling pathways.

Diagram 1: Evolutionary trajectory of NLR genes following monocot-dicot divergence

The TNL Extinction in Monocots

Evidence for TNL Absence in Monocots

Comprehensive genomic analyses have revealed that TNL family genes are conspicuously absent in monocot genomes [9] [10]. This pattern was initially identified through phylogenetic reconstruction of NBS domains, which demonstrated that Group I NBS domains (always associated with TIR sequences) are widely distributed in dicot species but undetectable in cereal genomes [10]. Experimental evidence further confirmed that Group I-specific NBS sequences could be readily amplified from dicot genomic DNA but not from cereal genomic DNA [10].

Recent synteny-informed classification provides a model explaining this extinction event, with compelling microsynteny evidence indicating a clear correspondence between non-TNLs in monocots and the extinct TNL subclass [9]. This suggests that specific genomic regions in monocots have undergone fundamental reorganization following TNL loss.

Evolutionary Implications of TNL Loss

The loss of TNL genes in monocots represents a significant evolutionary event that has shaped subsequent immune receptor evolution. This extinction has potentially driven:

Compensatory evolution in remaining CNL classes
Alternative signaling pathway development
Distinct co-evolutionary dynamics with pathogens
Lineage-specific adaptations in immune recognition

The absence of TNLs in monocots implies that their cognate signaling pathways have diverged from those in dicots, suggesting fundamental differences in how these major plant lineages perceive and respond to pathogens [10].

Molecular Mechanisms and Evolutionary Drivers

Genomic Architecture and Evolution

NLR genes typically exist in large multigene families and are often organized in genomic clusters, which facilitates their rapid evolution through recombination and gene conversion [13] [2]. Studies across multiple plant species have revealed that these clusters vary in size and complexity, with most containing closely related genes derived from recent common ancestors [2].

Two distinct patterns of evolution have been identified among NBS-LRR genes: Type I genes are often represented by multiple paralogs in a genome and evolve rapidly with frequent gene conversions, while Type II genes typically have fewer paralogs, evolve slowly, and experience rare gene conversion events [13]. This differential evolutionary rate contributes to the diverse repertoire of pathogen recognition capabilities across plant lineages.

Regulatory Mechanisms and miRNA Co-evolution

Plants have evolved sophisticated regulatory mechanisms to control NBS-LRR gene expression, as high expression levels can be lethal to plant cells [13]. Diverse microRNAs (miRNAs) target NBS-LRRs in both eudicots and gymnosperms, creating a tight association between NBS-LRR diversity and miRNA regulation [13].

Table 2: miRNA Families Targeting NBS-LRR Genes

miRNA Family	Target Site	Distribution	Evolutionary Origin
miR482/2118	P-loop region	Gymnosperms to dicots	Prior to angiosperms
miR472	Multiple sites	Specific lineages	Younger, lineage-specific
miR6019	TIR-NBS-LRR	Dicots	Recent evolution
miR6020	TNL genes	Dicots	Recent evolution

The miRNAs typically target highly duplicated NBS-LRRs, while heterogeneous NBS-LRR families are rarely targeted by miRNAs in Poaceae and Brassicaceae genomes [13]. This suggests lineage-specific co-evolution between miRNAs and their NBS-LRR targets. New miRNAs periodically emerge from duplicated NBS-LRRs from different gene families, with most targeting the same conserved, encoded protein motif of NBS-LRRs, consistent with a model of convergent evolution [13].

Research Methodologies for NLR Gene Analysis

Genomic Identification and Annotation

Comprehensive identification of NBS-LRR genes in plant genomes involves multiple bioinformatic approaches:

Hidden Markov Model (HMM) searches using Pfam NBS (NB-ARC) domain (PF00931)
Cassava-specific HMM development for improved detection
Manual curation and functional annotation against reference databases
Conserved domain identification using hmmpfam comparisons to Pfam databases
Coiled-coil domain prediction using Paircoil2 with P-score cut-off of 0.03
Partial gene identification through BLAST searches against known NBS-LRR databases [2]

These methods have been successfully applied to catalog NBS-LRR genes in numerous plant species, including cassava, where 228 NBS-LRR type genes and 99 partial NBS genes were identified, representing almost 1% of the total predicted genes [2].

Phylogenetic and Microsynteny Analysis

Reconstructing evolutionary relationships among NLR genes requires:

Multiple sequence alignment of NB-ARC domains using ClustalW
Phylogenetic tree estimation using Maximum Likelihood methods
Bootstrap analysis with 1000 replicates for node support
Microsynteny network analysis for classification
Orthology determination across species boundaries

These analyses have revealed that 63% of R genes in cassava occur in 39 clusters on chromosomes, with most clusters being homogeneous and containing NBS-LRRs derived from a recent common ancestor [2].

Diagram 2: Workflow for NLR gene identification and evolutionary analysis

Table 3: Essential Research Materials for NLR Gene Studies

Reagent/Resource	Function/Application	Example Specifications
HMMER Suite	Hidden Markov Model searches for domain identification	v3 with cassava-specific NBS HMM (E-value < 0.01)
Pfam Databases	Conserved domain identification	NBS (NB-ARC) PF00931, TIR PF01582, LRR models
ClustalW	Multiple sequence alignment	Default parameters for NB-ARC domain alignment
MEGA Software	Phylogenetic tree estimation	Maximum Likelihood, Whelan and Goldman + freq. model
Paircoil2	Coiled-coil domain prediction	P-score cut-off of 0.03
Jalview	Alignment curation and visualization	Manual curation of poorly aligned regions
Phytozome	Genome annotations and resources	Cassava v4.1/v5.0 genome data

The lineage-specific adaptations in plant NLR genes, particularly the loss of TNLs in monocots and their retention and diversification in dicots, exemplify the dynamic nature of plant immune system evolution. These divergent evolutionary trajectories have resulted in fundamentally different immune receptor repertoires between these major angiosperm lineages, with significant implications for plant-pathogen co-evolution. The emerging understanding of these patterns, facilitated by advanced genomic analyses and synteny-informed classification, provides crucial insights for future crop improvement strategies and enhances our fundamental knowledge of plant immunity evolution. Further comparative analyses across diverse plant lineages will continue to reveal the intricate interplay between genomic architecture, regulatory mechanisms, and pathogen pressure that has shaped the evolution of these critical immune receptors.

Structural Classification and Domain Architecture Diversity in Plant NLRs

Plant immunity relies on a sophisticated innate immune system that actively protects against pathogen invasion [14]. A crucial component of this system involves intracellular immune receptors known as Nucleotide-binding and Leucine-rich Repeat (NLR) proteins, which mediate effector-triggered immunity (ETI) upon pathogen recognition [14] [15]. NLRs function as molecular switches that perceive pathogen effector proteins and initiate robust immune responses, typically accompanied by programmed cell death termed the hypersensitive response [14]. The structural classification and domain architecture diversity of these NLR proteins have evolved through constant evolutionary arms races with rapidly adapting pathogens, resulting in tremendous genetic innovation that makes NLR-encoding genes among the most diverse in plant genomes [14]. This technical guide examines the structural principles governing NLR function, their evolutionary trajectories across plant species, and the experimental frameworks for their investigation, providing a comprehensive resource for researchers studying plant immunity.

NLR Domain Architecture and Structural Classification

Core Domain Organization

NLR proteins exhibit a conserved tripartite modular domain architecture that classifies them as STAND (Signal Transduction ATPases with Numerous Domains) proteins [14]. This canonical architecture consists of:

N-terminal domain: Functions primarily as a signaling domain that mediates downstream programmed cell death responses following immune receptor activation [14]. This domain displays the greatest variability and defines major NLR classes.
Central nucleotide-binding and oligomerization domain (NOD): In plant NLRs, this is exclusively an NB-ARC (Nucleotide-binding adaptor shared by APAF-1, certain R gene products, and CED-4) domain that operates as a molecular switch through ADP/ATP exchange [14].
C-terminal superstructure-forming repeats (SSFRs): Typically composed of leucine-rich repeat (LRR) domains that often determine pathogen perception and mediate critical autoinhibitory intramolecular interactions [14].

Table 1: Core Domains of Plant NLR Proteins

Domain	Structural Type	Primary Function	Key Features
N-terminal	Signaling domain	Mediates cell death response	Determines NLR classification; most variable region
NB-ARC	Nucleotide-binding domain	Molecular switch via ADP/ATP exchange	Conserved in plants; controls activation state
LRR	Superstructure-forming repeats	Pathogen recognition & autoinhibition	Hypervariable; under positive selection

N-terminal Domain Diversity and NLR Classification

The N-terminal signaling domains form the basis for classifying NLRs into distinct categories, with these classifications following the phylogeny of the NB-ARC domain, indicating a deep evolutionary origin [14]. Four main N-terminal domain types have been characterized in angiosperms:

Coiled-coil (CC)-type: Characterized by coiled-coil structural motifs that facilitate protein-protein interactions [14].
RESISTANCE TO POWDERY MILDEW 8 (RPW8)-type (CC_R): Contains RPW8-derived domains with specific signaling properties [14].
G10-type CC (CC_G10): A distinct subclass of CC domains with specialized functions [14].
Toll/interleukin-1 receptor-type (TIR): Contains TIR domains that often possess NADase activity [14].

In non-flowering plants, NLRs can carry additional N-terminal domain types, including α/β hydrolases and kinase domains, revealing even greater architectural diversity beyond flowering plants [14]. The recently generated RefPlantNLR collection of almost 500 experimentally validated NLRs illustrates the extensive structural diversity within this protein family [14].

Non-canonical Architectures and Integrated Domains

Beyond the canonical tripartite structure, many NLRs have diversified into specialized proteins with additional non-canonical domains or degenerated features [14]. These include:

Truncated NLR variants: Proteins lacking specific domains (e.g., NL, CN, RN, TN, or N) that retain functional classification despite their truncated architecture [16].
Integrated domains: Additional domains incorporated into the NLR structure that can act as sensor domains by binding pathogen effectors [17]. A prominent example is the heavy metal-associated (HMA) domain integrated into the architecture of the rice NLR Pik-1, which directly binds AVR-Pik effectors from the blast fungus [17].
Degenerated features: Various lineage-specific modifications that create functional specialization, including domain losses, fusions, and swaps [14].
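The distinction between canonical, truncated, and integrated-domain architectures can be made concrete with a small helper that inspects an ordered list of domain names. This is a sketch only: the domain labels, the set treated as canonical, and the function name are assumptions for illustration, and the Pik-1-like domain order in the example is illustrative rather than taken from the source.

```python
CANONICAL_DOMAINS = {"TIR", "CC", "RPW8", "NB-ARC", "LRR"}
N_TERMINAL_DOMAINS = {"TIR", "CC", "RPW8"}


def describe_architecture(domains):
    """Summarize an NLR from its domains listed in N- to C-terminal order."""
    if "NB-ARC" not in domains:
        raise ValueError("not an NLR: no NB-ARC domain")
    return {
        "architecture": "-".join(domains),
        # Anything outside the canonical set is flagged as a putative integrated domain.
        "integrated": [d for d in domains if d not in CANONICAL_DOMAINS],
        "lacks_n_terminal": not (set(domains) & N_TERMINAL_DOMAINS),
        "lacks_lrr": "LRR" not in domains,
    }


# A Pik-1-like receptor carrying an integrated HMA domain.
pik1_like = describe_architecture(["CC", "HMA", "NB-ARC", "LRR"])
# A truncated TN-type variant with no LRR region.
tn_variant = describe_architecture(["TIR", "NB-ARC"])

print(pik1_like)
print(tn_variant)
```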

Figure 1: Domain Architecture Diversity in Plant NLR Proteins. NLRs display both canonical tripartite structures and various non-canonical variants with truncated forms, integrated domains, or degenerated features.

Genomic Distribution and Evolutionary Patterns

NLR Expansion and Contraction Across Plant Lineages

NLR genes represent one of the most dynamic and rapidly evolving gene families in plants, showing remarkable variation in copy number across species [14] [3]. Comparative genomic analyses reveal:

Extensive copy number variation: NLR counts range from approximately 50 in watermelon (Citrullus lanatus) and papaya (Carica papaya) to over 1,000 in apple (Malus domestica) and hexaploid wheat (Triticum aestivum) [14].
Lineage-specific expansions and contractions: These typically occur through tandem duplication and/or deletion events influenced by transposon content, ecological context, and adaptation to local pathogen pressures [14].
Domestication-associated reductions: Domesticated species often show significant contractions in NLR repertoire compared to wild relatives. For example, garden asparagus (Asparagus officinalis) contains only 27 NLR genes compared to 63 and 47 in its wild relatives A. setaceus and A. kiusianus, respectively [16].

Large-scale comparative studies have identified 12,820 NBS-domain-containing genes across 34 species spanning from mosses to monocots and dicots, classified into 168 distinct domain architecture classes [3]. This comprehensive analysis reveals both classical patterns (NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR, etc.) and species-specific structural configurations (TIR-NBS-TIR-Cupin1-Cupin1, TIR-NBS-Prenyltransf, Sugar_tr-NBS, etc.) [3].
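Domain architecture classes of this kind can be tallied by ordering each gene's domain hits from N- to C-terminus and counting the resulting strings. The sketch below is one possible approach, not the method of the cited study. The input layout, the choice to collapse only consecutive LRR hits, and the example genes are all assumptions.

```python
from collections import Counter


def architecture_string(domain_hits, collapse=("LRR",)):
    """Join domain names in positional order, e.g. 'TIR-NBS-LRR'.

    domain_hits: list of (start_position, domain_name) tuples.
    Consecutive repeats are merged only for names listed in `collapse`.
    """
    ordered = [name for _, name in sorted(domain_hits)]
    result = []
    for name in ordered:
        if result and name == result[-1] and name in collapse:
            continue
        result.append(name)
    return "-".join(result)


def count_architecture_classes(genes):
    """Map each architecture string to the number of genes that carry it."""
    return Counter(architecture_string(hits) for hits in genes.values())


# Invented domain hits for four genes.
example = {
    "g1": [(20, "TIR"), (200, "NBS"), (600, "LRR"), (650, "LRR")],
    "g2": [(180, "NBS"), (10, "TIR"), (590, "LRR")],
    "g3": [(15, "NBS")],
    "g4": [(5, "TIR"), (150, "NBS"), (500, "TIR"), (700, "Cupin1"), (800, "Cupin1")],
}
counts = count_architecture_classes(example)
print(counts)
```

The number of distinct keys in the counter corresponds to the number of architecture classes for the input set.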

Evolutionary Mechanisms Driving NLR Diversity

The dramatic expansion and diversification of NLR genes primarily result from three molecular mechanisms:

Tandem duplication: The predominant driver of NLR family expansion, creating local clusters that facilitate rapid generation of new resistance alleles [15]. In pepper (Capsicum annuum), tandem duplication accounts for 18.4% of NLR genes (53/288), with particularly high density on chromosomes 08 and 09 [15].
Segmental duplication: Larger-scale genomic duplications that distribute related NLRs across different chromosomal locations [15].
Retrotransposition: Although less common, this mechanism contributes to NLR dispersal throughout plant genomes [15].

These duplication events are followed by intense positive selection, particularly in the LRR domains, enabling continuous adaptation to evolving pathogen effectors [15]. The "arms race" with pathogens subjects NLR genes to strong diversifying selection, resulting in rapid coevolution and neo-functionalization [15].

Table 2: Evolutionary Mechanisms Driving NLR Diversity in Plants

Mechanism	Prevalence	Impact on NLR Repertoire	Examples
Tandem Duplication	Primary driver	Creates local clusters; rapid new allele generation	53/288 NLRs in pepper [15]
Segmental Duplication	Significant contributor	Distributes NLRs across chromosomes	Widespread in eudicots [3]
Retrotransposition	Less common	Disperses NLRs throughout genome	Limited documentation
Positive Selection	Widespread in LRR domains	Enhances effector recognition	Hypervariable LRR regions [15]

NLR Pairs and Networks: Higher-Order Architecture

Functional Specialization in Paired NLR Systems

Beyond functioning as singleton receptors, NLRs increasingly operate in genetically linked pairs or complex networks with functionally specialized components [14] [18]. In these higher-order configurations:

Sensor NLRs: Specialize in pathogen perception through direct or indirect effector recognition [14] [17].
Helper NLRs: Amplify and propagate immune signaling following sensor activation [14] [17].

The rice NLR pair Pik-1 and Pik-2 exemplifies this functional specialization, where Pik-1 acts as a sensor that binds AVR-Pik effectors through an integrated HMA domain, while Pik-2 functions as a helper NLR required for immune signaling activation [17]. This cooperative system demonstrates exquisite specificity, where matching pairs of allelic Pik NLRs mount effective immune responses, while mismatched pairs lead to autoimmune phenotypes [17].

Genomic Organization of NLR Pairs

Paired NLRs display diverse genomic architectures with varying functional constraints:

Head-to-head orientation: Commonly observed in linked NLR pairs, though not always essential for function. The wheat stripe rust resistance locus Yr84 contains an NLR pair arranged head-to-head, yet random insertion of the two genes into a susceptible variety still conferred resistance, indicating orientation flexibility [18].
Flexible spacing: The rice Pik-1 and Pik-2 pair are separated by approximately 2.5 kb, representing a relatively compact arrangement [17].
Cross-species transferability: Functional NLR pairs can be transferred between species, demonstrating evolutionary conservation. The Rpi-blb2 and Rpi-blb1 NLR pair from Solanum bulbocastanum confers late blight resistance when transferred to potato, illustrating the potential for engineering resistance across taxonomic boundaries [18].

Figure 2: Evolution of NLR Systems from Singletons to Pairs and Networks. NLRs can function as individual receptors, specialized pairs, or complex networks with many-to-one and one-to-many sensor-helper connections.

Comparative Genomic Analyses Across Plant Families

Family-Specific NLR Repertoires

Comparative analyses reveal significant variation in NLR abundance and architecture across plant families, independent of genome size [19] [3]:

Fabaceae crops: Display substantial variation in NLR numbers, with the NB-ARC domain exhibiting preferential co-occurrence with a specific LRR domain (IPR001611) [19]. Classification into seven distinct classes (N, L, CN, TN, NL, CNL, and TNL) shows species-specific clustering within CN, TN, and CNL classes, reflecting diversification within Fabaceae [19].
Citrus species: Analysis of 10 citrus genomes identified 1,585 NLR genes classified into TIR-NBS-LRR (TNL), CC-NBS-LRR (CNL), RPW8-NBS-LRR (RNL), and NL categories [20]. Phylogenetic evidence suggests TNL, RNL, and CNL genes originated from NL genes through acquisition of TIR, RPW8 domains, and CC motifs, followed by random loss of corresponding domains [20].
Asparagus species: NLR genes in A. officinalis, A. kiusianus, and A. setaceus display chromosomal clustering patterns and can be categorized into three distinct subfamilies based on N-terminal domain classification [16].

Evolutionary Origins and Adaptive Radiation

Phylogenetic evidence indicates that NLR genes originated alongside their host species and underwent adaptive evolution that facilitated global colonization [20]. Several key evolutionary patterns emerge:

Ancient origin: NLR genes predate the evolutionary split between charophytes and land plants, with phylogenetic analyses suggesting their emergence significantly contributed to the transition of plants from aquatic to terrestrial habitats [20].
Horizontal gene transfer: In some species like Atlantia buxifolia, horizontal gene transfer (HGT) represents the principal mechanism increasing NLR copy number [20].
Diversifying selection: Positive selection analyses reveal consistent pressure on NLR genes, particularly on residues involved in effector recognition and in specific subfunctionalized domains [17] [20].

Table 3: NLR Repertoire Size Variation Across Plant Species

Plant Species	Family	NLR Count	Notable Features
Arabidopsis thaliana	Brassicaceae	~150	Model for NLR studies [15]
Oryza sativa (rice)	Poaceae	~500	Well-characterized pairs [15] [17]
Capsicum annuum (pepper)	Solanaceae	288	High density on Chr08 and Chr09 [15]
Triticum aestivum (wheat)	Poaceae	>1,000	High NLR content [14]
Asparagus officinalis	Asparagaceae	27	Domesticated reduction [16]
Asparagus setaceus	Asparagaceae	63	Wild relative with expanded repertoire [16]
Citrus species (average)	Rutaceae	~160	Diverse architectures [20]

Experimental Approaches and Methodological Frameworks

Genome-Wide Identification and Annotation

Comprehensive identification of NLR genes requires integrated computational approaches:

Hidden Markov Model (HMM) searches: Using conserved NB-ARC domain (PF00931) profiles to identify candidate sequences across whole proteomes [15] [16] [20]. Typical e-value cutoffs of 1×10⁻⁵ to 1×10⁻¹⁰ ensure specificity while maintaining sensitivity [15] [3].
Homology-based searches: BLASTp algorithms with reference NLR proteins from model species (e.g., Arabidopsis thaliana) as queries, applying e-value cutoffs of 10⁻³ [15] [20].
Domain architecture validation: Tools like InterProScan, NCBI's Batch CD-Search, and NLR-Annotator software confirm domain composition and classify NLRs into specific architectural categories [15] [16] [20].

These methods typically identify candidate sequences containing NB-ARC domains, which are then validated for presence/completeness of N-terminal (TIR, CC, RPW8) and C-terminal (LRR) domains [15].
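
As a minimal sketch of the e-value filtering step, the Python below assumes the candidates come from an `hmmsearch --tblout` run against the PF00931 profile, where the fifth whitespace-separated column holds the full-sequence E-value. The function name and the default threshold (the looser end of the range quoted above) are illustrative choices, not settings taken from the cited studies.

```python
def filter_tblout(lines, max_evalue=1e-5):
    """Keep targets from HMMER --tblout lines whose full-sequence E-value passes.

    lines: iterable of text lines from a tblout file (comment lines start with '#').
    Returns a dict mapping target name to its best (smallest) passing E-value.
    """
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target = fields[0]          # column 1: target sequence name
        evalue = float(fields[4])   # column 5: full-sequence E-value
        if evalue <= max_evalue:
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits
```

Passing the open file handle of a tblout file works directly, since the function only iterates over lines. Candidates that pass would still need the domain-architecture validation described above.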

Evolutionary and Expression Analyses

Multiple computational frameworks support evolutionary and functional characterization:

Phylogenetic reconstruction: Multiple sequence alignment using MUSCLE v5, Clustal Omega, or MAFFT, followed by maximum likelihood tree construction with IQ-TREE or MEGA software [15] [16].
Gene duplication analysis: Synteny analysis using MCScanX and visualization with TBtools identifies tandem and segmental duplication events [15].
Positive selection analysis: Likelihood ratio tests comparing the M7 (beta) and M8 (beta and ω) site models in the PAML software package identify positively selected sites [20].
Expression profiling: RNA-seq analysis following pathogen infection identifies differentially expressed NLR genes, with typical thresholds of |log2 Fold Change| ≥ 1 and FDR < 0.05 [15].
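
The M7 versus M8 comparison listed above reduces to a simple likelihood ratio test. The sketch below assumes the conventional setup in which twice the log-likelihood difference is compared with a chi-square distribution with 2 degrees of freedom; for 2 degrees of freedom the tail probability has the closed form exp(-x/2), so no statistics library is needed. The function name and the significance level are hypothetical, and the log-likelihoods would come from the user's own PAML runs.

```python
import math

def lrt_m7_m8(lnl_m7, lnl_m8, alpha=0.05):
    """Likelihood ratio test of M8 (beta and omega) against the nested M7 (beta).

    lnl_m7, lnl_m8: maximized log-likelihoods of the two site models.
    Returns (test statistic, p-value, whether M8 fits significantly better).
    Assumes a chi-square null distribution with 2 degrees of freedom.
    """
    statistic = max(0.0, 2.0 * (lnl_m8 - lnl_m7))
    p_value = math.exp(-statistic / 2.0)  # chi-square survival function, df = 2
    return statistic, p_value, p_value < alpha
```

A significant result supports the presence of a site class with ω > 1; identifying the individual sites requires the posterior probabilities reported by the M8 run itself.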

Functional Validation Approaches

Several experimental methods confirm NLR function:

Virus-Induced Gene Silencing (VIGS): Knockdown of candidate NLRs such as GaNBS (OG2) in resistant cotton demonstrates their role in virus resistance [3].
Protein-protein interaction assays: STRING and PPI network predictions identify key interaction hubs among differentially expressed NLRs [15].
Heterologous expression: Transfer of NLR pairs into susceptible varieties confirms functionality, as demonstrated with wheat Yr84 locus genes [18].

Table 4: Essential Research Reagents and Resources for NLR Studies

Resource Category	Specific Tools	Function/Application	Key Features
Bioinformatics Tools	HMMER v3.3.2 [15]	Domain-based NLR identification	Hidden Markov Model searches
	NLR-Annotator v2.1 [20]	Automated NLR annotation	Standardized classification
	OrthoFinder v2.2.7 [16]	Orthogroup analysis	Evolutionary relationships
	MCScanX [15]	Synteny and duplication analysis	Identifies gene pairs and clusters
Databases	RefPlantNLR [14]	Curated NLR collection	~500 experimentally validated NLRs
	Pfam (PF00931) [15]	NB-ARC domain reference	Core domain identification
	PlantCARE [15] [16]	cis-element prediction	Promoter analysis
Experimental Resources	VIGS vectors [3]	Functional validation	Gene silencing in plants
	STRING database [15]	Protein interaction prediction	PPI network mapping
	PhytoPath [15]	Pathogen effector data	Effector-NLR interaction studies

The structural classification and domain architecture diversity of plant NLRs reflect continuous evolutionary innovation driven by plant-pathogen arms races. From canonical tripartite structures to specialized pairs and complex networks, NLR proteins exhibit remarkable architectural flexibility that enables specific pathogen recognition and robust immune activation. The evolutionary mechanisms of tandem duplication, positive selection, and domain integration have generated extensive NLR diversity across plant lineages, while maintaining core signaling functions through conserved NB-ARC domains. Methodological advances in genomic identification, phylogenetic analysis, and functional validation continue to reveal new dimensions of NLR structural diversity, providing insights for engineering disease resistance in crop species. Future research elucidating the structure-function relationships of non-canonical NLR architectures and their higher-order assemblies will further advance our understanding of plant immunity and its evolution.

The Role of Tandem Duplications and Gene Clusters in Resistance Gene Evolution

In the ongoing evolutionary arms race between plants and their pathogens, resistance (R) genes represent a critical line of defense. Among these, genes containing Nucleotide-Binding Site (NBS) and Leucine-Rich Repeat (LRR) domains form the largest and most extensively studied family, playing a pivotal role in pathogen recognition and activation of immune responses [21] [22]. The evolution of these NBS-LRR genes is characterized by extraordinary diversification, driven primarily by tandem gene duplications and the formation of genetically linked gene clusters [21] [22]. This dynamic genomic architecture enables plants to rapidly generate novel recognition specificities, allowing them to keep pace with evolving pathogenic threats. The NBS domain, highly conserved and responsible for ATP/GTP binding and hydrolysis, serves as a molecular switch for immune signaling, while the hypervariable LRR domain determines pathogen recognition specificity [22]. This review synthesizes current understanding of how tandem duplications and gene cluster organization have shaped the evolutionary trajectory of NBS domain genes in land plants, providing a framework for future research and biotechnological applications.

Quantitative Landscape of Resistance Gene Clusters

Genomic studies across plant species have revealed consistent patterns in the distribution and organization of NBS-LRR genes. These genes are frequently organized into complex clusters within plant genomes, with significant variation in cluster size, composition, and chromosomal distribution.

Table 1: Genomic Distribution of NBS-LRR Genes and Clusters in Selected Plant Species

Plant Species	Total NBS-LRR Genes Identified	Genes in Clusters (%)	Number of Clusters	Largest Cluster Size (Genes)	Chromosomal Distribution
Pepper (Capsicum annuum)	252	54% (136 genes)	47	8 genes (Chromosome 3)	All chromosomes, highest on Chr3 [22]
Barley (Hordeum vulgare)	1,199 LDPRs* identified	Significant association	Data Not Specified	Data Not Specified	Primarily subtelomeric regions [23]
Arabidopsis (Arabidopsis thaliana)	149 NBS-LRR genes	Data Not Specified	Data Not Specified	Data Not Specified	Genome-wide distribution [21]

*LDPRs: Long-Duplication-Prone Regions

The pepper genome illustrates a typical organizational pattern, with 252 identified NBS-LRR genes unevenly distributed across all chromosomes [22]. Chromosome 3 emerges as a particular hotspot, containing the highest number of genes (38) and the largest cluster comprising eight genes [22]. Notably, 54% of all NBS-LRR genes in pepper reside within 47 physical clusters, underscoring the prevalence of this genomic arrangement [22]. Similarly, recent analysis of the exceptionally repetitive barley genome has identified 1,199 Long-Duplication-Prone Regions (LDPRs) that show statistically significant associations with pathogen defense genes, indicating that natural selection has favored lineages where arms-race genes fall within these duplication-prone genomic regions [23].

Table 2: Classification and Structural Diversity of NBS-LRR Genes in Pepper

Gene Classification	Number of Genes	Percentage of Total	Key Structural Features
nTNL (non-TIR-NBS-LRR)	248	98.4%	Dominant class in pepper; includes CC-NBS-LRR
TNL (TIR-NBS-LRR)	4	1.6%	Minor class in pepper
Genes with CC domain	48	19.0%	Facilitate protein-protein interactions
Genes lacking both CC and TIR domains	200	79.4%	Highlight structural diversity
Gene Subclasses (Domain Structure)	7 subclasses identified	-	N, NL, NLL, NN, NLN, NLNLN, TN

The quantitative distribution reveals striking lineage-specific adaptations, with nTNL genes dramatically dominating over TNL genes in pepper (248 versus 4) [22]. This pattern reflects broader evolutionary trends observed across angiosperms, which show significant losses of TNL genes in monocots compared to dicots [22]. The structural classification further reveals six distinct nTNL subclasses based on domain architecture, with the NLNLN subclass represented by only a single gene, illustrating the diverse evolutionary trajectories possible within this gene family [22].

Molecular Mechanisms and Evolutionary Drivers

Tandem Duplications as Engines of Diversity

Tandem gene duplications occur frequently through mechanisms such as non-allelic homologous recombination (NAHR) and replication slippage, creating arrays of closely related genes [23]. These duplication events provide the raw genetic material for evolutionary innovation through several pathways:

Neofunctionalization: Redundant gene copies accumulate mutations that may lead to novel pathogen recognition specificities without compromising existing immune functions [23] [24].
Subfunctionalization: Different copies partition ancestral functions, allowing specialization of different aspects of immune response [24].
Dosage Effects: Increased copy number can enhance expression of defense response components, providing quantitative resistance benefits [23].

In barley, duplication-prone regions show a history of repeated long-distance dispersal to distant genomic sites, followed by local expansion by tandem duplication [23]. Often, the long tandemly duplicated motif differs between sites, suggesting these arise frequently throughout evolutionary history [23]. This dynamic creates a genomic environment where genes involved in arms races can form effectively cooperative associations with duplication-inducing sequences, representing an evolutionarily advantageous strategy at the lineage level [23].

Birth-and-Death Evolution and Selection Signatures

The NBS-LRR gene family evolves through a birth-and-death process characterized by continuous cycles of gene duplication, functional diversification, and pseudogenization [21]. Strong positive selection acts primarily on the LRR domains, particularly on solvent-exposed residues involved in direct protein-protein interactions, reflecting continuous adaptation to recognize evolving pathogen effectors [21] [22]. This diversifying selection maximizes the repertoire of recognition specificities available to counter diverse pathogenic threats.

Genomic Architecture and Cluster Organization

Gene clusters often include members from the same gene subfamily, but some clusters contain genes from different subfamilies, reflecting complex evolutionary histories [22]. In pepper, some clusters contain genes belonging to different subfamilies (CN, NL, and N) within the same cluster, indicating that non-homologous genes can become organized into functional units through genomic rearrangement [22]. This organizational pattern facilitates coordinated regulation and co-inheritance of functionally related genes, potentially enabling more rapid adaptive responses to pathogen pressure.

Experimental Approaches and Methodologies

Identification and Characterization of Resistance Gene Clusters

Experimental Workflow for NBS-LRR Gene Identification

The standard pipeline for comprehensive identification and characterization of NBS-LRR resistance genes involves multiple complementary approaches:

Sequence-Based Identification: Initial identification typically employs BLAST searches using known NBS domain sequences and Hidden Markov Model (HMM) searches against Pfam databases to identify conserved domains [22]. These searches target characteristic motifs including P-loop, RNBS-A, kinase-2, RNBS-B, RNBS-C, and GLPL motifs essential for ATP/GTP binding and resistance signaling [22].
Domain Architecture Analysis: Following identification, genes are classified based on their N-terminal domains using tools like COILS for coiled-coil domains and Pfam for TIR domains, categorizing them into TNL (TIR-NBS-LRR) or nTNL (non-TIR-NBS-LRR, including CNL) subfamilies [22].
Phylogenetic Reconstruction: A subset of conserved NBS domain sequences are selected for multiple sequence alignment and phylogenetic tree construction to elucidate evolutionary relationships and diversification patterns within the gene family [22].

Defining and Analyzing Gene Clusters

Evolutionary Dynamics of Gene Clusters

The operational definition of a gene cluster varies, but it typically involves identifying two or more non-homologous genes in close genomic proximity that participate in a common biosynthetic or recognition pathway [24]. In practice, researchers often employ physical distance thresholds (e.g., genes within 200-500 kb) combined with functional relatedness criteria [22] [24]. Comparative genomic analyses across related species can further distinguish conserved clusters from lineage-specific arrangements, revealing evolutionary dynamics.
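
The distance-threshold part of this definition can be sketched in a few lines of Python. This is a simplified illustration that implements only the physical-proximity criterion, not the functional-relatedness criteria; the function name, the tuple layout, and the 200 kb default (the lower end of the range mentioned above) are assumptions.

```python
from collections import defaultdict

def physical_clusters(genes, max_gap=200_000, min_size=2):
    """Group genes into clusters by physical distance along each chromosome.

    genes: iterable of (gene_id, chromosome, start, end) in base pairs.
    A gene joins the current cluster when its start lies within max_gap of
    the furthest end seen so far in that cluster. Groups smaller than
    min_size are treated as singletons and dropped.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        current, current_end = [], None
        for start, end, gene_id in sorted(by_chrom[chrom]):
            if current and start - current_end > max_gap:
                if len(current) >= min_size:
                    clusters.append(current)
                current = []
            current.append(gene_id)
            current_end = end if len(current) == 1 else max(current_end, end)
        if len(current) >= min_size:
            clusters.append(current)
    return clusters
```

The share of genes located in clusters then follows from the summed cluster sizes divided by the total gene count.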

Advanced genome assembly approaches are crucial for accurate characterization of these regions. Long-read sequencing technologies (PacBio SMRT, ONT) combined with chromosome conformation capture (Hi-C) techniques have dramatically improved the contiguity and completeness of genome assemblies, enabling resolution of complex repetitive regions characteristic of gene clusters [25]. The barley study (MorexV3 assembly) exemplifies how high-quality genome resources enable explicit testing of evolutionary hypotheses regarding duplication-selection dynamics [23].

Functional Validation Approaches

Expression Profiling: RNA-seq and qPCR analyses under pathogen challenge conditions identify clusters with coordinated expression patterns, suggesting functional coordination [22].
Genetic Transformation: Complementation tests and overexpression in model systems validate functionality of individual cluster members.
Biochemical Studies: Enzyme activity assays and protein-protein interaction studies characterize the molecular functions of cluster-encoded proteins.

Research Tools and Reagent Solutions

Table 3: Essential Research Reagents and Resources for Resistance Gene Studies

Reagent/Resource	Specific Examples	Application and Function
Genome Assemblies	Barley MorexV3, Pepper CM334	Reference sequences for gene identification and synteny analysis [23] [22]
Software Tools	HMMER, Pfam, COILS, MEME, OrthoMCL	Domain identification, motif discovery, orthology assignment [22]
Sequencing Technologies	PacBio SMRT, Oxford Nanopore, Hi-C	Long-read sequencing for resolving repetitive regions; chromatin conformation for scaffolding [25]
Phylogenetic Software	MAFFT, MUSCLE, MrBayes, RAxML	Multiple sequence alignment and evolutionary inference [22]
Expression Analysis	RNA-seq, qPCR primers	Transcriptional profiling under pathogen challenge [22]

The organization of NBS-LRR resistance genes into tandemly duplicated clusters represents a fundamental evolutionary strategy that enables land plants to maintain diverse and adaptable detection systems against rapidly evolving pathogens. The dynamic birth-and-death evolution observed in these gene families, driven by continuous cycles of duplication, diversification, and selection, creates a genomic environment conducive to rapid innovation in pathogen recognition.

Future research directions will likely focus on leveraging this understanding for crop improvement. The discovery that duplication-inducing elements effectively cooperate with arms-race genes suggests new approaches for targeted breeding or genome editing to enhance disease resistance [23]. As genomic technologies continue to advance, particularly in long-read sequencing and telomere-to-telomere assembly, our ability to resolve complex resistance gene clusters will improve, revealing new dimensions of plant-pathogen coevolution.

The comprehensive characterization of NBS-LRR gene clusters across diverse land plants will further illuminate the evolutionary principles governing immune gene diversification, potentially enabling predictive approaches to disease resistance breeding in agricultural systems.

Phylogenetic Analysis Revealing Conserved and Diverged NBS Subfamilies

The nucleotide-binding site (NBS) domain represents a fundamental component of plant immune receptors, constituting one of the largest and most diverse gene families in plant genomes. Within the context of land plant evolution, NBS-containing genes have undergone remarkable expansion and diversification, driven by constant evolutionary arms races with rapidly evolving pathogens [3]. These genes typically encode proteins containing a nucleotide-binding site domain and a leucine-rich repeat (LRR) domain, collectively known as NBS-LRR genes or NLR genes, which function as critical intracellular immune receptors responsible for recognizing pathogen effector proteins and initiating effector-triggered immunity (ETI) [26] [27].

The evolutionary history of NBS genes reveals a complex tapestry of gene duplication, loss, and divergence events that have shaped the resistance gene repertoire across different plant lineages. Recent studies have demonstrated that NBS genes originated in ancestral land plants, with bryophytes like Physcomitrella patens containing relatively small NLR repertoires of approximately 25 genes, while flowering plants have experienced substantial gene family expansion, resulting in hundreds to thousands of NBS genes [3]. This expansion has been facilitated by both whole-genome duplication (WGD) events and small-scale duplications (SSD), including tandem and segmental duplications [28] [3].

Phylogenetic analyses of NBS domains across diverse plant species have revealed distinct evolutionary patterns, with two major subclasses characterized by N-terminal Toll/Interleukin-1 Receptor (TIR) or coiled-coil (CC) domains, termed TNL and CNL genes, respectively [3] [27]. A third subclass containing RPW8 domains (RNL) has also been identified, primarily functioning in signal transduction within the immune system [3]. The comparative analysis of these NBS subfamilies across species boundaries provides crucial insights into both conserved evolutionary patterns and lineage-specific adaptations that have occurred throughout plant evolution.

Methodological Framework for NBS Gene Identification and Phylogenetic Analysis

Identification of NBS Gene Families

The accurate identification of NBS-LRR genes within plant genomes requires a multi-step computational approach leveraging conserved protein domains and motif structures. The standard methodology involves:

Domain-Based HMMER Searches: Initial identification typically employs Hidden Markov Model (HMM) searches using HMMER software (v3.1b2 or later) with the PF00931 (NB-ARC) model from the PFAM database [28] [26]. This step identifies protein sequences containing the conserved nucleotide-binding domain. The search stringency is typically set with an E-value cutoff of 1 × 10⁻²⁰, though some studies apply less stringent thresholds (E-value < 0.01) followed by manual curation to identify divergent family members [26].

Domain Architecture Confirmation: Candidate sequences are subsequently scanned against additional domain databases to classify complete domain architectures. Key domains include:

TIR (Pfam: PF01582)
LRR (Pfam: PF00560, PF07723, PF07725, PF12779, PF13306, PF13516, PF13855, PF14580, PF03382, PF01030, PF05725)
CC domains (identified using NCBI Conserved Domain Database and Paircoil2 with P-score cutoff of 0.03) [28] [26]

Manual Curation and Validation: Automated predictions require manual verification to remove false positives (e.g., proteins with kinase domains but no NBS relationship) and to identify fragmented or misannotated genes through sequence extension and re-annotation [27]. This may involve extending gene models by 3 kb at both 5' and 3' ends to capture complete domain architectures.

Classification of NBS Genes

Based on domain architecture, NBS genes are classified into distinct subfamilies:

Comprehensive Classification System (8 categories):

CC-NBS (CN), CC-NBS-LRR (CNL), NBS (N), NBS-LRR (NL)
RPW8-NBS (RN), RPW8-NBS-LRR (RNL)
TIR-NBS (TN), TIR-NBS-LRR (TNL) [28] [29]

Simplified Classification Systems:

Solanaceae-specific: TNL (TIR-NBS-LRR) and non-TNL (non-TIR-NBS-LRR) [28]
Brassicaceae-specific: TNL, CNL, and RNL [28]
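
The eight-category scheme can be expressed as a small lookup on the set of detected domains. The sketch below is illustrative: the domain labels ("TIR", "CC", "RPW8", "NBS", "LRR") and the precedence of TIR over RPW8 over CC when several N-terminal domains are detected are assumptions, not rules stated in the cited studies.

```python
def classify_nbs(domains):
    """Map a collection of detected domain labels to one of eight NBS classes.

    domains: iterable of labels drawn from {"TIR", "CC", "RPW8", "NBS", "LRR"}.
    Returns one of: N, NL, CN, CNL, RN, RNL, TN, TNL.
    """
    found = set(domains)
    if "NBS" not in found:
        raise ValueError("not an NBS gene: no NBS (NB-ARC) domain detected")
    if "TIR" in found:        # assumed precedence: TIR > RPW8 > CC
        prefix = "T"
    elif "RPW8" in found:
        prefix = "R"
    elif "CC" in found:
        prefix = "C"
    else:
        prefix = ""
    return prefix + "N" + ("L" if "LRR" in found else "")

def simplify_tnl(label):
    """Collapse an eight-category label into the TNL / non-TNL scheme."""
    return "TNL" if label == "TNL" else "non-TNL"
```

Repeated or non-canonical arrangements (for example NLNLN) are not distinguished by this lookup and would need an ordered representation of the domains.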

Phylogenetic Reconstruction

Sequence Alignment: Multiple sequence alignment of the NB-ARC domain regions is performed using MUSCLE v3.8.31 or MAFFT 7.0 with default parameters [28] [3]. The NB-ARC domain is typically extracted by counting 250 amino acids after the P-loop motif, and sequences with less than 90% of the full-length NB-ARC domain are excluded from analysis [26].
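
The extraction rule can be sketched as follows. Several details here are assumptions made for illustration: the P-loop is located with a generic Walker A pattern (G, four residues, then GK followed by S or T), the 250-residue window is taken to begin at the first residue of the motif, and the 90% criterion is approximated by the length of the extracted region. The cited study may define each of these differently.

```python
import re

# Assumed generic Walker A (P-loop) pattern; real pipelines may use a profile.
P_LOOP = re.compile(r"G.{4}GK[ST]")

def extract_nb_arc(sequence, length=250, min_fraction=0.9):
    """Return the putative NB-ARC region of a protein sequence, or None.

    The region starts at the first P-loop match and spans up to `length`
    residues. Sequences with no match, or whose region is shorter than
    min_fraction * length, are rejected.
    """
    match = P_LOOP.search(sequence)
    if match is None:
        return None
    region = sequence[match.start():match.start() + length]
    if len(region) < min_fraction * length:
        return None
    return region
```

Regions that pass this filter would then be written to FASTA and aligned with MUSCLE or MAFFT as described above.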

Tree Construction: Phylogenetic trees are inferred using maximum likelihood methods implemented in MEGA11 or FastTreeMP, with 1000 bootstrap replicates to assess node support [28] [3]. The Whelan and Goldman + frequency (WAG+F) substitution model is commonly employed for maximum likelihood inference, while some studies instead use the distance-based Neighbor-Joining method with the Nei-Gojobori model [26] [28].

Orthogroup Analysis: OrthoFinder v2.5.1 with DIAMOND for sequence similarity searches and MCL clustering algorithm can identify orthogroups across multiple species, differentiating between core (conserved) and unique (lineage-specific) orthogroups [3].
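
The distinction between core and unique orthogroups can be illustrated with a short sketch. The definitions used here are assumptions: an orthogroup is called core when every species contributes at least one gene and unique when exactly one species does; published analyses may use looser thresholds. The input is a plain dictionary rather than a parsed OrthoFinder output file, and all names are hypothetical.

```python
def split_orthogroups(orthogroups, all_species):
    """Partition orthogroups into core, unique, and other.

    orthogroups: dict mapping orthogroup ID -> dict of species -> list of gene IDs.
    all_species: iterable of every species included in the analysis.
    Core: present in all species. Unique: present in exactly one species.
    """
    species_set = set(all_species)
    result = {"core": [], "unique": [], "other": []}
    for og_id, members in orthogroups.items():
        present = {sp for sp, gene_ids in members.items() if gene_ids}
        if present == species_set:
            result["core"].append(og_id)
        elif len(present) == 1:
            result["unique"].append(og_id)
        else:
            result["other"].append(og_id)
    return result
```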

Figure 1: Computational workflow for identification and phylogenetic analysis of NBS gene families in plants.

Comparative Genomic Distribution of NBS Genes

Variation in NBS Gene Repertoire Across Plant Lineages

The number of NBS genes exhibits remarkable variation across plant species, reflecting differential evolutionary pressures and diversification histories. Recent comparative analyses of 34 plant species identified 12,820 NBS-domain-containing genes, which were classified into 168 distinct domain architecture classes, revealing both conserved structural patterns and species-specific innovations [3].

Table 1: NBS-LRR Gene Distribution Across Selected Plant Species

Plant Species	Total NBS Genes	TNL Genes	CNL Genes	Other/Partial	Genome Reference
Akebia trifoliata	73	Not specified	Not specified	Not specified	[28]
Dioscorea rotundata	167	Not specified	Not specified	Not specified	[28]
Vitis vinifera	352	Not specified	Not specified	Not specified	[28]
Triticum aestivum	2,151	Not specified	Not specified	Not specified	[28]
Manihot esculenta (cassava)	327	34	128	165 other/partial (99 partial)	[26]
Solanum tuberosum (potato)	438	77	107	254 partial/other	[27]
Nicotiana tabacum	603	73	224	306 other	[28]
Nicotiana sylvestris	344	42	130	172 other	[28]
Nicotiana tomentosiformis	279	40	112	127 other	[28]
Arabidopsis thaliana	~150	Not specified	Not specified	Not specified	[27]

The data reveal substantial variation in NBS gene numbers, with early diverging land plants like the bryophyte Physcomitrella patens containing approximately 25 NLR genes, while angiosperm species typically possess hundreds to thousands of NBS genes [3]. Notably, the asterid species Solanum tuberosum (potato) contains 438 NB-LRR genes, while the closely related Nicotiana tabacum possesses 603 NBS genes, illustrating lineage-specific expansions even within the same family [28] [27].

Genomic Organization and Cluster Analysis

NBS genes are frequently organized in clusters throughout plant genomes, a genomic architecture that facilitates rapid evolution through mechanisms such as unequal crossing over and gene conversion. In cassava, 63% of the 327 identified NBS-LRR genes occur in 39 clusters distributed across the chromosomes, with most clusters being homogeneous (containing NBS-LRRs derived from a recent common ancestor) [26]. Similarly, in potato, the majority of the 438 predicted NB-LRR genes are physically organized within 63 identified clusters, with 50 being homogeneous [27].

This clustering pattern is conserved across plant lineages, though cluster composition and complexity vary. Homogeneous clusters typically contain closely related genes of the same type (e.g., all TNL or all CNL), while heterogeneous clusters contain phylogenetically distant NBS-LRR genes, sometimes including both TNL and CNL genes [26]. The preferential location of NBS genes in clusters is thought to facilitate the generation of novel resistance specificities through recombination and diversifying selection.

Evolutionary Dynamics of NBS Gene Families

Duplication Mechanisms and Family Expansion

The expansion of NBS gene families has been driven by multiple duplication mechanisms, with varying contributions across plant lineages:

Whole-Genome Duplication (WGD): Paleopolyploidization events have contributed significantly to NBS gene family expansion. The Solanum lineage has experienced two consecutive genome triplications: one ancient event shared with rosids and a more recent one specific to this lineage [30]. These triplications established the genomic context for neofunctionalization of genes controlling various traits, including disease resistance components. Polyploidy has had a similar effect in Nicotiana: comparative analyses revealed that 76.62% of NBS members in allotetraploid N. tabacum are traceable to its parental genomes (N. sylvestris and N. tomentosiformis) [28].

Small-Scale Duplications (SSD): Tandem and segmental duplications represent a major mechanism for NBS gene expansion, particularly in response to pathogen pressure [28] [3].

Birth-and-Death Evolution: NBS gene families evolve through a process of birth-and-death evolution, where new genes are created by duplication and some duplicates are maintained while others are deleted or become pseudogenes [3]. This dynamic process generates substantial interspecific variation in NBS gene content and organization.

Selection Pressures and Diversification

The evolution of NBS genes is characterized by contrasting selection pressures acting on different protein domains:

Diversifying Selection: LRR domains involved in pathogen recognition typically experience positive selection that increases polymorphism at specific residues, facilitating recognition of evolving pathogen effectors [26]. This diversifying selection is particularly pronounced in solvent-exposed residues of the LRR domain that directly interact with pathogen proteins.

Purifying Selection: The NBS domain responsible for nucleotide binding and activation signaling is predominantly under purifying selection that conserves structural and functional integrity [26]. Similarly, signaling domains such as TIR and CC domains experience stronger evolutionary constraints.

Lineage-Specific Selection Patterns: Comparative analyses between tomato and potato identified 18,320 orthologous gene pairs, with 138 (0.75%) showing significantly higher than average non-synonymous versus synonymous substitution rate ratios (ω), indicating diversifying selection, while 147 (0.80%) showed significantly lower than average ω, indicating purifying selection [30]. The proportions of genes under diversifying selection were higher than those observed in grass species, suggesting distinct evolutionary dynamics in Solanaceae.

Figure 2: Evolutionary dynamics driving NBS gene family expansion and diversification in plants.

Structural and Functional Diversification of NBS Genes

Domain Architecture Variation

The NBS gene superfamily exhibits remarkable diversity in domain architecture, which correlates with functional specialization:

Classical Architectures: Most NBS genes conform to classical domain arrangements including TIR-NBS-LRR (TNL), CC-NBS-LRR (CNL), and RPW8-NBS-LRR (RNL). In cassava, among 228 full-length NBS-LRR genes, 34 contained TIR domains and 128 contained CC domains at their N-termini [26]. Similarly, in potato, 77 of 438 NB-LRR genes contain TIR-like domains, while 107 of the remaining non-TIR genes contain CC domains [27].

Non-Canonical Architectures: Recent comparative analyses have identified numerous non-canonical domain architectures, including TIR-NBS-TIR-Cupin1-Cupin1, TIR-NBS-Prenyltransf, and Sugar_tr-NBS, representing species-specific structural innovations [3]. These unusual architectures likely represent functional adaptations to specific pathogen pressures.

Lineage-Specific Patterns: Significant variation exists in the relative proportions of NBS gene subfamilies across plant lineages. Monocot species generally display reduced TNL representation compared to eudicots, despite the ancient origin of TNL genes predating the angiosperm-gymnosperm split [27] [29]. In the Asteraceae family (sunflower, lettuce, chicory), comparative analysis revealed distinct families of R-genes composed of genes related to both CC and TIR domain-containing NBS-LRR R-genes, with striking similarity in CC subfamily composition between closely related species (lettuce and chicory) [31].

Expression Diversity and Functional Specialization

NBS genes display complex expression patterns reflecting their functional specialization:

Constitutive vs. Induced Expression: Some NBS genes are constitutively expressed, providing constant surveillance, while others are induced only upon pathogen recognition. Expression profiling of orthogroups in cotton identified putative upregulation of specific orthogroups (OG2, OG6, OG15) in different tissues under various biotic and abiotic stresses, in plants both susceptible and tolerant to cotton leaf curl disease [3].

Tissue-Specific Expression: RNA-seq analyses across multiple species reveal that NBS genes display tissue-specific expression patterns, with some genes preferentially expressed in roots, leaves, or reproductive tissues, potentially reflecting tissue-specific pathogen challenges [3].

Pseudogenization and Functional Loss: Not all NBS genes retain functionality; many are pseudogenes resulting from frameshift mutations, deletions, or insertions. In cassava, 99 partial NBS genes, which are potential pseudogenes, were identified alongside 228 complete NBS-LRR genes [26]. The proportion of pseudogenes varies substantially across lineages, reflecting different evolutionary histories and selection pressures.

Experimental Validation and Functional Analysis

Functional Characterization Methods

Several experimental approaches are employed to validate the function of NBS genes identified through phylogenetic analyses:

Virus-Induced Gene Silencing (VIGS): VIGS has been used to validate NBS gene function. Silencing of GaNBS (OG2) in resistant cotton indicated a putative role in controlling virus titer, supporting its function in resistance to cotton leaf curl disease [3].

Heterologous Expression: Heterologous expression in model systems provides functional validation. For example, heterologous expression of a maize NBS-LRR gene improved resistance to Pseudomonas syringae in Arabidopsis thaliana [28]. Similarly, overexpression of a soybean TNL gene conferred broad-spectrum resistance to viral pathogens in soybean [28].

Differential Expression Analysis: RNA-seq datasets from infection time courses identify NBS genes responsive to specific pathogens. Analysis of tobacco responses to black shank (Phytophthora nicotianae) and bacterial wilt (Ralstonia solanacearum) identified numerous differentially expressed NBS genes, highlighting potential candidates for functional validation [28].

Genetic Variation and Resistance Association

Analysis of genetic variation in NBS genes between resistant and susceptible genotypes provides evidence for functional importance:

Comparative Genomics: Comparison between susceptible (Coker 312) and tolerant (Mac7) Gossypium hirsutum accessions identified numerous unique variants in NBS genes, with Mac7 displaying 6,583 variants compared to 5,173 in Coker 312 [3]. This differential variation suggests an association with resistance phenotypes.

Protein Interaction Studies: Protein-ligand and protein-protein interaction analyses demonstrate strong interaction of putative NBS proteins with ADP/ATP and different core proteins of the cotton leaf curl disease virus, providing mechanistic insights into recognition specificity [3].

Table 2: Essential Research Reagents and Resources for NBS Gene Analysis

Resource Category	Specific Tools/Databases	Application in NBS Research	Reference
Genome Databases	Phytozome, NCBI Genome, Plaza	Access to genome assemblies and annotations	[3] [26]
Domain Databases	PFAM, NCBI CDD, SMART	Identification of NBS, TIR, LRR, CC domains	[28] [26]
Software Tools	HMMER v3.1b2, MUSCLE, MEGA11	Domain search, alignment, phylogenetics	[28] [26]
Orthology Analysis	OrthoFinder v2.5.1, DIAMOND	Identification of orthogroups across species	[3]
Expression Databases	IPF Database, CottonFGD, NCBI SRA	Tissue-specific and stress-responsive expression	[3] [28]
Selection Pressure	KaKs_Calculator 2.0	Calculation of Ka/Ks ratios	[28]

Phylogenetic analysis of NBS genes across land plants has revealed complex evolutionary patterns characterized by both deeply conserved subfamilies and lineage-specific expansions. The NBS gene superfamily has evolved through a combination of whole-genome duplications, tandem duplications, and birth-and-death evolution, resulting in substantial variation in gene content across species. Structural diversification in domain architectures has generated specialized immune receptors adapted to recognize diverse pathogen effectors, while conserved NBS domains maintain core signaling functions across lineages.

The genomic organization of NBS genes into clusters facilitates rapid evolution through recombination and diversifying selection, particularly in residues involved in pathogen recognition. Comparative analyses across species boundaries have identified both core orthogroups conserved across angiosperms and lineage-specific innovations reflecting adaptation to distinct pathogen pressures. Functional validation through modern genomic tools has confirmed the role of specific NBS genes in disease resistance, providing potential targets for crop improvement.

Future research directions should include expanded comparative analyses incorporating more diverse plant lineages, particularly non-angiosperm species, to reconstruct the deep evolutionary history of plant immune receptors. Integration of structural biology approaches with phylogenetic analysis will further elucidate the molecular basis of pathogen recognition specificity. The continued development of pangenome resources for crop species and their wild relatives will empower more comprehensive surveys of NBS gene diversity, accelerating the discovery of novel resistance genes for agricultural applications.

Harnessing Advanced Tools for NBS Gene Discovery and Functional Characterization

The study of gene family evolution, particularly for disease resistance genes in plants, relies on a suite of sophisticated bioinformatics tools. Research on the evolution of Nucleotide-Binding Site (NBS) domain genes—a major class of plant disease resistance genes—exemplifies the powerful synergy between traditional sequence analysis methods and modern orthology inference platforms [3]. These genes are part of the larger NLR (Nucleotide-binding Leucine-Rich Repeat) family and are crucial for plant immune responses against pathogens [3]. Understanding their diversification from basal land plants like bryophytes to higher angiosperms requires comparative genomic analyses across diverse species, a process greatly accelerated by tools such as HMMER, BLAST, and OrthoFinder [3] [32]. This technical guide details the methodologies for identifying and classifying these genes, framing them within a broader evolutionary context and providing actionable experimental protocols for researchers.

Core Methodologies and Workflows

Hidden Markov Model (HMM) Searches for Domain-Centric Gene Identification

Principle and Application: Hidden Markov Models are probabilistic models used for identifying distantly related protein sequences based on conserved domain architecture. In studies of NBS domain gene evolution, HMM searches are the preferred initial step for identifying candidate genes across entire proteomes due to their high sensitivity in detecting conserved protein domains [3].

Experimental Protocol:

HMM Model Acquisition: Obtain the pre-built profile HMM for the protein domain of interest. The Pfam database is a primary resource. For NBS gene identification, the key domain is the NB-ARC domain (Pfam: PF00931).
Proteome Preparation: Gather the protein sequences for the species under investigation in FASTA format.
Domain Scanning: Use the PfamScan.pl script, or the hmmscan and hmmsearch programs from the HMMER package, to search the proteomes against the Pfam-A.hmm model library.
Stringent Filtering: Apply a strict expectation value (e-value) cutoff to minimize false positives. A common threshold used in published studies is 1.1e-50 [3].
Architecture Classification: Extract all genes containing the NB-ARC domain and analyze their full domain architecture using tools like PfamScan to identify associated domains (e.g., TIR, LRR, CC). Genes can then be classified into architectural classes (e.g., TIR-NBS-LRR, CC-NBS-LRR) [3].
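The filtering and classification steps above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in [3]: it assumes the domain search was run with `hmmsearch --domtblout`, that Pfam model names such as `TIR`, `NB-ARC`, and `LRR_8` appear in the query-name column, and that a looser cutoff (`OTHER_CUTOFF`, an assumed value) is acceptable for accessory domains. The function names are hypothetical.

```python
from collections import defaultdict

NBS_CUTOFF = 1.1e-50   # strict cutoff quoted in the text for the NB-ARC domain
OTHER_CUTOFF = 1e-3    # assumed, looser cutoff for accessory domains


def parse_domtblout(lines):
    """Return {protein: [(ali_from, domain_name), ...]} for hits passing the cutoffs."""
    hits = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        protein, domain = fields[0], fields[3]
        seq_evalue = float(fields[6])    # full-sequence E-value
        dom_evalue = float(fields[12])   # independent (per-domain) E-value
        ali_from = int(fields[17])       # domain start on the protein
        if domain == "NB-ARC":
            keep = seq_evalue <= NBS_CUTOFF
        else:
            keep = dom_evalue <= OTHER_CUTOFF
        if keep:
            hits[protein].append((ali_from, domain))
    return hits


def architecture(domain_hits):
    """Collapse position-ordered domains into a label such as 'TIR-NBS-LRR'."""
    labels = []
    for _, name in sorted(domain_hits):
        if name == "NB-ARC":
            label = "NBS"
        elif name.startswith("LRR"):
            label = "LRR"
        else:
            label = name
        if not labels or labels[-1] != label:   # merge adjacent repeats
            labels.append(label)
    return "-".join(labels)


def classify_nbs_proteins(lines):
    """Keep only proteins with a qualifying NB-ARC hit and label their architecture."""
    result = {}
    for protein, domain_hits in parse_domtblout(lines).items():
        if any(name == "NB-ARC" for _, name in domain_hits):
            result[protein] = architecture(domain_hits)
    return result
```

Coiled-coil regions are usually predicted with a separate tool rather than a Pfam model, so a CC label would have to be merged into the hit list before calling `architecture`.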

Table 1: Key Resources for HMM-based Gene Identification

Resource/Tool	Function	Specifications
Pfam Database	Repository of protein family HMM models	Provides the NB-ARC (PF00931) and other domain models [3].
HMMER Suite	Software for sequence homology searches	Includes `hmmsearch` for searching profile HMMs against a sequence database and `hmmscan` for scanning sequences against a profile HMM library.
PfamScan Script	Utility for scanning sequences against Pfam HMMs	Often used with default parameters and a customized e-value cutoff [3].

BLAST and DIAMOND for Sequence Similarity Searches

Principle and Application: BLAST (Basic Local Alignment Search Tool) and its accelerated alternative DIAMOND use heuristic algorithms to find regions of local similarity between sequences. They are fundamental for tasks requiring rapid, large-scale sequence comparison, such as building input for orthology inference or functional annotation.

Experimental Protocol:

Database and Query Setup: Format the target proteome(s) as a BLAST database. Prepare the query sequences (e.g., a set of known NBS genes from a reference species).
Sequence Search: Execute a BLASTP or DIAMOND search of queries against the target database. DIAMOND is recommended for very large datasets due to its significantly higher speed [33] [34].
Parameter Tuning: Use a standard e-value cutoff (e.g., 0.001) to define significant hits. The DIAMOND tool in OrthoFinder uses a default e-value of 1e-3 [33].
Downstream Analysis: Use the results for functional inference or as the similarity input for orthogroup inference algorithms like OrthoMCL or the initial steps of OrthoFinder.
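As a rough sketch of the filtering step, the snippet below reads BLAST tabular output (format 6, which DIAMOND also writes by default) and keeps the best-scoring hit per query under the e-value cutoff. The column order is the standard twelve-column layout (`qseqid`, `sseqid`, ..., `evalue`, `bitscore`); the function name `best_hits` is a hypothetical helper, not part of either tool.

```python
def best_hits(lines, max_evalue=1e-3):
    """Return {query: (subject, evalue, bitscore)} for the top hit of each query."""
    best = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue:
            continue  # not significant at the chosen cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Keeping only the top hit is one design choice among several; orthology tools such as OrthoFinder use the full all-vs-all hit table instead.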

Table 2: Comparison of Sequence Similarity Search Tools

Tool	Primary Use Case	Speed	Typical E-value Cutoff
BLAST	Standard sequence similarity searches	Standard	0.001 [33]
DIAMOND	Ultra-fast large-scale searches	Reported as up to 20,000x faster than BLAST [34]	0.001 [33]

OrthoFinder for Phylogenetic Orthology Inference

Principle and Application: OrthoFinder is a sophisticated phylogenomics tool that infers orthogroups (sets of genes descended from a single gene in the last common ancestor) and orthologs. It moves beyond simple similarity by incorporating gene tree inference, providing a robust evolutionary framework for comparative studies [34]. It has been benchmarked as one of the most accurate methods for ortholog inference [34].

Experimental Protocol:

Input Preparation: Collect the complete protein sequences in FASTA format for all species in the analysis.
Running OrthoFinder: Execute OrthoFinder with a single command (e.g., orthofinder -f [proteome_directory]). The default workflow uses DIAMOND for all-vs-all sequence searches [3] [34].
Orthogroup Inference: OrthoFinder uses sequence similarity scores and the MCL algorithm to cluster genes into orthogroups [3] [34].
Phylogenetic Analysis: The tool automatically infers gene trees for each orthogroup, infers and roots a species tree from them, and maps gene duplication events onto that tree [34].
Output Analysis: Key outputs include:
- Orthogroups.tsv: A file listing all orthogroups and their constituent genes.
- Orthogroups.GeneCount.tsv: A file giving the number of genes from each species in every orthogroup.
- Gene_Duplication_Events/Duplications.tsv: The inferred gene duplication events, each assigned to a node of the species tree.
- Comparative genomics statistics such as gene duplication rates per branch [34].
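A short sketch of how the Orthogroups.tsv output might be summarized follows. It assumes the usual layout (a header row with `Orthogroup` followed by one column per species, and comma-separated gene lists in each cell) and, as a working definition that is an assumption here, treats an orthogroup as "core" when every species contributes at least one gene and "unique" when exactly one species does. The helper names are hypothetical.

```python
def read_orthogroups(lines):
    """Parse Orthogroups.tsv text into (species, {orthogroup: {species: [genes]}})."""
    rows = iter(lines)
    header = next(rows).rstrip("\n").split("\t")
    species = header[1:]
    table = {}
    for line in rows:
        if not line.strip():
            continue
        cells = line.rstrip("\n").split("\t")
        genes = {}
        for sp, cell in zip(species, cells[1:]):
            genes[sp] = [g.strip() for g in cell.split(",") if g.strip()]
        table[cells[0]] = genes
    return species, table


def core_and_unique(species, table):
    """Split orthogroups into those found in all species and those found in one."""
    core, unique = [], []
    for og, genes in table.items():
        present = [sp for sp in species if genes.get(sp)]
        if len(present) == len(species):
            core.append(og)
        elif len(present) == 1:
            unique.append(og)
    return core, unique
```

Published studies may define core orthogroups more loosely (for example, present in most rather than all species), so the threshold should be adapted to the question at hand.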

OrthoFinder Phylogenomic Workflow

Integrated Analysis in Evolutionary Research: A Case Study of NBS Genes

A comprehensive study on the evolution of NBS domain genes in land plants provides a prime example of these tools working in concert [3]. The research aimed to understand the diversification of these immune receptors across 34 plant species, from mosses to dicots.

Gene Identification with HMMER: The study used PfamScan.pl with the NB-ARC domain HMM (PF00931) and a stringent e-value of 1.1e-50 to identify 12,820 NBS-domain-containing genes [3].
Classification: Genes were classified into 168 distinct domain architecture classes, revealing both classical and novel, species-specific structural patterns [3].
Orthology Inference with OrthoFinder: The identified NBS sequences were analyzed with OrthoFinder v2.5.1, which used DIAMOND for sequence similarity and the MCL algorithm for clustering. This identified 603 orthogroups (OGs), including core OGs conserved across species and unique OGs specific to certain lineages [3].
Evolutionary and Functional Validation: The study combined this analysis with transcriptomics to show differential expression of specific OGs under stress and used virus-induced gene silencing (VIGS) to validate the role of one OG in virus resistance [3].

This integrated approach demonstrates how HMM searches provide the initial gene set, while OrthoFinder places these genes into an evolutionary context, identifying conserved and lineage-specific elements that have shaped plant immunity.

Table 3: Key Reagent Solutions for Evolutionary Genomics of Gene Families

Category/Resource	Specific Tool / Database	Function in Research
Software & Pipelines	OrthoFinder [3] [34]	Phylogenomic orthology inference from protein sequences.
	HMMER / PfamScan [3]	Identification of genes based on protein domain content.
	DIAMOND [3] [34]	Ultra-fast sequence similarity search for large datasets.
	PlantTribes2 [35]	A specialized framework for gene family analysis in plants.
Databases	Pfam Database [3]	Curated collection of protein family HMM models.
	PLAZA [3] [35]	Platform for plant comparative genomics.
	NCBI Genome & BioProject [3]	Repository for genomic data and sequencing projects.
Computational Resources	Galaxy Workbench [35]	Web-based platform for accessible, reproducible bioinformatics.
	High-Performance Computing (HPC)	Essential for running genome-scale analyses in reasonable time.

The identification of plant resistance (R) genes is crucial for understanding plant immunity and breeding disease-resistant crops. Traditional methods for R-gene identification face challenges due to gene diversity, complex genomic structures, and low sequence homology. This whitepaper presents PRGminer, a deep learning-based tool that revolutionizes the prediction and classification of R-genes. We examine PRGminer's architecture, performance metrics, and practical applications within the broader context of nucleotide-binding site (NBS) domain gene evolution in land plants. The tool achieves exceptional accuracy (95.72-98.75%) in identifying R-genes and classifying them into specific structural categories, providing researchers with an efficient solution for high-throughput R-gene discovery.

Plant resistance genes (R-genes) encode proteins that recognize pathogen effectors and activate robust immune responses through effector-triggered immunity (ETI) [36] [37]. This represents the second layer of plant defense, complementing the initial pathogen-associated molecular pattern (PAMP)-triggered immunity (PTI) [36] [37]. Among R-genes, the nucleotide-binding site leucine-rich repeat (NBS-LRR) family constitutes the largest class, with proteins characterized by modular domains including an N-terminal Toll/interleukin-1 receptor (TIR) or coiled-coil (CC) domain, a central NBS domain, and C-terminal LRR domains [3] [38] [22].

The evolutionary expansion of NBS-encoding genes across land plants reveals remarkable diversification. Recent studies identified 12,820 NBS-domain-containing genes across 34 species from mosses to monocots and dicots, classified into 168 distinct domain architecture patterns [3]. This diversity presents significant challenges for traditional gene annotation methods, which often produce incomplete and fragmented annotations due to the unique genomic structure of R-gene clusters [36] [37].

The PRGminer Framework: Architecture and Methodology

PRGminer implements a sophisticated two-phase deep learning approach for R-gene identification and classification. The tool addresses critical limitations of alignment-based methods, which often fail with sequences exhibiting low homology [36] [37].

Two-Phase Analytical Workflow

The PRGminer workflow consists of two sequential phases:

Phase I: R-gene Identification

Input: Protein sequences
Process: Deep learning model predicts sequences as R-genes or non-R-genes
Output: Binary classification with exclusion of non-R-genes from further analysis

Phase II: R-gene Classification

Input: R-genes identified in Phase I
Process: Classification into eight specific R-gene classes
Output: Categorized R-genes with structural domain information

Table 1: PRGminer Performance Metrics

Phase	Training/Testing Method	Accuracy	MCC	Independent Testing Accuracy	Independent Testing MCC
Phase I	k-fold training/testing	98.75%	0.98	95.72%	0.91
Phase II	k-fold training/testing	97.55%	0.93	97.21%	0.92
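For readers unfamiliar with the Matthews correlation coefficient (MCC) reported in the table, the sketch below computes accuracy and MCC from a binary confusion matrix, which corresponds to the Phase I setting (R-gene versus non-R-gene). The counts in the usage note are invented for illustration and are not PRGminer results; Phase II is a multi-class problem and would need the multi-class generalization of MCC.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom
```

With hypothetical counts of 90 true positives, 85 true negatives, 15 false positives, and 10 false negatives, `accuracy(90, 85, 15, 10)` is 0.875 while `mcc(90, 85, 15, 10)` is about 0.75, which shows why MCC is the stricter of the two measures.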

Dataset Composition and Feature Engineering

PRGminer was trained on comprehensive datasets derived from public databases including Phytozome, Ensembl Plants, and NCBI [36] [37]. The training data underwent rigorous processing:

Redundancy reduction using CD-HIT
Domain-based filtering using information from Ensembl BioMart and Phytozome BioMart
Stratified splitting with 90% for k-fold training/testing and 10% for independent validation
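The stratified 90/10 split can be illustrated with the standard library alone. This is a generic sketch of the technique, not PRGminer's actual code: the function name, the fixed seed, and the rounding rule are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, holdout_fraction=0.1, seed=0):
    """Return (train_indices, holdout_indices), preserving class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    train, holdout = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_hold = round(len(idxs) * holdout_fraction)
        holdout.extend(idxs[:n_hold])  # independent validation set
        train.extend(idxs[n_hold:])    # used for k-fold training/testing
    return sorted(train), sorted(holdout)
```

Splitting within each class keeps rare classes represented in the held-out set, which matters when class sizes differ as much as they do in the Phase II data.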

Table 2: PRGminer Training Dataset Composition

Dataset Component	Sequence Count	Description
Phase I - R-genes	18,952	Sequences with known R-gene domains
Phase I - Non-R-genes	19,212	Sequences without R-gene domains
Phase II - CNL	1,883	Coiled-coil-NBS-LRR sequences
Phase II - KIN	8,591	Kinase domain sequences
Phase II - Other classes	8,478	RLP, LECRK, RLK, LYK, TIR, TNL

For feature representation, dipeptide composition gave the best performance, and the optimized computational pipeline processes large protein sequence datasets in approximately two minutes [39].
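Dipeptide composition encodes a protein as the relative frequency of each of the 400 possible ordered amino-acid pairs. The sketch below shows one common way to compute it; the normalization by the number of valid pairs and the handling of non-standard residues are assumptions and may differ from PRGminer's implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 features


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies for one protein."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:      # skip pairs containing X, *, or other symbols
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

Because the vector has a fixed length regardless of protein size, it can be fed directly to a neural network without alignment.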

Evolutionary Context: NBS Domain Genes in Land Plants

Understanding PRGminer's significance requires examining the evolutionary landscape of NBS genes across plant species. Comparative genomic analyses reveal patterns of gene duplication, diversification, and loss that have shaped plant immune systems over millions of years.

Evolutionary Patterns and Diversification

NBS genes exhibit remarkable evolutionary dynamics across land plants:

Gene family expansion: Varies significantly among species, from approximately 25 NLRs in bryophytes like Physcomitrella patens to thousands in flowering plants [3]
Domain architecture diversity: 168 classes identified with both classical (NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR) and species-specific structural patterns [3]
Differential distribution: TNL genes are absent in monocots and some eudicots, including Vernicia fordii and Sesamum indicum [38] [40]

Table 3: NBS-LRR Gene Distribution Across Selected Plant Species

Plant Species	Total NBS Genes	CNL-type	TNL-type	Unique Features
Vernicia montana	149	98 (65.8%)	12 (8.1%)	2 genes with both CC and TIR domains
Vernicia fordii	90	49 (54.4%)	0	Complete absence of TIR domains
Capsicum annuum	252	48 (19.0%)	4 (1.6%)	200 genes lack both CC and TIR domains
Dendrobium officinale	74	10 (13.5%)	0	Representative of monocot TNL absence
Arabidopsis thaliana	210	40 (19.0%)	Not specified	Reference eudicot genome

Genomic Organization and Evolutionary Mechanisms

NBS-LRR genes display distinctive genomic architectures that influence their evolution:

Clustered organization: 54% of pepper NBS-LRR genes form 47 physical clusters, with chromosome 3 containing the highest number (10 clusters) [22]
Tandem duplications: Major drivers of gene family expansion, creating arrays of similar sequences that facilitate rapid evolution [3] [22]
Regulatory mechanisms: miRNAs target highly duplicated NBS-LRRs, representing an evolutionary adaptation to balance benefits and costs of maintaining extensive R-gene repertoires [13]
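Physical clusters such as those described above are typically called from gene coordinates. The sketch below groups genes on the same chromosome whenever neighbouring genes start within a maximum distance of each other. The 200 kb threshold and the minimum cluster size of two are assumptions chosen for illustration; the cited pepper study may have used different criteria.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=200_000, min_size=2):
    """genes: iterable of (gene_id, chromosome, start). Returns [(chromosome, [gene_ids])]."""
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))
    clusters = []
    for chrom in sorted(by_chrom):
        current, last_pos = [], None
        for start, gene_id in sorted(by_chrom[chrom]):
            if current and start - last_pos > max_gap:
                if len(current) >= min_size:
                    clusters.append((chrom, current))
                current = []            # gap too large: start a new group
            current.append(gene_id)
            last_pos = start
        if len(current) >= min_size:
            clusters.append((chrom, current))
    return clusters
```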

Experimental Validation and Functional Characterization

Beyond computational prediction, experimental validation remains essential for confirming R-gene function. Several methodologies have proven effective for characterizing NBS-LRR genes.

Virus-Induced Gene Silencing (VIGS)

VIGS has emerged as a powerful technique for functional characterization of R-genes:

Application in tung trees: Silencing of Vm019719 in resistant Vernicia montana compromised resistance to Fusarium wilt, validating its role in defense [38]
Cotton NBS gene validation: Silencing of GaNBS (OG2) indicated a putative role in controlling virus titer [3]
Advantages: Enables rapid functional assessment without stable transformation

Expression Profiling Under Stress Conditions

Transcriptomic analyses reveal R-gene expression patterns in response to biotic and abiotic stresses:

Dendrobium officinale: Six NBS-LRR genes showed significant up-regulation after salicylic acid treatment [40]
Cotton leaf curl disease: Orthogroups OG2, OG6, and OG15 displayed putative upregulation in different tissues under various stresses [3]
Tissue-specific expression: NBS-LRR genes often show differential expression across plant tissues and developmental stages

Table 4: Key Research Reagents and Resources for R-gene Studies

Resource	Function/Application	Specific Examples
PRGminer	Deep learning-based R-gene prediction and classification	Webserver (https://kaabil.net/prgminer/) and standalone tool [36] [39]
VIGS Systems	Functional validation of candidate R-genes	Tobacco rattle virus-based vectors for rapid gene silencing [3] [38]
HMMER Software	Domain identification and sequence annotation	PfamScan with NB-ARC domain models (e-value 1.1e-50) [3] [38]
OrthoFinder	Evolutionary analysis and orthogroup identification	DIAMOND for sequence similarity, MCL for clustering [3]
RNA-seq Databases	Expression profiling under various conditions	IPF database, Cotton Functional Genomics Database, NCBI BioProjects [3]

Signaling Pathways and Molecular Mechanisms

R-gene mediated immunity involves complex signaling pathways that translate pathogen recognition into defense responses. The following diagram illustrates key pathways in plant immunity, particularly focusing on NBS-LRR gene function:

Figure 1: Plant Immune Signaling Pathways

Effector-Triggered Immunity Signaling

The NBS-LRR proteins function as intracellular immune receptors that recognize pathogen effectors directly or indirectly through guard mechanisms [38] [22]. Key aspects include:

Molecular switch mechanism: NBS domains bind ATP/GTP, with hydrolysis providing energy for conformational changes and signaling initiation [22]
Downstream signaling: TIR domains participate in signal transduction, while CC domains facilitate protein-protein interactions [22]
Hypersensitive response: Programmed cell death in infected areas restricts pathogen spread [36] [39]

Implementation and Research Applications

Practical Workflow for R-gene Discovery

The following diagram illustrates an integrated experimental-computational workflow for R-gene identification and validation:

Figure 2: R-gene Discovery Workflow

Case Studies and Applications

PRGminer and similar approaches have enabled significant advances in crop improvement programs:

Tung tree disease resistance: Identification of Vm019719 in resistant Vernicia montana provides targets for marker-assisted breeding to control Fusarium wilt in susceptible Vernicia fordii [38]
Cotton leaf curl disease: Characterization of NBS genes in tolerant (Mac7) and susceptible (Coker 312) cotton accessions identified genetic variations associated with resistance [3]
Pepper breeding: Comprehensive analysis of 252 NBS-LRR genes facilitates development of disease-resistant varieties through targeted breeding strategies [22]

PRGminer represents a significant advancement in R-gene prediction, leveraging deep learning to overcome limitations of traditional alignment-based methods. Its high accuracy (>95%) in both identification and classification phases demonstrates the power of computational approaches for decoding plant immune systems. When integrated with experimental validation techniques like VIGS and expression analysis, PRGminer provides researchers with a comprehensive toolkit for accelerating R-gene discovery.

The evolution of NBS domain genes across land plants reveals a dynamic history of gene family expansion, diversification, and specialization. Understanding these evolutionary patterns provides essential context for interpreting PRGminer predictions and guiding targeted breeding strategies. As plant genomics continues to advance, deep learning approaches like PRGminer will play increasingly important roles in bridging genomic information and practical crop improvement, ultimately contributing to enhanced food security and sustainable agricultural practices.

Expression Profiling and Transcriptomic Analysis of NBS Genes Under Stress

Plant immunity relies on a sophisticated surveillance system where Nucleotide-Binding Site (NBS) domain genes play a pivotal role. These genes, often constituting one of the largest resistance (R) gene families, encode intracellular receptors that mediate effector-triggered immunity (ETI), a robust defense layer activated upon pathogen recognition [41]. The NBS domain forms the core of these proteins, functioning as a molecular switch by binding and hydrolyzing ATP to activate downstream immune signaling [41]. The typical structure of these proteins, frequently referred to as NBS-LRR or NLR proteins, includes a conserved NBS domain coupled with C-terminal leucine-rich repeats (LRR) and variable N-terminal domains such as Toll/Interleukin-1 receptor (TIR) or coiled-coil (CC), leading to their classification as TNL or CNL subfamilies, respectively [22].

The evolution of NBS genes across land plants showcases remarkable diversification and adaptation. From the relatively small repertoires in ancestral lineages like bryophytes to the expansive families in flowering plants, NBS genes have undergone significant expansion, primarily through mechanisms like tandem duplications and whole-genome duplications [3]. This evolutionary trajectory has resulted in substantial structural and functional diversity, encompassing both classical domain architectures (e.g., NBS, NBS-LRR, TIR-NBS) and numerous species-specific patterns, reflecting ongoing adaptive evolution to diverse pathogen pressures [3]. Placing expression profiling within this evolutionary context is crucial for understanding how these genes confer resistance and how their regulation has been fine-tuned across different plant lineages.

Experimental Design and Workflows for Transcriptomic Analysis

A robust transcriptomic analysis of NBS genes under stress requires careful experimental design, spanning from plant material selection to computational processing. The workflow can be broadly divided into wet-lab procedures for generating gene expression data and computational methods for its analysis, often performed in an integrated manner.

Plant Material Selection and Stress Induction

The foundation of a reliable expression study lies in the selection of appropriate plant material, often including genotypes with contrasting resistance phenotypes. For instance, studies on cotton leaf curl disease (CLCuD) have utilized susceptible (Coker 312) and tolerant (Mac7) Gossypium hirsutum accessions, enabling the identification of genetic variations and expression differences associated with resistance [3]. Similarly, research on Dalbergia sissoo involved screening two hundred plants and selecting resistant individuals after inoculation with the fungal pathogen Ceratocystis dalbergiae to study dieback disease resistance [42].

Stress treatments should be designed to mimic natural pathogen encounters or environmental challenges. For biotic stress, this can involve inoculation with pathogens such as bacteria (Pseudomonas syringae), fungi (Fusarium graminearum), or viruses (Begomoviruses) [3]. For abiotic stress, treatments may include dehydration, cold, drought, heat, osmotic stress, salt, or wounding [3]. Tissue collection should be performed at multiple time points post-stress application to capture both early and late responsive genes.

RNA Extraction, Library Preparation, and Sequencing

High-quality RNA extraction is a critical step. For tissues like roots or leaves, the CTAB (cetyltrimethylammonium bromide) method is widely used, often with modifications to optimize yield and purity [43]. The extracted RNA must be treated with DNase to remove genomic DNA contamination, and its quality and quantity should be assessed using spectrophotometry (e.g., NanoDrop), fluorometry (e.g., Qubit), and integrity analysis (e.g., TapeStation or agarose gel electrophoresis) [43].

For library preparation, the Illumina TruSeq Stranded Total RNA Library Prep Plant Kit is a common choice, which utilizes RiboZero beads to deplete ribosomal RNA, enriching for mRNA and other non-ribosomal transcripts [43]. The prepared libraries are then sequenced on high-throughput platforms like Illumina to generate short-read data (e.g., 150 bp paired-end reads), providing the raw data for subsequent transcriptome assembly and expression quantification.

Transcriptome Assembly and Expression Quantification

For non-model organisms without a reference genome, a de novo transcriptome assembly approach is necessary. However, leveraging genomic data from phylogenetically close species is a valuable strategy. For example, in a study on Euterpe edulis, the Elaeis guineensis (oil palm) genome was used as a reference for mapping due to their phylogenetic proximity and the high quality of the oil palm genome assembly [43]. This cross-species mapping strategy enhances the precision of gene annotation and identification of conserved genes.

The subsequent workflow for transcriptomic analysis, from raw data to biological insight, involves several key processing stages which can be visualized as follows:

Figure 1: Transcriptomic Data Analysis Workflow. This flowchart outlines the key computational steps for processing RNA-seq data to identify differentially expressed NBS genes.

Expression levels are typically quantified as Fragments Per Kilobase of transcript per Million mapped reads (FPKM), which normalizes for both gene length and sequencing depth, allowing for cross-sample comparisons [3]. These FPKM values can be retrieved from public databases like the IPF database, Cotton Functional Genomics Database (CottonFGD), and Cottongen, or calculated from raw sequencing data using pipelines as mentioned by Zahra et al. [3]. Differential expression analysis is then performed to identify genes with statistically significant expression changes between stress conditions and controls, or between resistant and susceptible genotypes.
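The FPKM normalization described above reduces to a single formula: fragments mapped to a gene, scaled by gene length in kilobases and by library size in millions. The sketch below shows that arithmetic together with a log2 fold change between two conditions; the pseudocount of 1 is an assumption commonly used to avoid division by zero, and the function names are hypothetical.

```python
import math


def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of two expression values, stabilized by a pseudocount."""
    return math.log2((treated + pseudocount) / (control + pseudocount))
```

For example, 500 fragments on a 2,000 bp transcript in a library of 20 million mapped fragments gives `fpkm(500, 2000, 20_000_000)` = 12.5. Dedicated differential expression tools work from raw counts with proper statistical models; this ratio is only a quick descriptive measure.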

Key Analytical Methods and Data Interpretation

Identification and Classification of NBS Genes

Before expression profiling, a comprehensive identification of NBS genes within the studied species is essential. The PfamScan.pl HMM search script is commonly run against the Pfam-A.hmm profile library to screen for genes containing the NB-ARC domain (Pfam accession: PF00931), using a stringent e-value cutoff (e.g., 1.1e-50) [3] [44]. Additional associated domains (e.g., TIR, CC, LRR) are identified to determine domain architecture, allowing for the classification of NBS genes into classes such as CNL, TNL, NL (NBS-LRR only), and other atypical types [3] [22].
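The screening step can be pictured as a simple filter over domain hits. The sketch below assumes the hits have already been parsed into (gene, Pfam accession, e-value) tuples; the gene names and e-values are invented, and the parsing of real PfamScan output is deliberately left out.

```python
NB_ARC = "PF00931"  # Pfam accession of the NB-ARC domain, as cited in the text


def nbs_candidates(hits, max_evalue=1.1e-50):
    """Return sorted gene IDs with an NB-ARC hit at or below max_evalue.

    hits: iterable of (gene_id, pfam_accession, evalue) tuples (assumed layout).
    """
    keep = set()
    for gene_id, accession, evalue in hits:
        # Accessions may carry a version suffix such as "PF00931.25".
        if accession.split(".")[0] == NB_ARC and evalue <= max_evalue:
            keep.add(gene_id)
    return sorted(keep)


# Hypothetical domain hits for three genes.
hits = [
    ("geneA", "PF00931.25", 3e-80),  # strong NB-ARC hit, kept
    ("geneA", "PF01582", 1e-20),     # TIR domain, ignored by this filter
    ("geneB", "PF00931", 1e-10),     # NB-ARC hit, but above the cutoff
    ("geneC", "PF00560", 1e-60),     # LRR only, no NB-ARC
]
print(nbs_candidates(hits))
```

Only geneA passes here. Relaxing `max_evalue` would also admit geneB, which shows how strongly the chosen cutoff shapes the final gene count.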

Orthogroup (OG) analysis using tools like OrthoFinder provides a deep evolutionary context. It clusters NBS genes from multiple species into orthogroups based on sequence similarity, helping identify core orthogroups (common across species) and unique ones (species-specific) [3]. This phylogenetic framework is crucial for understanding the conservation and divergence of NBS gene expression patterns.
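To make the distinction between core and unique orthogroups concrete, the following sketch partitions orthogroups by the species they contain. It assumes "core" means present in every species analyzed and "unique" means confined to a single species; the input layout and all names are hypothetical and do not reflect OrthoFinder's output format.

```python
from collections import defaultdict


def split_orthogroups(members, all_species):
    """Partition orthogroups into core and species-unique sets.

    members: iterable of (orthogroup, species, gene_id) tuples (assumed layout).
    Returns (core, unique) as sorted lists of orthogroup names.
    """
    species_by_og = defaultdict(set)
    for orthogroup, species, _gene_id in members:
        species_by_og[orthogroup].add(species)

    wanted = set(all_species)
    core = sorted(og for og, sp in species_by_og.items() if sp == wanted)
    unique = sorted(og for og, sp in species_by_og.items() if len(sp) == 1)
    return core, unique


# Hypothetical membership table for three species.
members = [
    ("OG_a", "sp1", "g1"), ("OG_a", "sp2", "g2"), ("OG_a", "sp3", "g3"),
    ("OG_b", "sp1", "g4"), ("OG_b", "sp1", "g5"),
    ("OG_c", "sp1", "g6"), ("OG_c", "sp2", "g7"),
]
core, unique = split_orthogroups(members, ["sp1", "sp2", "sp3"])
print(core, unique)
```

In this toy table OG_a is core, OG_b is unique to sp1, and OG_c falls into neither category because it is shared by only two of the three species.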

Expression Profiling and Cluster Analysis

Once NBS genes are identified and expression values are obtained, researchers categorize expression data into tissue-specific, abiotic stress-specific, and biotic stress-specific profiles [3]. Heat maps are then generated to visualize the expression patterns of NBS genes, particularly focusing on core orthogroups across different tissues and stress conditions. For example, in cotton, expression profiling revealed the putative upregulation of orthogroups OG2, OG6, and OG15 in various tissues under biotic and abiotic stresses in both susceptible and tolerant plants [3].

The analysis often extends to promoter analysis, identifying cis-acting elements related to plant hormones (e.g., salicylic acid, methyl jasmonate) and abiotic stress in the upstream regions of NBS genes, which provides mechanistic insights into their regulation [41] [44]. Furthermore, co-expression analysis can link the expression of specific NBS genes with secondary metabolism pathways, suggesting a broader role in defense mechanisms beyond pathogen recognition [41].

Functional Validation of Candidate NBS Genes

Transcriptomic analysis identifies candidate genes, but their functional validation is a critical next step. Virus-Induced Gene Silencing (VIGS) is a powerful reverse-genetics approach to transiently knock down the expression of a candidate NBS gene in a resistant plant. For instance, silencing of GaNBS (from OG2) in resistant cotton supported its putative role in reducing the viral titer of cotton leaf curl disease [3]. A functional link of this kind between gene expression and disease outcome strengthens the case for the candidate gene.

Additional validation methods include quantitative real-time PCR (qPCR) to confirm the expression patterns of selected NBS genes observed in the RNA-seq data under specific stress conditions, such as salt stress [44]. Moreover, in silico analyses like protein-ligand and protein-protein interaction modeling can provide insights into molecular functions, such as the strong interaction of certain NBS proteins with ADP/ATP and viral proteins [3].
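The cited studies do not specify how qPCR values were summarized. One widely used option is the 2^-ΔΔCt method, sketched below on the assumption of roughly 100% amplification efficiency and a single stable reference gene. All Ct values are invented for illustration.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method.

    Each dCt is the target Ct minus the reference-gene Ct in the same sample;
    ddCt is the treated dCt minus the control dCt.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_treated - delta_control)


# Hypothetical Ct values: the NBS gene crosses threshold three cycles
# earlier under stress, while the reference gene is unchanged.
fold = relative_expression(ct_target_treated=22, ct_ref_treated=18,
                           ct_target_control=25, ct_ref_control=18)
print(fold)
```

A three-cycle shift corresponds to an eightfold increase, which can then be compared with the direction and approximate size of the change seen in the RNA-seq data.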

Successful execution of NBS gene expression profiling relies on a suite of specific reagents, tools, and databases. The following table summarizes key resources used in the featured methodologies.

Table 1: Research Reagent Solutions for NBS Gene Transcriptomic Analysis

Item Name	Function/Application	Specific Example/Usage
CTAB Extraction Buffer	RNA extraction from plant tissues, particularly recalcitrant tissues like roots.	Used in RNA extraction from Euterpe edulis seedlings [43].
Illumina TruSeq Stranded Total RNA Library Prep Plant Kit	Preparation of strand-specific RNA-seq libraries with ribosomal RNA depletion.	Library preparation for transcriptome sequencing [43].
PfamScan & HMMER3	Identification of conserved protein domains (e.g., NB-ARC PF00931) in candidate genes.	Screening for NBS-domain-containing genes [3] [36].
OrthoFinder	Clustering of genes into orthogroups across multiple species for evolutionary analysis.	Identifying core and unique orthogroups of NBS genes [3].
Degenerate Oligonucleotide Primers	Amplification of diverse NBS-LRR gene family members from transcriptomes when genomic data is lacking.	Probing the Dalbergia sissoo transcriptome under dieback stress [42].
VIGS Vectors	Functional validation through transient gene silencing in plants.	Silencing GaNBS to confirm its role in virus resistance [3].

Visualization of NBS Domain Architecture and Classification

The diversity of NBS domain genes, a result of their complex evolution, can be systematically classified based on their domain composition. The following diagram illustrates the primary structural classes and their relationships.

Figure 2: Classification of Plant NBS Domain Genes. This chart outlines the classification of NBS genes into typical and atypical NLRs based on the presence of complete N-terminal and LRR domains, with further subdivision into specific subfamilies like CNL, TNL, and RNL.

Expression profiling and transcriptomic analysis have proven indispensable for unraveling the complex roles and evolutionary dynamics of NBS genes in plant stress responses. The integrated methodology—combining high-throughput RNA sequencing, sophisticated bioinformatic analyses for identification and orthogroup clustering, and functional validation through techniques like VIGS—provides a powerful framework to link gene expression with biological function. This approach has illuminated the diverse expression patterns of NBS genes across tissues and stresses, identified key regulatory candidates for breeding, and revealed species-specific evolutionary paths, such as the marked reduction of TNL genes in certain lineages like Salvia miltiorrhiza and monocots [41] [45]. As genomic resources continue to expand for non-model plants, the application of these standardized protocols will further deepen our understanding of how this critical gene family has evolved to underpin plant adaptation and immunity, ultimately informing strategies for developing more resilient crops.

Bulked Segregant RNA-Seq (BSR-Seq) represents a powerful fusion of traditional genetics and modern high-throughput sequencing, enabling researchers to rapidly pinpoint genetic loci controlling traits of interest. This method combines the principles of bulked segregant analysis with the analytical power of RNA sequencing, facilitating simultaneous mapping and candidate gene identification. Within the broader context of plant evolutionary biology, BSR-Seq has proven particularly valuable for studying the evolution of complex gene families, such as the nucleotide-binding site (NBS)-encoding genes that form the backbone of plant innate immunity. This technical guide examines BSR-Seq methodologies, applications, and its growing role in elucidating the evolutionary dynamics of disease resistance genes across land plants.

Plant resistance to pathogens is often governed by a sophisticated surveillance system based on nucleotide-binding site leucine-rich repeat (NBS-LRR) proteins, which recognize pathogen effectors and initiate defense responses [4]. The NBS gene family exhibits remarkable diversity across land plants, with significant differences in gene number, organization, and subclass distribution between bryophytes and vascular plants [32], as well as between monocots and dicots [4]. This diversity results from continuous evolutionary arms races between plants and their pathogens, driving rapid diversification of resistance genes through mechanisms such as tandem duplication, ectopic recombination, and positive selection [4].

Traditional methods for mapping resistance loci, including quantitative trait locus (QTL) mapping and positional cloning, are often low-throughput and time-consuming [46]. BSR-Seq emerged as an efficient alternative that accelerates gene identification by combining bulked segregant analysis with RNA sequencing [47] [48]. This approach enables researchers to simultaneously identify genetic markers linked to traits of interest and analyze global gene expression patterns, providing valuable insights into both the genetic location and potential function of candidate genes [48].

The Evolutionary Context of NBS Domain Genes in Land Plants

Genomic Organization and Diversity of NBS-LRR Genes

NBS-LRR genes represent one of the largest and most variable gene families in plant genomes, with significant implications for plant adaptation and evolution. Comparative genomic analyses reveal substantial variation in NBS-LRR gene numbers across species, ranging from approximately 50 in papaya (Carica papaya) to 653 in rice (Oryza sativa ssp. indica) [4]. This expansion results from both whole-genome duplication events and small-scale duplications, including tandem and segmental duplications [3].

NBS-LRR genes are typically organized in clusters within plant genomes, which facilitates rapid evolution of new pathogen specificities through mechanisms such as tandem duplication and ectopic recombination [4]. The NBS domain itself contains several highly conserved motifs, including the P-loop and kinase-2 motifs, while the LRR regions evolve rapidly under diversifying selection, particularly at solvent-exposed residues involved in pathogen recognition [4].

Table 1: NBS-LRR Gene Family Size in Various Plant Species

Plant Species	Total NBS-LRR Genes	TNL Genes	CNL Genes	Reference
Arabidopsis thaliana	149-159	94-98	50-55	[4]
Oryza sativa ssp. japonica	553	-	-	[4]
Glycine max (soybean)	319	-	-	[4]
Medicago truncatula	333	156	177	[4]
Brachypodium distachyon	126	0	113	[4]

Evolutionary Lineage-Specific Patterns

A striking evolutionary pattern in NBS-LRR gene distribution is the near-total absence of TIR-NBS-LRR (TNL) genes in monocotyledons, whereas they are present, and often abundant, in dicotyledons [4]. For example, the Brachypodium distachyon genome contains 126 NBS-LRR genes, 113 of which belong to the CC-NBS-LRR (CNL) subclass, with no TNL representatives [4]. In contrast, dicot species like Arabidopsis thaliana and soybean (Glycine max) contain both subclasses, sometimes with TNL genes outnumbering CNL genes [4].

Recent pangenome analyses of bryophytes have revealed they possess a substantially greater diversity of gene families than vascular plants, including higher numbers of unique and lineage-specific gene families [32]. This rich genetic toolkit, which includes novel immune receptors, likely contributed to their successful colonization of diverse habitats despite their structural simplicity.

Principles and Methodologies of BSR-Seq

Fundamental Workflow

BSR-Seq integrates bulk segregant analysis with transcriptome sequencing to identify genomic regions associated with specific phenotypes. The methodology involves creating two bulked RNA samples from segregating populations exhibiting contrasting phenotypes, followed by high-throughput sequencing and computational analysis to detect regions with significant allele frequency differences between bulks [48] [46].

The fundamental principle underlying BSR-Seq is that genetic markers completely linked to a causal gene will show significant differences in allele frequency between bulks, while unlinked markers will segregate randomly [46]. In practice, for a SNP completely linked to a recessive mutant, only one allele will be present in the mutant pool, while both alleles will be present in the non-mutant pool [48].
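The principle can be illustrated with read counts at a single SNP. The sketch below computes the mutant-allele frequency in each bulk and the difference between them. The read counts are invented, and the code is a plain frequency comparison, not the empirical Bayesian method of the original publication.

```python
def allele_frequency(ref_reads, alt_reads):
    """Fraction of reads carrying the alternate (mutant-parent) allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this SNP")
    return alt_reads / total


def delta_frequency(mutant_bulk, nonmutant_bulk):
    """Allele-frequency difference between bulks; each bulk is (ref, alt) reads."""
    return allele_frequency(*mutant_bulk) - allele_frequency(*nonmutant_bulk)


# Hypothetical SNP completely linked to a recessive mutation in an F2:
# the mutant bulk carries only the mutant allele, while non-mutants
# (1 AA : 2 Aa) are expected to carry it at a frequency near 1/3.
linked = delta_frequency(mutant_bulk=(0, 40), nonmutant_bulk=(40, 20))

# Hypothetical unlinked SNP: both bulks sit near 0.5.
unlinked = delta_frequency(mutant_bulk=(20, 20), nonmutant_bulk=(21, 19))
print(round(linked, 3), round(unlinked, 3))
```

The linked SNP shows a difference of about 0.67, whereas the unlinked SNP stays close to zero, which is the contrast that downstream association statistics are designed to detect.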

Detailed Experimental Protocol

Population Development and Bulk Construction

The BSR-Seq workflow begins with developing a suitable segregating population, typically an F2 generation derived from crossing parents with contrasting phenotypes for the trait of interest [48] [49]. For example, in a study mapping root lodging resistance in maize, researchers crossed the lodging-resistant line CIMBL145 with the susceptible line CIMBL74 to generate an F2 population [50].

From the segregating population, individuals with extreme phenotypes are selected and divided into two pools. In the maize root lodging study, researchers selected 30 non-lodging plants and 30 complete-lodging plants from the F2 population to create resistant and susceptible bulks, respectively [50]. Similarly, in a soybean study investigating multifoliolate leaves, researchers selected 30 recombinant inbred lines with the highest multifoliolate frequencies and 30 with the lowest frequencies to create contrasting bulks [49].
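Selecting the phenotypic extremes is essentially a ranking exercise. The following sketch picks the lowest- and highest-scoring individuals from a phenotype table. The plant identifiers and scores are made up, and real studies may apply additional criteria (replicates, thresholds) not modeled here.

```python
def select_bulks(phenotypes, bulk_size=30):
    """Return (low_bulk, high_bulk) lists of IDs from the phenotypic extremes.

    phenotypes: dict mapping individual ID to a numeric trait score.
    """
    if 2 * bulk_size > len(phenotypes):
        raise ValueError("population too small for two non-overlapping bulks")
    ranked = sorted(phenotypes, key=phenotypes.get)  # ascending by score
    return ranked[:bulk_size], ranked[-bulk_size:]


# Hypothetical population of 100 individuals with scores 0..99.
phenotypes = {f"plant{i:03d}": i for i in range(100)}
low_bulk, high_bulk = select_bulks(phenotypes, bulk_size=30)
print(low_bulk[0], high_bulk[-1])
```

The size check guards against overlapping bulks, which would dilute the allele-frequency contrast the method depends on.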

RNA Extraction, Library Preparation, and Sequencing

RNA is extracted from each bulk using standard protocols, such as Trizol reagent [50] or commercial RNA extraction kits [51]. The extracted RNA is then used to prepare sequencing libraries, which are sequenced using an appropriate platform (e.g., Illumina) [50] [48].

Sequencing depth is a critical consideration in BSR-Seq experimental design. In the original BSR-Seq study mapping the maize glossy3 gene, researchers obtained more than 13 million 75-bp single-end reads per bulk using one lane of an Illumina GAIIx flowcell [48]. Adequate sequencing depth ensures sufficient coverage for both SNP discovery and expression analysis.

Bioinformatics Analysis Pipeline

The bioinformatics workflow for BSR-Seq involves multiple steps, including read alignment, variant calling, and association analysis:

Read Quality Control and Alignment: Sequencing reads are first processed to remove low-quality regions and adapter sequences [50]. The clean reads are then aligned to a reference genome using splice-aware aligners such as GSNAP [48] or HISAT2 [50]; BWA [46] has also been used, although it is not splice-aware.
Variant Calling: Single nucleotide polymorphisms (SNPs) are identified using variant callers such as GATK [50]. In the maize glossy3 study, this approach identified more than 64,000 high-confidence SNPs [48].
Association Analysis: Statistical methods are applied to identify SNPs showing significant allele frequency differences between bulks. The original BSR-Seq publication used an empirical Bayesian approach to estimate the probability of each SNP being in complete linkage disequilibrium with the causal gene [48]. Alternatively, the QTLseqr package can be used to identify associated genomic regions [50].
Differential Expression Analysis: RNA-Seq data also enable identification of differentially expressed genes between bulks, providing functional insights into candidate genes [48]. In the glossy3 study, 1,095 genes were differentially expressed between mutant and non-mutant pools [48].
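For the association step listed above, a common way to move from single SNPs to candidate intervals is to average the allele-frequency difference in sliding windows along a chromosome. The sketch below uses a plain arithmetic mean with hypothetical positions and values. It is a simplification and does not reproduce the smoothing or statistics implemented in QTLseqr or the Bayesian approach.

```python
def window_means(snps, window_bp, step_bp):
    """Mean allele-frequency difference per sliding window on one chromosome.

    snps: list of (position_bp, delta) tuples.
    Returns (window_start, mean_delta, n_snps) for windows containing SNPs.
    """
    if not snps:
        return []
    snps = sorted(snps)
    last_position = snps[-1][0]
    windows = []
    start = 0
    while start <= last_position:
        values = [d for pos, d in snps if start <= pos < start + window_bp]
        if values:
            windows.append((start, sum(values) / len(values), len(values)))
        start += step_bp
    return windows


# Hypothetical SNPs: the signal rises toward the right end of the region.
snps = [(100, 0.1), (900, 0.3), (1500, 0.8), (1900, 0.6)]
windows = window_means(snps, window_bp=1000, step_bp=500)
peak = max(windows, key=lambda w: w[1])
print(peak[0], round(peak[1], 2))
```

Windows with a consistently high mean point to the region most likely to harbor the causal gene. In practice the window size would be in the megabase range and would be tuned to marker density.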

Table 2: Key Bioinformatics Tools for BSR-Seq Analysis

Analysis Step	Tools	Function	Reference
Read Alignment	HISAT2, GSNAP, BWA, Bowtie2	Map sequencing reads to reference genome	[50] [48] [46]
Variant Calling	GATK, FreeBayes	Identify SNPs and indels from aligned reads	[50] [46]
Association Analysis	QTLseqr, Bayesian Methods	Detect SNPs with allele frequency differences between bulks	[50] [48]
Differential Expression	edgeR, DESeq2	Identify differentially expressed genes between bulks	[48]

Applications in Resistance Gene Mapping

Case Study: Mapping Root Lodging Resistance in Maize

BSR-Seq has been successfully applied to map resistance loci in crop species. In a 2025 study on maize root lodging resistance, researchers used BSR-Seq to identify eight QTLs associated with root architecture and lodging resistance [50]. From the F2 population of 580 plants derived from a cross between lodging-resistant (CIMBL145) and lodging-susceptible (CIMBL74) lines, they created resistant and susceptible bulks by pooling roots from 30 non-lodging and 30 complete-lodging plants, respectively [50].

The BSR-Seq analysis identified four major QTLs (qRLR1, qRLR4, qRLR5, and qRLR6), which were subsequently validated through chromosomal region-based association study (CRAS) and linkage mapping [50]. Within these QTL regions, researchers identified 306 candidate genes, including root development- and cell wall-related genes. Further association and haplotype analysis pinpointed ZmNRT5, encoding a nitrate transporter, as a strong candidate gene [50]. Expression analysis revealed significantly lower ZmNRT5 expression in the susceptible parent (CIMBL74) compared to the resistant parent (CIMBL145), supporting its role in root lodging resistance [50].

Case Study: Identifying Multifoliolate Leaf Loci in Soybean

In a 2024 study, researchers integrated traditional QTL mapping with BSR-Seq to identify genetic loci controlling the multifoliolate leaf phenotype in soybean [49]. From a recombinant inbred line population of 407 lines, they selected 30 lines with the highest multifoliolate frequencies and 30 with the lowest frequencies to create contrasting bulks for BSR-Seq analysis [49].

The combined approach identified ten QTLs associated with the multifoliolate trait, including a major QTL (qMF-2-1) on chromosome 2 that explained more than 10% of the phenotypic variation [49]. BSR-Seq analysis revealed two candidate genes within the associated regions: Glyma.06G204300 encoding the transcription factor TCP5, and Glyma.06G204400 encoding LONGIFOLIA 2 (LNG2) [49]. Transcriptome analysis further indicated that stress-responsive genes were differentially expressed between high- and low-multifoliolate lines, suggesting potential interplay between genetic and environmental factors in regulating this trait [49].

Table 3: Essential Research Reagents for BSR-Seq Experiments

Reagent/Resource	Function	Examples/Specifications
Segregating Population	Genetic mapping resource	F2, RILs, or other segregating populations from parents with contrasting phenotypes
RNA Extraction Kit	Isolation of high-quality RNA	Trizol reagent or commercial kits (e.g., Aidlab RNA extraction kit)
cDNA Synthesis Kit	Reverse transcription of RNA	PrimeScript RT reagent kit with gDNA Eraser
Sequencing Library Prep Kit	Preparation of sequencing libraries	Illumina-compatible library preparation kits
Reference Genome	Read alignment and variant calling	Species-specific reference genome assembly
Alignment Software	Mapping reads to reference	HISAT2, GSNAP, BWA, Bowtie2
Variant Caller	SNP and indel identification	GATK, FreeBayes
QTL Analysis Tool	Association mapping	QTLseqr, Bayesian approaches

Integration with Evolutionary Studies of NBS Domain Genes

BSR-Seq provides a powerful tool for studying the evolution of NBS domain genes by enabling rapid identification of resistance loci and their associated gene families. The approach has been particularly valuable for investigating how NBS-LRR genes evolve in response to pathogen pressure and how different evolutionary mechanisms shape resistance gene repertoires across plant lineages.

Recent studies have revealed that bryophytes possess a substantially larger diversity of gene families compared to vascular plants, including unique immune receptors that may contribute to their ecological success [32]. BSR-Seq can help functionally characterize these lineage-specific genes and elucidate their roles in plant immunity and adaptation.

Furthermore, BSR-Seq facilitates comparative evolutionary analyses by enabling efficient mapping of resistance loci across multiple species. For example, the identification of ZmNRT5 as a candidate for root lodging resistance in maize [50] and TCP5/LNG2 as regulators of leaf development in soybean [49] demonstrates how BSR-Seq can reveal both conserved and lineage-specific genetic mechanisms underlying plant traits.

BSR-Seq has emerged as a powerful methodology that effectively bridges sequence and function in genetic studies. By combining the mapping power of bulk segregant analysis with the comprehensive data generation of RNA sequencing, this approach accelerates the identification of candidate genes controlling important traits, including disease resistance. In the broader context of plant evolutionary biology, BSR-Seq provides valuable insights into the dynamics of gene family evolution, particularly for rapidly evolving systems like the NBS-LRR genes that govern plant-pathogen interactions. As sequencing technologies continue to advance and become more accessible, BSR-Seq is poised to play an increasingly important role in both basic research and crop improvement programs.

Plants are in a constant evolutionary arms race with a wide array of pathogens, leading to significant yield losses that threaten global food security. It is estimated that plant diseases and pest infestations cause a 20–30% annual reduction in global crop yields [52]. To combat this threat, plants have evolved a sophisticated two-layered immune system. The first layer, pattern-triggered immunity (PTI), relies on cell-surface receptors that detect conserved microbial molecules. The second layer, known as effector-triggered immunity (ETI), is primarily mediated by a large class of resistance (R) proteins encoded by nucleotide-binding site leucine-rich repeat (NBS-LRR or NLR) genes [41]. These intracellular immune receptors recognize specific pathogen-secreted effector proteins, triggering robust defense responses that often include a hypersensitive response and programmed cell death to restrict pathogen spread [41] [52].

The NBS domain, a conserved feature across this gene family, functions as a molecular switch by binding and hydrolyzing ATP, which is essential for activating downstream immune signaling [41]. The remarkable diversity of NBS-LRR genes, coupled with their ability to evolve rapidly, makes them a cornerstone of plant immunity and an invaluable resource for crop improvement programs. This technical guide explores the applications of NBS-LRR gene research in developing disease-resistant cultivars, framed within the broader context of land plant evolution.

Evolutionary Dynamics of NBS-LRR Genes Across Plant Lineages

The NBS-LRR gene family exhibits extraordinary evolutionary dynamics across the plant kingdom. Comparative genomic analyses reveal significant variation in the composition and expansion of NBS-LRR subfamilies among different plant lineages, reflecting their distinct evolutionary paths and host-pathogen co-evolution histories.

Lineage-Specific Expansion and Contraction

Recent studies on Salvia miltiorrhiza, an important medicinal plant, revealed a striking reduction in specific NBS-LRR subfamilies. Among 196 NBS-LRR genes identified, only 62 possessed complete N-terminal and LRR domains, with a notable reduction in TNL (TIR-NBS-LRR) and RNL (RPW8-NBS-LRR) subfamily members [41]. This pattern extends across the Salvia genus, with comparative analysis of five Salvia species (S. miltiorrhiza, S. bowleyana, S. divinorum, S. hispanica, and S. splendens) showing a complete absence of TNL subfamily members and only one or two RNL copies in each species—far fewer than observed in other angiosperms like Arabidopsis thaliana, Nicotiana tabacum, and Vitis vinifera [41].

This pattern of subfamily expansion and contraction is observed across plant lineages. Gymnosperms such as Pinus taeda have experienced significant expansion of the TNL subfamily, which comprises 89.3% of their typical NBS-LRRs. In contrast, monocotyledonous species including Oryza sativa (rice), Triticum aestivum (wheat), and Zea mays (corn) have completely lost the TNL and RNL subfamilies through evolution [41]. These findings highlight the fluid nature of the NLR immune repertoire across plant evolution.

Table 1: NBS-LRR Gene Family Composition Across Plant Species

Plant Species	Total NLRs	CNL	TNL	RNL	Other/Partial	Reference
Salvia miltiorrhiza	196	61	0	1	134	[41]
Nicotiana benthamiana	156	25	5	4*	122	[53]
Triticum aestivum (Wheat)	~460	Predominant	0	0	Not specified	[52]
Arabidopsis thaliana	207	Not specified	Not specified	Not specified	Not specified	[41]
Oryza sativa (Rice)	505	Predominant	0	0	Not specified	[41]
100 Plant Species (PlantNLRatlas)	68,452	3,689 (full-length)	-	-	64,763 (partial-length)	[54]

Note: The 4 RNL-type proteins in N. benthamiana contain RPW8 domains but are distributed among N, CN, and NL subfamilies [53].

Structural and Functional Classification of NBS-LRR Proteins

NBS-LRR proteins are classified based on their domain architecture into typical and atypical types. Typical NBS-LRR proteins contain three principal domains: an N-terminal domain, a central NBS domain, and a C-terminal LRR domain [41] [55]. The N-terminal domain determines classification into three major subfamilies:

TNL: Contains a Toll/interleukin-1 receptor (TIR) domain
CNL: Features a coiled-coil (CC) domain
RNL: Possesses a resistance to powdery mildew 8 (RPW8) domain

Atypical NBS-LRR proteins lack complete domains and are further categorized based on specific domain deletions into subtypes such as N (NBS only), TN (TIR-NBS), CN (CC-NBS), and NL (NBS-LRR) [41]. Both CNL and TNL proteins serve as intracellular receptors in ETI, while RNL proteins act as helper nodes for signaling transduction [41] [53].
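The naming scheme above maps directly onto a small lookup. The sketch below derives a class label from the set of domains detected in a protein. The domain labels are simplified strings, and the priority order used when more than one N-terminal domain is present is an assumption made for illustration only.

```python
N_TERMINAL_PREFIX = {"TIR": "T", "CC": "C", "RPW8": "R"}


def classify_nlr(domains):
    """Return a class label such as 'CNL', 'TN', or 'N', or None without an NBS.

    domains: iterable of labels drawn from 'TIR', 'CC', 'RPW8', 'NBS', 'LRR'.
    """
    domains = set(domains)
    if "NBS" not in domains:
        return None
    # Assumed priority if several N-terminal domains co-occur: TIR, CC, RPW8.
    prefix = next((N_TERMINAL_PREFIX[d] for d in ("TIR", "CC", "RPW8")
                   if d in domains), "")
    suffix = "L" if "LRR" in domains else ""
    return prefix + "N" + suffix


def is_typical(label):
    """Typical NBS-LRRs carry all three parts: N-terminal domain, NBS, and LRR."""
    return label in {"TNL", "CNL", "RNL"}


print(classify_nlr({"CC", "NBS", "LRR"}), classify_nlr({"TIR", "NBS"}))
```

For example, a protein with CC, NBS, and LRR domains is labeled CNL and counts as typical, while one with only TIR and NBS domains is labeled TN and counts as atypical.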

Computational Tools for NBS-LRR Gene Identification and Analysis

The identification and characterization of NBS-LRR genes have been revolutionized by computational biology approaches. Traditional methods relied on domain-based bioinformatics pipelines using tools like InterProScan, HMMER, and MEME to scan genomes for conserved domains and motifs [52]. However, recent advances in machine learning and deep learning have significantly enhanced prediction accuracy.

Emerging Deep Learning Approaches

PRGminer represents a cutting-edge deep learning-based tool specifically designed for accurate prediction of resistance proteins. This tool operates in two phases: Phase I predicts input protein sequences as R-genes or non-R-genes, while Phase II classifies the predicted R-genes into eight different classes [36]. PRGminer achieves remarkable accuracy rates of 95.72% on independent testing in Phase I and 97.21% in Phase II, with Matthews correlation coefficient values of 0.91 and 0.92, respectively [36]. This demonstrates superior performance compared to traditional alignment-based methods, particularly for sequences with low homology.
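Because accuracy alone can be misleading on imbalanced datasets, the Matthews correlation coefficient (MCC) is reported alongside it. The sketch below shows how MCC is computed from a binary confusion matrix. The counts are hypothetical and are not PRGminer's actual test results.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
    ranging from -1 (total disagreement) to +1 (perfect prediction).
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional value when a row or column is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical R-gene vs. non-R-gene predictions on 200 test sequences.
score = mcc(tp=90, tn=95, fp=5, fn=10)
print(round(score, 3))
```

Unlike accuracy, MCC only approaches 1 when both classes are predicted well, which is why it is a useful companion metric for R-gene classifiers.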

Table 2: Computational Tools for NBS-LRR Gene Identification

Tool Name	Approach	Input Data	Key Features	Reference
PRGminer	Deep Learning	Protein sequences	Two-phase classification; 95.72% accuracy	[36]
NLRtracker	Domain-based	Protein sequences	High sensitivity for plant proteomes	[55]
NLR-Annotator	Domain-based	Nucleotide sequences	Suitable for non-Linux users	[55]
PlantNLRatlas	Database	Pre-annotated genomes	68,452 NLRs from 100 plant species	[54]
RefPlantNLR	Database	Experimentally validated	Curated collection of confirmed NLRs	[54]

Comprehensive Databases for NBS-LRR Research

Large-scale datasets have been developed to support comparative investigations of NLRs across diverse plant taxa. The PlantNLRatlas represents one of the most comprehensive resources, containing 68,452 full-length and partial-length NLR genes identified across 100 high-quality plant genomes [56] [54]. This dataset includes 83 eudicots, 10 monocots, and 7 other plants representing 31 orders and 48 families, with an average of 685 NLRs per genome [54]. The extreme variation in NLR numbers between species—from 28 in coriander (Coriandrum sativum) to 3,428 in alfalfa (Medicago sativa)—highlights the diverse evolutionary paths of plant immune systems [54].

Experimental Protocols for Functional Characterization

Phylogenomic Analysis of Conserved Motifs

A step-by-step computational protocol has been developed for identifying evolutionarily conserved motifs in plant NLR proteins, which is essential for understanding their molecular functions [55]. This pipeline can be applied to identify molecular signatures that have remained conserved in the gene family over evolutionary time across plant species.

Figure 1: Workflow for Phylogenomic Analysis of Plant NLR Immune Receptors

The key steps in this protocol include:

Data Acquisition: Download protein sequences from reference genome databases. As a test dataset, proteomes from six representative plant species (Arabidopsis thaliana, Beta vulgaris, Solanum lycopersicum, Nicotiana benthamiana, Oryza sativa, and Hordeum vulgare) can be utilized [55].
NLR Annotation: Annotate NLRs from input protein sequence files using NLRtracker with the command: ./NLRtracker -s NLRtracker_input_protein.fasta -o NLRtracker_output [55]. NLRtracker demonstrates higher sensitivity compared to alternative tools and can detect functionally validated NLRs that might otherwise be missed.
Domain Sequence Parsing: Based on InterProScan results, parse domain sequences from corresponding protein sequences of each identified NB-LRR gene. For genes containing multiple domains, splice them in the order they appear in the gene sequence.
Multiple Sequence Alignment: Perform alignment using Clustal Omega with default parameters [55].
Phylogenetic Tree Construction: Build phylogenetic trees using FastTree with the parameter -lg [55].
Motif Prediction: Identify conserved sequence motifs using the MEME Suite, which can be implemented either through local installation or the online web server [55].

This pipeline has successfully identified conserved sequence motifs such as the MADA and EDVID motifs within the CC-NLR subfamily, providing insights into functionally important regions [55].
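The domain-parsing step of the protocol above, in which multiple domains are spliced in their order of appearance, can be sketched as follows. The coordinates are assumed to be 1-based and inclusive and the domains non-overlapping; the sequence and domain positions are invented.

```python
def splice_domains(protein_seq, domains):
    """Concatenate domain subsequences in the order they occur in the protein.

    domains: list of (name, start, end) with 1-based inclusive coordinates
    (assumed convention); overlapping domains are not handled.
    """
    ordered = sorted(domains, key=lambda d: d[1])  # sort by start position
    return "".join(protein_seq[start - 1:end] for _name, start, end in ordered)


# Hypothetical 16-residue protein with two annotated domains listed
# out of order, as they might appear in an unsorted annotation table.
sequence = "MMMMKKKKLLLLVVVV"
domains = [("LRR", 9, 12), ("NB-ARC", 3, 6)]
print(splice_domains(sequence, domains))
```

Sorting by start coordinate before joining ensures that the spliced sequence preserves the N- to C-terminal order required for a meaningful alignment.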

Map-Based Cloning of Functional NBS-LRR Genes

The identification of Ym1, a wheat CC-NBS-LRR protein that confers resistance to wheat yellow mosaic virus (WYMV), exemplifies a sophisticated gene cloning approach [57]. The experimental workflow involved:

Figure 2: Map-Based Cloning Workflow for Wheat Ym1 Gene

Genetic Population Development: Create a double hybrid F1 cross using Yining Xiaomai (YNXM, the donor of Ym1), 2011I-78 (WYMV susceptible), and Chinese Spring ph1b mutant (WYMV susceptible) [57]. The ph1b mutation promotes homoeologous recombination, overcoming the challenge of recombination suppression between alien introgression fragments and their wheat counterparts.
Fine-Mapping: Identify plants heterozygous for Ym1 and homozygous for the ph1b mutant gene using diagnostic markers. Genotype 326 BC1F2 individuals with flanking markers InDelM41 and InDelM412 to screen for recombinants [57].
Candidate Gene Identification: Narrow the Ym1 locus to a physical interval flanked by markers 2ESTK2 and InDel_FA192, corresponding to a 5.6 Mbp region containing 65-73 annotated genes [57]. Use reciprocal BLAST sequence alignment to compare annotated genes in resistant and susceptible varieties.
Functional Validation: Validate the candidate gene through knockdown/knockout experiments that compromise WYMV resistance, and overexpression studies that enhance WYMV resistance in wheat [57].

This comprehensive approach led to the successful isolation of Ym1, which recognizes the WYMV coat protein and activates resistance by triggering hypersensitive responses [57].
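The recombinant screen in the fine-mapping step reduces to a simple comparison: an individual is informative when its genotype differs between the two flanking markers. The sketch below uses the marker names from the protocol but entirely hypothetical plants and genotype calls (A, B, or H for heterozygous).

```python
def find_recombinants(genotypes, left_marker, right_marker):
    """Return sorted IDs of plants whose calls differ between flanking markers.

    genotypes: dict mapping plant ID to a dict of marker name -> call
    ('A', 'B', or 'H'); this layout is an assumption for illustration.
    """
    return sorted(
        plant_id
        for plant_id, calls in genotypes.items()
        if calls[left_marker] != calls[right_marker]
    )


# Hypothetical genotype calls for four individuals.
genotypes = {
    "P1": {"InDelM41": "A", "InDelM412": "A"},
    "P2": {"InDelM41": "A", "InDelM412": "H"},  # crossover in the interval
    "P3": {"InDelM41": "H", "InDelM412": "H"},
    "P4": {"InDelM41": "B", "InDelM412": "H"},  # crossover in the interval
}
recombinants = find_recombinants(genotypes, "InDelM41", "InDelM412")
print(recombinants)
```

Only the recombinant plants are carried forward for genotyping with additional markers inside the interval, which is how the candidate region is progressively narrowed.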

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 3: Essential Research Reagents for NBS-LRR Gene Characterization

Reagent/Resource	Specification	Application	Example/Reference
Plant Materials	Resistant and susceptible cultivars; Mapping populations	Genetic analysis and phenotyping	BR 18-Terena wheat [58]
Pathogen Isolates	Characterized strains with known effectors	Disease assays and Avr-R interaction studies	MoT isolate BR32 [58]
Genotyping Markers	SSR, InDel, CAPS markers derived from SNPs	Genetic mapping and recombinant screening	MAT, MGM301, 1338.1.2, 1106.3.1 [59]
Cloning Vectors	pBluescript II SK(+), pSH75 (hygromycin resistance)	Gene cloning and transformation	[59]
HMM Profiles	NB-ARC (PF00931) from Pfam database	Initial identification of NBS domains	[53]
Software Tools	NLRtracker, MEME Suite, InterProScan	Bioinformatics analysis of NLR genes	[55]
Database Resources	PlantNLRatlas, RefPlantNLR	Comparative genomics and reference data	68,452 NLRs from 100 plants [54]

Case Studies: NBS-LRR Genes in Crop Improvement

Wheat Blast Resistance

Wheat blast, caused by the Triticum pathotype of the fungus Magnaporthe oryzae (MoT), is a devastating disease that has spread from South America to Bangladesh and India, posing a global threat to wheat production [59] [58]. Several resistance genes have been identified in wheat, including Rmg2, Rmg3, Rmg7, and Rmg8, with Rmg7 and Rmg8 providing resistance at both seedling and heading stages [58].

Notably, Rmg8 (on chromosome 2B in hexaploid wheat) and Rmg7 (on chromosome 2A in tetraploid wheat) both recognize the same avirulence gene AVR-Rmg8, suggesting these resistance genes are equivalent from a breeding perspective [59]. The corresponding avirulence gene AVR-Rmg8 was isolated from a wheat blast isolate through a map-based strategy, encoding a small protein containing a putative signal peptide [59].

The Brazilian wheat cultivar BR 18-Terena represents an important source of quantitative resistance to wheat blast, with genetic analysis revealing nine quantitative trait loci (QTL) associated with resistance at either seedling or heading stages [58]. This resistance is largely tissue-specific, with different QTL providing protection at different developmental stages, highlighting the complexity of breeding for comprehensive disease resistance [58].

Wheat Yellow Mosaic Virus Resistance

The recent cloning of Ym1 represents a significant advance in controlling wheat yellow mosaic virus (WYMV), a soil-borne pathogen that threatens over 2.2 million hectares of wheat-growing area in China and causes yield reductions of 30% to 70% [57].

Ym1 encodes a typical CC-NBS-LRR type R protein that is specifically expressed in roots and induced upon WYMV infection [57]. The resistance mechanism involves Ym1 recognizing and interacting with the WYMV coat protein (CP), which leads to nucleocytoplasmic redistribution—a process that transitions Ym1 from an auto-inhibited to an activated state [57]. This activation subsequently elicits hypersensitive responses and establishes WYMV resistance by likely blocking viral transmission from the root cortex into steles, thereby preventing systemic movement to aerial tissues [57].

Ym1 has been introgressed from the sub-genome Xn or Xc of polyploid Aegilops species into common wheat, demonstrating the value of harnessing wild relatives for crop improvement [57]. This gene is the most widely utilized source for WYMV resistance control in worldwide wheat breeding programs.

Breeding Applications and Future Perspectives

The identification and functional characterization of NBS-LRR genes have direct applications in marker-assisted selection (MAS) and genetic engineering for disease-resistant crop varieties. The findings from BR 18-Terena have enabled haplotype analysis of 100 Brazilian wheat cultivars, revealing that 11.0% already possess a BR 18-Terena-like haplotype for more than one of the identified heading stage QTL [58]. This facilitates targeted breeding efforts to combine multiple resistance QTL for more durable resistance.

Future perspectives in this field include:

Gene Pyramiding: Combining multiple NBS-LRR genes with different recognition specificities to develop cultivars with broader and more durable resistance [52].
Engineered NLRs: Modifying the LRR domains of NLR proteins to recognize new pathogen effectors, creating synthetic resistance genes [52].
Network Biology Approaches: Understanding how NLR proteins function within immune signaling networks, including interactions between helper NLRs and sensor NLRs [54].
Pan-NLRome Characterization: Leveraging multiple reference genomes for key crops to understand the complete diversity of NLR genes within species [54].

As genomic technologies continue to advance, the integration of evolutionary insights with functional characterization will accelerate the development of disease-resistant cultivars, reducing reliance on chemical pesticides and enhancing global food security. The rich diversity of NBS-LRR genes across land plants represents a vast resource for crop improvement that we are only beginning to tap systematically.

Overcoming Challenges in NBS Gene Research: Annotation, Regulation, and Balance

Addressing Annotation Challenges in Complex R-gene Clusters

Plant nucleotide-binding site-leucine-rich repeat (NBS-LRR) genes constitute one of the largest and most diverse gene families involved in disease resistance, presenting substantial annotation challenges due to their complex genomic architecture. Recent studies have identified 12,820 NBS-domain-containing genes across 34 plant species, classified into 168 distinct classes with both classical and species-specific structural patterns [3]. The pepper (Capsicum annuum L.) genome alone contains 252 NBS-LRR resistance genes distributed unevenly across all chromosomes, with 54% forming 47 gene clusters driven by tandem duplications and genomic rearrangements [22]. In tung trees, comparative analysis revealed 239 NBS-LRR genes across two Vernicia species, with 90 in the Fusarium wilt-susceptible V. fordii and 149 in the resistant V. montana [60].

This remarkable diversity, coupled with the clustered organization of these genes, creates significant obstacles for accurate genome annotation. Standard annotation pipelines frequently fragment R-gene predictions due to their repetitive nature and sequence similarity, while their typically low expression levels complicate transcriptome-based annotation approaches [36]. The annotation challenge is further compounded by the fact that R-genes can be mistaken for repetitive sequences, causing public databases for transposable elements to obscure R-gene detection during genome annotation processes [36].

Core Challenges in R-gene Cluster Annotation

Structural Diversity and Genomic Organization

The NBS-LRR gene family exhibits extraordinary structural diversity, encompassing significant variations in domain architecture across plant species. These genes typically encode large proteins ranging from approximately 860 to 1,900 amino acids with at least four distinct domains: a variable amino-terminal domain, the NBS domain, the LRR region, and variable carboxy-terminal domains [61]. Based on their N-terminal domains, NBS-LRR genes are classified into two major subfamilies: TIR-NBS-LRR (TNL) and CC-NBS-LRR (CNL), the latter also referred to as non-TIR-NBS-LRR (nTNL) [22].
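
The N-terminal classification rule described above can be expressed as a small function. This is a sketch under stated assumptions: the domain labels ("TIR", "CC", "NBS", "LRR") are hypothetical outputs of an upstream domain scan, and the short codes for truncated architectures (such as "CN" or "NL") are a common shorthand rather than a scheme taken from the cited studies.

```python
def classify_nbs_gene(domains):
    """Classify a gene from its ordered list of domain labels.

    Labels are assumed to be given in N- to C-terminal order.
    """
    if "NBS" not in domains:
        return "non-NBS"
    nbs_index = domains.index("NBS")
    n_terminal = domains[:nbs_index]
    if "TIR" in n_terminal:
        prefix = "T"
    elif "CC" in n_terminal:
        prefix = "C"
    else:
        prefix = ""
    # An LRR region only counts if it lies C-terminal to the NBS domain
    suffix = "L" if "LRR" in domains[nbs_index:] else ""
    return prefix + "N" + suffix


examples = {
    "geneA": ["TIR", "NBS", "LRR"],
    "geneB": ["CC", "NBS", "LRR"],
    "geneC": ["NBS", "LRR"],
}
classes = {gene: classify_nbs_gene(doms) for gene, doms in examples.items()}
```

Under the two-subfamily scheme in the text, everything that is not "TNL" would be pooled as nTNL.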

Table 1: NBS-LRR Gene Distribution in Selected Plant Species

Plant Species	Total NBS-LRR Genes	TNL Genes	CNL/nTNL Genes	Gene Clusters	Reference
Capsicum annuum (pepper)	252	4	248	47 clusters	[22]
Vernicia montana (tung tree)	149	12	137	Not specified	[60]
Vernicia fordii (tung tree)	90	0	90	Not specified	[60]
Arabidopsis thaliana	~150	~62	~88	Multiple	[61]
Oryza sativa (rice)	~400	0	~400	Multiple	[61]

The distribution of these genes across genomes is highly non-random, with significant enrichment in specific genomic regions. For example, in pepper, chromosome 3 harbors the highest number of NBS-LRR genes (38), while chromosomes 2 and 6 contain the lowest number (5 each) [22]. This uneven distribution reflects the lineage-specific adaptations and evolutionary pressures that have shaped R-gene repertoires in different plant species.

Evolutionary Dynamics Complicating Annotation

NBS-LRR genes evolve through a birth-and-death process characterized by frequent gene duplications and losses, resulting in two distinct evolutionary patterns [61]. Type I genes evolve rapidly with frequent gene conversions and are often represented by multiple paralogs in a genome, while Type II genes evolve slowly with rare gene conversion events and typically have fewer paralogs [13]. This heterogeneous evolutionary rate creates substantial challenges for annotation pipelines, particularly when leveraging comparative genomics approaches.

The evolution of NBS-LRR genes is further complicated by their engagement with RNA silencing pathways. Multiple microRNA families target conserved regions within NBS-LRR transcripts, creating an additional layer of regulatory complexity that must be considered during annotation [13]. These miRNAs typically target highly duplicated NBS-LRRs, with duplicated genes from different families periodically giving birth to new miRNAs in a classic example of co-evolution [13].

Advanced Annotation Methodologies and Workflows

Integrated Bioinformatics Pipelines

Specialized bioinformatics pipelines have been developed to address the unique challenges of R-gene annotation. The nf-annotate pipeline represents a comprehensive approach that integrates multiple evidence sources for accurate R-gene prediction [62]. This pipeline employs a structured workflow that combines homology-based prediction, ab initio gene finding, and transcriptomic evidence to generate high-confidence annotations.

Table 2: Key Tools for R-gene Annotation and Their Applications

Tool/Pipeline	Methodology	Key Features	Application Scope	Reference
nf-annotate	Integrated homology-based and evidence-driven	Combines InterProScan, MEME, MAST, Miniprot	Comprehensive R-gene annotation	[62]
BRAKER2	Automated genome annotation with protein evidence	Integrates GeneMark-EP+ and AUGUSTUS	General eukaryotic annotation with R-gene capability	[63]
PRGminer	Deep learning-based prediction	Uses dipeptide composition and convolutional neural networks	R-gene identification and classification	[36]
OrthoFinder	Orthogroup inference	Uses DIAMOND and MCL clustering	Evolutionary analysis of R-gene families	[3]

The nf-annotate pipeline implements several specialized subworkflows for R-gene annotation. The HRP (Homology-based R-gene Prediction) subworkflow begins with protein sequence extraction from genome annotations, followed by domain identification using InterProScan Pfam, NB-LRR extraction, motif analysis with MEME/MAST, and refinement through InterProScan Superfamily [62]. This comprehensive approach enables the identification of both canonical and non-canonical R-genes that might be missed by standard annotation pipelines.
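
The domain-identification step of such a workflow can be illustrated with a short filter over InterProScan tabular output. This is not the nf-annotate implementation; it is a minimal sketch that assumes the standard InterProScan TSV column order (protein ID, MD5, length, analysis, signature accession, description, start, stop, ...) and uses hypothetical records.

```python
import csv
import io

NB_ARC = "PF00931"  # Pfam accession for the NB-ARC domain


def proteins_with_domain(tsv_text, accession=NB_ARC):
    """Return {protein_id: [(start, end), ...]} for Pfam hits to one accession."""
    hits = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 8:
            continue  # skip malformed or truncated lines
        protein_id, analysis, signature = row[0], row[3], row[4]
        if analysis == "Pfam" and signature == accession:
            hits.setdefault(protein_id, []).append((int(row[6]), int(row[7])))
    return hits


# Hypothetical records, for illustration only.
example = (
    "gene1\tmd5a\t920\tPfam\tPF00931\tNB-ARC domain\t160\t440\t1e-60\tT\t01-01-2025\n"
    "gene1\tmd5a\t920\tPfam\tPF13855\tLeucine rich repeat\t600\t660\t1e-10\tT\t01-01-2025\n"
    "gene2\tmd5b\t300\tPfam\tPF00069\tProtein kinase domain\t10\t280\t1e-40\tT\t01-01-2025\n"
)
nbs_candidates = proteins_with_domain(example)
```

The retained protein IDs would then feed the later motif-analysis and refinement stages.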

Deep Learning Approaches for R-gene Identification

Recent advances in deep learning have enabled the development of specialized tools like PRGminer, which implements a two-phase prediction approach for R-gene identification and classification [36]. In Phase I, the tool distinguishes R-genes from non-R-genes with 98.75% accuracy in k-fold testing and 95.72% on independent testing using dipeptide composition features. Phase II classifies the identified R-genes into eight different classes with an overall accuracy of 97.55% in k-fold testing and 97.21% on independent testing [36].

This deep learning approach offers significant advantages over traditional alignment-based methods, particularly for identifying R-genes with low sequence homology to known references. By extracting sequential and convolutional features from raw encoded protein sequences, PRGminer can recognize patterns indicative of R-genes that might be missed by BLAST-based or motif-based approaches [36].
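
Dipeptide composition, the feature type mentioned for Phase I, is simple to compute: it is the frequency of each of the 400 ordered amino acid pairs in a protein. The sketch below shows only this feature extraction, not PRGminer's model, and the handling of non-standard residues (skipping them) is an assumption.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies for a protein."""
    sequence = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Toy peptide, for illustration only.
features = dipeptide_composition("GKTTLAGKTT")
```

Because the vector has a fixed length regardless of protein size, it can be passed directly to a classifier without alignment.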

Experimental Validation and Functional Characterization

Key Experimental Protocols

Functional validation of annotated R-genes typically employs a multi-faceted approach combining expression analysis, genetic variation studies, and functional characterization. A comprehensive protocol for validating NBS-LRR gene predictions includes the following key steps:

Expression Profiling: RNA-seq data from various tissues and stress conditions is analyzed to identify R-genes with responsive expression patterns. For example, in cotton, expression profiling revealed the putative upregulation of specific orthogroups (OG2, OG6, and OG15) in different tissues under various biotic and abiotic stresses, in plants both susceptible and tolerant to cotton leaf curl disease [3]. The retrieved FPKM values are categorized into tissue-specific, abiotic stress-specific, and biotic stress-specific expression profiles to identify context-dependent regulation.

Genetic Variation Analysis: Comparison of resistant and susceptible accessions identifies sequence variants potentially contributing to resistance phenotypes. In cotton, genetic variation analysis between susceptible (Coker 312) and tolerant (Mac7) Gossypium hirsutum accessions identified thousands of unique variants in the NBS genes of Mac7 (6,583 variants) and Coker 312 (5,173 variants) [3].

Protein Interaction Studies: Protein-ligand and protein-protein interaction assays validate the functional potential of annotated R-genes. Research has demonstrated strong interaction of some putative NBS proteins with ADP/ATP and different core proteins of the cotton leaf curl disease virus [3].

Functional Characterization through VIGS: Virus-induced gene silencing (VIGS) provides direct evidence of gene function. In resistant cotton, silencing of GaNBS (OG2) through VIGS demonstrated its putative role in controlling virus titer, supporting its functional importance in disease resistance [3].
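
The genetic variation step above amounts to a set comparison between two accessions. A minimal sketch follows, assuming each variant call has been reduced to a (gene, position, ref, alt) tuple; the calls shown are hypothetical and are not the cotton data.

```python
def compare_variant_sets(tolerant, susceptible):
    """Return variants unique to each accession and those shared by both."""
    tolerant, susceptible = set(tolerant), set(susceptible)
    return {
        "tolerant_only": tolerant - susceptible,
        "susceptible_only": susceptible - tolerant,
        "shared": tolerant & susceptible,
    }


# Hypothetical calls keyed by (gene, position, ref, alt); not real data.
mac7_calls = [
    ("NBS_001", 120, "A", "G"),
    ("NBS_001", 455, "C", "T"),
    ("NBS_014", 88, "G", "A"),
]
coker_calls = [("NBS_001", 455, "C", "T"), ("NBS_020", 301, "T", "C")]

result = compare_variant_sets(mac7_calls, coker_calls)
```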

Research Reagent Solutions for R-gene Studies

Table 3: Essential Research Reagents for R-gene Functional Characterization

Reagent/Resource	Function/Application	Example Use Case	Reference
VIGS Vectors	Virus-induced gene silencing for functional validation	Silencing of GaNBS in cotton to confirm role in virus resistance	[3]
RNA-seq Libraries	Expression profiling under stress conditions	Identifying differentially expressed NBS-LRR genes in tolerant vs susceptible varieties	[3]
OrthoDB Protein Database	Source of protein sequences for homology-based annotation	Providing reference data for BRAKER2 automated annotation	[63]
Pfam Domain Databases	Domain identification and classification	Identifying NB-ARC domains in candidate R-genes	[3]
InterProScan	Integrated domain and motif prediction	Comprehensive domain architecture analysis in nf-annotate pipeline	[62]

Integrated Workflow for Comprehensive R-gene Annotation

R-gene Annotation and Validation Workflow

Addressing the annotation challenges presented by complex R-gene clusters requires specialized bioinformatics approaches that integrate multiple evidence sources and leverage both homology-based and machine learning methods. The remarkable diversity of NBS-LRR genes, with 168 distinct domain architecture classes identified across land plants [3], necessitates moving beyond standard annotation pipelines to specialized workflows that account for their unique genomic organization and evolutionary dynamics.

Future directions in R-gene annotation will likely involve improved integration of long-read sequencing technologies to resolve complex cluster regions, enhanced deep learning models trained on expanding collections of validated R-genes, and more sophisticated evolutionary analysis tools to reconstruct the birth-and-death dynamics that shape these gene families. As these technical challenges are addressed, researchers will be better positioned to leverage the vast diversity of R-genes for crop improvement, with more than 450 R genes already cloned from 42 plant species [64] providing a foundation for engineering disease-resistant crops protected by genetics rather than pesticides.

The continued development of specialized tools and pipelines for R-gene annotation, coupled with experimental validation approaches, will be essential for unlocking the full potential of this remarkable gene family in crop protection and sustainable agriculture.

Transcriptional Regulation and miRNA-Mediated Control of NBS-LRR Expression

Nucleotide-binding site leucine-rich repeat (NBS-LRR) genes constitute the largest family of plant resistance (R) genes, encoding intracellular immune receptors that confer pathogen-specific immunity through effector-triggered immunity (ETI). However, high constitutive expression of NBS-LRR genes incurs significant fitness costs and can be lethal to plant cells, necessitating sophisticated regulatory mechanisms. This technical review examines the evolutionary dynamics and molecular mechanisms governing NBS-LRR expression, focusing on transcriptional regulation and post-transcriptional control mediated by diverse microRNAs (miRNAs). We explore how plants balance the benefits of pathogen recognition against the autotoxicity costs of NBS-LRR overexpression through co-evolutionary networks, with emphasis on the convergent evolution of miRNA families that target conserved NBS-LRR motifs. The comprehensive analysis presented herein integrates genomic, phylogenetic, and experimental perspectives to elucidate the complex regulatory landscape of plant immune receptors.

NBS-LRR genes encode STAND (signal-transduction ATPases with numerous domains) P-loop ATPases that function as central hubs in plant immunity, detecting polymorphic pathogen effectors and initiating robust defense responses [13]. These proteins typically contain three fundamental domains: an N-terminal coiled-coil (CC) or Toll/Interleukin-1 receptor (TIR) domain, a central nucleotide-binding site (NBS) domain that functions as a molecular switch, and a C-terminal leucine-rich repeat (LRR) domain responsible for pathogen recognition [13]. The NBS domain contains conserved motifs including the P-loop, RNBS-A, kinase-2, RNBS-B, RNBS-C, and GLPL, which are essential for ATP/GTP binding and hydrolysis during immune signaling [22].
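
As an illustration of motif-level analysis, the P-loop can be located with a pattern search. The sketch assumes the general Walker A consensus G-x(4)-G-K-[ST]; individual NBS-LRR lineages may use a stricter or slightly different signature, and the peptide below is a toy sequence.

```python
import re

# Walker A (P-loop) consensus: G, any four residues, G, K, then S or T
P_LOOP = re.compile(r"G.{4}GK[ST]")


def find_p_loops(protein):
    """Return (1-based start, matched motif) for each non-overlapping P-loop match."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(protein.upper())]


# Toy peptide, for illustration only.
toy_protein = "MAEAVLSGMGGVGKTTLAQKVY"
p_loops = find_p_loops(toy_protein)
```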

Plant genomes maintain highly variable NBS-LRR repertoires ranging from under 100 to over 1,000 genes, with their sum in a host population defining the detection repertoire for polymorphic pathogen effectors [13]. Two distinct evolutionary patterns characterize NBS-LRR genes: type I genes feature multiple paralogs and rapid evolution with frequent gene conversions, while type II genes have fewer paralogs and evolve slowly with rare gene conversion events [13]. Most NBS-LRRs are organized in genomic clusters generated through tandem duplications and genomic rearrangements [22]. This expansion creates a recognition capacity balancing act—sufficient diversity to detect evolving pathogens without the autotoxicity of overexpression. The fitness costs associated with NBS-LRR maintenance have driven the evolution of multilayer regulatory systems, with miRNA-mediated control representing a crucial mechanism for maintaining this balance [13].

Evolutionary Origins and Diversification of NBS-LRR Regulatory Networks

Phylogenetic Distribution and Lineage-Specific Adaptations

The link between NBS-LRRs and their regulation by small RNAs traces back to gymnosperms, emerging more than 100 million years after the origin of NBS-LRR genes in early land plants like mosses and spike mosses [13]. Comprehensive analyses across land plants reveal that NBS-LRR genes have undergone significant expansion and contraction events throughout plant evolution, with lineage-specific adaptations reflected in their domain architecture and subfamily distribution [3] [41].

Table 1: Evolutionary Distribution of NBS-LRR Subfamilies Across Plant Lineages

Plant Species/Lineage	Total NBS Genes	CNL Subfamily	TNL Subfamily	RNL Subfamily	Notable Characteristics
Arabidopsis thaliana (dicot)	207	~70%	~30%	Present	Balanced subfamily distribution
Oryza sativa (monocot)	505	~100%	Absent	Absent	Complete TNL loss
Triticum aestivum (monocot)	>1,000	~100%	Absent	Absent	Complete TNL loss
Salvia miltiorrhiza (medicinal dicot)	196	61 CNLs	Absent	1 RNL	Near-complete TNL loss
Pinus taeda (gymnosperm)	311	~10%	~89%	~1%	TNL dominance
Capsicum annuum (pepper)	252	248 nTNLs	4 TNLs	Present	Extreme nTNL dominance
Physcomitrella patens (moss)	~25	Mixed	Mixed	Mixed	Limited repertoire

Comparative genomics reveals striking lineage-specific patterns in NBS-LRR evolution. Monocots, including Poaceae family members like rice and wheat, demonstrate complete loss of TNL genes, while most dicots maintain both CNL and TNL subfamilies [41] [65]. However, exceptions exist even within dicots, with species like Mimulus guttatus and Salvia miltiorrhiza showing near-complete TNL loss [41] [65]. These distribution patterns reflect deep evolutionary pressures that have shaped immune receptor repertoires, possibly influencing concomitant miRNA regulator evolution.

Genomic Organization and Cluster Dynamics

NBS-LRR genes exhibit non-random genomic distribution, with a large share of family members typically located in physical clusters. Pepper (Capsicum annuum) exemplifies this pattern, with 136 of 252 NBS-LRR genes (54%) forming 47 gene clusters distributed across all chromosomes [22]. Chromosome 3 contains the highest concentration with 10 clusters, including the largest 8-gene cluster, while chromosome 6 contains no clusters [22]. Cluster members typically belong to the same gene subfamily, though mixed-cluster organizations also occur, suggesting complex evolutionary histories involving local duplications and rearrangements [22].
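
Cluster statistics of this kind depend on an explicit clustering rule. The sketch below groups genes on the same chromosome whenever neighbouring genes start within a maximum gap. The 200 kb threshold, the minimum cluster size of two, and the gene coordinates are all assumptions for illustration; the cited pepper study may define clusters differently.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=200_000, min_size=2):
    """Group genes whose neighbours on the same chromosome lie within max_gap bp.

    genes: iterable of (gene_id, chromosome, start) tuples.
    Returns a list of (chromosome, [gene_ids]) for clusters of at least min_size.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))
    clusters = []
    for chrom in sorted(by_chrom):
        current, last_start = [], None
        for start, gene_id in sorted(by_chrom[chrom]):
            # A large gap closes the current group and starts a new one
            if current and start - last_start > max_gap:
                if len(current) >= min_size:
                    clusters.append((chrom, current))
                current = []
            current.append(gene_id)
            last_start = start
        if len(current) >= min_size:
            clusters.append((chrom, current))
    return clusters


# Hypothetical coordinates, for illustration only.
genes = [
    ("g1", "chr3", 100_000), ("g2", "chr3", 180_000), ("g3", "chr3", 350_000),
    ("g4", "chr3", 2_000_000), ("g5", "chr6", 500_000),
]
clusters = find_clusters(genes)
clustered_fraction = sum(len(members) for _, members in clusters) / len(genes)
```

Changing `max_gap` changes both the number of clusters and the clustered fraction, so reported percentages are only comparable between studies that use the same rule.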

The correlation between cluster size and NBS-LRR numbers implies that tandem duplication represents a key mechanism for immune receptor diversification [65]. This clustering has profound implications for regulation, as duplicated NBS-LRRs from different gene families periodically give birth to new miRNAs, creating localized regulatory networks [13]. The birth of new miRNAs typically occurs through inverted duplication of target gene sequences, with subsequent mutations refining precursor processing and target specificity [13].

miRNA-Mediated Regulatory Mechanisms

miRNA Families Targeting NBS-LRR Genes and Their Evolutionary Conservation

At least eight families of miRNAs targeting NBS-LRRs have been identified across plant species, with the miR482/2118 superfamily representing the most deeply conserved [13]. These miRNAs typically target highly duplicated NBS-LRRs, while families of heterogeneous NBS-LRRs are rarely targeted in Poaceae and Brassicaceae genomes [13]. The tight association between NBS-LRR diversity and miRNA regulation represents a co-evolutionary adaptation allowing plants to maintain expansive immune receptor repertoires while mitigating fitness costs.

Table 2: Characterized miRNA Families Targeting NBS-LRR Genes

miRNA Family	Target Site	Conservation	Representative Functions
miR482/2118	P-loop motif	Gymnosperms to dicots	Targets multiple NBS-LRR lineages; triggers phasiRNA production
miR5300	P-loop motif	Specific lineages	Secondary layer of NBS-LRR regulation
miR6019	TIR domain	Specific lineages	TNL-specific regulation
miR6020	TIR domain	Specific lineages	TNL-specific regulation
miR7122	Multiple sites	Specific lineages	Family-specific NBS-LRR regulation
miR2118-3p	LRR domains	Common bean	Differential expression during fungal infection
miR5374	LRR domains	Common bean	Modulation during anthracnose infection
tae-miR1714	LysM receptors	Wheat	Novel regulator targeting non-NBS-LRR immune receptor

Most newly emerged miRNAs target the same conserved, encoded protein motifs of NBS-LRRs, particularly the P-loop region, consistent with convergent evolution [13]. This targeting strategy allows single miRNAs to regulate multiple NBS-LRR lineages, providing broad regulatory potential with minimal genetic investment. The conservation of these miRNAs from gymnosperms to dicots indicates they originated prior to the emergence of angiosperms [13].

Molecular Mechanisms of miRNA Action and Regulatory Networks

Plant miRNAs typically exhibit extensive complementarity to their target sequences, enabling transcript cleavage or translational repression through Argonaute (AGO) protein-containing RISC complexes [66]. Two primary mechanisms govern miRNA-mediated regulation: transcript cleavage reduces specific mRNA levels, while translation repression decreases protein accumulation without substantial transcript reduction [66]. The binding accessibility of target sites within mRNA molecules significantly influences regulatory efficacy, with flanking sequences playing crucial roles in allowing or restricting miRNA access [13].

A specialized regulatory mechanism involves 22-nt miRNAs, which trigger the generation of phased secondary siRNAs (phasiRNAs) from their target mRNAs [13]. This amplification system creates a robust regulatory cascade, particularly effective for large gene families like NBS-LRRs. In this process, 22-nt miRNAs (often resulting from precursors containing asymmetrical bulges) initiate phased siRNA production from NBS-LRR transcripts, generating secondary silencing signals that reinforce the primary miRNA regulation [13].
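
The phased pattern can be made concrete with a small calculation. The sketch assumes 21-nt phasing counted from the first nucleotide downstream of the miRNA cleavage site on the transcript strand; it ignores the strand offset caused by 3' overhangs on the duplex, and the positions used are hypothetical.

```python
def phased_windows(transcript_length, cleavage_site, phase=21, max_windows=5):
    """Return 1-based (start, end) windows in register with the cleavage site.

    cleavage_site is the first nucleotide downstream of the cut.
    """
    windows = []
    start = cleavage_site
    while start + phase - 1 <= transcript_length and len(windows) < max_windows:
        windows.append((start, start + phase - 1))
        start += phase
    return windows


def in_phase(read_start, cleavage_site, phase=21):
    """True if a small RNA starting at read_start falls in the phased register."""
    return (read_start - cleavage_site) % phase == 0


# Hypothetical 100-nt transcript cleaved so that position 31 starts the register.
windows = phased_windows(100, 31)
```

Counting how many sequenced small RNAs satisfy `in_phase` relative to those that do not is one simple way to judge whether a locus looks phased.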

Figure 1: miRNA-Mediated Regulatory Mechanisms for NBS-LRR Genes. miRNAs guide RISC complexes to complementary NBS-LRR mRNAs, leading to transcript cleavage or translation repression. Twenty-two nucleotide miRNAs can trigger phasiRNA production, creating an amplified silencing cascade.

Experimental Approaches for Investigating miRNA-NBS-LRR Interactions

Comprehensive Identification and Expression Profiling

Cutting-edge methodologies enable researchers to dissect the complex regulatory relationships between miRNAs and their NBS-LRR targets. The following integrated workflow represents state-of-the-art approaches for characterizing these interactions:

Figure 2: Integrated Workflow for miRNA-NBS-LRR Interaction Analysis. Comprehensive approach combining high-throughput sequencing, bioinformatic prediction, and experimental validation.

Step 1: High-Throughput Sequencing

Small RNA Sequencing: Isolate small RNAs (<30 nt) from treated and control tissues using specialized kits (e.g., miRNeasy Micro Kit). Construct libraries with the NEBNext Small RNA Library Prep Set and sequence on Illumina platforms [67] [68]. Preprocess data by removing adapters (Cutadapt v1.14), filtering by size (15-41 nt) and quality (Q20), and removing ambiguous reads [68].
Degradome Sequencing: Perform parallel analysis of degraded mRNA 5' ends to identify miRNA-mediated cleavage products. This approach provides transcriptome-wide evidence of miRNA targets.
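
The size, quality, and ambiguity filters from the preprocessing step can be sketched as follows. The sketch assumes Phred+33 quality encoding and interprets the Q20 filter as a minimum per-base quality; the cited protocol may instead apply a mean-quality or windowed criterion.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(char) - offset for char in quality_string]


def keep_read(sequence, quality_string, min_len=15, max_len=41, min_q=20):
    """Apply length, ambiguity, and quality filters to one trimmed read."""
    if not (min_len <= len(sequence) <= max_len):
        return False
    if "N" in sequence.upper():  # ambiguous base calls
        return False
    return min(phred_scores(quality_string)) >= min_q


# Hypothetical adapter-trimmed reads: (sequence, quality string).
reads = [
    ("ACGT" * 5, "I" * 20),         # 20 nt, all bases Q40
    ("ACGT" * 2, "I" * 8),          # too short
    ("ACGTN" * 4, "I" * 20),        # contains ambiguous bases
    ("ACGT" * 5, "I" * 19 + "+"),   # last base is Q10
]
kept = [seq for seq, qual in reads if keep_read(seq, qual)]
```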

Step 2: Bioinformatic Identification and Prediction

miRNA Identification: Map sequenced small RNAs to reference genomes using Bowtie or similar aligners. Identify known miRNAs through database matching (miRBase) and novel miRNAs through characteristic hairpin structure prediction [67].
Target Prediction: Utilize specialized algorithms (psRobot, TargetFinder) with strict parameters to identify potential NBS-LRR targets. Focus on complementarity to miRNA seed regions and conserved NBS-LRR motifs [67].
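
The core idea behind target prediction, searching transcripts for sites nearly complementary to a miRNA, can be shown with a simple mismatch scan. This is not the scoring scheme of psRobot or TargetFinder: it counts plain mismatches, ignores G:U wobble pairs and position-dependent penalties, and the mismatch threshold and sequences are assumptions.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA string written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def find_target_sites(mirna, transcript, max_mismatches=3):
    """Return (1-based start, mismatch count) for near-complementary sites."""
    mirna = mirna.upper().replace("T", "U")
    transcript = transcript.upper().replace("T", "U")
    perfect_site = reverse_complement(mirna)
    size = len(perfect_site)
    sites = []
    for i in range(len(transcript) - size + 1):
        window = transcript[i:i + size]
        mismatches = sum(a != b for a, b in zip(window, perfect_site))
        if mismatches <= max_mismatches:
            sites.append((i + 1, mismatches))
    return sites


toy_mirna = "AUGCCGUAAGCUUACGGAUCA"  # arbitrary 21-nt sequence, not a real miRNA
toy_transcript = "GGGAAA" + reverse_complement(toy_mirna) + "CCC"
sites = find_target_sites(toy_mirna, toy_transcript)
```

Candidate sites from a scan like this would still need degradome or reporter evidence before being treated as genuine targets.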

Step 3: Expression Correlation Analysis

Paired mRNA-seq: Process mRNA sequencing data from the same samples to quantify NBS-LRR expression. Align reads to reference genomes, calculate normalized counts (FPKM/TPM), and identify differentially expressed genes [3].
Statistical Correlation: Perform correlation analysis between miRNA and target NBS-LRR expression patterns. Inverse expression relationships suggest functional regulation.
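
The correlation step can be sketched with a plain Pearson coefficient across matched samples. The expression values below are hypothetical, and in practice a rank-based statistic or a formal significance test may be preferable for small sample sizes.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical normalized expression across four matched samples.
mirna_expression = [1.0, 2.0, 3.0, 4.0]
target_expression = [8.0, 6.0, 4.0, 2.0]
r = pearson(mirna_expression, target_expression)
```

A strongly negative coefficient is consistent with miRNA-mediated repression but does not by itself establish a direct interaction.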

Functional Validation Techniques

Virus-Induced Gene Silencing (VIGS)

The barley stripe mosaic virus (BSMV) VIGS system enables functional characterization of miRNA-NBS-LRR interactions in plants [67]. For miRNA silencing, engineer constructs expressing short tandem target mimics (STTMs) that sequester endogenous miRNAs. For overexpression, clone pre-miRNA sequences into viral vectors [67]. Key steps include:

Vector Preparation: Modify BSMV vectors to accommodate miRNA sequences or target mimics.
In Vitro Transcription: Generate infectious RNA transcripts from linearized plasmids.
Plant Inoculation: Rub transcript mixtures onto leaves at multiple growth stages.
Phenotypic Assessment: Evaluate disease symptoms, pathogen growth, and hypersensitive response.
Molecular Validation: Confirm miRNA modulation and target derepression via RT-qPCR.
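
For the RT-qPCR validation step, relative expression is commonly summarized with the 2^-ΔΔCt method. The sketch below assumes equal amplification efficiency for the target and reference genes, and the Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method."""
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_treated - delta_control)


# Hypothetical Ct values: an NBS-LRR target in STTM-treated vs. control plants.
fold_change = relative_expression(
    ct_target_treated=22.0, ct_ref_treated=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
```

A fold change above 1 for the target in miRNA-silenced plants would be consistent with derepression.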

Dual-Luciferase and Fluorescent Reporter Assays

Validate direct miRNA-target interactions through heterologous expression systems:

Vector Construction: Clone candidate target sequences (including miRNA binding sites) downstream of reporter genes (YFP, luciferase) in plant expression vectors [67].
Co-transformation: Introduce reporter constructs alongside miRNA overexpression vectors into Nicotiana benthamiana leaves via Agrobacterium infiltration.
Quantitative Measurement: Assess fluorescence intensity via confocal microscopy or luminescence using dual-luciferase assays [67] [68].
Statistical Analysis: Compare reporter activity between miRNA-expressing and control samples to confirm targeting.
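
The quantitative comparison in a dual-luciferase assay can be sketched as a ratio of normalized signals. The sketch assumes the firefly reporter carries the candidate target site and Renilla serves as the internal control. The readings are hypothetical, and a real analysis would add replicate-level statistics.

```python
from statistics import mean


def normalized_ratios(firefly, renilla):
    """Firefly signal divided by Renilla signal for each replicate."""
    return [f / r for f, r in zip(firefly, renilla)]


def relative_reporter_activity(test, control):
    """Mean normalized ratio of the test group relative to the control group.

    Each argument is a (firefly_readings, renilla_readings) pair.
    """
    return mean(normalized_ratios(*test)) / mean(normalized_ratios(*control))


# Hypothetical luminescence readings from three replicates per group.
control_group = ([1000.0, 1200.0, 1100.0], [500.0, 600.0, 550.0])
mirna_group = ([400.0, 480.0, 440.0], [500.0, 600.0, 550.0])
activity = relative_reporter_activity(mirna_group, control_group)
```

A value well below 1 indicates reduced reporter output when the miRNA is co-expressed, as expected if the cloned site is targeted.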

Genetic and Transgenic Approaches

Stable Transformation: Generate transgenic plants overexpressing miRNAs or resistant NBS-LRR variants with silent mutations in miRNA target sites.
CRISPR/Cas9 Mutagenesis: Create knockout mutations in miRNA genes or their binding sites within NBS-LRR transcripts to disrupt regulation.

Research Reagent Solutions for miRNA-NBS-LRR Studies

Table 3: Essential Research Reagents and Resources

Reagent/Resource	Specific Application	Function and Utility	Example Products/Codes
miRNeasy Micro Kit	Small RNA extraction	Isolation of high-quality small RNAs from plant tissues	QIAGEN 217084
NEBNext Small RNA Library Prep Set	sRNA library construction	Preparation of sequencing libraries from small RNAs	NEB E7330S
ExoQuick Plasma Kit	Exosome isolation	Extraction of circulating exosomes for cross-kingdom studies	SBI EXOQ5A-1
BSMV VIGS System	Functional validation	Virus-induced gene silencing for miRNA and target characterization	Custom vectors
Dual-Luciferase Reporter System	Target validation	Quantitative measurement of miRNA-target interactions	Promega E1910
Agrobacterium tumefaciens	Transient transformation	Delivery of genetic constructs into plant tissues	GV3101, LBA4404
PRGminer	R-gene prediction	Deep learning-based identification of resistance genes	https://kaabil.net/prgminer/
psRobot	miRNA target prediction	Bioinformatics tool for plant miRNA target identification	http://omicslab.genetics.ac.cn/psRobot/

The intricate regulatory networks controlling NBS-LRR expression represent evolutionary solutions to the fundamental challenge of maintaining effective immunity without autotoxicity. The co-evolution of NBS-LRR genes and their miRNA regulators has created a dynamic system that balances detection capacity against fitness costs, enabling plants to adapt to evolving pathogen pressures. The convergent evolution of miRNAs targeting conserved NBS-LRR motifs demonstrates the power of natural selection to arrive at similar regulatory solutions across plant lineages.

Future research directions should focus on several key areas: (1) exploring the potential for engineered miRNAs to enhance crop resistance without yield penalties, (2) investigating cross-kingdom RNA regulation as a potential mechanism for pathogen manipulation of host immunity, and (3) developing computational models that predict regulatory outcomes from miRNA-NBS-LRR interaction networks. The integration of multi-omics approaches with advanced gene editing technologies will further illuminate this complex regulatory landscape, potentially enabling precise manipulation of plant immunity for sustainable agriculture.

Understanding miRNA-mediated control of NBS-LRR expression extends beyond fundamental plant biology, offering practical applications in crop improvement and disease management. As climate change and global trade accelerate pathogen spread, leveraging these natural regulatory mechanisms may prove essential for developing durable resistance in crop plants.

Balancing Fitness Costs and Defense Benefits in NBS Gene Maintenance

The nucleotide-binding site-leucine-rich repeat (NBS-LRR) gene family constitutes the most prevalent class of disease resistance (R) genes in plants, playing a critical role in effector-triggered immunity. The evolutionary maintenance of these genes represents a fundamental trade-off between the fitness benefits of pathogen resistance and the costs associated with gene expression and function. This whitepaper synthesizes current research on the selective pressures acting on NBS-encoding genes, examining the molecular signatures of purifying and balancing selection, genomic distribution patterns, and the experimental methodologies used to characterize these evolutionary dynamics. Within the broader context of land plant evolution, understanding these mechanisms provides crucial insights into plant-pathogen co-evolution and informs strategies for developing durable disease resistance in crops.

Plants employ a sophisticated two-layered immune system to defend against pathogens. The first layer, pattern-triggered immunity (PTI), relies on cell-surface receptors that detect conserved microbial signatures. The second layer, effector-triggered immunity (ETI), is primarily mediated by NBS-LRR proteins that recognize pathogen-secreted effectors, often activating a hypersensitive response and programmed cell death [41]. Approximately 80% of functionally characterized R genes belong to the NBS-LRR family, making them a central component of the plant immune system [41].

The NBS domain serves as a molecular switch, binding and hydrolyzing ATP to activate downstream immune signaling, while the LRR domain is responsible for recognizing diverse effectors released by pathogens [41]. This gene family exhibits remarkable plasticity, with copy numbers varying significantly across plant species—from approximately 150 in Arabidopsis to almost 500 in rice [69]. This rapid copy number evolution is driven by repeated cycles of duplication, divergence, and eventual loss via pseudogene formation or deletion in response to diverse pathogenic pressures [69].

The maintenance of this extensive genetic arsenal involves significant fitness costs, creating an evolutionary trade-off that shapes NBS gene diversity within plant genomes. This review examines the mechanisms balancing these costs and benefits through integrated molecular, genomic, and ecological perspectives.

Genomic Distribution and Diversity Patterns

Genomic Organization of NBS-LRR Genes

NBS-encoding genes display non-random distribution patterns across plant genomes, often clustering in specific chromosomal regions. In sorghum, over 60% of NBS-encoding genes are located on just three chromosomes (SBI-02, SBI-05, and SBI-08), with approximately 68.7% arranged in clustered configurations [69]. Similar clustering patterns occur in radish, where 72% of NBS-encoding genes form 48 clusters distributed across 24 crucifer blocks on chromosomes [70].

This clustered organization facilitates evolutionary plasticity through mechanisms such as tandem duplication. Comparative analyses reveal that NBS-LRR genes are significantly enriched in regions containing fungal pathogen resistance quantitative trait loci (QTL), highlighting their functional importance in disease resistance [69].
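To make the notion of a "cluster" concrete, the following minimal sketch groups genes by physical proximity. The cited studies do not state their clustering criteria here, so the 200 kb `max_gap` threshold, the function name `find_clusters`, and the gene coordinates are all illustrative assumptions rather than the published method.

```python
from collections import defaultdict

def find_clusters(genes, max_gap=200_000, min_size=2):
    """Group genes on the same chromosome into clusters.

    genes: iterable of (gene_id, chromosome, start, end) tuples.
    max_gap: assumed maximum distance (bp) between neighbouring genes.
    min_size: assumed minimum number of genes that counts as a cluster.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, gene_id))

    clusters = []
    for chrom, items in by_chrom.items():
        items.sort()  # order genes by start coordinate
        current = [items[0]]
        for item in items[1:]:
            prev_end = current[-1][1]
            if item[0] - prev_end <= max_gap:
                current.append(item)  # close enough: extend the cluster
            else:
                if len(current) >= min_size:
                    clusters.append((chrom, [g for _, _, g in current]))
                current = [item]  # start a new candidate cluster
        if len(current) >= min_size:
            clusters.append((chrom, [g for _, _, g in current]))
    return clusters

# Hypothetical coordinates, for illustration only
genes = [
    ("g1", "chr5", 1_000, 5_000),
    ("g2", "chr5", 60_000, 64_000),
    ("g3", "chr5", 900_000, 905_000),
    ("g4", "chr8", 10_000, 14_000),
    ("g5", "chr8", 150_000, 155_000),
    ("g6", "chr2", 5_000, 9_000),
]
clusters = find_clusters(genes)
clustered_fraction = sum(len(ids) for _, ids in clusters) / len(genes)
```

The clustered fraction computed this way is the kind of summary statistic reported above for sorghum and radish, although the exact figure depends heavily on the chosen distance threshold.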

Table 1: NBS-LRR Gene Distribution Across Plant Species

Plant Species	Total NBS Genes	Notable Distribution Features	Reference
Sorghum (Sorghum bicolor)	346	60% on 3 chromosomes; 68.7% in clusters	[69]
Radish (Raphanus sativus)	225	72% clustered in 48 groups across chromosomes	[70]
Salvia (Salvia miltiorrhiza)	196	62 typical NLRs with complete domains	[41]
Vernicia montana	149	Higher numbers on Vmchr2, Vmchr7, Vmchr11	[60]
Vernicia fordii	90	Higher numbers on Vfchr2, Vfchr3, Vfchr9	[60]

Structural Diversity and Classification

NBS-LRR genes are classified based on their domain architecture, primarily according to their N-terminal domains:

TNL: Contains Toll/interleukin-1 receptor (TIR) domain
CNL: Contains coiled-coil (CC) domain
RNL: Contains resistance to powdery mildew 8 (RPW8) domain

Additional structural variations include partial genes lacking complete domains (e.g., TN, CN, NL) [70]. The distribution of these subfamilies varies significantly across plant lineages, with TNL subfamilies markedly reduced or absent in certain species. In Salvia miltiorrhiza, for instance, the 62 typical NLRs include 61 CNLs and only 1 RNL protein, with complete absence of TNL subfamilies [41]. Similarly, monocotyledonous species like rice have completely lost TNL and RNL subfamilies [41].
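The naming scheme above can be expressed as a small lookup on the set of domains detected in a protein. This is only a sketch: the domain labels ("TIR", "CC", "RPW8", "NBS", "LRR") and the priority order applied when more than one N-terminal domain is present are assumptions made for illustration.

```python
def classify_nbs(domains):
    """Return a subfamily label (e.g. TNL, CNL, RNL, TN, CN, NL, N).

    domains: iterable of domain labels detected in one protein.
    The TIR > RPW8 > CC priority is an arbitrary choice for this sketch.
    """
    found = set(domains)
    if "NBS" not in found:
        return "non-NBS"
    if "TIR" in found:
        prefix = "T"
    elif "RPW8" in found:
        prefix = "R"
    elif "CC" in found:
        prefix = "C"
    else:
        prefix = ""
    suffix = "L" if "LRR" in found else ""
    return prefix + "N" + suffix

labels = {
    "full_tnl": classify_nbs(["TIR", "NBS", "LRR"]),
    "full_rnl": classify_nbs(["RPW8", "NBS", "LRR"]),
    "partial_cn": classify_nbs(["CC", "NBS"]),
    "partial_nl": classify_nbs(["NBS", "LRR"]),
    "nbs_only": classify_nbs(["NBS"]),
}
```

Labels without a trailing "L" or without an N-terminal prefix correspond to the partial genes (TN, CN, NL) mentioned above.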

Table 2: NBS-LRR Gene Classification Across Species

Species	TNL	CNL	RNL	Partial/Other	Total	Reference
Raphanus sativus	80	19	0	126	225	[70]
Arabidopsis thaliana	75	-	-	89	164	[70]
Vernicia montana	12	98	-	39	149	[60]
Vernicia fordii	0	49	-	41	90	[60]
Sorghum bicolor	0	24	-	322	346	[69]

Evolutionary Mechanisms and Selection Signatures

Contrasting Selection Pressures

NBS-encoding genes exhibit molecular signatures of contrasting evolutionary processes. In sorghum, these genes are significantly enriched in genomic regions under both purifying selection (evident during domestication and improvement) and balancing selection [69].

Purifying selection removes deleterious alleles. Its signatures include elevated differentiation between wild and cultivated groups, low nucleotide diversity, and negatively skewed allele frequency spectra. This selective pressure conserves essential resistance functions while eliminating costly variants.

Balancing selection maintains genetic variation within populations, potentially through frequency-dependent selection or heterozygote advantage. This process preserves diversity in resistance genes, enabling populations to respond to evolving pathogenic threats.

The diagram below illustrates how these selection pressures interact with NBS gene evolution:

Fitness Costs and Their Molecular Basis

The maintenance of NBS-LRR genes incurs significant fitness costs that drive evolutionary trade-offs. Several molecular mechanisms underlie these costs:

Resource Allocation Costs: NBS-LRR proteins are structurally complex, requiring substantial energetic resources for expression and maintenance. In radish, expression analyses revealed that 75 NBS-encoding genes contribute to resistance against Fusarium wilt, representing significant metabolic investment [70].

Autoimmunity Risks: Inappropriate activation of NBS-LRR-mediated immunity can lead to autoimmune responses, where the immune system attacks host tissues. This is particularly problematic under conditions without pathogenic challenge.

Pleiotropic Effects: Some NBS-LRR genes exhibit pleiotropic effects on plant development. The example cited here comes from a different defense trait, trichome production: while most trichome development genes affect both leaf trichomes and root hairs, GL1 specifically influences trichome development without affecting root hairs [71].

Evidence from ecological studies demonstrates these fitness trade-offs. Research on trichome production in Arabidopsis halleri subsp. gemmifera revealed equivalent fitness of hairy and glabrous plants under natural herbivory, allowing their coexistence in contemporary populations [71]. However, under weak herbivory conditions, a fitness cost of trichome production became apparent, illustrating the context-dependent nature of these trade-offs [71].

Experimental Approaches and Methodologies

Genome-Wide Identification Protocols

HMMER-Based Domain Identification: The standard approach for comprehensive identification of NBS-encoding genes involves hidden Markov model (HMM) profiling:

Obtain the NB-ARC domain HMM profile (PF00931) from the Pfam database
Screen the entire proteome of the target species using HMMER software
Manually curate candidates through functional annotation based on closest homologs in model species
Classify genes into subfamilies (TNL, CNL, RNL) using NCBI Conserved Domain Database
Validate domain architecture with multiple domain databases (e.g., InterPro)

This methodology successfully identified 225 NBS-encoding genes in radish [70], 196 in Salvia miltiorrhiza [41], and 239 across two Vernicia species [60].
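The screening step can be sketched as a filter over HMMER's per-domain table, as produced by a command of the form `hmmsearch --domtblout hits.domtblout PF00931.hmm proteome.fasta`. The rows below are invented sample data, and the `1e-5` independent E-value cutoff and the function name `parse_nbarc_hits` are assumptions for illustration; the cited studies may have applied different thresholds.

```python
# Invented rows laid out like `hmmsearch --domtblout` output (22 columns + description)
SAMPLE_DOMTBLOUT = """\
# target acc tlen query acc qlen E-value score bias # of c-Evalue i-Evalue score bias hmm_from hmm_to ali_from ali_to env_from env_to acc description
geneA - 900 NB-ARC PF00931.25 252 1.2e-60 201.3 0.1 1 1 3.0e-64 1.5e-60 200.9 0.1 2 250 180 430 179 432 0.95 hypothetical protein
geneB - 450 NB-ARC PF00931.25 252 4.0e-03 12.1 0.0 1 1 1.0e-06 5.0e-03 11.8 0.0 40 90 100 150 95 160 0.70 -
geneC - 1200 NB-ARC PF00931.25 252 8.0e-45 150.2 0.3 1 2 2.0e-30 9.0e-27 92.0 0.1 1 120 200 320 199 325 0.93 -
geneC - 1200 NB-ARC PF00931.25 252 8.0e-45 150.2 0.3 2 2 4.0e-20 2.0e-16 58.0 0.1 130 250 330 450 328 452 0.90 -
"""

def parse_nbarc_hits(text, max_ievalue=1e-5):
    """Collect alignment coordinates of domain hits passing the cutoff.

    Returns {protein_id: [(ali_from, ali_to), ...]}.
    """
    hits = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split(maxsplit=22)  # description may contain spaces
        target = fields[0]
        ievalue = float(fields[12])       # independent (per-domain) E-value
        ali_from, ali_to = int(fields[17]), int(fields[18])
        if ievalue <= max_ievalue:
            hits.setdefault(target, []).append((ali_from, ali_to))
    return hits

candidates = parse_nbarc_hits(SAMPLE_DOMTBLOUT)
```

Proteins that survive this filter would still need the manual curation and subfamily classification described in the remaining steps.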

Polymorphism Analysis: To assess selection pressures on NBS-encoding genes:

Resequence diverse accessions (wild, landrace, improved genotypes)
Calculate nucleotide diversity measures (e.g., θπ)
Compare diversity in NBS-encoding genes versus randomly selected non-NBS genes
Analyze allele frequency spectra and differentiation between population groups
Test for enrichment in selective sweep regions

In sorghum, this approach revealed significantly higher diversity in NBS-encoding genes compared to non-NBS genes, with enrichment in the upper 5% tail of the empirical distribution of nucleotide diversity [69].
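As a rough illustration of the diversity statistics involved, the sketch below computes nucleotide diversity (θπ), the number of segregating sites, and Tajima's D from a toy alignment. It assumes equal-length, gap-free sequences and is not the pipeline used in the sorghum study; the sequences are invented.

```python
from itertools import combinations
from math import sqrt

def nucleotide_diversity(seqs):
    """Mean pairwise differences per site (theta_pi) for aligned sequences."""
    n, length = len(seqs), len(seqs[0])
    total = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total / n_pairs / length

def segregating_sites(seqs):
    """Number of alignment columns with more than one allele."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))

def tajimas_d(seqs):
    """Tajima's D; negative values indicate an excess of rare variants."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return 0.0
    k = nucleotide_diversity(seqs) * len(seqs[0])  # mean pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

# Toy alignment: four haplotypes, two singleton variants
alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
pi = nucleotide_diversity(alignment)
n_seg = segregating_sites(alignment)
d = tajimas_d(alignment)
```

In a real analysis these statistics would be computed per gene or per window across the resequenced accessions, and the NBS-encoding genes compared against the empirical distribution from non-NBS genes.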

Functional Characterization Methods

Expression Analysis

Collect RNA-seq data from resistant and susceptible genotypes under control and pathogen-challenge conditions
Identify differentially expressed NBS-LRR genes
Validate expression patterns via quantitative real-time PCR (qRT-PCR)
Correlate expression with resistance phenotypes

In radish, this approach identified RsTNL03 (Rs093020) and RsTNL09 (Rs042580) as positive regulators of resistance to Fusarium oxysporum, while RsTNL06 (Rs053740) acted as a negative regulator [70].
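For the qRT-PCR validation step, relative expression is commonly summarized with the 2^-ΔΔCt calculation. The sketch below assumes roughly 100% amplification efficiency and a single reference gene; the Ct values are invented, and the source does not state which quantification method the radish study used.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt calculation.

    Each dCt normalizes the target gene to a reference gene;
    ddCt compares the treated sample with the control sample.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** (-dd_ct)

# Hypothetical Ct values for an NBS-LRR gene after pathogen challenge
fold_change = relative_expression(
    ct_target_treated=22.0, ct_ref_treated=18.0,
    ct_target_control=25.0, ct_ref_control=18.0,
)
```

A fold change above 1 indicates induction relative to the control, and a value below 1 indicates repression.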

Functional Validation via VIGS: Virus-induced gene silencing (VIGS) provides an efficient method for functional characterization:

Clone target NBS-LRR gene fragment into VIGS vector
Inoculate plants with the recombinant virus
Challenge silenced plants with target pathogen
Assess disease symptoms and pathogen load
Compare with control plants

This method demonstrated that Vm019719 confers resistance to Fusarium wilt in Vernicia montana [60]. The experimental workflow for functional characterization is summarized below:

The Scientist's Toolkit: Essential Research Reagents

Table 3: Essential Research Reagents for NBS-LRR Gene Studies

Reagent/Method	Application	Key Features	Reference
HMMER Software with NB-ARC Profile (PF00931)	Genome-wide identification of NBS-encoding genes	Hidden Markov Model approach for comprehensive domain detection	[70] [41]
Virus-Induced Gene Silencing (VIGS) Systems	Functional characterization of candidate NBS-LRR genes	Rapid gene silencing without stable transformation	[60]
RNA-seq Transcriptome Profiling	Expression analysis of NBS-LRR genes under pathogen challenge	Genome-wide expression quantification	[70] [41]
qRT-PCR with Specific Primers	Validation of NBS-LRR gene expression patterns	High sensitivity and quantitative accuracy	[70]
Pfam and NCBI Conserved Domain Databases	Domain architecture classification	Curated domain models and annotations	[70]
Resequencing Panels (Wild, Landrace, Improved)	Selection pressure analysis	Polymorphism detection and diversity calculations	[69]

The evolutionary maintenance of NBS-LRR genes represents a dynamic equilibrium between the imperative for pathogen recognition and the constraints of fitness costs. Evidence from diverse plant species reveals that this balance is achieved through contrasting selection pressures—purifying selection that conserves essential functions while minimizing costs, and balancing selection that maintains diversity for evolving pathogenic threats.

The genomic distribution of NBS-LRR genes in clusters, frequently enriched in disease resistance QTL regions, facilitates rapid evolution through tandem duplication and diversifying selection. The structural reduction of specific subfamilies (particularly TNL) in certain lineages further illustrates how evolutionary trajectories shape this gene family in response to selective constraints.

Future research directions should include:

Comprehensive fitness cost quantification across diverse ecological contexts
Characterization of NBS-LRR gene regulation in response to pathogen pressure
Investigation of pleiotropic effects on plant development and physiology
Integration of NBS-LRR diversity data into predictive models for crop breeding

Understanding these evolutionary dynamics provides a framework for developing disease-resistant crop varieties with optimized trade-offs between defense investment and agricultural productivity.

Functional validation of genes is a cornerstone of modern plant molecular biology, providing the foundational knowledge required for advanced breeding and genetic engineering. Within the context of studying the evolution of NBS domain genes in land plants—a major class of disease resistance genes—researchers require robust methodologies to link genetic sequences to biological functions [72] [3]. Virus-Induced Gene Silencing (VIGS) and Genetic Transformation have emerged as two powerful, yet distinct, strategies for this purpose. VIGS offers a rapid, transient silencing approach that exploits the plant's own antiviral defense mechanisms, while stable genetic transformation provides permanent genetic modification. This guide provides an in-depth technical comparison of these methodologies, detailing their protocols, applications, and integration, with a specific focus on validating the function of NBS domain genes involved in plant immunity and evolution [3] [73]. The choice between these strategies depends on research goals, time constraints, and the plant species under investigation, and their synergistic use can powerfully accelerate functional genomics research.

Virus-Induced Gene Silencing (VIGS): A Transient Knockdown Tool

Core Principles and Mechanism

VIGS is a post-transcriptional gene silencing (PTGS) technique that utilizes a recombinant viral vector to trigger sequence-specific degradation of endogenous plant mRNAs [72] [74]. The process begins when an engineered virus containing a fragment of the plant target gene is introduced into the plant. The plant's cellular machinery replicates the viral RNA, forming double-stranded RNA (dsRNA) intermediates. These dsRNAs are recognized and cleaved by Dicer-like enzymes (DCL) into 21-24 nucleotide small interfering RNAs (siRNAs). The siRNAs are incorporated into an RNA-induced silencing complex (RISC), which uses the siRNA as a guide to identify and cleave complementary endogenous mRNA molecules, thereby silencing the target gene [72] [73]. The entire process is outlined in Figure 1.

Figure 1: Mechanism of Virus-Induced Gene Silencing (VIGS)

Key Viral Vectors for VIGS

The success of VIGS largely depends on the choice of viral vector. Different vectors are suited to different plant families and experimental needs. Table 1 summarizes the most commonly used VIGS vectors.

Table 1: Key Viral Vectors Used in VIGS

Vector Name	Virus Type	Host Range/Applications	Key Features	References
Tobacco Rattle Virus (TRV)	RNA virus	Broad host range; Solanaceae (pepper, tomato, tobacco), Arabidopsis, soybean	Efficient systemic movement, mild symptoms, targets meristematic tissues	[72] [75] [74]
Bean Pod Mottle Virus (BPMV)	RNA virus	Soybean	High efficiency in legumes; often requires particle bombardment	[75]
Barley Stripe Mosaic Virus (BSMV)	RNA virus	Monocots (barley, wheat)	One of the few reliable vectors for cereal crops	[74]
Geminiviruses (CLCrV, ACMV)	DNA virus	Cotton, tomato	DNA-based vectors; useful for species recalcitrant to RNA vectors	[72]
Satellite Virus-Based Systems	DNA/RNA satellite	Tomato, etc.	Two-component system; strong silencing with minimal viral symptoms	[74]

Detailed VIGS Protocol for Gene Validation

The following generalized TRV-based VIGS protocol was optimized for challenging species such as soybean and can be adapted for other plants, including those used in NBS gene research [75].

Vector Construction:
- A ~300-500 bp fragment of the target gene (e.g., an NBS domain-encoding sequence) is amplified by PCR. This fragment should be designed to minimize off-target silencing using bioinformatic tools.
- The PCR product is cloned into the multiple cloning site of the pTRV2 vector using restriction enzymes (e.g., EcoRI and XhoI).
- The recombinant pTRV2 plasmid and the helper pTRV1 plasmid are independently introduced into Agrobacterium tumefaciens strain GV3101.
Plant Preparation & Agroinfiltration:
- For soybean: Surface-sterilized seeds are soaked until swollen and then bisected longitudinally to create half-seed explants with a fresh cut at the cotyledonary node [75].
- Agrobacterium cultures containing pTRV1 and pTRV2 (with insert) are grown to an OD₆₀₀ of ~1.0-1.5, pelleted, and resuspended in an induction medium (e.g., with acetosyringone).
- The two Agrobacterium suspensions are mixed in a 1:1 ratio.
- The plant explants are immersed in the Agrobacterium mixture for 20-30 minutes with gentle agitation to facilitate infection at the wound site.
Plant Growth and Phenotyping:
- Infected explants are transferred to a sterile tissue culture system to support regeneration.
- Plants are grown under controlled environmental conditions (temperature, humidity, photoperiod) that favor viral spread and silencing efficiency. Silencing phenotypes typically appear 2-4 weeks post-inoculation.
- Silencing efficiency can be monitored by tracking a visible marker like photobleaching in plants silenced for Phytoene Desaturase (PDS), and must be confirmed quantitatively via qRT-PCR.
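The off-target check mentioned in the vector construction step can be approximated by asking whether any 21-nt stretch of the insert also occurs in another transcript, since 21 nt is the shortest siRNA length noted earlier. This is a deliberately simple sketch: it looks for exact matches only, assumes all sequences are given in sense orientation, and uses invented sequences and a hypothetical function name (`off_target_hits`). Dedicated design tools typically also tolerate mismatches.

```python
def kmers(seq, k=21):
    """Return the set of all k-nt substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def off_target_hits(fragment, transcripts, target_id, k=21):
    """Count k-mers shared between the VIGS insert and non-target transcripts.

    transcripts: dict mapping transcript id -> sequence.
    Returns {transcript_id: number_of_shared_kmers} for off-targets only.
    """
    fragment_kmers = kmers(fragment, k)
    hits = {}
    for transcript_id, seq in transcripts.items():
        if transcript_id == target_id:
            continue  # sharing k-mers with the intended target is expected
        shared = fragment_kmers & kmers(seq, k)
        if shared:
            hits[transcript_id] = len(shared)
    return hits

# Invented sequences: the paralog shares the first 25 nt of the insert
core = "ATGGCGTCAGTTCTAGGCATTCAGGATCCA"
fragment = core + "GGGTTTAAACCC"
transcripts = {
    "target": "CCCC" + fragment + "AAAA",
    "paralog": "TTTT" + core[:25] + "CGTACGTA",
    "unrelated": "GATTACAGATTACAGATTACAGATTACAGATTACA",
}
hits = off_target_hits(fragment, transcripts, target_id="target")
```

For NBS genes this check matters more than usual, because conserved NBS domain sequence is shared among many paralogs, so an insert drawn from the conserved region may silence several family members at once.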

Stable Genetic Transformation for Functional Validation

Core Principles and Comparison of Methods

Stable genetic transformation involves the permanent integration of a foreign gene into the plant genome, enabling the study of gene function through overexpression, knockout, or knock-in modifications. This results in heritable changes that can be studied over multiple generations [76]. Recent breakthroughs aim to overcome the major bottleneck of plant transformation: the reliance on lengthy, genotype-dependent tissue culture processes.

Table 2: Key Genetic Transformation Methods

Method	Principle	Key Applications	Advantages	Limitations
Agrobacterium-mediated	Uses A. tumefaciens to transfer T-DNA containing the gene of interest into the plant genome [76].	Most dicots (tomato, tobacco, soybean); some monocots.	Relatively simple, low cost, typically single-copy insertions.	Genotype-dependent, often requires tissue culture, can be time-consuming.
Pollen-tube Pathway	Exogenous DNA is applied to the site of pollination and enters the fertilized egg via the pollen tube [76].	Cotton, soybean, wheat.	Bypasses tissue culture; technically simple.	Efficiency can be low and variable; not universally applicable.
Tissue Culture-Free (Wound-Induced)	Activates the plant's innate wound-response and regeneration pathways directly on the parent plant [77] [78].	Tomato, soybean, tobacco.	Dramatically speeds up process (weeks vs. months); avoids tissue culture; works with CRISPR.	Still being optimized for broad species application.

Innovative Tissue Culture-Free Transformation Protocol

A groundbreaking method developed by Patil et al. (2025) combines wound-induced regeneration with Agrobacterium delivery to accelerate the creation of transgenic plants [77] [78]. The workflow is depicted in Figure 2.

Figure 2: Tissue Culture-Free Transformation Workflow

Detailed Steps:

Construct Design: A synthetic gene cassette is engineered to contain two key components: a wound-responsive promoter driving genes like WIND1 (which triggers cell reprogramming) and IPT (involved in cytokinin biosynthesis for shoot growth), and a CRISPR/Cas9 system for precise gene editing [78].
Agrobacterium Delivery: The construct is cloned into Agrobacterium, which is applied directly to wound sites on the parent plant (e.g., stem nodes or clipped shoots).
Wound-Induced Regeneration: The activation of WIND1 and IPT reactivates the plant's developmental programs, triggering the formation of new adventitious shoots directly from the wound site, bypassing the need for de-differentiation into callus.
Selection and Harvest: The newly formed shoots, which carry the genetic modification, are grown to maturity. In tested species like tomato and soybean, this process has generated transgenic or gene-edited seeds in as little as 3.5 weeks, with success rates of 21% to 35% [77].

Application to NBS Domain Gene Evolution Research

The evolution of the NBS-LRR gene family is characterized by extensive expansion, diversification, and tandem duplications, leading to large, variable repertoires in plant genomes [3] [13] [22]. Functional validation is crucial to understand the role of specific NBS genes in pathogen resistance and evolutionary adaptation.

Studying Gene Family Expansion: Researchers identified 12,820 NBS-domain-containing genes across 34 plant species [3]. VIGS is an ideal tool for the high-throughput functional screening of these candidate genes. For example, silencing a specific NBS gene (GaNBS in Orthogroup 2) in resistant cotton demonstrated its role in reducing virus titer, directly validating its function in disease resistance [3].
Understanding Fitness Costs: High constitutive expression of NBS-LRR genes can be lethal to plant cells [13]. Stable transformation allows researchers to study the effects of overexpressing specific NBS genes, helping to elucidate the fitness costs that have shaped their complex regulatory networks, including control by microRNAs [13].
Elucidating Regulatory Mechanisms: VIGS can be used to silence regulatory genes, such as those encoding transcription factors or components of the small RNA biogenesis pathway, to study their impact on the expression and function of NBS-LRR genes [13].

Integrated Strategy and The Scientist's Toolkit

For a comprehensive research program on NBS gene evolution, an integrated approach is most powerful. VIGS should be used for rapid, high-throughput preliminary screening of multiple candidate NBS genes identified from genomic studies. Promising candidates can then be subjected to more detailed, heritable functional analysis using stable genetic transformation (including CRISPR/Cas9 editing) to create permanent mutant lines for in-depth phenotypic and evolutionary analysis.

Research Reagent Solutions for Functional Validation

Reagent / Tool	Function / Application	Examples & Notes
TRV-based VIGS Vectors	Induces transient gene silencing in a wide range of dicot plants.	pTRV1 and pTRV2 binary vectors; the insert is cloned into pTRV2 [72] [75].
Gateway-Compatible Vectors	Allows rapid recombination-based cloning of target gene fragments.	Reduces time and increases throughput for vector construction.
Agrobacterium Strains	Delivery vehicle for genetic material into plant cells.	GV3101 is a common disarmed strain for VIGS and transformation [75].
Marker Genes	Visual indicators of successful transformation or silencing.	GFP for tracking infection [75]; PDS for visualizing silencing (photobleaching) [72] [75].
Wound-Response Plasmids	Enable tissue culture-free transformation.	Plasmids carrying WIND1 and IPT genes under wound-responsive promoters [77] [78].
CRISPR/Cas9 Systems	For precise gene editing in stable transformation.	Used to create knockouts or precise modifications in NBS-LRR genes [78].

The functional validation of genes, particularly within large and complex families like the NBS-LRR genes, is critical for understanding plant immunity and evolution. VIGS stands out as a rapid, flexible, and powerful tool for initial, high-throughput functional screening. In parallel, advancements in stable genetic transformation, especially the new tissue culture-free methods, are breaking down long-standing technical barriers, enabling the faster creation of stable genetic lines for deeper analysis. By strategically combining these approaches, researchers can efficiently bridge the gap from genomic sequence to biological function, accelerating the pace of discovery in plant evolutionary genetics and the development of improved, disease-resistant crops.

Optimizing Genome Assemblies for Accurate NBS Gene Prediction

The evolution of land plants is fundamentally linked to their ability to adapt to pathogenic threats, a process mediated significantly by nucleotide-binding site (NBS) domain genes. These genes encode one of the largest families of plant resistance (R) proteins and play a crucial role in effector-triggered immunity [3]. Current research has identified 12,820 NBS-domain-containing genes across 34 plant species, spanning from mosses to monocots and dicots, displaying remarkable structural diversity with 168 distinct domain architecture classes [3]. This diversity encompasses both classical patterns (NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR) and species-specific structural variations, underscoring the dynamic evolutionary history of this gene family.

Accurate genome assembly and annotation present particular challenges for NBS genes due to their characteristic genomic organization. These genes are often arranged in clusters of tandemly duplicated sequences, and their inherent similarity can lead to local genome assembly collapse and annotation problems [79]. Furthermore, standard annotation pipelines frequently misannotate NBS loci because their multiplicity of similar sequences causes issues with repeat masking, and they are often expressed at low levels, providing limited RNA-seq evidence for gene prediction [79]. These technical challenges necessitate specialized approaches for genome assembly and annotation when the research focus includes comprehensive characterization of NBS gene families.

The Assembly Imperative: Foundation for Gene Discovery

The quality of genome assembly directly impacts the completeness and accuracy of NBS gene prediction. High-quality reference genomes are the cornerstone of modern genomics, yet error-free eukaryotic genome assembly remains challenging despite technological advances [80]. For NBS genes specifically, the combination of their repetitive nature, tandem duplication patterns, and sequence similarity creates obstacles for conventional assembly algorithms.

Long-read sequencing technologies from Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) theoretically solve many of these problems by spanning repetitive regions, thus providing larger "puzzle pieces" for assembly [81]. However, ONT sequencing presents unique challenges, as errors tend to accumulate and assembly statistics plateau as sequencing depth increases [81]. Robust experimental design is therefore essential, with evidence suggesting that eukaryotic genome assembly requires high-molecular-weight DNA extractions that increase read length, coupled with computational protocols that reduce error through pre-assembly correction and read selection [81].

Recent studies indicate that pure ONT sequencing and assembly outperforms hybrid approaches, with contiguous assemblies achievable at sequencing coverage of >60× [81]. However, simply increasing sequencing depth is insufficient; pre-assembly filtering and read correction improve contiguity, while post-assembly polishing using short Illumina reads increases accuracy [81]. These findings highlight the importance of rigorous experimental design in obtaining assemblies suitable for comprehensive NBS gene family analysis.
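Two quantities recur when planning a run against the >60× guideline: sequencing coverage and read N50. A minimal sketch of both follows; the read lengths and genome size are invented and unrealistically small, chosen only to keep the arithmetic easy to follow.

```python
def coverage(read_lengths, genome_size):
    """Average sequencing depth: total sequenced bases / genome size."""
    return sum(read_lengths) / genome_size

def read_n50(read_lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(read_lengths)
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return None  # empty input

# Hypothetical read set and genome size (bp)
reads = [50_000, 40_000, 30_000, 20_000, 10_000]
depth = coverage(reads, genome_size=2_500)
n50 = read_n50(reads)
meets_guideline = depth >= 60
```

Because read length matters as much as depth for spanning tandem NBS clusters, it can be useful to report N50 alongside coverage both before and after any read filtering.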

Addressing Haplotypic Duplications in Diploid and Polyploid Genomes

A particularly challenging aspect of assembling NBS genes involves "haplotypic duplications," where alleles in heterozygous regions are mistakenly assembled as paralogous genes [80]. This problem is especially pertinent for NBS genes in diploid or polyploid plant genomes, where false duplicates can create illusions of gene family expansions, leading to incorrect conclusions about genome evolution and functioning [80].

Specialized tools such as Mabs have been developed to optimize parameters of popular genome assemblers Hifiasm and Flye, creating assemblies with more accurately assembled genes [80]. Mabs employs a novel metric called AG (number of Accurately assembled Genes) that improves upon traditional BUSCO assessments by differentiating between true multicopy orthogroups (composed of paralogues) and false multicopy orthogroups (composed of uncollapsed alleles) based on coverage analysis [80]. This approach is particularly valuable for NBS gene research, where distinguishing true gene family expansions from assembly artifacts is essential for evolutionary studies.
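The coverage logic behind separating true paralogs from uncollapsed alleles can be illustrated with a simple heuristic: in a diploid assembly, a region assembled twice as separate haplotypes should receive roughly half the typical read depth. The sketch below is not the Mabs implementation or its AG metric; the function name, the 25% tolerance, and the depth values are assumptions for illustration.

```python
def flag_haplotypic_duplicates(copy_coverage, genome_median, tolerance=0.25):
    """Return ids of gene copies whose read depth is close to half the median.

    copy_coverage: dict mapping gene copy id -> mean read depth.
    genome_median: median read depth across the assembly.
    tolerance: allowed relative deviation from half the median (assumed).
    """
    half = genome_median / 2
    return sorted(
        gene_id
        for gene_id, depth in copy_coverage.items()
        if abs(depth - half) <= tolerance * half
    )

# Hypothetical depths: nbs1a/nbs1b look like two alleles of one locus
copy_coverage = {"nbs1a": 31.0, "nbs1b": 29.0, "nbs2": 62.0, "nbs3": 58.0}
suspect = flag_haplotypic_duplicates(copy_coverage, genome_median=60.0)
```

Copies flagged this way are candidates for manual inspection rather than automatic removal, since recent true duplicates can also show atypical depth when reads map ambiguously.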

Table 1: Key Challenges in NBS Gene Assembly and Annotation

Challenge	Impact on NBS Genes	Potential Solution
Tandem Repeats	Causes assembly collapse in clustered NBS regions [79]	Long-read sequencing to span repetitive regions [81]
Haplotypic Duplications	Alleles mistaken for paralogs, inflating gene counts [80]	Tools like Mabs with AG metric for parameter optimization [80]
Repeat Masking	NBS genes incorrectly masked as transposable elements [79]	Homology-based prediction (HRP) bypassing automated annotation [79]
Low Expression	Limited RNA-seq evidence for gene prediction [79]	Combined evidence approach using protein homology [79]

Specialized Annotation Strategies for NBS Genes

Conventional genome annotation pipelines often fail to adequately predict NBS genes due to their complex genomic organization. Standard gene annotation tools that rely on automated gene prediction followed by protein motif/domain-based search (PDS) prove imprecise for NBS genes, as repeat masking prior to genome annotation often prevents comprehensive detection [79]. This has led to the development of specialized methods designed specifically for resistance gene annotation.

The Homology-Based R-gene Prediction (HRP) Method

The full-length Homology-based R-gene Prediction (HRP) method represents a significant advance in NBS gene identification [79]. This approach uses a two-level homology search: first identifying an initial set of R-genes in the automated gene prediction using protein domains, then using these R-genes for full-length homology searches in the genome assembly. This strategy successfully addresses the complex genomic organization of NBS-LRR gene loci and has proven more effective than well-established methods like RenSeq [79].

In practical tests, HRP identified 363 NB-LRR genes in the tomato genome, including 103 of 105 novel genes previously identified by the manually curated RenSeq method [79]. The method's efficiency was further demonstrated in Beta vulgaris genomes, where it identified up to 45% more full-length NB-LRR genes compared to previous approaches [79]. HRP also proved valuable for R-gene allele mining, enabling identification of previously undiscovered Fom-2 homologs in five Cucurbita species genomes [79].

Figure 1: HRP Method Workflow for comprehensive NBS gene identification

Integrated Annotation Approaches

Successful NBS gene annotation typically requires an integrated approach combining multiple evidence types. This includes de novo, homology, and transcriptome-based predictions [82]. For example, in the high-quality eggplant genome assembly, researchers used RNA from five different tissues (root, stem, leaf, flower, and fruit) for both next-generation transcriptome sequencing and full-length transcriptome sequencing, enabling prediction of 36,582 coding genes [82]. Such comprehensive transcriptomic data provides valuable supporting evidence for gene prediction, though it may be insufficient alone for lowly expressed NBS genes.

Additional quality assessment tools such as BUSCO (Benchmarking Universal Single-Copy Orthologs) help evaluate assembly completeness by assessing the presence of evolutionarily conserved single-copy genes [82]. For the eggplant genome, BUSCO evaluation showed that 2,190 homologous single-copy genes were assembled, representing 94.2% of all expected single-copy genes [82]. This metric provides a useful indicator of overall assembly quality, though specialized metrics like AG may be more appropriate for assessing gene families prone to haplotypic duplications.

Table 2: Comparison of NBS Gene Annotation Methods

Method	Principle	Advantages	Limitations
Domain Search (PDS)	Searches for NBS domains in predicted proteins [79]	Standardized, works with any annotation	Misses fragmented/false genes; repeat masking issues [79]
RenSeq	Resistance gene enrichment & sequencing [79]	High-quality manual curation; targeted approach	Labor-intensive; requires specialized libraries [79]
HRP Method	Two-level homology using initial R-gene set [79]	Comprehensive; identifies full-length genes	Depends on initial gene set quality [79]
Combined Evidence	Integrates de novo, homology, transcriptome [82]	Multiple supporting evidence types	Resource-intensive; may miss low-expression genes [82]

Experimental Design and Workflow Optimization

Based on current research, an optimized workflow for genome assembly targeting NBS gene characterization involves multiple stages with specific quality control checkpoints. The following protocol integrates best practices from recent studies to maximize the accuracy of NBS gene prediction.

DNA Extraction and Sequencing Strategies

Successful assembly begins with high-quality input DNA. For eukaryotic genomes, high-molecular-weight DNA extractions are critical, as they increase sequence read length, which is particularly beneficial for spanning repetitive NBS gene clusters [81]. Protocols should include verification of DNA quality through pulsed-field gel electrophoresis and quantification using fluorometric methods (e.g., Qubit) rather than spectrophotometry alone [81].

For nematode samples, a recommended approach includes growing organisms on specialized growth medium, harvesting by centrifugation, and performing repeated washing until supernatant is clear [81]. DNA extraction then utilizes a modified phenol-chloroform approach after flash-freezing in liquid nitrogen and proteinase K digestion [81]. Size selection using kits such as the Short Read Eliminator Kit from Circulomics Inc. further enhances read length by removing fragmented DNA [81].

Sequencing technology selection should be guided by research goals. Oxford Nanopore Technologies (ONT) offers advantages in versatility, low input DNA requirements, and cost, making it suitable for individual research laboratories [81]. ONT library preparation can be modified from standard protocols by replacing the first AmpureXP bead clean step with additional treatment with the Short Read Eliminator Kit, improving read length [81].

Assembly and Quality Assessment Pipeline

The assembly process should incorporate specialized tools and metrics designed to address challenges specific to gene families like NBS genes. The Mabs suite provides parameter optimization for Hifiasm and Flye assemblers, creating genome assemblies with more accurately assembled genes than default parameters in 5 out of 6 tested cases [80].

Figure 2: Optimized Assembly Pipeline for NBS Gene Research

Post-assembly processing should include decontamination steps using tools like Blobtools2 and SIDR, which utilize taxonomic assignment, read coverage depth, and GC content to identify non-target contigs [81]. SIDR employs ensemble-based machine learning to train models capable of discriminating target and contaminant contigs based on measured predictor variables, allowing assignment of probable taxonomic origin to contigs that lack BLAST identification [81].
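
To make the role of GC content and read coverage concrete, the following minimal sketch flags contigs whose values fall outside expected ranges. It is not the Blobtools2 or SIDR implementation, which also use taxonomic assignment and, for SIDR, trained ensemble models. The function names, the GC range, and the coverage threshold are illustrative assumptions that would need tuning for each genome.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def flag_suspect_contigs(contigs, coverage, gc_range=(0.30, 0.45), min_cov=10.0):
    """Return (name, gc, coverage) for contigs outside the expected ranges.

    contigs:  dict of contig name -> sequence string
    coverage: dict of contig name -> mean read depth
    gc_range and min_cov are hypothetical thresholds, not published values.
    """
    suspects = []
    for name, seq in contigs.items():
        gc = gc_fraction(seq)
        cov = coverage.get(name, 0.0)
        if not (gc_range[0] <= gc <= gc_range[1]) or cov < min_cov:
            suspects.append((name, round(gc, 3), cov))
    return suspects


# Toy input: c2 has atypical GC content, c3 has very low coverage.
toy_contigs = {"c1": "ATGCATGCAT", "c2": "GCGCGCGCAT", "c3": "ATGCATGCAT"}
toy_coverage = {"c1": 30.0, "c2": 28.0, "c3": 2.0}
print(flag_suspect_contigs(toy_contigs, toy_coverage))
```

Flagged contigs are candidates for closer inspection, not automatic removal, since organellar sequences and repeat-rich regions can also deviate in GC content or depth.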

Quality assessment should move beyond traditional metrics like N50, incorporating gene-specific assessments such as the AG metric that differentiates between true and false multicopy orthogroups based on coverage [80]. This is particularly relevant for NBS genes, which often exist in multicopy families and are prone to haplotypic duplication artifacts.
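
For reference, the traditional N50 statistic mentioned above can be computed in a few lines. This sketch covers only N50; it does not implement the AG metric from [80], whose exact definition is not given in this guide. The function name and the toy contig lengths are illustrative.

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the assembly."""
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))  # 500 + 400 = 900 >= 750, so N50 is 400
```

Because N50 depends only on contig lengths, an assembly with duplicated haplotigs can score well while misrepresenting multicopy families, which is why gene-aware metrics are recommended alongside it.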

The Scientist's Toolkit: Essential Research Reagents and Tools

Table 3: Essential Research Reagents and Computational Tools

Tool/Reagent	Function	Application in NBS Research
Circulomics SRE Kit	Size selection for long DNA fragments [81]	Increases read length for spanning NBS clusters
ONT SQK-LSK109	Ligation sequencing kit [81]	Produces long reads for repetitive region assembly
Mabs Suite	Parameter optimizer for Hifiasm/Flye [80]	Reduces haplotypic duplications in gene families
HRP Pipeline	Homology-based R-gene prediction [79]	Identifies full-length NBS genes missed by annotation
BlobTools2	Taxonomic identification of scaffolds [81]	Removes contamination from assembly
BUSCO/AG Metric	Assembly completeness assessment [80]	Evaluates gene space completeness accurately
Purge_dups	Haplotig removal tool [80]	Addresses allele duplication in assemblies

Accurate genome assembly and annotation are fundamental to understanding the evolution of NBS domain genes in land plants. The structural diversity of these genes—with 168 distinct classes identified across land plants—reflects their dynamic evolutionary history and adaptation to diverse pathogenic challenges [3]. The development of specialized methods like the HRP annotation pipeline and assembly optimization tools such as Mabs represents significant advances in our ability to comprehensively characterize this important gene family.

Future directions in this field will likely involve even more integrated approaches, combining emerging sequencing technologies with improved computational methods. As genome assembly techniques continue to advance toward telomere-to-telomere resolution, opportunities will expand for studying complex genomic regions harboring NBS gene clusters. Similarly, machine learning approaches show promise for further improving gene prediction accuracy, particularly for challenging gene families with unique characteristics like NBS genes.

For researchers focusing on plant-pathogen coevolution, implementing the optimized workflows described in this guide will enable more accurate characterization of NBS gene families, leading to better understanding of plant immunity evolution and facilitating the development of disease-resistant crops through targeted breeding strategies. The continued refinement of these methods remains essential for advancing our knowledge of plant genome evolution and the molecular basis of disease resistance.

Validating Function and Revealing Patterns Through Cross-Species Analysis

Functional Validation through Virus-Induced Gene Silencing (VIGS) Assays

Virus-Induced Gene Silencing (VIGS) has emerged as a powerful reverse genetics tool for rapidly characterizing gene function in plants, particularly in species with complex genomes that pose challenges for stable transformation. This technique leverages the plant's innate RNA interference (RNAi) machinery, using recombinant viral vectors to trigger sequence-specific degradation of target endogenous mRNA transcripts, leading to transient gene knockdown and observable phenotypic changes [72]. The application of VIGS is especially valuable in the context of plant immunity research, where it enables direct functional testing of candidate resistance genes, including those encoding nucleotide-binding site (NBS) domain proteins [3].

The NBS gene family represents one of the largest and most diverse classes of plant resistance (R) genes, playing crucial roles in effector-triggered immunity (ETI) against various pathogens [41]. These genes exhibit remarkable structural diversity and evolutionary dynamics, with significant expansion observed across land plants from bryophytes to higher angiosperms [3]. Recent studies have identified numerous NBS-encoding genes with diverse domain architectures, including classical (NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR) and species-specific structural patterns [3]. However, the functional validation of these genes remains a critical step in understanding their roles in plant defense mechanisms.

This technical guide provides comprehensive methodologies for implementing VIGS assays to functionally validate NBS domain genes, with particular emphasis on experimental design, protocol optimization, and integration with evolutionary genomics frameworks. By bridging evolutionary insights with functional validation, researchers can effectively decipher the molecular mechanisms underlying plant immunity and accelerate the development of disease-resistant crops.

The Evolutionary Context of NBS Domain Genes

NBS domain genes constitute a major component of the plant immune system, with their evolution characterized by significant diversification and expansion across land plants. Comparative genomic analyses have revealed 12,820 NBS-domain-containing genes across 34 species ranging from mosses to monocots and dicots, classified into 168 distinct classes based on domain architecture patterns [3]. This diversity encompasses both classical configurations and species-specific structural patterns, highlighting the dynamic evolutionary history of this gene family.

The evolutionary trajectory of NBS genes is marked by several key mechanisms:

Gene duplication events: Both whole-genome duplication (WGD) and small-scale duplications (SSD), including tandem, segmental, and transposon-mediated duplications, have driven family expansion [3].
Orthogroup distribution: Analysis has identified 603 orthogroups with both core (commonly shared) and unique (species-specific) distributions, with tandem duplications playing a significant role in lineage-specific adaptations [3].
Lineage-specific variations: Significant differences in NBS gene subfamily composition exist across plant species. Dicots typically possess both TNL (TIR-NBS-LRR) and CNL (CC-NBS-LRR) subfamilies, while monocots have largely lost TNL genes [41] [22]. Some medicinal plants like Salvia miltiorrhiza show marked reduction in TNL and RNL subfamily members [41].

Table 1: Evolutionary Distribution of NBS-LRR Genes Across Plant Species

Plant Species	Total NBS Genes	CNL Subfamily	TNL Subfamily	RNL Subfamily	Key Evolutionary Features
Arabidopsis thaliana	~207 [41]	~61% [83]	~35% [83]	~4% [83]	Balanced subfamily distribution
Oryza sativa (rice)	~505 [41]	~100%	Lost	Lost	Complete absence of TNL genes
Solanum tuberosum (potato)	~447 [41]	Majority [22]	Minority [22]	-	nTNL dominance
Salvia miltiorrhiza	196 [41]	61 (typical)	2	1	Severe reduction in TNL/RNL
Capsicum annuum (pepper)	252-288 [22] [83]	248 (nTNL) [22]	4 [22]	-	Extreme nTNL dominance
Pan-species survey (34 species, including cotton)	12,820 in total [3]	Multiple classes	Multiple classes	Multiple classes	Extensive diversification

The functional implications of this evolutionary diversity are profound. NBS genes are often organized in clusters throughout plant genomes, with approximately 54% of pepper NBS-LRR genes forming 47 distinct clusters [22]. This genomic arrangement facilitates the rapid generation of new resistance specificities through unequal crossing over and gene conversion, enabling plants to keep pace with evolving pathogens. The integration of evolutionary analysis with functional validation through VIGS provides a powerful framework for identifying key genetic elements contributing to disease resistance in crop species.

Principles of Virus-Induced Gene Silencing

Molecular Mechanisms

VIGS operates through the plant's natural antiviral defense mechanism known as post-transcriptional gene silencing (PTGS). The process begins when recombinant viral vectors containing fragments of target plant genes are introduced into plant tissues. Once inside plant cells, these vectors replicate and produce double-stranded RNA (dsRNA) replication intermediates, which are recognized by the plant's RNAi machinery as foreign molecules [72].

Cellular Dicer-like (DCL) enzymes then process these dsRNA molecules into 21-24 nucleotide small interfering RNAs (siRNAs). These siRNAs are incorporated into an RNA-induced silencing complex (RISC), which uses the siRNA as a guide to identify and cleave complementary mRNA sequences, including both viral RNAs and endogenous transcripts sharing sequence similarity with the inserted fragment [72]. This results in targeted degradation of the corresponding plant mRNA, effectively reducing expression of the gene of interest and enabling functional characterization through observation of resulting phenotypes.

Viral Vectors for VIGS

Several viral vectors have been developed for VIGS applications, with each offering distinct advantages and limitations:

Tobacco Rattle Virus (TRV) is one of the most widely used VIGS vectors due to its broad host range, efficient systemic movement, and mild symptom development [75] [72]. The TRV system utilizes a bipartite design with two plasmid vectors: TRV1 encodes replicase and movement proteins, while TRV2 contains the coat protein gene and a multiple cloning site for insertion of target gene fragments [72].

Bean Pod Mottle Virus (BPMV) has been successfully employed in soybean functional genomics, though it often requires particle bombardment for delivery and can induce leaf phenotypic alterations that complicate phenotypic evaluation [75].

Other viral vectors including Pea Early Browning Virus (PEBV), Soybean Yellow Common Mosaic Virus (SYCMV), Apple Latent Spherical Virus (ALSV), and Cucumber Mosaic Virus (CMV) have also been adapted for VIGS in various plant species [75].

Figure: Molecular mechanism of TRV-mediated VIGS.

Experimental Design for VIGS Assays

Target Gene Selection and Fragment Design

Effective VIGS relies on careful selection of target gene fragments. For NBS domain genes, researchers should identify specific regions that maximize silencing efficiency while minimizing off-target effects:

Fragment length: Optimal fragments range from 200-500 base pairs, with 200-300 bp often providing the best balance between silencing efficiency and vector stability [84].
Sequence specificity: Selected regions should share <40% similarity with other genes in the genome to prevent non-target silencing [84]. Tools like the SGN VIGS Tool (https://vigs.solgenomics.net/) can assist in identifying unique target sequences [84].
Domain considerations: For NBS domain genes, targeting conserved regions within the NBS domain itself can be effective, though verification of specificity through homologous family analysis is essential [84].
Control constructs: Always include both empty vector controls (TRV1 + TRV2 without insert) and positive controls (e.g., TRV:PDS targeting phytoene desaturase for photobleaching phenotype) [75].
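
The fragment-design criteria above can be explored with a simple in silico screen. The sketch below slides a fixed-length window along a target cDNA and counts 21-nt subsequences that also occur in off-target transcripts, on the reasoning that shared siRNA-sized stretches are the likeliest cause of non-target silencing. It is not the SGN VIGS Tool algorithm and does not compute the <40% similarity criterion, which requires alignment. The function names, the 300 bp window, and the 50 bp step are assumptions chosen for illustration.

```python
def kmers(seq, k=21):
    """Set of all k-length subsequences of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_fragments(target, off_targets, frag_len=300, step=50, k=21):
    """Rank candidate VIGS fragments by k-mers shared with off-targets.

    target:      cDNA sequence of the gene to silence
    off_targets: list of cDNA sequences that should NOT be silenced
    Returns (shared_kmers, start, end) tuples sorted so that the
    fragments with the fewest shared k-mers come first.
    """
    off_kmers = set()
    for seq in off_targets:
        off_kmers |= kmers(seq, k)
    results = []
    for start in range(0, len(target) - frag_len + 1, step):
        fragment = target[start:start + frag_len]
        shared = len(kmers(fragment, k) & off_kmers)
        results.append((shared, start, start + frag_len))
    return sorted(results)
```

For NBS genes, passing the other members of the same orthogroup or cluster as `off_targets` gives a quick view of which regions are paralog-specific; candidates should still be checked with a dedicated tool before cloning.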

Vector Construction and Agrobacterium Preparation

The following protocol details the construction of TRV-based VIGS vectors and preparation of Agrobacterium cultures:

Amplification of target fragment: Using high-fidelity DNA polymerase, amplify the selected gene fragment from cDNA with gene-specific primers containing appropriate restriction sites (e.g., EcoRI and XhoI) [75].
Vector ligation: Digest the pTRV2 vector with corresponding restriction enzymes and ligate the purified PCR product using standard molecular cloning techniques [75].
Transformation and sequence verification: Transform ligation products into E. coli DH5α competent cells, select positive colonies, and verify insert sequence through Sanger sequencing [84].
Agrobacterium transformation: Introduce verified recombinant plasmids and empty vector controls into Agrobacterium tumefaciens strain GV3101 through heat shock or electroporation [85].
Culture preparation: Inoculate single colonies of Agrobacterium harboring TRV1, TRV2-empty, and TRV2-target constructs into liquid LB media containing appropriate antibiotics (kanamycin 50 μg/mL, gentamicin 25 μg/mL) and grow overnight at 28°C with shaking [85].
Induction: Dilute cultures 1:10 in fresh LB media with antibiotics, 10 mM MES, and 200 μM acetosyringone, and grow until OD600 reaches 0.8-1.2. Harvest bacterial pellets by centrifugation and resuspend in induction buffer (10 mM MES, 10 mM MgCl2, 200 μM acetosyringone) to final OD600 of 0.5-1.5 [85]. Maintain at room temperature for 3-4 hours before infiltration.
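
The resuspension step is a simple dilution calculation. The sketch below applies C1 x V1 = C2 x V2 to estimate how much induction buffer is needed to reach a chosen final OD600. It assumes that OD600 scales linearly with cell density in the measured range and that no cells are lost during pelleting; the function name and example values are illustrative.

```python
def resuspension_volume_ml(measured_od, culture_volume_ml, target_od):
    """Buffer volume (mL) that brings pelleted cells to target_od.

    measured_od:       OD600 of the culture before centrifugation
    culture_volume_ml: volume of culture that was pelleted
    target_od:         desired final OD600 in induction buffer
    """
    if measured_od <= 0 or culture_volume_ml <= 0 or target_od <= 0:
        raise ValueError("all inputs must be positive")
    return measured_od * culture_volume_ml / target_od


# Example: 50 mL of culture at OD600 1.2, resuspended to OD600 0.8.
print(round(resuspension_volume_ml(1.2, 50.0, 0.8), 1))
```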

Plant Infiltration Methods and Optimization

Infiltration Techniques

Multiple infiltration methods have been developed for different plant species and tissue types:

Cotyledon infiltration: The most common method for dicot plants like cotton, tomato, and pepper. Make superficial wounds on the abaxial side of cotyledons from 7-10-day-old seedlings using a 25G needle, then infiltrate the Agrobacterium mixture with a needleless syringe until the tissue is fully saturated [85].
Pericarp cutting immersion: Particularly effective for recalcitrant tissues like Camellia drupifera capsules. Bisect explants and immerse fresh cut surfaces in Agrobacterium suspension for 20-30 minutes [84]. This method achieved ~94% infiltration efficiency in optimized systems [84].
Other methods: Direct injection, peduncle injection, and fruit-bearing shoot infusion can be effective for specific tissues and plant species [84].

Critical Optimization Parameters

Several factors significantly influence VIGS efficiency and must be optimized for each plant system:

Plant developmental stage: Optimal silencing effects vary with developmental stage. In Camellia drupifera capsules, maximum silencing efficiency for CdCRY1 (69.80%) and CdLAC15 (90.91%) was observed at early and mid developmental stages, respectively [84].
Agroinoculum concentration: OD600 values between 0.5-1.5 generally provide good results, with optimal concentration potentially species-dependent [72].
Environmental conditions: Temperature, humidity, and photoperiod significantly impact VIGS efficiency. Most systems perform well at 20-23°C with 14:10 light:dark photoperiod and high humidity maintained immediately after infiltration [85] [72].
Co-cultivation period: Maintaining high humidity for 16-24 hours post-infiltration enhances Agrobacterium infection efficiency [85].

Figure: Complete VIGS experimental workflow.

Validation and Assessment of Silencing Efficiency

Molecular Validation Techniques

Confirming successful gene silencing is crucial for interpreting VIGS results. Multiple molecular techniques provide complementary validation:

Reverse-transcription quantitative PCR (RT-qPCR): The gold standard for quantifying silencing efficiency at the transcript level. Proper reference gene selection is critical for accurate normalization. Studies in cotton have identified GhACT7 and GhPP2A1 as the most stable reference genes under VIGS conditions, while commonly used genes like GhUBQ7 and GhUBQ14 showed poor stability [85].
Protein-level analysis: Western blotting or specific immunoassays can confirm reduction of target protein levels, though antibodies are not always available for NBS domain proteins.
Visual markers: For optimized systems, GFP fluorescence can indicate successful infection and silencing distribution when using pTRV2-GFP derivatives [75].
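
Silencing efficiency from RT-qPCR data is commonly expressed as relative expression in silenced versus control plants. The sketch below uses the 2^-ΔΔCt approach with the target Ct normalized against the mean Ct of two reference genes (for cotton, for example, GhACT7 and GhPP2A1). The choice of this method is an assumption, since the cited studies' exact calculation is not described here, and it presumes roughly 100% amplification efficiency for all primer pairs. Function names and Ct values are illustrative.

```python
from statistics import mean


def delta_ct(target_ct, reference_cts):
    """Target Ct minus the mean Ct of the reference genes."""
    return target_ct - mean(reference_cts)


def silencing_efficiency(silenced, control):
    """Relative expression and percent reduction via 2^-ddCt.

    silenced, control: (target_ct, [reference_ct, ...]) tuples for the
    TRV2-target plants and the empty-vector plants, respectively.
    """
    ddct = delta_ct(*silenced) - delta_ct(*control)
    relative_expression = 2 ** (-ddct)
    return relative_expression, (1 - relative_expression) * 100


# Toy values: target Ct rises by 2 cycles in silenced plants.
rel, reduction = silencing_efficiency((26.0, [20.0, 22.0]), (24.0, [20.0, 22.0]))
print(rel, reduction)  # 0.25 75.0
```

In practice each Ct would be the mean of technical replicates, and the calculation would be repeated per biological replicate so that variation in silencing between plants is visible.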

Phenotypic Assessment

Functional validation of NBS domain genes typically involves challenging silenced plants with target pathogens and assessing disease responses:

Disease scoring: Quantitative assessment of disease symptoms, lesion size, pathogen proliferation, and hypersensitive response compared to control plants.
Biochemical assays: Measurement of defense-related compounds like reactive oxygen species, callose deposition, and pathogenesis-related (PR) protein expression.
Comparative analysis: Evaluate responses in susceptible versus resistant plant genotypes. For example, in cotton NBS genes, significant genetic variation was identified between susceptible (Coker 312; 5,173 variants) and tolerant (Mac7; 6,583 variants) accessions [3].

Table 2: Key Research Reagents for VIGS Experimental Workflow

Reagent/Resource	Specifications	Function/Application	Considerations
TRV Vectors	pTRV1 (pYL192), pTRV2 (pYL156)	Bipartite vector system for VIGS	TRV1 encodes replication proteins; TRV2 for target insertion
Agrobacterium Strain	GV3101	Delivery of TRV constructs to plant cells	Optimized for plant transformations
Antibiotics	Kanamycin (50 μg/mL), Gentamicin (25 μg/mL)	Selection for vector maintenance	Concentration critical for bacterial viability & selection
Induction Compounds	Acetosyringone (200 μM), MES (10 mM)	Induce vir genes; buffer pH	Essential for T-DNA transfer efficiency
Reference Genes	GhACT7, GhPP2A1 [85]	RT-qPCR normalization in cotton	Species-specific validation required
Positive Controls	TRV:PDS (photobleaching), TRV:CLA1 (albinism) [85]	System functionality assessment	Visual confirmation of silencing
Bioinformatics Tools	SGN VIGS Tool, Primer3, PlantCARE	Target selection, primer design, CRE analysis	Ensure specificity & effectiveness

Case Study: VIGS of NBS Genes in Cotton

A comprehensive study demonstrates the application of VIGS for functional validation of NBS domain genes in cotton. Researchers identified 12,820 NBS-domain-containing genes across 34 plant species and classified them into 168 architectural classes [3]. Expression profiling revealed putative upregulation of specific orthogroups (OG2, OG6, OG15) in different tissues under various biotic and abiotic stresses in cotton accessions with contrasting responses to cotton leaf curl disease (CLCuD) [3].

Key experimental findings include:

Genetic variation: Comprehensive analysis identified significant variation in NBS genes between susceptible (Coker 312) and tolerant (Mac7) Gossypium hirsutum accessions, with 6,583 unique variants in Mac7 and 5,173 in Coker 312 [3].
Protein interactions: Protein-ligand and protein-protein interaction assays demonstrated strong binding of putative NBS proteins with ADP/ATP and different core proteins of the cotton leaf curl disease virus [3].
Functional validation: Silencing of GaNBS (OG2) in resistant cotton through VIGS indicated a putative role for this gene in controlling virus titer, supporting the functional importance of this NBS gene in disease resistance [3].

This case study highlights the power of integrating evolutionary genomics with VIGS-mediated functional validation to identify key genetic elements contributing to disease resistance.

Troubleshooting and Technical Considerations

Common Challenges and Solutions

Low silencing efficiency: Optimize fragment design to ensure uniqueness and appropriate length. Adjust Agrobacterium density and infiltration method for specific tissues. Extend the incubation period before phenotypic assessment [84] [72].
Inconsistent silencing across plants: Standardize plant growth conditions, developmental stage at infiltration, and environmental parameters post-infiltration. Ensure uniform Agrobacterium culture preparation [72].
Non-specific phenotypes: Include multiple controls (empty vector, non-infiltrated, positive control) to distinguish target gene effects from viral symptoms or experimental artifacts [75] [85].
Poor systemic spread: Verify vector construction and Agrobacterium viability. Consider alternative infiltration methods or viral vectors better suited to the target species [84].

Integration with Evolutionary Genomics

When applying VIGS to study NBS domain gene evolution, consider these specialized approaches:

Orthogroup-targeting: Design VIGS constructs to target conserved regions within specific orthogroups to assess functional conservation across species [3].
Lineage-specific genes: Include species-specific NBS genes in functional screens to identify novel resistance determinants that have emerged in particular lineages [41] [22].
Expression-correlated silencing: Prioritize NBS genes showing differential expression during pathogen challenge or between resistant and susceptible genotypes [3] [83].

Virus-Induced Gene Silencing represents a powerful approach for functionally validating NBS domain genes within an evolutionary framework. The integration of phylogenetic analyses with targeted functional studies enables researchers to identify key genetic elements governing plant immunity and understand how these systems have evolved across land plants. As genomic resources continue to expand for non-model species, VIGS provides an accessible, rapid, and cost-effective method for bridging sequence information with biological function, ultimately accelerating the development of disease-resistant crops through molecular breeding approaches.

The technical guidelines presented in this document provide a comprehensive framework for implementing VIGS assays to study NBS gene function, with emphasis on experimental design, protocol optimization, and integration with evolutionary genomics. By following these methodologies and considering the troubleshooting recommendations, researchers can effectively leverage this powerful technology to advance our understanding of plant immunity and its evolution.

This technical whitepaper synthesizes findings from a large-scale comparative genomic analysis of Nucleotide-Binding Site (NBS) domain genes across 34 plant species, from mosses to monocots and dicots. The study identified 12,820 NBS-domain-containing genes, classifying them into 168 distinct classes with both classical and novel domain architecture patterns. Evolutionary analyses revealed 603 orthogroups with core and lineage-specific expansions, while expression profiling demonstrated differential regulation under biotic and abiotic stresses. The research provides a comprehensive framework for understanding the evolutionary dynamics of plant immune receptor genes and their implications for disease resistance breeding.

Plant immunity relies on a sophisticated surveillance system where intracellular nucleotide-binding site leucine-rich repeat (NBS-LRR) proteins, also known as NLR proteins, function as critical immune receptors. These proteins recognize pathogen effectors and initiate effector-triggered immunity (ETI), providing plants with specific resistance against diverse pathogens [3]. NBS genes represent one of the largest and most variable gene families in plants, with repertoires ranging from fewer than 100 to over 1,000 members across different species [13]. This remarkable diversity stems from continuous evolutionary arms races with rapidly evolving pathogens, making the comparative analysis of NBS gene repertoires essential for understanding plant-pathogen coevolution.

The typical structure of an NBS-LRR protein includes three fundamental domains: an N-terminal domain (either Toll/Interleukin-1 receptor [TIR] or coiled-coil [CC]), a central nucleotide-binding site (NBS) domain, and a C-terminal leucine-rich repeat (LRR) region [22]. Based on their N-terminal domains, NLRs are classified into distinct subfamilies: CNLs (containing CC domains), TNLs (with TIR domains), and RNLs (featuring RPW8 domains) [86]. The NBS domain contains several conserved motifs—including the P-loop, RNBS-A, kinase-2, RNBS-B, RNBS-C, and GLPL—that are essential for ATP/GTP binding and hydrolysis, which activate downstream immune signaling [22].

Recent advances in sequencing technologies have enabled comprehensive comparative analyses of NBS gene repertoires across multiple plant species. This whitepaper examines the evolutionary patterns, structural diversification, and functional specialization of NBS genes across 34 plant species, providing insights into the genetic basis of disease resistance in plants.

Results

Genome-Wide Identification and Classification of NBS Genes

The comprehensive analysis of 34 plant species covering lineages from mosses to monocots and dicots identified 12,820 NBS-domain-containing genes [3]. These genes were classified into 168 distinct classes based on their domain architecture, revealing significant diversity among plant species. The study discovered both classical structural patterns (NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR) and species-specific patterns (TIR-NBS-TIR-Cupin1-Cupin1, TIR-NBS-Prenyltransf, Sugar_tr-NBS) [3].

Table 1: NBS Gene Distribution Across Major Plant Lineages

Plant Lineage	Number of Species Analyzed	Total NBS Genes Identified	Notable Structural Patterns
Bryophytes	Included	Not specified	Minimal NLR repertoires (~25 in Physcomitrella patens)
Lycophytes	Included	Not specified	Highly reduced NLR repertoires (~2 in Selaginella moellendorffii)
Monocots	Multiple	Not specified	Complete absence of TNL genes
Dicots	Multiple	Not specified	Both TNL and CNL subtypes present
Total	34	12,820	168 domain architecture classes

The research further demonstrated that the number of NBS-LRR genes varies substantially across plant genomes. For example, while bryophytes and lycophytes possess minimal NLR repertoires (approximately 25 in Physcomitrella patens and only 2 in Selaginella moellendorffii), extensive gene expansion has occurred in flowering plants [3]. This expansion is particularly pronounced in angiosperms, with some species harboring thousands of NBS-LRR genes.

Evolutionary Analysis and Orthogroup Distribution

Orthogroup (OG) analysis revealed 603 orthogroups across the examined species, with evidence of both core (widely conserved) and unique (lineage-specific) orthogroups [3]. Core orthogroups (OG0, OG1, OG2, etc.) represent evolutionarily conserved NBS genes maintained across multiple species, while unique orthogroups (OG80, OG82, etc.) are highly specific to particular lineages, likely reflecting species-specific pathogen pressures.
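
The core versus unique distinction can be illustrated with a small classifier over an orthogroup count table of the kind OrthoFinder produces. The thresholds below are assumptions for illustration: an orthogroup is called "unique" when it occurs in exactly one species and "core" when it occurs in at least 80% of species. The criteria actually used in [3] are not stated in this text, and the toy species labels are placeholders.

```python
def classify_orthogroups(counts, n_species, core_fraction=0.8):
    """Label each orthogroup as core, unique, intermediate, or empty.

    counts:        dict of orthogroup -> dict of species -> gene count
    n_species:     total number of species in the comparison
    core_fraction: hypothetical cutoff for calling an orthogroup core
    """
    labels = {}
    for orthogroup, per_species in counts.items():
        present = sum(1 for n in per_species.values() if n > 0)
        if present == 0:
            labels[orthogroup] = "empty"
        elif present == 1:
            labels[orthogroup] = "unique"
        elif present >= core_fraction * n_species:
            labels[orthogroup] = "core"
        else:
            labels[orthogroup] = "intermediate"
    return labels


toy_counts = {
    "OG0": {"sp1": 3, "sp2": 2, "sp3": 1, "sp4": 4, "sp5": 1},
    "OG10": {"sp1": 1, "sp2": 1},
    "OG80": {"sp1": 0, "sp2": 5},
}
print(classify_orthogroups(toy_counts, n_species=5))
```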

Tandem duplications were identified as a major mechanism driving the expansion and diversification of NBS gene repertoires. These duplication events frequently lead to the formation of gene clusters, with 54% of NBS-LRR genes in pepper (Capsicum annuum) forming 47 physical clusters across the genome [22]. Similar clustering patterns have been observed across diverse plant species, contributing to the rapid evolution of novel recognition specificities.
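
Cluster statistics such as "54% of genes in 47 clusters" depend on how a cluster is defined. The sketch below groups genes on the same chromosome whenever consecutive start positions lie within a maximum gap, then reports the fraction of genes that fall in clusters. The 200 kb gap and the two-gene minimum are assumptions for illustration; the criteria used in [22] are not given in this text.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=200_000, min_genes=2):
    """Group genes into physical clusters.

    genes: list of (gene_id, chromosome, start) tuples.
    Consecutive genes on a chromosome join the same cluster when their
    start positions differ by at most max_gap (hypothetical value).
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))
    clusters = []
    for chrom in sorted(by_chrom):
        current, prev_start = [], None
        for start, gene_id in sorted(by_chrom[chrom]):
            if prev_start is not None and start - prev_start > max_gap:
                if len(current) >= min_genes:
                    clusters.append(current)
                current = []
            current.append(gene_id)
            prev_start = start
        if len(current) >= min_genes:
            clusters.append(current)
    return clusters


def clustered_fraction(genes, clusters):
    """Share of all genes that belong to any cluster."""
    return sum(len(c) for c in clusters) / len(genes)


toy_genes = [
    ("g1", "chr1", 1_000), ("g2", "chr1", 50_000), ("g3", "chr1", 900_000),
    ("g4", "chr2", 10_000), ("g5", "chr2", 150_000), ("g6", "chr2", 260_000),
]
toy_clusters = find_clusters(toy_genes)
print(toy_clusters, clustered_fraction(toy_genes, toy_clusters))
```

Changing `max_gap` changes both the cluster count and the clustered fraction, so the definition should be reported alongside any such figure when comparing species.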

Table 2: NBS Gene Subfamily Distribution in Selected Species

Plant Species	Total NBS Genes	CNL Subfamily	TNL Subfamily	RNL Subfamily	Atypical/Other
Arabidopsis thaliana	210	40	Not specified	Not specified	170
Dendrobium officinale	74	10	0	Not specified	64
Capsicum annuum	252	2 (typical)	4	1 (RN)	245
Salvia miltiorrhiza	196	61	0	1	134
Solanaceae (9 species)	819	583	182	54	Not specified
Asparagus officinalis	27	Not specified	Not specified	Not specified	Not specified

The distribution of NBS gene subfamilies shows remarkable lineage-specific patterns. Monocots, including orchids and grasses, have completely lost TNL-type genes, while eudicots typically maintain both CNL and TNL subtypes [40]. For example, comprehensive analysis of six orchid species revealed a complete absence of TNL-type genes, consistent with the pattern observed in other monocots [40]. In medicinal plants like Salvia miltiorrhiza, researchers identified 196 NBS genes, with only 62 possessing complete N-terminal and LRR domains, and a notable reduction in TNL and RNL subfamily members [41].

Genomic Distribution and Clustering Patterns

NBS genes are distributed unevenly across plant chromosomes, with a strong tendency to cluster in specific genomic regions. In pepper, NBS-LRR genes are distributed across all chromosomes, with chromosome 3 harboring the most (38 genes) and chromosomes 2 and 6 the fewest (5 genes each) [22]. Similar clustering patterns have been observed across multiple species, with chromosomal termini often enriched with NBS-LRR genes [87].

These clusters frequently arise from tandem duplications and genomic rearrangements, creating hotspots for the evolution of novel resistance specificities. Analysis of the Solanaceae family revealed that whole genome duplication (WGD) has played a significant role in the expansion of NBS-LRR genes, with the most recent whole genome triplication (WGT) particularly impacting this gene family [87].

Domestication and Its Impact on NBS Gene Repertoires

Comparative analyses between domesticated crops and their wild relatives have revealed that domestication has significantly impacted NBS gene repertoires. A study of 15 domesticated crop species and their wild relatives found that five crops—grapes, mandarins, rice, barley, and yellow sarson—exhibited significantly reduced immune receptor gene repertoires compared to their wild counterparts [88].

This pattern is particularly evident in asparagus, where domesticated Asparagus officinalis contains only 27 NLR genes, compared to 47 in its wild relative A. kiusianus and 63 in A. setaceus [86]. This contraction of the NLR repertoire during domestication is associated with increased disease susceptibility in the cultivated species. Orthologous gene analysis identified only 16 conserved NLR gene pairs between A. setaceus and A. officinalis, representing the NLR genes preserved during the domestication process [86].
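
The size of this contraction follows directly from the counts quoted above. The short calculation below uses only the numbers reported from [86]; the function name is illustrative.

```python
def percent_reduction(wild, domesticated):
    """Percent decrease from the wild count to the domesticated count."""
    return (wild - domesticated) / wild * 100


# NLR gene counts as quoted in the text [86].
nlr_counts = {"A. officinalis": 27, "A. kiusianus": 47, "A. setaceus": 63}

for wild in ("A. kiusianus", "A. setaceus"):
    drop = percent_reduction(nlr_counts[wild], nlr_counts["A. officinalis"])
    print(f"{wild}: {drop:.1f}% fewer NLR genes in A. officinalis")

# Share of the A. setaceus repertoire with a conserved pair (16 of 63).
print(f"{16 / nlr_counts['A. setaceus'] * 100:.1f}% conserved")
```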

The duration of domestication appears positively associated with the extent of immune receptor gene loss, suggesting that domestication imposes cumulative pressure on the maintenance of NBS gene repertoires, consistent with relaxed selection rather than strong cost-of-resistance effects [88].

Methodologies for NBS Gene Identification and Analysis

Genome-Wide Identification of NBS Genes

The identification of NBS genes across multiple species typically follows a standardized bioinformatics workflow:

Figure 1: Workflow for genome-wide identification of NBS genes. Key steps include domain searches using multiple complementary methods followed by domain architecture validation.

Domain Identification Using HMM and BLAST Approaches

The primary method for identifying NBS genes involves Hidden Markov Model (HMM) searches using the conserved NB-ARC domain (Pfam: PF00931) as a query. Researchers typically employ the PfamScan.pl HMM search script with an e-value cutoff of 1.1e-50 against the Pfam-A.hmm model library [3] [86]. All genes containing the NB-ARC domain are initially considered NBS genes and retained for further analysis.

Complementary BLAST searches provide additional validation. Local BLASTp analyses (BLAST+ v2.0) are conducted against reference NLR protein sequences from model plants like Arabidopsis thaliana, Oryza sativa, and other relevant species, applying a stringent E-value cutoff of 1e-10 [86]. Candidate sequences identified through both methods are extracted using bioinformatics tools like TBtools [86].
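
The logic of combining the two searches can be sketched as a set intersection. The example assumes that HMM and BLAST results have already been parsed into simple tuples; it does not read PfamScan or BLAST output files, and the function names and toy identifiers are hypothetical. The default cutoffs mirror the values quoted above.

```python
NB_ARC = "PF00931"


def hmm_candidates(hmm_hits, max_evalue=1.1e-50):
    """Genes with an NB-ARC hit at or below the e-value cutoff.

    hmm_hits: iterable of (gene_id, pfam_accession, evalue); the
    accession may carry a version suffix such as 'PF00931.25'.
    """
    return {gene for gene, acc, ev in hmm_hits
            if acc.split(".")[0] == NB_ARC and ev <= max_evalue}


def blast_candidates(blast_hits, max_evalue=1e-10):
    """Genes with a BLASTp hit to a reference NLR below the cutoff.

    blast_hits: iterable of (query_gene_id, subject_id, evalue).
    """
    return {gene for gene, _subject, ev in blast_hits if ev <= max_evalue}


def consensus_candidates(hmm_hits, blast_hits):
    """Genes supported by both the HMM and the BLAST evidence."""
    return hmm_candidates(hmm_hits) & blast_candidates(blast_hits)


toy_hmm = [("g1", "PF00931.25", 1e-80), ("g2", "PF00931.25", 1e-20),
           ("g3", "PF00560.36", 1e-90), ("g4", "PF00931", 1e-60)]
toy_blast = [("g1", "ref_nlr_1", 1e-50), ("g2", "ref_nlr_2", 1e-30),
             ("g4", "ref_nlr_1", 1e-5)]
print(sorted(consensus_candidates(toy_hmm, toy_blast)))
```

Requiring both lines of evidence favors precision; taking the union instead would favor sensitivity, at the cost of more candidates to screen in the domain architecture step.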

Domain Architecture and Classification

Protein domains are characterized using InterProScan and NCBI's Batch CD-Search, with sequences containing the NB-ARC domain (E-value ≤ 1e-5) retained as bona fide NLR genes [86]. Final classification is performed by querying the Pfam and PRGdb 4.0 databases, with genes categorized based on their complete domain architecture [86]. Classification follows established systems that place similar domain-architecture-bearing genes under the same classes [3].
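As a rough illustration of architecture-based classification, the sketch below turns an ordered list of annotated domains into a short class code such as TNL or CNL. The domain names, the letter mapping, and the rule of collapsing repeated LRR units are simplifying assumptions for this example; the published classification schemes distinguish many more architectures.

```python
# Simplified, assumed mapping from domain names to one-letter codes.
DOMAIN_CODES = {"TIR": "T", "CC": "C", "RPW8": "R", "NB-ARC": "N", "LRR": "L"}


def architecture(domains):
    """Return a class code (e.g. 'TNL') from (start_position, domain_name) pairs.

    Returns None when no NB-ARC domain is present, since such a protein
    would not be retained as an NBS gene.
    """
    code = []
    for _, name in sorted(domains):  # order domains from N- to C-terminus
        letter = DOMAIN_CODES.get(name)
        if letter is None:
            continue  # ignore domains outside this simplified scheme
        if code and code[-1] == letter == "L":
            continue  # collapse consecutive LRR units into a single 'L'
        code.append(letter)
    arch = "".join(code)
    return arch if "N" in arch else None


# Hypothetical domain annotations.
print(architecture([(1, "TIR"), (200, "NB-ARC"), (600, "LRR"), (650, "LRR")]))
print(architecture([(10, "CC"), (150, "NB-ARC"), (500, "LRR")]))
print(architecture([(5, "NB-ARC")]))
```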

Evolutionary and Phylogenetic Analysis

Evolutionary analyses employ OrthoFinder v2.5.1 for orthogroup inference, utilizing the DIAMOND tool for fast sequence similarity searches and the MCL clustering algorithm for gene clustering [3]. Orthologs are inferred and assigned to orthogroups with DendroBLAST, while multiple sequence alignment is performed using MAFFT 7.0 [3]. Gene-based phylogenetic trees are constructed using maximum likelihood algorithms implemented in FastTreeMP with 1000 bootstrap replicates [3].
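The chain of tools named above could be driven from Python roughly as follows. This is only a sketch under stated assumptions: the file and directory names are hypothetical, the three programs are assumed to be installed and on the PATH, and only basic, well-known options are used (support-value settings are left at each program's defaults rather than reproducing the cited study's parameters).

```python
import subprocess


def build_commands(proteome_dir, family_fasta, alignment_path, tree_path):
    """Return (argv, stdout_path) pairs; stdout_path None means no redirect."""
    return [
        # Orthogroup inference with DIAMOND as the sequence search program.
        (["orthofinder", "-f", proteome_dir, "-S", "diamond"], None),
        # Multiple sequence alignment; MAFFT writes the alignment to stdout.
        (["mafft", "--auto", family_fasta], alignment_path),
        # Tree inference; FastTree writes the Newick tree to stdout.
        (["FastTreeMP", alignment_path], tree_path),
    ]


def run_commands(commands):
    """Execute each command in order, redirecting stdout where requested."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as handle:
                subprocess.run(argv, check=True, stdout=handle)


# Hypothetical paths; nothing is executed until run_commands(commands) is called.
commands = build_commands("proteomes/", "OG2.fa", "OG2.aln.fa", "OG2.tree")
for argv, stdout_path in commands:
    print(" ".join(argv), "" if stdout_path is None else "> " + stdout_path)
```

Keeping command construction separate from execution makes it easy to inspect or log the exact invocations before committing compute time.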

For specific plant families, researchers often employ additional analyses. In the Solanaceae family, the DupGen_finder (v1.0) program is used to classify gene duplication types, including whole genome duplications (WGD), tandem duplications (TD), proximal duplications (PD), transposon-related duplications (TRD), and dispersed segmental duplications (DSD) [87].
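To make the distinction between tandem and proximal duplicates concrete, here is a deliberately simplified sketch that labels a duplicated gene pair from gene order alone. It is not a reimplementation of DupGen_finder, which also relies on synteny and an outgroup genome; the gap threshold of 10 intervening genes and all gene names are assumptions for illustration.

```python
def classify_pair(gene_a, gene_b, gene_order, max_proximal_gap=10):
    """Label a duplicated gene pair as 'TD', 'PD', or 'other'.

    gene_order maps gene ID -> (chromosome, rank along the chromosome).
    """
    if gene_a == gene_b:
        raise ValueError("a gene cannot be paired with itself")
    chrom_a, rank_a = gene_order[gene_a]
    chrom_b, rank_b = gene_order[gene_b]
    if chrom_a != chrom_b:
        return "other"  # WGD, TRD, or DSD cannot be told apart from order alone
    gap = abs(rank_a - rank_b) - 1  # number of genes lying between the pair
    if gap == 0:
        return "TD"  # adjacent genes: tandem duplication
    if gap <= max_proximal_gap:
        return "PD"  # nearby but not adjacent: proximal duplication
    return "other"


# Hypothetical gene order.
order = {"nbs1": ("chr1", 10), "nbs2": ("chr1", 11),
         "nbs3": ("chr1", 16), "nbs4": ("chr4", 12)}
print(classify_pair("nbs1", "nbs2", order))
print(classify_pair("nbs1", "nbs3", order))
print(classify_pair("nbs1", "nbs4", order))
```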

Expression Analysis Under Stress Conditions

Expression profiling of NBS genes involves analyzing RNA-seq data from various tissues under biotic and abiotic stresses. Researchers typically retrieve FPKM values from specialized databases such as:

IPF database
Cotton Functional Genomics Database (CottonFGD)
Cottongen database
NCBI BioProjects [3]

The RNA-seq data are grouped into three types of expression profile: (1) tissue-specific (leaf, stem, flower, pollen, etc.), (2) abiotic-stress-specific (dehydration, cold, drought, heat, etc.), and (3) biotic-stress-specific (responses to various pathogens) [3]. Data processing follows established transcriptomic pipelines, with final visualization using heatmaps to illustrate differential expression patterns.
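Before FPKM values are drawn as a heatmap they are usually log-transformed, and stress responses are often summarized as log2 fold changes. The sketch below shows one common way to do this; the pseudocount of 1, the gene names, and the FPKM values are assumptions for illustration and are not taken from the cited datasets.

```python
import math


def log2_fpkm(values, pseudocount=1.0):
    """Log-transform FPKM values; the pseudocount keeps zeros finite."""
    return [math.log2(v + pseudocount) for v in values]


def log2_fold_change(treated_fpkm, control_fpkm, pseudocount=1.0):
    """log2 ratio of treated to control expression, with a pseudocount."""
    return math.log2((treated_fpkm + pseudocount) / (control_fpkm + pseudocount))


# Hypothetical FPKM values: columns are control, abiotic stress, biotic stress.
fpkm = {"NBS_gene_1": [1.0, 3.0, 7.0], "NBS_gene_2": [0.0, 0.0, 1.0]}

heatmap_rows = {gene: log2_fpkm(values) for gene, values in fpkm.items()}
for gene, row in heatmap_rows.items():
    print(gene, row)

# Fold change of the biotic-stress sample relative to control for one gene.
print(log2_fold_change(fpkm["NBS_gene_1"][2], fpkm["NBS_gene_1"][0]))
```

The resulting rows can be passed to any plotting library's heatmap function; plotting is omitted to keep the sketch dependency-free.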

Table 3: Essential Research Reagents and Resources for NBS Gene Analysis

Resource Category	Specific Tool/Resource	Primary Function	Application Context
Genome Databases	NCBI, Phytozome, Plaza	Access to genome assemblies and annotations	Initial data retrieval for comparative analyses
Domain Analysis	PfamScan, InterProScan, HMMER	Identification of conserved protein domains	NBS gene identification and classification
Orthogroup Analysis	OrthoFinder v2.5.1, DIAMOND	Orthogroup inference and sequence similarity search	Evolutionary analysis and conserved gene family identification
Phylogenetic Analysis	MAFFT, FastTreeMP, MEGA	Multiple sequence alignment and tree construction	Evolutionary relationship reconstruction
Expression Analysis	IPF Database, CottonFGD, Cottongen	Access to tissue-specific and stress-induced expression data	Expression profiling under different conditions
Functional Validation	VIGS (Virus-Induced Gene Silencing)	Functional characterization of candidate NBS genes	In planta validation of gene function
Specialized Tools	RGAugury, PRGminer	Prediction of resistance gene analogs	Genome-wide R gene identification

Emerging Computational Tools: PRGminer

Recent advances include the development of deep learning-based tools like PRGminer, which provides a comprehensive approach to identifying and classifying R-genes that outperforms previous methods in terms of efficacy and precision [36]. PRGminer operates in two phases: Phase I predicts input protein sequences as R-genes or non-R-genes, while Phase II classifies the predicted R-genes into eight different classes based on domain architecture [36]. This tool achieves an accuracy of 98.75% in k-fold training/testing and 95.72% on independent testing, representing a significant improvement over traditional alignment-based methods [36].

Regulatory Mechanisms and Expression Patterns

miRNA-Mediated Regulation of NBS Genes

Plants implement multiple mechanisms to control the transcript levels of NBS-LRR defense genes, as their high expression can be lethal to plant cells [13]. Diverse miRNAs target NBS-LRRs in eudicots and gymnosperms, functioning as negative post-transcriptional regulators. There is a tight association between NBS-LRR diversity and miRNAs, with miRNAs typically targeting highly duplicated NBS-LRRs [13].

The interaction between miRNAs and NBS-LRRs represents a co-evolutionary model where duplicated NBS-LRRs from different gene families periodically give birth to new miRNAs. Most newly emerged miRNAs target the same conserved, encoded protein motif of NBS-LRRs, particularly the P-loop region, consistent with a model of convergent evolution [13]. This regulatory system potentially allows plants to maintain extensive NLR repertoires without exhausting functional NLR loci, offsetting the fitness costs associated with NLR maintenance [3].

Figure 2: miRNA-mediated regulatory network for NBS-LRR genes. This co-evolutionary model illustrates how plants balance the benefits and costs of NBS-LRR defense genes.

Expression Profiling Under Stress Conditions

Expression analyses demonstrate that NBS genes show specific upregulation under various stress conditions. In cotton, expression profiling revealed putative upregulation of specific orthogroups (OG2, OG6, and OG15) in different tissues under various biotic and abiotic stresses, in plants both susceptible and tolerant to cotton leaf curl disease (CLCuD) [3].

Similar patterns were observed in Dendrobium officinale, where transcriptome analysis following salicylic acid (SA) treatment identified 1,677 differentially expressed genes (DEGs), including six NBS-LRR genes that were significantly up-regulated [40]. One gene in particular, Dof020138, was closely associated with pathogen identification pathways, MAPK signaling pathways, plant hormone signal transduction pathways, and various biosynthetic and energy metabolism pathways [40].

Promoter analyses across multiple species have revealed an abundance of cis-acting elements in NBS genes related to plant hormones and abiotic stress, providing mechanistic insights into their stress-responsive expression patterns [41].

Discussion: Evolutionary Patterns and Functional Implications

Lineage-Specific Evolution of NBS Gene Repertoires

The comparative analysis of NBS genes across 34 plant species reveals distinct evolutionary patterns among major plant lineages. The minimal NLR repertoires in bryophytes and lycophytes suggest that substantial gene expansion occurred primarily in flowering plants [3]. This expansion has been driven by various mechanisms including whole-genome duplication (WGD) and small-scale duplications (SSD) encompassing tandem, segmental, and transposon-mediated duplications [3].

The complete absence of TNL-type genes in monocots, including orchids and grasses, represents a major lineage-specific evolutionary pattern [40]. This loss may be driven by NRG1/SAG101 pathway deficiency in these lineages [40]. In contrast, most eudicots maintain both TNL and CNL subtypes, though with significant variation in their relative proportions. For example, in the Solanaceae family, analysis of nine species revealed 819 NBS-LRR genes, divided into 583 CNL, 182 TNL, and 54 RNL genes [87].
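The relative proportions implied by the Solanaceae counts can be checked with a short calculation. The counts come from the text above; the helper name `subfamily_shares` is introduced here purely for illustration.

```python
def subfamily_shares(counts):
    """Return (total, {subfamily: percentage rounded to one decimal})."""
    total = sum(counts.values())
    shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
    return total, shares


# Counts reported for nine Solanaceae species.
solanaceae = {"CNL": 583, "TNL": 182, "RNL": 54}
total, shares = subfamily_shares(solanaceae)
print(total, shares)
```

The three counts sum to the reported 819 genes, with CNLs making up roughly seven in ten of them.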

Impact of Domestication on Disease Resistance

The reduction of NBS gene repertoires in domesticated crops compared to their wild relatives has significant implications for disease resistance breeding. Studies show that domesticated asparagus (A. officinalis) not only has fewer NLR genes than its wild relatives but also that the majority of preserved NLR genes in the cultivated species demonstrate either unchanged or downregulated expression following fungal challenge [86]. This suggests that the increased disease susceptibility of domesticated crops is driven by both the contraction of NLR gene repertoire and the functional impairment of retained NLR genes—likely a consequence of artificial selection favoring yield and quality traits over disease resistance [86].

Future Directions for Research and Breeding

The identification of core orthogroups conserved across multiple species provides valuable candidates for broad-spectrum resistance breeding. These evolutionarily conserved NBS genes may recognize conserved pathogen patterns or play fundamental roles in immune signaling cascades. Conversely, species-specific NBS genes offer insights into lineage-specific pathogen pressures and potential sources of specialized resistance.

Functional validation through approaches like virus-induced gene silencing (VIGS), as demonstrated with GaNBS (OG2) in resistant cotton, provides critical evidence for the putative role of specific NBS genes in disease resistance [3]. The strong interaction of putative NBS proteins with ADP/ATP and different core proteins of the cotton leaf curl disease virus in protein-ligand and protein-protein interaction studies further supports their functional importance in pathogen recognition and signal transduction [3].

The comprehensive analysis of NBS gene repertoires across 34 plant species provides unprecedented insights into the evolution and diversification of plant immune receptor genes. The identification of 12,820 NBS-domain-containing genes classified into 168 architectural classes highlights the remarkable diversity of this gene family, while the discovery of 603 orthogroups reveals both conserved and lineage-specific evolutionary patterns.

The significant reduction of NBS gene repertoires in domesticated species underscores the importance of incorporating wild relatives in breeding programs to enhance disease resistance. The regulatory mechanisms controlling NBS gene expression, particularly miRNA-mediated regulation, represent crucial components of the plant immune system that balance effective pathogen defense with the fitness costs of maintaining these defense genes.

This comparative genomic framework provides a foundation for future research aimed at understanding plant adaptation mechanisms and offers valuable resources for developing disease-resistant crops through targeted breeding strategies. The integration of computational predictions with functional validation will be essential for translating these genomic insights into practical applications for crop improvement.

Plant immunity is fundamentally shaped by the evolution of specific gene families that enable recognition of pathogens. The Nucleotide-Binding Site (NBS)-Leucine-Rich Repeat (LRR) gene family represents one of the largest and most critical classes of plant resistance (R) genes, with approximately 80% of cloned R genes encoding proteins belonging to this family [22] [41]. These genes encode intracellular immune receptors that are central to the plant's effector-triggered immunity (ETI), which provides a robust, often race-specific defense response against adapting pathogens [89] [41].

The evolution of NBS-LRR genes across land plants reveals a dynamic history of gene expansion, diversification, and loss. Comparative genomic analyses across species from mosses to monocots and dicots have identified thousands of NBS-domain-containing genes with remarkable structural diversity [3]. These genes are typically classified based on their N-terminal domains into TIR-NBS-LRR (TNL), CC-NBS-LRR (CNL), and RPW8-NBS-LRR (RNL) subfamilies [3] [41]. Phylogenetic studies show significant variation in the prevalence of these subfamilies across plant lineages, with notable losses of TNL genes in monocots and specific dicot species [22] [41]. This evolutionary landscape provides the essential context for understanding species-specific resistance mechanisms, such as those deployed against powdery mildew in cannabis and pepper.

Powdery Mildew Resistance in Cannabis: NLR and mlo-Based Mechanisms

Genomic Architecture of NBS-LRR Genes in Cannabis

Cannabis sativa L., an economically important crop for medicinal, recreational, and industrial purposes, faces significant production challenges due to powdery mildew (PM) disease, primarily caused by the biotrophic fungus Golovinomyces ambrosiae [90]. The plant's defense against this pathogen involves two primary genetic mechanisms: NBS-LRR-mediated resistance and loss-of-susceptibility (mlo-based) resistance [89].

Research indicates that cannabis NBS-LRR genes function as intracellular immune receptors that recognize pathogen-secreted effector proteins, triggering a defense cascade that often includes a hypersensitive response (HR), characterized by localized cell death at infection sites, and the production of antimicrobial compounds [89] [90]. This recognition follows the gene-for-gene model, in which specific R gene products interact directly or indirectly with complementary pathogen avirulence (Avr) effectors [89].

Table 1: Key Characteristics of Cannabis Powdery Mildew Resistance Loci

Locus Name	Chromosomal Location	Resistance Type	Key Features	Molecular Markers
PM1	Chromosome 2 (NC_044375.1)	Qualitative (NLR-mediated)	First reported R-gene in cannabis; contains conserved NBS-LRR domains	Not specified in studies
PM2	Chromosome 9 (NC_083609.1)	Qualitative (NLR-mediated)	Single dominant locus; induces localized hypersensitive response; suppresses pathogen sporulation	SNP markers developed from associated SNPs

Experimental Characterization of the PM2 Locus

The novel powdery mildew resistance locus PM2 was recently identified and characterized through an integrated approach combining bulked segregant analysis with RNA sequencing (BSR-Seq) [90]. The experimental methodology encompassed several key stages:

Population Development: Researchers developed F1 mapping populations by crossing PM-resistant parents (W03 and N88) with a susceptible cultivar (AC). The inheritance pattern observed in segregating populations (1:1 resistant to susceptible ratio) indicated that PM resistance is controlled by a single dominant locus [90].
Phenotypic Screening: A large diversity panel of 510 cannabis genotypes was evaluated for PM susceptibility using clone assays. Plants were inoculated via "dusting" with fungal spores from sporulating infected leaves and maintained at 23°C with 80% relative humidity for 48 hours, then at 70% RH for the remainder of the trial. Disease scoring occurred at 4 weeks post-inoculation [90].
BSR-Seq and Genetic Mapping: Resistant and susceptible bulks from F1 populations were subjected to RNA sequencing. Analysis of SNPs identified a major region on chromosome 9 associated with PM resistance, which was designated as the PM2 locus [90].
Functional Characterization: Histochemical analyses revealed that PM2-induced resistance is mediated by a highly localized hypersensitive response in the epidermal and mesophyll cells of infected leaves. This response involves accumulation of reactive oxygen species (ROS), particularly hydrogen peroxide (H₂O₂), leading to programmed cell death that restricts pathogen growth and sporulation [90] [91].
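The 1:1 segregation reported for the F1 populations is the kind of result normally checked with a chi-square goodness-of-fit test. The sketch below shows that test for one degree of freedom using only the standard library; the plant counts are hypothetical, since the source does not give the observed numbers.

```python
import math


def chi_square_1to1(resistant, susceptible):
    """Chi-square goodness-of-fit test against a 1:1 ratio (1 degree of freedom)."""
    expected = (resistant + susceptible) / 2
    chi2 = ((resistant - expected) ** 2 + (susceptible - expected) ** 2) / expected
    # For 1 d.f. the chi-square survival function equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts of resistant and susceptible F1 plants.
chi2, p_value = chi_square_1to1(52, 48)
print(round(chi2, 2), round(p_value, 3))

# A strongly skewed population would reject the 1:1 hypothesis.
chi2_skewed, p_skewed = chi_square_1to1(80, 20)
print(round(chi2_skewed, 2), p_skewed < 0.001)
```

A non-significant p-value is consistent with a single dominant locus segregating from a heterozygous resistant parent, though it does not by itself exclude other genetic models.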

Diagram: Experimental workflow for PM2 locus identification.

Pepper Defense Mechanisms: NBS-LRR Diversity and Powdery Mildew Resistance

Genomic Landscape of NBS-LRR Genes in Pepper

Comprehensive analysis of the pepper (Capsicum annuum L.) genome has identified 252 NBS-LRR resistance genes distributed unevenly across all chromosomes, with 54% (136 genes) forming 47 physical gene clusters [22]. These clusters are primarily driven by tandem duplications and genomic rearrangements, highlighting the dynamic evolution of resistance genes in pepper.
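Physical gene clusters of this kind are typically called by grouping neighbouring genes on the same chromosome. The sketch below shows one simple way to do so; the 200 kb maximum gap, the minimum cluster size of two genes, and the gene coordinates are assumptions for illustration and may differ from the criteria used in the cited study.

```python
from collections import defaultdict


def find_clusters(genes, max_gap_bp=200_000):
    """Group genes into clusters of >= 2 genes separated by at most max_gap_bp.

    genes is a list of (gene_id, chromosome, start_bp) tuples.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        current, last_start = [], None
        for start, gene_id in sorted(by_chrom[chrom]):
            if current and start - last_start > max_gap_bp:
                if len(current) >= 2:
                    clusters.append((chrom, current))
                current = []  # gap too large: start a new candidate cluster
            current.append(gene_id)
            last_start = start
        if len(current) >= 2:
            clusters.append((chrom, current))
    return clusters


# Hypothetical gene coordinates.
genes = [("g1", "chr3", 100_000), ("g2", "chr3", 150_000),
         ("g3", "chr3", 1_000_000), ("g4", "chr5", 10)]
clusters = find_clusters(genes)
clustered = sum(len(members) for _, members in clusters)
print(clusters, f"{100 * clustered / len(genes):.0f}% of genes clustered")

# The same percentage calculation applied to the reported pepper figures.
print(round(100 * 136 / 252))
```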

Phylogenetic and structural analyses reveal a striking dominance of the non-TIR-NBS-LRR (nTNL) subfamily, which comprises 248 genes, over the TIR-NBS-LRR (TNL) subfamily, represented by only 4 genes [22]. This distribution reflects lineage-specific adaptations and evolutionary pressures. Structural characterization identified six conserved motifs within the NBS domain (P-loop, RNBS-A, kinase-2, RNBS-B, RNBS-C, and GLPL) that are essential for ATP/GTP binding and resistance signaling [22].

Table 2: Distribution and Classification of NBS-LRR Genes in Pepper

Chromosome	Total NBS-LRR Genes	Gene Clusters	Notable Features
Chr 3	38	10 (largest: 8 genes)	Highest gene density and cluster number
Chr 2, 6	5 each	0 (Chr 6)	Lowest gene count
Chr 12	Not specified	Not specified	Highest subfamily diversity (TN, NL, NN, NLN, N)
Chr 0 (unassigned)	31	Not specified	Exclusively NB-ARC genes
Total Genome	252	47 clusters	136 genes (54%) in clusters

Mapping Powdery Mildew Resistance in Pepper

While pepper is susceptible to various diseases including phytophthora blight and root-knot nematodes, powdery mildew caused by Leveillula taurica significantly impacts pepper development and growth [92]. Recent research has employed bulked segregant analysis combined with DNA re-sequencing (BSA-seq) to map resistance genes:

Population Development: An F₂ segregating population was constructed by crossing the highly resistant material "NuMex Suave Red" with the extremely susceptible material "c89" [92].
BSA-seq and QTL Mapping: BSA-seq analysis identified a major quantitative trait locus (QTL) located on chromosome 5 (7.20-11.75 Mb) associated with powdery mildew resistance [92].
Fine Mapping: Using InDel and KASP molecular markers developed from the QTL region, researchers refined the candidate interval to 64.86 kb encompassing five genes [92].
Candidate Gene Identification: Among the genes in the mapped interval, the ubiquitin-conjugating enzyme E2 gene (Capana05g000392) showed significantly upregulated expression in multiple resistant materials. A single nucleotide polymorphism (SNP) at position 241 of the CDS (A/G) produces an amino acid polymorphism (M/V) between the susceptible and resistant parents, making this gene a strong candidate for powdery mildew resistance in pepper [92].
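The link between the nucleotide position and the amino acid change can be verified with a little codon arithmetic. In the sketch below, the position (241), the alleles (A/G), and the residues (M/V) come from the text; the codon sequences themselves are an inference (methionine is encoded only by ATG in the standard genetic code), not something stated in the source.

```python
def snp_to_codon(cds_position):
    """Map a 1-based CDS position to (codon number, position within codon)."""
    codon_number = (cds_position - 1) // 3 + 1
    position_in_codon = (cds_position - 1) % 3 + 1
    return codon_number, position_in_codon


# Only the two codons needed for this example (standard genetic code).
CODON_TABLE = {"ATG": "M", "GTG": "V"}


def substitute(codon, position_in_codon, new_base):
    """Return the codon with one base replaced (position is 1-based)."""
    index = position_in_codon - 1
    return codon[:index] + new_base + codon[index + 1:]


codon_number, position_in_codon = snp_to_codon(241)
print(codon_number, position_in_codon)

# Inferred codons: an A-to-G change at the first base of ATG gives GTG.
variant = substitute("ATG", position_in_codon, "G")
print(CODON_TABLE["ATG"], "->", CODON_TABLE[variant])
```

Position 241 falls on the first base of codon 81, so an A/G difference there is consistent with the reported M/V polymorphism.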

Diagram: NBS-LRR-mediated immune signaling pathway in plants.

Comparative Analysis: Evolutionary Insights and Breeding Implications

Evolutionary Dynamics of NBS-LRR Genes

The comparative analysis of NBS-LRR genes in cannabis and pepper reveals distinct evolutionary trajectories within the broader context of land plant evolution. Pepper demonstrates a remarkable expansion of nTNL genes (248 genes) with near-complete loss of TNL representatives (only 4 genes), while cannabis maintains functional TNL-type resistance genes as evidenced by the PM1 and PM2 loci [22] [90]. This pattern aligns with the observed differential evolution of NBS-LRR subfamilies across angiosperms, where some lineages show significant reduction or loss of specific subfamilies [41].

Both species exhibit clustered genomic arrangements of NBS-LRR genes, with pepper showing particularly high clustering (54% of genes in clusters). These clusters often arise from tandem duplications and generate reservoirs of genetic diversity for pathogen recognition [22]. The evolution of these gene families is driven by a combination of whole-genome duplications and small-scale duplications, including tandem, segmental, and transposon-mediated events [3].

Implications for Disease Resistance Breeding

The characterization of specific resistance loci and the comprehensive profiling of NBS-LRR gene families in cannabis and pepper provide valuable resources for molecular breeding programs:

Marker-Assisted Selection: The development of genetic markers for PM2 in cannabis and the identification of the Capana05g000392 polymorphism in pepper enable efficient tracking of resistance traits in breeding populations [92] [90].
Pyramiding Strategies: The identification of multiple resistance mechanisms (NLR-based and mlo-based) in cannabis allows for pyramiding different types of resistance genes to develop more durable resistance [89].
Engineering Broad-Spectrum Resistance: Understanding the structural and functional diversity of NBS-LRR genes facilitates engineering approaches, such as modifying LRR domains to alter recognition specificities [22] [36].
Harnessing Natural Diversity: The extensive natural variation in NBS-LRR genes across germplasm resources provides a foundation for identifying novel resistance specificities through EcoTILLING and genome-wide association studies [22] [3].

Table 3: Research Reagent Solutions for Studying Powdery Mildew Resistance

Reagent/Resource	Function/Application	Example Use in Case Studies
BSR-Seq (Bulked Segregant RNA-Seq)	Identification and mapping of resistance loci by combining transcriptome data with genetic analysis	Mapping of PM2 locus in cannabis [90]
BSA-seq (Bulked Segregant Analysis with sequencing)	QTL mapping using DNA sequencing of pooled extremes from a segregating population	Identification of major QTL on chromosome 5 in pepper [92]
VIGS (Virus-Induced Gene Silencing)	Functional validation of candidate genes through targeted silencing	Validation of NBS-LRR gene function in tung tree and cotton [3] [38]
Genetic Markers (SNPs, InDels)	Tracking resistance alleles in breeding programs	SNP markers for PM2 introgression in cannabis [90]
HMMER Software	Identification of NBS-domain-containing genes in genome sequences	Comprehensive identification of NBS-LRR genes in pepper and tung tree [22] [38]
PRGminer	Deep learning-based prediction and classification of resistance genes	High-throughput identification of R genes in newly sequenced genomes [36]

The case studies of powdery mildew resistance in cannabis and pepper defense mechanisms exemplify the evolutionary innovation of NBS domain genes in plant immunity. While both species utilize NBS-LRR genes as central components of their defense arsenals, they exhibit distinct genomic distributions and evolutionary histories of these gene families. The characterization of specific resistance loci (PM1, PM2 in cannabis; Capana05g000392 in pepper) provides not only insights into molecular mechanisms of disease resistance but also practical tools for crop improvement. As research continues to unravel the complex evolutionary dynamics of NBS genes across land plants, these findings contribute to a broader understanding of plant-pathogen co-evolution and the development of sustainable disease management strategies through molecular breeding and genetic engineering.

Protein-Ligand Interactions and Effector Recognition Specificity

Plant immunity relies heavily on a sophisticated intracellular surveillance system mediated by nucleotide-binding site leucine-rich repeat (NBS-LRR) proteins. These molecular sentinels detect pathogen-derived effector molecules and initiate robust defense responses, culminating in effector-triggered immunity (ETI). The central mechanism underlying this pathogen recognition lies in specific protein-ligand interactions between plant NBS-LRR receptors and their cognate pathogen effectors. Understanding the structural basis and specificity of these interactions provides crucial insights into plant defense mechanisms and evolutionary adaptation. Within the broader context of NBS domain gene evolution in land plants, these recognition interfaces represent dynamic evolutionary battlefields shaped by continuous host-pathogen co-evolution. This technical guide examines the molecular principles governing these specific interactions, the experimental methodologies for their characterization, and their evolutionary significance across plant lineages.

Structural Organization of NBS-LRR Proteins

NBS-LRR proteins constitute one of the largest and most diverse gene families in plants, with complex domain architecture that facilitates their recognition and signaling functions.

Core Domain Architecture

N-terminal Domain: Typically contains either a Toll/interleukin-1 receptor (TIR) domain or a coiled-coil (CC) motif, defining the major TNL and CNL subfamilies. These domains are involved in initiating downstream signaling cascades following activation [61] [93].
Nucleotide-Binding Site (NBS) Domain: Also called NB-ARC domain, this central region functions as a molecular switch, binding and hydrolyzing ATP to regulate protein activation states [61] [93]. It contains several conserved motifs including the P-loop, kinase-2, and RNBS-A through RNBS-D motifs [61].
Leucine-Rich Repeat (LRR) Domain: The C-terminal domain comprised of multiple tandem LRR units that form a solenoid structure with a parallel β-sheet lining the inner concave surface. This domain provides the primary recognition interface and exhibits the highest sequence diversity, with evidence of diversifying selection acting on solvent-exposed residues [61] [93].

Table 1: Major Domains of Plant NBS-LRR Proteins and Their Functions

Domain	Structural Features	Primary Functions	Conserved Motifs
N-terminal	TIR or CC configuration	Signaling initiation; protein-protein interactions	TIR motifs (TNLs); CC motif (CNLs)
NBS (NB-ARC)	STAND family ATPase	Molecular switch; nucleotide-dependent conformational changes	P-loop, kinase-2, RNBS-A, RNBS-B, RNBS-C, RNBS-D
LRR	Tandem repeats forming β-sheet/α-helical structure	Effector recognition; molecular binding	Variable solvent-exposed residues; diversifying selection
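As a small illustration of how a conserved NBS motif can be located in a protein sequence, the sketch below scans for the P-loop using the general Walker A consensus G-x(4)-G-K-[ST]. The regular expression is the generic consensus rather than an NBS-LRR-specific profile, so real P-loops may deviate from it, and the toy sequence is invented for the example.

```python
import re

# Generic Walker A (P-loop) consensus: G, any four residues, G, K, then S or T.
P_LOOP = re.compile(r"G.{4}GK[ST]")


def find_p_loops(protein):
    """Return (1-based start position, matched motif) for each P-loop-like match."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(protein.upper())]


# Hypothetical protein fragment.
toy_sequence = "MAEVLSGMGGVGKTTLAQ"
print(find_p_loops(toy_sequence))
```

For genome-wide work, profile-based tools such as HMMER or motif finders are more sensitive than a single regular expression; the pattern here is only meant to show what "conserved motif" means at the sequence level.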

Genomic Diversity and Evolution

NBS-LRR genes are ancient in origin, found in non-vascular plants, gymnosperms, and angiosperms, with wide variation in copy number across species—from fewer than 100 to over 1,000 genes per genome [13] [61]. They evolve through a birth-and-death process characterized by frequent gene duplications, unequal crossing-over, and diversifying selection, particularly in the LRR region [61] [94]. Two evolutionary patterns are observed: type I genes evolve rapidly with frequent gene conversions, while type II genes evolve slowly with rare gene conversion events [13] [61].

Mechanisms of Effector Recognition

Plant NBS-LRR proteins employ distinct molecular strategies for pathogen detection, ranging from direct binding to indirect surveillance mechanisms.

Direct Recognition Models

Direct recognition involves physical interaction between the NBS-LRR protein and the pathogen effector. Key evidence comes from several well-characterized systems:

The rice NBS-LRR protein Pi-ta directly binds the corresponding Magnaporthe grisea effector AVR-Pita through its LRR domain, with specificity determined by complementary molecular surfaces [93].
Flax rust resistance proteins L5, L6, and L7 show direct physical interaction with specific variants of the fungal effector AvrL567 in yeast two-hybrid assays, recapitulating the recognition specificity observed in planta [93].
The atypical Arabidopsis TNL protein RRS1 binds the bacterial wilt pathogen protein PopP2 in split-ubiquitin yeast two-hybrid experiments, though this interaction alone may not be sufficient for activation [93].

In direct recognition, the LRR domain typically serves as the primary binding interface, with specificity determined by polymorphic residues in the solvent-exposed β-sheets. This creates a highly variable molecular surface capable of recognizing diverse pathogen ligands.

Indirect Recognition (Guard Hypothesis)

The guard hypothesis proposes that NBS-LRR proteins monitor ("guard") host cellular components that are modified by pathogen effectors. Key examples include:

RPM1-RIN4-AvrRpm1/AvrB System: The Arabidopsis NBS-LRR protein RPM1 guards the host protein RIN4, which is targeted by bacterial effectors AvrRpm1 and AvrB. These effectors induce phosphorylation of RIN4, and RPM1 detects this modification, leading to defense activation [93].
RPS2-RIN4-AvrRpt2 System: The same guardee (RIN4) is cleaved by the bacterial cysteine protease AvrRpt2, and this modification is detected by a different NBS-LRR protein, RPS2 [93].
RPS5-PBS1-AvrPphB System: The NBS-LRR protein RPS5 guards the host kinase PBS1, which is cleaved by the bacterial protease AvrPphB. Cleavage of PBS1 triggers RPS5 activation [93].

This indirect mechanism allows plants to monitor a limited number of host targets while detecting multiple effectors that converge on the same cellular components, providing an efficient surveillance strategy.
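The efficiency argument can be made concrete by writing the three systems above as a small lookup structure. This is only a bookkeeping sketch of the relationships named in the text, with field and function names chosen for this example; it does not model any biochemistry.

```python
# The three guard systems described above, as plain records.
GUARD_SYSTEMS = [
    {"guard": "RPM1", "guardee": "RIN4",
     "effectors": ["AvrRpm1", "AvrB"], "modification": "phosphorylation"},
    {"guard": "RPS2", "guardee": "RIN4",
     "effectors": ["AvrRpt2"], "modification": "cleavage"},
    {"guard": "RPS5", "guardee": "PBS1",
     "effectors": ["AvrPphB"], "modification": "cleavage"},
]


def effectors_per_guardee(systems):
    """Map each guarded host protein to the sorted effectors that target it."""
    result = {}
    for system in systems:
        result.setdefault(system["guardee"], set()).update(system["effectors"])
    return {guardee: sorted(names) for guardee, names in result.items()}


def activated_guards(effector, systems):
    """Return the NBS-LRR proteins expected to respond to a given effector."""
    return [s["guard"] for s in systems if effector in s["effectors"]]


print(effectors_per_guardee(GUARD_SYSTEMS))
print(activated_guards("AvrRpt2", GUARD_SYSTEMS))
```

Grouping by guardee shows the convergence directly: a single host protein, RIN4, accounts for three of the four effectors listed.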

Figure 1: Direct vs. Indirect Effector Recognition Mechanisms

Evolutionary Dynamics of Recognition Specificity

The evolutionary arms race between plants and pathogens has shaped the diversification of NBS-LRR genes and their recognition specificities across land plants.

Genomic and Phylogenetic Patterns

NBS-LRR genes exhibit distinctive evolutionary patterns across plant lineages:

Lineage-Specific Expansions: Different plant families show amplification of distinct NBS-LRR subfamilies. For example, specific expansions occur in Solanaceae, Asteraceae, and legumes, creating family-specific recognition repertoires [61] [94].
Differential Conservation: TNL-class genes are completely absent from cereal genomes, suggesting lineage-specific losses, while CNL genes are conserved across monocots and dicots [61].
Birth-and-Death Evolution: NBS-LRR genes undergo frequent duplication followed by functional diversification or pseudogenization, creating dynamic gene clusters with heterogeneous evolutionary rates [61].

Table 2: Evolutionary Patterns of NBS-LRR Genes Across Plant Lineages

Plant Group	NBS-LRR Repertoire	Distinctive Features	Evolutionary Mechanisms
Bryophytes (e.g., Physcomitrella patens)	~25 NLR genes	Minimal repertoire	Ancient origins; limited diversification
Cereals/Monocots	Hundreds of genes; highly variable	Absence of TNL subclass	Lineage-specific loss; CNL expansion
Dicots	Hundreds to thousands of genes	Both TNL and CNL subclasses	Birth-and-death evolution; tandem duplications
Tung Trees (Vernicia species)	90-149 genes	Domain loss events in susceptible species	Differential selection; promoter evolution

Regulatory Evolution

Plants have evolved sophisticated regulatory mechanisms to manage the expression of NBS-LRR genes, balancing effective defense with autoimmunity costs:

miRNA-Mediated Regulation: Diverse miRNA families (e.g., miR482/2118) target NBS-LRR transcripts, typically recognizing conserved coding sequences like the P-loop motif. This regulatory layer emerged in gymnosperms and is maintained across angiosperms [13].
Transcriptional Control: Transcription factors like WRKY64 regulate specific NBS-LRR genes, with promoter variations (e.g., W-box element polymorphisms) contributing to expression differences between resistant and susceptible genotypes [38].
Expression-Tuning Mechanisms: The requirement for precise NBS-LRR expression levels has driven the evolution of compensatory regulatory networks, potentially enabling the maintenance of large NBS-LRR repertoires without fitness costs [13] [3].

Experimental Methodologies for Studying Recognition

Characterizing protein-ligand interactions in NBS-LRR effector recognition requires multidisciplinary approaches and carefully controlled experiments.

Functional Complementation Assays

The pioneering study on the potato Rx NBS-LRR protein established a powerful framework for domain interaction analysis:

Experimental Workflow:

Domain Separation: Rx fragments (CC-NBS and LRR) were expressed as separate HA epitope-tagged constructs [95].
Transient Expression: Constructs were delivered by Agrobacterium-mediated transient expression in Nicotiana benthamiana leaves [95].
Functional Output Assessment: Hypersensitive response (HR) cell death was scored upon co-expression of the PVX coat protein (CP) [95].
Interaction Validation: Co-immunoprecipitation of separated domains to confirm physical interactions [95].

Key Findings:

Co-expression of CC-NBS and LRR as separate molecules reconstituted CP-dependent HR, demonstrating functional complementation [95].
The LRR domain is required for activation even in constitutive signaling mutants, indicating its role beyond initial recognition [95].
Intramolecular interactions between domains were disrupted by effector recognition, suggesting a mechanism for activation [95].

Figure 2: Experimental Workflow for Rx Functional Complementation Studies

Interaction Mapping Techniques

Multiple biochemical and genetic approaches are employed to characterize recognition interfaces:

Yeast Two-Hybrid Systems: Used to demonstrate direct interactions between flax L proteins and AvrL567 effectors, and between RRS1 and PopP2 [93].
Split-Ubiquitin Systems: Complementary membrane-based system for detecting interactions with membrane-associated proteins [93].
Co-immunoprecipitation: Validates physical associations in plant systems, as used for Rx domain interactions and RIN4 complexes [95] [93].
Virus-Induced Gene Silencing (VIGS): Functional validation approach, as employed in tung trees to confirm VmNBS-LRR-mediated Fusarium wilt resistance [38].

Structural Analysis Methods

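Full-length plant NBS-LRR structures were unavailable for many years and remain scarce (cryo-EM structures such as the Arabidopsis ZAR1 resistosome are comparatively recent), so several complementary approaches continue to provide structural insights: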

Homology Modeling: Threading plant NBS domains onto crystal structures of mammalian STAND ATPases like APAF-1 [61].
Domain Swapping: Chimeric protein studies identify specificity determinants, particularly in the LRR region.
Site-Directed Mutagenesis: Targeted mutations in conserved motifs (e.g., P-loop) test functional requirements, as shown in Rx studies where P-loop mutations disrupted certain domain interactions [95].
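To make the mutagenesis step concrete, the following minimal sketch locates candidate P-loop (Walker A) motifs in a protein sequence and applies a point substitution in silico. It assumes the commonly cited Walker A consensus G-x(4)-G-K-[ST]; the toy sequence, the function names, and the choice of a lysine-to-arginine substitution are illustrative assumptions, not the constructs used in the Rx studies.

```python
import re

# Walker A (P-loop) consensus assumed here: G-x(4)-G-K-[ST]
P_LOOP = re.compile(r"G.{4}GK[ST]")


def find_p_loops(protein_seq):
    """Return (motif_start, motif, lysine_position) tuples, all 1-based."""
    hits = []
    for match in P_LOOP.finditer(protein_seq.upper()):
        # The conserved lysine is the 7th residue of the 8-residue motif.
        lysine_pos = match.start() + 7
        hits.append((match.start() + 1, match.group(), lysine_pos))
    return hits


def mutate(protein_seq, position, new_residue):
    """Return a copy of the sequence with the 1-based position substituted."""
    idx = position - 1
    return protein_seq[:idx] + new_residue + protein_seq[idx + 1:]


# Hypothetical toy sequence, not a real NBS-LRR protein.
toy_seq = "MAEVLLGMPGVGKTTLARQ"
p_loops = find_p_loops(toy_seq)
start, motif, lys = p_loops[0]
mutant = mutate(toy_seq, lys, "R")  # in silico K-to-R substitution
print(motif, lys, mutant)
```

A scan like this only nominates residues; whether a given substitution disrupts nucleotide binding or domain interactions still has to be tested experimentally.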

Research Reagent Solutions Toolkit

Table 3: Essential Research Reagents and Resources for Studying NBS-LRR Recognition

Reagent/Resource	Specifications	Research Application	Example Implementation
HMMER Software	HMMER v3.1b2 with Pfam models (NB-ARC: PF00931)	Identification of NBS-domain-containing genes from genomic data	Identification of 274 NBS-LRR genes in grass pea genome [96]
Agrobacterium Transformation System	Agrobacterium tumefaciens strains GV3101, LBA4404	Transient expression in N. benthamiana for functional assays	Rx domain complementation assays [95]
Epitope Tag Systems	HA, FLAG, GFP tags for protein detection	Protein localization, interaction studies, and immunoprecipitation	HA-tagged Rx domains for co-immunoprecipitation [95]
VIGS Vectors	TRV-based (Tobacco Rattle Virus) vectors	Functional validation through targeted gene silencing	Validation of Vm019719 function in tung tree Fusarium resistance [38]
Yeast Two-Hybrid System	GAL4-based or split-ubiquitin systems	Protein-protein interaction mapping	Direct interaction between L and AvrL567 proteins [93]
OrthoFinder Pipeline	OrthoFinder v2.5.1 with DIAMOND and MCL	Evolutionary analysis and orthogroup identification	Clustering of 12,820 NBS genes into 603 orthogroups [3]

Protein-ligand interactions governing effector recognition specificity in plant NBS-LRR proteins represent a sophisticated molecular interface shaped by evolutionary arms races. The structural modularity of NBS-LRR proteins, with distinct signaling and recognition domains, enables both direct and indirect detection mechanisms while maintaining conserved activation pathways. The evolutionary dynamics of these genes—characterized by birth-and-death evolution, lineage-specific expansions, and regulatory adaptations—highlight their central role in plant-pathogen coevolution across land plants. Experimental approaches combining functional complementation, interaction mapping, and genomic analyses continue to reveal the intricate molecular logic underlying these recognition specificities. As research progresses, integrating structural biology with evolutionary genomics promises to unlock new strategies for engineering durable disease resistance in crop plants, informed by natural diversity and recognition mechanisms refined over millions of years of plant evolution.

Orthogroup Analysis Revealing Core and Species-Specific NBS Genes

The nucleotide-binding site (NBS) domain gene family represents a cornerstone of the plant immune system, encoding intracellular receptors responsible for pathogen recognition and activation of defense responses [3]. These genes, often characterized by their canonical NBS-leucine-rich repeat (LRR) architecture, play a pivotal role in effector-triggered immunity (ETI), enabling plants to detect pathogen effector proteins and initiate robust defense cascades [41]. The evolutionary dynamics of NBS genes, driven by duplication events and selective pressures, have resulted in substantial diversification across land plants, creating complex repertoires that underlie species-specific adaptation to pathogens [3].

Orthogroup analysis has emerged as a powerful computational framework for elucidating evolutionary relationships among genes across multiple species. By clustering genes into groups descended from a single gene in the last common ancestor of the species being considered, this approach enables systematic identification of core conserved genes and species-specific innovations [97] [34]. Applied to NBS domain genes, orthogroup analysis reveals fundamental insights into the evolutionary mechanisms that have shaped plant immunity from early land plants to modern angiosperms. This technical guide examines the methodology, findings, and implications of orthogroup analysis in decoding the complex evolutionary history of NBS genes across the plant kingdom.

Orthogroup Inference Methodologies

Computational Frameworks for Orthology Inference

Orthogroup inference represents a critical bioinformatics workflow for comparative genomics, with several sophisticated algorithms available for large-scale analyses. OrthoFinder has established itself as a benchmark tool, employing a comprehensive phylogenetic approach that infers orthogroups, gene trees, the rooted species tree, and gene duplication events [34]. The algorithm utilizes DIAMOND for rapid sequence similarity searches, followed by clustering with the MCL algorithm and phylogenetic analysis using DendroBLAST [3] [34]. Benchmarking through the Quest for Orthologs initiative has demonstrated OrthoFinder's superior accuracy in ortholog inference, outperforming other methods by 3-30% on standardized tests [34].

For projects involving hundreds or thousands of genomes, FastOMA provides a scalable alternative with linear time complexity, enabling analysis of thousands of eukaryotic genomes within 24 hours [98]. This method leverages OMAmer for k-mer-based placement of sequences into hierarchical orthologous groups (HOGs), followed by taxonomy-guided resolution of nested gene families [98]. While maintaining high precision comparable to OMA, FastOMA dramatically reduces computational requirements through innovative algorithms that avoid all-against-all sequence comparisons [98].
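The idea of placing a sequence into a family by k-mer lookup, rather than by comparing it against every other sequence, can be illustrated with a deliberately simplified sketch. This is not OMAmer's actual algorithm or data structure; the family names, sequences, and scoring rule (count of shared k-mers) are assumptions chosen for illustration only.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(families, k=3):
    """Map each k-mer to the set of family names whose members contain it."""
    index = {}
    for name, seqs in families.items():
        for seq in seqs:
            for kmer in kmers(seq, k):
                index.setdefault(kmer, set()).add(name)
    return index


def place(query, index, k=3):
    """Return the family sharing the most k-mers with the query, or None."""
    counts = {}
    for kmer in kmers(query, k):
        for name in index.get(kmer, ()):
            counts[name] = counts.get(name, 0) + 1
    if not counts:
        return None
    # Sorting first makes ties resolve alphabetically and deterministically.
    return max(sorted(counts), key=counts.get)


# Hypothetical toy families, not real reference groups.
toy_families = {"famA": ["GMPGVGKTTLA"], "famB": ["LLDDVWSQEAF"]}
toy_index = build_index(toy_families)
print(place("PGVGKTT", toy_index))
```

Each query costs a number of dictionary lookups proportional to its own length, which is the intuition behind avoiding all-against-all comparisons.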

Visualization and interpretation of orthogroup results are facilitated by tools like OrthoBrowser, which creates interactive static websites for exploring phylogenies, gene trees, multiple sequence alignments, and synteny relationships [97]. This platform enhances accessibility to complex orthogroup datasets, enabling researchers to identify, interact with, and share information about gene families of interest without requiring advanced computational expertise [97].

Workflow for NBS Gene Identification and Classification

A standardized workflow for NBS gene orthogroup analysis encompasses multiple stages of data processing and validation (Figure 1). The initial step involves identification of NBS domain-containing genes across target species using Hidden Markov Model (HMM) searches with Pfam domain models (e.g., NB-ARC domain PF00931) at stringent e-value thresholds (e.g., 1.1e-50) [3]. Subsequent domain architecture classification categorizes genes into structural classes (NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR) and identifies species-specific patterns through systematic domain annotation [3] [41].
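The identification step can be sketched as a filter over HMMER's tabular output. The example assumes a search was run along the lines of `hmmsearch --tblout hits.tbl PF00931.hmm proteins.fa` (file names are hypothetical) and that the table follows the usual whitespace-delimited layout, with the target name in the first column and the full-sequence E-value in the fifth. The default cutoff mirrors the example threshold quoted above; the toy hit lines are invented for illustration.

```python
def filter_hmmsearch_hits(tblout_lines, max_evalue=1.1e-50):
    """Return {target_name: best full-sequence E-value} for hits passing the cutoff.

    Assumes the `hmmsearch --tblout` layout: column 1 is the target name
    and column 5 is the full-sequence E-value.
    """
    hits = {}
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            # Keep the smallest E-value if a target appears more than once.
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits


# Toy lines in tblout layout; gene names and scores are invented.
toy_tblout = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "geneA  -  NB-ARC  PF00931  3.2e-80  270.1  0.1",
    "geneB  -  NB-ARC  PF00931  2.0e-12   45.3  0.0",
]
nbs_candidates = filter_hmmsearch_hits(toy_tblout)
print(sorted(nbs_candidates))
```

Relaxing `max_evalue` recovers more divergent or truncated NBS domains at the cost of more false positives, so the cutoff is best treated as a tunable parameter rather than a fixed constant.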

Figure 1: Workflow for Orthogroup Analysis of NBS Genes

The classification system differentiates between typical NBS-LRR proteins (containing both N-terminal and LRR domains) and atypical forms with truncated architectures [41]. Genes are further subdivided based on N-terminal domain presence into TIR-NBS-LRR (TNL), CC-NBS-LRR (CNL), and RPW8-NBS-LRR (RNL) subclasses [41] [22]. This comprehensive structural characterization provides the foundation for meaningful orthogroup inference and evolutionary interpretation.
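The classification rules above can be written as a small decision function. This is a minimal sketch that assumes each gene has already been annotated with a list of simplified domain labels ("TIR", "CC", "RPW8", "NBS", "LRR"); the label names, the output strings, and the function name are illustrative, not those of any specific annotation pipeline.

```python
def classify_nbs(domains):
    """Classify a gene from its annotated domain labels.

    Typical genes carry an N-terminal domain plus NBS plus LRR and are
    labelled TNL, CNL, or RNL; anything truncated is marked atypical.
    Returns None when no NBS domain is present.
    """
    present = set(domains)
    if "NBS" not in present:
        return None
    has_lrr = "LRR" in present
    for n_terminal, subclass in (("TIR", "TNL"), ("CC", "CNL"), ("RPW8", "RNL")):
        if n_terminal in present:
            return subclass if has_lrr else f"{n_terminal}-NBS (atypical)"
    # No recognizable N-terminal domain.
    return "NBS-LRR (atypical)" if has_lrr else "NBS (atypical)"


# Hypothetical annotations for illustration.
for example in (["TIR", "NBS", "LRR"], ["CC", "NBS"], ["NBS", "LRR"], ["NBS"]):
    print(example, "->", classify_nbs(example))
```

Genes with several N-terminal domain types would need an explicit precedence rule; here the first match in TIR, CC, RPW8 order wins, which is an arbitrary choice.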

Evolutionary Dynamics of NBS Genes Across Land Plants

Comparative Genomics Reveals Extensive Diversification

Comprehensive analysis of NBS genes across 34 plant species spanning evolutionary lineages from bryophytes to higher eudicots has identified 12,820 NBS-domain-containing genes, illuminating patterns of gene family expansion and structural diversification [3]. These genes distribute into 168 distinct domain architecture classes, encompassing both canonical patterns (NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR) and numerous species-specific configurations (TIR-NBS-TIR-Cupin1-Cupin1, TIR-NBS-Prenyltransf, Sugar_tr-NBS) [3]. The remarkable diversity in domain architecture underscores the dynamic evolutionary processes that have shaped NBS gene repertoires through domain shuffling, duplication, and functional innovation.

Orthogroup analysis of these sequences resolved 603 distinct orthogroups (OGs), comprising both core orthogroups (OG0, OG1, OG2) conserved across multiple lineages and unique orthogroups (OG80, OG82) restricted to specific species [3]. The prevalence of tandem duplications within these orthogroups highlights a key mechanism for NBS gene family expansion and adaptation to rapidly evolving pathogen pressures [3]. These findings align with similar patterns observed in taxon-specific studies, where pepper (Capsicum annuum) genomes revealed 252 NBS-LRR genes with 54% physically clustered in 47 genomic regions, and Salvia miltiorrhiza exhibited 196 NBS-domain-containing genes [41] [22].
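As a rough illustration of how core and species-specific orthogroups might be separated, the sketch below reads a table in the style of OrthoFinder's Orthogroups.tsv (one row per orthogroup, one tab-separated column per species, genes separated by commas). The definitions used here are assumptions: "core" means present in every species in the table and "species-specific" means present in exactly one, which is stricter than "conserved across multiple lineages". The species and gene names are invented.

```python
import csv
import io


def orthogroup_presence(tsv_text):
    """Return (species_list, {orthogroup: set of species with at least one gene})."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    species = next(reader)[1:]  # first header cell is the orthogroup column
    presence = {}
    for row in reader:
        if not row:
            continue
        # Pad short rows so trailing empty species columns are handled.
        cells = row[1:] + [""] * (len(species) - len(row) + 1)
        presence[row[0]] = {sp for sp, cell in zip(species, cells) if cell.strip()}
    return species, presence


def split_core_specific(species, presence):
    """Return (core, species_specific) orthogroup name lists."""
    core = sorted(og for og, sp in presence.items() if len(sp) == len(species))
    specific = sorted(og for og, sp in presence.items() if len(sp) == 1)
    return core, specific


# Toy table; orthogroup, species, and gene identifiers are hypothetical.
toy_tsv = (
    "Orthogroup\tspA\tspB\tspC\n"
    "OG_1\ta1, a2\tb1\tc1\n"
    "OG_2\ta3\t\tc2\n"
    "OG_3\t\tb9\t\n"
)
toy_species, toy_presence = orthogroup_presence(toy_tsv)
core_ogs, specific_ogs = split_core_specific(toy_species, toy_presence)
print(core_ogs, specific_ogs)
```

In a real analysis the "core" threshold would usually be relaxed (for example, presence in most lineages) to tolerate assembly gaps and annotation errors.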

Lineage-Specific Evolution of NBS Gene Subfamilies

Comparative analysis across diverse plant lineages reveals striking variation in NBS gene subfamily composition and evolutionary trajectories (Table 1). Eudicot species generally maintain both CNL and TNL subfamilies, though with substantial lineage-specific differences in relative proportions [41] [22]. Monocot species, including Oryza sativa and Triticum aestivum, exhibit complete absence of TNL genes, representing a major lineage-specific loss [41]. Gymnosperms such as Pinus taeda display contrasting patterns with dramatic expansion of TNL subfamilies, comprising 89.3% of typical NBS-LRR genes [41].

Table 1: Evolutionary Distribution of NBS-LRR Gene Subfamilies Across Plant Lineages

Plant Species/Lineage	Total NBS Genes	CNL Subfamily	TNL Subfamily	RNL Subfamily	Notable Evolutionary Patterns
Arabidopsis thaliana	207 [41]	~61% [41]	~36% [41]	~3% [41]	Balanced subfamily representation
Oryza sativa (Rice)	505 [41]	100% [41]	0% [41]	0% [41]	Complete loss of TNL and RNL
Salvia miltiorrhiza	196 [41]	75 CC-domain [41]	2 [41]	1 [41]	Marked reduction in TNL/RNL
Capsicum annuum	252 [22]	48 CC-domain [22]	4 [22]	1 RNL-like [22]	Dominance of nTNL (98.4%)
Pinus taeda	311 [41]	~10.7% [41]	~89.3% [41]	-	Dramatic TNL expansion
Early Land Plants	~25 [3]	Limited repertoire	Limited repertoire	Limited repertoire	Ancestral compact NLR repertoire

The medicinal plant Salvia miltiorrhiza exemplifies a particularly extreme subfamily distribution, with only 2 TNL and 1 RNL genes identified among 196 NBS-domain-containing genes [41]. This pattern extends across the Salvia genus: comparative analysis of five Salvia species found TNL subfamily members to be absent or nearly so, with minimal RNL representation [41]. These findings suggest distinct evolutionary pressures in certain lineages that have driven the contraction or loss of specific NBS subfamilies, possibly compensated by expansion and functional diversification of the remaining subfamilies.
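The subfamily proportions in Table 1 follow from simple arithmetic on the gene counts, which the short sketch below reproduces for the two species reported as counts rather than percentages. Only the totals and TNL counts from the table are used; the helper name is an illustrative assumption.

```python
def percent(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# (total NBS genes, TNL genes) as listed in Table 1.
table_counts = {
    "Capsicum annuum": (252, 4),
    "Salvia miltiorrhiza": (196, 2),
}

for species, (total, tnl) in table_counts.items():
    tnl_pct = percent(tnl, total)
    non_tnl_pct = percent(total - tnl, total)
    print(f"{species}: TNL {tnl_pct}%, nTNL {non_tnl_pct}%")
```

For Capsicum annuum, 248 of 252 genes are non-TNL, which matches the 98.4% nTNL figure given in the table.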

Functional Characterization of Core and Species-Specific Orthogroups

Expression Profiling Under Biotic and Abiotic Stresses

Functional analysis of NBS orthogroups through transcriptomic profiling provides critical insights into their roles in plant stress responses. Examination of orthogroup expression patterns across different tissues and stress conditions has identified putative functional specialization among conserved orthogroups [3]. OG2, OG6, and OG15 demonstrate particular significance, showing upregulated expression in various tissues under diverse biotic and abiotic stresses in cotton species with contrasting susceptibility to cotton leaf curl disease (CLCuD) [3].
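One common way to flag upregulated orthogroups is a log2 fold change between stress and control expression. The sketch below assumes a single summary expression value per orthogroup and condition, a pseudocount of 1, and a cutoff of log2FC ≥ 1; the orthogroup labels and numbers are invented placeholders, not values from the cotton study.

```python
import math


def log2_fold_change(control, stress, pseudocount=1.0):
    """Return log2((stress + pseudocount) / (control + pseudocount))."""
    return math.log2((stress + pseudocount) / (control + pseudocount))


def upregulated_orthogroups(expression, min_log2fc=1.0):
    """Return {orthogroup: log2FC} for orthogroups at or above the cutoff.

    `expression` maps orthogroup name -> (control_value, stress_value).
    """
    result = {}
    for orthogroup, (control, stress) in expression.items():
        lfc = log2_fold_change(control, stress)
        if lfc >= min_log2fc:
            result[orthogroup] = lfc
    return result


# Hypothetical expression summaries (e.g. mean normalized counts).
toy_expression = {"OG_a": (3.0, 15.0), "OG_b": (7.0, 7.0)}
up = upregulated_orthogroups(toy_expression)
print(up)
```

A real analysis would work from replicated counts with a statistical test; the fold-change filter here only illustrates the ranking idea.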

These expression patterns suggest that core orthogroups may represent fundamental components of plant immune signaling networks, while species-specific orthogroups potentially contribute to specialized adaptation to lineage-specific pathogen pressures. Integration of expression data with orthogroup classification enables prioritization of candidate genes for functional validation and elucidates how evolutionary conservation correlates with functional importance in plant immunity.

Genetic Variation and Protein Interaction Networks

Comparative analysis of genetic variation between susceptible (Coker 312) and tolerant (Mac7) Gossypium hirsutum accessions revealed substantial differences in NBS gene sequences, with 6,583 unique variants in the tolerant genotype compared to 5,173 in the susceptible line [3]. This disparity suggests that sequence variation within NBS genes may contribute to the difference in resistance between the two accessions and points to candidate variants for follow-up.
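Counting variants unique to each accession amounts to a set difference once variants are represented in a comparable form. The sketch below assumes each variant is keyed as a (chromosome, position, reference allele, alternate allele) tuple, for example after parsing two VCF files; the toy variants and the function name are hypothetical.

```python
def private_variants(tolerant, susceptible):
    """Return (variants only in tolerant, variants only in susceptible)."""
    return tolerant - susceptible, susceptible - tolerant


# Hypothetical variants keyed as (chromosome, position, ref, alt).
toy_tolerant = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
}
toy_susceptible = {
    ("chr1", 100, "A", "G"),
    ("chr2", 90, "T", "C"),
}

only_tolerant, only_susceptible = private_variants(toy_tolerant, toy_susceptible)
print(len(only_tolerant), len(only_susceptible))
```

Keying on all four fields treats different alternate alleles at the same position as distinct variants, which is a design choice that may or may not match how the original counts were derived.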

Protein-ligand and protein-protein interaction studies further demonstrated strong binding affinity between putative NBS proteins and ADP/ATP, consistent with the nucleotide-binding function of the NBS domain [3]. Importantly, interaction assays also identified specific associations between NBS proteins and core components of the cotton leaf curl disease virus, providing mechanistic insights into recognition specificity and resistance protein function [3].

Experimental Validation and Methodological Framework

Functional Validation Through Virus-Induced Gene Silencing

Direct experimental validation of orthogroup functional predictions represents a critical step in establishing gene function. Virus-induced gene silencing (VIGS) of GaNBS (OG2) in resistant cotton demonstrated its essential role in viral pathogen response, with silenced plants showing increased virus titers and compromised resistance [3]. This functional confirmation establishes OG2 as a bona fide resistance orthogroup with conserved function in plant defense against viral pathogens.

The integration of orthogroup analysis with functional validation provides a powerful framework for prioritizing candidate genes for detailed mechanistic studies. This approach efficiently bridges computational predictions with experimental confirmation, accelerating the identification of evolutionarily conserved resistance genes with potential applications in crop improvement programs.

The Researcher's Toolkit for NBS Orthogroup Analysis

Table 2: Essential Computational Tools and Reagents for NBS Orthogroup Analysis

Tool/Reagent	Category	Primary Function	Application in NBS Analysis
OrthoFinder [34]	Software	Phylogenetic orthology inference	Core orthogroup identification across species
FastOMA [98]	Software	Scalable orthology inference	Large-scale analyses (1000+ genomes)
OMAmer [98]	Software	k-mer-based sequence placement	Rapid homology detection and grouping
OrthoBrowser [97]	Software	Results visualization and exploration	Interactive orthogroup data exploration
Pfam HMM Models [3]	Database	Protein domain annotation	NBS domain identification (NB-ARC domain)
DIAMOND [3]	Software	Sequence similarity search	Rapid all-against-all sequence comparison
VIGS Vectors [3]	Experimental	Virus-induced gene silencing	Functional validation of NBS gene function
RNA-seq Libraries [3]	Experimental	Transcriptome profiling	Expression analysis of NBS orthogroups

Orthogroup analysis has fundamentally advanced our understanding of NBS gene evolution in land plants, revealing both conserved principles and lineage-specific innovations in plant immune receptor repertoires. The identification of core orthogroups underscores the evolutionary conservation of essential immune signaling components, while species-specific orthogroups highlight the dynamic adaptation of plant immunity to diverse pathogen pressures. The integration of computational orthology inference with experimental validation provides a powerful framework for deciphering the complex evolutionary history and functional diversity of NBS genes.

Future research directions will benefit from the expanding genomic resources for non-model plant species and continued refinement of orthology inference methods capable of processing thousands of genomes. The application of structural phylogenomics, integrating protein structure prediction with orthogroup analysis, promises to enhance resolution of deep evolutionary relationships among NBS genes. Furthermore, leveraging orthogroup classifications to inform comparative functional studies across species will illuminate how evolutionary conservation and divergence translate to immune receptor function, ultimately enabling strategic manipulation of NBS genes for crop improvement and sustainable agriculture.

Conclusion

The evolution of NBS domain genes represents a remarkable story of plant adaptation, characterized by dynamic expansion, diversification, and sophisticated regulatory mechanisms. From their origins in early land plants to their specialized functions in modern crops, these genes have evolved through tandem duplications, whole-genome events, and lineage-specific adaptations, including the notable loss of TNL subfamilies in monocots. The development of advanced computational tools, particularly deep learning approaches, has revolutionized our ability to identify and classify these complex genes, while functional studies continue to reveal their crucial roles in pathogen recognition and defense signaling. The intricate balance plants maintain between robust immunity and the fitness costs of NBS gene expression, often mediated by miRNAs, highlights the sophistication of this evolutionary arms race. For biomedical and clinical research, understanding the molecular mechanisms of plant NBS genes offers unexpected insights into innate immunity principles that may inform human immune receptor studies. Future directions should focus on harnessing this knowledge for developing novel disease resistance strategies in crops, exploring potential applications in synthetic biology, and further investigating the conserved evolutionary principles that may bridge plant and animal immunity systems.