Decoding the Plant Immune Repertoire: A Comprehensive Guide to NBS Gene Domain Architecture Patterns

Jacob Howard Nov 27, 2025

Abstract

This article synthesizes current knowledge on the domain architecture of plant Nucleotide-Binding Site (NBS) genes, the largest class of disease resistance (R) genes. We explore the foundational principles of NBS domain organization, from classical TNL and CNL structures to the discovery of 168 distinct architectural classes encompassing significant diversity across plant species. The review details state-of-the-art methodologies for identifying and characterizing these genes, including deep learning tools like PRGminer, and addresses common challenges in annotation and analysis. Furthermore, we present comparative evolutionary analyses that reveal patterns of gene family expansion, loss, and diversification, and examine functional validation techniques that link specific architectures to disease resistance phenotypes. This resource is tailored for researchers and scientists in plant pathology and genetics, providing a structured framework to understand and exploit NBS gene diversity for crop improvement.

The Structural Blueprint of Plant Immunity: Unveiling Classical and Novel NBS Domain Architectures

The NB-ARC domain (nucleotide-binding adaptor shared by APAF-1, R proteins, and CED-4) is a critical functional module in plant disease resistance (R) proteins, which are fundamental components of the plant innate immune system. Most R proteins implicated in pathogen recognition through gene-for-gene relationships belong to the nucleotide-binding site leucine-rich repeat (NBS-LRR) family, with the NB-ARC domain serving as their central molecular switch [1]. The domain is a functional ATPase that binds and hydrolyzes ATP, a process thought to regulate the activation status of R proteins and the subsequent initiation of defense signaling cascades [2] [1]. Its significance is underscored by its presence in one of the largest gene families in plants, with genomes encoding hundreds of such proteins: approximately 150 in Arabidopsis thaliana, over 400 in Oryza sativa (rice), and an estimated 1,700 potential NBS-encoding sequences in wheat [3] [1].

Structurally, the NB-ARC domain consists of three subdomains: NB, ARC1, and ARC2 [2]. This domain belongs to the STAND (signal transduction ATPases with numerous domains) family of ATPases, which function as molecular switches in disease signaling pathways across kingdoms [1]. The NB-ARC domain is evolutionarily conserved in plants and exhibits similarity to mammalian NOD-LRR proteins, though these similarities likely result from convergent evolution rather than shared ancestry [1]. In plants, NBS-LRR proteins can be divided into two major subfamilies based on their N-terminal domains: those with Toll/interleukin-1 receptor (TIR) domains (TNLs) and those with coiled-coil (CC) domains (CNLs). Notably, TNLs are completely absent from cereal genomes, indicating lineage-specific evolution of these immune receptors [1].

Structural Organization and Conserved Motifs

The NB-ARC domain exhibits a conserved structural organization characterized by an ordered series of motifs that facilitate nucleotide binding and hydrolysis. Motif analysis across diverse plant species, including Triticeae crops, has confirmed the general structural organization of the NBS domain in cereals, characterized by the presence of six commonly conserved motifs: P-loop, RNBS-A, Kinase-2, Kinase-3a, RNBS-C, and GLPL [3]. Research has revealed the existence of at least 11 distinct distribution patterns of these motifs along the NBS domain, indicating both conserved core architecture and evolutionary diversification [3].

The table below summarizes the key conserved motifs in the NB-ARC domain, their consensus sequences, and their functional roles:

Table 1: Core Conserved Motifs of the NB-ARC Domain

Motif Name	Consensus Sequence	Structural Position	Primary Function
P-loop	G-x(4)-GK-[TS]	NB subdomain	Phosphate binding; nucleotide coordination [4] [5]
RNBS-A	Not specified	NB subdomain	Conserved motif; role in nucleotide binding [3] [1]
Kinase-2	hhhhDE	NB subdomain	Magnesium ion coordination; ATP hydrolysis [3] [5]
Kinase-3a	Not specified	ARC1 subdomain	Conserved motif; structural stability [3]
RNBS-C	Not specified	ARC2 subdomain	Subfamily-specific; distinguishes TNLs/CNLs [3] [1]
GLPL	Gly-Leu-Pro-Leu	ARC2 subdomain	Structural motif; potential role in domain interactions [3]
MHD	Met-His-Asp	C-terminus of ARC2	Regulatory control; co-ordination of nucleotide state [2]

The P-loop (Walker A motif) represents a glycine-rich sequence that forms a phosphate-binding loop, with a conserved lysine residue that is crucial for nucleotide binding [5]. The Kinase-2 (Walker B motif) contains conserved aspartate and glutamate residues that coordinate magnesium ions and are essential for ATP hydrolysis [5]. The MHD motif located at the carboxy-terminus of the ARC2 subdomain fulfills a critical regulatory function, analogous to the sensor II motif in AAA+ proteins, by coordinating the nucleotide and controlling subdomain interactions [2].
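The consensus patterns in Table 1 can be turned into simple sequence searches. The following is a minimal sketch, not a validated motif model: it assumes the PROSITE-style P-loop pattern translates directly to a regular expression, that "h" in the Kinase-2 consensus stands for the hydrophobic residue set chosen below, and that GLPL and MHD can be matched literally. The toy sequence is hypothetical and was built only to exercise the patterns.

```python
import re

# Assumption: "h" in the Kinase-2 consensus means any residue in this set.
HYDROPHOBIC = "AVILMFWYC"

# Regex translations of the Table 1 consensus patterns (illustrative only).
MOTIF_PATTERNS = {
    "P-loop": re.compile(r"G.{4}GK[TS]"),
    "Kinase-2": re.compile(rf"[{HYDROPHOBIC}]{{4}}DE"),
    "GLPL": re.compile(r"GLPL"),
    "MHD": re.compile(r"MHD"),
}

def find_motifs(sequence):
    """Return {motif name: 0-based start of the first match} for motifs found."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIF_PATTERNS.items():
        match = pattern.search(seq)
        if match:
            hits[name] = match.start()
    return hits

def motifs_in_expected_order(hits):
    """Check that the motifs found occur in the N-to-C order listed in Table 1."""
    order = ["P-loop", "Kinase-2", "GLPL", "MHD"]
    positions = [hits[name] for name in order if name in hits]
    return positions == sorted(positions)

# Hypothetical toy sequence, not a real R protein.
toy_sequence = "MAAGMGGIGKTTLAQKAAALVLLDEVWSSGLPLAIKTTMHDLV"
toy_hits = find_motifs(toy_sequence)
print(toy_hits, motifs_in_expected_order(toy_hits))
```

Literal patterns such as these miss divergent motif variants, so in practice a profile-based tool would be the more sensitive choice; the sketch only illustrates how the ordered motif series can be checked.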

Diagram: Structural Organization of the NB-ARC Domain

Molecular Function and Signaling Mechanism

The NB-ARC domain functions as a molecular switch that regulates R protein activity through nucleotide-dependent conformational changes. Specific binding and hydrolysis of ATP have been experimentally demonstrated for the NBS domains of the tomato CNL proteins I-2 and Mi-1, confirming that they function as ATPases [1]. In the proposed mechanistic model, the NB-ARC domain exists in an auto-inhibited ADP-bound state in the absence of pathogen effectors. Upon pathogen recognition, often through direct or indirect detection of pathogen effectors by the LRR domain, ADP is exchanged for ATP, triggering conformational changes that activate downstream signaling [2] [1].

The MHD motif plays a particularly crucial role in regulating this molecular switch. Extensive mutational analysis of the MHD motif in the R proteins I-2 and Mi-1 has identified several autoactivating mutations of the invariant histidine and conserved aspartate residues [2]. When combined with autoactivating hydrolysis mutants in the NB subdomain, these mutations show non-additive effects, indicating the MHD motif's central regulatory role in controlling R protein activity [2]. Three-dimensional modeling of the NB-ARC domain based on the APAF-1 template structure suggests that the MHD motif fulfills a function analogous to the sensor II motif in AAA+ proteins, coordinating the nucleotide and controlling subdomain interactions [2].

Recent evidence also indicates that oligomerization represents a critical step in NBS-LRR protein signaling, as demonstrated by the oligomerization of tobacco N protein (a TNL) in response to pathogen elicitors [1]. This oligomerization mirrors signaling mechanisms in mammalian NOD proteins and suggests a conserved activation mechanism across STAND ATPases.
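The switch model described in this section can be summarized as a two-state toy model. This is a conceptual sketch only, not a biochemical simulation. It assumes that ATP hydrolysis returns the protein to the ADP-bound resting state and that a hydrolysis-deficient mutant therefore stays active once switched on. The class and method names are hypothetical.

```python
from enum import Enum

class NucleotideState(Enum):
    ADP_BOUND = "resting (auto-inhibited)"
    ATP_BOUND = "active (signaling)"

class NBARCSwitch:
    """Toy two-state model of the proposed NB-ARC nucleotide switch."""

    def __init__(self, can_hydrolyze=True):
        # Without effector perception the protein rests in the ADP-bound state.
        self.state = NucleotideState.ADP_BOUND
        # Assumption: a hydrolysis-deficient mutant cannot reset itself.
        self.can_hydrolyze = can_hydrolyze

    def perceive_effector(self):
        """Effector perception triggers exchange of ADP for ATP."""
        self.state = NucleotideState.ATP_BOUND
        return self.state

    def hydrolyze(self):
        """ATP hydrolysis returns the switch to the resting state, if possible."""
        if self.can_hydrolyze:
            self.state = NucleotideState.ADP_BOUND
        return self.state

wild_type = NBARCSwitch()
wild_type.perceive_effector()
print(wild_type.state.value)   # active (signaling)
print(wild_type.hydrolyze().value)  # resting (auto-inhibited)
```

The model deliberately omits oligomerization and the MHD motif's regulatory role, both of which would need additional states to represent.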

Diagram: NB-ARC Domain Molecular Switch Mechanism

Experimental Analysis Methodologies

Database Mining and Sequence Identification

Experimental characterization of NB-ARC domains begins with comprehensive identification of NBS-encoding genes from genomic and transcriptomic resources. A representative methodology involves:

Primary Search Using PSI-BLAST: Researchers typically select a known NBS domain sequence as a query to construct a Position Specific Scoring Matrix (PSSM). For example, one study used the core NBS domain of the Lr21 protein from wheat (GenBank: ACO53397), which confers resistance to leaf rust, comprising 176 amino acids extending from the GSGKTTFA motif to the RSPIAA motif [3].
Data Source Integration: Sequence data are mined from multiple sources including protein annotations in GenBank and EST databases. The DFCI Gene Indices (formerly TIGR Gene Indices), which contain clustered and assembled ESTs and cDNA sequences, serve as valuable resources for identifying expressed NBS domains [3].
Motif Validation: Identified sequences are analyzed for the presence of characteristic NB-ARC motifs (P-loop, RNBS-A, Kinase-2, Kinase-3a, RNBS-C, GLPL) using motif analysis tools. This confirms the structural integrity of identified domains and reveals variant motif distribution patterns [3].
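The primary search step above could be scripted along the following lines. This sketch only assembles a command line for the BLAST+ psiblast program and does not run it. The file names, database name, iteration count, and E-value are hypothetical placeholders, not values reported in the cited study.

```python
import shlex

def build_psiblast_command(query_fasta, database, iterations=3, evalue=1e-5,
                           pssm_out="nbs_query.pssm", hits_out="nbs_hits.tsv"):
    """Assemble a psiblast call that saves the PSSM and tabular hits."""
    return [
        "psiblast",
        "-query", query_fasta,            # FASTA file with the NBS domain query
        "-db", database,                  # pre-formatted protein BLAST database
        "-num_iterations", str(iterations),
        "-evalue", str(evalue),
        "-out_pssm", pssm_out,            # PSSM built over the iterations
        "-outfmt", "6",                   # tabular hit output
        "-out", hits_out,
    ]

# Hypothetical file and database names, for illustration only.
command = build_psiblast_command("lr21_nbs.fasta", "triticeae_proteins")
print(shlex.join(command))
```

Passing the resulting list to subprocess.run would execute the search on a machine with BLAST+ installed and a formatted protein database available.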

Structural-Functional Analysis Through Mutagenesis

Structure-function relationships in the NB-ARC domain are primarily elucidated through targeted mutagenesis:

Site-Directed Mutagenesis: Critical residues in conserved motifs (e.g., the invariant histidine and aspartate in the MHD motif) are systematically mutated to assess their impact on protein function [2].
Phenotypic Characterization: Mutant proteins are tested for autoactivation phenotypes in plant systems. Autoactivating mutations often trigger defense responses in the absence of pathogens, indicating disruption of the regulatory mechanism [2].
Biochemical Assays: The ATPase activity of wild-type and mutant NB-ARC domains is quantified through enzymatic assays measuring ATP hydrolysis. This confirms the nucleotide dependence of the domain [1].
Structural Modeling: Three-dimensional models of the NB-ARC domain are constructed using homologous structures as templates (e.g., APAF-1), providing a framework for interpreting mutational data and formulating hypotheses about mechanism [2].

Diagram: Experimental Workflow for NB-ARC Domain Analysis

Research Reagent Solutions

The table below outlines essential research reagents and resources for experimental investigation of NB-ARC domains:

Table 2: Essential Research Reagents for NB-ARC Domain Studies

Reagent/Resource	Specifications	Research Application
PRGdb	Plant Resistance Gene database	Source of known R-gene sequences for query design and comparative analysis [3]
DFCI Gene Indices	Tentative Contigs (TCs) and singletons from EST clustering	Identification of expressed NBS-encoding sequences without full genome sequencing [3]
PSI-BLAST	Position-Specific Iterative BLAST algorithm with PSSM	Sensitive identification of divergent NBS-encoding sequences in databases [3]
MEME Suite	Motif discovery and analysis tools (e.g., MEME)	Identification of conserved motifs in NBS domains; 8 conserved NBS motifs identified in Arabidopsis [1]
APAF-1 Structure	PDB ID: 1Z6T or other APAF-1 structures	Template for homology modeling of plant NB-ARC domains [2]
I-2 and Mi-1 Genes	Tomato CNL proteins with demonstrated ATPase activity	Model systems for structure-function analysis of NB-ARC domains [2] [1]

The NB-ARC domain represents a versatile molecular switch platform that has evolved in plants to support pathogen recognition and immune signaling. Its conserved core structure—comprising the P-loop, RNBS-A, Kinase-2, Kinase-3a, RNBS-C, GLPL, and MHD motifs—provides the structural framework for nucleotide-dependent regulation while allowing evolutionary diversification through sequence variation and motif distribution patterns. The mechanistic model of the NB-ARC as a nucleotide-dependent molecular switch, regulated by the MHD motif and capable of oligomerization, provides a foundation for understanding how plant immune proteins transition from resting to active states. Future research elucidating the precise structural changes associated with nucleotide exchange and hydrolysis will further refine this model and potentially enable engineering of disease resistance proteins with enhanced recognition capabilities.

Plant nucleotide-binding site (NBS) genes constitute one of the largest and most critical gene families encoding disease resistance (R) proteins, which serve as essential components of the plant immune system. These genes are characterized by their distinctive domain architecture patterns, which determine their function in pathogen recognition and defense signaling. The central NBS domain (NB-ARC) is a conserved feature that binds nucleotides and facilitates molecular switching during immune activation. Through extensive genome-wide studies across diverse plant species, researchers have identified major architectural classes within this gene family, primarily categorized based on their N-terminal and C-terminal domain configurations. Understanding these domain architecture patterns provides crucial insights into the evolution of plant immune systems and enables the development of disease-resistant crop varieties through molecular breeding approaches [6] [7].

Classification and Domain Architecture of NBS Genes

Major Architectural Classes

Plant NBS-encoding genes are systematically classified based on their specific domain compositions and arrangements. The major classes include CNL, TNL, RNL, and NL, each defined by characteristic N-terminal domains and the presence or absence of C-terminal leucine-rich repeats (LRRs). These architectural patterns represent functional specializations within the plant immune system, with different classes playing distinct roles in pathogen recognition and defense signaling [6] [8].

CNL (Coiled-Coil NBS-LRR): This class features an N-terminal coiled-coil (CC) domain, a central NBS (NB-ARC) domain, and a C-terminal LRR domain. The CC domain is involved in protein-protein interactions and signaling initiation. CNLs are universally present in both monocots and dicots and represent one of the most abundant NBS classes across plant species [6] [8].

TNL (Toll-Interleukin-1 Receptor NBS-LRR): TNL proteins contain an N-terminal TIR (Toll-Interleukin-1 Receptor) domain, a central NBS domain, and a C-terminal LRR domain. The TIR domain possesses enzymatic activity involved in defense signaling. Notably, TNL genes are absent in monocots but present in dicots, representing a significant evolutionary divergence in immune receptor repertoires [6] [9].

RNL (RPW8 NBS-LRR): This class is characterized by an N-terminal RPW8 (Resistance to Powdery Mildew 8) domain, followed by NBS and LRR domains. RNLs often function as helper proteins in cell death signaling and are generally less numerous than CNLs or TNLs, typically numbering in the single digits per genome [7] [8].

NL (NBS-LRR): NL proteins contain the NBS and LRR domains but lack distinctive N-terminal domains like CC, TIR, or RPW8. This class represents a significant portion of the NBS gene repertoire in many plant species and may represent ancestral forms or products of domain loss through evolution [6] [10].

Table 1: Distribution of NBS Gene Architectural Classes in Selected Plant Species

Plant Species	CNL	TNL	RNL	NL	Total NBS Genes	Reference
Helianthus annuus (Sunflower)	100	77	13	162	352	[6]
Hordeum vulgare (Barley)	14 (CC-NBS), 6 (CC-NBS-LRR)	0	Not specified	53 (NBS-LRR), 25 (NBS)	96	[10]
Asparagus officinalis (Garden asparagus)	Majority	0	Few	Included in total	27	[8]
Dendrobium officinale	10	0	Not specified	Included in total	74	[9]

Irregular and Non-Canonical Types

Beyond the major classes, plants possess various irregular NBS architectures resulting from domain losses, combinations with novel domains, or extensive diversification. These include:

Truncated Forms: Proteins lacking complete domains, such as those missing LRR domains (CN, TN types) or containing only the NBS domain (N type) [11] [8].
Species-Specific Architectures: Unique domain combinations identified in comprehensive analyses, such as TIR-NBS-TIR-Cupin1-Cupin1, TIR-NBS-Prenyltransf, and Sugar_tr-NBS patterns [7].
Domain Fusion Proteins: NBS genes combined with integrated domains that may function as decoys or sensors for pathogen effectors [10].

Table 2: Conserved Motifs in NBS Domain and Their Functions

Motif Name	Consensus Sequence	Function	Location
P-loop	GMGGIGKTT	ATP/GTP binding	NBS domain
Kinase-2	LVLDDVW	Hydrolysis activity	NBS domain
RNBS-A	FDLxLKxR	Signaling regulation	NBS domain
GLPL	GxPLLxLK	Structural stability	NBS domain
MHD	MHDIV	Molecular switch	NBS domain
RNBS-D	CFAL	Unknown	NBS domain

Experimental Protocols for NBS Gene Identification

Genome-Wide Identification Pipeline

The standard workflow for comprehensive identification of NBS genes involves multiple bioinformatic steps and validation procedures:

Step 1: Sequence Retrieval

Obtain complete genome sequences and annotation files from relevant databases (Phytozome, NCBI, Plaza, or species-specific genome portals) [6] [8].

Step 2: HMM Profiling

Perform Hidden Markov Model searches using the NB-ARC domain (PF00931) as query against all predicted protein sequences.
Apply stringent E-value cutoff (1e-5 to 1e-10) to identify candidate NBS-encoding genes [6] [8].
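The E-value filter in Step 2 might look like the sketch below, which assumes the search was run with HMMER's hmmsearch and its --domtblout option, where the seventh whitespace-separated column holds the full-sequence E-value. The two example rows are hypothetical and only mimic that layout.

```python
def parse_domtblout(lines, max_evalue=1e-5):
    """Return {protein_id: best full-sequence E-value} for hits passing the cutoff."""
    best = {}
    for line in lines:
        # Skip blank lines and the '#' comment lines HMMER writes.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue = fields[0], float(fields[6])
        if evalue <= max_evalue:
            if target not in best or evalue < best[target]:
                best[target] = evalue
    return best

# Hypothetical rows in domtblout column order (values invented for illustration).
example_rows = [
    "# target name accession tlen query name accession qlen E-value ...",
    "geneA - 900 NB-ARC PF00931 252 3.1e-60 201.2 0.1 1 1 1.2e-63 4.0e-60 200.8 0.1 2 250 180 430 179 432 0.95 hypothetical",
    "geneB - 410 NB-ARC PF00931 252 2.0e-03 12.4 0.0 1 1 8.0e-07 2.5e-03 12.1 0.0 40 90 100 150 95 160 0.70 hypothetical",
]
candidates = parse_domtblout(example_rows, max_evalue=1e-5)
print(candidates)
```

Tightening max_evalue toward 1e-10 trades sensitivity for specificity, which matches the range of cutoffs given in Step 2.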

Step 3: Domain Architecture Analysis

Validate candidate genes using InterProScan, NCBI's Batch CD-Search, or SMART database.
Classify genes into architectural classes based on presence/absence of CC, TIR, RPW8, and LRR domains [8].
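The classification rule in Step 3 can be expressed as a small function. This sketch assumes domain calls have already been normalized to the labels "NBS", "CC", "TIR", "RPW8", and "LRR" (real tools report database-specific names), and that when more than one N-terminal domain is detected, TIR takes priority over RPW8 and RPW8 over CC. That priority order is an illustrative assumption.

```python
def classify_nbs_gene(domains):
    """Map a collection of normalized domain labels to an architectural class."""
    found = set(domains)
    if "NBS" not in found:
        return None  # not an NBS-encoding gene
    # Assumed priority when several N-terminal domains are present.
    if "TIR" in found:
        prefix = "T"
    elif "RPW8" in found:
        prefix = "R"
    elif "CC" in found:
        prefix = "C"
    else:
        prefix = ""
    suffix = "L" if "LRR" in found else ""
    return f"{prefix}N{suffix}"

for example in (["CC", "NBS", "LRR"], ["TIR", "NBS"], ["NBS"], ["LRR"]):
    print(example, "->", classify_nbs_gene(example))
```

The same function yields the truncated labels (CN, TN, N) for genes lacking an LRR domain, so canonical and truncated forms share one code path.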

Step 4: Additional Validation

Conduct local BLASTp searches against known NBS reference sequences from model plants.
Confirm NBS-specific conserved motifs (P-loop, Kinase-2, GLPL, MHD) using MEME suite or similar tools [8].

Figure 1: Workflow for Genome-Wide Identification of NBS Genes

Phylogenetic and Evolutionary Analysis

Orthogroup Analysis

Use OrthoFinder v2.5+ with DIAMOND for sequence similarity searches and MCL for clustering.
Identify core orthogroups (e.g., OG0, OG1, OG2) and species-specific orthogroups [7].
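Separating core from species-specific orthogroups could be sketched as follows. The code assumes input laid out like OrthoFinder's Orthogroups.tsv (a header row of species names, one row per orthogroup, comma-separated gene IDs per cell). Here "core" is defined as present in every species, which is an assumption; the source describes core orthogroups only as common across multiple species. The example table is hypothetical.

```python
import csv
import io

def orthogroup_species_counts(tsv_text):
    """Return {orthogroup: {species: gene count}} from Orthogroups.tsv-style text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    species = next(reader)[1:]  # first header cell is the orthogroup column
    table = {}
    for row in reader:
        if not row:
            continue
        counts = {}
        for name, cell in zip(species, row[1:]):
            genes = [g for g in cell.split(",") if g.strip()]
            if genes:
                counts[name] = len(genes)
        table[row[0]] = counts
    return table

def split_core_and_specific(table, n_species):
    """Core = present in all species (assumed definition); specific = one species only."""
    core = sorted(og for og, c in table.items() if len(c) == n_species)
    specific = sorted(og for og, c in table.items() if len(c) == 1)
    return core, specific

# Hypothetical three-species table.
example_tsv = (
    "Orthogroup\tsp_A\tsp_B\tsp_C\n"
    "OG0\ta1, a2\tb1\tc1\n"
    "OG1\ta3\t\tc2\n"
    "OG2\t\tb2, b3\t\n"
)
og_table = orthogroup_species_counts(example_tsv)
core_ogs, specific_ogs = split_core_and_specific(og_table, n_species=3)
print(core_ogs, specific_ogs)
```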

Selection Pressure Analysis

Calculate non-synonymous to synonymous substitution rates (dN/dS) using PAML or similar packages.
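Identify sites under positive selection using site-level methods such as MEME (Mixed Effects Model of Evolution, not to be confused with the MEME Suite motif tool), FEL (Fixed Effects Likelihood), or REL (Random Effects Likelihood) [10].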

Gene Cluster Identification

Map NBS genes to chromosomes and identify clusters as genomic regions with ≥2 NBS genes within 200 kb.
Analyze tandem duplication events using BEDTools with distance threshold ≤8 intervening genes [6] [8].
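The cluster definition above can be sketched as a simple positional scan. This example interprets "within 200 kb" as neighboring NBS genes on the same chromosome whose start coordinates are no more than 200 kb apart, chained together. That interpretation is an assumption, and the gene coordinates are hypothetical.

```python
from collections import defaultdict

def find_clusters(genes, max_gap=200_000, min_genes=2):
    """Group genes into clusters.

    genes: iterable of (gene_id, chromosome, start) tuples.
    Returns a list of clusters, each a list of gene IDs in positional order.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        current, prev_start = [], None
        for start, gene_id in sorted(by_chrom[chrom]):
            # A gap larger than max_gap closes the current group.
            if prev_start is not None and start - prev_start > max_gap:
                if len(current) >= min_genes:
                    clusters.append(current)
                current = []
            current.append(gene_id)
            prev_start = start
        if len(current) >= min_genes:
            clusters.append(current)
    return clusters

# Hypothetical coordinates for illustration.
toy_genes = [
    ("g1", "chr1", 100_000), ("g2", "chr1", 250_000), ("g3", "chr1", 900_000),
    ("g4", "chr2", 50_000), ("g5", "chr2", 120_000), ("g6", "chr2", 300_000),
]
toy_clusters = find_clusters(toy_genes)
print(toy_clusters)
```

Singleton genes such as g3 are left out of the result, so the number of clustered genes can be compared directly with the total gene count.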

Genomic Distribution and Evolution

Chromosomal Organization and Gene Clusters

NBS genes typically display non-random distribution patterns across plant genomes, with strong tendencies toward clustering in specific chromosomal regions. In sunflower (Helianthus annuus), NBS genes were located on all 17 chromosomes, forming 75 distinct gene clusters, with one-third particularly concentrated on chromosome 13 [6]. Similarly, in barley (Hordeum vulgare), 50% of NBS genes were located on chromosomes 7H, 2H, and 3H, preferentially distributed in distal telomeric regions [10]. These clustering patterns reflect the evolutionary history of NBS gene expansion through local duplication events.

Gene duplication mechanisms play crucial roles in NBS gene family expansion. Tandem duplication represents a primary mechanism, evidenced by the identification of 9 tandem clusters containing 22.35% of barley NBS genes [10]. Segmental duplication also contributes significantly, particularly in polyploid species like soybean [10]. The dynamic birth-and-death evolution of NBS genes, characterized by repeated cycles of duplication, divergence, and eventual pseudogenization or deletion, enables plants to rapidly adapt to changing pathogen spectra [10].

Evolutionary Patterns Across Plant Lineages

Comparative genomic analyses reveal distinctive evolutionary trajectories for different NBS architectural classes across plant lineages. CNLs and RNLs diverged prior to the separation of Rosid I and Rosid II lineages of angiosperms, with both clades remaining as sister groups in plant families like Fabaceae and Brassicaceae [6]. TNLs show species-specific nesting patterns, while CNLs exhibit clade-specific nesting, with RNLs nested within the CNL-A clade [6].

A significant evolutionary pattern concerns the distribution of TNL genes, which are absent in monocots but present in dicots [9]. Their absence in monocots, including grasses and orchids, may be driven by deficiency of the NRG1/SAG101 pathway [9]. Recent studies in orchids have revealed substantial NBS gene degeneration, including type changes and NB-ARC domain degeneration, as a major driver of NBS gene diversity [9].

Figure 2: Evolutionary Pathways of NBS Gene Classes

Expression Profiles and Functional Validation

Expression Patterns and Regulation

NBS genes show complex expression patterns: many are expressed at basal levels in a tissue-specific manner, which is consistent with functional divergence among family members [6]. Comprehensive transcriptomic analyses reveal that different NBS architectural classes show distinct expression profiles across tissues, developmental stages, and in response to various biotic and abiotic stresses [7] [10].

In barley, 87 of the 96 identified NBS genes were supported by expression evidence, with varied and quantitatively uneven expression across tissues, organs, and developmental stages [10]. Expression profiling in cotton identified putative upregulation of specific orthogroups (OG2, OG6, and OG15) in different tissues under various biotic and abiotic stresses in plants with varying susceptibility to cotton leaf curl disease [7].

MicroRNA regulation represents another important layer of NBS gene expression control. Studies in barley identified 14 potential miRNA-R gene target pairs, providing insight into the post-transcriptional regulation of NBS genes [10]. This regulatory mechanism may enable plants to maintain extensive NLR repertoires without exhausting functional NLR loci, potentially offsetting fitness costs associated with NLR maintenance [7].

Functional Characterization and Validation

Virus-Induced Gene Silencing (VIGS)

Silencing of GaNBS (OG2) in resistant cotton indicated a putative role for this gene in limiting virus titer, supporting its functional importance in disease resistance [7].

Salicylic Acid Response Experiments

Treatment of Dendrobium officinale with salicylic acid identified 1,677 differentially expressed genes, including six NBS-LRR genes that were significantly up-regulated [9].
Weighted Gene Co-expression Network Analysis (WGCNA) revealed that Dof020138 was closely related to pathogen identification pathways, MAPK signaling pathways, plant hormone signal transduction pathways, and biosynthetic pathways [9].

Protein Interaction Studies

Protein-ligand and protein-protein interaction analyses demonstrated strong interaction of putative NBS proteins with ADP/ATP and different core proteins of the cotton leaf curl disease virus [7].
Genetic variation analyses between susceptible and tolerant cotton accessions identified several unique variants in NBS genes, with Mac7 (tolerant) exhibiting 6,583 variants and Coker 312 (susceptible) showing 5,173 variants [7].

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Research Reagents and Resources for NBS Gene Studies

Reagent/Resource	Function/Application	Example Sources/References
NB-ARC HMM Profile (PF00931)	Identification of NBS domains in protein sequences	Pfam Database
InterProScan	Domain architecture analysis and classification	EMBL-EBI
OrthoFinder v2.5+	Orthogroup analysis and evolutionary relationships	[7]
MEME Suite	Identification of conserved protein motifs	[8]
PlantCARE	Prediction of cis-acting regulatory elements	[8]
Phytozome/JGI	Genome databases for multiple plant species	[6]
PRGdb 4.0	Curated database of plant resistance genes	[8]
NCBI Batch CD-Search	Domain identification and classification	[8]
WoLF PSORT	Subcellular localization prediction	[8]
TBtools	Integrative toolkit for biological data analysis	[8]

The systematic classification of NBS genes into major architectural classes (CNL, TNL, RNL, NL) and irregular types provides a critical framework for understanding plant immunity mechanisms. These domain architecture patterns reflect both conserved evolutionary relationships and species-specific adaptations to pathogen pressures. The development of standardized experimental protocols for NBS gene identification, coupled with comprehensive databases and analytical tools, has enabled researchers to explore this complex gene family across diverse plant species. Future research focusing on functional characterization of irregular NBS types and comparative analyses across wider phylogenetic ranges will further enhance our understanding of plant disease resistance mechanisms and facilitate the development of durable disease resistance in crop species.

A comparative genomic study has expanded our understanding of plant immune system diversity by identifying 168 distinct domain architecture classes in nucleotide-binding site (NBS) domain genes across 34 plant species. This diversity, which encompasses both canonical resistance genes and numerous previously undescribed structural configurations, points to considerable evolutionary plasticity in plant immune receptors. The research provides a framework for understanding how plants have evolved complex defense mechanisms through domain rearrangements, duplications, and functional innovations. This architectural expansion has implications for developing sustainable crop resistance strategies and offers new avenues for engineering broad-spectrum disease resistance in agricultural systems.

Plant immunity relies on a sophisticated surveillance system capable of detecting diverse pathogens through specialized receptor proteins. The nucleotide-binding site (NBS) domain genes represent one of the largest and most important families of plant resistance (R) genes, encoding intracellular proteins responsible for pathogen recognition and defense activation. These proteins function as key initiators of effector-triggered immunity (ETI), the second layer of plant innate immunity that provides strain-specific resistance [12] [13].

Plant NBS-LRR proteins are structurally similar to mammalian NOD-like receptors (NLRs) but likely evolved through convergent evolution [12]. They typically contain a central NBS domain responsible for nucleotide binding and ATP hydrolysis, flanked by variable N-terminal and C-terminal domains. The N-terminal domains generally fall into two major classes: Toll/interleukin-1 receptor (TIR) domains or coiled-coil (CC) motifs, defining the TNL and CNL subfamilies respectively [12]. The C-terminal region most commonly contains leucine-rich repeats (LRRs) involved in pathogen recognition [12].

Until recently, research focused primarily on canonical NBS-LRR architectures, but emerging evidence suggests substantial architectural diversity exists beyond these standard configurations. This review examines the groundbreaking discovery of 168 domain architecture classes and its implications for understanding plant immunity mechanisms and evolution.

Methodology: Genome-Wide Discovery of Domain Architectures

Comparative Genomic Framework and Species Selection

The identification of 168 domain architecture classes resulted from a systematic analysis of 12,820 NBS-domain-containing genes across 34 plant species representing diverse evolutionary lineages from mosses to monocots and dicots [14]. This phylogenetic breadth enabled researchers to trace the evolutionary trajectories of NBS genes across land plant history.

Table 1: Key Methodological Components for Domain Architecture Discovery

Method Component	Implementation	Primary Function
Domain Prediction	Pfam domain analysis with hidden Markov models	Identification of protein domains within sequences
Architecture Classification	Pattern recognition of linear domain arrangements	Categorization of proteins based on domain combinations
Orthogroup Analysis	OrthoMCL clustering algorithm	Grouping evolutionarily related genes across species
Expression Profiling	RNA-seq analysis of different tissues under stress conditions	Linking gene architecture to functional expression patterns
Genetic Variation Analysis	Variant calling between susceptible and tolerant accessions	Connecting structural diversity to functional outcomes

Domain Identification and Architectural Classification

Protein domains, defined as structural, functional, and evolutionary units that can fold independently, were identified using hidden Markov model profiles from the Pfam database [15]. The "domain architecture" refers to the specific linear arrangement of domain(s) within individual proteins. Researchers categorized architectures based on:

Single-domain architectures (containing only NBS domains)
Multi-domain architectures (combining NBS with other domains)
Species-specific structural patterns
Evolutionarily conserved classical patterns

The 168 architecture classes emerged from systematic classification of all possible domain combinations observed across the 12,820 identified NBS-containing genes [14].
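Deriving architecture classes from per-protein domain hits can be sketched as below. The example assumes each hit is a (domain name, start position) pair and that repeated domains are kept rather than collapsed, in line with names such as TIR-NBS-TIR-Cupin1-Cupin1. The protein IDs and coordinates are hypothetical.

```python
from collections import Counter

def architecture_string(domain_hits):
    """Join domain names in N-to-C terminal order, e.g. 'TIR-NBS-LRR'."""
    ordered = sorted(domain_hits, key=lambda hit: hit[1])
    return "-".join(name for name, _start in ordered)

def count_architecture_classes(proteins):
    """proteins: {protein_id: [(domain_name, start), ...]} -> Counter of architectures."""
    return Counter(architecture_string(hits) for hits in proteins.values())

# Hypothetical domain hits for four proteins.
toy_proteins = {
    "p1": [("NBS", 200), ("TIR", 10), ("LRR", 600)],
    "p2": [("TIR", 5), ("NBS", 180), ("LRR", 550)],
    "p3": [("NBS", 30)],
    "p4": [("Cupin1", 700), ("TIR", 10), ("NBS", 150), ("TIR", 400), ("Cupin1", 560)],
}
class_counts = count_architecture_classes(toy_proteins)
print(len(class_counts), dict(class_counts))
```

The number of distinct keys in the counter corresponds to the number of architecture classes observed in the input set.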

Experimental Validation Workflow

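Beyond bioinformatic prediction, the study employed multiple experimental approaches to validate the functional significance of the discovered architectures, including expression profiling, genetic variation analysis between susceptible and tolerant accessions, protein interaction studies, and virus-induced gene silencing.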

Results: The Expansive Landscape of NBS Domain Architectures

The discovery of 168 domain architecture classes substantially broadens the known diversity of plant immune receptors. Among the 12,820 NBS-domain-containing genes identified, researchers observed both expected classical patterns and surprising novel configurations:

Table 2: Classification of NBS Domain Architecture Classes

Architecture Category	Examples	Evolutionary Significance
Classical Architectures	NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR	Evolutionarily conserved across multiple plant lineages
Species-Specific Patterns	TIR-NBS-TIR-Cupin1-Cupin1, TIR-NBS-Prenyltransf, Sugar_tr-NBS	Recent evolutionary innovations potentially adapted to specific pathogen pressures
Integrated Domain Architectures	WRKY-integrated NLRs, HMA-integrated NLRs	Domain fusions creating "integrated decoys" for pathogen effector recognition
Degenerate Architectures	NBS proteins lacking LRR domains	Functional specialization through domain loss

The research identified 603 orthogroups (OGs) with both core orthogroups (common across multiple species) and unique orthogroups (highly species-specific) [14]. Tandem duplications appeared as a major driver of this architectural diversification, particularly in expanding specific resistance gene families.

Non-Canonical Architectures and Integrated Domains

Beyond classical NBS-LRR configurations, the study revealed numerous non-canonical architectures with significant functional implications. These included integrated domain architectures (NLR-IDs) where NBS proteins have fused with additional domains that serve as "baits" for pathogen-derived effector proteins [13].

The WRKY domain integrated into the Arabidopsis RRS1 NLR protein represents one such example, where the integrated domain mimics the authentic host targets of pathogen effectors [13]. Similarly, rice RGA5 and Pik-1 proteins contain integrated heavy metal-associated (HMA) domains that directly bind effector proteins from Magnaporthe oryzae [13]. These integrated domains effectively create molecular traps that detect pathogen manipulation of host cellular machinery.

Evolutionary Dynamics of Domain Architectures

The research demonstrated that domain architecture diversity has been maintained beyond a core set of universal components present in all plant genomes. Approximately 65% of plant domain architectures are universally conserved across plant lineages, while the remaining architectures show lineage-specific distributions [15]. This pattern suggests both functional conservation of essential immune components and continuous innovation through lineage-specific adaptations.

Whole genome duplications have significantly contributed to architectural expansion by providing genetic material for domain rearrangements and functional diversification [15]. The data show a progressive, lineage-wise expansion of domain architectures during plant evolution, largely explained by changes in nuclear ploidy resulting from rounds of whole genome duplication [15].

Functional Implications of Architectural Diversity

Expression Patterns and Stress Responses

Expression profiling revealed distinct regulation patterns for different orthogroups under various biotic and abiotic stresses. Specifically, orthogroups OG2, OG6, and OG15 showed putative upregulation in different tissues under various stress conditions in plants both susceptible and tolerant to cotton leaf curl disease (CLCuD) [14]. This expression specificity suggests that architectural differences correspond to functional specialization in pathogen recognition and defense signaling.

The research further connected architectural variation to expression responses through salicylic acid (SA) treatment experiments in Dendrobium officinale, which identified significant upregulation of six NBS-LRR genes, with one gene (Dof020138) showing particular importance in multiple defense-related pathways [9].

Genetic Variation and Disease Resistance

Comparative analysis between susceptible (Coker 312) and tolerant (Mac7) Gossypium hirsutum accessions revealed substantial genetic variation in NBS genes. The tolerant Mac7 accession contained 6,583 unique variants in NBS genes, while the susceptible Coker 312 contained 5,173 variants [14]. This difference suggests that sequence variation in NBS genes may contribute to disease resistance capabilities.

Protein-ligand and protein-protein interaction studies further demonstrated strong interactions between putative NBS proteins and ADP/ATP, as well as different core proteins of the cotton leaf curl disease virus [14], providing mechanistic explanations for the observed resistance differences.

Functional Validation through Genetic Manipulation

Virus-induced gene silencing (VIGS) of GaNBS (OG2) in resistant cotton demonstrated its putative role in virus titering, providing experimental evidence for the functional importance of this specific architectural class [14]. This functional validation supports the view that the identified architectural diversity corresponds to meaningful functional differences in plant immunity.

Research Applications and Practical Implementation

The Scientist's Toolkit: Essential Research Reagents and Methods

Table 3: Key Research Reagent Solutions for Domain Architecture Studies

Research Reagent/Method	Function/Application	Experimental Context
Pfam Domain Prediction	Identification of protein domains using hidden Markov models	Genome-wide annotation of domain architectures across species
OrthoMCL Clustering	Grouping evolutionarily related genes into orthogroups	Comparative analysis of gene families across multiple species
Virus-Induced Gene Silencing (VIGS)	Transient gene silencing for functional validation	Testing role of specific NBS genes in disease resistance
RNA-seq Expression Profiling	Transcriptome analysis under stress conditions	Linking gene architecture to expression patterns and function
Protein-Ligand Interaction Assays	Measuring binding interactions with nucleotides and pathogen proteins	Validating mechanistic functions of architectural variants
Whole Genome Sequencing	Identifying genetic variants in resistant vs susceptible accessions	Connecting structural variation to functional differences

Experimental Design Considerations

For researchers investigating NBS domain architectures, several methodological considerations emerge from this study:

Phylogenetic Scope: Including species representing diverse evolutionary lineages enables distinguishing conserved versus lineage-specific architectural innovations.
Functional Validation: Bioinformatic predictions require experimental validation through approaches like VIGS, protein interaction assays, and expression analysis.
Integration of Omics Data: Combining genomic, transcriptomic, and proteomic data provides a comprehensive view of architecture-function relationships.

The following diagram illustrates a recommended experimental workflow for characterizing novel domain architectures:

The discovery of 168 domain architecture classes in plant NBS genes represents a paradigm shift in our understanding of plant immune receptor diversity. This architectural expansion demonstrates the remarkable evolutionary plasticity of plant genomes in generating structural innovation for pathogen recognition. The findings reveal that plants have evolved far more complex and diverse immune recognition capabilities than previously appreciated.

Future research directions should focus on:

Elucidating the specific recognition mechanisms of novel architectural classes
Engineering synthetic NBS genes with custom architectures for broad-spectrum resistance
Exploring architectural diversity in neglected crop species and wild relatives
Integrating architectural data with pathogen effectoromics to predict recognition specificities

This expanded canon of domain architectures provides both a new conceptual framework for understanding plant immunity and practical resources for developing durable disease resistance in agricultural systems. The continuing exploration of this architectural diversity will undoubtedly yield new insights into plant-pathogen coevolution and innovative strategies for crop protection.

The plant immune system relies heavily on a diverse and complex family of genes known as nucleotide-binding site leucine-rich repeat (NBS-LRR or NLR) genes. These genes encode intracellular receptors that recognize pathogen effectors and initiate effector-triggered immunity (ETI), a robust defense response often accompanied by programmed cell death [16] [12]. The domain architecture of NLR proteins—the specific combination and arrangement of functional domains—is fundamental to their function and varies significantly across plant lineages. This in-depth technical guide examines the distinct domain architecture patterns of NLR genes in cereals (monocots), dicots, and orchids, framing these patterns within the broader context of plant evolution and adaptation. Understanding these species-specific architectures is crucial for researchers and scientists aiming to harness natural resistance mechanisms for crop improvement and disease control.

Comparative Analysis of NLR Domain Architectures Across Species

The canonical structure of an NLR protein includes a conserved nucleotide-binding site (NBS or NB-ARC) domain, a C-terminal leucine-rich repeat (LRR) domain, and a variable N-terminal domain. The N-terminal domain is the primary basis for classifying NLRs into major subfamilies: TNL (Toll/Interleukin-1 Receptor domain), CNL (Coiled-Coil domain), and RNL (Resistance to Powdery Mildew 8 domain) [17] [12]. TNL and CNL proteins typically function as pathogen sensors, while RNL proteins often act as helpers in downstream signaling cascades [17]. The proliferation and retention of these subfamilies have followed markedly different trajectories in various plant lineages.

Table 1: Summary of NLR Gene Family Composition in Selected Plant Species

Species	Family/Type	Total NLRs Identified	CNL	TNL	RNL	Key Architectural Notes	Citation
Oryza sativa (Rice)	Cereal / Monocot	505	Predominant	0	Limited	Complete absence of TNL subfamily.	[16]
Zea mays (Maize)	Cereal / Monocot	Not Specified	Predominant	0	Limited	Complete absence of TNL subfamily.	[16]
Dioscorea rotundata (Yam)	Monocot	167	166	0	1	Complete absence of TNL; only 64 of the 166 CNLs have a complete CC-NBS-LRR architecture, the rest being partial (NL, CN, or N-only).	[18]
Arabidopsis thaliana	Dicot	150-207	~100	~62	~8	Balanced presence of all three subfamilies.	[16] [12]
Fragaria spp. (Strawberry)	Rosaceae / Dicot	Varies by species	>50% (Non-TNL)	<50%	Included in Non-TNL	Non-TNLs (CNLs & RNLs) constitute over half the repertoire.	[17]
Salvia miltiorrhiza	Lamiaceae / Dicot	196 (62 typical)	61	2 (TIR)	1	Marked reduction/relictual TNL and RNL subfamilies.	[16]
Dendrobium officinale	Orchid / Monocot	74	10 (CNL)	0	N/A	Complete absence of TNL; majority of NBS genes are non-NBS-LRR subclass.	[9]

Architectural Patterns in Cereals and Monocots

Monocot species, including major cereals like rice (Oryza sativa) and maize (Zea mays), exhibit a striking and consistent architectural pattern: the complete absence of TNL genes [16] [18] [12]. This loss is considered a defining evolutionary event in the monocot lineage. The NLR repertoire in these plants is dominated by CNL-type genes. For instance, in white Guinea yam (Dioscorea rotundata), another monocot, 166 of the 167 identified NLR genes were CNLs, with only a single RNL gene and no TNLs [18]. Furthermore, a significant proportion of these CNLs are "atypical," meaning they lack one or more canonical domains. In D. rotundata, only 64 of the 166 CNLs possess a complete CC-NBS-LRR architecture, while others are classified as NL (NBS-LRR, missing CC), CN (CC-NBS), or N (NBS-only) [18]. This suggests a dynamic evolutionary process involving domain loss and gene degeneration in monocots.

Architectural Patterns in Dicots

Dicots generally possess a more diverse NLR architecture, containing members of all three subfamilies (TNL, CNL, RNL). However, significant variation exists among families. The model dicot Arabidopsis thaliana has a balanced complement of approximately 100 CNLs and 62 TNLs, along with several RNLs [16] [12]. In contrast, other dicot families show distinct patterns of subfamily expansion and contraction.

Salvia miltiorrhiza (Lamiaceae): A dramatic reduction of the TNL and RNL subfamilies is observed. Among 62 typical NLRs, only 2 possess a TIR domain and just 1 is an RNL, with the remaining 61 being CNLs [16]. This indicates a lineage-specific degeneration.
Fragaria spp. (Rosaceae): In wild strawberries, non-TNL genes (a category encompassing both CNLs and RNLs) constitute over 50% of the NLR family, outnumbering TNLs in all eight diploid species studied [17]. This suggests an evolutionary trajectory favoring the expansion of non-TNL types within this genus.

Unique Architectural Patterns in Orchids

Orchids, as monocots, share the characteristic complete absence of TNL genes observed in other monocot species [9]. Phylogenetic analysis of CNL-type genes in orchids like Dendrobium officinale, D. nobile, and D. chrysotoxum reveals that they are classified into a limited number of branches and show significant degeneration of the NB-ARC domain [9]. A prominent feature in orchids is the high proportion of NBS genes that belong to the "non-NBS-LRR" subclass, meaning they lack the LRR domain entirely [9]. This widespread domain loss highlights a unique evolutionary path for NLR genes in the Orchidaceae family.

Detailed Experimental Methodologies for NLR Gene Identification and Analysis

The study of NLR gene families relies on a suite of bioinformatic and molecular biology techniques. Below is a detailed protocol for genome-wide identification and initial characterization, synthesized from multiple studies [16] [19] [9].

Genome-Wide Identification and Domain Classification

Data Retrieval: Download the complete genomic sequences, protein sequences, and corresponding annotation files (GFF3/GTF) for the target species from public databases such as Phytozome, NCBI, or specialized genome portals.
HMMER Search:
- Obtain the Hidden Markov Model (HMM) profile for the NBS (NB-ARC) domain (Pfam: PF00931).
- Use the hmmsearch command from the HMMER package (v3.3) to scan the proteome of the target species. A typical E-value cutoff is < 1 x 10^-4 [19] [17].
- Command example: hmmsearch -E 1e-4 --domE 1e-4 Pfam_NB-ARC.hmm target_proteome.fa > hmmsearch_results.out
BLASTP Search (Supplementary):
- To capture divergent sequences that may be missed by HMM, perform a BLASTP search using a curated set of known NBS domain sequences as a query against the target proteome.
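- Command example [19]: blastp -query known_nbs.fa -db target_proteome.fa -evalue 1e-2 -outfmt 6 -out blastp_results.out
- The target proteome must be formatted as a BLAST database beforehand, for example with makeblastdb -in target_proteome.fa -dbtype prot.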
Consolidation and Verification:
- Combine the results from HMMER and BLASTP, removing redundant entries.
- Subject all candidate sequences to domain verification using tools like hmmscan (against the full Pfam-A database) or NCBI's CD-search to confirm the presence of the NBS domain.
Subclassification:
- Identify N-terminal and C-terminal domains to classify genes into TNL, CNL, and RNL subfamilies.
- TIR Domain: Use HMMER with PF01582.
- RPW8 Domain: Use HMMER with PF05659.
- Coiled-Coil (CC) Domain: Predict using the COILS algorithm or MARCOIL with a probability threshold > 0.1 [17].
- LRR Domain: Use HMMER with relevant profiles (e.g., PF00560, PF07723, PF07725, PF12799, PF13516, PF13855, PF14580) [17].
Validation: Manually check domain architecture using SMART and InterProScan to ensure accuracy.
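The subclassification step can be sketched as a small lookup on the Pfam accessions listed above. This is a minimal illustration, not the procedure of any cited study: the function name `classify_nlr`, the boolean coiled-coil flag (standing in for a COILS or MARCOIL prediction), and the precedence TIR, then RPW8, then CC are all assumptions.

```python
# Pfam accessions named in the protocol above.
NBS = {"PF00931"}
TIR = {"PF01582"}
RPW8 = {"PF05659"}
LRR = {"PF00560", "PF07723", "PF07725", "PF12799",
       "PF13516", "PF13855", "PF14580"}


def classify_nlr(pfam_hits, has_coiled_coil=False):
    """Classify one protein from its Pfam accessions and a coiled-coil flag.

    Returns labels such as "TNL", "CNL", "RNL", "NL", "CN", or "N";
    returns None when no NB-ARC domain is present.
    """
    hits = {acc.split(".")[0] for acc in pfam_hits}  # drop version suffixes
    if not hits & NBS:
        return None
    # Assumed precedence: TIR, then RPW8, then coiled-coil.
    if hits & TIR:
        prefix = "T"
    elif hits & RPW8:
        prefix = "R"
    elif has_coiled_coil:
        prefix = "C"
    else:
        prefix = ""
    suffix = "L" if hits & LRR else ""
    return prefix + "N" + suffix


print(classify_nlr({"PF00931.25", "PF01582", "PF13855"}))
print(classify_nlr({"PF00931"}, has_coiled_coil=True))
```

Partial forms fall out of the same rule, so a protein with only an NB-ARC hit is labelled "N" and one lacking the N-terminal domain is labelled "NL".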

Phylogenetic and Evolutionary Analysis

Sequence Alignment:
- Extract the amino acid sequences of the NBS domains from all identified NLR genes.
- Perform multiple sequence alignment using MAFFT (v7) or ClustalW with default parameters.
- Trim the alignment to remove poorly aligned regions using TrimAl.
Phylogenetic Tree Construction:
- Construct a Maximum Likelihood (ML) phylogenetic tree using IQ-TREE (v1.6.12).
- Use ModelFinder within IQ-TREE to select the best-fit model of amino acid substitution (e.g., JTT, WAG, LG).
- Run with 1000 ultrafast bootstrap (UFBoot) replicates to assess branch support [19] [17].
- Visualize the final tree using iTOL (Interactive Tree of Life).
Analysis of Gene Duplication:
- Use MCScanX to identify tandem and segmental duplication events. Genes located within 200-250 kb of each other with no more than 8 intervening non-NLR genes are typically considered tandem duplicates [17] [18].
- Visualize syntenic relationships and gene clusters using TBtools.
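The tandem-duplicate criterion quoted above (a distance limit plus a cap on intervening non-NLR genes) can be illustrated independently of MCScanX. The sketch below is not MCScanX's algorithm; it is a simplified stand-in in which the function name `tandem_pairs`, the tuple layout, the start-to-start distance measure, and the 250 kb default are assumptions.

```python
def tandem_pairs(genes, max_distance=250_000, max_intervening=8):
    """Return neighbouring NLR pairs that satisfy the tandem criteria.

    `genes` is a list of (gene_id, start, is_nlr) tuples for one chromosome.
    Distance is measured between gene start coordinates.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    pairs = []
    previous = None   # (gene_id, start) of the last NLR seen
    intervening = 0   # non-NLR genes seen since that NLR
    for gene_id, start, is_nlr in ordered:
        if not is_nlr:
            intervening += 1
            continue
        if (previous is not None
                and start - previous[1] <= max_distance
                and intervening <= max_intervening):
            pairs.append((previous[0], gene_id))
        previous = (gene_id, start)
        intervening = 0
    return pairs


# Hypothetical gene models on a single chromosome.
toy_genes = [
    ("nlr1", 10_000, True),
    ("g2", 50_000, False),
    ("nlr3", 120_000, True),
    ("nlr4", 900_000, True),
]
print(tandem_pairs(toy_genes))
```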

Figure 1: A workflow for the genome-wide identification and evolutionary analysis of plant NLR genes.

The Scientist's Toolkit: Key Research Reagent Solutions

Table 2: Essential Reagents and Tools for NLR Gene Research

Reagent / Tool	Function / Application	Technical Notes
HMMER Suite	Identifies protein domains using Hidden Markov Models.	Core tool for initial NLR identification with Pfam NB-ARC (PF00931) profile.
Pfam Database	Curated collection of protein domain families.	Source of HMM profiles for NBS, TIR, LRR, and RPW8 domains.
MAFFT	Multiple sequence alignment software.	Creates accurate alignments of NBS domains for phylogenetic analysis.
IQ-TREE	Efficient software for maximum likelihood phylogenetics.	Infers evolutionary relationships with model selection and branch support.
MCScanX	Analyzes gene collinearity and duplication events.	Identifies tandem and segmental duplications driving NLR expansion.
TBtools	Integrative toolkit for biological data analysis.	User-friendly platform for visualization, synteny analysis, and charting.
Salicylic Acid (SA)	Plant hormone and defense signaling molecule.	Used in treatments to validate NLR gene induction in ETI response [9].
Virus-Induced Gene Silencing (VIGS)	Functional characterization through gene knockdown.	Validates the role of specific NLRs in pathogen resistance [7].

Visualizing NLR Signaling and Regulatory Pathways

The core function of NLRs is to initiate immune signaling upon pathogen perception. The following diagram summarizes the key pathways, integrating knowledge across the cited studies.

Figure 2: Simplified NLR-mediated immune signaling and regulatory network. Sensor TNLs and CNLs recognize pathogen effectors, often leading to the activation of helper RNLs, which amplify the defense signal. This cascade culminates in the hypersensitive response and systemic immunity. The expression of NLRs is fine-tuned by miRNAs, which target NLR transcripts for cleavage to prevent autoimmunity and reduce fitness costs [20].

Nucleotide-binding site and leucine-rich repeat (NBS-LRR) genes represent the largest and most important class of disease resistance (R) genes in plants, enabling recognition of diverse pathogens and triggering robust immune responses [16] [21]. These genes encode intracellular proteins that perceive pathogen-secreted effectors through a sophisticated domain architecture, initiating effector-triggered immunity (ETI) often accompanied by a hypersensitive response [16] [22]. Understanding the evolutionary history and structural diversification of NBS-LRR genes provides crucial insights into plant immunity mechanisms and informs strategies for developing disease-resistant crops. This review synthesizes current knowledge on the deep evolutionary origins of NBS-LRR genes within the green lineage and examines the patterns of domain architecture that have emerged through plant evolution, offering a foundation for comparative genomics and functional studies in plant immunity.

Deep Evolutionary Origins in the Green Lineage

The NBS-LRR gene family originated in the common ancestor of the entire green lineage, with fundamental diversification occurring before the separation of green algae and land plants [23]. Phylogenetic analyses indicate that the NBS-LRR family rapidly diverged into three major subclasses with distinct domain combinations—TNL (TIR-NBS-LRR), CNL (CC-NBS-LRR), and RNL (RPW8-NBS-LRR)—prior to the split of green algae, demonstrating the ancient foundation of this crucial immune component [23].

This early origin is particularly remarkable given the extensive morphological and physiological differences between green algae and vascular plants. The conservation of NBS-LRR genes across this evolutionary divide highlights the fundamental importance of intracellular pathogen recognition in plant evolution. The maintenance of these complex genetic architectures over hundreds of millions of years suggests they provided a critical selective advantage despite the significant metabolic cost of maintaining large gene families [16].

Table 1: Evolutionary Distribution of NBS-LRR Subclasses Across Plant Lineages

Plant Group	Species	CNL	TNL	RNL	Total NBS-LRR Genes	Key Evolutionary Notes
Green Algae	Ancient ancestor	Present	Present	Present	Unknown	Origin before lineage separation
Monocots	Oryza sativa (rice)	Present	Absent	Absent	275-505	Complete TNL loss [16] [9]
Eudicots	Arabidopsis thaliana	Present	Present	Present	101-207	All three subclasses maintained
Solanaceae	Solanum melongena (eggplant)	231	36	2	269	All subclasses present [21]
Medicinal Plants	Salvia miltiorrhiza	61	2 (reduced)	1	196	Marked TNL and RNL reduction [16]
Orchids (Monocots)	Dendrobium officinale	10	Absent	Unknown	74 (22 with LRR)	TNL absence consistent with monocots [9]

Domain Architecture and Classification

The protein structure of NBS-LRR genes follows a modular architecture with three core components: a variable N-terminal domain, a central nucleotide-binding site (NBS) domain, and a C-terminal leucine-rich repeat (LRR) domain [16] [21] [22]. The N-terminal domain determines the primary classification into three major subfamilies: TNL (containing Toll/Interleukin-1 receptor domain), CNL (containing coiled-coil domain), and RNL (containing RPW8 domain) [21] [24].

The NBS domain, also referred to as NB-ARC, is approximately 300 amino acids and contains strictly ordered motifs including P-loop, RNBS-A, kinase-2, RNBS-B, RNBS-C, and GLPL, which function in ATP/GTP binding and hydrolysis [22] [24]. This domain serves as a molecular switch for immune signaling, transitioning between ADP-bound (inactive) and ATP-bound (active) states upon pathogen perception [25]. The LRR domain consists of 20-30 amino acid repeats that facilitate protein-protein interactions and are primarily responsible for pathogen recognition specificity [22] [24]. The remarkable diversity of LRR domains enables plants to recognize a vast array of taxonomically unrelated pathogens, including viruses, bacteria, fungi, and insects [22].
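Because the NBS motifs occur in a fixed order, a simple sanity check on predicted motif positions can help flag rearranged or misannotated domains. The sketch below assumes motif start coordinates are already available (for example from a motif-discovery tool); the function name `motifs_in_order` and the example coordinates are hypothetical.

```python
EXPECTED_ORDER = ["P-loop", "RNBS-A", "kinase-2", "RNBS-B", "RNBS-C", "GLPL"]


def motifs_in_order(motif_starts):
    """Check that detected motifs appear in the canonical N-to-C order.

    `motif_starts` maps motif name to its start position in the protein.
    Missing motifs are skipped, so partial domains can still pass.
    """
    positions = [motif_starts[m] for m in EXPECTED_ORDER if m in motif_starts]
    return all(a < b for a, b in zip(positions, positions[1:]))


# Hypothetical coordinates for illustration only.
print(motifs_in_order({"P-loop": 20, "kinase-2": 95, "GLPL": 210}))
print(motifs_in_order({"P-loop": 120, "RNBS-A": 40}))
```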

Table 2: Domain Architecture Classification of NBS-LRR Genes

Classification	N-terminal Domain	Central Domain	C-terminal Domain	Function in Immunity	Representative Examples
TNL	TIR (Toll/Interleukin-1 Receptor)	NBS (NB-ARC)	LRR	Pathogen recognition, signal transduction	Arabidopsis RPS4, tobacco N gene [25] [26]
CNL	CC (Coiled-Coil)	NBS (NB-ARC)	LRR	Pathogen recognition, hypersensitive response	Arabidopsis RPS2, RPS5 [16] [26]
RNL	RPW8 (Resistance to Powdery Mildew 8)	NBS (NB-ARC)	LRR	Signal transduction, downstream defense	Arabidopsis ADR1 [16]
N	None	NBS (NB-ARC)	None	Regulatory functions	Various species [25] [24]
NL	None	NBS (NB-ARC)	LRR	Pathogen recognition	Various species [25] [24]

Methodologies for NBS-LRR Gene Identification and Analysis

Genome-Wide Identification Pipeline

The standard bioinformatics pipeline for identifying NBS-LRR genes across plant genomes employs a Hidden Markov Model (HMM)-based approach using the NB-ARC domain (PF00931) from the Pfam database as a query [16] [21] [22]. The typical workflow begins with HMMER software (HMMER3) using an expectation value (E-value) cutoff of < 10⁻²⁰ for initial identification, followed by construction of a species-specific HMM profile to capture more divergent family members with an E-value threshold < 0.01 [21] [22]. Candidate genes are subsequently verified through domain analysis using SMART, CDD, and Pfam databases to confirm the presence of characteristic NBS-LRR domains and remove false positives such as kinase-domain proteins [25] [22].
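The two-pass thresholding described above can be sketched as follows. The example assumes the best E-value per protein has already been extracted from each HMMER run; building the species-specific profile itself (for instance with hmmbuild on the aligned seed sequences) is not shown, and the function name `two_stage_candidates` and the protein IDs are hypothetical.

```python
def two_stage_candidates(pfam_hits, custom_hits,
                         strict_cutoff=1e-20, relaxed_cutoff=0.01):
    """Combine a strict Pfam-profile pass with a relaxed species-specific pass.

    Both arguments map protein IDs to the best E-value from an HMMER search.
    Returns (seed_ids, candidate_ids): seeds pass the strict cutoff and would
    be used to build the species-specific profile; candidates are the union
    of the seeds and the relaxed second-pass hits.
    """
    seeds = {p for p, e in pfam_hits.items() if e < strict_cutoff}
    relaxed = {p for p, e in custom_hits.items() if e < relaxed_cutoff}
    return seeds, seeds | relaxed


# Hypothetical E-values for four proteins.
first_pass = {"prot1": 3e-45, "prot2": 1e-12, "prot3": 5e-3}
second_pass = {"prot1": 1e-60, "prot2": 2e-8, "prot4": 0.5}
seeds, candidates = two_stage_candidates(first_pass, second_pass)
print(sorted(seeds), sorted(candidates))
```

Candidates from the relaxed pass still need the domain verification step, since a permissive cutoff admits more false positives.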

Structural and Phylogenetic Analysis

Following identification, structural characterization involves motif prediction using MEME suite with default parameters (motif count typically set to 10), domain architecture determination, and gene structure analysis using GFF3 annotation files visualized with tools such as TBtools [25] [21]. Phylogenetic analysis employs multiple sequence alignment using ClustalW or MAFFT, followed by tree construction via Maximum Likelihood methods in MEGA software with bootstrap validation (typically 1000 replicates) [25] [22]. Chromosomal distribution and cluster analysis identify tandem duplication events, with clusters typically defined as containing ≥2 NBS-LRR genes within 200 kb [21] [24].
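The cluster definition (at least two NBS-LRR genes within 200 kb) can be applied directly to gene coordinates. The sketch below chains neighbouring genes while each gap stays within the limit, which is one possible reading of the definition; studies that require the whole cluster to span no more than 200 kb would need a different rule. The function name `find_clusters` and the coordinates are hypothetical.

```python
def find_clusters(positions, max_gap=200_000):
    """Group NBS-LRR genes on one chromosome into clusters.

    `positions` maps gene ID to start coordinate. Neighbouring genes are
    chained into the same cluster while the gap between them is <= max_gap;
    groups with fewer than two genes are discarded.
    """
    ordered = sorted(positions.items(), key=lambda item: item[1])
    clusters, current = [], []
    for gene_id, start in ordered:
        if current and start - current[-1][1] > max_gap:
            if len(current) >= 2:
                clusters.append([g for g, _ in current])
            current = []
        current.append((gene_id, start))
    if len(current) >= 2:
        clusters.append([g for g, _ in current])
    return clusters


# Hypothetical start coordinates on a single chromosome.
toy_positions = {"a": 0, "b": 150_000, "c": 300_000,
                 "d": 1_000_000, "e": 1_100_000, "f": 5_000_000}
clusters = find_clusters(toy_positions)
clustered = sum(len(c) for c in clusters)
print(clusters, f"{100 * clustered / len(toy_positions):.0f}% of genes clustered")
```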

Evolutionary Trajectories and Lineage-Specific Adaptations

Differential Loss and Expansion Across Plant Lineages

The evolutionary history of NBS-LRR genes is characterized by significant lineage-specific gains and losses, particularly affecting the TNL subclass. Comprehensive genomic analyses reveal that monocots, including cereals (rice, wheat, maize) and orchids, have completely lost TNL genes, while maintaining CNL and occasionally RNL subclasses [16] [9]. This pattern is exemplified in rice genomes, which contain 275-505 NBS-LRR genes exclusively from the CNL subclass [16]. In contrast, most eudicots retain both TNL and CNL subfamilies, though with considerable variation in relative proportions [21] [24].

Beyond the monocot-dicot divergence, additional lineage-specific patterns have emerged. In the medicinal plant Salvia miltiorrhiza, a dramatic reduction in TNL and RNL subfamilies was observed, with only 2 TNL and 1 RNL members identified alongside 61 CNL genes [16]. Similarly, in tung trees (Vernicia spp.), V. fordii possesses no TNL genes, while its resistant counterpart V. montana retains 3 TNL genes, suggesting potential functional significance [27]. These distribution patterns reflect both evolutionary constraints and adaptive specializations to different pathogen pressures.

Mechanisms of Genomic Diversification

The NBS-LRR gene family exhibits dynamic evolution primarily driven by tandem duplication events and genomic rearrangements [21] [24]. Comparative genomic analyses across diverse species consistently show that NBS-LRR genes are frequently organized in clusters, with 54-63% of genes residing in such arrangements [21] [22] [24]. These clusters are predominantly homogeneous, containing genes derived from recent common ancestors, though heterogeneous clusters with phylogenetically distant members also occur [22].

Tandem duplication facilitates the generation of new recognition specificities through sequence divergence and domain shuffling, enabling plants to adapt to rapidly evolving pathogens. This mechanism is evidenced by the strong correlation between cluster locations and regions of local duplication observed in pepper, eggplant, and common bean genomes [21] [24] [26]. The LRR domain, in particular, evolves rapidly through positive selection, altering recognition specificities while maintaining the structural framework for protein-protein interactions [22].

Experimental Reagents and Research Tools

Table 3: Essential Research Reagents and Tools for NBS-LRR Gene Analysis

Reagent/Tool	Category	Specific Function	Application Example
HMMER Suite	Bioinformatics	Hidden Markov Model search	Identify NBS domains in genome sequences [16] [22]
PF00931 (NB-ARC)	Database Resource	Conserved domain model	Query for initial gene identification [25] [21]
MEME Suite	Bioinformatics	Motif discovery and analysis	Identify conserved NBS motifs (P-loop, kinase-2, etc.) [25]
ClustalW	Bioinformatics	Multiple sequence alignment	Align NBS domains for phylogenetic analysis [25] [22]
MEGA Software	Bioinformatics	Phylogenetic tree construction	Evolutionary relationship inference [25] [22]
TBtools	Bioinformatics	Genomic data visualization	Gene structure, chromosomal distribution [25] [21]
VIGS System	Functional Analysis	Virus-induced gene silencing	Functional validation of candidate NBS-LRR genes [27]

The evolutionary foundation of NBS-LRR genes traces back to the common ancestor of the green lineage, with subsequent diversification shaped by lineage-specific adaptations, differential subfamily expansion and contraction, and dynamic genomic reorganization. The conserved yet flexible domain architecture of these genes has enabled plants to recognize rapidly evolving pathogens across hundreds of millions of years of evolution. Future research integrating comparative genomics, functional characterization, and evolutionary analysis will further elucidate how this critical gene family continues to drive plant immunity and adaptation. The methodological framework and evolutionary insights presented here provide a foundation for such investigations, with implications for crop improvement and sustainable agriculture.

From Sequence to Function: Advanced Methods for Mining and Profiling NBS Genes

The study of domain architecture patterns in plant Nucleotide-Binding Site (NBS) genes represents a critical frontier in understanding plant immunity mechanisms. NBS domain genes form one of the largest superfamilies of plant resistance (R) genes, playing pivotal roles in pathogen recognition and defense activation [7] [1]. These genes exhibit remarkable structural diversity, with classical architectures including NBS, NBS-LRR, TIR-NBS, and TIR-NBS-LRR, alongside numerous species-specific structural patterns [7]. The functional characterization of these genes relies heavily on accurate domain annotation, making robust bioinformatic pipelines essential for researchers investigating plant disease resistance, evolutionary biology, and molecular breeding strategies.

The significance of domain analysis extends beyond mere identification to understanding the evolutionary dynamics and functional specialization of plant immune receptors. Studies across diverse species including cotton, tung trees, pepper, and Salvia have revealed substantial variation in NBS gene family sizes, architectures, and subfamily distributions [7] [27] [28]. These differences reflect lineage-specific adaptations and evolutionary pressures, with tandem duplications serving as a major driver of family expansion and diversification [7] [29]. Within this context, bioinformatic tools including HMMER, PfamScan, and SMART provide the methodological foundation for systematic domain annotation, enabling researchers to decipher the complex genomic organization of plant NBS genes and their role in disease resistance mechanisms.

Protein Domains and Their Functional Significance

Protein domains represent structurally and functionally distinct units within proteins that often evolve as independent modules. In the context of plant NBS genes, domains constitute the building blocks of complex immune receptors, with specific domains conferring specialized functions. The NBS domain itself contains several conserved motifs—including the P-loop, RNBS-A, kinase-2, RNBS-B, RNBS-C, and GLPL motifs—that are essential for nucleotide binding and hydrolysis [1] [29]. Flanking domains such as the Toll/Interleukin-1 Receptor (TIR), Coiled-Coil (CC), and Leucine-Rich Repeat (LRR) domains contribute to signaling, protein-protein interactions, and pathogen recognition specificity [1] [30].

The evolutionary conservation of these domains enables researchers to identify related genes across species through domain-based homology searches. However, the modular nature of protein evolution also means that domains can be rearranged in different combinations, creating diverse architectural patterns with potentially novel functions. This is particularly evident in plant NBS genes, where researchers have identified both classical domain architectures and numerous species-specific combinations [7]. Understanding these architectural patterns provides insights into gene function, evolutionary relationships, and mechanisms of pathogen recognition.

Key Databases for Domain Annotation

Table 1: Major Domain Databases for Plant NBS Gene Analysis

Database	Primary Focus	Key Features	Relevance to NBS Research
Pfam	Protein families and domains	Hidden Markov Models (HMMs) for domain detection; regularly updated	Contains curated HMMs for NBS, TIR, CC, and LRR domains essential for NBS gene identification [7]
InterPro	Integrated resource	Consolidates multiple databases including Pfam, SMART, and PROSITE	Provides comprehensive domain annotations and functional predictions for NBS proteins [28] [31]
SMART	Signaling domain proteins	Emphasis on signaling domains; genomic context visualization	Identifies signaling domains in NBS-LRR proteins and analyzes domain architectures [32] [31]
CDART	Domain architecture	Finds proteins with similar domain architectures	Identifies evolutionarily related NBS proteins through domain architecture similarity [31]

These databases employ complementary approaches to domain annotation, with Pfam utilizing Hidden Markov Models (HMMs) derived from multiple sequence alignments, SMART focusing on signaling domains with specialized detection algorithms, and InterPro providing an integrated view by combining predictions from multiple source databases [31]. For plant NBS gene research, this integrated approach is particularly valuable due to the diversity of domain architectures and the challenge of accurately identifying related genes across species.

Core Methodologies: HMMER, PfamScan, and SMART

HMMER for Domain Detection

The HMMER tool suite implements profile Hidden Markov Models for sensitive sequence database searches and domain detection. In plant NBS gene research, HMMER serves as a fundamental tool for identifying genes containing NBS domains and associated domains such as TIR, CC, and LRR. The typical workflow involves searching protein sequences (nucleotide sequences are first translated, since the Pfam profiles are protein models) against pre-built HMM profiles from databases like Pfam using commands such as hmmsearch or hmmscan.

The key advantage of HMMER lies in its statistical framework and sensitivity for detecting distant homologs, which is particularly important for plant NBS genes that exhibit substantial sequence divergence while maintaining conserved domain structures. Studies across multiple plant species have employed HMMER for initial identification of NBS-encoding genes, typically using the NB-ARC domain (PF00931) as the primary search model [27] [28]. The statistical significance of hits is evaluated using E-values, with stricter thresholds (e.g., 1.1e-50) applied to minimize false positives in genome-wide analyses [7].

PfamScan for Comprehensive Domain Annotation

PfamScan is a specific implementation that utilizes HMMER to search sequences against the Pfam database. It provides a standardized approach for identifying Pfam domains in protein sequences and is frequently used in plant NBS gene studies for systematic domain annotation. The typical command-line invocation uses the PfamScan.pl script with the Pfam-A.hmm model database to scan query sequences [7].

In practice, researchers apply PfamScan to identify not only the core NBS domain but also associated domains that define NBS gene subfamilies. For example, the presence of TIR domains (PF01582) distinguishes TNL-type genes, while CC domains help identify CNL-type genes [27] [1]. The domain architecture information derived from PfamScan results enables classification of NBS genes into structural categories and identification of novel architectural patterns that may suggest functional specialization.
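
One way to turn per-domain hits into an architecture label is to sort each protein's domains by start coordinate and join their names. The sketch below assumes the hits have already been parsed into (protein, domain, start, end) tuples; the tuple layout, the function name, and the example coordinates are illustrative assumptions rather than PfamScan's native output format.

```python
from collections import defaultdict


def domain_architectures(hits):
    """Map each protein to its N-to-C-terminal domain order.

    hits: iterable of (protein_id, domain_name, start, end) tuples.
    """
    by_protein = defaultdict(list)
    for protein_id, domain, start, _end in hits:
        by_protein[protein_id].append((start, domain))
    # Sorting by start coordinate gives the linear domain order.
    return {
        pid: "-".join(name for _, name in sorted(domains))
        for pid, domains in by_protein.items()
    }


# Hypothetical hits; coordinates are invented for illustration.
example_hits = [
    ("g1", "NB-ARC", 180, 450),
    ("g1", "TIR", 10, 150),
    ("g1", "LRR_8", 600, 660),
    ("g2", "NB-ARC", 20, 300),
]

architectures = domain_architectures(example_hits)
print(architectures)
```

Counting the distinct strings produced this way across a genome gives a simple tally of architectural patterns, including unusual combinations that merit manual inspection.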

SMART for Signaling Domain Analysis

The SMART database (Simple Modular Architecture Research Tool) specializes in the identification and annotation of signaling domains, providing complementary functionality to Pfam for plant NBS gene analysis. SMART integrates multiple detection methods including its own HMM-based domain database, Pfam domains, signal peptide prediction, and internal repeat detection [32] [31].

For NBS gene researchers, SMART offers several distinct advantages: specialized focus on signaling domains relevant to immune receptors, visualization of domain architectures, and identification of additional features such as low-complexity regions and coiled-coil domains that may not be fully captured by Pfam alone [31]. The web interface allows interactive exploration of domain organizations, while programmatic access supports large-scale analyses. Comparative studies have demonstrated that SMART and Pfam may yield slightly different domain boundaries and annotations, highlighting the value of using multiple tools for comprehensive domain characterization [31].
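
Differences in domain boundaries between tools can be quantified with a simple interval overlap score. The following sketch computes the Jaccard overlap of two inclusive residue ranges; the function name and the example coordinates are assumptions made for illustration, and other agreement measures would be equally reasonable.

```python
def boundary_overlap(range_a, range_b):
    """Jaccard overlap of two inclusive residue ranges given as (start, end)."""
    intersection = min(range_a[1], range_b[1]) - max(range_a[0], range_b[0]) + 1
    if intersection <= 0:
        return 0.0  # the two predictions do not overlap at all
    length_a = range_a[1] - range_a[0] + 1
    length_b = range_b[1] - range_b[0] + 1
    return intersection / (length_a + length_b - intersection)


# Hypothetical boundaries for the same domain reported by two tools.
pfam_range = (180, 450)
smart_range = (175, 440)
print(round(boundary_overlap(pfam_range, smart_range), 3))
```

A score near 1 indicates that both tools agree on the domain extent, while low scores flag proteins whose boundaries deserve manual review.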

Integrated Bioinformatics Pipeline for Plant NBS Genes

Workflow for Domain-Centric Analysis of Plant NBS Genes

The following diagram illustrates a comprehensive bioinformatics pipeline for analyzing domain architecture patterns in plant NBS genes, integrating HMMER, PfamScan, and SMART methodologies:

Diagram 1: Bioinformatics pipeline for plant NBS gene analysis with domain annotation

This integrated workflow begins with genomic or transcriptomic data as input and progresses through sequential domain analysis steps. The initial HMMER search identifies sequences containing the conserved NB-ARC domain, establishing a candidate NBS gene set. Subsequent PfamScan and SMART analyses provide comprehensive domain annotations, enabling classification of genes into architectural categories such as TNL (TIR-NBS-LRR), CNL (CC-NBS-LRR), RNL (RPW8-NBS-LRR), and various truncated forms [27] [28] [29]. Downstream analyses leverage this domain architecture information for evolutionary studies, expression profiling, and functional characterization.

Experimental Protocol for Genome-Wide NBS Gene Identification

A typical experimental protocol for genome-wide identification and domain analysis of NBS genes follows these key steps:

Data Collection and Preparation: Obtain genome assemblies and corresponding annotation files for the target species from public databases (e.g., NCBI, Phytozome, Plaza) [7]. For transcriptomic analyses, retrieve RNA-seq data from relevant databases such as the IPF database, CottonFGD, or NCBI BioProjects [7].
HMM-Based NBS Gene Identification: Use HMMER to search all predicted protein sequences against the NB-ARC domain profile (PF00931). Apply an appropriate E-value threshold (e.g., 1.1e-50) to ensure high-confidence hits while maintaining sensitivity [7]. Convert nucleotide sequences to amino acid sequences if working with genomic regions.
Comprehensive Domain Annotation: Process the candidate NBS genes through PfamScan using the full Pfam-A.hmm database to identify all associated domains. Complement this with SMART analysis to detect signaling domains and structural features that may be missed by Pfam alone [31].
Domain Architecture Classification: Classify genes based on their domain compositions using a standardized classification system [7] [27]. Common categories include:
- CNL: CC-NBS-LRR
- TNL: TIR-NBS-LRR
- RNL: RPW8-NBS-LRR
- CN: CC-NBS
- TN: TIR-NBS
- NL: NBS-LRR
- N: NBS-only
Validation and Manual Curation: Address the challenge of misannotation in automated pipelines by validating predictions through manual inspection, comparison with expressed sequence data, and application of specialized tools like NLRSeek [33] or HRP [34] that are designed specifically for resistance gene annotation.
Downstream Analyses: Utilize the domain architecture information for phylogenetic analysis, identification of orthogroups, assessment of evolutionary dynamics (e.g., tandem duplications), and integration with expression data to identify candidate genes involved in specific disease resistance responses [7] [27].
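
The classification step in the protocol above can be sketched as a small rule-based function. It assumes that tool-specific domain names have already been mapped onto the generic labels TIR, CC, RPW8, NBS, and LRR, and the priority given to TIR over RPW8 over CC when several N-terminal domains co-occur is an arbitrary choice made for this illustration, not a published rule.

```python
def classify_nbs(domains):
    """Assign an architecture class from a collection of generic domain labels.

    Returns None when no NBS domain is present.
    """
    domains = set(domains)
    if "NBS" not in domains:
        return None
    # N-terminal domain determines the prefix letter (assumed priority order).
    if "TIR" in domains:
        prefix = "T"
    elif "RPW8" in domains:
        prefix = "R"
    elif "CC" in domains:
        prefix = "C"
    else:
        prefix = ""
    suffix = "L" if "LRR" in domains else ""
    return prefix + "N" + suffix


for example in (["CC", "NBS", "LRR"], ["TIR", "NBS"], ["NBS", "LRR"], ["NBS"]):
    print(example, "->", classify_nbs(example))
```

Genes carrying additional, non-canonical domains would fall outside this scheme and are better reported by their full domain string.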

Advanced Applications in Plant NBS Gene Research

Addressing Annotation Challenges in Plant NBS Genes

Standard genome annotation pipelines frequently misannotate or incompletely capture NBS-LRR genes due to their complex genomic organization, low expression levels, and sequence similarity to repetitive elements [33] [34]. This has led to the development of specialized tools and approaches that complement the standard HMMER/PfamScan/SMART workflow:

Table 2: Specialized Methods for Plant NBS Gene Annotation

Method	Approach	Advantages	Application Examples
NLRSeek	Genome reannotation-based pipeline	Identifies previously missed NLR genes; particularly effective for non-model species	Identified 33.8%-127.5% more NLR genes in yam species compared to conventional methods [33]
HRP (Homology-based R-gene Prediction)	Two-level homology search using full-length R-genes	Better recovers full-length NB-LRR gene models; effective for allele mining	Identified 45 more NB-LRR genes in tomato than RenSeq method; discovered new Fom-2 homologs in Cucurbita [34]
RGAugury	Automated pipeline for R-gene analog prediction	Integrates multiple domain-based searches; classifies RGAs into different families	Provides comprehensive RGA annotation across multiple plant species [34]

These specialized approaches address specific limitations of standard annotation pipelines, particularly for the complex NBS gene family. For example, NLRSeek employs genome reannotation to recover NLR genes missed by automated annotation, while HRP uses a two-level homology search that first identifies R-genes in automated gene predictions then uses these as queries for full-length homology searches in the genome assembly [33] [34]. The integration of these methods with standard domain-based approaches provides a more complete picture of the NBS gene repertoire in plant genomes.

Evolutionary Insights from Domain Architecture Analysis

Comparative analysis of domain architectures across plant species has revealed fundamental insights into the evolutionary dynamics of NBS genes. Large-scale studies examining species ranging from mosses to monocots and dicots have identified 12,820 NBS-domain-containing genes classified into 168 distinct architectural classes [7]. This diversity encompasses both classical patterns and numerous species-specific combinations, reflecting continuous innovation in plant immune receptors.

Phylogenetic analyses based on domain architecture and sequence similarity have demonstrated lineage-specific expansions and losses of particular NBS gene subfamilies. For example, TNL-type genes are absent entirely from cereal genomes [1], while recent studies have documented TNL loss in specific eudicot species including sesame and Vernicia fordii [27]. Similarly, analyses in Salvia miltiorrhiza revealed a marked reduction in TNL and RNL subfamily members compared to other eudicots [28]. These patterns reflect divergent evolutionary trajectories in different plant lineages and highlight how domain architecture analysis contributes to understanding the macroevolution of plant immune systems.

Functional Implications of Domain Architecture Diversity

The diversity of domain architectures in plant NBS genes has profound functional implications for disease resistance mechanisms. Different domains contribute distinct biochemical functions to the multi-domain NBS proteins:

The NBS domain binds and hydrolyzes nucleotides (ATP/GTP), serving as a molecular switch for immune activation [1] [30]
The LRR domain provides recognition specificity through protein-protein interactions, often determining pathogen recognition specificity [1] [30]
The TIR domain engages in specific signaling pathways distinct from those activated by CC domains [1]
The CC domain facilitates protein-protein interactions and signaling complex formation [30]

Studies of specific NBS genes have demonstrated how domain architecture influences function. For example, functional analysis of the Rx CC-NBS-LRR protein from potato revealed that separate protein domains can physically interact and function in trans, with the LRR domain required for both elicitor recognition and activation of signaling domains [30]. Similarly, research on tung tree NBS-LRR genes identified specific orthologous gene pairs with distinct expression patterns in resistant and susceptible varieties, highlighting how sequence variation in promoter regions and coding sequences of NBS genes contributes to functional differences in disease resistance [27].

Research Reagent Solutions for Plant NBS Gene Studies

Table 3: Essential Research Reagents and Resources for Plant NBS Gene Analysis

Category	Specific Resources	Application in NBS Research	Key Features
Bioinformatics Tools	HMMER, PfamScan, SMART, NLRSeek, HRP	Domain annotation, gene identification, evolutionary analysis	Specialized algorithms for domain detection and R-gene annotation [7] [33] [34]
Domain Databases	Pfam, InterPro, SMART, CDART	Domain identification, functional annotation, architecture analysis	Curated domain models, integrated annotations, architecture retrieval [7] [28] [31]
Genomic Resources	NCBI Genome, Phytozome, Plaza, CottonFGD	Source of genome sequences and annotations	Publicly available genome assemblies for multiple plant species [7]
Expression Databases	IPF Database, NCBI BioProject, CottonFGD	Expression profiling under various conditions	Tissue-specific, stress-responsive expression data for NBS genes [7]
Validation Methods	VIGS (Virus-Induced Gene Silencing), Protein-protein interaction assays	Functional characterization of candidate NBS genes	Experimental validation of immune function and molecular interactions [7] [27] [30]

These research reagents collectively enable a comprehensive approach to plant NBS gene analysis, from initial identification through functional characterization. The integration of bioinformatic tools with experimental validation methods is particularly important for establishing links between domain architecture, molecular function, and disease resistance phenotypes.

The integration of HMMER, PfamScan, and SMART domain analysis provides a powerful framework for investigating the complex landscape of plant NBS genes. These bioinformatic pipelines enable researchers to decipher the domain architecture patterns that underlie functional specialization and evolutionary adaptation in plant immune receptors. As genomic resources continue to expand across diverse plant species, these approaches will play an increasingly important role in identifying novel resistance genes and understanding the molecular basis of disease resistance.

Future developments in this field will likely include more sophisticated machine learning approaches for domain annotation and function prediction, improved integration of structural information for functional inference, and enhanced methods for analyzing the complex evolutionary dynamics of large gene families. The continued refinement of specialized tools like NLRSeek and HRP will further address the challenges of accurately annotating NBS genes in plant genomes [33] [34]. Through the application and continued development of these bioinformatic pipelines, researchers can accelerate the discovery of valuable resistance genes and contribute to the development of disease-resistant crops through marker-assisted breeding and genetic engineering.

Plant resistance (R) genes encode proteins that form the core of the plant immune system, enabling the recognition of specific pathogen effectors and the activation of robust defense responses, including the synthesis of antimicrobial compounds, cell wall reinforcement, and programmed cell death in infected cells [35]. The identification of novel R-genes is a critical component of disease resistance breeding programs aimed at safeguarding global food security [35]. However, the accurate identification of these genes in plant genomes remains challenging due to their extraordinary diversity, complex genomic architecture, and sequence variability [35] [36]. Plant R-genes are often organized in clusters of closely duplicated genes and can be mistaken for repetitive elements during standard annotation procedures [35]. Furthermore, their typically low expression levels make prediction based solely on RNA-Seq data difficult [35].

Traditional computational methods for R-gene identification have primarily relied on alignment-based approaches using tools such as BLAST, HMMER, and InterProScan to identify conserved domains [35] [36]. While effective for genes with high sequence homology, these methods often fail when homology is low, particularly when annotating newly sequenced plant genomes [35]. More recent machine learning approaches, such as support vector machines (SVM), have improved prediction capabilities but still face limitations in feature extraction and model accuracy [35]. The development of PRGminer, a deep learning-based high-throughput prediction tool, represents a significant advancement in overcoming these challenges and enabling accurate, large-scale identification and classification of plant resistance genes [35].

PRGminer: A Deep Learning Framework for R-gene Prediction

Core Architecture and Two-Phase Prediction System

PRGminer employs a sophisticated deep learning framework implemented in two distinct phases that sequentially identify and classify resistance genes. This structured approach enables high-precision prediction while effectively distinguishing between different functional classes of R-genes [35] [37].

Phase I: R-gene Identification - In this initial phase, the tool analyzes input protein sequences to classify them as either R-genes or non-R-genes. The model achieves remarkable accuracy in this binary classification, with reported performance metrics of 98.75% accuracy in k-fold training/testing procedures and 95.72% accuracy on independent testing, with a high Matthews correlation coefficient of 0.98 during training and 0.91 in independent testing [35].

Phase II: R-gene Classification - Sequences identified as R-genes in Phase I proceed to this classification phase, where they are categorized into one of eight specific R-gene classes. The system achieves an overall accuracy of 97.55% in k-fold training/testing and 97.21% on independent testing, with MCC values of 0.93 and 0.92 respectively [35].

The following diagram illustrates the complete PRGminer workflow, from input to final classification:

Figure 1: PRGminer Two-Phase Workflow. The tool processes protein sequences through initial R-gene identification followed by detailed classification into one of eight specific classes.
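
The control flow of such a two-phase scheme can be sketched independently of any particular model. In the code below the two predictors are passed in as plain callables; they are placeholders, not PRGminer's actual models or API, and the function name is hypothetical. The class labels are the eight categories PRGminer reports.

```python
from typing import Callable, Optional

R_GENE_CLASSES = ("CNL", "TNL", "TIR", "RLP", "RLK", "LECRK", "LYK", "KIN")


def two_phase_predict(sequence: str,
                      is_r_gene: Callable[[str], bool],
                      classify: Callable[[str], str]) -> Optional[str]:
    """Phase I filters non-R-genes; Phase II labels the sequences that remain."""
    if not is_r_gene(sequence):
        return None  # Phase I: rejected as a non-R-gene, Phase II is skipped
    label = classify(sequence)
    if label not in R_GENE_CLASSES:
        raise ValueError(f"unexpected class label: {label}")
    return label


# Placeholder predictors standing in for trained models.
always_r_gene = lambda seq: True
always_cnl = lambda seq: "CNL"
print(two_phase_predict("MAEAVVSF", always_r_gene, always_cnl))
```

Separating the two phases in this way means the multi-class model only ever sees sequences that passed the binary filter, which is the sequential design described above.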

Deep Learning Methodology and Feature Representation

PRGminer harnesses the power of deep learning algorithms, which utilize multiple layers to extract higher-level features from raw input data [35]. Unlike traditional alignment-based methods, PRGminer uses derived protein sequences as input, extracting both sequential and convolutional features from raw encoded protein sequences based on classification [35]. Among various sequence representations tested, the dipeptide composition approach demonstrated the best prediction performance, providing optimal feature representation for the deep learning model [35].
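
Dipeptide composition itself is straightforward to compute: it is the relative frequency of each of the 400 ordered amino acid pairs in a sequence. The sketch below is a generic implementation under the assumption that pairs containing non-standard residues are skipped; PRGminer's exact encoding and normalization may differ.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered pairs, in a fixed order that defines the vector layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies that sums to 1."""
    sequence = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # skip pairs with ambiguous residues such as X
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


vector = dipeptide_composition("ACAC")
print(len(vector), vector[DIPEPTIDES.index("AC")])
```

Because every protein maps to a vector of the same length regardless of sequence length, this representation can be fed directly into a fixed-size model input layer.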

The model was trained on comprehensive datasets sourced from public databases including Phytozome, Ensembl Plants, and NCBI [35]. The initial dataset contained 18,952 R-genes and 19,212 non-R-genes, which was divided into training and independent testing sets in a 9:1 ratio [35]. For phase II classification, the R-genes dataset was divided into eight classes with the following distribution: Coiled-coil-NBS-LRR (CNL) with 1,883 sequences, Kinase (KIN) with 8,591 sequences, and six additional well-defined classes [38].

R-gene Classification System and Domain Architectures

Comprehensive Categorization of Resistance Genes

PRGminer classifies resistance genes into eight distinct categories based on their domain architectures and functional characteristics. This classification system encompasses the major known types of plant resistance proteins, providing researchers with detailed structural and functional information about predicted R-genes [37].

Table 1: PRGminer R-gene Classification System and Domain Architectures

Class	Domain Architecture	Key Features	Functional Role
CNL	Coiled-coil, NBS, LRR	Central NB-ARC domain, C-terminal LRR, N-terminal coiled-coil	Intracellular pathogen recognition, ETI activation [37]
TNL	TIR, NBS, LRR	TIR domain at N-terminus, NB-ARC, LRR	Intracellular receptor, ETI signaling [37]
TIR	TIR only	Contains TIR domain, lacks LRR or NBS	Signaling component in immune response [37]
RLP	LRR, Transmembrane	Extracellular LRR, transmembrane region, short cytoplasmic tail	Pathogen recognition at cell surface [37]
RLK	LRR, Kinase	Extracellular LRR, intracellular kinase domain	Pattern recognition, signal transduction [37]
LECRK	Lectin, Kinase, TM	Lectin domain, kinase, potential transmembrane	Carbohydrate recognition, defense signaling [37]
LYK	Lysin Motif, Kinase, TM	LysM domain, kinase, potential transmembrane	Chitin recognition, fungal resistance [37]
KIN	Kinase	Kinase domain primarily	Phosphorylation in defense signaling [37]

NBS Gene Diversity and Architectural Patterns

The comprehensive classification of nucleotide-binding site (NBS) domain genes, which represent one of the largest superfamilies of resistance genes, reveals remarkable architectural diversity across plant species. Recent research has identified 12,820 NBS-domain-containing genes across 34 species ranging from mosses to monocots and dicots [7]. These genes display both classical patterns (NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR) and species-specific structural patterns (TIR-NBS-TIR-Cupin1-Cupin1, TIR-NBS-Prenyltransf, Sugar_tr-NBS) [7].

Orthogroup analysis identified 603 orthogroups, comprising both core orthogroups (those most widely shared across species) and unique, highly species-specific orthogroups, with evidence of tandem duplications [7]. Expression profiling has revealed the putative upregulation of specific orthogroups (OG2, OG6, OG15) in different tissues under various biotic and abiotic stresses, highlighting their functional importance in plant immunity [7].
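
A simple way to separate core from species-specific orthogroups is to count how many species contribute members to each group. The sketch below is illustrative only: the 80% cutoff for "core", the function name, and the example identifiers are assumptions, and the cited study may have used different criteria.

```python
def label_orthogroups(members, n_species, core_fraction=0.8):
    """Label each orthogroup as 'core', 'unique', or 'other'.

    members: dict mapping orthogroup ID -> set of species with at least one gene.
    core_fraction: assumed share of species required to call a group core.
    """
    labels = {}
    for orthogroup, species in members.items():
        if len(species) == 1:
            labels[orthogroup] = "unique"   # restricted to a single species
        elif len(species) >= core_fraction * n_species:
            labels[orthogroup] = "core"     # present in most species
        else:
            labels[orthogroup] = "other"
    return labels


# Hypothetical orthogroups across five species.
example = {
    "OG_a": {"sp1", "sp2", "sp3", "sp4", "sp5"},
    "OG_b": {"sp1"},
    "OG_c": {"sp1", "sp2"},
}
print(label_orthogroups(example, n_species=5))
```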

Experimental Validation and Performance Assessment

Benchmarking and Comparative Performance

PRGminer has undergone rigorous validation using experimentally confirmed R-genes, demonstrating exceptional performance in predicting known resistance genes [35]. The tool's accuracy surpasses traditional methods, particularly for genes with low sequence homology where alignment-based approaches typically fail [35].

Table 2: PRGminer Performance Metrics Across Validation Methods

Validation Metric	Phase I (R-gene Identification)	Phase II (R-gene Classification)
K-fold Training/Testing Accuracy	98.75%	97.55%
Independent Testing Accuracy	95.72%	97.21%
Matthews Correlation Coefficient (K-fold)	0.98	0.93
Matthews Correlation Coefficient (Independent)	0.91	0.92
Processing Time	~2 minutes for standard datasets	Included in total processing time
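
For readers unfamiliar with the Matthews correlation coefficient (MCC) reported in the table, the binary form can be computed directly from a confusion matrix, as in the sketch below. The counts are invented for illustration and are not PRGminer results; multi-class evaluation such as Phase II would require the generalized multi-class form of the coefficient.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary MCC from true/false positive and negative counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional value when any marginal total is zero
    return (tp * tn - fp * fn) / denominator


# Invented confusion-matrix counts for a binary R-gene / non-R-gene classifier.
print(round(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10), 3))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix and ranges from -1 to 1, which is why it is commonly reported alongside accuracy when evaluating classifiers.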

Functional Validation Through Experimental Approaches

Beyond computational validation, the functional importance of NBS genes predicted by systems like PRGminer has been confirmed through laboratory experiments. In one significant study, researchers employed virus-induced gene silencing (VIGS) to silence the GaNBS (OG2) gene in resistant cotton, demonstrating its putative role in virus titering and confirming the functional relevance of predicted NBS genes [7].

Protein-ligand and protein-protein interaction studies have further validated the biological significance of predicted NBS genes, showing strong interactions between putative NBS proteins and ADP/ATP as well as core proteins of the cotton leaf curl disease virus [7]. These experimental validations provide crucial evidence supporting the accuracy and biological relevance of computational predictions generated by tools like PRGminer.

Plant Immunity Framework and R-gene Signaling Pathways

The resistance genes predicted by PRGminer operate within the sophisticated two-layered immune system of plants. This system provides comprehensive protection against diverse pathogens through coordinated molecular interactions [35] [28].

Figure 2: Plant Immunity Signaling Pathways. The two-layered immune system showing PAMP-triggered immunity (PTI) and effector-triggered immunity (ETI) pathways mediated by different classes of R-genes.

The first layer, PAMP-triggered immunity (PTI), is initiated when cell surface-localized pattern recognition receptors (PRRs) recognize conserved pathogen-associated molecular patterns (PAMPs) [35] [28]. PRGminer identifies several classes of these receptors, including receptor-like kinases (RLKs) and receptor-like proteins (RLPs) [37]. When pathogens successfully deliver effector proteins to suppress PTI, the second layer of defense, effector-triggered immunity (ETI), is activated primarily through intracellular resistance proteins encoded by NBS-LRR genes [35] [28]. These two immune pathways function synergistically rather than independently, providing robust protection against invading pathogens [28].

The effective implementation of R-gene prediction and validation requires a suite of specialized computational tools and databases. The following research toolkit summarizes essential resources for comprehensive resistance gene analysis.

Table 3: Research Reagent Solutions for R-gene Prediction and Analysis

Resource	Type	Function	Application in R-gene Research
PRGminer	Deep Learning Tool	R-gene identification and classification	High-throughput prediction of resistance genes from protein sequences [35] [37]
PfamScan	Domain Search Tool	Protein domain identification	Detection of conserved R-gene domains (NB-ARC, TIR, CC, LRR) [7]
InterProScan	Integrated Database	Protein sequence analysis	Functional analysis of predicted R-genes [35]
Phytozome	Plant Genomics Database	Genomic data repository	Source of training data and comparative genomics [35]
OrthoFinder	Orthology Analysis Tool	Gene family evolution	Evolutionary analysis of R-gene families across species [7]
RNA-seq Data	Transcriptomic Data	Gene expression profiling	Validation of R-gene expression under stress conditions [7]
VIGS	Functional Validation	Gene silencing	Experimental verification of R-gene function [7]

PRGminer represents a significant advancement in the computational prediction of plant resistance genes, leveraging deep learning to overcome limitations of traditional homology-based approaches. By achieving high accuracy in both identification (>98% training accuracy) and classification (>97% training accuracy) of R-genes, this tool enables researchers to efficiently explore the resistance gene repertoire of plant species [35]. The integration of PRGminer with domain architecture analysis provides valuable insights into the structural diversity and evolutionary dynamics of NBS genes across plant species [7].

As plant pathogens continue to evolve and threaten global food security, tools like PRGminer will play an increasingly crucial role in accelerating the discovery of novel resistance genes and developing strategies for breeding disease-resistant crops [35]. The continued refinement of deep learning approaches in plant genomics promises to further enhance our understanding of plant immunity and contribute to sustainable agricultural practices.

Within the broader context of research on domain architecture patterns in plant Nucleotide-Binding Site (NBS) genes, transcriptomic profiling provides a critical functional lens. The NBS gene family, particularly the NBS-LRR (Leucine-Rich Repeat) subclass, constitutes the largest class of plant disease resistance (R) genes, serving as intracellular immune receptors that initiate effector-triggered immunity (ETI) upon pathogen recognition [16] [9]. The core thesis of this field posits that the diversification of NBS gene domain architectures, including canonical structures like TNL (TIR-NBS-LRR), CNL (CC-NBS-LRR), and RNL (RPW8-NBS-LRR) alongside numerous atypical variants, is a fundamental evolutionary strategy that enables plants to perceive diverse biotic and abiotic stressors [7]. This technical guide details how modern transcriptomic approaches are deployed to link these genetic blueprints to dynamic stress responses, providing researchers with methodologies to decipher the expression patterns that underpin plant adaptive immunity.

Quantitative Data on NBS Genes Across Plant Species

Genome-wide studies reveal significant variation in the size and composition of NBS gene families across plant species, influenced by evolutionary processes such as whole-genome and tandem duplications [7]. The following table summarizes the quantitative data from recent genomic studies.

Table 1: NBS-LRR Gene Family Size in Selected Plant Species

Plant Species	Total NBS Genes Identified	Typical NBS-LRR (with N & LRR domains)	Notable Subfamily Distribution	Key Reference
Salvia miltiorrhiza (Danshen)	196	62	61 CNL, 1 RNL, marked reduction in TNL/RNL [16]	(Wang et al., 2025) [16]
Dendrobium officinale	74	22 NBS-LRR	10 CNL, no TNL genes identified [9]	(Chen et al., 2022) [9]
Sweet Orange (Citrus sinensis)	111	43 with LRR domains	31 CC-domain containing, 15 TIR-domain containing [39]	(Yin et al., 2023) [39]
Tobacco (Nicotiana benthamiana)	156	53 (TNL, CNL, NL)	5 TNL, 25 CNL, 23 NL [25]	(Li et al., 2025) [25]
Cowpea (Vigna unguiculata)	2,188 R-genes (various classes)	Not Specified	Prominent Kinases (KIN) and transmembrane proteins (RLKs/RLPs) [40]	(Rai et al., 2025) [40]

Expression profiling under stress conditions consistently shows differential regulation of NBS genes. In Dendrobium officinale, treatment with the defense hormone salicylic acid (SA) led to the significant upregulation of six NBS-LRR genes, with Dof020138 identified as a key hub gene connected to pathogen recognition and signal transduction pathways [9]. Similarly, analysis of the medicinal plant Salvia miltiorrhiza revealed that the promoters of its SmNBS genes are enriched with cis-acting elements related to plant hormones and abiotic stress, and their expression is closely associated with secondary metabolism [16]. A large-scale study analyzing 12,820 NBS genes from 34 plant species found specific orthogroups (e.g., OG2, OG6, OG15) were upregulated in different tissues under various biotic and abiotic stresses in cotton accessions with varying tolerance to cotton leaf curl disease [7].

Experimental Protocols for Transcriptomic Profiling

A standardized workflow for conducting transcriptomic profiling of NBS genes is essential for generating comparable and reliable data. The following section outlines key experimental and bioinformatic protocols.

Plant Material Preparation and Stress Treatment

Treatment Selection: For biotic stress, researchers often use pathogen inoculations (e.g., Fusarium oxysporum [9]) or treatment with defense hormones like salicylic acid (SA) [9]. For abiotic stress, common treatments include cold, drought, salt, and heat stress [7] [41].
Experimental Design: Include susceptible and tolerant/resistant plant accessions for comparison. For example, a study on cotton used tolerant (Mac7) and susceptible (Coker 312) Gossypium hirsutum accessions to identify unique genetic variants in NBS genes associated with resistance [7]. The design should also encompass multiple time points post-treatment to capture dynamic expression changes.
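
A factorial design of this kind is easy to enumerate programmatically, which helps keep sample naming consistent through sequencing and analysis. In the sketch below the accession names follow the cotton example above, while the treatments, time points, replicate count, and naming scheme are placeholder assumptions.

```python
from itertools import product


def build_sample_sheet(accessions, treatments, timepoints_h, replicates=3):
    """Enumerate every accession x treatment x time point x replicate combination."""
    rows = []
    for acc, trt, tp, rep in product(accessions, treatments, timepoints_h,
                                     range(1, replicates + 1)):
        rows.append({
            "sample_id": f"{acc}_{trt}_{tp}h_rep{rep}",
            "accession": acc,
            "treatment": trt,
            "timepoint_h": tp,
            "replicate": rep,
        })
    return rows


# Placeholder treatments and time points (hours post-treatment).
sheet = build_sample_sheet(["Mac7", "Coker312"], ["mock", "inoculated"], [0, 24, 48])
print(len(sheet), sheet[0]["sample_id"])
```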

RNA Extraction, Library Preparation, and Sequencing

RNA Extraction: Use standardized kits (e.g., Qiagen kits) to extract high-quality total RNA from treated and control tissues, ensuring an A260/A280 ratio of 1.8-2.0 [40].
Library Preparation and Sequencing: Prepare sequencing libraries using commercial kits (e.g., NEXTFLEX Rapid DNA-seq kit). Sequencing can be performed on various platforms, with Illumina (short-read) being the most common for RNA-seq. For more complex genomes, a hybrid approach combining Illumina and Nanopore long-read sequencing can be used for superior assembly [40].

Bioinformatic Analysis Workflow

Read Processing and Assembly: Process raw reads by trimming adapters and filtering for quality. Assemble the cleaned reads into transcripts de novo or via reference-based assembly using the respective plant genome [9] [40].
Gene Identification and Differential Expression: Identify NBS-encoding genes from the assembled genome or transcriptome using tools like HMMER with the NB-ARC (PF00931) Hidden Markov Model (HMM) profile [16] [25]. For expression analysis, map RNA-seq reads to the reference transcripts, calculate counts or FPKM values, and identify Differentially Expressed Genes (DEGs) using packages like DESeq2 or edgeR [9] [7].
Advanced Integrative Analysis: Perform Weighted Gene Co-expression Network Analysis (WGCNA) to identify modules of co-expressed genes and hub genes, as was done to pinpoint Dof020138 in Dendrobium [9]. Analyze promoter regions (e.g., 1500 bp upstream of the start codon) for cis-acting elements using databases like PlantCARE [39] [25].
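
The FPKM values mentioned in the workflow above normalize a fragment count by transcript length (in kilobases) and by sequencing depth (in millions of mapped fragments). A minimal sketch of the calculation follows; the function name and the numbers are invented for illustration. Note that DESeq2 and edgeR take raw counts as input, so FPKM is mainly useful for reporting and visualization.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Equivalent to count / (length_bp / 1e3) / (total_fragments / 1e6).
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Invented example: 500 fragments on a 2 kb transcript in a 25 M fragment library.
print(fpkm(500, 2000, 25_000_000))
```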

Table 2: Key Research Reagent Solutions for Transcriptomic Profiling of NBS Genes

Research Reagent / Tool	Function / Application	Example Use in Context
HMMER Suite	Identifies NBS domain-containing genes in genome/transcriptome assemblies using profile HMMs (e.g., PF00931).	Used for genome-wide identification of 156 NBS-LRR genes in N. benthamiana [25].
PlantCARE Database	Identifies cis-acting regulatory elements in promoter sequences.	Revealed hormone and stress-related elements in sweet orange NBS-LRR promoters [39].
MEME Suite	Discovers conserved motifs in nucleotide or amino acid sequences.	Analyzed 10 conserved motifs in NBS-LRR proteins of N. benthamiana [25].
Virus-Induced Gene Silencing (VIGS)	Functional validation through transient gene knockdown.	Silencing of GaNBS (OG2) in resistant cotton confirmed its role in virus defense [7].
Weighted Gene Co-expression Network Analysis (WGCNA)	Constructs co-expression networks to identify hub genes and functional modules.	Identified Dof020138 as a central hub in D. officinale's immune response to SA [9].

Signaling Pathways and Molecular Interactions

NBS-LRR proteins function as central hubs in a complex immune signaling network. Understanding their activation and downstream signaling is crucial for interpreting transcriptomic data.

Core NBS-LRR Activation Mechanism

NBS-LRR proteins act as intracellular sensors. In the default state, the NBS domain is bound to ADP. Upon pathogen effector recognition, often mediated by the LRR domain, a conformational change occurs, promoting the exchange of ADP for ATP [25]. This ATP-bound "on" state activates downstream signaling and defense responses, often culminating in a Hypersensitive Response (HR) and programmed cell death to restrict pathogen spread [16] [25].

Major Signaling Pathways in ETI

The specific downstream signaling cascades differ between the main NBS-LRR subfamilies, particularly CNLs and TNLs.

TNL signaling generally requires helper proteins. For instance, in Arabidopsis, the EDS1/PAD4 complex associates with the RNL helper protein ADR1 to form a "supramolecular complex" that serves as a convergence point for defense signaling [16]. The specific pathways in which CNLs signal are an area of active research, but they can converge with TNLs at the level of RNL helpers or activate parallel pathways [16] [42]. Ultimately, these pathways reprogram the cell, inducing the synthesis of antimicrobial compounds, reinforcement of cell walls, and often the hypersensitive response [16].

Transcriptomic studies reveal that this core immunity network is deeply integrated with other cellular processes. In Salvia miltiorrhiza, the expression of NBS-LRR genes is closely linked to secondary metabolism, suggesting a coordinated resource allocation between defense and the production of bioactive compounds like tanshinones [16]. Furthermore, the widespread control of NBS transcripts by microRNAs is theorized to be a mechanism that allows plants to maintain large NLR repertoires without the fitness costs of constant, high-level expression, a layer of regulation detectable through small RNA sequencing [7].

Transcriptomic profiling has unequivocally established that NBS genes, with their diverse domain architectures, are dynamically regulated by a wide spectrum of biotic and abiotic stresses. The methodologies outlined herein—from rigorous experimental design and advanced sequencing to sophisticated bioinformatic integration—provide a roadmap for elucidating the specific roles of individual NBS genes and their orthogroups. The consistent finding that NBS expression is intertwined with phytohormone signaling, secondary metabolism, and a complex web of helper proteins underscores that these genes are not isolated sentinels but integral nodes in the plant's overall stress adaptation network. Future research, leveraging these transcriptomic insights and functional validation tools like VIGS, will be pivotal in translating this knowledge into strategies for enhancing crop resilience through the targeted manipulation of the NBS gene repertoire.

Nucleotide-binding site (NBS) genes constitute one of the most critical superfamilies of resistance (R) genes that equip plants to detect pathogen effectors and activate robust immune responses [24] [43]. These genes typically encode proteins characterized by a conserved NBS domain (also known as NB-ARC) alongside leucine-rich repeat (LRR) regions and variable N-terminal domains such as TIR (Toll/Interleukin-1 Receptor) or CC (coiled-coil) [44] [24]. The NBS domain itself contains several conserved motifs—including the P-loop, RNBS-A, kinase-2, RNBS-B, RNBS-C, GLPL, and MHDV—that are essential for nucleotide binding and signaling activation [43]. The remarkable diversity of NBS genes, both in sequence and domain architecture, presents a significant research challenge, particularly for understanding the genetic basis of disease resistance across plant species.

This technical guide frames orthogroup analysis within a broader thesis investigating domain architecture patterns in plant NBS genes. This analytical approach moves beyond single-species studies to enable the systematic identification of evolutionarily conserved core genes and lineage-specific innovations across multiple genomes. Such analyses have revealed that NBS genes are often distributed unevenly across chromosomes and frequently organized in clusters, with studies identifying up to 54% of NBS-LRR genes forming physical clusters in some plant genomes [24]. Furthermore, comparative analyses between wild and cultivated species, such as in the Asparagus genus, have documented significant NLR gene contraction during domestication (e.g., from 63 NLR genes in wild A. setaceus to just 27 in cultivated A. officinalis), providing insights into why domesticated crops often exhibit increased disease susceptibility [8] [44].

Fundamental Principles: Orthogroup Classification and NBS Gene Diversity

Theoretical Framework of Orthogroup Analysis

Orthogroup analysis provides a phylogenetically informed framework for classifying homologous genes across multiple species based on their evolutionary history. An orthogroup encompasses all genes descended from a single gene in the last common ancestor of the species being compared, including both orthologs (genes separated by speciation events) and paralogs (genes separated by duplication events) [45]. This approach is particularly valuable for studying gene families with complex evolutionary histories, such as NBS genes, which frequently undergo tandem duplications and gene loss events.

In the context of NBS gene families, orthogroups are typically categorized into three principal classes:

Core Orthogroups: Contain genes present in all or most surveyed species, representing conserved immune components maintained over evolutionary time.
Group-Specific Orthogroups: Found only in certain taxonomic groups (e.g., pathogens versus non-pathogens, or within specific plant families), potentially associated with specialized adaptations.
Accessory/Genome-Specific Orthogroups: Unique to individual species or genomes, representing recent evolutionary innovations or species-specific adaptations [46].

This classification system enables researchers to distinguish between conserved immune mechanisms shared across plant taxa and specialized adaptations that may underlie differences in pathogen resistance.

Diversity of NBS Domain Architectures

NBS genes exhibit remarkable structural diversity, with numerous domain architecture patterns observed across plant species. Comprehensive analyses of 12,820 NBS-domain-containing genes across 34 plant species have identified 168 distinct domain architecture classes, encompassing both classical and species-specific structural patterns [7]. This diversity is not random but follows discernible evolutionary patterns that can be systematically categorized through orthogroup analysis.

Table 1: Classification of NBS-LRR Genes Based on Domain Architecture

Category	Domain Structure	Representative Subclasses	Characteristics
TNL	TIR-NBS-LRR	TN, TNL, TNL-TIR	Contains TIR domain at N-terminus; predominant in dicots
CNL	CC-NBS-LRR	CN, CNL, CNL-CC	Features coiled-coil domain at N-terminus; common across angiosperms
RNL	RPW8-NBS-LRR	RN, RNL	Contains RPW8 domain; functions in signaling
Truncated Variants	Partial domains	N, NL, NLL, NN, NLN	Lack one or more canonical domains; may retain functionality
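The categories in Table 1 can be expressed as a small classification routine. The sketch below is illustrative only: it assumes each protein has already been reduced to an ordered list of domain labels (with adjacent LRR repeats merged into one LRR region by an upstream step), and the label names, function names, and the rule of assigning the category from the N-terminal domain are assumptions made for this example rather than the procedure of any cited study.

```python
from collections import Counter

# One-letter codes for the domains used in Table 1 (labels are assumptions).
DOMAIN_CODES = {"TIR": "T", "CC": "C", "RPW8": "R", "NBS": "N", "LRR": "L"}
CATEGORY_BY_N_TERMINUS = {"T": "TNL", "C": "CNL", "R": "RNL"}


def classify_architecture(domains):
    """Return (subclass code, category) for an ordered list of domain labels."""
    # Domains outside the simplified scheme are ignored.
    code = "".join(DOMAIN_CODES[d] for d in domains if d in DOMAIN_CODES)
    if "N" not in code:
        return code, "not an NBS gene"
    # The category follows the N-terminal domain; anything else is truncated.
    category = CATEGORY_BY_N_TERMINUS.get(code[0], "Truncated variant")
    return code, category


def tally_categories(proteins):
    """Count categories over a mapping of protein id -> domain list."""
    return Counter(classify_architecture(d)[1] for d in proteins.values())


# Hypothetical proteins, for illustration only.
example = {
    "p1": ["TIR", "NBS", "LRR"],
    "p2": ["CC", "NBS"],
    "p3": ["NBS", "LRR", "LRR"],
    "p4": ["RPW8", "NBS", "LRR"],
}
for protein_id, domains in example.items():
    print(protein_id, classify_architecture(domains))
print(tally_categories(example))
```

Keeping the subclass code separate from the category makes it possible to count fine-grained architectures (such as NLL or CN) while still reporting the broader classes.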

The distribution of these architectural classes varies significantly across plant lineages. For instance, studies in pepper (Capsicum annuum) identified 252 NBS-LRR genes with a striking dominance of nTNL types (248 genes) over TNL types (only 4 genes), reflecting lineage-specific evolutionary paths [24]. Similarly, analyses of euasterid species have revealed distinctive patterns in NBS gene composition and clustering compared to eurosid species, underscoring the importance of taxonomic context in interpreting orthogroup analyses [43].

Methodological Workflow: From Gene Identification to Orthogroup Inference

Genome-Wide Identification of NBS Genes

The initial and crucial step in orthogroup analysis involves the comprehensive identification of NBS genes across target genomes. This process requires a multi-pronged approach to ensure both sensitivity and specificity.

Primary Identification Protocols:

Hidden Markov Model (HMM) Searches
- Utilize HMMER software with the NB-ARC domain (Pfam: PF00931) as query
- Apply a stringent E-value cutoff in the initial screen (e.g., 10⁻⁶⁰) to collect high-confidence NB-ARC hits
- Build a species-specific HMM profile from those hits with HMMER's hmmbuild, then repeat the search with a relaxed cutoff (e.g., 0.01) for candidate selection [43]
- Validate candidates using NCBI's Conserved Domain Database (CDD) with E-value ≤ 1e-5 [44]
Complementary BLAST Searches
- Perform local BLASTp analyses against reference NLR proteins from well-annotated species (Arabidopsis thaliana, Oryza sativa)
- Apply stringent E-value cutoffs (1e-10) to minimize false positives [8]
- Extract candidate sequences using tools like TBtools for further validation [44]
Domain Architecture Validation
- Characterize protein domains using InterProScan and NCBI's Batch CD-Search
- Identify coiled-coil domains using COILS/PCOILS (P ≥ 0.9) or PAIRCOIL2 (P ≤ 0.025) [43]
- Classify genes based on complete domain architecture and chromosomal distribution [24]
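The E-value filtering step above can be sketched as follows. This minimal example assumes the search results were saved in HMMER's tabular per-sequence format (the output of hmmsearch with the --tblout option), in which the first whitespace-separated column is the target name and the fifth is the full-sequence E-value. The function name, the example hits, and the chosen thresholds are hypothetical.

```python
def filter_hmm_hits(tblout_text, max_evalue=1e-5):
    """Return {target name: best full-sequence E-value} for hits passing the cutoff."""
    hits = {}
    for line in tblout_text.splitlines():
        # Skip blank lines and the '#' comment/header lines of the table.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            # Keep the smallest E-value if a target appears more than once.
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits


# Made-up rows laid out like a --tblout table; values are illustrative only.
EXAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias
geneA  -  NB-ARC  PF00931  1.2e-70  240.1  0.1  1.5e-70  239.8  0.1  1.0  1  0  0  1  1  1  1  hypothetical protein
geneB  -  NB-ARC  PF00931  3.0e-04  18.2  0.0  4.1e-04  17.9  0.0  1.0  1  0  0  1  1  1  1  hypothetical protein
geneC  -  NB-ARC  PF00931  2.5e-12  45.0  0.2  3.0e-12  44.7  0.2  1.0  1  0  0  1  1  1  1  hypothetical protein
"""

strict = filter_hmm_hits(EXAMPLE_TBLOUT, max_evalue=1e-60)
relaxed = filter_hmm_hits(EXAMPLE_TBLOUT, max_evalue=1e-5)
print(sorted(strict))
print(sorted(relaxed))
```

Running the same function with two cutoffs mirrors the idea of a strict first pass followed by a more permissive candidate selection; candidates would still need domain validation before classification.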

Table 2: Key Bioinformatics Tools for NBS Gene Identification and Analysis

Tool Category	Specific Tools	Primary Function	Key Parameters
Domain Identification	HMMER, PfamScan, InterProScan	Identify conserved protein domains	E-value cutoffs (1e-50 to 1e-5)
Sequence Alignment	MAFFT, Clustal Omega, MUSCLE	Multiple sequence alignment	Default parameters typically sufficient
Motif Discovery	MEME Suite	Identify conserved protein motifs	Motif width: ≥6 and ≤50 amino acids
Genome Visualization	TBtools, GSDS 2.0	Visualize gene structures and distributions	Customizable based on project needs
Orthology Inference	OrthoFinder, SonicParanoid, Broccoli	Cluster genes into orthogroups	Inflation parameter (I=1.5-3.0)

Orthology Inference and Orthogroup Construction

Once NBS genes are identified across all target genomes, orthology inference algorithms are employed to cluster them into orthogroups. Several algorithms are available, each with distinct strengths and methodological approaches.

Orthology Inference Workflow:

Data Preparation
- Compile protein sequences for all identified NBS genes in FASTA format
- Ensure consistent annotation standards across all genomes
- For polyploid species, consider specialized pipelines like DaapNLRSeek for accurate NLR prediction [47]
Algorithm Selection and Execution
- OrthoFinder: Phylogenetically informed tree-based inference that normalizes BLAST bit scores based on gene length and phylogenetic distance [8] [45]
- SonicParanoid: Graph-based inference optimized for speed without phylogenetic information
- Broccoli: Tree-based algorithm using network analyses to determine orthology networks [45]
- Execute chosen algorithm with appropriate parameters (e.g., DIAMOND for fast sequence similarity searches, MCL for clustering)
Orthogroup Classification
- Separate orthogroups into core, group-specific, and accessory categories using custom Python scripts [46]
- Designate orthogroups with genes from ≥2 species in both phytopathogenicity groups as "core"
- Classify orthogroups with genes from ≥2 species in only one group as "group-specific"
- Designate orthogroups with genes from only a single genome as "accessory" [46]

Diagram 1: Orthogroup analysis workflow for NBS genes. The process involves three major phases: comprehensive gene identification, computational orthogroup construction, and functional classification with validation.
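The three classification rules in the workflow can be sketched in a few lines. The example below assumes two comparison groups (the source rules refer to two phytopathogenicity groups; in a plant study these might instead be, for example, wild and cultivated species). The function name, the species and group labels, and the handling of orthogroups that match none of the three rules (returned here as "unclassified") are assumptions for illustration, not the custom scripts used in the cited work.

```python
def classify_orthogroup(species_present, group_of):
    """Classify one orthogroup from the species that contribute genes to it.

    species_present: iterable of species names (repeats allowed).
    group_of: mapping of every surveyed species to its comparison group.
    """
    species_per_group = {}
    for species in set(species_present):
        species_per_group.setdefault(group_of[species], set()).add(species)

    total_species = sum(len(s) for s in species_per_group.values())
    all_groups = set(group_of.values())

    if total_species == 1:
        return "accessory"  # genes from a single genome only
    if set(species_per_group) == all_groups and all(
        len(s) >= 2 for s in species_per_group.values()
    ):
        return "core"  # at least two species in every group
    if len(species_per_group) == 1:
        return "group-specific"  # at least two species, all in one group
    return "unclassified"  # e.g. one species from each group


# Hypothetical species-to-group assignment and orthogroup contents.
group_of = {"sp1": "A", "sp2": "A", "sp3": "B", "sp4": "B"}
orthogroups = {
    "OG1": ["sp1", "sp2", "sp3", "sp4"],
    "OG2": ["sp1", "sp2"],
    "OG3": ["sp3", "sp3"],
    "OG4": ["sp1", "sp3"],
}
for name, members in orthogroups.items():
    print(name, classify_orthogroup(members, group_of))
```

Making the residual case explicit is a design choice: the three published categories do not cover every presence pattern, and silently forcing such orthogroups into one of them could bias downstream counts.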

Data Analysis and Interpretation Frameworks

Evolutionary and Phylogenetic Analyses

Following orthogroup construction, evolutionary analyses provide critical insights into the dynamics of NBS gene family expansion and contraction across plant lineages.

Phylogenetic Reconstruction Protocol:

Multiple Sequence Alignment
- Consolidate protein sequences of candidate NLR genes from all study species into a single file
- Perform multiple sequence alignment using MAFFT or Clustal Omega [44]
- Manually clean alignments to remove sequences with poor ends and incomplete motifs using MEGA [43]
Phylogenetic Tree Construction
- Use the maximum likelihood method with the JTT matrix-based substitution model, as implemented in MEGA
- Select the tree with the highest log likelihood value
- Perform bootstrap analysis with 1000 replicates to assess node support [8]
- Classify NLRs into subfamilies (CNL, TNL, RNL) based on phylogenetic positioning and domain architecture
Evolutionary Dynamics Assessment
- Calculate nonsynonymous (dN) and synonymous (dS) substitution rates for orthologous groups
- Identify signals of positive selection (dN/dS > 1) or purifying selection (dN/dS < 1)
- Date large-scale duplication events through analysis of synonymous substitution patterns [43]
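The selection and dating steps reduce to two simple calculations once dN and dS (or Ks) have been estimated by dedicated software. The sketch below is a minimal illustration: it labels a gene pair by its dN/dS ratio and converts a Ks value into an approximate duplication age using the commonly applied relation T = Ks / (2r). The function names are hypothetical, and the substitution rate r in the example is a placeholder that would have to be replaced with a lineage-appropriate estimate.

```python
import math


def selection_regime(dn, ds):
    """Return (omega, label) where omega = dN/dS."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the dN/dS ratio")
    omega = dn / ds
    if omega > 1:
        label = "positive selection"
    elif omega < 1:
        label = "purifying selection"
    else:
        label = "neutral"
    return omega, label


def duplication_age(ks, rate_per_site_per_year):
    """Approximate age in years of a duplication: T = Ks / (2 * r)."""
    if rate_per_site_per_year <= 0:
        raise ValueError("the substitution rate must be positive")
    return ks / (2 * rate_per_site_per_year)


# Illustrative inputs only; the rate below is a placeholder, not a recommendation.
print(selection_regime(0.30, 0.10))
print(selection_regime(0.02, 0.20))
print(duplication_age(0.13, 6.5e-9))
```

In practice, ratios close to 1 and pairs with very small or saturated dS values are hard to interpret, so a hard threshold such as the one above is best treated as a first-pass screen.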

Genomic Distribution and Cluster Analysis

NBS genes frequently exhibit non-random genomic distributions, often forming physical clusters that represent hotspots of rapid evolution and diversification.

Cluster Identification Methodology:

Chromosomal Mapping
- Determine chromosomal distribution of NLR family members using TBtools or similar utilities
- Extract gene positional information from genome annotations
- Visualize distributions through chromosomal mapping [44]
Cluster Definition and Analysis
- Define gene clusters as adjacent NLR pairs separated by ≤8 intervening genes [8]
- Determine relative gene orientations (head-to-head, head-to-tail, tail-to-tail) using BEDTools
- Evaluate statistical significance through χ² tests against random expectations (10,000 permutations) [44]
- Calculate proportion of clustered genes (e.g., 54% of NBS-LRR genes in pepper form 47 distinct clusters) [24]
Collinearity and Synteny Analysis
- Perform cross-species comparisons using "One Step MCScanX" from TBtools [44]
- Identify conserved syntenic blocks containing NBS genes
- Detect lineage-specific rearrangements and breakpoints
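The cluster definition used above (adjacent NLR genes separated by no more than eight intervening genes) can be sketched as a single pass over the ordered gene list of one chromosome. The function name and the gene identifiers are hypothetical, and the sketch assumes the genes are already sorted by position; chaining neighbouring pairs into larger clusters is one reasonable reading of the definition rather than a confirmed detail of the cited method.

```python
def find_nlr_clusters(ordered_genes, nlr_ids, max_intervening=8):
    """Group NLR genes on one chromosome into physical clusters.

    ordered_genes: all gene ids on the chromosome, sorted by position.
    nlr_ids: set of gene ids annotated as NLR/NBS genes.
    """
    clusters = []
    current = []
    last_index = None
    for index, gene in enumerate(ordered_genes):
        if gene not in nlr_ids:
            continue
        # Number of non-NLR genes between this NLR and the previous one.
        if last_index is not None and index - last_index - 1 <= max_intervening:
            current.append(gene)
        else:
            if len(current) >= 2:
                clusters.append(current)
            current = [gene]
        last_index = index
    if len(current) >= 2:
        clusters.append(current)
    return clusters


# Hypothetical chromosome with 30 genes; five of them are NLRs.
genes = [f"g{i}" for i in range(30)]
nlrs = {"g0", "g3", "g15", "g27", "g29"}
clusters = find_nlr_clusters(genes, nlrs)
clustered = sum(len(c) for c in clusters)
print(clusters)
print(f"{clustered / len(nlrs):.0%} of NLR genes fall in clusters")
```

Singletons are dropped because a cluster needs at least two members; the proportion printed at the end corresponds to the kind of summary statistic reported for clustered NBS-LRR genes.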

Functional Validation and Experimental Integration

Expression Profiling and Transcriptomic Validation

Orthogroup predictions require functional validation to confirm biological relevance. Transcriptomic analyses provide critical evidence for gene expression patterns under various conditions.

Expression Analysis Framework:

Data Collection and Processing
- Retrieve RNA-seq data from public databases (e.g., NCBI BioProjects, species-specific databases)
- Categorize expression data into tissue-specific, abiotic stress-specific, and biotic stress-specific groups [7]
- Process RNA-seq data through standardized transcriptomic pipelines
- Extract FPKM or TPM values for comparative analysis
Differential Expression Analysis
- Compare expression profiles between orthogroups across conditions
- Identify conserved expression patterns in core orthogroups versus condition-specific expression in group-specific orthogroups
- Correlate expression patterns with phenotypic data (e.g., disease susceptibility vs resistance)
Case Study: Asparagus NLR Expression
- Pathogen inoculation assays reveal distinct phenotypic responses (susceptible A. officinalis vs. asymptomatic A. setaceus)
- Expression analysis shows most preserved NLR genes in A. officinalis exhibit unchanged or downregulated expression post-infection
- Functional impairment in disease resistance mechanisms correlates with NLR gene contraction during domestication [8] [44]

Functional Characterization Through Genetic Approaches

Ultimate validation of NBS gene function requires direct genetic manipulation and phenotypic assessment.

Functional Validation Protocols:

Virus-Induced Gene Silencing (VIGS)
- Design VIGS constructs targeting candidate NBS genes from specific orthogroups
- Infect plants with engineered viral vectors and monitor disease progression
- Example: Silencing of GaNBS (OG2) in resistant cotton demonstrated its putative role in virus titering [7]
Protein Interaction Studies
- Perform protein-ligand and protein-protein interaction assays
- Demonstrate interaction between NBS proteins and pathogen effectors or signaling components
- Example: Two paired NLRs from sugarcane induce immune responses in Nicotiana benthamiana [47]
Genetic Transformation
- Express candidate NBS genes in susceptible varieties
- Assess complementation of resistance phenotypes
- Evaluate potential fitness costs associated with NBS gene expression

Table 3: Key Research Reagent Solutions for NBS Orthogroup Analysis

Reagent/Resource Category	Specific Examples	Function/Application	Technical Notes
Software Platforms	OrthoFinder, SonicParanoid, Broccoli	Orthology inference from genomic data	OrthoFinder recommended for phylogenetic accuracy
Domain Databases	Pfam, InterPro, PRGdb 4.0	Domain identification and classification	PRGdb specialized for plant R genes
Genomic Resources	Phytozome, PLAZA, GreenPhylDB	Reference genomes and annotations	PLAZA offers precomputed orthogroups
Expression Databases	NCBI BioProjects, CottonFGD, Plant Expression Database	Tissue-specific and stress-responsive expression data	Essential for validating predictions
Experimental Tools	VIGS vectors, Yeast two-hybrid systems, Antibodies	Functional validation of candidate genes	VIGS crucial for high-throughput testing

Orthogroup analysis represents a powerful framework for elucidating the complex evolutionary history and functional diversification of NBS gene families across plant species. By systematically classifying NBS genes into core, group-specific, and accessory orthogroups, researchers can distinguish conserved immune components from lineage-specific innovations, providing crucial insights into the genetic basis of disease resistance variation. When integrated with structural analyses of domain architectures, this approach reveals how specific domain combinations correlate with evolutionary conservation or specialization.

The methodological pipeline presented in this guide—encompassing comprehensive gene identification, rigorous orthology inference, evolutionary analysis, and functional validation—provides a robust foundation for investigating NBS gene families within the broader context of domain architecture research. As genomic resources continue to expand, orthogroup analysis will play an increasingly vital role in translating genomic data into actionable insights for crop improvement and disease resistance breeding.

The nucleotide-binding site leucine-rich repeat (NBS-LRR) genes represent the largest and most critical class of disease resistance (R) proteins in plants, forming a fundamental component of the plant immune system. These genes enable plants to recognize pathogen-secreted effectors and trigger robust immune responses through effector-triggered immunity (ETI), often accompanied by hypersensitive response (HR) and programmed cell death (PCD) [16]. The NBS-LRR gene family exhibits remarkable diversity across plant species, with significant variation in gene number, structural architecture, and evolutionary dynamics. Understanding the genetic variation within this family provides crucial insights into plant-pathogen coevolution and facilitates the development of disease-resistant crops through targeted screening approaches [12] [48].

Recent advances in genome sequencing technologies have generated voluminous genomic data, making comprehensive analysis of genetic variations and their functional consequences increasingly feasible [49]. The NBS-LRR genes are characterized by their modular domain architecture, typically containing a conserved nucleotide-binding site (NBS) domain that binds and hydrolyzes ATP to activate downstream immune signaling, and a leucine-rich repeat (LRR) domain responsible for recognizing diverse effectors released by pathogens [16] [12]. The N-terminal domain varies, comprising either a Toll/interleukin-1 receptor (TIR) domain, a coiled-coil (CC) domain, or a resistance to powdery mildew 8 (RPW8) domain, defining the major subfamilies of NBS-LRR proteins [16].

Table 1: Classification of NBS-LRR Gene Subfamilies Based on Domain Architecture

Subfamily	N-terminal Domain	NBS Domain	LRR Domain	Representative Genes	Key Features
TNL	TIR	Present	Present	RPS4	Predominantly in dicots; activates specific signaling pathways
CNL	CC	Present	Present	Rpm1, RPS2, RPP13	Found in both monocots and dicots; recognizes diverse pathogens
RNL	RPW8	Present	Present	ADR1	Regulatory functions; acts as helper NLRs
TN	TIR	Present	Absent	-	Potential adaptors or regulators
CN	CC	Present	Absent	-	Incomplete domains; function not fully characterized
NL	None	Present	Present	-	Atypical NBS-LRR with no N-terminal domain

The evolution of NBS-LRR genes follows a birth-and-death model, characterized by frequent gene duplications and losses, resulting in lineage-specific expansions and contractions [12]. Comparative genomic analyses reveal substantial variation in NBS-LRR gene composition across plant species. For instance, gymnosperms like Pinus taeda exhibit significant expansion of the TNL subfamily (comprising 89.3% of typical NBS-LRRs), while monocotyledonous species such as Oryza sativa (rice) have completely lost the TNL and RNL subfamilies [16]. In medicinal plants like Salvia miltiorrhiza, researchers identified 196 NBS-LRR genes, but only 62 possessed complete N-terminal and LRR domains, with a notable reduction in TNL and RNL subfamily members compared to other angiosperms [16].

Domain Architecture Patterns in Plant NBS-LRR Genes

Structural Organization and Functional Domains

The domain architecture of NBS-LRR proteins follows a modular organization that determines their function in pathogen recognition and immune signaling. These large proteins range from approximately 860 to 1,900 amino acids and contain at least four distinct domains joined by linker regions: a variable amino-terminal domain, the NBS domain, the LRR region, and variable carboxy-terminal domains [12]. The NBS domain, also called the NB-ARC (nucleotide binding adaptor shared by NOD-LRR proteins, APAF-1, R proteins and CED4) domain, contains several defined motifs characteristic of the 'signal transduction ATPases with numerous domains' (STAND) family of ATPases [12]. This domain functions as a molecular switch in disease signaling pathways, with specific binding and hydrolysis of ATP demonstrated for the NBS domains of tomato CNLs I2 and Mi [12].

The LRR region typically consists of multiple repeats (averaging 14 LRRs per protein) that form a solenoid structure providing a versatile binding surface for pathogen recognition [12]. Diversifying selection has maintained variation in the solvent-exposed residues of the β-sheets of the LRR domain, with evidence of significantly elevated ratios of non-synonymous to synonymous nucleotide substitutions [12]. The amino-terminal domain contains either TIR or CC motifs that are involved in protein-protein interactions, potentially with the proteins being guarded or with downstream signaling components [12]. Polymorphism in the TIR domain of the flax TNL protein L6 affects the specificity of pathogen recognition, highlighting the functional importance of this region [12].

Genomic Distribution and Architectural Variation

The distribution of NBS-LRR genes across plant genomes exhibits distinct patterns that reflect evolutionary adaptations to pathogen pressure. These genes are frequently clustered in the genome as a result of both segmental and tandem duplications [12]. There can be wide intraspecific variation in copy number because of unequal crossing-over within clusters, contributing to the dynamic evolution of resistance specificities [12]. The proportion of different NBS-LRR subfamilies varies markedly among plant species, as illustrated in Table 2.

Table 2: Comparative Analysis of NBS-LRR Gene Distribution Across Plant Species

Plant Species	Total NBS-LRR Genes	CNL Subfamily	TNL Subfamily	RNL Subfamily	Atypical NBS-LRR	Reference
Arabidopsis thaliana	~150-207	~60%	~35%	~5%	58 related proteins	[16] [12]
Oryza sativa (rice)	~505	100%	0%	0%	Not reported	[16] [12]
Solanum tuberosum (potato)	~447	Majority	Minority	Minority	Not reported	[16]
Salvia miltiorrhiza	196	61 typical CNL	0	1 typical RNL	134 atypical	[16]
Nicotiana tabacum (tobacco)	603	23.3% CC-NBS (CN)	2.5% TIR-NBS (TN)	Limited	45.5% NBS-only	[50]
Triticum aestivum (wheat)	2151	Majority (e.g., Ym1)	Absent or rare	Limited	Not reported	[50]

In tobacco (Nicotiana tabacum), a recent study identified 603 NBS genes, with approximately 45.5% containing only the NBS domain, 23.3% belonging to the CC-NBS (CN) category, and only 2.5% representing TIR-NBS (TN) members [50]. About 76.62% of NBS members in N. tabacum could be traced back to their parental genomes (N. sylvestris and N. tomentosiformis), demonstrating the impact of polyploidization on NBS-LRR gene family expansion [50]. Whole-genome duplication was found to contribute significantly to the expansion of NBS gene families in Nicotiana species [50].

Experimental Protocols for Genetic Variation Screening

Genome-Wide Identification and Characterization

The identification and characterization of NBS-LRR genes across plant genomes involves a multi-step computational pipeline that leverages sequence homology and domain architecture. The standard protocol begins with sequence retrieval and domain identification using Hidden Markov Model (HMM) profiles of conserved domains. Researchers typically employ HMMER software with the PF00931 (NB-ARC) model from the PFAM database to identify candidate NBS-LRR genes [16] [50]. Additional domains (TIR, LRR, CC) are identified using corresponding PFAM models (PF01582, PF00560, PF07723, PF07725, PF12779, etc.) or the NCBI Conserved Domain Database (CDD) [50].

The second phase involves phylogenetic and structural analysis to classify the identified genes into subfamilies and determine their evolutionary relationships. Multiple sequence alignment of NBS-LRR protein sequences is performed using tools like MUSCLE with default parameters, followed by phylogenetic tree construction in MEGA11 using the neighbor-joining method with bootstrap analysis (typically 1000 replicates) [50]. Genomic distribution analysis identifies patterns of gene clustering and duplication through self-BLASTP, MCScanX for detecting segmental and tandem duplications, and synteny analysis across related genomes [50].

For expression profiling, RNA-Seq analysis provides insights into functional specialization. The protocol includes downloading RNA-seq datasets from public repositories like NCBI SRA, quality control using Trimmomatic, read mapping with Hisat2, transcript quantification using Cufflinks with FPKM normalization, and differential expression analysis with Cuffdiff [50]. In Salvia miltiorrhiza, this approach revealed close associations between specific SmNBS-LRR genes and secondary metabolism, with promoter analysis demonstrating abundance of cis-acting elements related to plant hormones and abiotic stress [16].
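The normalization behind the FPKM values mentioned above can be illustrated with a short calculation. This sketch shows only the basic arithmetic for FPKM and the related TPM measure from raw fragment counts and gene lengths; it is not a reimplementation of Cufflinks, which estimates abundances with a more elaborate statistical model. The function names and the example numbers are hypothetical.

```python
import math


def fpkm(counts, lengths_bp):
    """FPKM = count * 1e9 / (gene length in bp * total mapped fragments)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped fragments")
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}


def tpm(counts, lengths_bp):
    """TPM: length-normalized rates rescaled so that they sum to one million."""
    rates = {g: counts[g] / lengths_bp[g] for g in counts}
    scale = sum(rates.values())
    if scale == 0:
        raise ValueError("no mapped fragments")
    return {g: rate / scale * 1e6 for g, rate in rates.items()}


# Made-up counts and lengths for two genes.
counts = {"nbs_gene_1": 100, "nbs_gene_2": 300}
lengths = {"nbs_gene_1": 1000, "nbs_gene_2": 3000}
print(fpkm(counts, lengths))
print(tpm(counts, lengths))
```

Because the two example genes have the same count-to-length ratio, they receive identical values under both measures, which shows how length normalization removes the apparent threefold difference in raw counts.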

Functional Validation of Resistance Specificities

The functional validation of NBS-LRR genes involves both association analysis and direct experimental manipulation. Association analysis links genetic variations to resistance phenotypes through population genetics approaches. This includes calculating non-synonymous (Ka) and synonymous (Ks) substitution rates with KaKs_Calculator 2.0 using evolutionary models like Nei-Gojobori (NG) to detect selection pressures [50]. Population genetic analysis of wild plant species provides information concerning the frequencies and diversity of resistance alleles in nature, and on the selection forces maintaining resistance [48].

For direct functional characterization, pathogen recognition assays test the specificity of NBS-LRR proteins against particular pathogen effectors. The classic example is the Arabidopsis Rpm1 protein, which confers resistance to Pseudomonas syringae carrying AvrRpm1 or AvrB [51]. Population studies of Rpm1 have revealed that resistance and susceptibility alleles have co-existed for millions of years, supporting a 'trench warfare' hypothesis rather than a transient arms-race model [51]. This hypothesis proposes that advances and retreats of resistance-allele frequency maintain variation for disease resistance as a dynamic polymorphism [51].

Protein-protein interaction studies determine the physical interaction between NBS-LRR receptors and pathogen effectors. For example, the wheat CC-NBS-LRR protein Ym1 confers resistance to wheat yellow mosaic virus (WYMV) by specifically recognizing the viral coat protein (CP) [52]. This interaction leads to nucleocytoplasmic redistribution, transitioning Ym1 from an auto-inhibited to an activated state, subsequently triggering hypersensitive responses [52]. Functional studies often involve domain-swap experiments to identify specificity determinants, as demonstrated with the flax L gene alleles, where exchanges in the LRR region altered recognition specificities [48].

Visualization and Analysis Tools for Genomic Data

Chromosomal Mapping and Visualization

Effective visualization of genomic data is essential for interpreting the distribution and organization of NBS-LRR genes across chromosomes. The R package chromoMap provides an efficient solution for creating interactive visualizations of chromosomes and mapping chromosomal features with known coordinates [53]. This tool allows the construction of publication-ready plots that integrate multi-omics data (genomics, transcriptomics, and epigenomics) in relation to their occurrence across chromosomes [53].

The chromoMap package offers two annotation algorithms: point-annotation (ignoring element size and annotating on a single base) and segment-annotation (using element size to delimit its location) [53]. The package also enables group annotations, in which elements are color-coded by category, and feature-associated data visualization, in which numeric data such as gene expression, methylation status, or feature density values are displayed as scatter/bar plots or heatmaps [53]. A particularly valuable feature for polyploid species is the multitrack function, which renders each chromosome set independently regardless of the species' ploidy, enabling visualization of homologous chromosome pairs in phased diploid/polyploid genome assemblies [53].

For researchers preferring command-line tools, Spaln and GMAP can align sequences to chromosomes and output results in GFF3 and SAM formats that are easily viewed in interactive genome browsers like IGV [54]. These tools are particularly useful for visualizing the locations of NBS-LRR genes on specific chromosomes, as demonstrated in watermelon genome studies where researchers sought to create chromosome maps showing gene distributions [54].
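Once gene models are available in GFF3, the coordinates needed for chromosome maps can be pulled out with a few lines of code. The sketch below relies on the standard nine-column, tab-separated GFF3 layout (sequence id, source, type, start, end, score, strand, phase, attributes). The function name, the example records, and the assumption that gene identifiers are stored in the ID attribute are illustrative; real annotation files may use other attribute keys.

```python
def gene_positions(gff3_text, wanted_ids):
    """Return {gene id: (chromosome, start, end, strand)} for selected genes."""
    positions = {}
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # keep only well-formed gene features
        attributes = dict(
            item.split("=", 1) for item in cols[8].split(";") if "=" in item
        )
        gene_id = attributes.get("ID")
        if gene_id in wanted_ids:
            positions[gene_id] = (cols[0], int(cols[3]), int(cols[4]), cols[6])
    return positions


# Hypothetical GFF3 records built with explicit tabs.
EXAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "\t".join(["chr1", "example", "gene", "1000", "4200", ".", "+", ".",
               "ID=nlr1;Name=hypothetical_NLR"]),
    "\t".join(["chr1", "example", "mRNA", "1000", "4200", ".", "+", ".",
               "ID=nlr1.t1;Parent=nlr1"]),
    "\t".join(["chr2", "example", "gene", "500", "3900", ".", "-", ".",
               "ID=other1"]),
])

print(gene_positions(EXAMPLE_GFF3, {"nlr1"}))
```

The resulting table of chromosome, start, end, and strand is the kind of input that chromosome-mapping and cluster-detection steps expect.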

Evolutionary Analysis and Selection Pressure Assessment

Analyzing evolutionary patterns and selection pressures on NBS-LRR genes provides insights into the mechanisms driving their diversification. The LRR region consistently shows evidence of diversifying selection, particularly in solvent-exposed residues that may constitute ligand contact points [48]. Analysis of the flax L locus revealed that unequal exchange events at complex R loci contribute significantly to the generation of new resistance specificities [48]. In these exchanges, the LRR regions are frequently involved in inter-allelic sequence exchanges that alter recognition specificities [48].

The rate of evolution of NBS-LRR-encoding genes can be rapid or slow, even within an individual cluster of similar sequences [12]. For example, the major cluster of NBS-LRR-encoding genes in lettuce includes genes with two distinct patterns of evolution: type I genes evolve rapidly with frequent gene conversions, while type II genes evolve slowly with rare gene conversion events between clades [12]. This heterogeneous rate of evolution is consistent with a birth-and-death model, in which gene duplication and unequal crossing-over are followed by density-dependent purifying selection [12].

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Research Reagents and Resources for NBS-LRR Gene Analysis

Category	Specific Tool/Resource	Application	Key Features
Software Tools	HMMER v3.1b2 with PF00931	Domain identification	Hidden Markov Model search for NB-ARC domain
	MUSCLE v3.8.31	Multiple sequence alignment	Prepares sequences for phylogenetic analysis
	MEGA11	Phylogenetic tree construction	Neighbor-joining method with bootstrap testing
	MCScanX	Genome duplication analysis	Identifies segmental and tandem duplications
	chromoMap R package	Genome visualization	Interactive chromosomal maps with multi-omics data
Databases	PFAM Database	Domain identification	Curated collection of protein domain families
	NCBI CDD	Domain validation	Conserved Domain Database for verification
	NCBI SRA	RNA-seq data	Sequence Read Archive for expression analysis
Experimental Resources	Ph1b mutant lines	Homoeologous recombination	Promotes crossing-over in polyploid species [52]
	Virus-induced gene silencing (VIGS)	Functional validation	Rapid assessment of gene function in plants
	Heterologous expression systems	Functional analysis	Testing gene function in model systems [48]

The research toolkit for genetic variation screening in NBS-LRR genes continues to expand with new technical innovations. For difficult-to-map loci, such as the wheat Ym1 gene, researchers have used ph1b mutants to promote homoeologous recombination, allowing fine mapping of genes located within alien introgressions [52]. For expression analysis, RNA-seq has been applied to tobacco challenged with diseases such as black shank and bacterial wilt, providing insights into NBS-LRR gene induction during defense responses [50].

For functional characterization, protein interaction assays such as yeast two-hybrid systems and co-immunoprecipitation are essential for validating direct interactions between NBS-LRR receptors and pathogen effectors, as demonstrated in the Ym1-WYMV coat protein interaction study [52]. Additionally, domain swap approaches through genetic engineering allow researchers to test the functional contributions of specific protein domains to recognition specificity and signaling activation [48].

The screening of genetic variations in NBS-LRR genes and their association with resistance phenotypes has revolutionized our understanding of plant immunity mechanisms. The integration of genomic, transcriptomic, and functional data has revealed the dynamic evolutionary processes that shape this critical gene family, including birth-and-death evolution, diversifying selection, and lineage-specific expansions and contractions. The structural characterization of NBS-LRR domain architectures has provided insights into the molecular basis of pathogen recognition and subsequent immune activation.

Future research directions will likely focus on harnessing this knowledge for crop improvement through both traditional breeding and biotechnology approaches. The identification of key specificity determinants in the LRR regions may enable engineering of novel recognition capabilities in crop plants. Furthermore, understanding the signaling networks downstream of different NBS-LRR subfamilies will facilitate the development of strategies to enhance immune responses without detrimental fitness costs. As genomic technologies continue to advance, the integration of pan-genome analyses with high-throughput phenotyping will accelerate the discovery of valuable resistance alleles in crop wild relatives and landraces, expanding the genetic resources available for breeding disease-resistant crops in a changing climate.

Navigating Analytical Challenges: Degeneration, Annotation, and Validation Hurdles

Nucleotide-binding site-leucine-rich repeat (NBS-LRR) genes constitute one of the largest and most critical gene families in plant innate immunity, encoding intracellular receptors that detect pathogen effectors and initiate defense responses. However, the functional integrity of these genes is frequently compromised through evolutionary processes, particularly the degradation of the central NB-ARC domain and the loss of the LRR domain. This technical review examines the molecular mechanisms, evolutionary patterns, and functional consequences of such degeneration events across diverse plant species. Through systematic analysis of empirical studies and genomic data, we provide a comprehensive framework for identifying, characterizing, and validating these genetic alterations, with direct implications for crop improvement and disease resistance breeding.

Plant NBS-LRR genes encode modular proteins characterized by three core domains: a variable N-terminal domain [typically Toll/interleukin-1 receptor (TIR) or coiled-coil (CC)], a central nucleotide-binding adaptor shared by APAF-1, R proteins, and CED-4 (NB-ARC) domain, and a C-terminal leucine-rich repeat (LRR) domain [12] [24]. The NB-ARC domain serves as a molecular switch that cycles between ADP-bound (inactive) and ATP-bound (active) states to regulate signaling activity, while the LRR domain is primarily involved in pathogen recognition specificity and protein-protein interactions [55] [24]. This sophisticated domain architecture enables plants to detect diverse pathogens and initiate effector-triggered immunity (ETI), often culminating in a hypersensitive response to limit pathogen spread [56] [57].

The NBS-LRR gene family represents one of the largest and most diverse gene families in plants, with significant variation in copy number across species. For instance, Arabidopsis thaliana contains approximately 150 NBS-LRR genes, while Oryza sativa possesses over 400, with even greater numbers anticipated in larger, incompletely sequenced genomes [12]. This extensive diversity arises from dynamic evolutionary processes including gene duplication, unequal crossing-over, and diversifying selection, particularly in the LRR region where solvent-exposed residues display elevated ratios of non-synonymous to synonymous substitutions [12] [56]. However, these same evolutionary mechanisms also predispose NBS-LRR genes to various forms of degeneration, including NB-ARC domain degradation and complete LRR domain loss, with significant functional implications for plant immunity.

Mechanisms and Patterns of Domain Degeneration

NB-ARC Domain Degradation

The NB-ARC domain contains several conserved motifs essential for nucleotide binding and hydrolysis, including the P-loop (Walker A), RNBS-A, kinase-2, RNBS-B, RNBS-C, GLPL, and MHD motifs [55] [24]. Structural and biochemical studies of the NB-ARC domain from tomato NRC1 revealed that this domain co-purifies with ADP and functions as a regulated molecular switch, with conformational changes between nucleotide-bound states controlling signaling activity [55]. Degradation of this domain typically involves mutations in these critical motifs, disrupting nucleotide binding or hydrolysis capacity and consequently impairing immune signal transduction.
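
As an illustration of how motif-level degradation might be screened, the sketch below checks a protein sequence for an intact P-loop using the general Walker A consensus G-x(4)-G-K-[S/T]. The function name, the example fragments, and the decision to test only this one motif are assumptions made for illustration; the remaining motifs (RNBS-A, kinase-2, GLPL, MHD and others) would likely need family-specific patterns or alignment-based checks.

```python
import re

# General Walker A (P-loop) consensus: G, any four residues, G, K, then S or T.
WALKER_A = re.compile(r"G.{4}GK[ST]")

def find_p_loop(protein_seq):
    """Return (start, matched_text) of the first P-loop-like match, or None."""
    match = WALKER_A.search(protein_seq.upper())
    if match is None:
        return None
    return match.start(), match.group()

# Hypothetical fragments: one with an intact P-loop, one with the lysine mutated.
intact = "VVGIYGMGGVGKTTLAR"
mutated = "VVGIYGMGGVGRTTLAR"

print(find_p_loop(intact))
print(find_p_loop(mutated))
```

A missing match only suggests a disrupted P-loop; it does not by itself establish loss of nucleotide binding, which requires the biochemical assays discussed later in this section.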

Phylogenetic analyses across numerous plant species have revealed that NB-ARC domain degeneration is a common evolutionary phenomenon. In Dendrobium orchids, comparative genomics identified numerous NBS genes with degenerate NB-ARC domains, characterized by disrupted conserved motifs and reduced structural integrity [9]. Similarly, studies in pepper (Capsicum annuum) revealed substantial diversity in NB-ARC domain architecture, including instances where degenerated domains retained structural elements but lost functional capacity [24]. This degeneration often follows gene duplication events, where relaxed selective pressures on redundant copies permit the accumulation of deleterious mutations.

LRR Domain Loss

The LRR domain exhibits exceptional variability in sequence and copy number, with an average of 14 LRRs per protein and often 5-10 sequence variants for each repeat [12]. This diversity generates a vast potential for pathogen recognition specificity, with theoretical combinatorial potential exceeding 9×10^11 variants in Arabidopsis alone [12]. However, this structural complexity also renders the LRR domain particularly susceptible to loss through unequal crossing-over, gene conversion, and frameshift mutations.

Comparative analyses between resistant Vernicia montana and susceptible Vernicia fordii revealed significant LRR domain loss in the susceptible species, which lacked LRR1 and LRR4 domains present in its resistant counterpart [57]. Similarly, genome-wide studies in Fabaceae crops identified substantial variation in LRR domain retention, with some species exhibiting preferential associations between NB-ARC domains and specific LRR types [11]. These domain losses directly impact pathogen recognition capacity, compromising the plant's ability to detect effector proteins and initiate immune responses.

Table 1: Documented Cases of Domain Degeneration in Plant Species

Plant Species	NB-ARC Degradation	LRR Domain Loss	Functional Consequences	Citation
Vernicia fordii	Moderate	Complete loss of LRR1 and LRR4 domains	Increased susceptibility to Fusarium wilt	[57]
Dendrobium spp.	Extensive degeneration observed	Multiple instances of complete loss	Reduced pathogen recognition capacity	[9]
Capsicum annuum	Varied degradation patterns	200 of 252 NBS genes lacked LRR domains	Specialization in signaling rather than recognition	[24]
Fabaceae crops	Limited degradation	Preferential association with specific LRR types	Altered recognition specificities	[11]

Evolutionary Drivers of Degeneration

The "birth-and-death" evolutionary model governs NBS-LRR gene evolution, characterized by frequent gene duplication followed by differential retention or degeneration of copies [12] [56]. This process generates substantial variation in NBS-LRR repertoires between even closely related species, reflecting lineage-specific adaptations to pathogen pressures. Genomic architecture significantly influences degeneration patterns, with NBS-LRR genes typically organized in clusters prone to unequal crossing-over and gene conversion [12] [24].

Two distinct evolutionary patterns have been identified in NBS-LRR genes: Type I genes evolve rapidly with frequent gene conversion events, while Type II genes evolve slowly with rare gene conversion between clades [12]. This heterogeneous evolutionary rate creates differential susceptibility to degeneration, with rapidly evolving genes more prone to domain loss through recombination errors. Additionally, subfunctionalization and neofunctionalization following duplication events can preserve degenerated forms that acquire novel regulatory roles, such as serving as decoys or competitive inhibitors in immune signaling networks [56].

Figure 1: Evolutionary pathways leading to NB-ARC domain degradation and LRR domain loss following gene duplication events.

Empirical Evidence and Case Studies

Comparative Genomics in Vernicia Species

A compelling case study of domain degeneration emerges from comparative analysis of Fusarium wilt-resistant Vernicia montana and susceptible Vernicia fordii. Genome-wide identification of NBS-LRR genes revealed 149 candidates in resistant V. montana compared to only 90 in susceptible V. fordii [57]. Beyond quantitative differences, significant structural variations were observed, with V. fordii exhibiting complete absence of TIR domains and loss of specific LRR types (LRR1 and LRR4) retained in its resistant counterpart. These domain losses correlated directly with compromised disease resistance, highlighting the functional significance of structural integrity.

Chromosomal distribution analysis further revealed that NBS-LRR genes in both Vernicia species were distributed non-randomly, showing clustered arrangements indicative of tandem duplications [57]. However, the susceptible species exhibited more frequent degeneration events within these clusters, suggesting that genomic architecture influences susceptibility to degeneration. The orthologous gene pair Vf11G0978-Vm019719 exemplifies this pattern: the V. fordii ortholog was downregulated following pathogen challenge, whereas its V. montana counterpart was upregulated [57].

Domain Degeneration in Dendrobium Orchids

Comprehensive analysis of NBS genes across seven plant species, including three Dendrobium orchids, identified 655 NBS genes with extensive degeneration patterns [9]. Phylogenetic reconstruction of CNL-type proteins revealed significant degeneration in branches a and b, with Dendrobium NBS genes exhibiting two prominent characteristics: type changing and NB-ARC domain degeneration [9]. Notably, no TNL-type genes were identified in any orchid species, consistent with the general absence of TNL genes from monocot genomes and suggesting lineage-specific degeneration patterns.

In D. officinale, 22 NBS-LRR genes containing both NB-ARC and LRR domains were subjected to detailed structural analysis, revealing considerable variation in gene structure, conserved motifs, and cis-regulatory elements [9]. Salicylic acid treatment experiments identified six NBS-LRR genes with significantly upregulated expression, though only one (Dof020138) demonstrated extensive connectivity within immune signaling networks, suggesting functional divergence among non-degenerated copies.

Table 2: Domain Architecture Variation in Plant Species

Species	Total NBS Genes	NBS-LRR Genes	CNL	TNL	Degenerated Forms	Citation
Arabidopsis thaliana	210	~150	~100	~50	58 truncated proteins	[12]
Capsicum annuum	252	48	2	4	200 without CC/TIR	[24]
Vernicia montana	149	21	9	3	125 partial domains	[57]
Vernicia fordii	90	12	12	0	78 partial domains	[57]
Dendrobium officinale	74	22	10	0	52 partial domains	[9]

Functional Consequences of Paired NLR Systems

Recent research has revealed that some NLRs function not as singletons but as genetically linked pairs that coordinately confer disease resistance. The PmWR183 locus from wild emmer wheat encodes two adjacent NLR proteins (PmWR183-NLR1 and PmWR183-NLR2) that function cooperatively, with neither gene alone conferring resistance but co-expression restoring immunity [58]. This paired configuration creates additional vulnerability to degeneration, as disruption of either component completely abolishes resistance function.

Protein interaction assays demonstrated constitutive association between PmWR183-NLR1 and PmWR183-NLR2, supporting their cooperative role in immune signaling [58]. This interdependence means that degeneration events affecting one partner can disrupt the entire functional unit, representing a potential vulnerability in plant immune systems. Geographical and haplotype analyses revealed that this locus originates from wild emmer and is rare in cultivated wheat, with at least nine haplotypes exhibiting varying degrees of integrity and function [58].

Experimental Methodologies for Studying Domain Degeneration

Genomic Identification and Classification

Standardized protocols for genome-wide identification of NBS-LRR genes are essential for comparative analysis of domain degeneration. The following workflow represents current best practices:

Sequence Retrieval: Obtain complete genome assemblies from relevant databases (NCBI, Phytozome, Plaza) with comprehensive annotation [7] [57].
Domain Identification: Employ HMMER software with PfamScan.pl HMM search script using default e-value (1.1e-50) and background Pfam-A_hmm model to identify NB-ARC domains (PF00931) [7] [57]. Additional associated domains (TIR, CC, LRR) should be identified using Pfam and COILS databases [24].
Architecture Classification: Classify genes based on domain composition into standardized categories: N (NBS only), NL (NBS-LRR), CN (CC-NBS), TN (TIR-NBS), CNL (CC-NBS-LRR), TNL (TIR-NBS-LRR), RNL (RPW8-NBS-LRR) [7] [24].
Degeneration Assessment: Evaluate structural integrity through multiple sequence alignment of conserved motifs (P-loop, RNBS-A, kinase-2, RNBS-B, RNBS-C, GLPL, MHD) and identification of truncations, insertions, or deletions disrupting domain architecture [55] [9].
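
The architecture classification step above can be sketched as a simple mapping from detected domains to class labels. This is a minimal sketch, not the procedure used in the cited studies: the domain labels, the function name, and the choice to ignore domain order and repeated domains are all assumptions for illustration.

```python
def classify_architecture(domains):
    """Map a list of domain labels to an architecture class.

    `domains` uses the hypothetical labels "TIR", "CC", "RPW8", "NBS", "LRR".
    Returns None when no NBS (NB-ARC) domain is present.
    """
    present = set(domains)
    if "NBS" not in present:
        return None
    # N-terminal prefix: at most one letter, checked in a fixed priority order.
    prefix = ""
    for label, letter in (("TIR", "T"), ("CC", "C"), ("RPW8", "R")):
        if label in present:
            prefix = letter
            break
    suffix = "L" if "LRR" in present else ""
    # Note: this can also yield "RN", which is not among the listed categories.
    return prefix + "N" + suffix

print(classify_architecture(["TIR", "NBS", "LRR"]))
print(classify_architecture(["NBS"]))
```

Genes with non-canonical or integrated domains would need additional rules, since a presence/absence mapping like this one collapses them into the nearest classical class.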

Figure 2: Experimental workflow for genomic identification and classification of NBS-LRR genes and degeneration assessment.

Functional Validation Approaches

Once candidate degeneration events are identified, functional validation is essential to confirm biological significance:

Virus-Induced Gene Silencing (VIGS): VIGS provides an efficient approach for functional characterization of NBS-LRR genes. In V. montana, VIGS-mediated silencing of Vm019719 significantly compromised resistance to Fusarium wilt, validating its essential role in immunity [57]. Similarly, silencing of GaNBS in resistant cotton demonstrated its putative role in virus titering [7]. Standard protocols typically employ Agrobacterium-mediated delivery of tobacco rattle virus (TRV) vectors containing 150-300 bp gene-specific fragments.

Heterologous Expression and Biochemical Assays: For NB-ARC domain degradation analysis, biochemical characterization of nucleotide binding and hydrolysis capacity provides direct functional assessment. The NRC1 NB-ARC domain was successfully expressed in E. coli and Sf9 insect cells, purified via immobilised metal ion chromatography and size-exclusion chromatography, and demonstrated to co-purify with ADP [55]. Differential scanning fluorimetry and circular dichroism can assess structural integrity, while enzymatic assays quantify ATP hydrolysis activity.

Protein Interaction Studies: Co-immunoprecipitation and yeast two-hybrid assays determine whether domain degeneration affects protein-protein interactions critical for immune signaling. For paired NLR systems, these methods demonstrated constitutive association between PmWR183-NLR1 and PmWR183-NLR2 [58]. Similarly, the NB-ARC protein RLS1 was shown to function with the cysteine-rich receptor-like secreted protein RMC through direct interaction [59].

Expression and Regulation Analysis

Degeneration events may affect gene expression patterns independently of protein function:

Transcriptomic Profiling: RNA-seq analysis under pathogen challenge and hormone treatments (e.g., salicylic acid) identifies differentially expressed NBS-LRR genes. In D. officinale, SA treatment identified 1,677 differentially expressed genes, including six NBS-LRR genes with significant upregulation [9]. Weighted gene co-expression network analysis (WGCNA) can further connect NBS-LRR genes to specific immune pathways.

Promoter Analysis: Identification of cis-regulatory elements explains expression differences between functional and degenerated alleles. In Vernicia, the resistant Vm019719 promoter contained W-box elements activated by VmWRKY64, while its susceptible ortholog Vf11G0978 contained a deletion in this critical element [57]. This demonstrates how degeneration in regulatory regions can compromise immunity independently of coding sequence integrity.
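
To make the promoter comparison concrete, the sketch below scans a sequence for W-box core elements on both strands. It assumes the commonly used core pattern TTGAC[C/T]; the function names and the toy sequences are hypothetical and are not the Vernicia promoter sequences from [57].

```python
import re

# W-box core assumed here as TTGAC followed by C or T; lookahead allows overlaps.
W_BOX = re.compile(r"(?=(TTGAC[CT]))")
MOTIF_LENGTH = 6

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_w_boxes(promoter):
    """Return sorted (start, strand) pairs in forward-strand, 0-based coordinates."""
    seq = promoter.upper()
    hits = [(m.start(), "+") for m in W_BOX.finditer(seq)]
    rc = reverse_complement(seq)
    # Convert reverse-strand match positions back to forward-strand coordinates.
    hits += [(len(seq) - m.start() - MOTIF_LENGTH, "-") for m in W_BOX.finditer(rc)]
    return sorted(hits)

# Toy promoters: one with a W-box, one where part of the element is deleted.
with_w_box = "AAATTGACCAAA"
with_deletion = "AAATTCAAA"

print(find_w_boxes(with_w_box))
print(find_w_boxes(with_deletion))
```

Dedicated resources such as PlantCARE cover many more element types; a pattern scan like this only helps to confirm the presence or absence of one candidate element.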

Table 3: Research Reagent Solutions for Studying Domain Degeneration

Reagent/Resource	Function/Application	Specifications	Citation
HMMER with PfamScan	Domain identification	e-value 1.1e-50, Pfam-A_hmm model	[7] [57]
pOPIN expression vectors	Protein expression	N-terminal 6xHis tag or 6xHis-SUMO tag	[55]
TRV VIGS vectors	Functional validation	150-300 bp gene-specific fragments	[7] [57]
OrthoFinder	Evolutionary analysis	DIAMOND for sequence similarity, MCL clustering	[7]
Sf9 insect cells	Protein expression	Baculovirus-mediated expression for difficult proteins	[55]

Domain degeneration in NBS-LRR genes represents a fundamental evolutionary process with significant implications for plant immunity and crop improvement. The patterns and mechanisms documented across diverse species reveal both conserved principles and lineage-specific peculiarities in how NB-ARC domains degrade and LRR domains are lost. These degeneration events directly impact plant health by compromising pathogen recognition and immune signaling capacity, as empirically demonstrated in multiple pathosystems.

Future research directions should prioritize integrating structural biology approaches to characterize degenerate domains at atomic resolution, developing high-throughput screening methods to assess functional consequences of degeneration events, and exploring genome editing applications to resurrect degenerated alleles in susceptible crop varieties. Additionally, investigating the potential adaptive benefits of certain degeneration events may reveal previously unrecognized regulatory functions beyond pathogen recognition.

The methodological framework presented here provides a comprehensive approach for identifying, validating, and characterizing domain degeneration in NBS-LRR genes. As genomic resources continue expanding across diverse plant species, applying these standardized approaches will enable systematic comparison of degeneration patterns and their functional consequences, ultimately informing strategies for enhancing disease resistance in agricultural systems through optimized domain architecture.

Annotation Complexities in Repetitive Regions and Fragmented Genes

The annotation of plant nucleotide-binding site (NBS) genes represents a significant challenge in genomics due to their residence in repetitive genomic regions and their frequent assembly into fragments. These complexities directly impact the accurate determination of domain architecture patterns, which is crucial for understanding plant immune system evolution and function. This technical guide examines the sources of these annotation difficulties, presents quantitative assessments of NBS gene diversity across species, details robust experimental and computational methodologies for overcoming these challenges, and provides visualization frameworks for interpreting results. Within the broader context of domain architecture research, resolving these complexities enables deeper insights into plant adaptation mechanisms and the development of crops with enhanced disease resistance.

Plant NBS-encoding genes constitute one of the largest and most variable gene families in plant genomes, playing critical roles in pathogen recognition and defense activation [7]. The NLR (nucleotide-binding leucine-rich repeat) gene family has undergone remarkable expansion in flowering plants: repertoire sizes range from approximately 25 genes in the bryophyte Physcomitrella patens to over two thousand in bread wheat (Triticum aestivum) [8]. This expansion occurs primarily through duplication events, so NBS genes are frequently embedded in repetitive genomic contexts and exhibit extensive sequence diversity, which creates fundamental challenges for accurate genome annotation and domain architecture determination.

The central importance of NBS genes in plant immunity necessitates precise annotation, as they encode key receptors for effector-triggered immunity [36]. Structurally, these genes typically contain three conserved domains: an N-terminal domain (TIR, CC, or RPW8), a central nucleotide-binding site (NBS) domain, and a C-terminal leucine-rich repeat (LRR) region [8]. However, the existence of numerous truncated variants lacking specific domains adds further complexity to annotation efforts [8]. Accurate structural annotation is prerequisite for functional characterization, making the resolution of annotation complexities in repetitive regions and fragmented genes a critical research priority in plant genomics.

Core Challenges in NBS Gene Annotation

Repetitive Regions and Their Impact

Repetitive elements constitute a substantial portion of plant genomes and present significant obstacles to accurate gene annotation. These regions occur in multiple copies throughout the genome, making assembly and annotation particularly challenging because "reads from these different repeats are very similar, and the assembly tools cannot distinguish between them" [60]. This often leads to mis-assemblies where distant genomic regions are incorrectly joined or, more commonly, results in a fragmented assembly where "assembly tools cannot determine the correct assembly of these regions and simply stop extending the contigs at the border of the repeats" [60].

For NBS genes specifically, their tendency to form clustered arrangements on chromosomes exacerbates these challenges. Adjacent NBS pairs separated by relatively few genes often display conserved orientations, suggesting recent duplication events [8]. The high sequence similarity among recently duplicated NBS genes makes resolution difficult during assembly, particularly with short-read technologies. Consequently, repetitive regions can lead to either collapsed representations of diverse NBS genes or false duplication artifacts in genome assemblies, fundamentally compromising downstream domain architecture analyses.
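
One way to make the notion of clustered NBS genes operational is to group NBS genes that are separated by only a few intervening genes. The sketch below does this for a single chromosome; the function name, the input layout, and the default threshold are hypothetical choices for illustration rather than values taken from the cited studies.

```python
def cluster_nbs_genes(genes, max_intervening=8):
    """Group NBS genes separated by at most `max_intervening` non-NBS genes.

    `genes` is a list of (gene_id, is_nbs) tuples in chromosomal order for one
    chromosome. Returns a list of clusters, each a list of two or more gene IDs.
    """
    clusters, current, last_idx = [], [], None
    for idx, (gene_id, is_nbs) in enumerate(genes):
        if not is_nbs:
            continue
        if last_idx is not None and idx - last_idx - 1 <= max_intervening:
            current.append(gene_id)
        else:
            # Gap too large: close the previous group and start a new one.
            if len(current) >= 2:
                clusters.append(current)
            current = [gene_id]
        last_idx = idx
    if len(current) >= 2:
        clusters.append(current)
    return clusters

# Toy chromosome with two pairs of neighbouring NBS genes.
toy_chromosome = [
    ("g1", True), ("g2", False), ("g3", True), ("g4", False),
    ("g5", False), ("g6", False), ("g7", True), ("g8", True),
]
print(cluster_nbs_genes(toy_chromosome, max_intervening=1))
```

Because collapsed or falsely duplicated assemblies change gene order and counts, cluster calls of this kind are only as reliable as the underlying assembly.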

Gene fragmentation in genome assemblies arises from multiple sources, with significant implications for accurately determining complete domain architectures:

High heterozygosity: In diploid organisms, "sequence reads from homologous alleles can be too different to be assembled together and these alleles will then be assembled separately" [60]. For NBS genes, which often exhibit high allelic diversity, this results in either fragmented assemblies or erroneous separate assemblies of alleles as different genes.
Sequencing technology limitations: Technologies with short read lengths struggle to span repetitive elements within and surrounding NBS genes, resulting in truncated gene models [60]. Even long-read technologies may fail to resolve complex repeat structures, leading to assembly breaks that fragment single genes across multiple contigs.
Annotation pipeline limitations: Automated annotation tools may incorrectly predict start/stop codons or splice sites within repetitive regions, leading to truncated or partial gene models that miss critical domains, especially the highly variable LRR regions that are crucial for pathogen recognition specificity [7].

The combination of these factors results in incomplete representation of NBS genes in genome databases, with particular impact on the accurate characterization of rare structural variants and species-specific domain architectures.
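
A simple screen for possibly fragmented gene models is to ask how much of the NB-ARC profile each hit covers. The sketch below assumes that HMM coordinates and profile length are already available (for example from an HMMER domain table); the function name, the 0.8 coverage threshold, and the profile length of 250 used in the example are hypothetical.

```python
def flag_partial_nbarc(hits, min_coverage=0.8):
    """Return {gene_id: coverage} for hits covering less than `min_coverage`.

    `hits` is an iterable of (gene_id, hmm_from, hmm_to, hmm_length) tuples,
    with 1-based inclusive coordinates on the profile HMM.
    """
    partial = {}
    for gene_id, hmm_from, hmm_to, hmm_length in hits:
        coverage = (hmm_to - hmm_from + 1) / hmm_length
        if coverage < min_coverage:
            partial[gene_id] = round(coverage, 3)
    return partial

# Toy hits against a profile of hypothetical length 250.
toy_hits = [
    ("geneA", 1, 250, 250),    # full-length domain
    ("geneB", 1, 100, 250),    # N-terminal part only
    ("geneC", 150, 250, 250),  # C-terminal part only
]
print(flag_partial_nbarc(toy_hits))
```

In this toy case, geneB and geneC cover complementary parts of the profile; if such models were adjacent on a contig or sat at contig ends, they might represent one gene split by an assembly break rather than two truncated genes, which is a question for manual curation.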

Quantitative Landscape of NBS Gene Diversity

Comparative Analysis Across Species

Comprehensive surveys across land plants reveal extraordinary diversity in NBS gene content and composition. A recent study identified 12,820 NBS-domain-containing genes across 34 species spanning from mosses to monocots and dicots [7]. These genes displayed remarkable structural heterogeneity, distributed across 168 distinct classes with both classical and species-specific domain architecture patterns.

Table 1: NBS Gene Family Size Variation Across Plant Species

Species	Family/Group	NBS Gene Count	Notable Features
Asparagus setaceus	Wild asparagus relative	63	Expanded NLR repertoire
Asparagus kiusianus	Wild asparagus	47	Intermediate NLR count
Asparagus officinalis	Garden asparagus	27	Contracted NLR repertoire following domestication
Triticum aestivum	Wheat (hexaploid)	>2,000	One of largest known repertoires
Oropetium thomaeum	Poaceae family	Several dozen	Compact NLR repertoire
Arabidopsis thaliana	Brassicaceae	~200	Moderate repertoire size

The quantitative analysis demonstrates a clear trend of NLR repertoire contraction through domestication processes, as evidenced in the Asparagus genus where "gene counts of 63, 47, and 27 NLR genes identified in A. setaceus, A. kiusianus, and A. officinalis, respectively" [8]. This pattern highlights the selective pressures acting on NBS gene content during crop evolution and the importance of accurate annotation for understanding these evolutionary dynamics.
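
The size of this contraction can be expressed as a percentage relative to A. setaceus. The short calculation below uses only the three counts quoted above; the helper name is an arbitrary choice.

```python
# NLR gene counts quoted in the text for the three Asparagus species [8].
nlr_counts = {"A. setaceus": 63, "A. kiusianus": 47, "A. officinalis": 27}

def percent_reduction(reference, other):
    """Percentage by which `other` is smaller than `reference`."""
    return round(100 * (reference - other) / reference, 1)

reference = nlr_counts["A. setaceus"]
for species in ("A. kiusianus", "A. officinalis"):
    print(species, percent_reduction(reference, nlr_counts[species]))
```

On these numbers, garden asparagus carries roughly 57% fewer annotated NLR genes than A. setaceus, although some of that difference could reflect annotation quality rather than true gene loss.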

Domain Architecture Diversity

The structural diversity of NBS genes extends beyond simple presence/absence to encompass complex domain architectures:

Table 2: Classification of NBS Domain Architecture Patterns

Architecture Class	Domain Composition	Prevalence	Functional Notes
TNL	TIR-NBS-LRR	Common in dicots	Toll/interleukin-1 receptor domain
CNL	CC-NBS-LRR	Ubiquitous	Coiled-coil domain
RNL	RPW8-NBS-LRR	Less common	RPW8 domain for signaling
NL	NBS-LRR	Variable	Lacking N-terminal domain
TN	TIR-NBS	Truncated variant	Missing LRR domain
Species-specific variants	e.g., TIR-NBS-TIR-Cupin_1	Rare	Novel architectures with potential specialized functions

The study by Hussain et al. (2024) discovered "several classical (NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR, etc.) and species-specific structural patterns (TIR-NBS-TIR-Cupin_1-Cupin_1, TIR-NBS-Prenyltransf, Sugar_tr-NBS etc.)" [7], demonstrating the extensive innovation in domain architecture within this gene family. This diversity presents particular annotation challenges, as non-canonical architectures may be misclassified or filtered out in automated annotation pipelines.

Methodological Framework: Annotation and Validation

Genome Annotation Protocol

Accurate genome annotation provides the foundation for NBS gene characterization. The following integrated protocol, adapted from current best practices, addresses the specific challenges of repetitive regions:

Step 1: Repetitive Element Masking

Construct a species-specific repeat library using RepeatModeler [61]
Mask repetitive elements using RepeatMasker with RepBase libraries [61]
Rationale: "Repetitive elements are enriched throughout the genome. Such repetitive elements can cause non-specific gene hits during annotation. By masking repetitive elements, annotation tools can target gene encoding regions more easily" [61]
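
The masking step could be scripted roughly as follows. This sketch only builds the command lines and does not run them; it assumes that BuildDatabase, RepeatModeler, and RepeatMasker are installed and on the PATH, and it masks with the de novo library written by RepeatModeler rather than with RepBase. The function name, file names, and output directory are hypothetical.

```python
def repeat_masking_commands(genome_fasta, db_name, threads=4, out_dir="masked"):
    """Build command lines for de novo repeat modelling and soft-masking."""
    # 1. Index the genome for RepeatModeler.
    build_db = ["BuildDatabase", "-name", db_name, genome_fasta]
    # 2. Model repeat families de novo; this writes <db_name>-families.fa.
    model = ["RepeatModeler", "-database", db_name]
    # 3. Soft-mask (lowercase) the genome with the custom library, writing GFF.
    mask = [
        "RepeatMasker", "-lib", f"{db_name}-families.fa",
        "-xsmall", "-gff", "-pa", str(threads),
        "-dir", out_dir, genome_fasta,
    ]
    return [build_db, model, mask]

for command in repeat_masking_commands("genome.fa", "my_species", threads=8):
    print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` in order. Soft-masking is chosen here as an assumption because it leaves the underlying sequence available to gene predictors; whether NBS-rich regions end up over-masked is worth checking by hand.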

Step 2: Evidence-Based Annotation

Use the MAKER2 pipeline, which integrates ab initio gene predictions with experimental evidence [61]
Incorporate RNA-seq data from multiple tissues to provide transcriptomic evidence [61]
Include protein homology evidence from curated databases like UniProtKB/Swiss-Prot [61]

Step 3: Iterative Training

Train ab initio prediction tools like Augustus and SNAP using evidence-based gene models [61]
Perform multiple rounds of training to improve prediction accuracy [61]
Validate assembly and annotation completeness using BUSCO with embryophyta lineage datasets [8]

This comprehensive approach significantly improves the identification of genes within repetitive regions by combining multiple evidence types and specialized masking procedures.

Specific NBS Gene Identification Pipeline

For targeted identification of NBS genes, a specialized pipeline is required:

HMM-based identification: Perform Hidden Markov Model searches using the conserved NB-ARC domain (Pfam: PF00931) as query with stringent E-value cutoff (1e-50) [7] [8]
Homology-based complement: Conduct local BLASTp analyses against reference NLR proteins from model species (E-value ≤ 1e-10) [8]
Domain architecture validation: Validate candidate sequences through comprehensive domain analysis using InterProScan and NCBI's Batch CD-Search [8]
Classification: Categorize genes based on complete domain architecture using Pfam and PRGdb 4.0 databases [8]
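
The first two steps of this pipeline can be sketched as a union of two filtered hit sets. The example assumes that `hmmsearch` was run with `--domtblout` and that BLASTp was run with tabular output (`-outfmt 6`) using the candidate proteins as queries; the function names and the toy rows, including the Pfam accession version and profile length, are hypothetical.

```python
def parse_domtblout(text, accession_prefix="PF00931", max_evalue=1e-50):
    """Protein IDs with an NB-ARC hit whose full-sequence E-value passes the cutoff."""
    hits = set()
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # domtblout columns: 0 = target name, 4 = query accession, 6 = full-seq E-value
        target, query_acc, evalue = fields[0], fields[4], float(fields[6])
        if query_acc.startswith(accession_prefix) and evalue <= max_evalue:
            hits.add(target)
    return hits

def parse_blast_tab(text, max_evalue=1e-10):
    """Query IDs with at least one reference NLR hit passing the E-value cutoff."""
    hits = set()
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # outfmt 6 columns: 0 = qseqid, 10 = evalue
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits

# Toy rows in the two formats (all values invented for illustration).
toy_domtbl = (
    "# target name ... (header lines start with #)\n"
    "prot1 - 900 NB-ARC PF00931.25 252 1.2e-80 270.1 0.1 1 1 3e-84 1.2e-80 269.5 0.1 2 250 180 430 178 432 0.95 -\n"
    "prot2 - 400 NB-ARC PF00931.25 252 3.0e-20 70.2 0.0 1 1 8e-24 3.0e-20 69.9 0.0 5 120 10 130 8 135 0.90 -\n"
)
toy_blast = (
    "prot2\tref1\t45.0\t380\t200\t5\t1\t380\t10\t390\t1e-60\t250\n"
    "prot3\tref2\t30.0\t90\t60\t2\t1\t90\t5\t95\t1e-3\t40\n"
)

candidates = parse_domtblout(toy_domtbl) | parse_blast_tab(toy_blast)
print(sorted(candidates))
```

Here prot2 fails the stringent HMM cutoff but is recovered by homology, which is the kind of atypical or partial candidate the complementary BLASTp step is meant to catch before domain validation.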

This dual-approach methodology ensures comprehensive capture of both canonical and atypical NBS genes while maintaining stringent validation of domain content.

Experimental Validation Approaches

Computational predictions require experimental validation, particularly for genes in problematic genomic regions:

Transcriptomic Validation

Generate RNA-seq data from multiple tissues and stress conditions
Map reads to genome assembly using specialized aligners like STAR [61]
Quantify expression using kallisto to confirm transcriptional activity [61]
"The expression profiling presented the putative upregulation of OG2, OG6, and OG15 in different tissues under various biotic and abiotic stresses" [7]

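As a sketch of how the transcriptomic check above might be applied to candidate NBS genes, the function below reads a kallisto `abundance.tsv` table and keeps candidates whose TPM reaches a threshold. The function name, the 1.0 TPM threshold, and the toy table are assumptions for illustration.

```python
import csv
import io

def expressed_targets(abundance_tsv_text, candidate_ids, min_tpm=1.0):
    """Return {target_id: tpm} for candidates with TPM at or above `min_tpm`.

    Expects the kallisto abundance.tsv layout with a header containing
    the columns target_id and tpm.
    """
    reader = csv.DictReader(io.StringIO(abundance_tsv_text), delimiter="\t")
    expressed = {}
    for row in reader:
        tpm = float(row["tpm"])
        if row["target_id"] in candidate_ids and tpm >= min_tpm:
            expressed[row["target_id"]] = tpm
    return expressed

# Toy abundance table with two candidate NBS transcripts and one unrelated transcript.
toy_abundance = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "nbs1\t2700\t2500.0\t120\t8.5\n"
    "nbs2\t1500\t1300.0\t0\t0\n"
    "other\t900\t700.0\t50\t12.0\n"
)
print(expressed_targets(toy_abundance, {"nbs1", "nbs2"}))
```

Absence of expression in one sample does not show that a gene model is wrong, since many NBS genes are lowly expressed or induced only under specific stresses; sampling multiple tissues and conditions helps here.
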
Functional Validation via VIGS

Design specific constructs targeting candidate NBS genes
Apply Virus-Induced Gene Silencing (VIGS) in resistant plants
Challenge with pathogens and monitor for loss of resistance
"The silencing of GaNBS (OG2) in resistant cotton through virus-induced gene silencing (VIGS) demonstrated its putative role in virus tittering" [7]

Manual Curation

Utilize annotation tools like Apollo for manual inspection and correction of gene models [61]
Visualize genomic context and read mapping using IGV [61]

These validation steps are particularly crucial for verifying genes in repetitive regions, where automated annotation pipelines are most prone to errors.

Computational Toolkit and Workflow Visualization

Essential Bioinformatics Tools

Table 3: Computational Tools for NBS Gene Annotation and Analysis

Tool Category	Specific Tools	Function	Application Context
Genome Annotation	MAKER2, BRAKER2	Pipeline for gene annotation	Integrates multiple evidence types
Repetitive Element Identification	RepeatMasker, RepeatModeler	Identify and mask repetitive elements	Critical for reducing false positives
Domain Identification	HMMER, InterProScan, Pfam	Identify protein domains	Core NBS domain identification
Orthology Analysis	OrthoFinder, DIAMOND	Cluster genes into orthogroups	Evolutionary analysis of NBS genes
Expression Analysis	STAR, kallisto	Align RNA-seq and quantify expression	Experimental validation
Manual Curation	Apollo, IGV	Visualize and manually correct annotations	Essential for problematic regions

The selection of appropriate tools significantly impacts annotation quality, particularly for complex gene families. "Domain-based bioinformatics pipelines exploit conserved structural motifs and architectures such as nucleotide-binding site (NBS), leucine-rich repeats (LRRs), coiled-coil (CC), toll/interleukin-1 receptor (TIR)" [36]; such pipelines should be selected based on the specific research objectives and genomic context.

Annotation Workflow Visualization

The following diagram illustrates the integrated computational and experimental workflow for accurate NBS gene annotation:

Figure 1: Comprehensive Workflow for NBS Gene Annotation in Repetitive Regions

Research Reagent Solutions

Table 4: Essential Research Reagents for NBS Gene Characterization

Reagent Type	Specific Examples	Function/Application	Technical Notes
Reference Databases	Pfam (PF00931), PRGdb 4.0, UniProtKB/Swiss-Prot	Domain identification and classification	Curated databases essential for accurate domain annotation
Genomic Resources	BUSCO (embryophyta_odb10), RepBase	Assembly and annotation quality assessment	Provides evolutionary context and quality metrics
Software Pipelines	OrthoFinder, MEME suite, PlantCARE	Evolutionary analysis, motif discovery, promoter analysis	Enables comprehensive comparative genomics
Experimental Validation Tools	VIGS constructs, pathogen strains (e.g., Phomopsis asparagi), RNA-seq libraries	Functional characterization of NBS genes	Required for establishing genotype-phenotype relationships
Genomic Materials	Inbred lines for sequencing, multiple tissue types for RNA extraction	Reducing heterozygosity, comprehensive transcriptome profiling	"It is better to sequence haploid tissues" to reduce assembly complexity [60]

The annotation of NBS genes in repetitive regions and the correct assembly of fragmented genes remain significant challenges in plant genomics, with direct implications for understanding domain architecture patterns and their evolution in plant immunity. The complexities inherent to these genomic regions require integrated approaches combining advanced computational methods with experimental validation. As sequencing technologies continue to evolve, particularly with emerging long-read technologies that better span repetitive elements, and as bioinformatics tools become more sophisticated in handling complex gene families, the resolution of these annotation challenges will accelerate. This will enable more accurate comparative genomic studies, facilitate the identification of novel resistance gene candidates, and support targeted breeding efforts for crop improvement. The methodological framework presented here provides a foundation for addressing these persistent challenges while highlighting the need for continued development of specialized tools for complex plant gene families.

The nucleotide-binding site-leucine-rich repeat (NBS-LRR) gene family constitutes the largest and most critical class of plant disease resistance (R) genes, enabling plants to recognize pathogens and activate defense responses. However, the remarkable diversity and rapid evolution of these genes often result in low sequence homology between related species, presenting significant challenges for their comprehensive identification in newly sequenced genomes. This technical guide synthesizes current methodologies to address this limitation, framing solutions within the broader context of domain architecture patterns in plant NBS gene research. We present integrated bioinformatics strategies that leverage comparative genomics, machine learning, and functional validation to overcome homology barriers, providing researchers with a robust framework for accurate NBS gene prediction and characterization.

NBS-encoding genes represent one of the largest and most variable gene families in plant genomes, with their protein products playing essential roles in effector-triggered immunity (ETI). During plant-pathogen co-evolution, these genes have developed extraordinary diversity through various mechanisms, including whole-genome duplication (WGD), tandem duplication, and positive selection [62]. This rapid evolution results in substantial sequence divergence, creating a fundamental challenge for traditional homology-based prediction methods that rely on significant sequence similarity.

Recent studies across diverse plant taxa have revealed striking variations in NBS gene content and architecture. For instance, genome-wide analyses have identified 12,820 NBS-domain-containing genes across 34 species from mosses to monocots and dicots, classified into 168 distinct domain architecture patterns [7]. This architectural diversity, while biologically informative, further complicates computational identification, as standard models trained on one lineage may perform poorly when applied to distantly related species.

This whitepaper provides an in-depth technical framework for overcoming these challenges, emphasizing integrative approaches that combine multiple evidence types to achieve comprehensive NBS gene annotation in novel plant genomes.

Domain Architecture Diversity in NBS Genes

Classical and Species-Specific Architectural Patterns

The domain architecture of NBS genes provides critical insights into their evolutionary history and potential functional specialization. While classical architectures like NBS, NBS-LRR, TIR-NBS, and TIR-NBS-LRR are widely distributed, numerous species-specific structural patterns have emerged through extensive comparative analyses.

Table 1: Major Domain Architecture Classes in Plant NBS Genes

Architecture Class	Domain Composition	Phylogenetic Distribution	Functional Role
CNL	CC-NBS-LRR	Universal in angiosperms	Pathogen detection
TNL	TIR-NBS-LRR	Primarily dicots	Pathogen detection
RNL	RPW8-NBS-LRR	Universal in angiosperms	Signaling helper
NL	NBS-LRR	Universal	Pathogen detection
CN	CC-NBS	Universal	Regulatory/Adaptor
TN	TIR-NBS	Primarily dicots	Regulatory/Adaptor
N	NBS	Universal	Regulatory/Adaptor

Recent research has uncovered remarkable architectural diversity, including unconventional patterns such as TIR-NBS-TIR-Cupin1-Cupin1, TIR-NBS-Prenyltransf, and Sugar_tr-NBS [7]. These atypical configurations highlight the functional innovation within this gene family and underscore the necessity of domain-based rather than sequence-based identification approaches.
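The class labels in Table 1 can be derived mechanically from an ordered list of domain hits. The sketch below is illustrative only: the short domain names and the `classify_architecture` helper are assumptions rather than the output format of any particular tool, and any architecture outside the seven listed classes is simply reported as "other" for manual review.

```python
# Hypothetical short names; a real pipeline would first map Pfam/InterPro IDs to these.
DOMAIN_CODES = {"TIR": "T", "CC": "C", "RPW8": "R", "NBS": "N", "LRR": "L"}
TABLE1_CLASSES = {"CNL", "TNL", "RNL", "NL", "CN", "TN", "N"}


def classify_architecture(domains):
    """Map domains ordered from N- to C-terminus onto a Table 1 class label."""
    if "NBS" not in domains:
        return "non-NBS"
    # Collapse consecutive repeats so that LRR-LRR-LRR counts as a single "L".
    collapsed = [d for i, d in enumerate(domains) if i == 0 or d != domains[i - 1]]
    # Unknown domains (e.g. Cupin1, Sugar_tr) become "?" and fall through to "other".
    code = "".join(DOMAIN_CODES.get(d, "?") for d in collapsed)
    return code if code in TABLE1_CLASSES else "other"


print(classify_architecture(["CC", "NBS", "LRR", "LRR"]))
print(classify_architecture(["TIR", "NBS", "TIR", "Cupin1", "Cupin1"]))
```

Keeping the atypical cases in a separate "other" bin, rather than forcing them into the nearest classical class, preserves exactly the unconventional architectures that domain-based surveys are meant to surface.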

In Fabaceae crops, studies have revealed a preferential co-occurrence of the NB-ARC domain with a specific LRR domain (IPR001611), with classification of identified proteins into seven distinct classes (N, L, CN, TN, NL, CNL, and TNL) showing species-specific clustering within the CN, TN, and CNL classes [11]. This species-specific patterning reflects diversification within plant families and must be accounted for in prediction pipelines.

Evolutionary Patterns Influencing Domain Architecture

The evolutionary history of NBS genes is characterized by repeated cycles of expansion and contraction, with significant variation observed between plant lineages:

In Ipomoea species, the distribution of NBS-encoding genes among chromosomes is non-random and uneven, with 83.13-90.37% of genes occurring in clusters [63].
Brassica species demonstrate how whole genome triplication events are followed by extensive gene loss, with subsequent species-specific gene amplification through tandem duplication [64].
Orchids, particularly Dendrobium species, exhibit significant degeneration of NBS-LRR genes, with type changing and NB-ARC domain degeneration as common evolutionary patterns [9].
Studies in Nicotiana benthamiana identified 156 NBS-LRR homologs representing only 0.25% of annotated genes, with irregular-type NBS-LRR genes lacking LRR domains constituting a substantial portion (66%) of the family [25].

These evolutionary dynamics directly impact domain architecture and must inform the development of prediction strategies for novel genomes.
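To make the clustering statistic concrete, the following sketch computes the fraction of NBS genes that fall into clusters from a list of gene coordinates. The cluster definition used here (neighbouring NBS genes on the same chromosome whose start positions lie within 200 kb) is an assumption chosen for illustration and is not necessarily the criterion applied in [63]; the function name and input layout are likewise hypothetical.

```python
from collections import defaultdict


def clustered_fraction(genes, max_gap=200_000):
    """genes: iterable of (gene_id, chromosome, start). Returns fraction of genes in clusters."""
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    clustered = set()
    total = 0
    for positions in by_chrom.values():
        positions.sort()
        total += len(positions)
        # Two neighbouring genes closer than max_gap are both counted as clustered.
        for (s1, g1), (s2, g2) in zip(positions, positions[1:]):
            if s2 - s1 <= max_gap:
                clustered.update((g1, g2))
    return len(clustered) / total if total else 0.0


example = [("a", "chr1", 1_000), ("b", "chr1", 50_000),
           ("c", "chr1", 5_000_000), ("d", "chr2", 10)]
print(clustered_fraction(example))
```

Because the result depends strongly on `max_gap`, percentages from different studies are only comparable when the same cluster definition is used.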

Integrated Strategies for Overcoming Low Homology

Advanced Bioinformatics Workflows

Table 2: Core Bioinformatics Tools for NBS Gene Identification

Tool Category	Specific Tools	Application	Key Parameters
Domain Search	HMMER, PfamScan, InterProScan	Identifying NBS domains	E-value < 1e-20 for HMMER; Trusted cutoff for Pfam
Motif Discovery	MEME, MAST	Conserved motif identification	Motif count: 10; Width: 6-50 amino acids
Orthology Analysis	OrthoFinder, MCScanX	Identifying homologous groups	E-value: 1e-5; Inflation parameter: 1.5
Synteny Analysis	MCScanX, DiagHunter	Conserved genomic context	E-value: 1e-10; Minimum aligned blocks: 5
Selection Pressure	PAML, KaKs_Calculator	Evolutionary analysis	Nei-Gojobori (NG) method for Ka/Ks calculation
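
As an illustration of the domain-search cutoff in Table 2, the sketch below filters a HMMER3 per-domain table (the whitespace-separated file written by `--domtblout`) for hits below E-value 1e-20. It assumes the table came from `hmmsearch`, so that the target column holds the protein and the query column holds the profile; the function name and returned tuple layout are illustrative choices.

```python
def parse_domtblout(lines, max_evalue=1e-20):
    """Yield (protein_id, hmm_name, evalue, env_start, env_end) for hits passing the cutoff.

    Column positions follow the HMMER3 --domtblout layout; '#' lines are comments.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        protein_id, hmm_name = fields[0], fields[3]
        full_evalue = float(fields[6])              # full-sequence E-value
        env_start, env_end = int(fields[19]), int(fields[20])  # envelope coordinates
        if full_evalue < max_evalue:
            yield protein_id, hmm_name, full_evalue, env_start, env_end


sample = [
    "# target name  accession  tlen  query name ...",
    "prot1 - 900 NB-ARC PF00931.25 252 3.1e-45 160.2 0.1 1 1 1.2e-48 3.1e-45 160.0 0.1 2 250 180 440 175 445 0.95 hypothetical protein",
    "prot2 - 300 NB-ARC PF00931.25 252 2.0e-05 20.1 0.0 1 1 1.0e-08 2.0e-05 19.8 0.0 40 120 10 95 8 100 0.80 -",
]
for hit in parse_domtblout(sample):
    print(hit)
```

A strict cutoff such as this favours precision; divergent or truncated NBS domains may only be recovered by relaxing it and inspecting borderline hits manually.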

Figure 1: Integrated workflow for NBS gene identification in novel genomes, combining computational prediction with experimental validation.

Leveraging Domain Architecture Patterns

The strategic exploitation of domain architecture patterns represents a powerful approach to overcome limitations imposed by low sequence homology:

Architecture-Based Hidden Markov Models (HMMs): Developing subfamily-specific HMM profiles for different domain architectures significantly enhances prediction sensitivity. For example, constructing separate HMMs for CNL, TNL, RNL, and truncated variants (CN, TN, N) allows detection of genes that would be missed by a single comprehensive model [7] [25]. This approach proved particularly valuable in Nicotiana benthamiana, where it enabled identification of 156 NBS-LRR homologs comprising 5 TNL-type, 25 CNL-type, 23 NL-type, 2 TN-type, 41 CN-type, and 60 N-type proteins [25].

Cross-Species Transcriptome Integration: The Gramene pipeline demonstrates how leveraging transcriptional evidence across related species can overcome limitations in species-specific data [65]. This approach uses:

DNA-to-DNA alignment for species-specific FLcDNAs and ESTs
Translated DNA-to-translated DNA alignment for cross-species FLcDNAs and ESTs
Protein-to-translated DNA alignment for protein sequences

This multi-tiered strategy maintains high sensitivity even when working with evolutionarily distant reference data.

Orthogroup-Centric Analysis: Identifying orthogroups across multiple species provides evolutionary context that facilitates NBS gene discovery. One analysis resolved 603 orthogroups, including core orthogroups shared by most species (OG0, OG1, OG2, etc.) and unique orthogroups highly specific to individual species (OG80, OG82, etc.), with tandem duplications observed [7]. Expression profiling indicated putative upregulation of specific orthogroups (OG2, OG6, and OG15) in different tissues under various biotic and abiotic stresses, pointing to their functional importance [7].
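
The core versus unique distinction can be expressed as a simple presence/absence rule over orthogroup membership. The sketch below is a hedged illustration: the input layout (orthogroup to species to gene list), the "intermediate" label, and the 80% presence threshold for "core" are all assumptions, not the definitions used in [7].

```python
def classify_orthogroups(orthogroups, n_species, core_fraction=0.8):
    """orthogroups: dict of orthogroup id -> dict of species -> list of gene ids."""
    labels = {}
    for og_id, members in orthogroups.items():
        present = [species for species, genes in members.items() if genes]
        if len(present) == 1:
            labels[og_id] = "unique"        # restricted to a single species
        elif len(present) >= core_fraction * n_species:
            labels[og_id] = "core"          # found in most species surveyed
        else:
            labels[og_id] = "intermediate"
    return labels


toy = {
    "OG0": {"A": ["a1", "a2"], "B": ["b1"], "C": ["c1"]},
    "OG80": {"A": [], "B": ["b9", "b10"], "C": []},
}
print(classify_orthogroups(toy, n_species=3))
```

Several genes from one species in the same orthogroup (as for species B in `OG80`) are candidates for lineage-specific duplication, which would then be checked against genomic coordinates.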

Experimental Protocols for Validation

Transcriptional Response Profiling

Validating the functional relevance of predicted NBS genes requires assessing their expression patterns under pathogen challenge:

Protocol: Differential Expression Analysis

Experimental Design: Collect tissue from resistant and susceptible cultivars under control conditions and at multiple timepoints post-pathogen inoculation [7] [63]
RNA Sequencing: Perform paired-end sequencing (minimum 30M reads per sample) with appropriate biological replicates
Bioinformatic Processing:
- Quality control (FastQC)
- Read alignment (HISAT2/STAR)
- Expression quantification (featureCounts)
- Differential expression (DESeq2 or edgeR)
Validation: Confirm expression patterns of selected candidate genes via qRT-PCR with reference genes
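
For orientation, the quantity at the heart of the differential expression step is a normalised fold change. The sketch below computes counts-per-million and a log2 fold change between two conditions; it is a deliberately simplified stand-in and not a substitute for DESeq2 or edgeR, which additionally model dispersion and correct for multiple testing. Function names and the pseudocount are illustrative assumptions.

```python
import math


def cpm(counts):
    """Scale raw counts (gene -> count) to counts per million; assumes a non-empty library."""
    total = sum(counts.values())
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def log2_fold_change(control, treated, pseudocount=1.0):
    """control/treated: lists of per-sample dicts (gene -> raw count) sharing the same genes."""
    def mean_cpm(samples):
        normalised = [cpm(sample) for sample in samples]
        return {g: sum(n[g] for n in normalised) / len(normalised) for g in normalised[0]}

    c, t = mean_cpm(control), mean_cpm(treated)
    # The pseudocount avoids division by zero for genes silent in one condition.
    return {g: math.log2((t[g] + pseudocount) / (c[g] + pseudocount)) for g in c}


lfc = log2_fold_change([{"nbs1": 100, "ref": 900}], [{"nbs1": 400, "ref": 600}])
print(round(lfc["nbs1"], 2))
```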

In sweet potato, this approach identified 11 differentially expressed genes (DEGs) in response to stem nematodes and 19 DEGs for Ceratocystis fimbriata pathogen challenge [63]. Similarly, in Dendrobium officinale, transcriptome analysis under salicylic acid treatment identified 1,677 DEGs, including six significantly up-regulated NBS-LRR genes [9].

Functional Validation via Gene Silencing

Protocol: Virus-Induced Gene Silencing (VIGS)

Vector Construction: Clone 200-300 bp gene-specific fragment into TRV-based VIGS vector
Plant Infiltration: Infiltrate 2-3 leaf stage seedlings with Agrobacterium carrying VIGS construct
Challenge Assay: Inoculate silenced plants with target pathogen 2-3 weeks post-VIGS
Phenotypic Assessment: Document disease symptoms and measure pathogen biomass
Molecular Confirmation: Verify gene silencing via qRT-PCR and assess downstream defense markers
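
Silencing is commonly quantified from qRT-PCR data with the 2^-ΔΔCt calculation. The sketch below assumes roughly 100% amplification efficiency for both primer pairs and a single reference gene; the function name and the example Ct values are hypothetical.

```python
def relative_expression(ct_target_silenced, ct_ref_silenced,
                        ct_target_control, ct_ref_control):
    """Relative target expression in silenced vs control plants via 2^-ddCt."""
    delta_silenced = ct_target_silenced - ct_ref_silenced   # dCt, VIGS plants
    delta_control = ct_target_control - ct_ref_control      # dCt, empty-vector plants
    return 2 ** -(delta_silenced - delta_control)


remaining = relative_expression(26, 18, 24, 18)
print(f"remaining expression: {remaining:.2f}, knockdown: {1 - remaining:.0%}")
```

In this toy case the target Ct rises by two cycles relative to the reference gene, which corresponds to a quarter of the control transcript level.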

This approach was used to silence GaNBS (OG2) in resistant cotton, supporting a putative role for the gene in controlling virus titer [7].

The Scientist's Toolkit: Essential Research Reagents

Table 3: Key Research Reagent Solutions for NBS Gene Studies

Reagent/Tool	Function	Application Example	Key Features
HMMER Suite	Domain identification	Finding NBS domains in novel genomes	Probabilistic models; E-value scoring
OrthoFinder	Orthogroup inference	Identifying conserved NBS genes across species	Species-aware algorithm; Scalable
MEME Suite	Motif discovery	Finding conserved motifs in NBS subfamilies	Expectation maximization; E-value threshold
DESeq2	Differential expression	Identifying pathogen-responsive NBS genes	Negative binomial distribution; Multiple testing correction
TRV VIGS Vectors	Functional validation	Testing NBS gene function in disease resistance	Efficient, transient silencing; No stable transformation required
PlantCARE Database	cis-element prediction	Identifying regulatory elements in NBS promoters	Comprehensive plant-specific database

Case Study: NBS Gene Identification in Sugarcane

A comprehensive study in sugarcane illustrates the effective application of these strategies. Researchers identified NBS-LRR genes at a genome-wide level across 23 plant species, with focused analysis on four monocotyledonous grass species: Saccharum spontaneum, Saccharum officinarum, Sorghum bicolor, and Miscanthus sinensis [62]. The methodology incorporated:

Comparative Genomics: Identification of NBS-LRR genes across multiple related species to establish evolutionary patterns
Transcriptome Integration: Analysis of expression data from multiple sugarcane diseases revealed that more differentially expressed NBS-LRR genes were derived from S. spontaneum than from S. officinarum in modern sugarcane cultivars
Allele-Specific Expression: Observation of allele-specific expression of seven NBS-LRR genes under leaf scald infection
Database Development: Construction of a plant NBS-LRR gene database to facilitate subsequent analysis

This integrated approach revealed that whole genome duplication, rather than genome size or total gene count, primarily determines NBS-LRR gene number in sugarcane. Furthermore, it demonstrated a progressive trend of positive selection on NBS-LRR genes and identified 125 NBS-LRR genes responding to multiple diseases [62].

Overcoming the challenge of low homology in NBS gene prediction requires a multifaceted approach that prioritizes domain architecture patterns over simple sequence similarity. By integrating advanced bioinformatics tools with comparative genomics and experimental validation, researchers can achieve comprehensive annotation of this critical gene family in newly sequenced plant genomes.

Future advancements will likely come from several directions:

Machine Learning Applications: Deep learning models trained on diverse domain architectures may improve prediction accuracy
Pan-Genome Analyses: Comprehensive comparisons across multiple individuals of a species will capture NBS gene diversity more completely
Single-Cell Transcriptomics: Resolution of NBS gene expression at cellular levels will provide unprecedented functional insights
Protein Structure Prediction: Advanced folding algorithms like AlphaFold may reveal functional relationships obscured by sequence divergence

As these methodologies mature, they will further empower researchers to decipher the complex evolutionary dynamics of plant immune genes and accelerate the development of disease-resistant crop varieties through molecular breeding programs.

Resolving TIR Domain Absence in Monocots and Its Functional Implications

The nucleotide-binding site leucine-rich repeat (NBS-LRR) gene family represents one of the largest and most critical classes of plant disease resistance (R) genes. These proteins are modular intracellular immune receptors, typically consisting of a variable N-terminal domain, a central nucleotide-binding site (NBS) domain, and C-terminal leucine-rich repeats (LRRs). A fundamental phylogenetic divide exists within this family between Toll/interleukin-1 receptor (TIR) domain-containing (TNL) and coiled-coil (CC) domain-containing (CNL) proteins. Strikingly, TNL genes are predominantly absent from monocot genomes, a distribution pattern with significant functional consequences for their immune signaling pathways. This whitepaper synthesizes current genomic, evolutionary, and molecular evidence to resolve the pattern of TIR domain absence in monocots and explores the implications for disease resistance mechanisms and crop improvement strategies.

Plant NBS-LRR proteins function as key sensors in the effector-triggered immunity (ETI) system, detecting pathogen effector molecules and initiating robust defense responses [66] [1]. Their domain architecture follows a characteristic tripartite structure:

N-terminal domain: Typically a TIR or CC domain, involved in signaling and protein-protein interactions
Central NBS domain: Contains conserved motifs (P-loop, kinase-2, RNBS-A through RNBS-D) essential for nucleotide binding and acts as a molecular switch
C-terminal LRR domain: Undergoes diversifying selection and mediates protein-protein interactions, often determining recognition specificity [67] [1]

The N-terminal domain fundamentally classifies NBS-LRR proteins into two major subfamilies: TNLs (TIR-NBS-LRR) and CNLs (CC-NBS-LRR). This classification is not merely structural but reflects deep evolutionary divergence with profound functional consequences, including distinct signaling pathways and downstream partners [1]. The puzzling absence of TNLs in monocots, despite their presence in dicots, gymnosperms, and even bryophytes, represents a significant evolutionary anomaly with important functional implications for plant immunity across major crop species.

Evolutionary History and Distribution of TIR Domains in Plants

Genomic Distribution Across Plant Lineages

Comparative genomic analyses reveal a complex evolutionary history of TIR-NBS-LRR genes across the plant kingdom. Evidence indicates that TIR domains and TNL genes were present in early land plants but have been selectively lost in specific lineages.

Table 1: Distribution of NBS-LRR Genes in Selected Plant Genomes

Plant Species	Common Name	Total NLRs	TNLs	CNLs	XNLs*	References
Arabidopsis thaliana	Thale cress	151	94	55	0	[66]
Vitis vinifera	Wine grape	459	97	215	147	[66]
Medicago truncatula	Barrel medic	270	118	152	0	[66]
Oryza sativa	Rice	458	0	274	182	[66]
Zea mays	Maize	95	0	71	23	[66]
Brachypodium distachyon	Brachypodium	212	0	145	60	[66]
Physcomitrella patens	Moss	25	8	9	8	[66]
Selaginella moellendorffii	Spike moss	2	0	NA	NA	[66]

*XNLs: NLRs with N-terminal domains other than TIR or CC

The near-total absence of TNL genes in monocots is particularly striking when compared to their abundance in dicot species. Research covering five monocot orders (Poales, Zingiberales, Arecales, Asparagales, and Alismatales) has consistently failed to identify canonical TNL sequences [68]. This distribution pattern suggests that TIR-NBS-LRR sequences, though present in early land plants, have been significantly reduced or lost in monocots and magnoliids [68].
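
The contrast is easier to see as proportions. The short sketch below recomputes the TNL share of the NLR repertoire from the totals and TNL counts given in Table 1; only the helper name is introduced here, and Selaginella moellendorffii is omitted because its total of two genes makes a percentage uninformative.

```python
# (total NLRs, TNLs) copied from Table 1.
NLR_COUNTS = {
    "Arabidopsis thaliana": (151, 94),
    "Vitis vinifera": (459, 97),
    "Medicago truncatula": (270, 118),
    "Oryza sativa": (458, 0),
    "Zea mays": (95, 0),
    "Brachypodium distachyon": (212, 0),
    "Physcomitrella patens": (25, 8),
}


def tnl_percentage(total, tnl):
    """Percentage of the NLR repertoire made up of TNLs, rounded to one decimal."""
    return round(100 * tnl / total, 1)


for species, (total, tnl) in NLR_COUNTS.items():
    print(f"{species}: {tnl_percentage(total, tnl)}% TNL")
```

On these figures TNLs account for roughly a fifth to nearly two thirds of the repertoire in the three dicots and about a third in the moss, against zero in all three grasses.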

Evolutionary Timeline and Hypotheses

Phylogenetic evidence indicates that TNL genes were present in early land plant ancestors but lost in the monocot lineage. Several hypotheses may explain this evolutionary loss:

Selective disadvantage: TNL-specific signaling components or pathways may have conferred fitness costs in environments where monocots diversified
Genomic reorganization: Large-scale genomic rearrangements in the monocot lineage may have facilitated the loss of TNL clusters
Functional replacement: CNLs and other immune receptors may have expanded to compensate for TNL loss
Energy conservation: TNLs may have imposed metabolic costs that selected for their elimination in specific lineages

The presence of TNLs in basal angiosperms like Amborella trichopoda and Nuphar advena, but their absence in monocots, suggests that the loss occurred after the divergence of monocots from other angiosperms [68]. This evolutionary history has fundamentally shaped the immune signaling apparatus of major cereal crops, including rice, maize, wheat, and sorghum.

Functional Implications of TIR Domain Absence in Monocots

Alternative Signaling Pathways

The absence of TNLs in monocots has profound implications for their immune signaling architecture. In dicots, TNLs typically require the function of EDS1 (ENHANCED DISEASE SUSCEPTIBILITY1) and PAD4 (PHYTOALEXIN DEFICIENT4) for signaling, whereas CNLs often require NDR1 (NON-RACE-SPECIFIC DISEASE RESISTANCE1) [69]. Without TNLs, NLR-mediated immunity in monocots is presumed to rely on signaling networks centered on CNLs.

Recent research has revealed that TIR domains function as NAD+ hydrolases, cleaving NAD+ to produce various nucleotides including cyclic ADP-ribose (cADPR) variants [70]. These nucleotide products serve as secondary messengers that activate downstream immune signaling. Specifically, 2′cADPR generated by TIR domains is converted into pRib-AMP/ADP, which binds to EDS1-PAD4 heterodimers, facilitating the formation of the EDS1-PAD4-ADR1 (EPA) heterotrimeric complex and triggering immune responses [70]. Without TNL receptors to feed this module, monocots must rely on other mechanisms for immune activation.

Hormonal Interactions and Immune Cross-Talk

The absence of TNLs in monocots also affects hormonal cross-talk in immune responses. In dicots, abscisic acid (ABA) has been shown to negatively regulate R gene-mediated resistance, with ABA deficiency promoting nuclear accumulation of R proteins like SNC1 and RPS4, which is essential for their function [69]. This intersection between ABA signaling and R protein localization represents a significant point of divergence between monocots and dicots, as the specific TNL-related components of this regulation would necessarily differ.

Structural and Functional Compensation

Monocots have likely evolved compensatory mechanisms to offset the loss of TNLs:

Expansion of CNL subfamilies: Comparative genomics shows significant expansion of CNL and XNL (NLRs with other N-terminal domains) genes in monocots
Diversification of non-TIR signaling pathways: Monocots may have enhanced or diversified CNL-mediated signaling pathways
Alternative domain architectures: Monocots possess NLRs with N-terminal domains other than TIR or CC (classified as XNLs), which may perform functions analogous to TNLs in dicots

Table 2: Functional Specialization of NBS-LRR Subfamilies in Plants

Feature	TNLs (TIR-NBS-LRR)	CNLs (CC-NBS-LRR)
Distribution	Dicots, gymnosperms, bryophytes	Monocots, dicots, bryophytes
Signaling Components	EDS1, PAD4 required	NDR1 often required
Biochemical Function	NAD+ hydrolase activity producing signaling nucleotides	Diverse functions; some with kinase activity
Downstream Pathways	EPA complex formation	Activation of MAPK cascades
Hormonal Regulation	Antagonized by ABA	Variable regulation by ABA
Temperature Sensitivity	Often temperature-sensitive	Variable temperature sensitivity

Experimental Approaches for Studying TIR Domain Evolution and Function

Degenerate PCR for NBS Gene Discovery

Purpose: To identify and characterize NBS-encoding genes across diverse plant species, particularly non-model organisms without complete genome sequences.

Methodology:

Primer Design: Degenerate primers targeting conserved NBS domain motifs (P-loop, kinase-2, GLPL)
DNA Extraction: High-quality genomic DNA from target plant species
PCR Amplification: Using degenerate primers under optimized cycling conditions
Cloning and Sequencing: PCR products cloned and sequenced to identify unique NBS sequences
Sequence Analysis: Classification into TIR or non-TIR based on conserved motifs, especially the final residue of the kinase-2 domain (aspartic acid in TIR, tryptophan in non-TIR) [68]

Key Considerations:

Multiple primer sets (TIR-specific, non-TIR-specific, general) enhance coverage
Expected fragment size: 500-600 bp covering portion of NBS domain
Phylogenetic analysis to determine evolutionary relationships
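
The kinase-2 rule in the Sequence Analysis step lends itself to a quick first-pass screen of translated amplicons. The sketch below is a minimal illustration under a strong assumption: the regular expression is a simplified, hypothetical approximation of the kinase-2 motif (a short hydrophobic stretch, two aspartates, one hydrophobic residue, then the diagnostic D or W) and would need tuning against real alignments before use.

```python
import re

# Simplified, hypothetical kinase-2 pattern; the captured final residue is diagnostic.
KINASE2 = re.compile(r"[LIVMFA]{3,4}DD[LIVM]([DW])")


def classify_by_kinase2(protein_seq):
    """Return 'TIR', 'non-TIR', or 'unclassified' from the kinase-2 final residue."""
    match = KINASE2.search(protein_seq.upper())
    if match is None:
        return "unclassified"
    # Aspartic acid (D) indicates TIR-type; tryptophan (W) indicates non-TIR-type [68].
    return "TIR" if match.group(1) == "D" else "non-TIR"


print(classify_by_kinase2("GMGGIGKTTLAKRLLVLDDVWEKE"))
print(classify_by_kinase2("GMGGVGKTTIAKILIVLDDVDKLEQ"))
```

Sequences that return "unclassified" are not necessarily non-NBS; they may simply carry a divergent kinase-2 motif, which is why the phylogenetic analysis listed above remains the more reliable basis for classification.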

Genome-Wide Identification and Phylogenetic Analysis

Purpose: Comprehensive cataloging of NLR genes in sequenced genomes to understand evolutionary patterns.

Methodology:

Sequence Retrieval: Collect annotated NLR genes from genomic databases
Domain Analysis: Identify NB-ARC domain using HMMER/Pfam scans (PF00931)
Classification: Categorize into TNL, CNL, or XNL based on N-terminal domains
Multiple Sequence Alignment: Using MAFFT or ClustalOmega
Phylogenetic Reconstruction: Maximum likelihood or Bayesian methods to infer evolutionary relationships
Orthogroup Analysis: Identify conserved and lineage-specific NLR clusters across species [7]

Applications: This approach revealed the absence of TNLs in monocots and the expansion of specific CNL clades in cereal crops.

Functional Validation Through Virus-Induced Gene Silencing (VIGS)

Purpose: To determine the functional role of specific NBS genes in plant immunity.

Methodology:

Gene Selection: Target candidate NBS genes identified through genomic analyses
Vector Construction: Insert gene-specific fragment into VIGS vector (e.g., TRV-based vectors)
Plant Inoculation: Agroinfiltration of VIGS construct into seedlings
Phenotypic Assessment: Challenge with pathogens or chemicals to evaluate resistance/susceptibility
Molecular Verification: qRT-PCR to confirm gene silencing, biomarker analysis [7]

Case Example: Silencing of GaNBS (OG2) in resistant cotton demonstrated its role in defense against cotton leaf curl virus [7].

Research Reagent Solutions for NBS-LRR Studies

Table 3: Essential Research Reagents for Investigating Plant NBS-LRR Genes

Reagent/Category	Specific Examples	Function/Application
PCR & Cloning	Degenerate primers for NBS domains	Amplification of NBS sequences from diverse species
	TIR-specific primers (targeting RNBS-A-TIR)	Selective amplification of TIR-type NBS sequences
	Non-TIR-specific primers (targeting RNBS-A-nonTIR)	Selective amplification of non-TIR-type NBS sequences
Expression Vectors	pTRV1/pTRV2 (VIGS vectors)	Functional validation through gene silencing
	Gateway-compatible binary vectors	Protein expression and localization studies
Antibodies & Tags	Anti-GFP/HA/FLAG antibodies	Protein detection and localization
	Nuclear localization signal tags	Studying subcellular localization of NBS-LRR proteins
Chemical Reagents	Abscisic Acid (ABA)	Hormonal signaling studies
	Organophosphate pesticides (e.g., fenitrothion)	Inducing chemical sensitivity responses
	NAD+ and analogs	TIR enzymatic activity assays
Pathogen Strains	Pseudomonas syringae strains	Bacterial pathogen challenge assays
	Fusarium graminearum	Fungal pathogen assays

Visualization of NBS-LRR Evolution and Signaling

Evolutionary History of Plant NLR Genes

TIR Domain-Mediated Signaling in Dicots

The absence of TIR domains in monocots represents a significant evolutionary divergence with profound functional implications for plant immunity. Genomic evidence confirms that TNLs, present in early land plants and abundant in dicots, were lost in the monocot lineage, potentially due to selective pressures or genomic reorganization events. This loss has driven the expansion and diversification of CNL genes and alternative signaling pathways in monocots.

Understanding this evolutionary history provides crucial insights for crop improvement strategies. Future research should focus on:

Elucidating compensatory mechanisms in monocot immune signaling networks
Engineering novel resistance specificities by transferring functional TNL genes across phylogenetic boundaries
Exploiting conserved signaling modules for broad-spectrum disease resistance
Investigating the metabolic costs of different NLR types and their impact on plant fitness

The functional conservation of NLR-mediated immunity across plant taxa, despite divergent domain architectures, offers promising avenues for enhancing disease resistance in economically important monocot crops through comparative genomics and interdisciplinary approaches.

The nucleotide-binding site leucine-rich repeat (NBS-LRR) gene family represents the largest and most versatile class of plant resistance (R) proteins, forming a critical component of the plant immune system through effector-triggered immunity (ETI) [16]. These intracellular receptors recognize pathogen-secreted effectors either directly or indirectly, initiating robust defense signaling cascades that frequently culminate in hypersensitive response (HR) and programmed cell death to restrict pathogen spread [16] [36]. The structural architecture of NBS-LRR proteins features a conserved nucleotide-binding site (NBS) domain that binds and hydrolyzes ATP for immune signaling activation, coupled with a C-terminal leucine-rich repeat (LRR) domain responsible for pathogen recognition [16] [36]. Based on N-terminal domain variations, NBS-LRR proteins are classified into major subfamilies: TIR-NBS-LRR (TNL) with Toll/interleukin-1 receptor domains, CC-NBS-LRR (CNL) with coiled-coil domains, and RPW8-NBS-LRR (RNL) with RESISTANCE TO POWDERY MILDEW 8 (RPW8) domains [16] [25].

Recent genomic studies have revealed striking variation in NBS-LRR family composition across plant species. For instance, comprehensive genome-wide analyses identified 196 NBS-LRR genes in the medicinal plant Salvia miltiorrhiza, with only 62 possessing complete N-terminal and LRR domains [16]. Research in Nicotiana benthamiana revealed 156 NBS-LRR homologs distributed across different subfamilies [25], while studies in three Nicotiana genomes identified 1,226 NBS genes total, with approximately 45.5% containing only the NBS domain [50]. This extensive diversity in domain architecture presents both challenges and opportunities for optimizing functional studies of these crucial immune receptors.

Table 1: NBS-LRR Family Distribution Across Plant Species

Plant Species	Total NBS-LRR Genes	CNL Subfamily	TNL Subfamily	RNL Subfamily	Atypical Members
Salvia miltiorrhiza	196	61	2	1	132
Nicotiana benthamiana	156	25	5	4	122
Nicotiana tabacum	603	Not specified	Not specified	Not specified	Not specified
Arabidopsis thaliana	207	Not specified	Not specified	Not specified	Not specified
Oryza sativa (rice)	505	Not specified	Not specified	Not specified	Not specified

Strategic Approaches for NBS-LRR Gene Identification and Prioritization

Expression-Based Functional Screening

Traditional NLR characterization assumed these immune receptors required tight transcriptional regulation to prevent autoimmunity. However, a recent study reports that functional NLRs consistently exhibit high steady-state expression levels in uninfected plants across both monocot and dicot species [71]. This expression signature provides a powerful filter for prioritizing candidates from large gene families. As a proof of concept, the authors exploited this signature by generating a wheat transgenic array of 995 NLRs from diverse grass species, identifying 31 new resistance genes (19 against stem rust, 12 against leaf rust) through large-scale phenotyping [71].

The barley NLR Mla7 exemplifies the critical relationship between expression threshold and function. Transgenic studies revealed that single-copy insertions of Mla7 failed to confer resistance, while higher-order copies (2-4 copies) were required for full resistance to Blumeria hordei and stripe rust, indicating that sufficient expression levels are necessary for functionality [71]. This principle enables researchers to prioritize NBS-LRR candidates based on expression data, significantly accelerating the discovery of functional immune receptors.
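
An expression filter of this kind can be sketched in a few lines. The example below keeps NLR candidates whose steady-state expression falls within the top fraction of all expressed genes in uninfected tissue. The input format (gene to TPM), the function name, and the default fraction are assumptions for illustration; the threshold applied in [71] is not reproduced here.

```python
def prioritise_by_expression(tpm_by_gene, nlr_ids, top_fraction=0.25):
    """Return NLR ids with TPM in the top `top_fraction` of expressed genes, highest first."""
    expressed = sorted((v for v in tpm_by_gene.values() if v > 0), reverse=True)
    if not expressed:
        return []
    # TPM of the last gene that still belongs to the requested top fraction.
    cutoff_index = max(int(len(expressed) * top_fraction) - 1, 0)
    threshold = expressed[cutoff_index]
    kept = [g for g in nlr_ids if tpm_by_gene.get(g, 0.0) >= threshold]
    return sorted(kept, key=lambda g: tpm_by_gene[g], reverse=True)


tpm = {"g1": 100, "g2": 50, "g3": 10, "g4": 5, "g5": 0, "g6": 2, "nlrA": 80, "nlrB": 1}
print(prioritise_by_expression(tpm, ["nlrA", "nlrB"], top_fraction=0.5))
```

Ranking against the whole transcriptome rather than against other NLRs keeps the filter meaningful in species where the entire family is weakly expressed.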

Genomic Identification and Classification Pipelines

Robust bioinformatic pipelines form the foundation of NBS-LRR characterization. The standard workflow begins with Hidden Markov Model (HMM) searches using the NB-ARC domain profile (PF00931) from the Pfam database against target genomes or transcriptomes [25] [50]. Following initial identification, domain architecture must be systematically characterized using tools like InterProScan, SMART, and the NCBI Conserved Domain Database to identify TIR, CC, RPW8, and LRR domains [25] [50]. Phylogenetic analysis then classifies candidates into subfamilies and informs functional hypotheses based on clustering with characterized NLRs [16] [25].

Table 2: Bioinformatics Tools for NBS-LRR Identification and Analysis

Tool Category	Specific Tools	Function	Key Parameters
Domain Identification	HMMER v3.1b2, InterProScan, SMART, NCBI CDD	Identify NBS, TIR, CC, LRR domains	E-value < 1 × 10⁻²⁰ for HMMER
Motif Analysis	MEME Suite	Discover conserved protein motifs	Motif count: 10, Width: 6-50 aa
Phylogenetic Analysis	MUSCLE, MEGA11	Construct evolutionary relationships	Bootstrap: 1000 replicates
Selection Pressure	KaKs_Calculator 2.0	Calculate Ka/Ks ratios	Model: Nei-Gojobori
Expression Analysis	Cufflinks, Cuffdiff	Quantify expression and identify DEGs	FPKM normalization

Diagram 1: NBS-LRR Gene Identification and Prioritization Workflow. This flowchart outlines the bioinformatics pipeline for identifying and prioritizing NBS-LRR genes for functional studies, emphasizing the key filtering steps from initial discovery to experimental validation.

Gene Silencing Methodologies for NBS-LRR Functional Analysis

Virus-Induced Gene Silencing (VIGS) Protocols

Virus-induced gene silencing (VIGS) has emerged as a powerful reverse genetics tool for rapid functional analysis of NBS-LRR genes in plants. This method is particularly valuable for species with challenging transformation systems or for high-throughput functional screening. The tobacco rattle virus (TRV)-based VIGS system represents the most widely adopted platform, especially in Nicotiana species, which serve as model plants for plant-pathogen interactions [25].

A standardized VIGS protocol begins with the identification of a unique 150-300 bp gene-specific fragment from the target NBS-LRR sequence, which is then cloned into the TRV2 vector; the companion TRV1 vector supplies the viral replication and movement functions and carries no insert. For NBS-LRR genes, special attention must be paid to selecting fragments with minimal sequence similarity to other NLR family members to ensure target specificity. Agrobacterium tumefaciens strains GV3101 or LBA4404 harboring the TRV vectors are then cultured overnight in Luria-Bertani medium with appropriate antibiotics, harvested, and resuspended in infiltration buffer (10 mM MES, 10 mM MgCl₂, 200 μM acetosyringone, pH 5.6) to an OD₆₀₀ of 1.0-2.0. Equal volumes of TRV1 and TRV2 cultures are mixed and infiltrated into the leaves of 2-4 week-old plants using a needleless syringe. Silencing efficiency is typically assessed 2-4 weeks post-infiltration through quantitative RT-PCR, with phenotypic analyses conducted following pathogen inoculation [25].
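Fragment selection for specificity can be approximated computationally. The sketch below slides a window along the target and picks the one sharing the fewest k-mers with other family members. It is a simplified heuristic under stated assumptions: the function names are hypothetical, only the forward strand is compared, and the default k of 21 is an assumption chosen to approximate siRNA length.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def pick_vigs_fragment(target, off_targets, length=200, k=21):
    """Return (start, shared_kmer_count) for the first window sharing the fewest k-mers
    with any off-target sequence, or None if the target is shorter than length."""
    off_kmers = set()
    for seq in off_targets:
        off_kmers |= kmers(seq.upper(), k)
    target = target.upper()
    best = None
    for start in range(len(target) - length + 1):
        window = target[start:start + length]
        shared = len(kmers(window, k) & off_kmers)
        if best is None or shared < best[1]:
            best = (start, shared)
    return best

# Toy sequences with short window and k so the behaviour is easy to follow
toy_best = pick_vigs_fragment("AAAAAAAAAA" + "CGTCGTCGTC", ["AAAAAAAAAAAA"], length=10, k=4)
print(toy_best)
```

For real designs, `length` would be set within the 150-300 bp range given above and candidate fragments would still be checked by a BLAST search against the full transcriptome.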

RNA Interference and MicroRNA Regulation

Beyond VIGS, plants have evolved endogenous regulatory networks that target NBS-LRR genes, providing both mechanistic insights and methodological opportunities. The microRNA miR482 represents a key post-transcriptional regulator of NBS-LRR genes in numerous plant species. In apple, miR482 expression is dynamically regulated in response to Alternaria alternata infection, leading to the cleavage of NBS-LRR transcripts and production of phased secondary siRNAs (phasiRNAs) that amplify the silencing effect [72].

This natural regulatory mechanism can be exploited experimentally through artificial microRNA (amiRNA) technology. The design process involves substituting the mature miRNA sequence in a native miRNA precursor (typically miR319a or miR164b) with a 21-nt sequence complementary to the target NBS-LRR gene while maintaining the precursor's secondary structure. The modified precursor is then cloned under the control of a constitutive (35S) or inducible promoter and transformed into plants via Agrobacterium-mediated transformation. This approach offers superior specificity compared to traditional hairpin RNAi constructs, particularly important for distinguishing among closely related NBS-LRR family members [72].
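The first step of amiRNA design, listing 21-nt target sites that are unique to one family member, can be illustrated as follows. This is only a sketch: the function names are hypothetical, and established design rules (for example, nucleotide preferences at specific positions and tolerated mismatches) are deliberately not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def amirna_candidates(target_cds, paralogs, length=21):
    """Return (position, guide_rna) pairs for target sites absent from every paralog.
    The guide is the reverse complement of the site, written as RNA."""
    paralog_seqs = [p.upper() for p in paralogs]
    target = target_cds.upper()
    candidates = []
    for i in range(len(target) - length + 1):
        site = target[i:i + length]
        if any(site in p for p in paralog_seqs):
            continue  # an identical site exists in a paralog, so this guide is not specific
        candidates.append((i, reverse_complement(site).replace("T", "U")))
    return candidates

# Toy example with a short site length so the output is easy to inspect
toy_guides = amirna_candidates("AAAAACCCCC", ["AAAAAC"], length=5)
print(toy_guides[0])
```

Only exact matches are excluded here; a stricter screen would also reject sites that differ from a paralog by one or two nucleotides.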

Protein Interaction Assays for Elucidating NBS-LRR Signaling Mechanisms

Yeast Two-Hybrid Systems for Direct Interactions

Yeast two-hybrid (Y2H) analysis provides a powerful platform for identifying direct protein-protein interactions involving NBS-LRR proteins and their pathogen effectors or host partners. The case of the wheat Ym1 protein exemplifies a well-executed Y2H strategy. Ym1, a CC-NBS-LRR protein that confers resistance to wheat yellow mosaic virus (WYMV), was demonstrated to specifically interact with the WYMV coat protein (CP) through Y2H analysis [52].

A detailed Y2H protocol for NBS-LRR proteins involves amplifying coding sequences without stop codons and cloning them into both bait (DNA-binding domain, e.g., pGBKT7) and prey (activation domain, e.g., pGADT7) vectors. For full-length NBS-LRR proteins that may autoactivate or exhibit toxicity in yeast, consider using domain-specific constructs (e.g., CC, NBS, or LRR domains individually). Co-transform bait and prey plasmids into yeast strains (e.g., Y2HGold or AH109) using the lithium acetate/polyethylene glycol method and plate on appropriate dropout media (-Leu/-Trp) to select for transformants. Protein interaction is assessed by growth on stringent dropout media (-Leu/-Trp/-His/-Ade) supplemented with X-α-Gal for colorimetric detection. Critical controls include testing each construct against empty vector counterparts and verifying expression through western blotting [52].

In Planta Protein Interaction Assays

While Y2H identifies direct interactions, in planta assays provide critical validation in a more native biological context. Bimolecular fluorescence complementation (BiFC) represents a particularly valuable technique for visualizing transient NBS-LRR interactions in living plant cells. The Ym1-WYMV CP interaction demonstrated through Y2H was further confirmed using BiFC, which also revealed the nucleocytoplasmic redistribution of Ym1 upon CP interaction—a key process in its activation mechanism [52].

For BiFC assays, full-length or domain-specific NBS-LRR coding sequences are fused to either the N-terminal (YN) or C-terminal (YC) fragments of fluorescent proteins (typically YFP or its variants) in plant expression vectors. The corresponding interaction partner is fused to the complementary fragment. These constructs are then co-expressed in plant systems (often Nicotiana benthamiana leaves via Agrobacterium infiltration) along with a nuclear marker for localization reference. Fluorescence complementation is typically examined 2-3 days post-infiltration using confocal microscopy. For NBS-LRR proteins, special consideration should be given to co-expressing potential helper NLRs (e.g., NRC proteins in Solanaceae) that may be required for proper function and localization [52] [71].

Diagram 2: NBS-LRR-Mediated Immune Signaling Pathway. This diagram illustrates the central role of NBS-LRR proteins in plant immunity, showing how sensor NLRs recognize pathogen effectors and require helper NLRs to activate hypersensitive response and disease resistance.

Advanced Methodologies for Comprehensive NBS-LRR Characterization

High-Throughput Transformation and Phenotyping Platforms

The scale of NBS-LRR gene families demands advanced high-throughput methodologies for comprehensive functional characterization. Recent technological innovations have enabled the creation of transgenic arrays numbering in the hundreds to thousands of NLR genes. A groundbreaking study established a pipeline combining expression-based candidate prioritization with high-efficiency wheat transformation to generate a transgenic array of 995 NLRs from diverse grass species [71].

The core protocol involves Gateway-compatible entry clones of prioritized NBS-LRR genes, which are subsequently recombined into binary expression vectors containing strong constitutive promoters (e.g., maize Ubiquitin promoter for monocots). These constructs are transformed into susceptible plant lines using high-efficiency transformation systems; in wheat, this uses Agrobacterium strain AGL1 and immature embryos as explants. Transgenic lines are screened using both molecular markers (PCR, Southern blotting for copy number determination) and large-scale pathogen phenotyping. For rust pathogens like Puccinia graminis f. sp. tritici (stem rust) and Puccinia triticina (leaf rust), this involves inoculating T1 transgenic lines with standardized pathogen spores and evaluating disease symptoms 10-14 days post-inoculation. This pipeline successfully identified 31 new resistance NLRs (19 against stem rust, 12 against leaf rust), demonstrating the power of scale in functional NLR characterization [71].

Structural and Localization Studies

Understanding the molecular mechanisms of NBS-LRR function requires detailed structural and subcellular localization analyses. Prediction of subcellular localization using tools like CELLO v.2.5 and Plant-mPLoc represents an important first step, with most NBS-LRR proteins localized to the cytoplasm (121 of 156 in N. benthamiana), while others target the plasma membrane (33) or nucleus (12) [25].

For empirical localization studies, confocal microscopy of fluorescent protein fusions provides high-resolution data. The wheat Ym1 protein demonstrated a nucleocytoplasmic distribution pattern that shifted upon recognition of its cognate viral coat protein, illustrating the dynamic nature of NLR localization during immune activation [52]. For structural insights, recent advances in cryo-electron microscopy have enabled determination of plant immune receptor structures. One cited example is RXEG1 (PDB ID: 7DRC), which is a cell-surface LRR receptor-like protein rather than an NLR; such structures provide atomic-level information on domain organization and potential activation mechanisms [36].

Table 3: Research Reagent Solutions for NBS-LRR Functional Studies

Reagent Category	Specific Examples	Application	Technical Considerations
Expression Vectors	pUBI:GFP, pCAMBIA1302, Gateway-compatible vectors	Protein localization, overexpression	Select promoters based on expression level requirements
Silencing Vectors	TRV1/TRV2 VIGS vectors, pHELLSGATE RNAi vectors	Gene silencing, functional analysis	Design specific fragments to avoid off-target effects
Agrobacterium Strains	GV3101, LBA4404, AGL1	Plant transformation, transient expression	Use appropriate strains for host species
Yeast Two-Hybrid Systems	pGBKT7/pGADT7, DHFR-based systems	Protein-protein interaction studies	Test for autoactivation with NLR constructs
Confocal Markers	RFP/mCherry nuclear markers, organelle markers	Subcellular localization	Include co-localization markers as references
Pathogen Isolates	Puccinia graminis, WYMV, Alternaria alternata	Phenotypic validation	Maintain virulence characteristics through proper culture

Integrated Workflows and Future Perspectives

The future of NBS-LRR functional studies lies in integrated approaches that combine genomic, computational, and experimental methodologies. Machine learning and deep learning frameworks are increasingly being applied to predict resistance protein functions and identify novel R genes, helping address challenges of data quality and class imbalance in large NBS-LRR datasets [36]. Additionally, the discovery of natural regulatory mechanisms such as miR482-mediated NBS-LRR regulation provides both insights into immune homeostasis and tools for experimental manipulation [72].

As these methodologies continue to evolve, the field moves toward a more comprehensive understanding of how NBS-LRR domain architecture dictates function in plant immunity. The integration of high-throughput functional data with structural information and computational predictions will enable researchers to not only characterize individual NBS-LRR genes but also understand the emergent properties of the entire NLR network within plant immune systems. This systems-level understanding will be crucial for developing novel disease resistance strategies in crop species, ultimately contributing to global food security through improved plant health and reduced yield losses.

Evolution and Efficacy: Validating NBS Architectures Through Cross-Species Comparison

The nucleotide-binding site-leucine-rich repeat (NBS-LRR) gene family constitutes the largest class of plant disease resistance (R) genes, encoding intracellular immune receptors that directly or indirectly recognize pathogen effectors to trigger robust defense responses [73]. More than 80% of the over 140 cloned plant R genes belong to this family [73] [74]. Understanding the evolutionary history of these genes—how they have diversified, expanded, and contracted across the angiosperm lineage—is fundamental to deciphering the molecular arms race between plants and their pathogens.

This technical guide examines the phylogenetic footprints of NBS genes within the context of domain architecture patterns, tracing their lineage from ancestral origins to the extensive diversification observed in modern angiosperms. We synthesize recent phylogenomic advances to elucidate the dynamic evolutionary patterns that have shaped the NBS gene repertoire, providing researchers with both theoretical frameworks and practical methodologies for investigating these critical genetic components of plant immunity.

Deep Evolutionary Origins of Angiosperm NBS Genes

The Three Ancient NBS-LRR Classes

Comprehensive phylogenetic analyses of NBS-LRR genes across 22 angiosperm genomes have revealed that these genes are derived from three anciently separated classes: RPW8-NBS-LRR (RNL), TIR-NBS-LRR (TNL), and CC-NBS-LRR (CNL) [73]. This tripartite classification system resolves previous controversies regarding the relationship between these subfamilies and provides a robust framework for understanding NBS gene evolution.

RNL Genes: Characterized by an N-terminal RPW8 domain, this class evolves conservatively and functions primarily in defense signal transduction rather than direct pathogen recognition [73] [74]. RNL genes are further divided into two ancient subclades: ADR1 and NRG1, which act as "helper NBS-LRR" (hNLR) proteins that transduce immune signals downstream of "sensor NBS-LRR" (sNLR) activation [74].
TNL Genes: Defined by an N-terminal Toll/Interleukin-1 receptor (TIR) domain, this class serves as pathogen sensors that directly or indirectly recognize pathogen effectors [73] [7].
CNL Genes: Featuring an N-terminal coiled-coil (CC) domain, this class also functions primarily in pathogen recognition and represents the most expansive NBS lineage in many angiosperm genomes [73] [7].
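The three-class scheme above maps naturally onto a small lookup. The sketch below assigns a class label from a list of detected domains; the domain labels ("NB-ARC", "TIR", "CC", "RPW8", "LRR"), the truncated labels such as "TN" for genes lacking an LRR, and the precedence given to RPW8 over CC are assumptions for illustration, not a published standard.

```python
def classify_nlr(domains):
    """Map detected domain names to an NBS class label (e.g. TNL, CNL, RNL, TN, N)."""
    found = {d.upper() for d in domains}
    if "NB-ARC" not in found:
        return "not NBS"
    has_lrr = "LRR" in found
    # Check N-terminal domains in a fixed order; RPW8 first is an assumed precedence
    for n_term, label in (("RPW8", "RNL"), ("TIR", "TNL"), ("CC", "CNL")):
        if n_term in found:
            return label if has_lrr else label[:-1]  # drop the trailing L when no LRR is found
    return "NL" if has_lrr else "N"

print(classify_nlr(["TIR", "NB-ARC", "LRR"]))
print(classify_nlr(["CC", "NB-ARC"]))
```

In practice the input would come from InterProScan or CDD annotations, whose domain names would first need to be normalized to the labels used here.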

Table 1: Fundamental NBS-LRR Gene Classes in Angiosperms

Class	N-Terminal Domain	Primary Function	Evolutionary Pattern	Key Features
RNL	RPW8	Defense signal transduction (helper NLR)	Conservative evolution, low copy numbers	Divided into ADR1 and NRG1 subclades; Ca²⁺-permeable channels
TNL	TIR (Toll/Interleukin-1 Receptor)	Pathogen recognition (sensor NLR)	Early contraction followed by recent expansion	Absent in most monocots; activation involves conformational changes
CNL	CC (Coiled-Coil)	Pathogen recognition (sensor NLR)	Gradual and continuous expansion	Largest class in most angiosperms; Ca²⁺-permeable channels

Ancestral Lineages and Early Diversification

Reconstruction of ancestral NBS gene states at key divergence nodes of angiosperms has revealed that the common ancestor of investigated angiosperms possessed at least 23 ancestral NBS-LRR lineages [73]. These primordial genes gave rise to the current NBS-LRR diversity through dynamic expansion mechanisms. Further analysis of basal angiosperms provides additional insights into early NBS gene evolution:

The basal angiosperm Amborella trichopoda possesses all three NBS classes, confirming their ancient origin prior to the diversification of extant angiosperms [73].
In Euryale ferox (Nymphaeales), a basal angiosperm, genomic analysis identified 131 NBS-LRR genes, comprising 18 RNLs, 40 CNLs, and 73 TNLs, suggesting substantial NBS diversity early in angiosperm evolution [74].
The common ancestor of three Nymphaeaceae species possessed at least 122 ancestral NBS-LRR lineages, indicating only slight expansion during speciation in this basal lineage [74].

Dynamic Evolutionary Patterns Across Angiosperm Lineages

Differential Evolutionary Trajectories of NBS Classes

The three NBS classes have exhibited remarkably distinct evolutionary patterns throughout angiosperm history, reflecting their specialized functional roles:

RNL Evolutionary Stasis: RNL genes have maintained low copy numbers throughout angiosperm evolution, consistent with their conserved role in defense signal transduction rather than direct pathogen recognition [73]. Their functional constraint limits diversification, as alterations could disrupt essential signaling pathways common to multiple defense responses.
TNL Evolutionary Dynamics: TNL genes experienced prolonged contraction during the early evolution of angiosperms (approximately the first 100 million years), maintaining fewer than 10 copies in early lineages [73]. This pattern offers an explanation for the puzzling absence of TNL genes in monocots and select dicot lineages (e.g., Aquilegia coerulea and some Lamiales): with so few copies present in early lineages, losing the entire class required the loss of only a few genes [73].
CNL Expansive Radiation: In contrast to TNL genes, CNL genes underwent gradual expansion from approximately 14 ancestral lineages to several dozen copies during early angiosperm evolution [73]. This consistent expansion pattern continues in many modern angiosperm lineages, resulting in CNLs frequently representing the largest NBS class in contemporary species.

Table 2: Evolutionary Patterns of NBS Genes in Major Angiosperm Groups

Plant Group	Representative Species	NBS Gene Count	Dominant Class	Evolutionary Pattern	Key Genomic Features
Basal Angiosperms	Euryale ferox	131	TNL (73 genes)	Slight expansion from ancestral lineages	87 genes in clusters, 44 singletons
Monocots	Dendrobium officinale	74	CNL (10 NBS-LRR genes)	Significant degeneration	No TNL genes; CNL genes mainly in 3 branches
Eudicots	Arabidopsis thaliana	210	CNL	Recent expansion	Tandem arrays and singletons
Solanaceae	Potato (S. tuberosum)	447	CNL	"Consistent expansion"	Tandem arrays on chromosomes
Solanaceae	Tomato (S. lycopersicum)	255	CNL	"Expansion then contraction"	Tandem duplications
Solanaceae	Pepper (C. annuum)	306	CNL	"Shrinking" pattern	Segmental duplications

Lineage-Specific Evolutionary Patterns

Different angiosperm lineages have exhibited distinct evolutionary patterns of NBS genes, reflecting their unique evolutionary histories and ecological adaptations:

Solanaceae Family: Comparative analysis of three Solanaceae species reveals diverse evolutionary trajectories. Potato shows "consistent expansion," tomato exhibits "expansion followed by contraction," and pepper demonstrates a "shrinking" pattern [75]. These differences occur despite all three species sharing a common ancestor with approximately 150 CNL, 22 TNL, and 4 RNL ancestral genes [75].
Monocot Lineages: Monocots display distinctive NBS evolution, including the complete absence of TNL genes in most species [9]. In Orchidaceae species like Dendrobium, NBS-LRR genes have significantly degenerated, with CNL-type genes distributed across three primary phylogenetic branches [9].
Cucurbitaceae Family: Species in this family demonstrate frequent gene losses and limited duplications, resulting in relatively small NBS repertoires (e.g., only 45 NBS-encoding genes in Citrullus lanatus) [75].

The following diagram illustrates the generalized evolutionary workflow of NBS genes across angiosperms, from ancestral lineages to modern species-specific profiles:

Diagram 1: Evolutionary workflow of NBS genes in angiosperms

Genomic Drivers of NBS Gene Diversification

Expansion Mechanisms and Genomic Distribution

The remarkable expansion and diversification of NBS genes across angiosperms have been driven by several genomic mechanisms:

Tandem Duplications: This represents the primary mechanism for NBS gene expansions, particularly for CNL and TNL classes [75]. Tandemly duplicated NBS genes typically cluster at specific chromosomal loci, creating hotspots for rapid evolution of novel pathogen recognition specificities.
Segmental Duplications: Genome-wide duplication events have also contributed to NBS gene expansion, generally to a lesser extent than tandem duplications [74]. Euryale ferox is an exception: there, segmental duplications acted as the major mechanism for CNL and TNL expansions, but not for RNL genes, which were distributed across multiple chromosomes without syntenic loci [74].
Ectopic Duplications: RNL gene expansions appear to be driven primarily by ectopic duplications rather than large-scale segmental or tandem duplications [74]. This pattern aligns with the conserved nature and lower copy numbers of RNL genes across angiosperms.

The genomic distribution of NBS genes follows distinct patterns across species. In Euryale ferox, NBS-LRR genes are unevenly distributed across 29 chromosomes, with 87 genes clustered at 18 multigene loci and 44 genes existing as singletons [74]. Similar clustered distributions occur across diverse angiosperm lineages, facilitating the generation of diversity through unequal crossing over and gene conversion.
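Counting clustered versus singleton genes, as reported above, amounts to grouping genes by chromosomal proximity. The following is a minimal sketch; the function name, the input tuple layout, and the 250 kb maximum gap are assumptions for illustration, and published studies use varying cluster definitions.

```python
from collections import defaultdict

def group_into_loci(genes, max_gap=250_000):
    """genes: iterable of (gene_id, chromosome, start_position).
    Returns (clusters, singletons): clusters is a list of gene-ID lists,
    singletons is a flat list of gene IDs."""
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    clusters, singletons = [], []

    def flush(group):
        ids = [gene_id for _, gene_id in group]
        if len(ids) > 1:
            clusters.append(ids)
        else:
            singletons.append(ids[0])

    for chrom in sorted(by_chrom):
        members = sorted(by_chrom[chrom])
        current = [members[0]]
        for item in members[1:]:
            if item[0] - current[-1][0] <= max_gap:
                current.append(item)  # close enough to the previous gene: same locus
            else:
                flush(current)
                current = [item]
        flush(current)
    return clusters, singletons

# Hypothetical coordinates
toy_genes = [("g1", "chr1", 1_000), ("g2", "chr1", 50_000),
             ("g3", "chr1", 900_000), ("g4", "chr2", 10)]
toy_clusters, toy_singletons = group_into_loci(toy_genes)
print(toy_clusters, toy_singletons)
```

Some definitions additionally limit the number of non-NBS genes allowed between neighbors; that refinement is omitted here.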

The Cretaceous-Paleogene (K-Pg) Boundary Expansion

A remarkable finding in NBS gene evolution is the evidence for intensive recent expansions of both TNL and CNL genes beginning at the Cretaceous-Paleogene (K-Pg) boundary approximately 66 million years ago [73]. This period coincided with dramatic environmental changes and the proliferation of pathogenic fungi, suggesting that increased selection pressure from pathogens drove convergent expansions of TNL and CNL genes across diverse angiosperm lineages [73].

This synchronous expansion timing indicates that major geological and ecological events have profoundly shaped the evolutionary trajectory of plant immune genes, creating parallel evolutionary patterns across phylogenetically distant angiosperm lineages facing similar pathogen pressures.

Experimental Approaches for NBS Gene Analysis

Genome-Wide Identification and Classification

Standardized methodologies have been developed for comprehensive identification and classification of NBS genes:

Diagram 2: NBS gene identification workflow

Table 3: Key Research Reagent Solutions for NBS Gene Analysis

Research Reagent/Tool	Specific Application	Function & Importance	Reference/Database
NB-ARC HMM Profile (PF00931)	NBS domain identification	Core conserved domain recognition; initial gene discovery	Pfam Database
COILS Program	CC domain prediction	Identifies coiled-coil domains with threshold of 0.9	EMBnet
MEME Suite	Motif elicitation	Discovers novel amino acid motifs in NBS proteins	MEME Suite
PhyloScape	Phylogenetic visualization	Interactive tree visualization with metadata annotation	http://darwintorrent.cn/PhyloScape
ANNA Database	Angiosperm NLR Atlas	Contains >90,000 NLR genes from 304 angiosperm genomes	http://compbio.nju.edu.cn/app/ANNA/
Angiosperms353 Gene Panel	Phylogenomic analysis	353 nuclear genes for consistent phylogenetic framework	[76]
CDD Database	Domain verification	Confirms conserved domain presence and architecture	NCBI Conserved Domains

Phylogenetic and Evolutionary Analysis

Robust phylogenetic analysis forms the cornerstone of evolutionary investigations into NBS gene lineage:

Sequence Alignment: Extract and align amino acid sequences of NBS domains using ClustalW integrated into MEGA 7.0 with default settings, followed by manual correction [74].
Phylogenetic Reconstruction: Perform maximum likelihood analysis using IQ-TREE after selecting the best-fit model. Support for nodes can be assessed using bootstrap analysis with 1000 replicates [74].
Orthogroup Analysis: Identify orthogroups across multiple species using OrthoFinder v2.5.1, which employs DIAMOND for sequence similarity searches and MCL for clustering [7]. This approach allows identification of core conserved orthogroups and lineage-specific expansions.
Ancestral State Reconstruction: Reconcile gene trees with species trees to infer ancestral NBS lineages at key divergence nodes, enabling estimation of gene duplication and loss events throughout angiosperm evolution [73] [74].
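Once orthogroups have been assigned, separating core conserved groups from lineage-specific ones is a simple tabulation. The sketch below works on a plain dictionary of per-species gene counts; the function name and the example counts are hypothetical, and loading such counts from OrthoFinder output files is left out.

```python
def summarize_orthogroups(counts):
    """counts: {orthogroup: {species: gene_count}}.
    Returns core orthogroups (present in every species) and species-specific ones."""
    species = sorted({s for per_species in counts.values() for s in per_species})
    core, specific = [], {}
    for orthogroup, per_species in counts.items():
        present = [s for s in species if per_species.get(s, 0) > 0]
        if len(present) == len(species):
            core.append(orthogroup)
        elif len(present) == 1:
            specific.setdefault(present[0], []).append(orthogroup)
    return {"core": sorted(core), "species_specific": specific}

# Hypothetical gene counts for three species
toy_counts = {
    "OG1": {"potato": 3, "tomato": 2, "pepper": 1},
    "OG2": {"potato": 5, "tomato": 0, "pepper": 0},
    "OG3": {"potato": 1, "tomato": 1},
}
toy_summary = summarize_orthogroups(toy_counts)
print(toy_summary)
```

Orthogroups present in some but not all species (such as OG3 here) fall into neither category and would be examined separately for loss events.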

The evolutionary history of NBS genes in angiosperms reveals a complex tapestry of conservation, diversification, and lineage-specific adaptations. The three ancient NBS classes (RNL, TNL, and CNL) have followed distinct evolutionary trajectories shaped by their specialized functions in plant immunity. RNL genes have remained remarkably conserved as signaling components, while TNL and CNL genes have undergone dynamic expansions driven primarily by tandem duplications.

The recent expansion of TNL and CNL genes at the K-Pg boundary highlights how major ecological events have shaped the evolutionary dynamics of plant immune systems. Furthermore, the diverse evolutionary patterns observed across angiosperm lineages—from the "consistent expansion" in potato to the "shrinking" pattern in pepper—demonstrate how closely related species can develop distinct NBS genomic architectures through different balances of duplication and loss events.

These phylogenetic footprints of NBS gene evolution not only illuminate the deep history of plant-pathogen interactions but also provide a framework for future research aimed at harnessing plant immunity for agricultural sustainability. Understanding these evolutionary patterns enables more targeted mining of resistance gene resources from diverse angiosperm lineages, facilitating the development of crops with enhanced and durable disease resistance.

The domain architecture of plant nucleotide-binding site and leucine-rich repeat (NBS-LRR or NLR) proteins represents a critical evolutionary innovation in intracellular immunity. These multidomain proteins function as sophisticated pathogen surveillance systems, detecting effector molecules through direct or indirect recognition mechanisms [1]. The domain architecture patterns in plant NBS genes have diversified substantially across plant lineages, creating both challenges and opportunities for transferring disease resistance traits between species.

Cross-species transferability of NLR pairs offers a promising strategy for engineering durable disease resistance in crop species. This approach leverages the conserved NLR architecture, which typically features an N-terminal signaling domain (CC or TIR), a central nucleotide-binding adapter (NBS), and C-terminal leucine-rich repeats (LRR), to reconstitute functional immune pathways in non-native hosts [1] [16]. However, successful transfer requires careful consideration of domain-specific coevolution, hierarchical interactions, and lineage-specific adaptations within NLR networks.

This technical guide provides a comprehensive framework for the functional validation of transferred NLR pairs, with emphasis on experimental protocols, validation methodologies, and interpretative frameworks essential for researchers working at the intersection of plant immunity and disease resistance engineering.

Domain Architecture and Molecular Evolution of NLR Proteins

Structural Domains and Functional Specialization

NLR proteins exhibit a characteristic tripartite domain architecture that enables their function as allosteric immune switches:

N-terminal domain: Determines signaling specificity and falls into two major classes: coiled-coil (CC) in CNL-type proteins and Toll/interleukin-1 receptor (TIR) in TNL-type proteins. Cereal species completely lack TNL proteins, representing a major architectural constraint in monocots [1] [16].
Nucleotide-binding domain (NBS): Serves as a molecular switch regulated by nucleotide exchange (ADP/ATP). Contains conserved motifs including P-loop, Walker B, and MHD that coordinate nucleotide-dependent activation [1] [77].
Leucine-rich repeat domain (LRR): Mediates pathogen recognition and autoinhibition. Exhibits the highest sequence diversity, with solvent-exposed residues undergoing diversifying selection for effector binding [1].

Table 1: Major NLR Structural Types and Their Distribution

Structural Type	Domain Architecture	Representative Examples	Plant Lineage Distribution
CNL	CC-NBS-LRR	Sr50 (wheat), RPS2 (Arabidopsis)	All angiosperms
TNL	TIR-NBS-LRR	N (tobacco), L6 (flax)	Mainly dicots and basal angiosperms (absent in cereals)
RNL	RPW8-NBS-LRR	ADR1 (Arabidopsis)	Limited to specific lineages
N	NBS only	Multiple variants	All plant species

Lineage-Specific Evolution and Architectural Constraints

The NLR repertoire has undergone dramatic lineage-specific expansion and contraction throughout plant evolution. In the Solanaceae, the NRC (NLR-required for cell death) family has expanded as helper NLRs that form complex networks with sensor NLRs [77]. In contrast, cereal genomes contain only CNL-type NLRs, completely lacking the TNL subfamily found in dicots [1] [16]. Medicinal plants like Salvia miltiorrhiza show further specializations, with dramatic reductions in both TNL and RNL subfamilies compared to model plants [16].

These architectural constraints directly impact cross-species transferability. For example, transferring a TNL-type NLR from dicots to monocots would require complete pathway reconstitution, while CNL transfers between monocots and dicots face fewer architectural barriers.

Experimental Workflows for NLR Pair Validation

Protoplast Transfection for Cell Death Assays

Mesophyll protoplast transfection provides a rapid homologous system for quantifying NLR/AVR recognition in cereal hosts [78]. This method measures cell death through luciferase (LUC) activity as a viability proxy, with diminished LUC signal indicating AVR-specific cell death.

Protocol: Barley and Wheat Protoplast Transfection [78]

Plant Material: Use 3-5 week old barley (Hordeum vulgare) or wheat (Triticum aestivum) plants grown under controlled conditions.
Protoplast Isolation:
- Harvest the youngest fully expanded leaves
- Expose mesophyll cells through epidermal peeling
- Perform enzymatic digestion with 1.5% cellulase and 0.75% macerozyme
- Purify protoplasts through filtration and centrifugation
Plasmid Transfection:
- Transfect with 10-20 μg of total plasmid DNA per sample
- Include NLR and AVR effector constructs in 1:1 molar ratio
- Include luciferase reporter construct (35S::LUC) as viability control
- Include empty vector control as reference for normalization
Incubation and Measurement:
- Incubate transfected protoplasts for 24 hours in the dark
- Measure luciferase activity using standard assay systems
- Calculate cell death as: 1 - (LUC_sample / LUC_empty_vector)
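The final calculation in the protocol can be written out as a short function. This is a minimal sketch of the stated formula; the function name, the averaging of replicates, and the luminescence readings are illustrative assumptions.

```python
from statistics import mean

def relative_cell_death(luc_sample, luc_empty_vector):
    """Cell death as 1 - (LUC_sample / LUC_empty_vector), using replicate means."""
    reference = mean(luc_empty_vector)
    if reference <= 0:
        raise ValueError("empty-vector luciferase signal must be positive")
    return 1.0 - (mean(luc_sample) / reference)

# Hypothetical luminescence readings (arbitrary units)
nlr_plus_avr = [2400, 2600]
empty_vector = [9800, 10200]

cell_death = relative_cell_death(nlr_plus_avr, empty_vector)
print(cell_death)
```

Values near 0 indicate no loss of viability relative to the empty vector control, while values approaching 1 indicate strong AVR-dependent cell death; negative values can occur when the sample signal exceeds the control.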

This method successfully quantified cell death for the Sr50/AvrSr50 pair in wheat protoplasts and the MLA1/AVRA1 pair in barley protoplasts, demonstrating its utility for both homologous and heterologous validation within cereals [78].

High-Throughput Transgenic Arrays for NLR Screening

Large-scale NLR screening utilizes expression signatures to identify functional receptors, followed by high-efficiency transformation to validate resistance.

Table 2: Quantitative Assessment of NLR Transferability in Wheat

NLR Source	Transgenic Events Tested	Resistance to Pgt	Resistance to Pt	Key Findings
Diverse grass species	995 NLRs	19 NLRs	12 NLRs	High-expression NLRs more likely functional
Barley Mla7	Multiple copy lines	Not tested	Confirmed (Pst)	Required multiple copies for function
Aegilops tauschii Sr genes	Multiple accessions	Sr46, SrTA1662, Sr45	Not tested	Highly expressed in source accessions

Protocol: Wheat Transgenic Array for NLR Validation [79]

Candidate Identification:
- Screen transcriptomes of uninfected plants for highly expressed NLRs
- Prioritize candidates with expression above median gene expression level
- Select NLRs from diverse grass species and wild relatives
Vector Construction:
- Clone NLR genomic sequences (promoter + coding region) into binary vectors
- Use native promoters rather than constitutive promoters to maintain expression regulation
- For multicopy NLRs, include all copies to ensure functional expression
Plant Transformation:
- Use high-efficiency wheat transformation system [79]
- Generate large transgenic arrays (e.g., 995 NLRs in proof-of-concept study)
- Screen for single-copy insertion events where possible
Phenotypic Validation:
- Challenge T1 or T2 plants with relevant pathogen isolates
- For stem rust: Use Puccinia graminis f. sp. tritici (Pgt) isolates
- For leaf rust: Use Puccinia triticina (Pt) isolates
- Include appropriate susceptible and resistant controls

This pipeline successfully identified 31 new resistance NLRs (19 against stem rust, 12 against leaf rust) from 995 tested, demonstrating the efficacy of large-scale NLR transfer [79].
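
As a rough illustration of the candidate-prioritization rule and of the yield reported above, the sketch below filters NLRs by expression above the transcriptome median and computes the hit rate from the published counts. The gene names and expression values are hypothetical; only the counts 19, 12, and 995 come from the text.

```python
from statistics import median


def prioritize_by_expression(nlr_expr, all_gene_expr):
    """Keep NLRs whose expression exceeds the median over all genes."""
    cutoff = median(all_gene_expr.values())
    return sorted(gene for gene, value in nlr_expr.items() if value > cutoff)


def hit_rate(n_hits, n_tested):
    """Fraction of tested NLRs that conferred resistance."""
    return n_hits / n_tested


# Hypothetical expression values (e.g., TPM) from an uninfected plant.
all_genes = {"g1": 1.0, "g2": 4.0, "g3": 9.0, "nlrA": 12.0, "nlrB": 0.5}
nlrs = {"nlrA": 12.0, "nlrB": 0.5}
print(prioritize_by_expression(nlrs, all_genes))

# Counts reported in the text: 19 stem rust + 12 leaf rust hits from 995 NLRs.
print(f"hit rate: {hit_rate(19 + 12, 995):.1%}")
```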

Case Studies in NLR Pair Specialization and Transfer

The Rice Pik NLR Pair: Allelic Specialization

The rice Pik NLR pair exemplifies how coordinated evolution shapes transferability constraints. Pik-1 (sensor) and Pik-2 (helper) form a genetically linked pair with only ~2.5 kb separating their start codons [80]. Over evolutionary time, the two genes have undergone coordinated specialization:

Effector recognition: Pik-1 alleles differentially recognize AVR-Pik variants through their integrated HMA domain
Pair cooperation: Matching Pik-1/Pik-2 pairs mount effective immunity, while mismatched pairs cause autoimmunity
Specificity determinant: A single amino acid polymorphism in Pik-2 underpins allelic specialization

When allelic variants were experimentally mismatched (e.g., Pikp-1 with Pikm-2), constitutive cell death occurred in Nicotiana benthamiana, demonstrating the functional co-adaptation of these NLR pairs [80]. This case study highlights the importance of transferring matched NLR pairs rather than individual components.

Solanaceae NRCX-NARY Pair: Non-Canonical Regulation

In Nicotiana benthamiana, the NRCX and NARY NLR pair illustrates a non-canonical regulatory mechanism [77]:

Architecture: Head-to-head orientation, separated by an 18,795 bp intergenic region
Domain interaction: Exclusive CC-domain mediated interaction
Motif divergence: NARY contains non-canonical Walker B and MHD motifs but lacks autoactivation capacity
Regulatory function: NRCX knockout causes dwarfism and constitutive immunity, partially rescued by NARY co-silencing

This pair represents a specialized regulatory module within the broader NRC helper network, demonstrating how Solanaceae-specific NLR expansions have created unique architectural constraints for cross-species transfer.

The Scientist's Toolkit: Essential Research Reagents

Table 3: Essential Research Reagents for NLR Transfer Studies

Reagent/Category	Specific Examples	Function/Application	Technical Considerations
Binary Vectors	pCambia series, pGreen	NLR gene expression in plants	Use native promoters for proper regulation
Transformation Systems	Agrobacterium-mediated, biolistic	Plant genetic transformation	Cereals may require specialized protocols
Reporter Constructs	35S::Luciferase, 35S::GUS	Cell viability and transformation efficiency	Luciferase provides quantitative viability data
Pathogen Strains	Puccinia graminis f. sp. tritici, Magnaporthe oryzae	Phenotypic resistance validation	Maintain virulence characterizations
Protoplast Systems	Barley, wheat, N. benthamiana	Rapid cell death assays	Species-specific isolation protocols required
CRISPR/Cas9 Systems	Multiplex gRNA constructs	NLR knockout validation	Essential for testing NLR pair requirements

Interpretation Framework and Technical Considerations

Evaluating Transferability Success

Successful NLR transfer requires meeting multiple criteria beyond simple pathogen resistance:

Specificity retention: Transferred NLRs should maintain race-specific recognition patterns
Network integration: Function within the recipient's NLR network without causing autoimmunity
Growth-defense balance: Not incur substantial fitness costs under non-infection conditions
Stable expression: Maintain function over generations without silencing

The case of barley Mla7 demonstrates that copy number and expression level critically impact functionality. In native barley, Mla7 exists as three identical copies in the haploid genome, and transgenic lines required two or more copies for resistance, indicating threshold expression requirements [79].

Troubleshooting Failed Transfers

Common failure modes and potential solutions include:

Autoactive cell death: Often indicates mismatched NLR pairs or improper expression levels
Lack of recognition: May result from absence of required co-factors or signaling components
Species-specific restrictions: Transferring TNLs into cereals is not expected to work, because cereal genomes lack the TNL subfamily entirely
Network conflicts: Incompatibility with existing NLR networks in recipient species

When transferring NLRs between distant species, complementation with helper NLRs or signaling components from the donor species may be necessary for functionality.

Future Perspectives and Concluding Remarks

The field of NLR transferability is rapidly evolving with several promising directions:

Architecture-informed design: Using domain architecture patterns to predict transfer success
Network engineering: Transferring entire NLR modules rather than individual pairs
Expression optimization: Tuning NLR expression through promoter engineering
Synergistic transfers: Combining NLRs with matching PRRs for enhanced immunity

Cross-species transfer of NLR pairs represents a powerful strategy for crop improvement, particularly as genomic resources from wild relatives and non-model species expand. By respecting the architectural constraints and coevolutionary relationships within NLR pairs, researchers can successfully engineer durable disease resistance across taxonomic boundaries.

The experimental frameworks and validation protocols outlined in this guide provide a foundation for systematic NLR transfer, emphasizing the importance of domain architecture awareness, appropriate validation systems, and interpretation within evolutionary context. As our understanding of NLR network architecture deepens, so too will our ability to rationally design immune systems for crop protection.

Within the broader context of research on domain architecture patterns in plant nucleotide-binding site (NBS) genes, this case study examines how specific architectural configurations of these disease resistance genes correlate with contrasting disease tolerance phenotypes in cotton. The NBS-leucine-rich repeat (LRR) gene family constitutes the largest class of plant resistance (R) proteins, capable of recognizing pathogen-secreted effectors to trigger robust immune responses [16]. In cotton, a crop of immense economic importance, susceptibility to devastating diseases like Verticillium wilt presents a major agricultural challenge. This analysis explores the genomic and structural basis of disease resistance by comparing NBS-encoding genes between tolerant and susceptible cotton accessions, providing insights that may accelerate disease-resistant cotton breeding.

Background on NBS-LRR Genes in Plant Immunity

Domain Architecture and Classification

NBS-LRR proteins, also referred to as NLRs, function as intracellular immune receptors in plant effector-triggered immunity (ETI) [16]. These proteins typically exhibit a modular structure characterized by three core domains:

N-terminal domain: Determines protein-protein interactions and signaling pathways. Based on this domain, NLRs are classified into:
- TNLs: Contain a Toll/Interleukin-1 Receptor (TIR) domain
- CNLs: Contain a Coiled-Coil (CC) domain
- RNLs: Contain a Resistance to Powdery Mildew 8 (RPW8) domain
Central NBS/NB-ARC domain: Binds and hydrolyzes nucleotides (ATP/GTP), functioning as a molecular switch for immune activation [16] [81]. This domain contains conserved motifs including P-loop, kinase-2, kinase-3a, GLPL, and MHDL [81].
C-terminal LRR domain: Facilitates pathogen recognition through its variable leucine-rich repeats [16].

Beyond these typical architectures, plants also contain numerous atypical NBS-encoding genes that lack complete domains, classified as NL (NBS-LRR), TN (TIR-NBS), CN (CC-NBS), or N (NBS only) subtypes [16].

Functional Mechanisms in Disease Resistance

The NBS-LRR proteins operate as a critical component of the plant immune system, recognizing specific pathogen effectors and initiating defense signaling cascades [16]. This recognition often triggers a hypersensitive response (HR) and programmed cell death (PCD) at infection sites, effectively limiting pathogen spread [16]. Recent studies have revealed that the two layers of plant immunity, PTI (PAMP-triggered immunity) and ETI, can act synergistically to enhance immune responses rather than functioning independently [16].

Materials and Methods

Plant Materials and Phenotypic Data

This comparative analysis utilizes contrasting cotton accessions with well-documented disease responses:

Tolerant/Resistant Accessions: Gossypium raimondii (D5 genome, nearly immune to Verticillium wilt), G. barbadense (allotetraploid, resistant to Verticillium wilt), and Mac7 (tolerant G. hirsutum accession with resistance to cotton leaf curl disease [CLCuD]) [82] [7].
Susceptible Accessions: G. arboreum (A genome, susceptible to Verticillium wilt), Coker 312 (susceptible G. hirsutum accession vulnerable to CLCuD), and standard G. hirsutum (often susceptible to Verticillium dahliae) [82] [7].

Genomic Identification of NBS-Encoding Genes

Step 1: Sequence Retrieval

Obtain complete genome sequences and annotation files for target cotton species from databases such as NCBI, Phytozome, CottonFGD, or Cottongen [7].

Step 2: HMMER Search

Perform Hidden Markov Model (HMM) searches against proteome datasets using the NB-ARC domain (Pfam: PF00931) as a query with HMMER software (e.g., HMMER 3.1b2) [82].
Apply stringent E-value cutoffs (e.g., 1×10⁻¹⁰ to 1×10⁻⁵) to identify candidate NBS-encoding genes [81] [8].
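
A minimal sketch of the E-value filtering step follows, assuming the search was run with hmmsearch and its per-sequence table output (`--tblout`), in which the full-sequence E-value is the fifth whitespace-separated column. The sequence identifiers, the mock table rows, and the file names in the comment are hypothetical.

```python
def parse_tblout(text, max_evalue=1e-5):
    """Return {target: best full-sequence E-value} for hits passing the cutoff.

    Expects the text of an hmmsearch --tblout file, e.g. one produced by
    (hypothetical file names): hmmsearch --tblout hits.tbl PF00931.hmm proteome.fasta
    """
    hits = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits


# Mock table rows, truncated to the first seven columns for illustration.
mock_tblout = """\
# target name  accession  query name  accession  E-value  score  bias
geneA - NB-ARC PF00931 3.2e-45 152.1 0.1
geneB - NB-ARC PF00931 2.0e-07 30.4 0.0
geneC - NB-ARC PF00931 0.5 8.2 0.0
"""

print(sorted(parse_tblout(mock_tblout, max_evalue=1e-5)))
print(sorted(parse_tblout(mock_tblout, max_evalue=1e-10)))
```

Running the filter at both ends of the cutoff range shows how many candidates depend on the threshold choice; borderline hits are the ones that most need confirmation in the domain architecture step.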

Step 3: Domain Architecture Analysis

Validate candidate genes and classify domain architecture using InterProScan and NCBI's Batch CD-Search [81] [8].
Identify additional domains (TIR, CC, RPW8, LRR) using Pfam database and SMART motif analysis [81].
Categorize genes into structural classes (TNL, CNL, RNL, TN, CN, NL, N, etc.) based on domain composition [82].
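
The classification rule can be sketched as a small function that maps a set of detected domains to a class label. The domain labels and the precedence order used when more than one N-terminal domain is present (TIR, then CC, then RPW8) are assumptions made for this illustration, not rules stated in the cited studies.

```python
def classify_nbs(domains):
    """Map detected domains to a structural class such as TNL, CN, or N.

    Returns None when no NBS (NB-ARC) domain is present.
    """
    found = set(domains)
    if "NBS" not in found:
        return None
    # Assumed precedence if several N-terminal domains are detected.
    if "TIR" in found:
        prefix = "T"
    elif "CC" in found:
        prefix = "C"
    elif "RPW8" in found:
        prefix = "R"
    else:
        prefix = ""
    suffix = "L" if "LRR" in found else ""
    return f"{prefix}N{suffix}"


# Hypothetical domain calls, e.g. summarized from InterProScan output.
examples = {
    "gene1": ["TIR", "NBS", "LRR"],
    "gene2": ["CC", "NBS"],
    "gene3": ["NBS"],
    "gene4": ["RPW8", "NBS", "LRR"],
}
for gene, doms in examples.items():
    print(gene, classify_nbs(doms))
```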

Comparative Genomic and Phylogenetic Analyses

Step 4: Chromosomal Distribution and Gene Clustering

Map physical locations of NBS genes on chromosomes using annotation data.
Identify gene clusters using BEDTools with criteria of ≤8 genes separating adjacent NBS genes [8].
Perform statistical significance testing (χ² tests) against random distribution expectations [8].
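
The clustering criterion can be illustrated without BEDTools by walking along the ordered gene list of one chromosome. This is a plain-Python stand-in for the interval-based approach described above; the minimum cluster size of two genes and the gene names are assumptions for the sketch.

```python
def find_nbs_clusters(ordered_genes, max_intervening=8):
    """Group NBS genes separated by at most `max_intervening` other genes.

    `ordered_genes` is a list of (gene_id, is_nbs) tuples sorted by
    chromosomal position. Only groups with two or more NBS genes are kept.
    """
    clusters, current, last_idx = [], [], None
    for idx, (gene_id, is_nbs) in enumerate(ordered_genes):
        if not is_nbs:
            continue
        too_far = last_idx is not None and idx - last_idx - 1 > max_intervening
        if too_far:
            if len(current) >= 2:
                clusters.append(current)
            current = []
        current.append(gene_id)
        last_idx = idx
    if len(current) >= 2:
        clusters.append(current)
    return clusters


# Hypothetical chromosome: n1 and n2 are 3 genes apart, n3 is 10 genes away.
chromosome = (
    [("n1", True)]
    + [(f"x{i}", False) for i in range(3)]
    + [("n2", True)]
    + [(f"y{i}", False) for i in range(10)]
    + [("n3", True)]
)
print(find_nbs_clusters(chromosome))
```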

Step 5: Phylogenetic Reconstruction

Perform multiple sequence alignments of NBS protein sequences using Clustal Omega or MAFFT [8].
Construct phylogenetic trees using Maximum Likelihood method (e.g., MEGA software) with bootstrap validation (1000 replicates) [81] [8].
Classify NBS genes into clades and compare with orthologs from model plants.

Step 6: Synteny and Orthology Analysis

Identify orthogroups across species using OrthoFinder v2.5.1 with DIAMOND for sequence similarity searches and MCL for clustering [7].
Analyze syntenic relationships between diploid and allotetraploid cotton NBS genes.

Expression and Functional Validation

Step 7: Transcriptomic Profiling

Analyze RNA-seq data from susceptible and tolerant accessions under pathogen challenge and abiotic stresses.
Calculate FPKM values and perform differential expression analysis.
Validate expression patterns of selected NBS genes using quantitative PCR (qPCR) [7].
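
For reference, the FPKM normalization mentioned in this step divides the fragment count by gene length in kilobases and by library size in millions of mapped fragments. The sketch below uses hypothetical counts.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical NBS gene: 500 fragments, 2,000 bp long, 20 million mapped fragments.
print(fpkm(500, 2_000, 20_000_000))
```

FPKM is convenient for comparing genes within a sample; differential expression testing is normally run on raw counts with a dedicated statistical package rather than on FPKM values.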

Step 8: Functional Validation via VIGS

Design virus-induced gene silencing (VIGS) constructs targeting candidate NBS genes.
Infect resistant cotton plants with VIGS vectors and challenge with pathogens.
Monitor disease progression and quantify pathogen titers to confirm gene function [7].

Results and Analysis

Genomic Distribution and Quantitative Variation in NBS Genes

Comprehensive identification of NBS-encoding genes across four cotton species reveals significant quantitative differences between susceptible and tolerant accessions.

Table 1: NBS-Encoding Gene Counts in Cotton Genomes

Cotton Species	Ploidy	Disease Response	Total NBS Genes	CNL	TNL	RNL	Other Types
G. raimondii (D5)	Diploid	Tolerant (Verticillium)	365 [82]	29.32% [82]	19.45% [82]	~1% [82]	50.23% [82]
G. arboreum (A2)	Diploid	Susceptible (Verticillium)	246 [82]	32.52% [82]	2.85% [82]	~1% [82]	63.63% [82]
G. hirsutum (TM-1)	Allotetraploid	Susceptible (Verticillium)	588 [82]	~45% [83]	~3% [83]	~1% [82]	~51% [82]
G. barbadense	Allotetraploid	Tolerant (Verticillium)	682 [82]	~35% [82]	~20% [82]	~1% [82]	~44% [82]

The data reveal a striking disparity in TNL representation between tolerant and susceptible cotton genotypes. Tolerant accessions (G. raimondii and G. barbadense) possess substantially higher proportions of TNL-type genes (19.45% and ~20%, respectively) than susceptible accessions (G. arboreum and G. hirsutum; 2.85% and ~3%, respectively) [82]. This is an approximately 7-fold difference in TNL percentages, suggesting that TNL genes may contribute to Verticillium wilt resistance [82].
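
The fold difference can be checked directly from the percentages in Table 1. The sketch below also derives approximate TNL gene counts for the two diploids from the table's totals and percentages; those counts are implied by the table rather than quoted from the source.

```python
def fold_difference(high_percent, low_percent):
    """Ratio of two percentages."""
    return high_percent / low_percent


# Percentages and totals as listed in Table 1.
diploid_fold = fold_difference(19.45, 2.85)    # G. raimondii vs. G. arboreum
tetraploid_fold = fold_difference(20.0, 3.0)   # G. barbadense vs. G. hirsutum (approximate)
print(f"diploids: {diploid_fold:.1f}-fold, allotetraploids: {tetraploid_fold:.1f}-fold")

# Approximate TNL counts implied by total NBS genes x TNL percentage.
print(round(365 * 19.45 / 100), round(246 * 2.85 / 100))
```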

Structural Architecture Diversity

Analysis of domain architecture reveals distinct structural patterns between susceptible and tolerant cotton accessions.

Table 2: Comparative Analysis of NBS Domain Architecture in Tolerant vs. Susceptible Cotton

Architectural Feature	Tolerant Accessions	Susceptible Accessions	Functional Implications
TNL Proportion	Higher (19.45% in G. raimondii, ~20% in G. barbadense) [82]	Lower (2.85% in G. arboreum, ~3% in G. hirsutum) [82]	TNL genes may recognize Verticillium effectors and activate stronger immune responses
CN/CNL Proportion	Lower on average (29.32% CNL in G. raimondii, ~35% CNL in G. barbadense) [82]	Higher on average (32.52% CNL in G. arboreum, ~45% CNL in G. hirsutum) [82]	Altered recognition specificities in susceptible genotypes
Exon Number	Higher average exons per NBS gene [82]	Lower average exons per NBS gene [82]	Potential impact on alternative splicing and functional diversity
Gene Clustering	Tendency for chromosomal clustering [81]	Tendency for chromosomal clustering [81]	Facilitates rapid evolution through unequal crossing over
Atypical NBS	Present (N, TN, CN, NL types) [16]	Present (N, TN, CN, NL types) [16]	Possible regulatory functions or degenerated resistance genes

The structural analysis indicates that susceptible accessions (G. arboreum and G. hirsutum) possess a greater proportion of CN, CNL, and N genes with a correspondingly lower proportion of NL, TN, and TNL genes compared to tolerant accessions (G. raimondii and G. barbadense) [82]. The most substantial difference was observed in TNL genes, suggesting their potential significance in Verticillium wilt resistance [82].

Phylogenetic and Evolutionary Relationships

Phylogenetic analysis of NBS-encoding genes from tolerant and susceptible cotton accessions reveals distinct evolutionary patterns. TNL genes from tolerant accessions (G. raimondii and G. barbadense) form closely related clades, suggesting conservation of specific TNL lineages associated with resistance [82]. Furthermore, asymmetric evolution of NBS-encoding genes is evident in allotetraploid cottons, with G. hirsutum inheriting more NBS genes from its susceptible progenitor (G. arboreum), while G. barbadense inherited more NBS genes from its tolerant progenitor (G. raimondii) [82].

Orthogroup analysis across land plants has identified core orthogroups (OGs) that are conserved across species, as well as species-specific OGs [7]. In cotton, specific orthogroups (OG2, OG6, and OG15) show upregulated expression in tolerant accessions under biotic stress, suggesting their potential role in disease resistance [7].

Expression Profiling and Functional Validation

Transcriptomic analyses reveal differential expression patterns of NBS genes between tolerant and susceptible cotton accessions under pathogen challenge. In a study comparing CLCuD-tolerant (Mac7) and susceptible (Coker 312) G. hirsutum accessions, specific NBS genes showed pronounced upregulation only in the tolerant genotype following viral infection [7].

Functional validation through virus-induced gene silencing (VIGS) demonstrated that silencing a specific NBS gene (GaNBS from OG2) in resistant cotton led to increased viral titers, confirming its functional role in antiviral defense [7]. Genetic variation analysis between these accessions identified numerous unique variants in NBS genes, with the tolerant Mac7 accession containing 6,583 unique variants compared to 5,173 in the susceptible Coker 312 [7].

Discussion

Evolutionary Implications of NBS Architecture Patterns

The comparative analysis of NBS domain architecture between susceptible and tolerant cotton accessions points to two evolutionary patterns. First, the preferential retention of TNL-class genes in tolerant genotypes suggests that these genes may play a disproportionate role in recognizing Verticillium effectors and activating effective immune responses [82]. Second, the dramatic contraction of TNL genes in susceptible cultivated cottons may reflect domestication bottlenecks and artificial selection for agronomic traits, potentially at the expense of disease resistance [8].

The finding that G. hirsutum inherited more NBS-encoding genes from its susceptible progenitor (G. arboreum), while G. barbadense inherited more from its tolerant progenitor (G. raimondii), provides a genomic explanation for their contrasting disease responses [82]. This asymmetric evolution of NBS-encoding genes highlights how polyploidization can shape the disease resistance profiles of crops through selective retention or loss of specific resistance gene classes from progenitor genomes.

Molecular Mechanisms of Resistance and Susceptibility

The association between TNL abundance and Verticillium tolerance suggests several molecular mechanisms. TNL-type proteins typically activate immune signaling through specific pathways involving EDS1 and PAD4 proteins, which may provide more effective defense against vascular pathogens like Verticillium dahliae [16]. The reduction in TNL genes in susceptible accessions may compromise these specific signaling pathways, rendering plants vulnerable to infection.

Gene duplication events and tandem clustering of NBS genes, particularly in tolerant accessions, facilitate the generation of functional diversity through sequence exchange and diversifying selection [81]. This creates a reservoir of genetic variation enabling rapid adaptation to evolving pathogen populations. Susceptible accessions may have lost specific clusters containing critical resistance genes or possess reduced diversity within conserved clusters.

Applications for Disease-Resistant Cotton Breeding

The findings from this comparative analysis have direct applications for cotton breeding programs:

Marker Development: NBS gene-derived markers, particularly from TNL-rich genomic regions, can serve as molecular markers for selecting Verticillium-tolerant genotypes.
Pyramiding R Genes: Strategic combination of complementary NBS architectures (TNL, CNL, RNL) from different resistant sources may provide broader and more durable resistance.
Genome Editing: CRISPR-based approaches could be employed to restore or modify specific NBS genes in susceptible elite cultivars [84].
Wild Species Introgression: Targeted introgression of NBS-rich regions from wild relatives into cultivated cotton could enhance disease resistance.

Visualizations

Workflow for Comparative NBS Architecture Analysis

NBS Domain Architecture in Tolerant vs. Susceptible Cotton

Table 3: Essential Research Resources for Comparative NBS Gene Analysis

Resource Category	Specific Tools/Reagents	Application/Function
Genomic Databases	CottonFGD (https://cottonfgd.net/), Cottongen (https://www.cottongen.org/), NCBI Genome Data	Access to genome sequences, annotations, and variation data for cotton species
Bioinformatics Tools	HMMER v3.1b2, InterProScan, OrthoFinder v2.5.1, MEME Suite, PlantCARE	Domain identification, orthogroup analysis, motif discovery, promoter element prediction
Experimental Validation	Virus-Induced Gene Silencing (VIGS) vectors, qPCR reagents, RNA-seq libraries	Functional characterization of NBS genes, expression validation, transcriptome profiling
Reference Databases	Pfam (PF00931), PRGdb 4.0, Plant GARDEN	Domain annotation, resistance gene references, wild relative genomic data
Cotton Germplasm	G. raimondii (D5, tolerant), G. arboreum (A2, susceptible), G. hirsutum (TM-1, susceptible), G. barbadense (tolerant), Mac7 (tolerant), Coker 312 (susceptible)	Comparative phenotypic and genotypic analyses

This case study demonstrates that contrasting disease responses in cotton accessions correlate with significant differences in the domain architecture of NBS-encoding resistance genes. Tolerant genotypes are characterized by an enrichment of TNL-type genes, while susceptible accessions show a marked reduction in this gene class. The asymmetric evolution of NBS-encoding genes in allotetraploid cottons, with preferential retention from specific progenitors, provides a genomic basis for observed disease resistance patterns. These findings advance our understanding of domain architecture patterns in plant NBS genes and provide a framework for targeted breeding of disease-resistant cotton varieties through marker-assisted selection, genomic introgression, and potentially gene editing approaches. Future research should focus on functional characterization of specific TNL genes from tolerant accessions and their incorporation into elite cotton cultivars.

The innate immune system of plants represents a sophisticated defense network, capable of recognizing pathogens and activating coordinated resistance mechanisms. Central to this system are the nucleotide-binding site leucine-rich repeat (NBS-LRR) proteins, which constitute the largest family of plant resistance (R) genes and play a pivotal role in effector-triggered immunity (ETI) [28] [36]. These intracellular immune receptors recognize pathogen-secreted effectors either directly or indirectly, initiating signaling cascades that often culminate in a hypersensitive response (HR) and localized programmed cell death to restrict pathogen spread [28] [25]. The domain architecture of NBS-LRR proteins typically includes a conserved NBS (NB-ARC) domain that binds and hydrolyzes nucleotides, a C-terminal LRR domain responsible for pathogen recognition, and variable N-terminal domains that determine their classification into distinct subfamilies [28] [85].

The signaling molecule salicylic acid (SA) serves as a critical hormone in plant defense, particularly against biotrophic and hemibiotrophic pathogens. SA accumulation is associated with the establishment of systemic acquired resistance (SAR), a prolonged defense state that protects uninfected tissues against subsequent pathogen challenges [86]. Exogenous application of SA can prime plant defense systems, enhancing antimicrobial activity and reducing viral symptoms through the induction of pathogen-related proteins [86]. Within this defense signaling network, certain NBS-LRR genes exhibit responsive expression patterns to SA treatment, positioning them as key components in the regulation of plant immunity. This technical guide explores the experimental validation of SA-responsive NBS-LRR genes, their integration into defense pathways, and the implications of their domain architectures for immune function.

Domain Architecture Patterns in Plant NBS-LRR Genes

Structural Classification and Phylogenetic Distribution

The NBS-LRR gene family exhibits remarkable structural diversity, with members classified based on their N-terminal domain organization into three major subfamilies:

TNL subfamily: Characterized by an N-terminal Toll/interleukin-1 receptor (TIR) domain
CNL subfamily: Features an N-terminal coiled-coil (CC) domain
RNL subfamily: Contains a resistance to powdery mildew 8 (RPW8) domain [28] [85]

This classification system reflects fundamental differences in signaling mechanisms and evolutionary history. Phylogenetic analyses reveal that the proportions of these subfamilies vary significantly across plant species, suggesting distinct evolutionary paths. For instance, gymnosperms like Pinus taeda exhibit expansion of the TNL subfamily (comprising 89.3% of typical NBS-LRRs), while monocotyledonous species such as Oryza sativa have completely lost TNL and RNL subfamilies [28]. Medicinal plants like Salvia miltiorrhiza show marked reduction in TNL and RNL members, with 61 CNLs and only 1 RNL identified among 62 typical NLRs [28]. Similar patterns occur in orchids, where no TNL-type genes were identified across six species, indicating TIR domain degeneration is common in monocots [9].

Table 1: NBS-LRR Subfamily Distribution Across Plant Species

Plant Species	Total NBS-LRR Genes	CNL	TNL	RNL	References
Arabidopsis thaliana	207	61	140	6	[28]
Oryza sativa (rice)	505	505	0	0	[28]
Salvia miltiorrhiza	62	61	0	1	[28]
Nicotiana benthamiana	156 (including irregular types)	25	5	4	[25]
Akebia trifoliata	73	50	19	4	[85]
Dendrobium officinale	10	10	0	0	[9]

Conserved Motifs and Domain Functionality

Protein motif analyses consistently identify conserved domains within NBS-LRR proteins that define their functional capabilities. The NBS (NB-ARC) domain contains characteristic motifs including P-loop, kinase-2, and GLPL motifs that facilitate nucleotide binding and conformational changes [87] [25]. The LRR domain typically consists of multiple leucine-rich repeats that form a solenoid structure capable of protein-protein interactions and pathogen recognition [86] [28].

Studies across multiple species confirm that the "Pkinase domain" and "LRR domains" are conserved in most R-proteins, though variations occur in atypical NBS-LRRs that may lack complete N-terminal or LRR domains [86] [28]. In grass pea, researchers identified ten conserved motifs with lengths ranging from 16 to 30 amino acids, including distinct TIR-1 and TIR-2 domains in TNL proteins, and RX-CCLike domains in CNL proteins [87]. These conserved structural elements enable NBS-LRR proteins to function as molecular switches within defense signaling pathways.

SA-Mediated Defense Signaling Pathways

Salicylic acid serves as a central regulator in plant immune responses, orchestrating a complex signaling network that connects pathogen recognition to defense activation. The SA signaling pathway integrates with NBS-LRR-mediated immunity through multiple connection points.

Figure 1: SA-Mediated Defense Signaling Pathways Integrating NBS-LRR Recognition

As illustrated in Figure 1, pathogen invasion triggers two layered immune responses. PAMP-Triggered Immunity (PTI) represents the first line of defense, activated when pattern recognition receptors at the cell surface detect conserved pathogen molecules [28] [9]. Successful pathogens deliver effector proteins into plant cells to suppress PTI, which in turn activates Effector-Triggered Immunity (ETI) mediated primarily by NBS-LRR proteins [28] [36]. ETI activation often leads to the hypersensitive response (HR), characterized by localized cell death that confines pathogens to infection sites [25] [36].

Both PTI and ETI can stimulate SA accumulation, though ETI typically induces stronger and more sustained SA production [86]. Increased SA levels activate the expression of pathogenesis-related (PR) proteins with antimicrobial activity and establish systemic acquired resistance (SAR), enhancing defensive capacity in uninfected tissues [86]. Recent research indicates that PTI and ETI function synergistically rather than independently, with SA serving as a key integrator of these defense signals [28].

Experimental Validation of SA-Responsive NBS-LRR Genes

Expression Profiling Methodologies

Transcriptome Sequencing and Analysis

Comprehensive identification of SA-responsive NBS-LRR genes begins with transcriptome profiling under controlled SA treatment conditions. The standard workflow includes:

Plant Material Preparation: Uniform plant materials (e.g., leaves, roots) are collected and divided into experimental and control groups.
SA Treatment: Experimental groups receive precise SA concentrations (typically 0.5-2.0 mM) via foliar spray or root drench, while controls receive solvent only [86] [9].
RNA Extraction: High-quality total RNA is extracted from tissues collected at multiple timepoints (e.g., 0, 6, 12, 24, 48 hours post-treatment) using validated protocols.
Library Preparation and Sequencing: RNA-seq libraries are prepared and sequenced on platforms such as Illumina to generate 100-150 bp paired-end reads.
Bioinformatic Analysis: Reads are aligned to reference genomes, transcript abundance is quantified, and differential expression analysis identifies significantly regulated NBS-LRR genes (commonly defined as |log2FC| > 1 and FDR < 0.05) [9].
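
The differential-expression threshold in the last step can be expressed as a simple filter. The gene identifiers and statistics below are hypothetical, and the sketch assumes that log2 fold changes and FDR-adjusted p-values have already been produced by a differential expression tool.

```python
def is_significant(log2fc, fdr, min_abs_log2fc=1.0, max_fdr=0.05):
    """Apply the |log2FC| > 1 and FDR < 0.05 criterion."""
    return abs(log2fc) > min_abs_log2fc and fdr < max_fdr


def select_degs(results):
    """Return sorted gene IDs passing the threshold.

    `results` maps gene ID -> (log2 fold change, FDR).
    """
    return sorted(g for g, (lfc, fdr) in results.items() if is_significant(lfc, fdr))


# Hypothetical results for four NBS-LRR candidates after SA treatment.
results = {
    "nbs_1": (2.3, 0.001),    # up-regulated, significant
    "nbs_2": (-1.8, 0.020),   # down-regulated, significant
    "nbs_3": (0.6, 0.001),    # fold change too small
    "nbs_4": (3.1, 0.200),    # not significant after FDR correction
}
print(select_degs(results))
```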

In Dendrobium officinale, this approach identified 1,677 differentially expressed genes (DEGs) from SA-treated samples, including six NBS-LRR genes that showed significant up-regulation [9]. Similar studies in blackgram demonstrated that SA priming alters NBS-LRR expression patterns upon pathogen challenge, enhancing immunity against yellow mosaic disease [86].

Quantitative RT-PCR Validation

Transcriptome findings require validation through quantitative reverse transcription PCR (qRT-PCR), which provides precise measurement of expression changes for specific NBS-LRR genes. The standard protocol includes:

RNA Quality Verification: Assess RNA integrity using agarose gel electrophoresis or bioanalyzer systems.
cDNA Synthesis: Convert 1-2 μg of total RNA to cDNA using reverse transcriptase with oligo(dT) and random primers.
Primer Design: Design gene-specific primers (18-22 bp, Tm ~60°C, amplicon size 80-200 bp) for target NBS-LRR genes and reference genes (e.g., Actin, EF1α, GAPDH).
qPCR Amplification: Perform reactions in technical triplicates using SYBR Green or TaqMan chemistry on real-time PCR systems.
Data Analysis: Calculate relative expression using the 2^(-ΔΔCt) method with normalization to reference genes [87] [9].
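
The 2^(-ΔΔCt) calculation in the final step can be sketched as follows. The Ct values are hypothetical, and the formula assumes roughly 100% amplification efficiency for both the target and the reference gene.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ddCt) method.

    dCt = Ct(target) - Ct(reference) within each condition;
    ddCt = dCt(treated) - dCt(control).
    """
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical mean Ct values from technical triplicates.
fold = relative_expression(
    ct_target_treated=22.0, ct_ref_treated=18.0,
    ct_target_control=25.0, ct_ref_control=18.0,
)
print(fold)  # ddCt = 4 - 7 = -3, so expression is 2**3 = 8-fold higher
```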

In grass pea, researchers selected nine LsNBS genes for qPCR validation under salt stress, revealing that most were upregulated at 50 and 200 μM NaCl, whereas LsNBS-D18, LsNBS-D204, and LsNBS-D180 showed reduced expression or drastic downregulation [87].

Table 2: Experimentally Validated SA-Responsive NBS-LRR Genes

Plant Species	NBS-LRR Gene	Subfamily	Expression Response to SA	Proposed Function	References
Vigna mungo (Blackgram)	VrNBS_TNLRR-8	TNL	Significant up-regulation	YMD resistance	[86]
Vigna mungo (Blackgram)	VrLRR_RLK-20	RLK	Significant up-regulation	YMD resistance	[86]
Dendrobium officinale	Dof020138	CNL	Significant up-regulation	ETI system, multiple pathways	[9]
Dendrobium officinale	Dof013264	CNL	Significant up-regulation	ETI system	[9]
Dendrobium officinale	Dof020566	CNL	Significant up-regulation	ETI system	[9]
Salvia miltiorrhiza	SmNBS35/49/51	CNL	Up-regulated (cluster with RPH8A)	Hypersensitive response	[28]
Salvia miltiorrhiza	SmNBS55/56	CNL	Up-regulated (cluster with RPM1)	Pseudomonas resistance	[28]

Promoter Analysis and cis-Element Identification

The SA responsiveness of NBS-LRR genes is often reflected in their promoter architectures. Bioinformatic analyses of promoter regions (typically 1.5 kb upstream of translation start sites) reveal enrichment of cis-acting elements related to SA and to other hormone or stress signals:

TCA-elements: Responsive to SA
W-box motifs: Binding sites for WRKY transcription factors
TGA-elements: Auxin-responsive elements
G-box motifs: Involved in various stress responses [9] [25]
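
As a simplified illustration of cis-element scanning, the sketch below searches a promoter sequence for the W-box core (TTGACC or TTGACT) on both strands. It is a regular-expression stand-in, not the PlantCARE procedure used in the cited studies, and the example sequence is hypothetical.

```python
import re

# W-box core TTGAC[C/T] and its reverse complement [A/G]GTCAA.
# Lookahead groups allow overlapping matches to be reported.
W_BOX_FORWARD = re.compile(r"(?=(TTGAC[CT]))")
W_BOX_REVERSE = re.compile(r"(?=([AG]GTCAA))")


def find_w_boxes(promoter):
    """Return sorted (0-based start, strand) tuples for W-box core matches."""
    seq = promoter.upper()
    hits = [(m.start(), "+") for m in W_BOX_FORWARD.finditer(seq)]
    hits += [(m.start(), "-") for m in W_BOX_REVERSE.finditer(seq)]
    return sorted(hits)


# Hypothetical promoter fragment with one W-box on each strand.
promoter = "AAATTGACCGGGGGTCAATT"
print(find_w_boxes(promoter))
```

Counting such matches across all NBS-LRR promoters and comparing against a background gene set is one way to judge whether an element is actually enriched rather than merely present.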

In Nicotiana benthamiana, promoter analysis of 156 NBS-LRR genes detected 29 shared kinds of cis-elements and 4 kinds unique to irregular-type NBS-LRR genes, indicating potential upstream regulation factors [25]. Similarly, analysis in Dendrobium officinale revealed an abundance of cis-acting elements related to plant hormones and abiotic stress in NBS-LRR promoters [9]. These elements enable fine-tuned transcriptional responses to SA signaling and other hormonal cues, allowing coordinated regulation of defense pathways.

The Scientist's Toolkit: Essential Research Reagents and Materials

Successful investigation of SA-responsive NBS-LRR genes requires specialized reagents and methodologies. The following table summarizes essential research tools for experimental validation:

Table 3: Research Reagent Solutions for SA-Responsive NBS-LRR Studies

Reagent/Material	Specification	Application	Function	References
Salicylic Acid	0.5-2.0 mM in appropriate solvent	Plant treatment	Defense pathway induction	[86] [9]
TRIzol Reagent	Phenol-guanidine isothiocyanate	RNA extraction	Maintains RNA integrity	[87] [9]
Reverse Transcriptase	M-MLV or similar	cDNA synthesis	First-strand cDNA generation	[87]
SYBR Green Master Mix	Optimized for qPCR	qRT-PCR	Fluorescent detection of amplicons	[87] [9]
HMM Profile	PF00931 (NB-ARC)	Bioinformatics	NBS domain identification	[28] [25]
MEME Suite	Version 5.4.1	Bioinformatics	Conserved motif discovery	[25] [85]
PlantCARE Database	Online tool	Bioinformatics	cis-element prediction	[25] [85]

Integrated Workflow for NBS-LRR Gene Analysis

A comprehensive approach to characterizing SA-responsive NBS-LRR genes incorporates both bioinformatic and experimental methodologies. The integrated workflow spans from initial genome mining to functional validation.

Figure 2: Integrated Workflow for SA-Responsive NBS-LRR Gene Analysis

As depicted in Figure 2, the analytical pipeline begins with comprehensive genome mining using hidden Markov models (HMM) based on the NB-ARC domain (PF00931) to identify NBS-encoding genes [28] [25] [85]. Subsequent classification based on N-terminal domains (TIR, CC, RPW8) and C-terminal LRR domains organizes genes into subfamilies, while motif analysis reveals conserved structural elements [25] [85]. Promoter analysis identifies cis-regulatory elements that potentially mediate SA responsiveness [9] [25].
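
The classification step can be illustrated with a small rule-based function. This is a minimal sketch that assumes domain calls (for example from an HMM or domain-database search) have already been reduced to a set of labels per protein; the label strings, the function name, and the precedence given to TIR over RPW8 over CC are illustrative assumptions, not the scheme of any cited study.

```python
def classify_nbs_architecture(domains):
    """Map a collection of domain labels for one protein to a subfamily code.

    Returns None when no NB-ARC domain is present, since the protein would
    not count as NBS-encoding under this simplified scheme.
    """
    found = {d.upper() for d in domains}
    if "NB-ARC" not in found:
        return None
    # RPW8 is checked before CC because RPW8 domains are themselves CC-like.
    if "TIR" in found:
        prefix = "T"
    elif "RPW8" in found:
        prefix = "R"
    elif "CC" in found:
        prefix = "C"
    else:
        prefix = ""
    suffix = "L" if "LRR" in found else ""
    return f"{prefix}N{suffix}"


if __name__ == "__main__":
    # Hypothetical domain calls for three proteins.
    examples = {
        "geneA": ["TIR", "NB-ARC", "LRR"],
        "geneB": ["CC", "NB-ARC"],
        "geneC": ["NB-ARC", "LRR"],
    }
    for gene, doms in examples.items():
        print(gene, classify_nbs_architecture(doms))
```

Truncated architectures (TN, CN, NL, N) fall out of the same rules, which is convenient when tallying irregular-type genes separately from full-length TNL, CNL, and RNL members.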

The experimental phase incorporates SA treatment followed by transcriptome sequencing to identify differentially expressed NBS-LRR genes [86] [9]. qRT-PCR validation confirms expression patterns of candidate genes [87] [9]. Functional characterization may include pathway analysis through co-expression networks (e.g., WGCNA), which in Dendrobium officinale revealed that the SA-responsive gene Dof020138 connects to pathogen recognition pathways, MAPK signaling, plant hormone signal transduction, and energy metabolism pathways [9].
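
For the qRT-PCR validation step, relative expression is often computed with the 2^-ΔΔCt method. The cited studies are not stated here to have used this exact calculation, so the sketch below is an assumption; it also assumes roughly 100% amplification efficiency for both the target and the reference gene. All Ct values are invented for illustration.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene (treated vs. control) by 2^-ddCt."""
    # Normalize the target gene to the reference gene within each condition.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalized values between conditions.
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values for an NBS-LRR gene after SA treatment.
    fold = relative_expression(22.0, 18.0, 25.0, 18.0)
    print(f"fold change: {fold:.1f}")
```

A fold change above 1 indicates upregulation after SA treatment and a value below 1 indicates downregulation; in practice, replicate Ct values would be averaged before this calculation.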

The integration of NBS-LRR genes into SA-mediated defense pathways represents a crucial mechanism in plant immunity. Through comprehensive genome-wide analyses and expression validation studies, researchers have identified specific NBS-LRR genes that respond to SA induction across diverse plant species. These SA-responsive genes typically display promoter architectures enriched in defense-related cis-elements and encode proteins with characteristic domain arrangements that enable their function as intracellular immune receptors.

The experimental methodologies outlined in this technical guide—from transcriptome sequencing under SA treatment conditions to qRT-PCR validation and promoter analysis—provide a robust framework for identifying and characterizing additional SA-responsive NBS-LRR genes. The conserved domain architecture of these proteins, particularly the NB-ARC and LRR domains, facilitates their roles in pathogen recognition and defense signaling. As research progresses, the manipulation of SA-responsive NBS-LRR genes through breeding or biotechnology offers promising avenues for enhancing disease resistance in crop plants, potentially reducing yield losses and decreasing dependence on chemical pesticides.

Future investigations should focus on elucidating the precise molecular mechanisms through which SA regulates NBS-LRR expression and activity, and how different NBS-LRR subfamilies integrate SA signals with other defense hormones. Such research will further illuminate the sophisticated networks underlying plant immunity and provide additional tools for crop improvement strategies.

The co-evolutionary arms race between plants and their pathogens represents one of the most dynamic processes in molecular evolution, driving exceptional genetic diversity in host immune systems. This conflict centers largely on plant nucleotide-binding site leucine-rich repeat (NBS-LRR) proteins, which function as specialized pathogen sensors. These proteins evolve under intense diversifying selection that preferentially targets specific functional domains, creating structural variation that determines pathogen recognition capabilities. This technical review examines the molecular mechanisms and evolutionary forces shaping NBS-LRR gene diversity, with particular emphasis on domain architecture patterns and their functional consequences. We integrate genomic analyses, experimental methodologies, and structural predictions to provide researchers with a comprehensive framework for studying plant-pathogen coevolution.

Plant-pathogen interactions follow an evolutionary arms race model wherein advances in pathogen virulence mechanisms select for corresponding adaptations in host defense systems [88]. This dynamic creates strong selective pressures that drive molecular evolution at an accelerated pace, particularly in genes encoding pathogen recognition proteins. The majority of plant disease resistance (R) genes encode NBS-LRR proteins, which constitute one of the largest and most variable gene families in plant genomes [12]. These proteins function as intracellular immune receptors that detect pathogen effector molecules either directly or through their effects on host proteins [12].

The evolutionary conflict between plants and pathogens manifests primarily through two interconnected recognition systems: PAMP-triggered immunity (PTI) and effector-triggered immunity (ETI). PTI represents the first layer of induced defense, activated upon recognition of pathogen-associated molecular patterns (PAMPs) by surface-localized pattern recognition receptors (PRRs) [88]. In response, pathogens have evolved effector proteins that suppress PTI, leading to the evolution of ETI, where NBS-LRR proteins recognize specific pathogen effectors or their cellular effects [88]. This zig-zag model of escalating defense and counter-defense establishes the fundamental framework for understanding the diversifying selection pressures operating on plant immune receptors.

NBS-LRR Domain Architecture and Classification

NBS-LRR proteins are characterized by a conserved tripartite domain structure that facilitates their role in pathogen sensing and defense activation. These large proteins (860-1,900 amino acids) contain distinct functional domains joined by linker regions [12].

Structural Domains and Their Functions

Amino-terminal domain: This variable domain determines protein-protein interactions and signaling pathway specificity. Two major classes exist: TIR (Toll/interleukin-1 receptor) domains with similarity to Drosophila Toll and mammalian interleukin-1 receptors, and CC (coiled-coil) domains that form helical structures [12] [89].
NBS (Nucleotide-Binding Site) domain: Also called NB-ARC (nucleotide binding adaptor shared by APAF-1, R proteins, and CED-4), this domain contains conserved motifs characteristic of the STAND family of ATPases [12]. It functions as a molecular switch, with ATP binding and hydrolysis regulating conformational changes that control downstream signaling [12].
LRR (Leucine-Rich Repeat) domain: This carboxy-terminal region consists of tandemly arrayed repeats that typically form a solenoid structure with a solvent-exposed surface, facilitating protein-protein interactions [12]. The LRR domain is primarily responsible for pathogen recognition specificity [48] [89].

Table 1: Major Classes of Plant NBS-LRR Proteins

Class	N-terminal Domain	Signaling Pathway	Phylogenetic Distribution	Representative Genes
TNL	TIR (Toll/Interleukin-1 Receptor)	EDS1/PAD4-dependent	Dicots (absent from cereals)	L (flax), RPP1 (Arabidopsis)
CNL	CC (Coiled-Coil)	Often NDR1-dependent; NRC-dependent in some Solanaceae	All angiosperms	RPS2 (Arabidopsis), I2 (tomato)
RNL	RPW8-like CC	Helper function	Limited subclade	ADR1 (Arabidopsis)

Genomic Organization and Phylogenetic Distribution

NBS-LRR encoding genes are numerous and ancient in origin, with approximately 150 members in Arabidopsis thaliana, over 400 in rice (Oryza sativa), and potentially more in larger plant genomes [12]. These genes are frequently organized in complex clusters resulting from both segmental and tandem duplications [12] [89]. Phylogenetic analyses reveal that TNLs are completely absent from cereal genomes, suggesting lineage-specific loss or diversification [12]. Different plant families show distinct patterns of NBS-LRR gene amplification, with species-specific expansions observed in legumes, Solanaceae, and Asteraceae [12].

Molecular Evolution of NBS-LRR Genes

Evolutionary Mechanisms and Selection Patterns

NBS-LRR genes evolve through a birth-and-death process characterized by repeated gene duplication, sequence diversification, and pseudogenization [12] [89]. This evolutionary dynamic creates heterogeneous rates of evolution even within individual gene clusters. Genomic studies in lettuce and coffee have identified two evolutionary patterns: Type I genes evolve rapidly with frequent sequence exchange between paralogs, while Type II genes evolve slowly with conserved orthology relationships [89].

The different domains of NBS-LRR proteins experience distinct selective pressures. The NBS domain evolves under purifying selection that maintains conserved structural motifs required for nucleotide binding and hydrolysis [12] [89]. In contrast, the LRR domain shows evidence of diversifying selection, particularly at codons encoding solvent-exposed residues that potentially interact with pathogen effectors [48] [12] [89]. This pattern of heterogeneous selection maximizes recognition diversity while preserving signaling functionality.

Table 2: Evolutionary Forces Acting on NBS-LRR Gene Domains

Protein Domain	Primary Evolutionary Force	Functional Constraint	Evidence
Amino-terminal (TIR/CC)	Purifying selection with episodic diversification	Protein-protein interactions in signaling	Moderate sequence conservation with lineage-specific variation
NBS (NB-ARC)	Strong purifying selection	Nucleotide binding and hydrolysis	Conserved motifs across plant lineages
LRR	Diversifying selection on solvent-exposed residues	Pathogen recognition specificity	Elevated ω (dN/dS) ratios in β-sheet residues
Linker regions	Relaxed selection	Structural flexibility	High sequence divergence

Gene Duplication and Sequence Exchange

Multiple genetic mechanisms generate variation in NBS-LRR gene clusters:

Unequal crossing-over: Increases or decreases copy number within tandem arrays [12]
Ectopic recombination: Exchanges sequences between non-allelic genes [48]
Gene conversion: Transfers sequence patches between paralogs, creating chimeric genes [89]
Domain shuffling: Recombines functional modules between genes

These processes create substantial variation in LRR number and sequence. With approximately 14 LRRs per protein and multiple sequence variants for each repeat, the potential for recognition diversity is enormous: potentially more than 9×10¹¹ variants in Arabidopsis alone [12].

Experimental Analysis of Diversifying Selection

Genomic Approaches for Detecting Selection

Comparative genomic analysis provides powerful tools for identifying diversifying selection in NBS-LRR genes. The following workflow outlines a standard approach:

Protocol 1: Detection of Diversifying Selection in NBS-LRR Genes

Sequence Acquisition and Alignment
- Obtain NBS-LRR coding sequences from genomic or transcriptomic data
- Perform multiple sequence alignment using codon-aware algorithms (e.g., PRANK, MACSE)
- Visually inspect alignment quality and adjust manually if necessary
Selection Analysis using CodeML (PAML package)
- Calculate non-synonymous (dN) to synonymous (dS) substitution rate ratios (ω = dN/dS)
- Compare nested models: M1a (nearly neutral) vs. M2a (positive selection); M7 (beta) vs. M8 (beta+ω)
- Identify positively selected sites using Bayes Empirical Bayes analysis
- Apply false discovery rate correction for multiple testing
Structural Mapping of Selected Sites
- Map positively selected residues to protein models using homology modeling
- Determine if selected sites cluster in solvent-exposed regions of LRR domains
- Correlate with functional data from mutagenesis studies
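
The model comparison and multiple-testing steps of Protocol 1 can be sketched as follows. The example assumes the log-likelihoods of the null and alternative models have already been read from CodeML output, and that the test statistic is compared against a chi-square distribution with 2 degrees of freedom, the value conventionally used for M1a vs. M2a and M7 vs. M8. The function names and the numbers are hypothetical.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Likelihood ratio test p-value for nested site models (df = 2).

    For 2 degrees of freedom the chi-square survival function has the
    closed form exp(-x / 2), so no statistics library is needed.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return math.exp(-stat / 2.0)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical (lnL_M1a, lnL_M2a) pairs for four NBS-LRR gene alignments.
    fits = [(-1000.0, -990.0), (-2500.0, -2499.5),
            (-1800.0, -1794.0), (-900.0, -900.0)]
    raw = [lrt_pvalue(null, alt) for null, alt in fits]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p = {p:.3g}, BH-adjusted = {q:.3g}")
```

Genes whose adjusted p-value falls below the chosen threshold would then proceed to Bayes Empirical Bayes site identification and structural mapping.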

Functional Validation of Selected Variants

Site-directed mutagenesis provides critical experimental validation of computationally identified selection sites. The following protocol tests the functional significance of positively selected residues:

Protocol 2: Functional Analysis of Positively Selected Sites

Mutagenesis Construct Design
- Select candidate residues with significant evidence of positive selection
- Design mutagenic primers to alter selected codons (alanine substitutions recommended)
- Use overlap extension PCR or commercial mutagenesis kits
Transient Expression Assays
- Clone wild-type and mutant R genes into appropriate binary vectors
- Deliver constructs into susceptible plant genotypes via Agrobacterium infiltration
- Challenge with pathogen isolates differing in corresponding Avr genes
- Quantify cell death response and defense marker gene expression
Protein Interaction Studies
- Express wild-type and mutant LRR domains as recombinant proteins
- Perform yeast two-hybrid or surface plasmon resonance with pathogen effectors
- Compare binding affinities between variants
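
The construct design step of Protocol 2 can be prototyped in silico before primers are ordered. The sketch below swaps the codon of a chosen residue for an alanine codon in a coding sequence. It is an illustration only: the function name, the default GCT codon, and the toy sequence are assumptions, and real primer design would also need flanking sequence, melting temperature, and codon usage checks.

```python
ALANINE_CODONS = {"GCT", "GCC", "GCA", "GCG"}


def substitute_alanine(cds, residue, ala_codon="GCT"):
    """Return the CDS with the codon at a 1-based residue set to alanine."""
    if ala_codon not in ALANINE_CODONS:
        raise ValueError("ala_codon must be one of GCT, GCC, GCA, GCG")
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of three")
    start = (residue - 1) * 3
    if residue < 1 or start + 3 > len(cds):
        raise IndexError("residue position lies outside the CDS")
    return cds[:start] + ala_codon + cds[start + 3:]


if __name__ == "__main__":
    toy_cds = "ATGAAACTGTAA"  # hypothetical: Met-Lys-Leu-stop
    print(substitute_alanine(toy_cds, 2))  # Lys codon replaced by GCT
```

Applying the function to each residue flagged by the selection analysis yields the set of mutant sequences to carry into the transient expression and interaction assays.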

The Scientist's Toolkit: Key Research Reagents

Table 3: Essential Reagents for Studying NBS-LRR Evolution

Reagent/Category	Specific Examples	Function/Application	Technical Notes
Reference Genomes	Arabidopsis Col-0, Rice Nipponbare, Barley MorexV3	Comparative genomics, gene family identification	High-quality assemblies essential for repetitive NBS-LRR regions
Selection Analysis Software	PAML (CodeML), HyPhy, Datamonkey	Detection of diversifying selection	CodeML allows site-specific, branch-specific, and branch-site tests
Structural Prediction Tools	I-TASSER, Phyre2, AlphaFold2	Protein structure modeling from sequence	Mapping selected sites to structural models
Heterologous Expression Systems	Nicotiana benthamiana, Yeast two-hybrid	Functional characterization of R genes	N. benthamiana useful for transient expression assays
Pathogen Isolates	Characterized Pseudomonas syringae, Hyaloperonospora arabidopsidis	Phenotypic validation of R gene function	Differing Avr gene profiles enable specificity testing
Mutagenesis Platforms	CRISPR-Cas9, Site-directed mutagenesis kits	Functional validation of selected sites	CRISPR enables genome editing in diverse plant species

Case Study: Evolution of the Coffee SH3 Resistance Locus

The coffee SH3 locus, which confers resistance to coffee leaf rust (Hemileia vastatrix), provides an exemplary case study of NBS-LRR evolution. Comparative analysis of the SH3 region in three coffee genomes (C. arabica subgenomes Ca and Ea, and C. canephora genome Cc) revealed 5, 3, and 4 R genes, respectively, all belonging to the CNL class [89]. These genes shared >95% identity but no orthologs were found in syntenic regions of other eudicots, indicating lineage-specific expansion [89].

Molecular evolutionary analysis demonstrated that the SH3-CNL family evolves under a birth-and-death model, with duplication/deletion events shaping the locus over time [89]. Gene conversion between paralogs and inter-subgenome sequence exchanges contribute to diversification, while positive selection acts on solvent-exposed residues of the LRR domain [89]. This case illustrates how multiple evolutionary mechanisms operate concurrently to generate recognition diversity at a single resistance locus.

The study of diversifying selection pressures in plant-pathogen arms races has revealed fundamental principles of molecular evolution while providing practical insights for crop improvement. The domain architecture of NBS-LRR genes represents an evolutionary compromise between structural conservation for signaling functionality and hypervariability for pathogen recognition. Future research should focus on integrating evolutionary genomics with functional studies to predict recognition specificities from sequence variation and engineer broad-spectrum resistance. The development of genome editing technologies now enables direct manipulation of NBS-LRR genes, potentially allowing researchers to accelerate the evolutionary process to create durable disease resistance in crop plants.

Conclusion

The intricate domain architecture of NBS genes forms the cornerstone of the plant immune system, exhibiting remarkable diversity through 168 documented classes and species-specific patterns. This structural complexity, driven by continuous evolutionary innovation, provides a vast genetic toolkit for pathogen recognition. Advances in deep learning and comparative genomics are now enabling researchers to navigate this complexity, overcoming historical challenges in gene annotation and validation. The successful transfer of functional NLR pairs across taxonomic boundaries demonstrates the potential for engineering broad-spectrum, durable disease resistance in crops. Future research must focus on elucidating the molecular mechanisms of non-canonical NBS architectures, leveraging AI-driven prediction tools for genome-wide resistance gene discovery, and translating this knowledge into practical breeding solutions to enhance global food security.