Evolutionary Dynamics of NBS-LRR Genes: Gains, Losses, and Adaptive Innovation in Plant Immunity

Skylar Hayes Nov 27, 2025

Abstract

This article provides a comprehensive analysis of the evolutionary patterns of Nucleotide-Binding Site-Leucine-Rich Repeat (NBS-LRR) genes, the largest family of plant disease resistance genes. We explore the foundational principles of NBS-LRR classification and distribution across plant lineages, revealing significant lineage-specific expansions and contractions. The review covers advanced methodologies for gene family identification and functional validation, including genome-wide screens and virus-induced gene silencing. We address key challenges in studying these dynamic genes and present comparative analyses of distinct evolutionary patterns across species. Synthesizing findings from recent studies on medicinal plants, crops, and trees, this resource is tailored for researchers and scientists seeking to understand plant-pathogen co-evolution and apply these insights to disease resistance breeding and sustainable agriculture.

The Plant Immune Repertoire: Understanding NBS-LRR Diversity and Evolutionary Origins

Structural Architecture and Functional Classification of NBS-LRR Proteins

Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) proteins represent the largest class of disease resistance (R) genes in plants, playing a pivotal role in the innate immune system by conferring resistance to diverse pathogens including bacteria, fungi, viruses, oomycetes, and nematodes [1] [2]. These proteins function as intracellular immune receptors that detect pathogen effector proteins and initiate effector-triggered immunity (ETI), often culminating in a hypersensitive response (HR) to limit pathogen spread [3] [2]. The NBS-LRR gene family is one of the largest and most variable gene families in plants, with significant structural diversity and evolutionary dynamics driven by constant selective pressure from rapidly evolving pathogens [4] [2]. This technical guide examines the structural architecture and functional classification of NBS-LRR proteins within the broader context of NBS gene loss and gain across plant lineages, providing researchers and drug development professionals with a comprehensive framework for understanding this critical component of plant immunity.

Structural Architecture of NBS-LRR Proteins

Core Domain Organization

NBS-LRR proteins are large, multi-domain proteins typically ranging from approximately 860 to 1,900 amino acids in length [2]. They share a characteristic tripartite domain architecture consisting of:

Variable N-terminal domain: Involved in signaling and protein-protein interactions
Central Nucleotide-Binding Site (NBS) domain: Responsible for ATP/GTP binding and hydrolysis
C-terminal Leucine-Rich Repeat (LRR) domain: Mediates pathogen recognition specificity

These proteins belong to the STAND (signal transduction ATPases with numerous domains) family of ATPases, functioning as molecular switches in disease signaling pathways [1] [2]. Plant NBS-LRR proteins exhibit similarity in domain organization to mammalian NOD-LRR proteins, though this appears to be the result of convergent evolution rather than shared ancestry [2].

N-terminal Domain Variants

The N-terminal domain displays significant structural variation that forms the basis for primary classification of NBS-LRR proteins:

Table 1: Major N-terminal Domain Types in NBS-LRR Proteins

Domain Type	Key Features	Signaling Pathway	Phylogenetic Distribution
TIR (Toll/Interleukin-1 Receptor)	~175 amino acids with four conserved motifs; predicted α/β structure	EDS1-dependent [2]	Absent in monocots; present in most dicots [5] [2]
CC (Coiled-Coil)	Coiled-coil motif common but not always present in first 175 amino acids	EDS1-independent [2]	Universal across angiosperms [2]
RPW8 (Resistance to Powdery Mildew 8)	Found in RNL subclass; functions downstream in signaling	Acts as signal transducer for TNLs and CNLs [6]	Less common; identified in specific lineages [6]

The TIR domain is thought to be involved in protein-protein interactions, potentially with guarded host proteins or downstream signaling components [2]. Polymorphism in the TIR domain of the flax TNL protein L6 affects pathogen recognition specificity, highlighting its functional importance [2].

Nucleotide-Binding Site (NBS) Domain

The central NBS domain (also called NB-ARC domain) contains several highly conserved motifs that facilitate nucleotide binding and hydrolysis:

Table 2: Conserved Motifs in the NBS Domain

Motif Name	Conserved Sequence	Functional Role	Subfamily Variations
P-loop	GxGKT/S	Phosphate binding loop for ATP/GTP binding	GIGKST in nTNLs; GIGKTE in TNLs [5]
RNBS-A	V/VLLEVIGxIxNxND	Nucleotide binding	Distinct sequences in TNL vs. non-TNL [5] [2]
Kinase-2	KGPRxLVLVDDVWx	Catalytic activity	KGPRYLVVVDDIWRID in nTNLs [5]
RNBS-B	NGSRILLxTRxTxVxxYxS	Unknown function	NGSRILLTTRETKVAMYAS in nTNLs [5]
RNBS-C	LxLxLxWGxLx	Structural stability	LLNLENGWKLLRDKVF in nTNLs [5]
GLPL	CxGLPLA	Domain packing and activation	CQGLPL in nTNLs [5]
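
As a rough illustration of how a consensus from Table 2 can be used in practice, the P-loop pattern GxGKT/S can be transcribed into a regular expression and scanned against a protein sequence. This is a minimal sketch: the function name find_p_loops and the toy sequences are hypothetical, and a bare regular expression will also match unrelated proteins, so real screens rely on profile HMMs covering the whole NB-ARC domain.

```python
import re

# Direct transcription of the P-loop consensus "GxGKT/S" from Table 2,
# where x stands for any residue. Illustrative simplification only.
P_LOOP = re.compile(r"G.GK[TS]")


def find_p_loops(protein_seq):
    """Return (0-based start, matched residues) for each P-loop-like match."""
    seq = protein_seq.upper()
    return [(m.start(), m.group()) for m in P_LOOP.finditer(seq)]


# Hypothetical toy sequences that embed the subfamily variants listed in
# Table 2 (GIGKST for nTNLs, GIGKTE for TNLs); they are not real proteins.
toy_ntnl = "MAAAGIGKSTLLLL"
toy_tnl = "MSSGIGKTELVVV"

print(find_p_loops(toy_ntnl))  # [(4, 'GIGKS')]
print(find_p_loops(toy_tnl))   # [(3, 'GIGKT')]
```

Both variants match the same core pattern; telling TNLs from non-TNLs would require inspecting the residues that follow the match or, more robustly, the N-terminal domain.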

Specific binding and hydrolysis of ATP has been experimentally demonstrated for the NBS domains of tomato CNLs I2 and Mi [2]. ATP hydrolysis is thought to induce conformational changes that regulate downstream signaling, with the NBS domain functioning as a molecular switch between inactive (ADP-bound) and active (ATP-bound) states [1] [2].

Leucine-Rich Repeat (LRR) Domain

The C-terminal LRR domain is characterized by:

Variable number of repeat units (approximately 14 on average)
Solvent-exposed β-sheets that form a potential binding surface
High sequence diversity, especially in solvent-exposed residues
Involvement in protein-protein interactions and pathogen recognition specificity [2]

The LRR domain displays signatures of diversifying selection with elevated ratios of non-synonymous to synonymous nucleotide substitutions, particularly in solvent-exposed residues, consistent with its role in pathogen recognition [2] [3]. Unequal crossing-over and gene conversion have generated variation in LRR number and position, contributing to the extensive diversity of recognition specificities [2].

Functional Classification and Phylogenetics

Major Subfamilies and Distribution

NBS-LRR genes are classified into distinct subfamilies based on their N-terminal domains and domain architecture:

Table 3: NBS-LRR Gene Subfamily Distribution Across Plant Species

Plant Species	Total NBS-LRR Genes	TNL Genes	CNL/nTNL Genes	Other/Truncated	Key Evolutionary Pattern
Capsicum annuum (pepper)	252	4	248 (nTNL)	200 lack both CC and TIR	"Shrinking" pattern [5] [6]
Vernicia fordii (tung tree)	90	0	90 (49 with CC)	66 without LRR	TNL loss in eudicot [3] [7]
Vernicia montana (tung tree)	149	12	137 (98 with CC)	125 without LRR	Retention of TNLs [3] [7]
Fragaria spp. (strawberry)	1134 across 6 species	Variable TNLs	Variable non-TNLs	Multiple domain combinations	Lineage-specific duplication [8]
Arachis hypogaea (peanut)	713	229	118 CC, 26 with both TIR & CC	348 with LRR domains	LRR domain loss [9]
Arabidopsis thaliana	~150	~62 TNL	~88 CNL	21 TN, 5 CN	Reference genome [2]

The distribution of NBS-LRR subfamilies varies significantly across plant lineages. TNL genes are completely absent from monocot genomes and have been lost independently in some eudicot lineages, including Vernicia fordii and Sesamum indicum [3] [2]. Comparative analyses have revealed a greater prevalence of nTNL genes in angiosperms, with significant losses of TNL genes in monocots [5].

Structural Classification Based on Domain Architecture

NBS-LRR genes display diverse domain architectures beyond the typical TNL and CNL structures:

In pepper (Capsicum annuum):

N-type: Contains only NB-ARC domain (172 genes)
NL-type: NB-ARC + LRR_8 domains (11 genes)
NLL-type: NB-ARC + two LRR_8 domains (2 genes)
NN-type: Two NB-ARC domains (8 genes)
NLN-type: NB-LRR + NB-ARC domains (7 genes)
NLNLN-type: NB-LRR + NB-LRR + NB-ARC domains (1 gene)
TN-type: TIR + NB-ARC domains (4 TNL genes) [5]

In tung trees (Vernicia spp.):

CC-NBS-LRR, NBS-LRR, CC-NBS, and NBS in susceptible V. fordii
Additional TIR-containing types (TIR-NBS-LRR, CC-TIR-NBS, TIR-NBS) in resistant V. montana [3] [7]

This diversity in domain architecture reflects the dynamic evolution of resistance genes and their functional specialization across plant lineages.

Genomic Distribution and Evolutionary Dynamics

Chromosomal Organization and Gene Clusters

NBS-LRR genes are frequently organized in clusters throughout plant genomes, resulting from both segmental and tandem duplications [5] [2]. In pepper, 54% of NBS-LRR genes form 47 gene clusters distributed unevenly across all chromosomes [5]. Similarly, non-random distribution with clustering is observed in tung trees, with concentrations on specific chromosomes (V. fordii: Vfchr2, Vfchr3, Vfchr9; V. montana: Vmchr2, Vmchr7, Vmchr11) [3].

These clusters represent hotspots for resistance gene evolution, driven by tandem duplications and genomic rearrangements that generate diversity through unequal crossing-over, sequence exchange, and gene conversion [5] [2]. This clustered organization facilitates the birth-and-death evolution model characterized by gene duplication and density-dependent purifying selection [2].
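
A minimal sketch of how clustered genes could be counted from gene coordinates. The cited studies' cluster criterion is not stated here, so the 200 kb gap threshold, the function name find_clusters, and the toy coordinates below are assumptions chosen only for illustration.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=200_000):
    """Group genes into clusters of two or more neighbours.

    genes: iterable of (gene_id, chromosome, start, end).
    max_gap: assumed maximum distance in bp between consecutive genes
             of one cluster (hypothetical threshold, not from the source).
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        current, prev_end = [], None
        for start, end, gene_id in sorted(by_chrom[chrom]):
            # A gap larger than max_gap closes the current group.
            if prev_end is not None and start - prev_end > max_gap:
                if len(current) >= 2:
                    clusters.append((chrom, current))
                current = []
            current.append(gene_id)
            prev_end = end if prev_end is None else max(prev_end, end)
        if len(current) >= 2:
            clusters.append((chrom, current))
    return clusters


# Hypothetical coordinates for four genes on two chromosomes.
toy_genes = [
    ("g1", "chr1", 1_000, 5_000),
    ("g2", "chr1", 50_000, 55_000),
    ("g3", "chr1", 900_000, 905_000),
    ("g4", "chr2", 100, 2_000),
]
clusters = find_clusters(toy_genes)
clustered = sum(len(members) for _, members in clusters)
print(clusters)                     # [('chr1', ['g1', 'g2'])]
print(clustered / len(toy_genes))   # 0.5
```

The clustered fraction is sensitive to the gap threshold, so figures such as the 54% reported for pepper are only comparable between studies that use the same criterion.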

Evolutionary Patterns Across Plant Lineages

Different plant families exhibit distinct evolutionary patterns of NBS-LRR genes:

In Rosaceae species:

Rubus occidentalis, Potentilla micrantha, Fragaria iinumae, and Gillenia trifoliata: "First expansion and then contraction"
Rosa chinensis: "Continuous expansion"
F. vesca: "Expansion followed by contraction, then further expansion"
Prunus species and Maleae species: "Early sharp expanding to abrupt shrinking" [6]

In Solanaceae:

Potato: "Consistent expansion" pattern
Tomato: "Expansion followed by contraction" pattern
Pepper: "Shrinking" pattern [6]

In Fabaceae:

Medicago truncatula, pigeon pea, common bean, and soybean: "Consistently expanding" pattern [6]

These diverse evolutionary patterns reflect varying selective pressures from pathogen communities and different genomic evolutionary mechanisms across plant lineages.

Selective Pressures and Evolutionary Rates

Different NBS-LRR subfamilies experience distinct selective pressures:

TNL vs. non-TNL evolution: TNLs show significantly higher Ks (synonymous substitution rate) and Ka/Ks (ratio of nonsynonymous to synonymous substitution rates) values than non-TNLs, indicating more rapid evolution driven by stronger diversifying selection [8]
Domain-specific selection: LRR domains experience diversifying selection, particularly in solvent-exposed residues, while NBS domains are subject to purifying selection [2]
Type I vs. Type II genes: Type I genes evolve rapidly with frequent gene conversions, while Type II genes evolve slowly with rare gene conversion events [2]

These differential evolutionary rates contribute to the functional diversification of NBS-LRR genes and their adaptation to recognize specific pathogens.
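
The Ka/Ks ratio mentioned above is conventionally read against a neutral expectation of 1. The sketch below only illustrates that reading, assuming Ka and Ks have already been estimated with a dedicated codon-based method. The function name classify_selection and the tolerance band around 1 are arbitrary assumptions, and a real analysis would use a statistical test rather than a fixed cutoff.

```python
def classify_selection(ka, ks, tolerance=0.1):
    """Return (Ka/Ks, label) for a pair of pre-computed substitution rates.

    tolerance: assumed half-width of the band around 1 that is treated as
               approximately neutral (illustrative value, not from the source).
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    omega = ka / ks
    if omega > 1 + tolerance:
        label = "diversifying (positive) selection"
    elif omega < 1 - tolerance:
        label = "purifying selection"
    else:
        label = "approximately neutral"
    return omega, label


# Hypothetical rate pairs, for illustration only.
for ka, ks in [(0.30, 0.20), (0.05, 0.50), (0.21, 0.20)]:
    omega, label = classify_selection(ka, ks)
    print(f"Ka={ka} Ks={ks} Ka/Ks={omega:.2f} -> {label}")
```

Because selection often acts on a few sites only, as with the solvent-exposed LRR residues, a gene-wide ratio below 1 does not rule out diversifying selection at individual codons.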

Experimental Methodologies for NBS-LRR Gene Analysis

Genome-Wide Identification Protocols

Step 1: Initial Gene Identification

Perform BLAST searches against whole-genome coding sequences using NB-ARC domain (PF00931) as query with E-value ≤ 10⁻⁴ [8]
Conduct HMMER searches using NB-ARC HMM profiles from Pfam against whole-genome protein sequences [8] [6]
Merge results from both approaches and eliminate redundancies
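
The merging in Step 1 can be sketched as a set union over the sequence identifiers reported by the two searches. The sketch assumes HMMER output written with --tblout and BLAST output in tabular format 6, and it assumes that coding-sequence and protein identifiers are identical, which depends on the annotation. The function names, the toy lines, and the reuse of the 10⁻⁴ cutoff for the HMMER hits are assumptions made for illustration.

```python
def _data_lines(lines):
    """Yield non-empty, non-comment lines without the trailing newline."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield line.rstrip("\n")


def hmmer_hits(lines, max_evalue=1e-4):
    """IDs from an hmmsearch --tblout table.

    Field 1 is the target name, field 5 the full-sequence E-value.
    """
    hits = set()
    for line in _data_lines(lines):
        fields = line.split()
        if float(fields[4]) <= max_evalue:
            hits.add(fields[0])
    return hits


def blast_hits(lines, max_evalue=1e-4):
    """Subject IDs from BLAST tabular output (outfmt 6).

    Field 2 is the subject ID, field 11 the E-value.
    """
    hits = set()
    for line in _data_lines(lines):
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            hits.add(fields[1])
    return hits


def merge_candidates(hmmer_ids, blast_ids):
    """Union of both hit sets; the set removes redundant entries."""
    return sorted(hmmer_ids | blast_ids)


# Hypothetical toy output lines, not real search results.
toy_hmmer = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "geneA  -  NB-ARC  PF00931  1.2e-50  170.1  0.1",
    "geneC  -  NB-ARC  PF00931  0.5  10.2  0.0",
]
toy_blast = [
    "NB-ARC\tgeneA\t48.0\t250\t120\t4\t1\t250\t20\t270\t1e-40\t150.0",
    "NB-ARC\tgeneB\t45.0\t250\t130\t5\t1\t250\t10\t260\t3e-20\t95.0",
]
print(merge_candidates(hmmer_hits(toy_hmmer), blast_hits(toy_blast)))
# ['geneA', 'geneB']
```

Taking the union keeps candidates found by only one method; they are then confirmed or discarded in the domain validation of Step 2.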

Step 2: Domain Validation and Classification

Verify NB-ARC domain presence using Pfam analysis (E-value 10⁻⁴) [6]
Identify LRR motifs using SMART protein motif analysis to improve accuracy [8]
Classify N-terminal domains (TIR, CC, RPW8) using Pfam, COILS, and NCBI-CDD [8] [6]
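
Once domains have been annotated, the classification in Step 2 reduces to a lookup on which domains are present. The sketch below is one possible encoding, not the scheme of any cited study: the domain labels, the function name classify_nbs, and the rule that TIR takes priority over CC and RPW8 when several N-terminal domains co-occur are assumptions.

```python
def classify_nbs(domains):
    """Map a collection of domain labels to an architecture code.

    Labels are assumed to be normalised to "TIR", "CC", "RPW8",
    "NB-ARC", and "LRR". Returns None if no NB-ARC domain is present.
    """
    present = set(domains)
    if "NB-ARC" not in present:
        return None

    # Assumed priority when several N-terminal domains are annotated.
    n_term = ""
    for name, letter in (("TIR", "T"), ("CC", "C"), ("RPW8", "R")):
        if name in present:
            n_term = letter
            break

    return n_term + "N" + ("L" if "LRR" in present else "")


# Hypothetical annotations for four genes.
toy_annotations = {
    "gene1": ["TIR", "NB-ARC", "LRR"],
    "gene2": ["CC", "NB-ARC"],
    "gene3": ["NB-ARC"],
    "gene4": ["RPW8", "NB-ARC", "LRR"],
}
for gene, doms in toy_annotations.items():
    print(gene, classify_nbs(doms))
# gene1 TNL
# gene2 CN
# gene3 N
# gene4 RNL
```

Genes carrying both TIR and CC domains, as reported for peanut, would need their own category in a real analysis rather than the priority rule used here.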

Step 3: Structural and Phylogenetic Analysis

Extract NBS domain sequences for multiple sequence alignment using MUSCLE or MAFFT [8] [4]
Construct phylogenetic trees using Maximum Likelihood methods (FastTree, MEGA) with 1000 bootstrap replicates [8] [4]
Analyze conserved motifs using MEME suite with parameters set to identify 10 motifs [6]

Functional Characterization Methods

Expression Profiling

Analyze RNA-seq data from databases (IPF, CottonFGD, Cottongen) under biotic and abiotic stresses [4]
Calculate FPKM values and categorize expression patterns into tissue-specific, abiotic stress-specific, and biotic stress-specific profiles [4]
Compare expression between susceptible and resistant varieties to identify candidate genes
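
For reference, FPKM (fragments per kilobase of transcript per million mapped fragments) follows the standard definition: fragment count × 10⁹ / (transcript length in bp × total mapped fragments). A minimal sketch with hypothetical input values:

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments, 2,000 bp transcript,
# library of 20 million mapped fragments.
print(fpkm(500, 2_000, 20_000_000))  # 12.5
```

FPKM values are normalised within a sample, so comparisons between resistant and susceptible varieties generally need additional between-sample normalisation.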

Functional Validation via VIGS

Design specific gene fragments (300-500 bp) for cloning into virus-induced gene silencing (VIGS) vectors
Infect plants with Agrobacterium tumefaciens containing VIGS constructs
Challenge silenced plants with pathogens and assess disease symptoms
Quantify pathogen biomass and monitor expression of defense marker genes [3] [7]

Genetic Variation Analysis

Identify sequence variants between resistant and susceptible accessions
Map variants to protein domains to identify potential functional polymorphisms
Correlate specific variants with disease resistance phenotypes

Signaling Pathways and Immune Mechanisms

NBS-LRR Signaling Pathways

The diagram illustrates the core signaling mechanisms of NBS-LRR proteins. According to the "guard hypothesis," NBS-LRR proteins monitor plant host proteins for modifications by pathogen effector proteins [5]. Upon effector recognition, typically through detection of changes in the guarded protein, the NBS domain undergoes conformational changes through ATP/GTP binding and hydrolysis, switching from inactive (ADP-bound) to active (ATP-bound) states [1] [2]. This activation triggers downstream signaling through distinct pathways for TNL and CNL subfamilies, ultimately leading to defense activation including hypersensitive response and programmed cell death [3] [2].

Research Reagent Solutions and Essential Materials

Table 4: Essential Research Tools for NBS-LRR Gene Analysis

Reagent/Resource	Specific Examples	Function/Application	Key Features
Database Resources	Pfam (PF00931), NCBI-CDD, SMART	Domain identification and validation	Curated HMM profiles for NB-ARC, LRR, TIR, CC domains [8] [6]
Bioinformatics Tools	HMMER, MEME suite, COILS, OrthoFinder	Motif finding, coiled-coil prediction, orthogroup analysis	Identifies conserved motifs, protein families, evolutionary relationships [8] [4] [6]
Sequence Analysis Software	MUSCLE, MAFFT, MEGA, FastTree	Multiple sequence alignment, phylogenetic reconstruction	Evolutionary analysis, tree building with bootstrap support [8] [4]
Genomic Databases	Strawberry GARDEN, Rosaceae GDR, Phytozome	Genome sequences and annotations	Species-specific genomic data for comparative analyses [8] [6]
Expression Databases	IPF, CottonFGD, Cottongen, NCBI BioProject	RNA-seq data for expression profiling	Tissue-specific, stress-responsive expression patterns [4]
Functional Validation Tools	VIGS vectors, Agrobacterium tumefaciens	Gene silencing and functional characterization	Determining gene function in plant-pathogen interactions [3] [7] [4]

The structural architecture and functional classification of NBS-LRR proteins reveal a highly dynamic and evolutionarily sophisticated plant immune receptor system. The modular domain structure, with variable N-terminal domains, conserved NBS domains, and diverse LRR domains, provides both structural stability and recognition flexibility. The extensive genomic clustering of NBS-LRR genes and their birth-and-death mode of evolution enable rapid adaptation to changing pathogen populations. Distinct evolutionary patterns across plant lineages, including lineage-specific gene duplications and losses, reflect different pathogen pressures and evolutionary strategies. The functional specialization between TNL and CNL subfamilies, with their distinct signaling pathways, further highlights the complexity of this immune receptor system. Continuing research on NBS-LRR gene loss and gain across plant lineages provides crucial insights into plant-pathogen coevolution and offers potential strategies for enhancing crop disease resistance through marker-assisted breeding and biotechnological approaches.

Phylogenetic Distribution of TNL, CNL, and RNL Subfamilies Across Plant Lineages

The nucleotide-binding site leucine-rich repeat (NBS-LRR) gene family, also referred to as NLRs, constitutes the largest and most prominent class of plant disease resistance (R) genes, playing a critical role in effector-triggered immunity (ETI) by recognizing pathogen-secreted effectors and initiating robust immune responses [10] [11]. These intracellular immune receptors are characterized by a conserved central NB-ARC (nucleotide-binding adaptor shared by APAF-1, R proteins, and CED-4) domain and a C-terminal leucine-rich repeat (LRR) region. Based on their N-terminal domain structures, NLR genes are phylogenetically divided into three principal subfamilies: TNL (Toll/Interleukin-1 Receptor domain), CNL (Coiled-Coil domain), and RNL (RPW8 domain) [12] [10]. The distribution and abundance of these subfamilies vary tremendously across plant lineages, shaped by a complex interplay of evolutionary pressures including pathogen co-evolution, ecological adaptation, and genomic constraints [12]. This in-depth technical guide synthesizes current research to elucidate the patterns of NLR gene loss and gain across the plant kingdom, providing a framework for understanding the evolutionary dynamics of plant innate immunity.

Results and Discussion

Evolutionary Dynamics and Genomic Distribution of NLR Subfamilies

The NLR gene family exhibits remarkable lineage-specific expansion and contraction, with copy numbers differing up to 66-fold among closely related species due to rapid gene loss and gain [12]. Genomic analyses reveal that NLR genes are often distributed unevenly across chromosomes, frequently forming clusters in specific genomic regions, which facilitates the generation of diversity through recombination and unequal crossing-over [3] [13]. Duplication mechanisms play a crucial role in NLR evolution, with studies in maize revealing subtype-specific preferences: canonical CNL genes largely originate from dispersed duplications, while N-type genes are enriched in tandem duplications [14]. Evolutionary rate analysis further demonstrates that whole-genome duplication (WGD)-derived genes undergo strong purifying selection (low Ka/Ks), whereas tandem and proximal duplications show signs of relaxed or positive selection, driving functional diversification [14].

Table 1: NLR Subfamily Distribution Across Major Plant Lineages

Plant Lineage	Species Example	TNL	CNL	RNL	Total NLRs	Key Features
Eudicots	Arabidopsis thaliana	Present (~40 TNLs)	Present (~61 CNLs)	Present (1 RNL)	207 [10] [15]	Balanced subfamily representation
Monocots	Oryza sativa (Rice)	Absent [11]	505 [10]	Present	505 [10]	Complete TNL loss
Tung Trees	Vernicia fordii	Absent [3]	90 (54.4% with CC)	Not reported	90 [3]	TNL loss in susceptible species
	Vernicia montana	12 (8.1%) [3]	137 non-TNL (98 with CC; 65.8% of total)	Not reported	149 [3]	Retention of TNL in resistant species
Orchids	Dendrobium officinale	Absent [11]	10 CNL-type	12 non-TNL	74 [11]	TIR domain degeneration common in monocots
Conifers	Picea mariana	Present	Present	Highly diversified	725 [16]	Most diverse RNL repertoire
Salvia	Salvia miltiorrhiza	2 (marked reduction)	61 CNL	1 RNL	196 [10]	Notable TNL/RNL degeneration
Asparagus	Asparagus officinalis	Not specified	Not specified	Not specified	27 [17]	Domesticated (NLR contraction)
	Asparagus setaceus	Not specified	Not specified	Not specified	63 [17]	Wild relative (expanded NLR)
Akebia	Akebia trifoliata	19 TNL	50 CNL	4 RNL	73 [13]	Relatively balanced subfamilies

Lineage-Specific Patterns of NLR Subfamily Distribution

TNL Subfamily: Loss in Monocots and Selective Contraction

The TNL subfamily demonstrates the most striking phylogenetic pattern, characterized by its complete absence in monocotyledonous plants. Systematic analyses across numerous species confirm that no TNL-type genes exist in monocots such as rice (Oryza sativa), wheat (Triticum aestivum), and maize (Zea mays) [10] [11]. TNL loss is not confined to monocots: it has also been reported in certain eudicots, including sesame (Sesamum indicum) and the susceptible tung tree species Vernicia fordii [3]. Research suggests that TNL loss may be driven by deficiencies in the NRG1/SAG101 pathway, whose components are essential for TNL signaling [11]. The ANNA (angiosperm NLR atlas) database further reveals a co-evolutionary pattern between NLR subclasses and plant immune pathway components, consistent with the idea that immune pathway deficiencies drive TNL loss [12].

CNL Subfamily: The Dominant NLR Class

The CNL subfamily represents the most widespread and numerous NLR class across land plants. In monocots, which lack TNLs entirely, CNLs constitute the predominant NLR type, comprising 100% of the typical NBS-LRR genes in species like rice [10]. Even in eudicots that retain TNLs, CNLs often represent the majority of NLR genes, as observed in Akebia trifoliata (50 CNLs vs. 19 TNLs) and Salvia miltiorrhiza (61 CNLs vs. 2 TNLs) [10] [13]. CNLs demonstrate remarkable functional diversity, with specific members directly recognizing pathogen effectors. For example, the rice CNL protein Pita recognizes the effector AVR-Pita of the rice blast fungus through its LRR domain, activating immune signaling pathways [10].

RNL Subfamily: Conserved Helpers with Lineage-Specific Diversification

The RNL subfamily, while typically the smallest in most angiosperms, comprises crucial helper proteins that act downstream of sensor NLRs (both TNLs and CNLs) in immune signaling [16]. RNLs are subdivided into two conserved subclades based on homology: NRG1 (N requirement gene 1) and ADR1 (activated disease resistance gene 1) [13]. Interestingly, conifers possess an exceptionally diverse and numerous RNL repertoire unparalleled in other land plants, with four distinct RNL groups identified, two of which differ from those of angiosperms [16]. This RNL expansion in conifers may represent an evolutionary adaptation to their long lifespan and persistent exposure to pathogens. Furthermore, conifer RNLs respond to abiotic stress, with several RNL sequences upregulated under drought, suggesting potential dual functionality in biotic and abiotic stress responses [16].

Association Between NLR Repertoire and Ecological Adaptation

Comparative genomic analyses reveal significant associations between NLR gene content and ecological adaptation strategies. The ANNA database demonstrates that NLR contraction is particularly associated with adaptations to specialized lifestyles such as aquatic, parasitic, and carnivorous habits [12]. This convergent NLR reduction in aquatic plants notably resembles the lack of NLR expansion observed in green algae before the colonization of land, suggesting that reduced pathogen pressure in aquatic environments may relax selection maintaining expanded NLR repertoires [12]. Similarly, domestication processes often lead to NLR contraction, as observed in garden asparagus (Asparagus officinalis), which possesses only 27 NLR genes compared to 63 in its wild relative Asparagus setaceus [17]. This reduction in the NLR repertoire during domestication is frequently accompanied by increased disease susceptibility, highlighting the trade-off between immunity and selection for agronomic traits.

Table 2: Methodologies for NLR Gene Identification and Characterization

Method Category	Specific Technique	Application	Key Parameters
Gene Identification	HMMER/HMM Search [3] [17]	Identify NLR genes using conserved NB-ARC domain	Pfam PF00931 (NB-ARC), E-value ≤ 1e-5 [17]
	BLASTp Analysis [17] [13]	Cross-species NLR identification	E-value cutoff 1e-10 [17], reference NLR sequences
Domain Characterization	InterProScan [17]	Protein domain analysis	Multiple database search
	NCBI CD-Search [17] [13]	Conserved domain identification	E-value 1e-5 [17]
	MEME Suite [17] [13]	Conserved motif prediction	Motif count: 10, width: 6-50 aa [13]
Classification	Pfam/PRGdb 4.0 [17]	Subfamily classification	TIR (PF01582), RPW8 (PF05659), LRR (PF08191)
	Coiled-coil prediction [13]	CC domain identification	Threshold 0.5
Evolutionary Analysis	OrthoFinder [17]	Orthologous group clustering	Normalized BLAST bit scores
	MCScanX [17]	Synteny and collinearity analysis	Gene positional information
	MEGA [17]	Phylogenetic tree construction	Maximum likelihood, JTT model, 1000 bootstraps

Materials and Methods

Genome-Wide Identification of NLR Genes

The standard workflow for comprehensive NLR identification involves a dual approach combining Hidden Markov Model (HMM)-based searches and homology-based methods. First, HMM searches are performed using the conserved NB-ARC domain (Pfam: PF00931) as query against the target proteome [3] [17]. Simultaneously, local BLASTp analyses are conducted using reference NLR protein sequences from well-characterized species such as Arabidopsis thaliana, Oryza sativa, and other relevant taxa, applying a stringent E-value cutoff of 1e-10 [17]. Candidate sequences identified through both methods are subsequently validated through rigorous domain architecture analysis using tools like InterProScan and NCBI's Batch CD-Search, retaining only sequences containing the NB-ARC domain (E-value ≤ 1e-5) as bona fide NLR genes [17]. Final classification is performed by querying the Pfam and PRGdb 4.0 databases, with genes categorized based on their complete domain architecture [17].

Structural and Phylogenetic Analysis

For structural characterization, conserved motifs within NBS domains can be predicted using the MEME suite with the motif number typically set to 10 while maintaining default parameters [17] [13]. Gene structures are subsequently analyzed through GSDS 2.0 (Gene Structure Display Server), and promoter regions (typically 2000 bp upstream of the initial codon) are examined for cis-regulatory elements using PlantCARE [17] [11]. Phylogenetic analysis involves consolidating protein sequences of candidate NLR genes from multiple species, performing multiple sequence alignment using Clustal Omega, and constructing phylogenetic trees using the maximum likelihood method based on the JTT matrix-based model implemented in MEGA software [17]. Bootstrap analysis with 1000 replicates provides statistical support for tree nodes [17].
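
The promoter definition above (2000 bp upstream of the initial codon) depends on strand orientation. The following minimal sketch computes the interval in 1-based inclusive coordinates, assuming the supplied start and end delimit the coding sequence so that the start codon lies at the 5' end of the feature. The function name promoter_interval and the example coordinates are hypothetical.

```python
def promoter_interval(cds_start, cds_end, strand, upstream=2000, chrom_length=None):
    """Return (start, end) of the upstream region, 1-based inclusive.

    For "+" genes the region precedes cds_start; for "-" genes it follows
    cds_end on the reference. Returns None if no upstream sequence exists.
    """
    if strand == "+":
        start = max(1, cds_start - upstream)
        end = cds_start - 1
    elif strand == "-":
        start = cds_end + 1
        end = cds_end + upstream
        if chrom_length is not None:
            end = min(end, chrom_length)  # clip at the chromosome end
    else:
        raise ValueError("strand must be '+' or '-'")
    return (start, end) if end >= start else None


# Hypothetical coordinates.
print(promoter_interval(5001, 7000, "+"))   # (3001, 5000)
print(promoter_interval(5001, 8000, "-"))   # (8001, 10000)
print(promoter_interval(1, 900, "+"))       # None
```

Sequence extracted for minus-strand genes must be reverse-complemented before it is submitted to a cis-element search.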

Expression and Functional Analysis

Expression patterns of NLR genes can be investigated using available transcriptome data under various conditions, including pathogen infection, hormone treatment, and across different tissues or developmental stages [17] [11] [13]. For functional validation, Virus-Induced Gene Silencing (VIGS) has been successfully employed, as demonstrated in tung trees where silencing of specific NLR genes confirmed their role in Fusarium wilt resistance [3]. Additionally, co-expression networks (WGCNA) can identify NLR genes connected to specific immune pathways, such as MAPK signaling, plant hormone signal transduction, and biosynthetic pathways [11].

Diagram Title: Comprehensive Workflow for NLR Gene Family Analysis

Table 3: Essential Research Resources for NLR Studies

Resource Category	Specific Tool/Resource	Function/Application	Access/Reference
Databases	ANNA (Angiosperm NLR Atlas) [12]	Comparative NLR genomics across 300+ angiosperms	https://biobigdata.nju.edu.cn/ANNA/
	Pfam Database	Protein domain family identification	http://pfam.xfam.org/
	PRGdb 4.0 [17]	Plant Resistance Gene database	http://prgdb.org/prgdb4/plants/
	PlantCARE [17] [11]	Cis-acting regulatory element prediction	http://bioinformatics.psb.ugent.be/webtools/plantcare/html/
Software Tools	HMMER Suite [3] [17]	Hidden Markov Model-based sequence analysis	http://hmmer.org/
	MEME Suite [17] [13]	Motif discovery and analysis	https://meme-suite.org/meme/
	TBtools [17]	Bioinformatics analysis and visualization	https://github.com/CJ-Chen/TBtools
	MEGA [17]	Molecular Evolutionary Genetics Analysis	https://www.megasoftware.net/
	OrthoFinder [17]	Orthogroup inference and comparative genomics	https://github.com/davidemms/OrthoFinder
Experimental Methods	VIGS (Virus-Induced Gene Silencing) [3]	Functional validation of NLR genes	Protocol-dependent
	RNA-seq Analysis	Expression profiling of NLR genes	Platform-dependent
	SMRT/RenSeq [15]	Long-read sequencing for NLR characterization	Platform-dependent

Concluding Remarks

The phylogenetic distribution of TNL, CNL, and RNL subfamilies across plant lineages reveals a complex evolutionary history marked by repeated events of gene loss and gain, lineage-specific expansions and contractions, and adaptations to ecological niches. The consistent absence of TNLs in monocots and the convergent NLR reduction in aquatic plants and domesticated species highlight the dynamic nature of the plant immune repertoire. Future research directions should focus on elucidating the functional consequences of specific NLR losses, particularly the compensatory mechanisms that allow monocots to maintain effective immunity without TNLs. The development of comprehensive databases like ANNA provides powerful resources for comparative analyses, while advancing methodologies in genome sequencing and gene editing will enable functional validation of NLR candidates across diverse plant lineages. Understanding these evolutionary patterns not only illuminates fundamental plant biology but also informs strategies for enhancing crop resistance through breeding and biotechnology.

Diagram Title: NLR Subfamily Roles in Plant Immune Signaling

The nucleotide-binding site and leucine-rich repeat (NBS-LRR) genes represent the largest and most critical class of plant disease resistance (R) genes, encoding intracellular receptors that recognize pathogen-secreted effectors to initiate effector-triggered immunity (ETI) [10]. These genes exhibit remarkable evolutionary dynamism, with significant lineage-specific expansions and losses occurring throughout plant evolutionary history. Understanding these patterns is particularly crucial for medicinal plant research and crop improvement strategies, as the evolution of these genes directly shapes a plant's immune repertoire [10] [6].

This technical review examines the macroevolutionary dynamics of NBS-LRR genes across major plant lineages, with particular focus on the distinct patterns observed between gymnosperms and angiosperms. We synthesize recent genomic evidence to elucidate the evolutionary forces driving gene family expansion and contraction, provide detailed methodological frameworks for NBS-LRR identification and analysis, and discuss the implications of these evolutionary patterns for plant immunity and specialized metabolism in medicinal species.

Comparative Evolutionary Dynamics of NBS-LRR Genes

Phylogenetic Distribution and Subfamily Divergence

The NBS-LRR gene family demonstrates striking lineage-specific variation in subfamily composition and gene content across plant phylogeny. Based on N-terminal domain structure, NBS-LRR genes are classified into three main subfamilies: TNL (TIR-NBS-LRR), CNL (CC-NBS-LRR), and RNL (RPW8-NBS-LRR) [18]. The distribution of these subfamilies reveals profound evolutionary divergence between major plant lineages.

Table 1: NBS-LRR Subfamily Distribution Across Plant Lineages

Plant Group	Species	Total NBS-LRR	CNL	TNL	RNL	Notable Patterns
Gymnosperms	Pinus taeda	311 (typical)	~10.7%	~89.3%	-	Massive TNL expansion
Monocots	Oryza sativa	505	100%	0%	0%	Complete TNL/RNL loss
Eudicots	Arabidopsis thaliana	207	Mixed	Mixed	Mixed	Balanced subfamilies
Medicinal Plants	Salvia miltiorrhiza	196 (62 typical)	61	0	1	Severe TNL reduction
Rosaceae	Various species	2188 (across 12 species)	Variable	Variable	Variable	Independent duplication/loss events

Gymnosperms, represented by Pinus taeda, show strong TNL dominance: this subclass comprises approximately 89.3% of typical NBS-LRR genes [10]. Monocots such as Oryza sativa (rice) stand in stark contrast, having lost both the TNL and RNL subfamilies and retaining only CNL-type genes [10]. Angiosperms vary considerably in NBS-LRR content. The medicinal plant Salvia miltiorrhiza (Danshen) shows a particularly dramatic reduction in TNL and RNL members: only 2 TNL and 1 RNL genes were identified among its 196 NBS-LRR candidates, and the 62 typical genes listed in Table 1 comprise 61 CNLs, 1 RNL, and no TNL [10].

Evolutionary Patterns Across Plant Families

Recent genome-wide comparative analyses have revealed distinct evolutionary patterns of NBS-LRR genes across plant families, suggesting different evolutionary trajectories and selective pressures.

Table 2: Evolutionary Patterns of NBS-LRR Genes in Various Plant Families

Plant Family	Representative Species	Evolutionary Pattern	Key Characteristics
Poaceae	Rice, Maize, Sorghum	Contracting	Overall reduction in NBS-LRR numbers
Fabaceae	Medicago, Soybean, Common Bean	Consistent Expansion	Progressive increase in gene numbers
Solanaceae	Potato, Tomato, Pepper	Variable: Expansion/Contraction	Species-specific patterns
Rosaceae	Apple, Strawberry, Peach	Multiple distinct patterns	Range from "continuous expansion" to "sharp expansion followed by contraction"
Cucurbitaceae	Cucumber, Melon, Watermelon	Dominant loss and deficient duplication	Low copy numbers across species

The Rosaceae family presents particularly compelling case studies of diverse evolutionary patterns. Among 12 Rosaceae species analyzed, researchers identified multiple distinct evolutionary trajectories: Rubus occidentalis, Potentilla micrantha, Fragaria iinumae, and Gillenia trifoliata displayed a "first expansion and then contraction" pattern; Rosa chinensis exhibited "continuous expansion"; F. vesca showed "expansion followed by contraction, then further expansion"; while three Prunus species and three Maleae species shared an "early sharp expanding to abrupt shrinking" pattern [6].

These dynamic evolutionary patterns reflect independent gene duplication and loss events during Rosaceae divergence from 102 inferred ancestral genes (7 RNLs, 26 TNLs, and 69 CNLs) [6]. The substantial variation in NBS-LRR gene numbers across Rosaceae species—ranging from dozens to hundreds—highlights the remarkable plasticity of this gene family and its rapid adaptation to lineage-specific pathogenic pressures [6].

Genomic and Methodological Framework

Genome-Wide Identification Protocols

The identification and characterization of NBS-LRR genes across plant genomes follow a systematic bioinformatics workflow that combines multiple complementary approaches.

Data Retrieval and Preparation

Obtain whole genome sequences and annotation files from relevant databases (e.g., Genome Database for Rosaceae, NCBI)
Extract protein-coding sequences and corresponding genomic DNA sequences [6]

Initial Gene Identification

Perform BLASTP searches using known NBS domains (e.g., NB-ARC domain PF00931) as queries with an E-value threshold of 1.0 [18] [6]
Conduct parallel HMMER searches with the hidden Markov model of the NB-ARC domain (PF00931) using default parameters [6]
Merge candidate genes from both approaches and remove redundant hits [18]
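
The search-and-merge steps above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the cited studies. It assumes BLASTP was run with tabular output (`-outfmt 6`, where column 2 is the subject ID and column 11 the E-value) and that `hmmsearch` was run with `--tblout` (target name in the first column). The function names and the demo records are hypothetical.

```python
def blast_hits(lines, max_evalue=1.0):
    """Collect subject IDs from BLASTP tabular output (-outfmt 6) that pass the E-value cutoff."""
    hits = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:  # column 11 = E-value
            hits.add(fields[1])              # column 2 = subject (target protein) ID
    return hits


def hmmer_hits(lines):
    """Collect target names from an hmmsearch --tblout file (comment lines start with '#')."""
    hits = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        hits.add(line.split()[0])
    return hits


def merge_candidates(blast_lines, hmmer_lines, max_evalue=1.0):
    """Union of both searches; using a set removes redundant hits."""
    return sorted(blast_hits(blast_lines, max_evalue) | hmmer_hits(hmmer_lines))


# Hypothetical demo records (gene IDs and scores are invented for illustration).
blast_demo = [
    "NBARC_query\tgeneA\t45.2\t250\t120\t5\t1\t250\t10\t260\t1e-30\t150",
    "NBARC_query\tgeneB\t25.0\t80\t55\t3\t1\t80\t5\t85\t0.5\t30",
    "NBARC_query\tgeneC\t22.0\t40\t30\t1\t1\t40\t3\t43\t4.0\t20",
]
hmmer_demo = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "geneA  -  NB-ARC  PF00931.25  2.1e-40  140.2  0.1",
    "geneD  -  NB-ARC  PF00931.25  3.0e-12  48.7  0.0",
]
print(merge_candidates(blast_demo, hmmer_demo))
```

In practice the line iterables would be open file handles; the permissive E-value of 1.0 is deliberate, since false positives are removed in the later domain-verification step.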

Domain Verification and Classification

Validate the presence of NBS domains against the Pfam database (http://pfam.sanger.ac.uk/) with an E-value cutoff of 10⁻⁴ [18] [6]
Confirm domain architecture using NCBI Conserved Domain Database (CDD) [18]
Identify N-terminal domains (CC, TIR, RPW8) using specialized tools:
- CC domains: Predict using Coiledcoil with threshold value of 0.5 [18]
- TIR and RPW8 domains: Identify via Pfam or CDD searches [6]
Classify genes into TNL, CNL, and RNL subclasses based on confirmed domain architecture [6]
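
Once domains are confirmed, the classification step reduces to a lookup on each protein's domain set. The sketch below is one possible encoding, assuming domain calls have already been normalized to the labels "TIR", "CC", "RPW8", "NB-ARC", and "LRR". The label names, the truncated classes (TN, CN, RN, NL, N), and the handling of proteins with more than one N-terminal domain are assumptions made for illustration.

```python
N_TERMINAL_PREFIX = {"TIR": "T", "CC": "C", "RPW8": "R"}


def classify_nbs(domains):
    """Return a subclass label such as 'TNL', 'CN', or 'NL' for one protein.

    domains: set of normalized domain labels detected in the protein.
    Returns None when no NB-ARC domain is present, and 'ambiguous' when
    more than one N-terminal domain type is detected (an assumption here;
    real pipelines may resolve such cases by manual inspection).
    """
    if "NB-ARC" not in domains:
        return None
    n_terminal = [d for d in N_TERMINAL_PREFIX if d in domains]
    if len(n_terminal) > 1:
        return "ambiguous"
    prefix = N_TERMINAL_PREFIX[n_terminal[0]] if n_terminal else ""
    suffix = "L" if "LRR" in domains else ""
    return prefix + "N" + suffix


print(classify_nbs({"TIR", "NB-ARC", "LRR"}))
print(classify_nbs({"CC", "NB-ARC"}))
```

Only proteins labeled TNL, CNL, or RNL would count as "typical" members in the sense used in Table 1; the shorter labels mark genes lacking an N-terminal or LRR domain.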

Comparative and Evolutionary Analysis

Perform multiple sequence alignment of NBS domains
Construct phylogenetic trees using appropriate methods (Maximum Likelihood, Neighbor-Joining)
Analyze gene structures (intron/exon patterns) using tools like GSDS2.0
Identify conserved motifs using MEME Suite with parameters set to discover 10 motifs [6]
Determine chromosomal distributions and gene clustering patterns
Investigate evolutionary patterns through synteny analysis and divergence time estimation

Table 3: Key Research Reagents and Computational Tools for NBS-LRR Analysis

Category	Resource/Reagent	Specification/Function	Application Context
Domain Databases	Pfam Database	Curated protein family HMMs (e.g., PF00931 for NB-ARC)	Domain verification and classification
Sequence Analysis	NCBI CDD	Conserved Domain Database for domain identification	Supplementary domain confirmation
Motif Discovery	MEME Suite	Multiple EM for Motif Elicitation (typically 10 motifs)	Identification of conserved NBS domain motifs
Genome Databases	Genome Database for Rosaceae	Species-specific genome sequences and annotations	Data retrieval for comparative analyses
Classification Tool	Coiled-coil Prediction	Threshold value: 0.5 for CC domain identification	CNL subclass specification
Structural Analysis	GSDS2.0	Gene Structure Display Server	Intron/exon structure visualization
HMM Profiles	InterPro	Integrated resource of protein families, domains, sites	Hidden Markov Model generation for domain searches

Macroevolutionary Dynamics and Theoretical Frameworks

Evolutionary Trajectories of Gene Family Complexity

Recent research on macroevolutionary dynamics across eukaryotic lineages reveals a common pattern in which gene family content peaks at major evolutionary transitions and then gradually decreases toward extant organisms [19]. This pattern appears consistent across diverse lineages, including deuterostomes (Homo sapiens), protostomes (Drosophila melanogaster), plants (Arabidopsis thaliana), and fungi (Saccharomyces cerevisiae) [19].

This evolutionary trajectory supports the "biphasic model" of genome complexity, which proposes that episodes of rampant increase in genome complexity through gene gain are followed by protracted periods of genome simplification through gene loss [19] [20]. Alternatively, the "complexity-by-subtraction model" predicts an initial rapid increase of complexity followed by decrease toward an optimum level over macroevolutionary time [19]. Both models suggest that simplification by gene family loss represents a dominant force in Phanerozoic genomes across various lineages, likely underpinned by intense ecological specializations and functional outsourcing [19].

For NBS-LRR genes, these macroevolutionary patterns manifest through lineage-specific expansions and contractions driven by differing selective pressures. Gymnosperms and angiosperms have experienced distinct evolutionary trajectories, with gymnosperms exhibiting lower rates of whole-genome duplication, fewer chromosomal rearrangements, and slower mutation rates compared to angiosperms [21]. These fundamental genomic differences have profoundly influenced the evolutionary dynamics of NBS-LRR genes in these lineages.

Methodological Approaches for Ancestral State Reconstruction

Understanding gene gain and loss patterns requires sophisticated statistical approaches for ancestral state reconstruction. Maximum likelihood methods have emerged as powerful tools for inferring gene content in ancestral species, including the Last Universal Common Ancestor (LUCA) [22].

These probabilistic models treat gene presence/absence as evolutionary states and estimate transition probabilities between states along phylogenetic branches. Advanced models incorporate multiple states representing not only gene presence/absence but also gene family size variations, providing more nuanced insights into evolutionary dynamics [22]. The crucial parameter in these models—the ratio of gene losses to gene gains—is typically estimated directly from genomic data, with empirical studies suggesting loss rates may be 2-4 times higher than gain rates in many lineages [22].
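
To make the role of the loss-to-gain ratio concrete, the sketch below implements the simplest case: a two-state (absent/present) continuous-time Markov chain with a constant gain rate and loss rate. This is an illustrative assumption, not the specific model of [22], which may use additional states for family size. The rate values are invented; the ratio of 3 simply falls inside the 2-4 range quoted above.

```python
import math


def transition_matrix(gain, loss, t):
    """Transition probabilities over a branch of length t.

    State 0 = gene absent, state 1 = gene present.
    Returns [[P00, P01], [P10, P11]] for a two-state continuous-time
    Markov chain with rates gain (0 -> 1) and loss (1 -> 0).
    """
    total = gain + loss
    if total <= 0:
        raise ValueError("gain + loss must be positive")
    decay = math.exp(-total * t)
    p01 = gain / total * (1.0 - decay)
    p10 = loss / total * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def stationary_presence(gain, loss):
    """Long-run probability that the gene is present."""
    return gain / (gain + loss)


# Illustrative rates: losses three times as frequent as gains.
P = transition_matrix(gain=0.1, loss=0.3, t=2.0)
print(P)
print(stationary_presence(0.1, 0.3))
```

Under this model a higher loss-to-gain ratio lowers the equilibrium probability of presence, so an observed gene in an extant genome carries more weight as evidence for presence in the ancestor. Likelihood calculations on a full tree multiply such matrices along branches.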

Functional and Practical Implications

Association with Secondary Metabolism and Stress Response

In medicinal plants like Salvia miltiorrhiza, NBS-LRR genes demonstrate intriguing connections to secondary metabolic pathways. Transcriptome analyses have revealed close associations between specific SmNBS-LRR genes and secondary metabolism, suggesting potential crosstalk between defense signaling and the production of bioactive compounds [10]. This relationship has significant implications for medicinal plant cultivation and metabolic engineering.

Promoter analyses of SmNBS genes have identified abundant cis-acting elements related to plant hormones and abiotic stress, indicating that these genes may integrate multiple signaling pathways to coordinate plant responses to both biotic and abiotic challenges [10]. This functional integration may explain the observed evolutionary patterns in medicinal plants, where specific NBS-LRR subfamilies have been preferentially retained or expanded based on their contributions to both defense and specialized metabolism.

Applications in Disease-Resistance Breeding

Understanding lineage-specific expansions and losses of NBS-LRR genes provides valuable insights for disease-resistance breeding programs. The identification of evolutionary patterns allows researchers to:

Prioritize specific NBS-LRR subfamilies for functional characterization based on their evolutionary history
Identify conserved, evolutionarily stable resistance genes that may provide more durable resistance
Develop markers for breeding programs based on phylogenetic relationships
Transfer knowledge from well-studied species to less-characterized crops based on evolutionary relationships

For non-model medicinal plants like Salvia miltiorrhiza, genome-wide analyses of NBS-LRR genes provide foundational resources for future functional characterization and molecular breeding efforts aimed at enhancing disease resistance while maintaining production of valuable secondary metabolites [10].

The evolutionary dynamics of NBS-LRR genes reveal a complex tapestry of lineage-specific expansions and losses across plant phylogeny. The striking contrast between gymnosperms, with their TNL-dominated repertoire, and angiosperms, with their diverse and variable NBS-LRR compositions, highlights the profound influence of evolutionary history on plant immune system architecture. These lineage-specific patterns reflect differing selective pressures, genomic constraints, and evolutionary trajectories that have shaped the genetic basis of plant immunity over millions of years.

The continued discovery and characterization of NBS-LRR genes across diverse plant lineages, coupled with advanced computational modeling of their evolutionary dynamics, will further illuminate the principles governing plant immunity evolution. This knowledge provides critical insights for managing plant diseases in agricultural and natural ecosystems, particularly in the face of changing climatic conditions and emerging pathogenic threats.

The Birth-and-Death Evolution Model and Genomic Organization Patterns

The nucleotide-binding site leucine-rich repeat (NBS-LRR) gene family represents one of the most extensive and dynamic gene families in plant genomes, serving as the primary source of disease resistance (R) genes against diverse pathogens [2] [23]. These genes encode proteins that function as critical intracellular immune receptors within the plant effector-triggered immunity (ETI) system, detecting pathogen effector proteins and initiating robust defense responses [24] [13]. The NBS-LRR family exhibits remarkable genetic diversity across plant species, with member counts ranging from approximately 50 in compact genomes like papaya and cucumber to over 1,000 in some flowering plants [1] [2] [23]. This striking variation in gene family size reflects a complex evolutionary history characterized by continuous gene gain and loss events—a process formally described as the birth-and-death evolution model [2].

Understanding the birth-and-death evolution and genomic organization of NBS genes provides crucial insights into plant-pathogen co-evolution and has significant implications for crop improvement strategies. This review synthesizes current knowledge of NBS gene evolutionary dynamics, genomic architecture, and regulatory mechanisms, framed within the context of broader research on NBS gene loss and gain across plant lineages. We further provide detailed methodologies for investigating these patterns and summarize key concepts and relationships in two diagrams.

The Molecular Architecture and Classification of NBS Genes

Domain Organization and Structural Features

NBS-LRR proteins constitute some of the largest proteins in plants, ranging from approximately 860 to 1,900 amino acids in length [2]. These proteins exhibit a characteristic multi-domain architecture with at least four distinct domains connected by linker regions:

Variable N-terminal domain: This domain determines membership in one of the two major subfamilies, defined by a Toll/interleukin-1 receptor (TIR) domain or a coiled-coil (CC) motif. A third, minor category carries an RPW8 domain [24] [13].
Nucleotide-Binding Site (NBS) domain: Also called the NB-ARC domain, this region contains several conserved motifs (P-loop, kinase-2, kinase-3a, GLPL, and MHDL) that function as a molecular switch through ATP/GTP binding and hydrolysis [2] [25].
Leucine-Rich Repeat (LRR) region: This C-terminal domain is highly variable, mediates protein-protein interactions, and is primarily responsible for pathogen recognition specificity [2].
Variable C-terminal domains: These regions show substantial diversity across family members [2].

Table 1: Major NBS Protein Subfamilies and Their Characteristics

Subfamily	N-terminal Domain	Representative Species	Evolutionary Patterns
TNL	TIR (Toll/Interleukin-1 Receptor)	Arabidopsis thaliana, Soybean	Prevalent in dicots; absent in most monocots
CNL	CC (Coiled-Coil)	All angiosperms	Conserved across monocots and dicots
RNL	RPW8 (Resistance to Powdery Mildew 8)	Limited distribution	Involved in downstream signaling

Genomic Distribution and Variation Across Plant Lineages

The number and composition of NBS gene subfamilies vary dramatically across plant species, reflecting lineage-specific evolutionary paths [23]. Genomic analyses have revealed several key patterns:

In dicot species, both TNL and CNL subfamilies are typically present, often with TNL genes predominating. For example, Arabidopsis thaliana and soybean genomes contain two-fold to six-fold more TNL than CNL genes [23]. Conversely, in monocot species including cereals, TNL genes are almost entirely absent, with CNL genes representing the predominant NBS-LRR class [24] [2] [23]. This fundamental difference suggests that early angiosperm ancestors possessed few TNL genes that were subsequently lost in the cereal lineage [2].

Recent research in orchids demonstrates additional patterns of NBS gene evolution. Studies in Dendrobium species revealed significant degeneration of NBS-LRR genes, with only 22 intact NBS-LRR genes identified from 74 putative NBS genes in D. officinale [24]. This degeneration pattern, characterized by type changing and NB-ARC domain degeneration, appears common in the genus Dendrobium and contributes substantially to NBS gene diversity [24].

Table 2: NBS Gene Distribution Across Selected Plant Species

Plant Species	Total NBS Genes	TNL Genes	CNL Genes	RNL Genes	Genome Size
Akebia trifoliata	73	19	50	4	-
Dendrobium officinale	74 (22 with LRR)	0	10	-	1.23 Gb
Gossypium raimondii (diploid)	365	47 (TN+TNL)	146 (CN+CNL)	21 (RN+RNL)	~880 Mb
Gossypium hirsutum (allotetraploid)	588	35 (TN+TNL)	297 (CN+CNL)	28 (RN+RNL)	~2.5 Gb
Arabidopsis thaliana	~150	~100	~50	-	~135 Mb
Oryza sativa	~400	0	~400	-	~364 Mb

The Birth-and-Death Evolution Model: Mechanisms and Evidence

Core Principles of the Birth-and-Death Model

The birth-and-death evolution model describes the continuous process of gene duplication, diversification, and loss that shapes the NBS gene family [2]. Under this model:

Gene duplication creates new genetic material through various mechanisms including tandem duplication, segmental duplication, and transpositional events [2] [23].
Diversifying selection acts preferentially on solvent-exposed residues of the LRR domains, generating recognition specificities against evolving pathogen effectors [2].
Purifying selection maintains essential functional domains while allowing variation in recognition specificities [2].
Gene loss occurs when specific recognition capacities become obsolete due to pathogen extinction or shifts in defense priorities [2].

This evolutionary process results in differential expansion of specific NBS lineages across plant families. For example, distinct NBS subfamilies have undergone amplification in legumes, Solanaceae, and Asteraceae, creating family-specific resistance gene repertoires [2].
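
The qualitative behavior of the model can be illustrated with a toy stochastic simulation. The sketch below is a deliberately simplified assumption: in each time step every gene copy duplicates with probability `birth` and is lost with probability `death`, independently and with no selection. The parameter values are invented and are not estimates from any cited study.

```python
import random


def simulate_family(n0, birth, death, steps, seed=0):
    """Simulate gene family size under a discrete birth-and-death process.

    Each step, every existing copy independently duplicates with
    probability `birth` and is lost with probability `death`.
    Returns the list of family sizes, starting with n0.
    """
    rng = random.Random(seed)
    size = n0
    trajectory = [size]
    for _ in range(steps):
        births = sum(rng.random() < birth for _ in range(size))
        deaths = sum(rng.random() < death for _ in range(size))
        size = size + births - deaths  # deaths <= size, so size stays non-negative
        trajectory.append(size)
    return trajectory


# Two hypothetical lineages starting from the same ancestral copy number.
print(simulate_family(n0=50, birth=0.06, death=0.05, steps=20, seed=1))
print(simulate_family(n0=50, birth=0.04, death=0.06, steps=20, seed=2))
```

Even with identical starting sizes, small differences in the balance of duplication and loss, or chance alone, produce divergent copy numbers, which is the kind of lineage-specific variation the model is meant to explain.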

Genomic Evidence for Birth-and-Death Evolution

Comparative genomic analyses provide compelling evidence for birth-and-death evolution. In lettuce, NBS genes display heterogeneous evolutionary rates classified as type I and type II genes [2]. Type I genes evolve rapidly with frequent gene conversion events between paralogs, while type II genes evolve more slowly with rare gene conversion events between clades [2]. This heterogeneous evolutionary rate supports a density-dependent birth-and-death process where gene duplication and unequal crossing-over are followed by purifying selection acting on the haplotype [2].

Allotetraploid cotton species demonstrate how birth-and-death evolution operates following hybridization events. Gossypium hirsutum and Gossypium barbadense carry 588 and 682 NBS genes, respectively, roughly 1.6 to 2.8 times the count in either diploid progenitor (G. arboreum: 246; G. raimondii: 365) and close to the progenitors' combined total of 611 [25]. However, this inheritance is asymmetric: G. hirsutum preferentially retained NBS genes from its G. arboreum progenitor, while G. barbadense retained more genes from its G. raimondii progenitor [25]. This asymmetric evolution correlates with disease resistance phenotypes, as G. raimondii and G. barbadense show greater resistance to Verticillium wilt, potentially linked to their higher retention of TNL genes [25].

Diagram 1: The Birth-and-Death Evolution Model of NBS Genes

Genomic Organization Patterns and Cluster Architecture

Chromosomal Distribution and Clustering Tendencies

NBS-LRR genes exhibit non-random, uneven distribution across plant chromosomes, with strong tendencies toward clustered organization [23] [25]. This clustering represents a fundamental genomic signature of the birth-and-death evolutionary process. The percentage of NBS genes organized in clusters varies significantly across species:

51% in Brachypodium distachyon [23]
50% in rice [23]
73% in potato (grouped into 63 clusters) [23]
Nearly 80% in Medicago truncatula [23]
56% in Akebia trifoliata (41 of 73 genes) [13]

Chromosomal distribution is typically asymmetric, with certain chromosomes harboring disproportionate numbers of NBS genes. For example, in Brachypodium distachyon, chromosome 4 contains approximately one-third of all NBS-LRR genes, while in Brassica rapa, chromosomes 3 and 9 contain more than half of the mapped NBS-LRR genes [23]. This uneven distribution reflects the location-specific nature of duplication events and selective pressures.
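
Cluster percentages like those listed above depend on how a cluster is defined, and the text does not state the criterion each study used. The sketch below assumes one simple rule: consecutive NBS genes on the same chromosome are chained into a cluster when the gap between them is at most `max_gap` base pairs, and a cluster needs at least two genes. The 200 kb default and the demo coordinates are assumptions made for illustration.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=200_000):
    """Group NBS genes into clusters by physical proximity.

    genes: iterable of (gene_id, chromosome, start, end).
    Returns a list of (chromosome, [gene_ids]) with at least two genes each.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        current, prev_end = [], 0
        for start, end, gene_id in sorted(by_chrom[chrom]):
            # Close the running group when the gap to this gene is too large.
            if current and start - prev_end > max_gap:
                if len(current) >= 2:
                    clusters.append((chrom, current))
                current = []
            prev_end = end if not current else max(prev_end, end)
            current.append(gene_id)
        if len(current) >= 2:
            clusters.append((chrom, current))
    return clusters


def clustered_fraction(genes, clusters):
    """Share of all input genes that fall inside a cluster."""
    genes = list(genes)
    return sum(len(members) for _, members in clusters) / len(genes)


demo = [
    ("g1", "chr1", 1_000, 5_000),
    ("g2", "chr1", 50_000, 55_000),
    ("g3", "chr1", 900_000, 905_000),
    ("g4", "chr2", 1_000, 4_000),
]
found = find_clusters(demo)
print(found, clustered_fraction(demo, found))
```

Because the threshold directly changes the reported percentage, comparisons across species are only meaningful when the same rule is applied to every genome.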

Cluster Types and Evolutionary Significance

NBS gene clusters are phylogenetically classified into two primary types:

Homogeneous clusters: Contain NBS-LRR genes derived from recent tandem duplication events, evidenced by phylogenetic grouping within species-wide gene trees [23].
Heterogeneous or mixed clusters: Contain NBS-LRR genes from different phylogenetic branches, resulting from ectopic duplication, transposition, or large-scale segmental duplication followed by local rearrangements [23].
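
Given a clade assignment for each gene, for example from a species-wide NBS gene tree, the two cluster types can be separated mechanically. The sketch below assumes such assignments are already available as a dictionary; the clade labels and gene IDs are hypothetical.

```python
def classify_cluster(members, clade_of):
    """Label a cluster by the phylogenetic origin of its members.

    members: list of gene IDs in one physical cluster.
    clade_of: dict mapping gene ID to a clade label from a gene tree.
    Returns 'homogeneous' if all members share one clade, else 'heterogeneous'.
    """
    clades = {clade_of[gene] for gene in members}
    return "homogeneous" if len(clades) == 1 else "heterogeneous"


clade_of = {"g1": "cladeA", "g2": "cladeA", "g3": "cladeB"}
print(classify_cluster(["g1", "g2"], clade_of))
print(classify_cluster(["g1", "g2", "g3"], clade_of))
```

How finely clades are delimited on the tree is a judgment call that this sketch leaves to the caller.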

Cluster organization facilitates evolutionary innovation through several mechanisms. Physical proximity enables frequent sequence exchange between paralogs through unequal crossing-over and gene conversion, generating novel recognition specificities [2]. This rapid diversification allows plant genomes to keep pace with evolving pathogen populations. Additionally, clusters may function as evolutionary reservoirs where multiple recognition specificities are maintained, providing broader spectrum resistance capabilities [23].

Methodologies for Investigating NBS Gene Evolution

Genomic Identification and Classification Protocols

Comprehensive identification of NBS genes requires integrated bioinformatic approaches:

Initial Identification: Perform BLASTP searches against target genomes using known NBS protein sequences (e.g., NB-ARC domain PF00931) as queries, with E-values typically set at 1.0 [13].
Domain Validation: Apply hidden Markov model (HMM) profiling using the NB-ARC domain (PF00931) to scan candidate genes, followed by Pfam database analysis to verify NBS domain presence (E-value threshold 10⁻⁴) [13].
Subfamily Classification: Analyze identified NBS sequences using multiple databases:
- NCBI Conserved Domain Database to identify TIR (PF01582), RPW8 (PF05659), and LRR (PF08191) domains [13]
- Coiled-coil domain prediction using tools like Coiledcoil with threshold 0.5 [13]
Structural Analysis: Predict conserved motifs in NBS domains using MEME Suite with parameters set to identify 8-10 motifs with widths ranging from 6-50 amino acids [13].

Evolutionary and Phylogenetic Analysis Methods

Phylogenetic Reconstruction: Construct gene trees using NBS protein sequences, particularly focusing on NBS domain regions, employing maximum likelihood or Bayesian methods [24] [25].
Evolutionary Rate Analysis: Calculate ratios of non-synonymous to synonymous nucleotide substitutions (dN/dS) to identify sites under diversifying selection, particularly in LRR domains [2].
Synteny Analysis: Compare genomic regions containing NBS genes across related species to identify orthologous relationships and evolutionary conservation [25].
Expression Profiling: Analyze RNA-seq data across tissues, developmental stages, and pathogen challenge conditions to identify functional constraints and regulatory patterns [24] [13].
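
As a small illustration of the quantity behind the evolutionary rate analysis, the sketch below counts synonymous and non-synonymous codon differences between two aligned, in-frame coding sequences using the standard genetic code. It is not a dN/dS estimator: a proper estimate also normalizes by the numbers of synonymous and non-synonymous sites and corrects for multiple substitutions, which dedicated software handles. For simplicity, codon pairs differing at more than one position are skipped, and the demo sequences are invented.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built by enumerating codons in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_substitutions(seq1, seq2):
    """Return (synonymous, non_synonymous) counts of single-nucleotide codon differences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, equal-length, and in frame")
    syn = nonsyn = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # skip gaps and ambiguous bases
        if sum(a != b for a, b in zip(c1, c2)) != 1:
            continue  # identical codons, or multi-site changes (skipped in this sketch)
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# CTT -> CTC keeps leucine; GAA -> GAT changes glutamate to aspartate.
print(count_substitutions("ATGCTTGAA", "ATGCTCGAT"))
```

An excess of non-synonymous over synonymous change, after proper normalization, is the signal of diversifying selection that such analyses look for in LRR-encoding regions.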

Table 3: Essential Research Reagents and Tools for NBS Gene Studies

Category	Specific Tool/Reagent	Application	Key Features
Bioinformatic Tools	HMMER 3.1b2	Domain identification	Hidden Markov Model profiling for NB-ARC domain
	MEME Suite	Motif discovery	Identifies conserved protein motifs
	Pfam Database	Domain verification	Curated protein family database
	NCBI CDD	Domain classification	Conserved Domain Database analysis
Genomic Resources	3D-GDP Database	3D genome analysis	Plant 3D-genome database with 26 species
	Micro-C-XL data	Chromatin organization	Nucleosome-resolution interaction maps
Experimental Methods	Micro-C-XL	Chromatin conformation	Maps fine-scale chromatin organization
	RNA-seq	Expression analysis	Transcriptome profiling under various conditions

Regulatory Mechanisms and Co-evolutionary Dynamics

miRNA-Mediated Regulation of NBS Genes

Plants implement sophisticated regulatory mechanisms to control NBS-LRR gene expression, particularly through miRNA-mediated pathways. At least eight families of miRNAs have been identified that target NBS-LRR genes, with these miRNA-NBS-LRR regulatory systems tracing back to gymnosperms [1]. These miRNAs typically target highly duplicated NBS-LRRs, while heterogeneous NBS-LRR families are rarely targeted by miRNAs in Poaceae and Brassicaceae genomes [1].

The miR482/2118 superfamily represents a conserved regulatory pathway that targets the P-loop motif of NBS-LRR genes [1]. This co-evolutionary relationship exhibits periodic emergence of new miRNAs from duplicated NBS-LRR sequences, with most newly emerged miRNAs targeting the same conserved protein motifs—a pattern consistent with convergent evolution [1]. Nucleotide diversity in the wobble position of codons within miRNA target sites drives miRNA diversification, creating a feedback loop between NBS-LRR sequence variation and regulatory miRNA evolution [1].

Chromatin Organization and Epigenetic Regulation

Three-dimensional genome architecture plays a crucial role in regulating NBS gene expression and evolution. Advanced chromatin conformation capture technologies like Micro-C-XL have revealed fine-scale chromatin organization in plants, identifying over 14,000 boundary elements in Arabidopsis that correlate with chromatin accessibility, epigenetic modifications, and transcription factor binding [26].

RNA polymerase II (Pol II) significantly influences local chromatin organization, with genetic and chemical perturbation experiments confirming Pol II's role in establishing local chromatin domains [26]. Enhancer-promoter loops and stripe structures observed through high-resolution chromatin interaction maps provide insights into long-range regulatory mechanisms controlling NBS gene expression [26]. Super-enhancers frequently associate with these visible chromatin loops, offering direct evidence for complex distal regulation of immune gene networks in plants [26].

Diagram 2: Integrated Regulatory Network Controlling NBS Gene Expression

The birth-and-death evolution model and genomic organization patterns of NBS genes represent a paradigm for understanding plant-pathogen co-evolution. The dynamic nature of this gene family—characterized by continuous gene duplication, functional diversification, and selective loss—enables plants to maintain effective immune recognition systems against rapidly evolving pathogens. Cluster-based genomic organization facilitates evolutionary innovation through enhanced sequence exchange and functional diversification.

Future research directions should focus on several key areas:

Integrating multi-omics data to connect NBS gene sequence variation with three-dimensional chromatin architecture, epigenetic modifications, and expression dynamics.
Elucidating the mechanistic basis of TNL gene loss in monocot lineages and its functional compensation by alternative resistance mechanisms.
Leveraging advanced genome editing technologies to engineer novel resistance specificities based on natural evolutionary principles.
Expanding comparative genomics to non-angiosperm plant species to reconstruct the deep evolutionary history of plant immune gene families.

Understanding these evolutionary patterns and genomic organization principles provides fundamental insights for crop improvement strategies, enabling more precise manipulation of disease resistance traits while minimizing potential fitness costs associated with NBS gene expression. The continued investigation of NBS gene birth-and-death evolution will undoubtedly yield both basic scientific insights and practical applications for sustainable agriculture.

Impact of Whole-Genome Duplication vs. Tandem Duplication on Family Expansion

Gene duplication is a fundamental driver of evolutionary innovation, with whole-genome duplication (WGD) and tandem duplication (TD) representing two predominant mechanisms with distinct evolutionary consequences. This technical analysis examines how these duplication modes differentially shape gene family expansion, functional diversification, and adaptive evolution in plants, with specific focus on nucleotide-binding site (NBS) resistance gene dynamics. Evidence across multiple plant lineages reveals that WGD events produce duplicates with significant functional retention in regulatory processes, while TD mechanisms generate genes preferentially involved in environmental adaptation and biotic stress responses. The contrasting evolutionary fates of genes derived from these duplication mechanisms illuminate fundamental principles governing genome plasticity and adaptive potential in flowering plants.

Plant genomes have substantially higher gene duplication rates compared with most other eukaryotes, with duplicates primarily derived from whole-genome and tandem duplication events [27]. These mechanisms create genetic raw material for evolutionary innovation through relaxation of selective constraints on duplicated copies. However, the scale, frequency, and functional consequences differ dramatically between duplication modes, leading to distinct patterns of gene retention and family expansion.

Whole-genome duplication (WGD, or polyploidization) is an episodic, genome-wide event that duplicates all genes simultaneously and is followed by extensive fractionation and selective retention of dosage-sensitive genes [28]. In contrast, tandem duplication (TD) occurs continuously through localized unequal crossing-over, producing clusters of adjacent gene copies that diverge rapidly in function under selective pressure [29]. Plants show a remarkable propensity for both mechanisms: approximately 70% of angiosperms have undergone at least one WGD event in their evolutionary history [28], while TD provides a constant supply of genetic variants for adaptation to continuously changing environments [28].

This technical review synthesizes current understanding of how these duplication mechanisms differentially shape gene family expansion, with emphasis on NBS-encoding resistance genes that illustrate contrasting evolutionary trajectories. We integrate comparative genomic analyses, expression studies, and evolutionary models to provide a comprehensive framework for understanding duplication-mediated genome evolution.

Mechanisms and Genomic Signatures of Duplication Modes

Whole-Genome Duplication (WGD) Characteristics

WGD involves duplication of the entire genome through mechanisms including autopolyploidization (within-species genome doubling) or allopolyploidization (hybridization between species followed by genome doubling). The genomic signatures of WGD include:

Syntenic blocks: Large chromosomal regions showing conserved gene order and content between duplicated regions
Ks peaks: Peaks in the distribution of synonymous substitution rates (Ks) between paralogs, indicating simultaneous duplication events
Fractionation: Preferential loss of duplicated genes from one homeologous region following polyploidization
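
The Ks-peak signature can be illustrated with a simple histogram. The sketch below bins pairwise Ks values between paralogs and reports the most populated bin. It is a minimal illustration: the Ks values are invented, the bin width and upper cutoff are arbitrary assumptions, and published analyses typically use density estimation or mixture models rather than raw binning.

```python
def ks_histogram(ks_values, bin_width=0.1, max_ks=3.0):
    """Count paralog pairs per Ks bin; values outside [0, max_ks) are ignored."""
    n_bins = int(round(max_ks / bin_width))
    counts = [0] * n_bins
    for ks in ks_values:
        if 0 <= ks < max_ks:
            counts[min(int(ks / bin_width), n_bins - 1)] += 1
    return counts


def modal_bin_index(counts):
    """Index of the most populated bin (first one in case of ties)."""
    return max(range(len(counts)), key=counts.__getitem__)


# Invented Ks values with an excess of pairs between 0.6 and 0.7.
demo_ks = [0.05, 0.62, 0.65, 0.68, 0.66, 1.55, 2.25]
counts = ks_histogram(demo_ks)
print(modal_bin_index(counts), counts[modal_bin_index(counts)])
```

A sharp excess of pairs in one Ks range suggests that many duplicates arose at about the same time, whereas continuous small-scale duplication tends to produce a distribution that declines steadily from low Ks values.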

WGD-derived genes typically exhibit slower sequence divergence and are preferentially retained in dosage-sensitive pathways including transcription factors, protein kinases, and ribosomal proteins [28]. Recent spatial transcriptomic studies reveal that WGD-derived paralogs maintain more conserved expression profiles across cell types due to preservation of cis-regulatory landscapes [30].

Tandem Duplication (TD) Characteristics

TD generates gene copies located in close proximity on chromosomes, typically separated by less than 100 kilobases. The mechanisms include:

Unequal crossing-over: Misalignment of homologous chromosomes during meiosis
Replication slippage: Template switching during DNA replication
Transposon-mediated duplication: Movement of gene fragments via transposable elements

TD-derived genes experience stronger selective pressure and exhibit rapid functional divergence compared to WGD-derived genes [28]. They are frequently organized in clusters and demonstrate lineage-specific expansion patterns, consistent with adaptation to rapidly changing environmental conditions [27].
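
The positional definition of tandem duplicates given above translates into a simple test on paralog pairs. The sketch below assumes gene positions are available as (chromosome, start, end) tuples and applies the 100 kb distance mentioned in the text as the cutoff. The function name and demo coordinates are hypothetical, and some studies instead count the number of intervening genes.

```python
def is_tandem(pos_a, pos_b, max_distance=100_000):
    """True if two paralogs lie on the same chromosome within max_distance bp.

    pos_a, pos_b: (chromosome, start, end) for each gene.
    The gap is measured between the nearest gene boundaries;
    overlapping genes have a negative gap and count as tandem.
    """
    chrom_a, start_a, end_a = pos_a
    chrom_b, start_b, end_b = pos_b
    if chrom_a != chrom_b:
        return False
    gap = max(start_a, start_b) - min(end_a, end_b)
    return gap < max_distance


print(is_tandem(("chr1", 10_000, 14_000), ("chr1", 60_000, 64_000)))
print(is_tandem(("chr1", 10_000, 14_000), ("chr3", 60_000, 64_000)))
```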

Table 1: Comparative Features of Whole-Genome and Tandem Duplication

Feature	Whole-Genome Duplication (WGD)	Tandem Duplication (TD)
Genomic scale	Entire genome duplication	Localized gene duplication
Frequency	Episodic (millions of years between events)	Continuous
Gene retention bias	Dosage-sensitive genes, transcription factors	Stress-responsive genes, defense genes
Selective pressure	Stronger purifying selection (lower Ka/Ks)	Relaxed constraint or positive selection (higher Ka/Ks)
Expression evolution	Conserved expression profiles	Rapid expression divergence
Typical fate	Subfunctionalization, retention of core functions	Neofunctionalization, adaptive specialization
Role in evolution	Genome stability, developmental complexity	Rapid adaptation, environmental response

Evolutionary Consequences and Functional Diversification

Contrasting Evolutionary Dynamics

The evolutionary trajectories of WGD- and TD-derived genes follow distinct paths influenced by their mechanisms of origin. WGD-derived paralogs experience an initial period of relaxed selection followed by strong purifying selection that preserves ancestral functions, particularly for genes involved in multiprotein complexes and dosage-sensitive regulatory networks [30]. In contrast, TD-derived genes undergo rapid functional diversification driven by positive selection, resulting in lineage-specific adaptations.

Large-scale genomic analyses across 141 plant genomes reveal that the number of WGD-derived duplicate genes decreases exponentially with increasing age of duplication events, while the frequency of tandem and proximal duplications shows no significant decrease over time, providing a continuous supply of genetic variants [28]. This temporal dynamic creates complementary evolutionary roles: WGD provides infrequent but comprehensive genomic rewiring, while TD enables continuous fine-tuning of specific gene families in response to environmental pressures.

Expression Divergence and Regulatory Evolution

Spatial transcriptomics across diverse angiosperms demonstrates that duplication mechanisms profoundly influence expression evolution. WGD-derived paralogs maintain broad expression patterns across multiple tissue types, often serving as hubs in coexpression networks [30]. This conservation stems from retention of ancestral transcription factor binding sites in promoters and enhancers.

TD-derived genes exhibit more asymmetric expression divergence, where one copy maintains the ancestral expression pattern while the other evolves tissue-specific or condition-specific expression [30]. This pattern facilitates functional specialization, particularly for defense-related genes that require rapid induction under specific stress conditions. Recent studies in Aurantioideae species confirm that TD-derived genes show higher expression differentiation between tissue types compared to WGD-derived genes [29].

Case Study: NBS Resistance Gene Evolution

NBS Gene Family Dynamics

Nucleotide-binding site (NBS)-encoding genes represent the largest family of plant resistance (R) genes, playing crucial roles in pathogen recognition and defense activation. Comparative genomic analyses reveal striking contrasts in how WGD and TD have shaped NBS gene family expansion across plant lineages:

In Solanaceae species (potato, tomato, and pepper), NBS genes primarily expand through species-specific tandem duplications rather than WGD events [31]. These genes typically cluster as tandem arrays on chromosomes, with few existing as singletons. Phylogenetic analysis of 447, 255, and 306 NBS-encoding genes from potato, tomato, and pepper, respectively, indicates they were derived from approximately 150 CNL, 22 TNL, and 4 RNL ancestral genes, with independent gene loss and duplication events after speciation [31].

The evolutionary patterns of NBS genes differ substantially between lineages:

Potato: "Consistent expansion" pattern
Tomato: "First expansion and then contraction" pattern
Pepper: "Shrinking" pattern [31]

These lineage-specific trajectories demonstrate how tandem duplication creates divergent NBS repertoires even in closely related species, potentially driving differences in pathogen resistance.

Table 2: NBS-Encoding Gene Family Expansion Patterns Across Plant Taxa

Plant Family/Species	Total NBS Genes	Expansion Mechanism	Evolutionary Pattern
Solanaceae
Potato (S. tuberosum)	447	Predominantly TD	Consistent expansion
Tomato (S. lycopersicum)	255	Predominantly TD	Expansion then contraction
Pepper (C. annuum)	306	Predominantly TD	Shrinking pattern
Cucurbitaceae
Cucumber (C. sativus)	57	Mixed, with frequent gene loss	Limited expansion
Brassicaceae	Various	WGD and TD	Expansion followed by contraction
Akebia trifoliata	73	Tandem and dispersed duplications	Moderate expansion

Functional and Structural Consequences

NBS genes expanded through TD exhibit distinct functional and structural characteristics. They are significantly enriched in biotic stress responses and show asymmetric expansion patterns between lineages, consistent with lineage-specific adaptation to pathogens [27]. Expression analyses in Akebia trifoliata demonstrate that tandemly duplicated NBS genes generally show low baseline expression but can be strongly induced during later developmental stages in specific tissues like fruit rinds [13], suggesting specialized defensive roles.

Structural analysis of NBS genes reveals that tandem duplicates often display exon/intron structural variation within clusters, with CNL-type genes typically containing fewer exons than TNL-type genes [13]. This structural diversity may facilitate alternative splicing and functional versatility in pathogen recognition.

Methodological Framework for Analyzing Duplication Mechanisms

Genomic Identification Pipeline

The accurate identification of duplication mechanisms requires integrated bioinformatic approaches. The DupGen_finder pipeline [28] provides a comprehensive framework for classifying duplicated genes into five categories: WGD, TD, proximal duplication (PD), transposed duplication (TRD), and dispersed duplication (DSD). Key methodological steps include:

Synteny analysis: Identification of WGD-derived blocks through genomic alignment
Gene clustering: Detection of tandem arrays through chromosomal proximity analysis
Phylogenetic reconciliation: Distinguishing different duplication modes through gene tree-species tree comparison
Ks distribution analysis: Dating duplication events through synonymous substitution rates
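
The gene clustering step can be sketched as follows. This is a simplified illustration, not the DupGen_finder algorithm, which applies its own criteria: it assumes the input is a list of same-family genes given as hypothetical `(gene_id, chromosome, start, end)` tuples, and it groups neighbours whose intergenic gap is at most 100 kb, the distance mentioned earlier in this section.

```python
from itertools import groupby


def tandem_clusters(genes, max_gap_bp=100_000):
    """Group same-family genes into candidate tandem arrays.

    genes: iterable of (gene_id, chrom, start, end) tuples (hypothetical layout).
    Consecutive genes on one chromosome join the same cluster when the gap
    between them is <= max_gap_bp. Only clusters with >= 2 genes are returned.
    """
    clusters = []
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    for _chrom, group in groupby(ordered, key=lambda g: g[1]):
        current = []
        for gene in group:
            # Close the running cluster when the next gene is too far away.
            if current and gene[2] - current[-1][3] > max_gap_bp:
                if len(current) >= 2:
                    clusters.append([g[0] for g in current])
                current = []
            current.append(gene)
        if len(current) >= 2:
            clusters.append([g[0] for g in current])
    return clusters


toy_genes = [
    ("g1", "chr1", 1_000, 5_000),
    ("g2", "chr1", 20_000, 24_000),
    ("g3", "chr1", 500_000, 504_000),
    ("g4", "chr2", 1_000, 3_000),
    ("g5", "chr2", 50_000, 52_000),
]
arrays = tandem_clusters(toy_genes)
```

Genes left outside any cluster (here `g3`) would be treated as singletons in this simplified scheme.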

For NBS gene identification, hidden Markov model (HMM) searches using the NB-ARC domain profile (Pfam: PF00931) as the query provide the most reliable results [13] [4]. Additional domain analysis (TIR, CC, LRR, RPW8) enables classification into subfamilies (TNL, CNL, RNL).

Diagram 1: Bioinformatics workflow for identifying duplication mechanisms and analyzing their evolutionary consequences.

Evolutionary Analysis Methods

Evolutionary analyses focus on quantifying selection pressures and functional divergence between duplication mechanisms:

Ka/Ks analysis: Ratio of nonsynonymous to synonymous substitutions indicates selection pressure
Expression divergence: Measurement of expression profile differences between duplicates
Gene conversion detection: Identification of nonreciprocal recombination events
Subfunctionalization tests: Assessment of partitioned ancestral functions between duplicates

These analyses consistently demonstrate that TD-derived genes experience stronger selective pressure and faster functional divergence compared to WGD-derived genes [28]. In Aurantioideae, Ka/Ks analysis confirms all duplication types are under purifying selection, with TD and proximal duplication undergoing the most rapid functional divergence [29].
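
As a rough illustration of how Ka/Ks values are compared between duplication modes, the sketch below computes per-pair ratios and a median per mode. The Ka and Ks inputs are assumed to have been estimated beforehand by a dedicated tool; the function names, the minimum-Ks filter, and the tolerance band around 1 are hypothetical choices for this example.

```python
from statistics import median


def ka_ks_ratio(ka, ks, min_ks=0.01):
    """Return Ka/Ks, or None when Ks is too small for a stable ratio."""
    if ks < min_ks:
        return None
    return ka / ks


def selection_regime(ratio, tol=0.1):
    """Label a ratio: < 1 purifying, > 1 positive, near 1 neutral (tol is illustrative)."""
    if ratio < 1 - tol:
        return "purifying"
    if ratio > 1 + tol:
        return "positive"
    return "neutral"


def median_ratio_by_mode(pairs):
    """pairs: iterable of (mode, ka, ks); returns {mode: median Ka/Ks}."""
    by_mode = {}
    for mode, ka, ks in pairs:
        ratio = ka_ks_ratio(ka, ks)
        if ratio is not None:
            by_mode.setdefault(mode, []).append(ratio)
    return {mode: median(values) for mode, values in by_mode.items()}


# Toy values only; they are not results from any cited study.
toy_pairs = [
    ("WGD", 0.05, 0.5),
    ("WGD", 0.08, 0.8),
    ("TD", 0.2, 0.4),
    ("TD", 0.3, 0.5),
    ("TD", 0.1, 0.001),  # dropped: Ks below min_ks
]
medians = median_ratio_by_mode(toy_pairs)
```

With these toy numbers both modes fall below 1 (purifying selection) while the TD median is higher, which mirrors the qualitative pattern described for Aurantioideae.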

Experimental Validation Approaches

Functional validation of duplication mechanisms employs multiple experimental approaches:

Gene expression profiling: RNA-seq across tissues, developmental stages, and stress conditions
Virus-induced gene silencing (VIGS): Functional characterization of candidate genes, as demonstrated for GaNBS in cotton [4]
Protein interaction studies: Yeast-two-hybrid and co-immunoprecipitation for protein complex analysis
Transgenic complementation: Functional assessment through heterologous expression

These methods have revealed that tandemly duplicated NBS genes frequently evolve novel specificities while maintaining core signaling components, creating pathogen recognition networks with both conserved and specialized elements.

Table 3: Essential Research Resources for Studying Gene Duplication Mechanisms

Resource Type	Specific Examples	Application/Function
Bioinformatics Tools	DupGen_finder [28]	Classification of duplication modes
	OrthoFinder [4]	Orthogroup inference and gene family analysis
	MCScanX	Synteny and collinearity analysis
Databases	Plant Duplicate Gene Database (PlantDGD) [28]	Repository of duplicated genes from 141 plant genomes
	Pfam database	Protein domain identification and classification
	Phytozome	Plant genomic data and comparative genomics
Experimental Methods	Virus-Induced Gene Silencing (VIGS) [4]	Rapid functional characterization of candidate genes
	Spatial transcriptomics [30]	Cell-type specific expression analysis of paralogs
	Long-read sequencing (ONT, PacBio) [32]	Structural variant detection in polyploid genomes
Analytical Approaches	Ks distribution analysis [28]	Dating duplication events
	Gene tree-species tree reconciliation [27]	Inference of duplication and loss history
	MEME Suite [13]	Conserved motif identification in protein sequences

Whole-genome and tandem duplication mechanisms create complementary evolutionary dynamics that collectively shape plant genome architecture and adaptive potential. WGD provides foundational genetic material for developmental and regulatory complexity, while TD enables rapid, lineage-specific adaptation to biotic and abiotic stresses. The contrasting evolutionary fates of genes derived from these mechanisms reflect fundamental principles of gene balance and functional innovation.

Future research directions include:

Integrating pan-genome approaches to capture species-level variation in duplication patterns
Applying single-cell transcriptomics to elucidate cell-type specific expression divergence
Developing machine learning models to predict duplicate gene retention based on sequence and network features
Exploring epigenetic regulation of duplicated genes across different genomic contexts

Understanding these duplication mechanisms provides crucial insights for crop improvement strategies, particularly for enhancing disease resistance through manipulation of NBS gene family dynamics. The continued development of genomic resources and analytical methods will further illuminate how duplication-driven evolution creates biological diversity across the plant kingdom.

From Genomes to Function: Advanced Methods for NBS-LRR Identification and Characterization

This technical guide provides a comprehensive overview of genome-wide screening methodologies utilizing HMMER and domain architecture analysis for identifying nucleotide-binding site (NBS) domain genes across plant lineages. The expansion and contraction of NBS genes represent a dynamic evolutionary process with significant implications for plant immunity and adaptation. We detail robust bioinformatic workflows for gene family identification, classification, and evolutionary analysis, supplemented by experimental validation protocols. Within the broader context of plant lineage evolution, this review synthesizes findings from recent comparative genomic studies that reveal patterns of NBS gene loss and gain, offering insights into co-evolutionary arms races between plants and their pathogens. The technical frameworks presented herein serve as essential resources for researchers investigating plant disease resistance mechanisms and evolutionary biology.

Plant genomes encode complex defense systems comprising numerous resistance (R) genes that confer protection against diverse pathogens. The nucleotide-binding site leucine-rich repeat (NBS-LRR or NLR) gene family represents the largest and most important class of plant R genes, with over 60% of cloned functional R genes in angiosperms belonging to this family [33]. The NBS domain serves as a molecular switch for ATP/GTP binding and hydrolysis, providing energy for defense signaling activation, while the LRR domain facilitates pathogen recognition specificity [7]. These genes are categorized into distinct subclasses based on their N-terminal domains: coiled-coil (CC-NBS-LRR or CNL), Toll/interleukin-1 receptor (TIR-NBS-LRR or TNL), and resistance to powdery mildew 8 (RPW8-NBS-LRR or RNL) [34].

The remarkable diversity of NBS genes across plant species reflects dynamic evolutionary processes driven by perpetual arms races with rapidly evolving pathogens. Comparative genomics has revealed substantial variation in NBS gene numbers, with recent studies identifying 12,820 NBS-domain-containing genes across 34 species spanning from mosses to monocots and dicots [4]. This expansion and contraction of NBS genes is not random but follows distinct evolutionary patterns across plant lineages. For instance, gymnosperms like Pinus taeda exhibit significant expansion of the TNL subfamily (comprising 89.3% of typical NBS-LRRs), while monocotyledonous species such as Oryza sativa have completely lost TNL and RNL subfamilies [10]. Similarly, comparative analysis of Salvia species revealed a marked reduction in TNL and RNL subfamily members [10].

Understanding these evolutionary patterns requires robust methodological frameworks for identifying and characterizing NBS genes across diverse species. This guide details comprehensive protocols for genome-wide screening using HMMER and domain architecture analysis, enabling researchers to systematically investigate NBS gene loss and gain across plant lineages.

HMMER-Based Identification of NBS Genes

Theoretical Foundations of Hidden Markov Models

Hidden Markov Models (HMMs) represent a powerful probabilistic framework for modeling multiple sequence alignments and capturing conserved domain signatures within protein families. Profile HMMs effectively model position-specific amino acid frequencies, insertion probabilities, and deletion probabilities across a conserved domain, enabling sensitive detection of even distantly related family members [35]. The HMMER software package implements this methodology for biological sequence analysis, providing optimized tools for database searching and sequence alignment [35]. For NBS gene identification, the approach capitalizes on the conserved NB-ARC domain (Pfam accession PF00931), which contains characteristic nucleotide-binding motifs critical for protein function.

Practical Implementation with HMMER

The standard workflow for HMMER-based identification of NBS genes involves sequential steps with optimized parameters:

Step 1: Domain Profile Acquisition Download the NBS (NB-ARC) domain HMM profile (PF00931) from the Pfam database (https://pfam.xfam.org/). This profile serves as a query for subsequent searches.

Step 2: Genome-Wide Scanning Perform a domain search against the target proteome using hmmsearch from the HMMER package (v3.3.2) with an E-value threshold of 1e-4, which provides a balance between sensitivity and specificity [36] [37].
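
A minimal sketch of Step 2, assuming hmmsearch is installed and on the PATH. The exact command used in the cited studies is not given here, so the option set below (`-E`, `--cpu`, `--tblout`) is an assumption built from standard hmmsearch flags, and the file names and helper functions are hypothetical. The parser reads the per-sequence table written by `--tblout` and keeps targets whose full-sequence E-value passes the threshold.

```python
def build_hmmsearch_cmd(hmm_path, proteome_fasta, tblout_path, evalue=1e-4, cpus=4):
    """Assemble an hmmsearch command line as an argument list (not executed here)."""
    return [
        "hmmsearch",
        "--cpu", str(cpus),
        "-E", str(evalue),          # report sequences with E-value <= threshold
        "--tblout", tblout_path,    # per-sequence tabular output
        hmm_path,
        proteome_fasta,
    ]


def parse_tblout(lines, max_evalue=1e-4):
    """Return {target_name: best full-sequence E-value} for hits passing max_evalue.

    In --tblout output, column 1 is the target name and column 5 is the
    full-sequence E-value; lines starting with '#' are comments.
    """
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits


cmd = build_hmmsearch_cmd("PF00931.hmm", "proteome.faa", "nbs_hits.tbl")

# Two made-up rows in tblout layout, for illustration only.
example_rows = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "prot1 - NB-ARC PF00931.25 1.2e-60 200.1 0.1 1.5e-60 199.8 0.1 1.0 1 0 0 1 1 1 1 -",
    "prot2 - NB-ARC PF00931.25 0.5 10.0 0.0 0.6 9.8 0.0 1.0 1 0 0 1 1 1 0 -",
]
candidates = parse_tblout(example_rows)
```

The command list can be passed to `subprocess.run(cmd, check=True)`; studies that rely on Pfam trusted cutoffs would swap the `-E` option for `--cut_tc`.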

Step 3: Candidate Verification Validate putative NBS-containing proteins using the NCBI's Conserved Domain Database (CDD) and SMART (Simple Modular Architecture Research Tool) to confirm the presence of characteristic NBS domain motifs [36].

Step 4: Redundancy Elimination Remove redundant sequences and partial genes, retaining only full-length candidates for subsequent analysis.

Table 1: HMMER Implementation Parameters for NBS Gene Identification Across Selected Studies

Plant Species	HMMER Version	E-value Threshold	NBS Genes Identified	Reference
Arabidopsis halleri	3.3.2	1e-4 (cut_tc)	12	[36]
Salvia miltiorrhiza	Not specified	1e-4	196	[10]
Passiflora edulis	3.0	1e-4	25 (purple)	[34]
Vernicia fordii	Not specified	1e-4	90	[7]
Zingiber officinale	3.0	Not specified	20 (TCP genes, not NBS)	[38]

This methodology has been successfully applied across diverse plant taxa. For example, a comprehensive analysis of 34 plant species identified 12,820 NBS-domain-containing genes using HMMER with a stringent E-value cutoff of 1.1e-50 [4]. Similarly, studies in eggplant identified 269 NBS genes using this approach [33]. The consistency in methodology across studies enables comparative evolutionary analyses and reveals that NBS genes can constitute up to 0.42% of all annotated protein-coding genes in some species, as observed in Salvia miltiorrhiza [10].

Domain Architecture Analysis and Classification

Principles of Domain Architecture Analysis

Domain architecture analysis provides critical insights into protein function and evolutionary relationships. For NBS genes, this approach enables systematic classification based on domain composition and organization, revealing evolutionary patterns across plant lineages. The fundamental principle involves identifying characteristic domain combinations that define specific NBS gene subclasses, including N-terminal domains (CC, TIR, or RPW8), the central NBS domain, and C-terminal LRR regions.

Comprehensive Classification Framework

NBS genes are classified into distinct categories based on domain presence and completeness:

Typical NBS-LRR Genes:

CNL: CC-NBS-LRR (N-terminal coiled-coil domain)
TNL: TIR-NBS-LRR (N-terminal TIR domain)
RNL: RPW8-NBS-LRR (N-terminal RPW8 domain)

Atypical NBS Genes:

N: NBS domain only
TN: TIR-NBS
CN: CC-NBS
NL: NBS-LRR

The classification workflow employs multiple bioinformatic tools:

CC Domain Prediction: Use COILS (https://toolkit.tuebingen.mpg.de/pcoils) with a threshold of 0.9 [33]
TIR Domain Identification: Use Pfam (PF01582) and SMART databases
RPW8 Domain Detection: Use Pfam (PF05659)
LRR Domain Confirmation: Use Pfam (PF13855.9) and SMART
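
The classification scheme above reduces to a lookup on the set of detected domains. The sketch below assumes domain calls have already been collected per protein as simple labels ("CC", "TIR", "RPW8", "NBS", "LRR"); the function name and the precedence applied when more than one N-terminal domain is detected (TIR, then RPW8, then CC) are assumptions for illustration.

```python
TYPICAL_CLASSES = {"CNL", "TNL", "RNL"}


def classify_nbs(domains):
    """Map a collection of domain labels to an NBS class code.

    Returns None when no NBS domain is present. Codes follow the scheme
    in the text (CNL, TNL, RNL, N, TN, CN, NL); an RPW8-NBS protein
    without LRR would be reported as "RN", which the text does not list.
    """
    found = {d.upper() for d in domains}
    if "NBS" not in found:
        return None
    if "TIR" in found:
        prefix = "T"
    elif "RPW8" in found:
        prefix = "R"
    elif "CC" in found:
        prefix = "C"
    else:
        prefix = ""
    suffix = "L" if "LRR" in found else ""
    return prefix + "N" + suffix


def is_typical(code):
    """True for full-length CNL/TNL/RNL architectures."""
    return code in TYPICAL_CLASSES


examples = {
    "protA": classify_nbs(["CC", "NBS", "LRR"]),
    "protB": classify_nbs(["TIR", "NBS"]),
    "protC": classify_nbs(["NBS"]),
}
```

Tallying these codes per species gives the kind of CNL/TNL/RNL/atypical counts summarized in the table that follows.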

Table 2: NBS Gene Classification in Selected Plant Species Revealing Evolutionary Patterns

Plant Species	CNL	TNL	RNL	Atypical	Total	Lineage-Specific Pattern
Solanum melongena (Eggplant)	231	36	2	0	269	TNL retention in eudicot
Salvia miltiorrhiza	61	0	1	134	196	Complete TNL loss
Vernicia montana	9	3	0	137	149	Partial TNL retention
Vernicia fordii	12	0	0	78	90	Complete TNL loss
Passiflora edulis (Purple)	25	0	0	0	25	Complete TNL loss

Domain architecture analysis has revealed significant evolutionary dynamics in NBS genes across plant lineages. For instance, studies across multiple Salvia species (S. miltiorrhiza, S. bowleyana, S. divinorum, S. hispanica, and S. splendens) revealed an absence of TNL subfamily members and limited RNL copies (only one or two), far fewer than in other angiosperms such as Arabidopsis thaliana, Nicotiana tabacum, and Vitis vinifera [10]. Similarly, analysis of Vernicia species identified 90 NBS-LRRs in susceptible V. fordii and 149 in resistant V. montana, with notable differences in TIR domain presence [7]. These distribution patterns reflect lineage-specific evolutionary trajectories, including independent losses of specific NBS subclasses.

Evolutionary Analysis of NBS Gene Family

Phylogenetic and Orthology Analysis

Evolutionary analysis of NBS genes provides insights into lineage-specific expansion and contraction patterns. The standard phylogenetic workflow involves:

Multiple Sequence Alignment: Use ClustalW or MAFFT with default parameters to align NBS domain sequences [37]
Phylogenetic Tree Construction: Apply Maximum Likelihood method with IQ-TREE, selecting the best-fit substitution model via ModelFinder [37]
Branch Support Assessment: Calculate SH-aLRT and UFBoot2 values with 1000 bootstrap replicates [37]
Orthogroup Delineation: Use OrthoFinder v2.5.1 with DIAMOND for sequence similarity searches and MCL for clustering [4]
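
The alignment and tree-building steps above might be scripted along the following lines. This is a sketch under assumptions: it presumes MAFFT and IQ-TREE 2 are installed (the executable name `iqtree2`, the file names, and the helper function are hypothetical), and it only assembles argument lists rather than running anything. `-m MFP` requests ModelFinder followed by tree inference, `-B` sets ultrafast bootstrap replicates, and `-alrt` sets SH-aLRT replicates.

```python
def phylogeny_commands(nbs_fasta, prefix="nbs", replicates=1000):
    """Return (alignment_path, mafft_cmd, iqtree_cmd) for an NBS-domain FASTA file.

    MAFFT writes the alignment to stdout, so the caller is expected to
    redirect its output into alignment_path before running IQ-TREE.
    """
    alignment_path = f"{prefix}.aln.fasta"
    mafft_cmd = ["mafft", nbs_fasta]  # default parameters, as in the text
    iqtree_cmd = [
        "iqtree2",
        "-s", alignment_path,        # input alignment
        "-m", "MFP",                 # ModelFinder + tree search
        "-B", str(replicates),       # ultrafast bootstrap (UFBoot2)
        "-alrt", str(replicates),    # SH-aLRT branch test
        "--prefix", prefix,          # output file prefix
    ]
    return alignment_path, mafft_cmd, iqtree_cmd


aln_path, mafft_cmd, iqtree_cmd = phylogeny_commands("nbs_domains.fasta")
```

One way to run the pair is `subprocess.run(mafft_cmd, stdout=open(aln_path, "w"), check=True)` followed by `subprocess.run(iqtree_cmd, check=True)`; older IQ-TREE 1.x releases use `-bb` instead of `-B` for ultrafast bootstrap.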

This approach has revealed conserved orthogroups across plant lineages. A comprehensive analysis identified 603 orthogroups (OGs), including core OGs (OG0, OG1, OG2) present across multiple species and unique OGs (OG80, OG82) specific to particular lineages [4]. Expression profiling demonstrated upregulation of OG2, OG6, and OG15 orthogroups under various biotic and abiotic stresses in cotton, suggesting conserved functional roles [4].

Gene Gain and Loss Dynamics

Comparative genomic analyses across plant lineages reveal dynamic patterns of NBS gene expansion and contraction:

Contraction Patterns: Apiaceae species exemplify contraction dynamics, with Angelica sinensis containing only 95 NLR genes compared to 183 in Coriandrum sativum, representing different evolutionary trajectories within the same family [37]. Analysis of NLR genes in four Apiaceae species demonstrated they were derived from 183 ancestral NLR lineages that experienced different levels of gene-loss and gain events [37].

Expansion Mechanisms: Tandem duplication represents a primary mechanism for NBS gene expansion. In eggplant, 269 SmNBS genes showed uneven distribution across chromosomes, with predominant clusters on chromosomes 10, 11, and 12, and evolutionary analysis demonstrated that tandem duplication events mainly contributed to SmNBS expansion [33]. Similarly, passion fruit CNL genes expanded through both segmental (17 gene pairs) and tandem duplications (17 gene pairs) [34].

Lineage-Specific Evolution: Brassicaceae species exhibit first expansion then contraction of NLR genes [37], while Fabaceae species show consistent expansion [37]. These contrasting evolutionary patterns reflect different host-pathogen co-evolutionary dynamics across plant families.

Experimental Validation and Functional Characterization

Expression Profiling Methodologies

Transcriptomic analysis provides critical insights into NBS gene regulation under various conditions. Standard approaches include:

RNA-Seq Analysis: Utilize Fragments Per Kilobase of transcript per Million mapped reads (FPKM) values from databases such as the Plant RNA-seq Database (http://ipf.sustech.edu.cn/pub/) [4]. Categorize expression data into tissue-specific, abiotic stress-specific, and biotic-stress-specific profiles.

qRT-PCR Validation: Perform quantitative real-time PCR using SYBR Green chemistry with three technical replicates. Calculate relative expression using the 2^(−ΔΔCT) method with ACTIN1 as an internal control [36]. Specific example from eggplant bacterial wilt response: Collect root tissues at 0, 24, and 48 hours post-inoculation with Ralstonia solanacearum (10^8 cfu/mL concentration) using root-dipping inoculation [33].
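
The 2^(−ΔΔCT) calculation can be written out explicitly. In the sketch below, ΔCT is the target-gene CT minus the reference-gene CT (ACTIN1 in the protocol above), and ΔΔCT is the treated ΔCT minus the control ΔCT. The function name and the CT values are made up for illustration; technical replicates are averaged before subtraction.

```python
from statistics import mean


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCT) method.

    Each argument is a list of CT values from technical replicates.
    """
    delta_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    delta_control = mean(ct_target_control) - mean(ct_ref_control)
    delta_delta = delta_treated - delta_control
    return 2 ** (-delta_delta)


# Hypothetical CT values: target gene vs. reference at 24 h and at 0 h.
fold_change = relative_expression(
    ct_target_treated=[22.0, 22.1, 21.9],
    ct_ref_treated=[18.0, 18.0, 18.0],
    ct_target_control=[24.0, 24.0, 24.0],
    ct_ref_control=[18.0, 18.0, 18.0],
)
```

Here ΔCT drops from 6 in the control to about 4 after treatment, so ΔΔCT is about −2 and the estimated fold change is about 4. The method assumes roughly equal amplification efficiency for target and reference primers.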

Expression analyses have revealed functionally important NBS genes across species. In passion fruit, transcriptome data indicated that PeCNL3, PeCNL13, and PeCNL14 were differentially expressed under Cucumber mosaic virus and cold stress [34]. In eggplant, qRT-PCR analysis demonstrated that nine SmNBS genes showed differential expression patterns in response to R. solanacearum stress, with EGP05874.1 potentially involved in the resistance response [33].

Functional Characterization Techniques

Virus-Induced Gene Silencing (VIGS): VIGS provides an efficient approach for functional validation. A detailed protocol for validating NBS gene function includes:

Amplify a 300-400 bp gene-specific fragment and clone into TRV2 vector
Transform vectors into Agrobacterium tumefaciens strain GV3101
Infiltrate young leaves with agrobacterium mixture (OD600 = 1.0)
Monitor disease symptoms and quantify pathogen biomass after challenge inoculation

This approach successfully demonstrated that silencing of GaNBS (OG2) in resistant cotton increased susceptibility to cotton leaf curl disease [4].

Luciferase Complementation Assays: Protein-protein interactions can be validated using luciferase complementation imaging (LCI) assays:

Clone full-length CDS of target genes into pCAMBIA1300-cLuc and pCAMBIA1300-nLuc vectors
Co-transform vector pairs into Agrobacterium strain GV3101
Inject agrobacterium mixtures into Nicotiana benthamiana leaves
Image luciferase activity after 48-72 hours using CCD camera [36]

This method confirmed interactions between AtMBD3 and several AtMBD protein members in Arabidopsis [36].

The Scientist's Toolkit: Essential Research Reagents

Table 3: Essential Research Reagents for NBS Gene Identification and Functional Characterization

Reagent/Tool	Specifications	Application	Reference
HMMER Software	Version 3.3.2	Domain-based gene identification	[36] [35]
Pfam Database	PF00931 (NB-ARC)	HMM profile source	[36] [33]
CDD Database	NCBI's Conserved Domain Database	Domain verification	[36]
SMART Tool	http://smart.embl-heidelberg.de/	Domain architecture analysis	[36]
Phytozome	v13 database	Genomic resources	[36]
TRV VIGS Vectors	TRV1 and TRV2	Functional gene validation	[4]
Agrobacterium tumefaciens	GV3101 strain	Plant transformation	[4]
pCAMBIA1300-cLuc/nLuc	Luciferase complementation vectors	Protein interaction studies	[36]

The integrated application of HMMER-based identification, domain architecture analysis, and evolutionary profiling provides a powerful framework for investigating NBS gene dynamics across plant lineages. The methodologies detailed in this technical guide have revealed pronounced patterns of gene loss and gain, reflecting continuous evolutionary arms races between plants and their pathogens. Shared technical conventions for E-value thresholds (typically 1e-4), domain validation pipelines, and phylogenetic methodologies enable comparative analyses across species. The consistent finding of lineage-specific NBS gene contractions and expansions, from complete TNL loss in Salvia species to differential NBS repertoire sizes in Vernicia species, highlights the dynamic nature of plant immune gene evolution. These genome-wide screening approaches continue to illuminate the complex co-evolutionary dynamics shaping plant genomes and provide essential methodologies for identifying candidate genes for crop improvement programs.

Orthogroup Analysis and Phylogenetic Reconstruction Across Species

Orthogroup analysis represents a fundamental methodology in comparative genomics that enables researchers to infer evolutionary relationships across species by identifying groups of genes descended from a single ancestral gene in a common ancestor. This technical guide provides a comprehensive framework for conducting orthogroup analysis and phylogenetic reconstruction, with specific application to studying nucleotide-binding site (NBS) gene gain and loss patterns across plant lineages. We detail scalable computational methods, visualization approaches, and experimental protocols that collectively empower researchers to trace evolutionary histories, identify key genetic innovations, and understand the dynamic patterns of gene family evolution that underpin plant immunity mechanisms.

Orthogroup analysis has emerged as a cornerstone of modern comparative genomics, providing a systematic framework for identifying evolutionarily related genes across multiple species. An orthogroup is defined as a set of genes descended from a single ancestral gene in the last common ancestor of the species being considered, encompassing both orthologs and paralogs [39]. This approach has proven particularly valuable for studying gene family evolution, as it enables researchers to trace duplication events, gene losses, and functional diversification across evolutionary timescales.

When applied to the study of NBS gene families—the largest class of plant disease resistance (R) genes—orthogroup analysis reveals dynamic evolutionary patterns characterized by frequent gene gain and loss events [31] [6]. These genes, which encode proteins containing nucleotide-binding site and leucine-rich repeat domains (NBS-LRR), play critical roles in plant immunity by recognizing pathogen effectors and initiating defense responses [31] [11]. The copy number variation of NBS genes across plant lineages reflects an evolutionary arms race between plants and their pathogens, with different species exhibiting distinct patterns of gene family expansion and contraction.

Table 1: Classification of NBS-LRR Genes Based on N-Terminal Domains

Gene Subclass	N-Terminal Domain	Key Characteristics	Representative Functions
TNL	Toll/Interleukin-1 Receptor (TIR)	Present in many eudicots; absent in monocots	Triggers resistance pathways via EDS1 signaling
CNL	Coiled-Coil (CC)	Most abundant subclass across angiosperms	Direct pathogen recognition and immunity activation
RNL	Resistance to Powdery Mildew 8 (RPW8)	Lowest copy numbers; conserved across species	Signal transduction from TNL/CNL proteins

Methodological Framework for Orthogroup Analysis

Orthology Inference Methods and Tools

Contemporary orthology inference methods can be broadly categorized into graph-based and tree-based approaches, with hierarchical orthologous groups (HOGs) providing a powerful framework for capturing evolutionary relationships across multiple taxonomic levels [40]. The HOG framework systematically organizes homologous genes using the species phylogeny as a guide, capturing duplications, losses, and ancestral gene content in a structured manner [40]. A HOG represents a set of genes descended from a single ancestral gene, defined with respect to a given taxonomic level, enabling researchers to analyze gene families at different evolutionary depths without recomputing orthology relationships.

Several computational tools have been developed specifically for large-scale orthogroup inference:

OrthoFinder implements a comprehensive pipeline for inferring orthogroups from whole proteome data, using sequence similarity searches, graph-based clustering, and phylogenetic tree inference [39]. The algorithm begins with all-vs-all sequence comparisons, applies Markov Cluster Algorithm (MCL) to identify orthogroups, and then reconstructs gene trees for each orthogroup to refine orthology predictions.
FastOMA is designed for scalable orthology inference, achieving linear time complexity by combining k-mer-based homology clustering with taxonomy-guided subsampling [41]. This tool can process thousands of eukaryotic genomes within a day while maintaining high accuracy, addressing a critical bottleneck in large-scale comparative genomics.
SHOOT implements a phylogeny-based search approach that places query sequences into pre-computed phylogenetic trees, providing evolutionary context and accurate ortholog identification comparable to conventional tree inference methods [42]. Benchmarking studies demonstrated that SHOOT correctly identified the closest related gene sequence in 94.2% of test cases, outperforming BLAST (88.4%) and DIAMOND (88.3%) [42].
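
Once orthogroups have been inferred, a common next step is to tabulate copy number per species and flag widely shared versus lineage-restricted groups. The sketch below assumes an OrthoFinder-style `Orthogroups.tsv` layout (a header row of species names, one orthogroup per row, genes within a cell separated by commas). The function names are hypothetical, and the strict definitions used here ("core" means present in every species, "unique" means present in exactly one) are assumptions that individual studies may relax.

```python
import csv
import io


def orthogroup_counts(tsv_text):
    """Parse Orthogroups.tsv-style text into {orthogroup: {species: gene_count}}."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    species = next(reader)[1:]
    counts = {}
    for row in reader:
        if not row:
            continue
        # Pad short rows so every species has a (possibly empty) cell.
        cells = row[1:] + [""] * (len(species) - len(row) + 1)
        counts[row[0]] = {
            sp: len([g for g in cell.split(",") if g.strip()])
            for sp, cell in zip(species, cells)
        }
    return counts


def core_and_unique(counts):
    """Split orthogroups into those found in all species and in exactly one."""
    core = [og for og, c in counts.items() if all(n > 0 for n in c.values())]
    unique = [og for og, c in counts.items()
              if sum(n > 0 for n in c.values()) == 1]
    return core, unique


toy_tsv = (
    "Orthogroup\tspA\tspB\tspC\n"
    "OG0\ta1, a2\tb1\tc1\n"
    "OG1\ta3\t\t\n"
    "OG2\t\tb2, b3\tc2\n"
)
toy_counts = orthogroup_counts(toy_tsv)
core_ogs, unique_ogs = core_and_unique(toy_counts)
```

The resulting count matrix is also the usual input for gene gain and loss modelling along a species tree.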

Orthogroup Inference Workflow

Figure 1: Orthogroup Inference and Analysis Workflow. This pipeline transforms raw sequence data into evolutionarily informed hierarchical orthologous groups enabling comparative genomic analyses.

Data Preparation and Quality Control

Effective orthogroup analysis begins with comprehensive data preparation. For plant NBS gene studies, this typically involves:

Genome and Transcriptome Acquisition: Leverage public resources such as the OneKP (1000 plant transcriptomes) and MMETSP (Marine Microbial Eukaryote Transcriptome Sequencing Project) databases, which provide extensive taxonomic coverage across plant lineages [43]. The OneKP dataset contains 1341 transcriptomes from 1179 species covering all major classes of land plants, green algae, red algae, and glaucophytes [43].
Homology Identification: Perform sensitive sequence searches using tools like BLAST, HMMER, or OMAmer with the NB-ARC domain (Pfam accession: PF00931) as query to identify candidate NBS-encoding genes [31] [6]. Recommended parameters include E-value thresholds of 0.01 for BLAST searches and inclusion of domain architecture analysis to verify NBS domain presence.
Sequence Validation: Confirm the presence of characteristic NBS domains using Pfam (http://pfam.sanger.ac.uk/) and NCBI Conserved Domain Database (CDD) with an E-value cutoff of 1e-4 [31] [6]. Classify genes into TNL, CNL, and RNL subclasses based on N-terminal domains (TIR, CC, or RPW8) detected using SMART and COILS programs [31].

Phylogenetic Reconstruction Methods

Phylogenomic Tree Inference

Phylogenetic reconstruction from orthogroup data enables researchers to resolve species relationships and gene family evolutionary histories. A robust phylogenomic approach involves:

Gene Tree-Species Tree Reconciliation: Modern methods address the inherent discordance between individual gene trees and the species tree using multi-species coalescent models [44]. The divide-and-conquer strategy implemented in large-scale studies like the angiosperm tree of life project involves computing a backbone species tree with limited sampling, then using this to constrain global gene tree inference [44]. This approach balances comprehensive sampling with computational tractability.

Model Selection and Tree Inference: Use model testing software such as ModelFinder [43] to select optimal substitution models for each gene alignment. Then perform maximum likelihood tree inference with tools like IQ-TREE or RAxML, assessing branch support with ultrafast bootstrap approximation or SH-aLRT tests [43].
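
As an illustration of the model selection and tree inference step, the sketch below assembles an IQ-TREE command line that runs ModelFinder and then infers a maximum likelihood tree with both support measures. It assumes IQ-TREE 2 is installed under the executable name `iqtree2`; the alignment file name is hypothetical, and the replicate numbers are example values rather than settings taken from the cited studies. The command is only built and printed, not executed.

```python
import shlex


def iqtree_command(alignment, bootstraps=1000, alrt=1000, threads="AUTO"):
    """Build (but do not run) an IQ-TREE 2 command line for one alignment."""
    return [
        "iqtree2",
        "-s", alignment,        # input multiple sequence alignment
        "-m", "MFP",            # ModelFinder Plus: select a model, then infer the tree
        "-B", str(bootstraps),  # ultrafast bootstrap replicates
        "-alrt", str(alrt),     # SH-aLRT replicates
        "-T", str(threads),     # number of threads, or AUTO
    ]


cmd = iqtree_command("OG0000001.aln.fasta")  # hypothetical orthogroup alignment
print(shlex.join(cmd))
```

To run the analysis, the list can be passed to `subprocess.run(cmd, check=True)` once the executable and alignment are in place.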

Table 2: Evolutionary Patterns of NBS Genes Across Plant Families

Plant Family	Species	NBS Gene Count	Evolutionary Pattern	Key Mechanisms
Solanaceae	Potato (S. tuberosum)	447	"Consistent expansion"	Species-specific tandem duplications
Solanaceae	Tomato (S. lycopersicum)	255	"First expansion and then contraction"	Differential gene loss after duplication
Solanaceae	Pepper (C. annuum)	306	"Shrinking"	Preferential gene loss
Rosaceae	Prunus and Maleae species (e.g., apple, M. domestica)	Varies by species	"Early sharp expanding to abrupt shrinking"	Lineage-specific duplication/loss
Orchidaceae	Dendrobium officinale	74	"Degeneration and diversification"	NB-ARC domain degeneration, type changing

Ancestral State Reconstruction

Reconstructing ancestral gene complements at evolutionary nodes is essential for understanding NBS gene gain and loss dynamics. The protocol described by Mutte et al. provides a generalized framework for ancestral state reconstruction across multiple kingdoms of eukaryotes [43]. Key steps include:

Ortholog Selection: Implement multi-layered orthology confirmation based on domain architecture, reciprocal BLAST, and phylogenetic tree position to ensure accurate inference of orthologous relationships [43].
Alignment and Tree Construction: Generate multiple sequence alignments for each orthogroup using tools such as MAFFT or MUSCLE, then infer gene trees with maximum likelihood methods [43].
Ancestral Gene Content Inference: Apply phylogenetic reconciliation methods to estimate gene content at ancestral nodes, identifying duplication and loss events along each lineage [40]. For NBS genes in Solanaceae, studies have inferred that extant genomes were derived from approximately 150 CNL, 22 TNL, and 4 RNL ancestral genes [31].

Phylogenetic Visualization and Interpretation

Effective visualization enables researchers to interpret complex phylogenetic relationships and evolutionary patterns:

OrthoBrowser provides an interactive web-based platform for visualizing orthogroup phylogenies, multiple sequence alignments, and synteny relationships [39]. The tool integrates with OrthoFinder results, enabling researchers to filter datasets to specific subtrees of interest and export publication-quality figures.
ETE Toolkit offers programmable tree visualization capabilities within Python scripts, supporting the annotation of trees with domain structures, gene gain/loss events, and other evolutionary features [39].

Figure 2: NBS Gene Ancestral State Reconstruction Workflow. This specialized pipeline reconstructs evolutionary history of disease resistance genes across plant lineages.

Table 3: Computational Tools for Orthogroup Analysis and Phylogenetic Reconstruction

Tool/Resource	Primary Function	Key Features	Application in NBS Gene Studies
OrthoFinder	Orthogroup inference	Graph-based clustering, gene tree inference, species tree estimation	Identifying NBS gene families across multiple plant genomes
FastOMA	Scalable orthology inference	Linear time complexity, k-mer-based homology search	Processing thousands of plant genomes for comparative NBS gene analysis
SHOOT	Phylogenetic gene search	Places queries into pre-computed trees, ortholog identification	Rapid identification of NBS orthologs in newly sequenced species
OrthoBrowser	Results visualization	Interactive trees, multiple sequence alignments, synteny views	Visualizing NBS gene family evolution and conservation
MEME Suite	Motif discovery	Identifies conserved protein motifs, domain architecture	Characterizing NBS domain conservation and variation

Table 4: Genomic Databases for Plant NBS Gene Research

Database	Scope	Key Features	Relevance to NBS Studies
OneKP	1,341 transcriptomes from 1,179 plant species	Broad taxonomic coverage across land plants and algae	Discovering novel NBS genes across diverse plant lineages
MMETSP	678 transcriptomes from 410 marine microbial eukaryotes	Coverage of SAR group and unclassified marine eukaryotes	Studying early evolution of NBS genes in diverse eukaryotes
Phytozome	100+ sequenced plant genomes	Uniform annotation, comparative genomics tools	Systematic identification of NBS genes across model plants
GDR (Genome Database for Rosaceae)	12 Rosaceae species genomes	Family-specific genomic resources	Comparative analysis of NBS gene evolution in fruit crops

Experimental Protocols for NBS Gene Analysis

Genome-Wide Identification of NBS Genes

Comprehensive identification of NBS-encoding genes requires a multi-step validation approach:

Initial Candidate Identification:
- Perform BLAST searches with E-value threshold of 1.0 using the NB-ARC domain (PF00931) as query [31] [6]
- Conduct parallel HMMER searches with default parameters against the same domain profile
- Merge results and remove redundant hits while retaining sequence variants
Domain Architecture Validation:
- Confirm NBS domain presence using Pfam (E-value < 10⁻⁴) and NCBI CDD [6]
- Identify N-terminal domains (TIR, CC, RPW8) using SMART and COILS programs (threshold 0.9) [31]
- Classify genes into TNL, CNL, and RNL subclasses based on domain composition
Motif Analysis:
- Identify conserved amino acid motifs using MEME suite with parameters set to discover 10 motifs [31] [6]
- Generate sequence logos using WebLogo to visualize motif conservation [6]
- Analyze exon-intron structures using GSDS2.0 to identify conserved splicing patterns [6]
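
The candidate identification and domain validation steps above can be sketched as two small filters: a union of BLAST and HMMER hits, followed by a Pfam E-value check. This is a minimal sketch under stated assumptions. The function names, gene IDs, and E-values are made up for illustration, and real pipelines would parse these values from BLAST tabular output, HMMER tables, and Pfam scan results.

```python
def merge_candidates(blast_hits, hmmer_hits, blast_evalue=1.0):
    """Union of BLAST hits passing the E-value threshold and all HMMER hits.

    blast_hits: iterable of (gene_id, evalue); hmmer_hits: iterable of gene_id.
    Using a set removes redundant hits found by both searches.
    """
    passed = {gene for gene, evalue in blast_hits if evalue <= blast_evalue}
    return passed | set(hmmer_hits)


def confirm_nbs(candidates, pfam_evalues, cutoff=1e-4):
    """Keep candidates whose NB-ARC Pfam E-value is below the cutoff."""
    return sorted(
        gene for gene in candidates
        if pfam_evalues.get(gene, float("inf")) < cutoff
    )


# Hypothetical search results
blast_hits = [("geneA", 1e-30), ("geneB", 0.5), ("geneC", 3.0)]
hmmer_hits = ["geneA", "geneD"]
pfam_evalues = {"geneA": 1e-40, "geneB": 2e-3, "geneD": 5e-6}

candidates = merge_candidates(blast_hits, hmmer_hits)
confirmed = confirm_nbs(candidates, pfam_evalues)
print(confirmed)
```

The permissive first pass keeps divergent homologs in play, and the stricter domain check removes candidates that lack a convincing NB-ARC match.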

Expression Analysis of NBS-LRR Genes

Functional validation of NBS genes requires assessment of their expression patterns and responses to pathogen challenges:

Transcriptome Sequencing:
- Treat plant materials with defense signaling molecules such as salicylic acid (SA) to induce immune responses [11]
- Extract RNA from treated and control tissues at multiple time points (e.g., 0, 6, 12, 24, 48 hours post-treatment)
- Prepare and sequence RNA-seq libraries using standard protocols (Illumina platform recommended)
Differential Expression Analysis:
- Identify significantly upregulated NBS-LRR genes following SA treatment [11]
- Perform weighted gene co-expression network analysis (WGCNA) to identify NBS genes clustered with pathogen recognition pathways, MAPK signaling, and plant hormone transduction [11]
- Validate key candidates using qRT-PCR with gene-specific primers
Functional Annotation:
- Analyze cis-regulatory elements in promoter regions of differentially expressed NBS genes
- Annotate genes with GO terms and KEGG pathways to identify associated biological processes [11]
- Integrate expression data with phylogenetic analysis to identify evolutionarily conserved expression patterns

Case Studies: NBS Gene Evolution Across Plant Lineages

Dynamic Evolution in Solanaceae Species

Comparative analysis of NBS genes in three Solanaceae species—potato (Solanum tuberosum), tomato (Solanum lycopersicum), and pepper (Capsicum annuum)—reveals distinct evolutionary patterns driven by independent gene duplication and loss events [31]. Genome-wide identification revealed 447, 255, and 306 NBS-encoding genes in potato, tomato, and pepper, respectively [31]. These genes predominantly cluster as tandem arrays on chromosomes, with few singleton genes, suggesting tandem duplication as the primary mechanism for NBS gene expansion.
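
To make the distinction between tandem arrays and singletons concrete, the sketch below groups NBS genes into clusters when neighbouring genes on the same chromosome lie within a maximum distance. The 200 kb window, the `tandem_clusters` helper, and the gene coordinates are assumptions for illustration; the cited study's exact clustering criteria are not given in this guide.

```python
from collections import defaultdict


def tandem_clusters(genes, max_gap=200_000):
    """Group genes whose consecutive start positions are within max_gap bp.

    genes: iterable of (gene_id, chromosome, start_position).
    Returns a list of (chromosome, [gene_ids]) in chromosome input order.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    clusters = []
    for chrom, items in by_chrom.items():
        items.sort()  # order genes along the chromosome
        current = [items[0]]
        for pos, gene_id in items[1:]:
            if pos - current[-1][0] <= max_gap:
                current.append((pos, gene_id))
            else:
                clusters.append((chrom, [g for _, g in current]))
                current = [(pos, gene_id)]
        clusters.append((chrom, [g for _, g in current]))
    return clusters


# Hypothetical gene coordinates
genes = [
    ("n1", "chr1", 100_000), ("n2", "chr1", 150_000), ("n3", "chr1", 900_000),
    ("n4", "chr2", 50_000), ("n5", "chr2", 120_000), ("n6", "chr2", 130_000),
]
clusters = tandem_clusters(genes)
singletons = [ids[0] for _, ids in clusters if len(ids) == 1]
print(clusters)
print("singletons:", singletons)
```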

Phylogenetic analysis demonstrates that the three NBS subclasses (TNL, CNL, RNL) each form monophyletic clades distinguished by unique exon/intron structures and amino acid motif sequences [31]. Reconciliation of the gene trees with the species phylogeny indicates that extant NBS genes were derived from approximately 150 CNL, 22 TNL, and 4 RNL ancestral genes in the common ancestor of these species [31]. Following speciation, each lineage experienced independent duplication and loss events, resulting in the observed species-specific gene counts:

Potato exhibits a "consistent expansion" pattern, with maintained gene duplication leading to the highest NBS gene count among the three species [31].
Tomato shows a "first expansion and then contraction" pattern, with initial gene duplication followed by subsequent gene losses [31].
Pepper displays a "shrinking" pattern, characterized by predominant gene loss rather than expansion [31].

Diverse Evolutionary Patterns in Rosaceae

Analysis of 2,188 NBS-LRR genes across 12 Rosaceae species reveals even more diverse evolutionary patterns [6]. The reconciled phylogeny inferred 102 ancestral NBS genes (7 RNLs, 26 TNLs, and 69 CNLs) in the Rosaceae common ancestor, which underwent independent gene duplication and loss events during species diversification [6]. The specific evolutionary patterns include:

Rubus occidentalis, Potentilla micrantha, Fragaria iinumae and Gillenia trifoliata: "First expansion and then contraction" evolutionary pattern [6]
Rosa chinensis: "Continuous expansion" pattern [6]
F. vesca: "Expansion followed by contraction, then a further expansion" pattern [6]
Three Prunus species and three Maleae species: "Early sharp expanding to abrupt shrinking" pattern [6]

This remarkable diversity in evolutionary patterns within a single plant family highlights the dynamic nature of NBS gene evolution and suggests that different lineages have employed distinct evolutionary strategies to adapt to their specific pathogen environments.

Degeneration and Diversification in Orchidaceae

Study of NBS genes in Dendrobium species reveals distinctive evolutionary mechanisms including type changing and NB-ARC domain degeneration [11]. Analysis of 655 NBS genes across six orchid species and Arabidopsis thaliana identified significant degeneration of CNL-type genes on specific phylogenetic branches, with no TNL-type genes detected in any orchid species [11]. This absence of TNL genes in orchids aligns with the pattern observed in other monocots and appears to be driven by NRG1/SAG101 pathway deficiency [11].

Expression analysis in Dendrobium officinale following salicylic acid treatment identified 1,677 differentially expressed genes, including six NBS-LRR genes that were significantly upregulated [11]. Weighted gene co-expression network analysis revealed that one key NBS-LRR gene (Dof020138) was closely associated with pathogen recognition pathways, MAPK signaling, plant hormone signal transduction, and energy metabolism pathways [11], suggesting its central role in the orchid immune response.

Orthogroup analysis and phylogenetic reconstruction provide powerful approaches for understanding the evolutionary history of gene families across species. When applied to NBS genes in plants, these methods reveal dynamic and lineage-specific patterns of gene gain and loss, reflecting ongoing evolutionary arms races between plants and their pathogens. The methodological framework presented in this technical guide—encompassing orthology inference, phylogenetic reconstruction, ancestral state estimation, and functional validation—equips researchers with comprehensive tools for investigating these evolutionary processes.

Future advances in orthogroup analysis will likely focus on improving scalability to accommodate thousands of genomes, integrating structural and functional data to refine orthology predictions, and developing more sophisticated models of gene family evolution that incorporate population genetic parameters. As sequencing technologies continue to produce genomic data at an accelerating pace, the methods outlined here will become increasingly essential for extracting evolutionary insights from the wealth of comparative genomic data.

Expression Profiling Under Biotic Stress and Hormonal Treatments

Expression profiling under controlled stress conditions is a fundamental technique in plant molecular biology for deciphering gene function, particularly for complex gene families involved in plant immunity. This methodology enables researchers to identify candidate genes involved in defense responses and understand their regulatory networks. When framed within the context of nucleotide-binding site (NBS) gene family research, expression profiling becomes a powerful tool for investigating the functional consequences of gene loss and gain events across plant lineages. The dynamic expansion and contraction of NBS genes through evolution creates a natural variation that can be exploited to understand structure-function relationships in plant immunity [4] [14]. This technical guide provides comprehensive methodologies and frameworks for conducting robust expression profiling studies, with emphasis on their application to NBS gene research.

Experimental Design Considerations

Treatment Selection and Application

Biotic Stressors: Appropriate selection of pathogens based on the research objectives is crucial. For comprehensive studies, multiple pathogen types with different infection strategies should be included:

Soil-borne fungi: Fusarium oxysporum, Rhizoctonia solani, and Pythium species (P. myriotylum, P. aphanidermatum) are excellent choices for studying root-pathogen interactions [45]. These pathogens cause significant damage to various crops, with F. oxysporum inducing vascular wilt that can lead to 10-100% yield loss in chickpea [45].
Foliar fungal pathogens: Magnaporthe oryzae (rice blast fungus) and Puccinia hordei (barley leaf rust) are well-established model systems [46] [47].
Viral pathogens: Begomoviruses causing cotton leaf curl disease (CLCuD) and rice tungro viruses (RTBV/RTSV) provide insights into plant-virus interactions [4] [47].

Hormonal Treatments: Plant hormone treatments should mimic defense signaling pathways:

Jasmonic Acid (JA) and Methyl Jasmonate (MeJA): 50-100 μM solutions, applied as foliar spray or root immersion, to activate jasmonate-mediated defense responses [45] [46].
Salicylic Acid (SA): 100 μM solutions to induce systemic acquired resistance (SAR) pathways [45] [46].
Abscisic Acid (ABA): 50 μM solutions to study abiotic-biotic stress cross-talk [45].
Ethylene (ET) precursors: 1 mM solutions to investigate ethylene-responsive defense mechanisms [45].

Application methods include root immersion, foliar spraying, or direct injection, with appropriate solvent controls. Treatment duration should be optimized for each system, with time-course experiments (0, 12, 24, 48 hours post-treatment) providing comprehensive expression dynamics [45].
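
Preparing the working solutions listed above comes down to a molarity calculation. The sketch below computes the mass needed for a given volume and concentration. It assumes standard molar masses for salicylic acid and abscisic acid (check the certificate of your reagent, especially for salts), and the 500 mL batch volume is a hypothetical example.

```python
MOLAR_MASS_G_PER_MOL = {
    "SA": 138.12,   # salicylic acid, C7H6O3
    "ABA": 264.32,  # abscisic acid, C15H20O4
}


def mass_mg(hormone, conc_um, volume_ml):
    """Mass in mg of hormone needed for volume_ml of a conc_um (micromolar) solution."""
    moles = conc_um * 1e-6 * (volume_ml / 1000)           # mol = (mol/L) * L
    return moles * MOLAR_MASS_G_PER_MOL[hormone] * 1000   # g -> mg


sa_mg = mass_mg("SA", 100, 500)    # 100 uM SA, 500 mL
aba_mg = mass_mg("ABA", 50, 500)   # 50 uM ABA, 500 mL
print(f"SA: {sa_mg:.3f} mg, ABA: {aba_mg:.3f} mg")
```

Because these masses are only a few milligrams, it is usually more practical to weigh out a concentrated stock and dilute it, with the same solvent added to the control treatment.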

Reference Gene Validation

A critical prerequisite for accurate qRT-PCR analysis is the validation of stable reference genes under specific experimental conditions. Traditional reference genes (e.g., ACTIN, GAPDH, TUBULIN) often show variable expression under stress conditions, necessitating empirical validation [45].

Table 1: Recommended Reference Genes for Different Experimental Conditions

Treatment Type	Most Stable Reference Genes	Validation Method	Application in Study
SA Treatment	TUA	GeNorm, NormFinder, BestKeeper	Mung bean pathogen interaction [45]
MeJA Treatment	ACT	GeNorm, NormFinder, BestKeeper	Mung bean hormone response [45]
ABA Treatment	TUA	GeNorm, NormFinder, BestKeeper	Mung bean hormone response [45]
ETH Treatment	TUB	GeNorm, NormFinder, BestKeeper	Mung bean hormone response [45]
P. myriotylum Infection	TUA	GeNorm, NormFinder, BestKeeper	Mung bean soil-borne disease [45]
F. oxysporum Infection	ACT	GeNorm, NormFinder, BestKeeper	Mung bean soil-borne disease [45]
R. solani Infection	EF1α	GeNorm, NormFinder, BestKeeper	Mung bean soil-borne disease [45]

Multiple algorithms (GeNorm, NormFinder, BestKeeper, and RefFinder) should be employed for comprehensive stability analysis [45]. For example, in mung bean studies, TUA demonstrated the most stable expression across multiple abiotic and biotic stress conditions, while Cons4 was the least stable [45].
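
As an illustration of what a stability algorithm measures, the sketch below computes a geNorm-style M value: for each candidate gene, the mean over all other candidates of the standard deviation of their log2 expression ratios across samples. Lower M indicates more stable expression. The gene names and relative quantities are invented for the example, and this sketch omits geNorm's stepwise exclusion of the least stable gene.

```python
import math
from statistics import stdev


def genorm_m(expr):
    """Return {gene: M} from {gene: [relative quantities per sample]}.

    All genes must be measured in the same samples, in the same order.
    """
    genes = list(expr)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise_sd.append(stdev(log_ratios))  # pairwise variation V_jk
        m_values[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


# Hypothetical relative quantities for three candidate reference genes
expr = {
    "refA": [1.0, 1.1, 0.9, 1.0],
    "refB": [2.0, 2.2, 1.8, 2.0],  # tracks refA at a constant ratio
    "refC": [1.0, 3.0, 0.5, 2.0],  # varies independently
}
m = genorm_m(expr)
for gene in sorted(m, key=m.get):
    print(gene, round(m[gene], 3))
```

In this toy data set, refA and refB share the lowest M because their ratio is constant across samples, while refC is ranked least stable.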

Genomic Context: NBS Gene Loss and Gain Across Plant Lineages

The NBS gene family exhibits remarkable evolutionary dynamism, with frequent gene loss and gain events across plant lineages. Expression profiling provides functional insights into the consequences of these evolutionary patterns:

Lineage-Specific NBS Subfamily Contractions

Comparative genomic analyses reveal significant variation in NBS subfamily distributions:

TNL Subfamily Loss: Monocot species (e.g., Oryza sativa, Triticum aestivum, Zea mays) have completely lost TNL genes, while specific eudicot lineages (e.g., Salvia species, Vernicia fordii, Sesamum indicum) show independent losses of this subfamily [10] [3].
Differential Expansion: Gymnosperms (e.g., Pinus taeda) show TNL subfamily expansion (89.3% of typical NBS-LRRs), whereas Salvia species display marked reduction in both TNL and RNL subfamilies [10].

Table 2: NBS-LRR Gene Family Distribution Across Plant Lineages

Plant Species	Total NBS Genes	CNL Subfamily	TNL Subfamily	RNL Subfamily	Lineage-Specific Patterns
Arabidopsis thaliana	207	~61%	~36%	~3%	Balanced subfamily distribution [10]
Salvia miltiorrhiza	196 (62 typical)	61	0	1	Severe TNL reduction [10]
Vernicia montana	149	98 (CC-containing)	12	-	Limited TNL retention [3]
Vernicia fordii	90	49 (CC-containing)	0	-	Complete TNL loss [3]
Oryza sativa	505	All CNL	0	0	Monocot-typical TNL absence [10]
Pinus taeda	311	Minor proportion	89.3%	Minor proportion	TNL dominance [10]

Evolutionary Mechanisms Driving NBS Diversity

Duplication Mechanisms: Different duplication modes contribute to NBS gene expansion with distinct evolutionary outcomes:

Tandem Duplications: Generate adaptive NBS clusters under positive selection, creating genetic variation for pathogen recognition [14].
Whole-Genome Duplications (WGD): Produce duplicate NBS genes typically under strong purifying selection, preserving conserved immune functions [14].
Dispersed Duplications: Contribute significantly to CNL/CN gene expansion in maize [14].

Structural Variation: Presence-absence variations (PAVs) and structural variants (SVs) create "core" and "adaptive" NBS subgroups. Core subgroups (e.g., ZmNBS31, ZmNBS17-19 in maize) are conserved across accessions, while adaptive subgroups (e.g., ZmNBS1-10, ZmNBS43-60) show high variability and potential for specialized pathogen recognition [14].

Methodological Approaches for Expression Profiling

Transcriptome Sequencing and Analysis

Library Preparation and Sequencing:

RNA Extraction: Use quality-controlled RNA extraction methods (RNeasy Plant Mini Kit) with rigorous quality assessment (NanoDrop, Bioanalyzer) [48]. RNA integrity is critical for library construction.
Library Construction: Employ strand-specific cDNA library protocols compatible with Illumina platforms (100-150 bp paired-end reads recommended) [48].
Sequencing Depth: Generate >120 million reads per library for adequate transcriptome coverage, particularly for detecting low-abundance transcripts [48].

Data Analysis Pipeline:

Read Processing: Remove adapter sequences and low-quality reads (Q<20) using tools like Trimmomatic or fastp, and remove PCR duplicates in a separate deduplication step where the tool does not provide one [48].
Transcriptome Assembly: For non-model species without reference genomes, use de novo assemblers (Trinity) followed by redundancy reduction (iAssembler, CD-HIT-EST) [48].
Read Mapping and Quantification: Map clean reads to reference transcripts using TopHat or HISAT2, then calculate expression values (FPKM or TPM) [48].
Differential Expression: Identify significantly differentially expressed genes using statistical methods (DESeq2, edgeR) with appropriate false discovery rate correction [48].
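
The quantification step can be illustrated with the TPM calculation itself: counts are first divided by transcript length in kilobases, then scaled so that the values in each sample sum to one million. The gene names, counts, and lengths below are hypothetical, and dedicated quantifiers additionally handle effective length and multi-mapping reads, which this sketch ignores.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw counts and transcript lengths (bp)."""
    # Reads per kilobase for each gene
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Scale so that all values in the sample sum to 1e6
    scale = sum(rates.values()) / 1e6
    return {gene: rate / scale for gene, rate in rates.items()}


# Hypothetical counts and transcript lengths
counts = {"nbs1": 100, "nbs2": 200, "nbs3": 300}
lengths_bp = {"nbs1": 1000, "nbs2": 2000, "nbs3": 3000}

values = tpm(counts, lengths_bp)
for gene, value in values.items():
    print(gene, round(value, 1))
```

Here the three genes receive equal TPM values despite different raw counts, because the counts scale exactly with transcript length.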

Quantitative Real-Time PCR (qRT-PCR) Validation

Primer Design and Validation:

Design gene-specific primers with amplicon lengths of 150-250 bp using Primer Premier 5.0 or similar tools [45].
Validate primer specificity through melt curve analysis and agarose gel electrophoresis.
Determine amplification efficiency (90-110% ideal) using standard curves with serial dilutions [45].
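
Amplification efficiency is obtained from the slope of the standard curve (Ct against log10 of template quantity) as E = 10^(−1/slope) − 1, so a slope near −3.32 corresponds to about 100%. The sketch below fits the slope by ordinary least squares; the dilution series and Ct values are invented for illustration.

```python
def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def amplification_efficiency(log10_quantity, ct_values):
    """Efficiency as a fraction (1.0 = 100%) from a standard curve."""
    slope = fit_slope(log10_quantity, ct_values)
    return 10 ** (-1 / slope) - 1


# Hypothetical 10-fold dilution series
log10_quantity = [0, -1, -2, -3]
ct_values = [15.0, 18.3, 21.6, 24.9]

slope = fit_slope(log10_quantity, ct_values)
efficiency = amplification_efficiency(log10_quantity, ct_values)
print(f"slope = {slope:.2f}, efficiency = {efficiency * 100:.1f}%")
```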

Reaction Conditions and Analysis:

Use SYBR Green-based detection systems with three technical replicates for each of three biological replicates [45] [49].
Employ appropriate cDNA concentrations (1 μg total RNA input for reverse transcription) [45].
Calculate relative expression using the 2^(−ΔΔCT) method with validated reference genes [45] [49].
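
The relative expression calculation can be written out in a few lines. ΔCT is the target Ct minus the reference Ct within a sample, ΔΔCT is the treated ΔCT minus the control ΔCT, and the fold change is 2 raised to −ΔΔCT. The Ct values below are hypothetical, and the formula assumes close to 100% amplification efficiency for both primer pairs.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of the target gene (treated vs control) by 2^(-ddCt)."""
    delta_treated = ct_target_treated - ct_ref_treated   # dCt, treated sample
    delta_control = ct_target_control - ct_ref_control   # dCt, control sample
    delta_delta = delta_treated - delta_control          # ddCt
    return 2 ** (-delta_delta)


# Hypothetical mean Ct values for an NBS-LRR gene and a reference gene
fold_change = relative_expression(
    ct_target_treated=24.0, ct_ref_treated=18.0,
    ct_target_control=26.0, ct_ref_control=18.0,
)
print(fold_change)  # 4.0: the target is four-fold higher after treatment
```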

Co-Expression Network Analysis

Modular Gene Co-Expression Analysis:

Construct co-expression networks from expression matrices using Pearson Correlation Coefficient (PCC) in tools like CEMiTool [47].
Identify modules (clusters) of co-expressed genes with similar expression patterns across treatments.
Detect hub genes within modules based on connectivity measures (Maximal Clique Centrality) [47].
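
A minimal sketch of the idea behind these steps follows: compute pairwise Pearson correlations, keep gene pairs above a threshold as network edges, and rank genes by how many edges they have. This is a deliberate simplification. It uses plain thresholding rather than CEMiTool's module detection, and node degree rather than Maximal Clique Centrality as the connectivity measure. The 0.9 threshold, gene names, and expression values are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(expr, threshold=0.9):
    """Gene pairs whose expression profiles correlate at or above threshold."""
    return [
        (g1, g2) for g1, g2 in combinations(sorted(expr), 2)
        if pearson(expr[g1], expr[g2]) >= threshold
    ]


# Hypothetical expression values across four treatments
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [1, 2, 3, 5],
    "g4": [1, 3, 2, 4],
}
edges = coexpression_edges(expr)
degree = Counter(gene for edge in edges for gene in edge)
print(edges)
print(degree.most_common())
```

Genes with the highest degree within a module are natural hub candidates for follow-up by qRT-PCR or VIGS.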

Functional Enrichment Analysis:

Perform Gene Ontology (GO) and pathway enrichment analysis (KEGG) on co-expression modules.
Identify over-represented cis-regulatory elements in promoter regions of co-expressed genes [46].

Case Studies in Expression Profiling of NBS Genes

Orthogroup-Based Analysis Across Species

Large-scale comparative transcriptomic analysis of NBS genes across 34 plant species identified 603 orthogroups (OGs) with distinct expression patterns [4]:

Core Orthogroups: OG0, OG1, OG2 represented conserved NBS genes across multiple species.
Specialized Orthogroups: OG80, OG82 showed species-specific expression patterns.
Stress-Responsive OGs: OG2, OG6, and OG15 exhibited significant upregulation under various biotic and abiotic stresses in cotton leaf curl disease (CLCuD) susceptible and tolerant varieties [4].

Functional Validation of NBS Gene Expression

Virus-Induced Gene Silencing (VIGS):

VIGS of GaNBS (OG2) in resistant cotton demonstrated its role in virus titering, validating predictions from expression profiling [4].
In Vernicia montana, VIGS of Vm019719 (NBS-LRR ortholog) confirmed its contribution to Fusarium wilt resistance, while its allele in susceptible V. fordii (Vf11G0978) showed ineffective defense response due to promoter variation [3].

Expression Pattern Correlations:

In Salvia miltiorrhiza, specific SmNBS genes clustered phylogenetically with well-characterized resistance proteins: SmNBS35/49/51 with Arabidopsis RPH8A (hypersensitive response), and SmNBS55/56 with RPM1 (Pseudomonas syringae resistance) [10].
Promoter analysis revealed abundant cis-acting elements related to plant hormones and abiotic stress in SmNBS genes, explaining their expression responsiveness [10].

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Research Reagents for Expression Profiling Studies

Reagent/Category	Specific Examples	Function/Application	Technical Considerations
RNA Extraction Kits	RNeasy Plant Mini Kit (Qiagen), Trizol reagent	High-quality RNA isolation for transcriptomics	Assess RNA integrity (RIN >8.0) for library construction [45] [48]
cDNA Synthesis Kits	HiScript III 1st Strand cDNA Synthesis Kit (Vazyme)	Reverse transcription with gDNA removal	Use 1 μg total RNA input for consistent results [45]
qPCR Master Mixes	SYBR Green Premix Pro Taq HS qPCR Kit	Quantitative gene expression analysis	Validate primer efficiency (90-110%) for accurate 2^(−ΔΔCT) calculation [45] [49]
Pathogen Cultures	Fusarium oxysporum, Rhizoctonia solani, Magnaporthe oryzae	Biotic stress treatments	Standardize inoculation methods (root immersion, spray) [45] [47]
Hormone Solutions	SA (100 μM), MeJA (50-100 μM), ABA (50 μM)	Defense signaling induction	Prepare fresh stocks, include solvent controls [45] [46]
Reference Genes	TUA, ACT, TUB, EF1α	Expression normalization	Validate stability for specific treatments using GeNorm/NormFinder [45]
VIGS Vectors	TRV-based vectors (pTRV1, pTRV2)	Functional gene validation	Optimize Agrobacterium strain and inoculation method [4] [3]

Signaling Pathways in Plant Defense

The following diagrams illustrate key signaling pathways involved in plant defense responses to biotic stress and hormonal treatments, with emphasis on NBS gene regulation.

NBS-Mediated Effector-Triggered Immunity Signaling

Diagram 1: NBS-Mediated Effector-Triggered Immunity Signaling. This pathway illustrates how NBS-LRR proteins recognize pathogen effectors and activate defense responses. Different NBS-LRR types (CNL, TNL, RNL) may have distinct signaling outputs but converge on hypersensitive response (HR), programmed cell death (PCD), and defense gene activation [10] [3].

Hormonal Signaling Crosstalk in Plant Defense

Diagram 2: Hormonal Signaling Crosstalk in Plant Defense. This diagram shows the complex interactions between major hormone signaling pathways in plant defense. Salicylic acid (SA) induces systemic acquired resistance (SAR) against biotrophic pathogens, while jasmonic acid (JA) and ethylene (ET) promote induced systemic resistance (ISR) against necrotrophs. These pathways exhibit crosstalk (often antagonistic) and collectively modulate NBS gene expression [46] [47].

Experimental Workflow for Expression Profiling

Diagram 3: Experimental Workflow for Expression Profiling. This workflow outlines the key steps in a comprehensive expression profiling study, from experimental design through functional validation. Reference gene validation (in parallel) ensures accurate qRT-PCR analysis, while integrated wet lab and computational approaches provide robust insights into gene function [45] [4] [3].

Expression profiling under biotic stress and hormonal treatments provides critical functional insights that complement evolutionary studies of NBS gene loss and gain across plant lineages. The methodological frameworks presented in this guide enable researchers to connect genomic variation with functional outcomes in plant immunity. By integrating robust experimental design with comprehensive bioinformatic analysis and functional validation, researchers can decipher the complex relationships between NBS gene evolution, expression regulation, and disease resistance phenotypes. These approaches are fundamental for identifying candidate genes for crop improvement and understanding the evolutionary dynamics of plant immune systems.

Functional Validation Through Virus-Induced Gene Silencing (VIGS)

Virus-Induced Gene Silencing (VIGS) has emerged as a powerful reverse genetics tool for rapidly analyzing gene function in plants. This technology leverages the innate RNA-based antiviral defense mechanism of plants, whereby recombinant viral vectors carrying host gene fragments trigger sequence-specific silencing of corresponding target genes [50]. The significance of VIGS is particularly pronounced in functional genomics studies of species that are recalcitrant to stable genetic transformation or have long life cycles, enabling medium- to high-throughput gene functional screening without the need for stable transformation [50] [51].

Within the context of researching nucleotide-binding site (NBS) gene loss and gain across plant lineages, VIGS provides an indispensable tool for functionally validating the roles of specific NBS genes in pathogen resistance. NBS genes encode proteins containing nucleotide-binding site and C-terminal leucine-rich repeat domains, representing the largest class of plant disease resistance (R) genes that play vital roles in effector-triggered immunity (ETI) [24] [4] [13]. The evolution of NBS genes exhibits remarkable diversity across plant lineages, with frequent gene loss, gain, and domain degeneration events observed [24] [4]. VIGS enables researchers to rapidly link specific NBS gene sequences to resistance phenotypes, thereby illuminating the functional consequences of these evolutionary patterns.

Molecular Mechanisms of VIGS

Fundamental Principles

VIGS operates through the plant's post-transcriptional gene silencing (PTGS) pathway, which naturally serves as an antiviral defense system [50]. The process begins with the introduction of a recombinant viral vector containing a fragment of the target plant gene. As the virus replicates and spreads systemically throughout the plant, double-stranded RNA (dsRNA) replication intermediates are generated. These dsRNA molecules are recognized and processed by the plant's RNA interference machinery [50].

The core mechanism involves the cleavage of long double-stranded RNA by Dicer-like (DCL) enzymes, generating 21- to 24-nucleotide small interfering RNAs (siRNAs) [52] [50]. These siRNAs are then incorporated into an RNA-induced silencing complex (RISC), which guides the sequence-specific degradation of complementary mRNA transcripts, thereby suppressing the expression of the target gene [50]. The silencing signal spreads systemically through the plant, leading to phenotypic changes that enable functional characterization of the targeted gene [50].

Key Molecular Components

Diagram: Core molecular mechanism of VIGS (recombinant vector delivery, dsRNA formation, DCL cleavage into siRNAs, and RISC-guided degradation of the target mRNA).

Recent advances have refined our understanding of siRNA production in VIGS. Studies using optimized viral-delivered short RNA inserts (vsRNAi) as short as 24 nucleotides have demonstrated effective gene silencing, with 32-nt inserts producing robust phenotypes through specific enrichment of 21-nt and 22-nt siRNAs, corresponding to DCL4 and DCL2 activities respectively [52]. This precision approach minimizes off-target effects while maintaining high silencing efficiency.
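
Because silencing is guided by siRNAs of about 21 nt, a simple way to screen a candidate insert for off-target risk is to enumerate its 21-nt windows and check whether any occur in non-target transcripts. The sketch below does exactly that for perfect matches on the same strand. It is a simplified illustration: the sequences are invented, the helper names are hypothetical, and it ignores reverse complements and near-matches, which a real design tool would need to consider.

```python
def kmers(seq, k=21):
    """Set of all k-nt windows in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def off_target_kmers(insert, other_transcripts, k=21):
    """Count insert-derived k-mers that also occur in each non-target transcript."""
    insert_kmers = kmers(insert, k)
    hits = {}
    for name, seq in other_transcripts.items():
        shared = insert_kmers & kmers(seq, k)
        if shared:
            hits[name] = len(shared)
    return hits


# Hypothetical 32-nt insert and two non-target transcripts
insert = "ATGGCTAGCTAGGATCCGATCGTTAGCCATGA"
other_transcripts = {
    "paralog": "AAAA" + insert[5:30] + "TTTT",  # shares a 25-nt stretch
    "unrelated": "G" * 40,
}
hits = off_target_kmers(insert, other_transcripts)
print(len(kmers(insert)), "candidate 21-mers;", "off-target hits:", hits)
```

For large, closely related families such as NBS genes, a check of this kind helps decide whether an insert will silence one gene or a whole clade of paralogs.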

VIGS Experimental Workflow

Comprehensive Experimental Pipeline

Implementing VIGS requires careful execution of sequential steps, from vector design to phenotypic analysis.

Diagram: Complete VIGS experimental workflow.

Critical Optimization Parameters

Successful VIGS implementation depends on careful optimization of several key parameters that significantly influence silencing efficiency:

Insert Design: For conventional VIGS, inserts of 200-400 bp targeting less conserved regions ensure specificity [52]. Recent advances show that viral-delivered short RNA inserts (vsRNAi) as short as 24-32 nt can effectively trigger silencing when designed against conserved coding sequences [52].
Plant Developmental Stage: Younger plants (2-4 leaf stage) generally show higher silencing efficiency, though this varies by species [50].
Agroinoculum Concentration: Optimal OD₆₀₀ typically ranges from 0.3 to 2.0, with species-specific optimization required [50] [51].
Environmental Conditions: Temperature (18-22°C), humidity (60-70%), and photoperiod (16h light/8h dark) significantly impact silencing efficiency and viral spread [50].
Infection Method: Selection of appropriate delivery method (agroinfiltration, injection, or soaking) depends on plant species and tissue type [51].
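
Adjusting the agroinoculum to a target OD₆₀₀ is a C1V1 = C2V2 calculation. The sketch below computes how much culture and buffer to combine, and rejects targets outside the 0.3 to 2.0 range given above. It assumes simple dilution of the culture with infiltration buffer; protocols that pellet and resuspend the cells would use the same ratio differently. The measured OD, target OD, and final volume are hypothetical.

```python
def dilution_volumes(measured_od, target_od, final_volume_ml):
    """Return (culture_ml, buffer_ml) to reach target_od in final_volume_ml."""
    if not 0.3 <= target_od <= 2.0:
        raise ValueError("target OD600 outside the typical 0.3-2.0 range")
    if target_od > measured_od:
        raise ValueError("culture is more dilute than the target; concentrate it first")
    culture_ml = target_od * final_volume_ml / measured_od  # C1V1 = C2V2
    return culture_ml, final_volume_ml - culture_ml


# Hypothetical overnight culture measured at OD600 = 2.4
culture_ml, buffer_ml = dilution_volumes(measured_od=2.4, target_od=0.8,
                                         final_volume_ml=30)
print(f"{culture_ml:.1f} mL culture + {buffer_ml:.1f} mL buffer")
```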

VIGS Applications in NBS Gene Research

Functional Validation of NBS Genes

VIGS has become an indispensable tool for functionally characterizing NBS genes across numerous plant species, providing critical insights into their roles in disease resistance pathways. In cotton, silencing of GaNBS (orthogroup OG2) through VIGS demonstrated its putative role in virus titering, validating its function in resistance to cotton leaf curl disease [4]. In Dendrobium officinale, expression profiling showed that six NBS-LRR genes (Dof013264, Dof020566, Dof019188, Dof019191, Dof020138, and Dof020707) were significantly up-regulated in response to salicylic acid treatment, with Dof020138 identified as a key mediator connecting pathogen recognition to downstream signaling pathways [24], making these genes natural candidates for VIGS-based validation.

The application of VIGS in eggplant revealed nine SmNBS genes with differential expression patterns in response to Ralstonia solanacearum stress, with EGP05874.1 potentially involved in the resistance response to bacterial wilt [33]. These findings highlight how VIGS enables rapid functional screening of NBS gene candidates identified through genomic studies, which is particularly important given the large size and functional redundancy of NBS gene families.

Elucidating Evolutionary Patterns

VIGS functional studies have contributed significantly to understanding NBS gene evolution across plant lineages. Research in Dendrobium species revealed that NBS gene degeneration is common in the genus and is the main source of NBS gene diversity [24]. Phylogenetic analyses showed that orchid NBS-LRR genes have significantly degenerated, with Dendrobium NBS genes exhibiting type changes and NB-ARC domain degeneration [24].

Comparative genomics across 34 species from mosses to monocots and dicots identified 12,820 NBS-domain-containing genes classified into 168 classes with several novel domain architecture patterns, revealing both classical (NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR) and species-specific structural patterns [4]. VIGS provides the functional validation needed to interpret how these evolutionary changes affect gene function and plant immunity.

Essential Research Reagents and Solutions

VIGS Research Toolkit

The following table summarizes key reagents and materials essential for implementing VIGS technology:

Reagent Category	Specific Examples	Function and Application
Viral Vectors	TRV (Tobacco Rattle Virus), BPMV (Bean Pod Mottle Virus), CLCrV (Cotton Leaf Crumple Virus)	Delivery of target gene fragments; TRV most widely used for broad host range [50] [51]
Agrobacterium Strains	GV3101, LBA4404, AGL1	Delivery of viral vectors into plant cells [51]
Selection Antibiotics	Kanamycin, Rifampicin, Gentamicin	Selection of transformed Agrobacterium and plasmid maintenance [51]
Induction Media	LB, YEP, M9 with AS (acetosyringone)	Agrobacterium culture and induction of virulence genes [51]
Infiltration Buffers	MMA (MgCl₂, MES, AS)	Enhancement of Agrobacterium infection efficiency [50]
Positive Control Constructs	PDS (phytoene desaturase), CHLI (magnesium protoporphyrin chelatase)	Silencing produces visible phenotypes (photobleaching, yellowing) to validate system efficiency [52] [51]

Vector Systems for Different Applications

Different viral vectors offer distinct advantages for specific research applications. TRV-based vectors are particularly versatile for Solanaceae species and beyond, with a bipartite genome organization requiring TRV1 (encoding replicase and movement proteins) and TRV2 (carrying the coat protein gene and the multiple cloning site for target sequences) [50]. For legumes like soybean, BPMV-based vectors have been widely adopted, though recent optimization of TRV for soybean through cotyledon node soaking has achieved silencing efficiencies of 65-95% [51].

Advanced vector systems like the JoinTRV system enable simplified cloning of short RNA inserts through one-step digestion-ligation reactions, significantly reducing insert size requirements while maintaining efficiency [52]. The development of satellite virus-based systems and vectors incorporating viral suppressors of RNA silencing (VSRs) like P19 and C2b further expands the toolbox for challenging plant species [50].

Quantitative Analysis of VIGS Efficiency

Silencing Efficiency Across Plant Systems

Extensive research has quantified VIGS efficiency parameters across different plant species and experimental conditions:

Table 1: VIGS Efficiency Metrics Across Plant Systems

Plant Species	Target Gene	Silencing Efficiency	Key Optimization Factors	Reference
Nicotiana benthamiana	CHLI (Mg-chelatase)	24-32 nt vsRNAi induced significant chlorophyll reduction	Insert length optimization; Conserved target regions	[52]
Soybean (Glycine max)	GmPDS, GmRpp6907, GmRPT4	65-95% silencing efficiency	Cotyledon node soaking method; Agrobacterium strain GV3101	[51]
Scarlet eggplant (S. aethiopicum)	CHLI	Visible yellowing phenotype	32-nt insert conservation across species	[52]
Tomato (S. lycopersicum)	CHLI	Leaf yellowing confirmed functionality	Portability of vsRNAi across Solanaceae	[52]

Molecular Validation Parameters

Rigorous validation of silencing efficiency requires multiple molecular approaches:

Transcript Level Analysis: Quantitative RT-PCR showing 60-95% reduction in target mRNA levels [51].
Phenotypic Validation: Visible phenotypes such as photobleaching (PDS silencing) or yellowing (CHLI silencing) [52] [51].
Small RNA Sequencing: Confirmation of 21-nt and 22-nt siRNA accumulation in target regions [52].
Protein Level Analysis: Western blotting or enzymatic activity assays where antibodies or functional assays are available.

The correlation between phenotypic strength and molecular silencing efficiency has been quantitatively demonstrated through fluorometry measurements of chlorophyll levels in CHLI-silenced plants, showing significant correlations with transcript reduction levels [52].
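
The transcript-level reduction reported by quantitative RT-PCR is commonly derived with the 2^-ΔΔCt method. The following minimal sketch shows that arithmetic; it assumes roughly 100% amplification efficiency for both primer pairs and a single reference gene, and the function name `silencing_efficiency` is hypothetical rather than taken from the cited studies.

```python
def silencing_efficiency(ct_target_vigs, ct_ref_vigs,
                         ct_target_ctrl, ct_ref_ctrl):
    """Percent reduction of target mRNA in VIGS plants versus controls.

    Uses the 2^-ddCt method:
      dCt  = Ct(target) - Ct(reference) within each sample
      ddCt = dCt(VIGS) - dCt(control)
      relative expression = 2 ** -ddCt
    Assumption: both amplicons amplify with ~100% efficiency.
    """
    delta_vigs = ct_target_vigs - ct_ref_vigs
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    relative_expression = 2 ** -(delta_vigs - delta_ctrl)
    return (1 - relative_expression) * 100


# Example with made-up Ct values: the target crosses threshold two
# cycles later in silenced plants while the reference gene is unchanged.
efficiency = silencing_efficiency(26.0, 20.0, 24.0, 20.0)
print(f"{efficiency:.1f}% reduction")
```

A two-cycle shift corresponds to a fourfold drop in transcript abundance, which falls inside the 60-95% reduction range quoted above.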

Integration with NBS Gene Evolution Studies

Connecting Sequence Evolution to Function

VIGS provides a critical functional bridge between bioinformatic identification of NBS genes and their biological roles in plant immunity. Genomic studies have revealed dramatic variation in NBS gene content across plant lineages, with species exhibiting tens to thousands of NBS genes [4] [13]. For example, Akebia trifoliata contains only 73 NBS genes (50 CNL, 19 TNL, 4 RNL) [13], while eggplant has 269 SmNBS genes (231 CNLs, 36 TNLs, 2 RNLs) [33].

Evolutionary analyses indicate that tandem and dispersed duplications are the main forces responsible for NBS gene expansion [4] [13]. The composition of NBS gene subfamilies also shows remarkable variation, with monocots generally lacking TNL-type genes [24] [4], potentially driven by NRG1/SAG101 pathway deficiency [24]. VIGS enables functional testing of how these evolutionary patterns affect disease resistance capabilities.

Case Studies in Evolutionary Functional Genomics

Several research programs have successfully integrated evolutionary genomics with VIGS functional validation:

In cotton, comparative analysis between susceptible (Coker 312) and tolerant (Mac7) Gossypium hirsutum accessions identified 6,583 unique variants in NBS genes of Mac7 versus 5,173 in Coker312, with VIGS functionally validating the role of GaNBS in virus resistance [4].

Research across 34 plant species identified 603 orthogroups of NBS genes, with some core (OG0, OG1, OG2) and unique (OG80, OG82) orthogroups showing tandem duplications [4]. Expression profiling indicated putative upregulation of the OG2, OG6, and OG15 orthogroups under various biotic stresses, providing candidates for functional validation through VIGS.

These integrated approaches demonstrate how VIGS bridges the gap between sequence-based evolutionary studies and functional understanding of plant immunity, particularly relevant for understanding patterns of NBS gene loss and gain across plant lineages.

The continued development of VIGS technology promises to further enhance its utility in functional genomics and evolutionary studies. Emerging approaches include the combination of VIGS with CRISPR-based systems for enhanced functional analysis [53], the development of more versatile viral vectors with reduced symptom development [50], and the integration of VIGS with multi-omics technologies for comprehensive analysis of gene function [54].

For NBS gene research, VIGS enables medium-throughput functional screening of the numerous candidate genes identified through comparative genomics. This is particularly valuable for perennial species with long life cycles or species recalcitrant to stable transformation [50] [13]. The ability to rapidly validate gene function directly in the context of evolutionary patterns significantly accelerates our understanding of how NBS gene family dynamics contribute to plant immunity adaptation.

As genomic resources continue to expand across diverse plant lineages, VIGS will remain an essential tool for translating sequence information into functional understanding, particularly for illuminating the functional consequences of NBS gene loss and gain events throughout plant evolution. The integration of improved VIGS methodologies with comprehensive evolutionary analyses will continue to reveal how plant immune systems adapt to changing pathogen pressures.

Promoter Analysis and Cis-Element Identification for Regulation Studies

Promoter analysis and the identification of cis-acting regulatory elements (CAREs) are fundamental to understanding the transcriptional regulation of genes. Within the context of studying the evolutionary dynamics of Nucleotide-Binding Site-Leucine Rich Repeat (NBS-LRR) genes across plant lineages, these techniques are indispensable. NBS genes represent the largest class of plant disease resistance (R) genes and exhibit remarkable diversity and complex evolutionary patterns, including significant gene loss and gain events [24] [4]. Unraveling the regulatory mechanisms that control their expression is crucial for deciphering how plants adapt to pathogen pressures. This guide provides an in-depth technical overview of promoter analysis methodologies, framed within the specific research focus of NBS gene evolution.

The Critical Role of Promoter Analysis in NBS Gene Research

NBS genes are key components of the plant immune system, particularly in the effector-triggered immunity (ETI) pathway [24]. Comparative genomic studies have revealed that NBS gene families have undergone extensive expansion and contraction throughout plant evolution. For instance, while dicots often possess both TNL (TIR-NBS-LRR) and CNL (CC-NBS-LRR) subfamilies, monocots, including many orchids, have experienced widespread TNL gene loss [24] [4]. Furthermore, studies across 34 plant species have identified NBS genes with both classical and novel domain architectures, highlighting their diversification [4].

The regulation of these genes is equally complex. Promoter analysis of 22 D. officinale NBS-LRR genes revealed that their upstream regions contain cis-elements implicated not only in the ETI system but also in plant hormone signal transduction and the Ras signaling pathway [24]. This suggests that the expression of NBS genes is controlled by a sophisticated network of regulatory cues. Therefore, profiling their promoters is not merely a descriptive exercise but a vital step to understand the evolutionary forces and functional specialization of NBS genes across different plant lineages.

Core Methodologies for Promoter Sequence Analysis

Promoter Sequence Retrieval

The first step involves obtaining the DNA sequence upstream of the transcription start site (TSS) of your gene of interest.

Sequence Length: Typically, a region spanning 2,000 base pairs (bp) upstream of the TSS is retrieved for analysis [55].
Data Sources: Sequences can be acquired from genome databases such as Phytozome, NCBI, or the PLAZA genome database [4].
Validation: It is critical to clip the sequence if a predicted gene is located within this region to prevent overlap with a neighboring gene's coding region [55].
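
The retrieval and clipping steps above can be sketched in a few lines. This is an illustrative example under stated assumptions, not the procedure used in the cited work: coordinates are 1-based and inclusive, the chromosome is held in memory as a string, neighboring genes are passed as hypothetical (start, end) tuples, and genes that overlap the TSS itself are ignored.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_promoter(chrom_seq, tss, strand, neighbors=(), length=2000):
    """Return up to `length` bp upstream of the TSS, read 5'->3' on the
    gene's own strand and clipped so it does not run into a neighbor.

    tss       : 1-based coordinate of the transcription start site
    strand    : "+" or "-"
    neighbors : iterable of (start, end) for other genes on the same
                chromosome, 1-based inclusive (hypothetical input format)
    """
    if strand == "+":
        start, end = max(1, tss - length), tss - 1
        for _, g_end in neighbors:
            # A gene ending inside the window truncates the promoter.
            if start <= g_end < tss:
                start = max(start, g_end + 1)
        return chrom_seq[start - 1:end]
    if strand == "-":
        start, end = tss + 1, min(len(chrom_seq), tss + length)
        for g_start, _ in neighbors:
            # On the minus strand, "upstream" lies at higher coordinates.
            if tss < g_start <= end:
                end = min(end, g_start - 1)
        return reverse_complement(chrom_seq[start - 1:end])
    raise ValueError("strand must be '+' or '-'")
```

Returning the minus-strand promoter as a reverse complement keeps every sequence in the same orientation relative to its gene, so that motif positions reported downstream are comparable.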

Identification of Cis-Acting Regulatory Elements

Once promoter sequences are acquired, CAREs are identified using specialized databases.

Primary Tool: The PlantCARE database is widely used for this purpose. Researchers upload promoter sequences, and the database returns a detailed list of identified CAREs using default parameters [55].
Data Filtering: Elements without a known putative function or those involved in core transcription initiation (e.g., TATA-boxes and CAAT-boxes) are typically discarded from further analysis to focus on regulatory elements [55].
Visualization: Identified CAREs can be mapped onto the promoter regions using bioinformatics tools like TBtools [55].
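
The filtering step described above is simple to automate once the PlantCARE results have been loaded. The sketch below assumes each hit has already been parsed into a dictionary with hypothetical keys `motif` and `function`; it does not reproduce the actual PlantCARE export layout, which should be checked against the downloaded file.

```python
from collections import Counter

# Core-promoter elements that are usually excluded from regulatory analysis.
CORE_ELEMENTS = {"TATA-box", "CAAT-box"}


def filter_cares(records):
    """Drop core-promoter elements and hits without an annotated function.

    records: list of dicts with at least the keys "motif" and "function"
             (hypothetical structure, assumed for illustration).
    """
    kept = []
    for rec in records:
        function = (rec.get("function") or "").strip()
        if rec["motif"] in CORE_ELEMENTS or not function:
            continue
        kept.append(rec)
    return kept


def count_by_function(records):
    """Tally the remaining elements by their annotated function."""
    return Counter(rec["function"] for rec in records)
```

The resulting counts per functional category (for example hormone-responsive versus stress-responsive elements) are what is typically plotted alongside the promoter maps.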

Table 1: Key Databases and Tools for Promoter and CARE Analysis

Resource Name	Type	Primary Function	Key Features
PlantCARE [55]	Database	Identification of cis-acting elements	Curated database of plant CAREs; web-based analysis
PlantPAN [56]	Database/Navigator	Identification of combinatorial TF binding sites	Integrates data from PLACE, TRANSFAC, AGRIS, JASPAR; finds co-occurring sites
RSAT (Regulatory Sequence Analysis Tools) [55]	Web Tool	Retrieval of promoter sequences	Allows extraction of upstream sequences from a reference genome
TBtools [55]	Software Suite	Visualization	Mapping of identified CAREs onto promoter regions for visual inspection
Pfam / SMART [24]	Database	Domain verification	Confirms protein domains to complement gene classification

Advanced Analysis: Combinatorial Regulation and Cross-Species Comparison

Basic CARE identification can be extended to uncover more complex regulatory logic.

Combinatorial Element Analysis: Tools like PlantPAN allow for the identification of combinatorial transcription factor binding sites (TFBS) that occur within a defined distance (e.g., 20-200 bp) in a set of gene promoters [56]. This is vital for understanding cooperative gene regulation.
Cross-Species Comparison: PlantPAN also provides a function for discovering TFBS in the conserved regions of promoters across homologous genes from different species [56]. This comparative approach can highlight evolutionarily conserved regulatory modules.
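
The idea behind combinatorial element analysis can be illustrated with a small distance filter. The example below is a simplified sketch and not PlantPAN's implementation: it assumes predicted sites are available as hypothetical (tf_name, position) pairs for a single promoter and measures distance between site start positions.

```python
from itertools import combinations


def cooccurring_pairs(sites, min_dist=20, max_dist=200):
    """List pairs of binding sites for different TFs that lie within
    [min_dist, max_dist] bp of each other in one promoter.

    sites: iterable of (tf_name, position) tuples (assumed input format).
    Returns a list of (upstream_tf, downstream_tf, distance) tuples.
    """
    ordered = sorted(sites, key=lambda site: site[1])
    pairs = []
    for (tf_a, pos_a), (tf_b, pos_b) in combinations(ordered, 2):
        distance = pos_b - pos_a
        if tf_a != tf_b and min_dist <= distance <= max_dist:
            pairs.append((tf_a, tf_b, distance))
    return pairs
```

Running the same filter over every promoter in a gene set and counting how often each TF pair recurs gives a rough indication of which combinations are candidates for cooperative regulation.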

Experimental Validation and Functional Characterization

Bioinformatic predictions require experimental validation to confirm functionality.

Protocol: Functional Validation of Promoter Elements

A standard approach involves linking the putative promoter sequence to a reporter gene and assaying its activity in vivo.

Reporter Gene Constructs: Fuse the candidate promoter sequence (e.g., the ~2 kb region upstream of the TSS) to a reporter gene such as β-glucuronidase (GUS) or Green Fluorescent Protein (GFP) in a transformation vector [57].
Plant Transformation: Introduce the constructed vector into the host plant (or a model plant like Arabidopsis) using Agrobacterium-mediated transformation or other methods.
Activity Assay: Under specific stress conditions (e.g., pathogen infection, hormone treatment), measure the activity of the reporter gene. For GUS, this is a histochemical stain; for GFP, it is fluorescence microscopy.
Synthetic Promoters/Introns: An advanced validation method involves constructing synthetic promoters or incorporating synthetic introns based on the predicted CAREs to test if these sequences are sufficient to drive expected expression patterns [57].

Integrating Expression Data with Promoter Analysis

Correlating promoter content with gene expression patterns provides powerful, multi-layered evidence.

Expression Profiling: Data from RNA-seq or microarrays under various conditions (biotic/abiotic stress, different tissues) is crucial. For example, in a study of NBS genes in cotton, FPKM values from RNA-seq data were categorized and analyzed to link expression to specific stresses [4].
Algorithmic Correlation: Advanced algorithms like cis/TF can correlate the expression pattern of a transcription factor with the composite expression pattern of all genes containing a specific motif in their promoters. This helps in matching transcription factors to their binding sites, even when the target genes do not have perfectly correlated expression [58].

A Practical Workflow: From Sequence to Validated Element

The following diagram summarizes the comprehensive workflow for promoter analysis and cis-element identification.

Successful promoter analysis relies on a suite of bioinformatic and experimental reagents.

Table 2: Essential Research Reagent Solutions for Promoter Studies

Reagent / Resource	Category	Function in Analysis
Genome Database (e.g., Phytozome, NCBI) [4]	Bioinformatic	Source of reference genome and gene annotation files for promoter sequence extraction.
PlantCARE Database [55]	Bioinformatic	Core database for annotating cis-acting regulatory elements in plant promoter sequences.
HMMER Suite	Bioinformatic	For identifying genes based on conserved domains (e.g., NB-ARC PF00931) prior to promoter analysis [33].
Reporter Vector (e.g., pCAMBIA with GUS/GFP)	Molecular Biology	Plasmid backbone for constructing promoter-reporter fusions for functional validation [57].
Stable Transformation System (e.g., Agrobacterium)	Molecular Biology	Method for integrating the promoter-reporter construct into the plant genome for in vivo testing.
RNA-seq Data & Analysis Pipeline [4]	Bioinformatic/Experimental	Provides gene expression profiles under different conditions to correlate with predicted promoter elements.

Promoter analysis and cis-element identification are powerful approaches that bridge genomics and functional biology. When applied to the study of NBS gene evolution, they move beyond cataloging gene gains and losses to provide mechanistic insights into the regulatory evolution that underpins plant immunity. By integrating the bioinformatic and experimental protocols outlined in this guide, researchers can elucidate the complex regulatory codes that govern these critical resistance genes, ultimately advancing strategies for breeding more resilient crops.

Navigating Research Challenges: From Gene Annotation to Functional Redundancy

Addressing Incomplete Genomes and Annotation Inconsistencies

Incomplete genome assemblies and annotation inconsistencies represent significant technical bottlenecks in genomic research. For the study of Nucleotide-Binding Site-Leucine Rich Repeat (NBS-LRR) genes—the largest family of plant disease resistance (R) genes—these limitations directly impact our ability to accurately trace gene loss and gain across plant lineages. The NBS gene family exhibits remarkable diversity in size and composition across species, with counts ranging from just 73 in Akebia trifoliata to over 700 in some apple varieties [18] [59]. This variation reflects dynamic evolutionary processes including whole-genome duplication (WGD), tandem duplication, and gene loss. However, without complete and accurately annotated genomes, researchers risk mischaracterizing these evolutionary patterns, potentially misidentifying annotation gaps as genuine gene losses or missing recent species-specific expansions that underlie adaptation to rapidly evolving pathogens [60] [59].

Key Challenges in Plant Genome Annotation

Technical Limitations and Their Consequences

Plant genomes present unique annotation challenges due to their structural complexity. Many species have large genomes with abundant transposable elements and variable ploidy, which complicates accurate gene prediction [61]. Errors in genome annotation are frequent, even among well-studied models, and are propagated through downstream analyses [61]. For NBS-LRR genes specifically, several factors exacerbate these challenges:

Repetitive nature: NBS-LRR genes often occur in clustered arrangements on chromosomes, with heterogeneous or homogeneous domains resulting from recent duplication events [33] [62]. These repetitive regions are difficult to resolve in draft genomes.
Multi-exonic structure: NBS-LRR genes typically contain multiple exons, with TNL-type genes generally possessing more exons than CNL-type genes [59]. Incomplete gene models may miss exons or entire genes.
Domain fragmentation: Many NBS-LRR proteins contain additional integrated domains that can be misclassified if annotation is incomplete [60].

Impact on Evolutionary Inferences

Incomplete annotations directly impact evolutionary studies by distorting apparent gene family sizes and compositions. For example, the absence of TNL-type genes in monocots was once thought to represent ancestral loss, but improved genomic sampling has revealed more complex patterns of domain evolution [62]. Similarly, the dramatic variation in NBS-LRR gene numbers between related species—such as the 144 NBS-LRR genes in strawberry (Fragaria vesca) compared to 748 in apple (Malus × domestica)—may reflect both genuine evolutionary differences and technical disparities in genome quality and annotation methods [59].

Table 1: NBS-LRR Gene Counts Across Plant Species

Species	Total NBS-LRR Genes	TNL Genes	CNL Genes	RNL Genes	Reference
Arabidopsis thaliana	167	Not specified	Not specified	Not specified	[62]
Brassica oleracea	157	Not specified	Not specified	Not specified	[62]
Brassica rapa	206	Not specified	Not specified	Not specified	[62]
Akebia trifoliata	73	19	50	4	[18]
Fragaria vesca (strawberry)	144	23 (15.97%)	121 (84.03%)	Not specified	[59]
Malus × domestica (apple)	748	219 (29.28%)	529 (70.72%)	Not specified	[59]
Prunus persica (peach)	354	128 (36.16%)	226 (63.84%)	Not specified	[59]
Solanum melongena (eggplant)	269	36	231	2	[33]

Best Practices for Genome Annotation Quality

Integrated Annotation Approaches

Current best practices recommend combining multiple evidence types and annotation methods to improve accuracy [61] [63]. The following integrated approach significantly improves annotation completeness for complex gene families like NBS-LRR genes:

Evidence Integration: Combine ab initio predictions with evidence from transcriptomes (RNA-seq) and protein homology searches [61].
Multi-Tool Workflows: Implement complementary annotation pipelines such as MAKER and BRAKER, which leverage different algorithmic approaches [61] [63].
Diverse Transcript Evidence: Incorporate both short-read and long-read transcriptomic data where possible, as long-read technologies (PacBio Iso-Seq, Oxford Nanopore) better resolve splice variants and full-length transcripts [61].
Careful Repeat Masking: Use tools like RepeatModeler2 and RepeatMasker to identify repetitive elements without obscuring genuine coding sequences through appropriate soft-masking approaches [61].

Special Considerations for NBS-LRR Gene Annotation

The unique characteristics of NBS-LRR genes warrant specialized annotation approaches:

Domain-Based Identification: Initial identification should use HMMER searches with the NB-ARC domain (PF00931) as query, followed by verification of associated domains (TIR, CC, LRR, RPW8) using Pfam, SMART, and coiled-coil prediction tools [33] [18].
Manual Curation: Automated annotations require manual verification due to the complex structure of R genes, including validation of conserved motifs (P-loop, kinase-2, kinase-3a) within the NBS domain [33].
Orthology Assessment: Compare annotations with orthologs from closely related species to identify potentially missing or fragmented genes [62].
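
Once domain searches are complete, assigning each gene to a subclass reduces to a lookup on the set of detected domains. The sketch below is a hedged illustration: the domain labels and the precedence order used when more than one N-terminal domain is detected (RPW8, then TIR, then CC) are assumptions made for this example, not rules stated in the cited studies.

```python
def classify_nbs(domains):
    """Assign an NBS subclass label from a collection of domain names.

    domains: iterable containing any of "NB-ARC", "TIR", "CC", "RPW8",
             "LRR" (labels are assumed to be normalized beforehand).
    Returns labels such as "TNL", "CNL", "RNL", "TN", "CN", "NL", "N",
    or None when no NB-ARC domain is present.
    """
    found = set(domains)
    if "NB-ARC" not in found:
        return None  # not an NBS gene by this definition

    # Assumed precedence if several N-terminal domains are reported.
    if "RPW8" in found:
        prefix = "R"
    elif "TIR" in found:
        prefix = "T"
    elif "CC" in found:
        prefix = "C"
    else:
        prefix = ""

    suffix = "L" if "LRR" in found else ""
    return f"{prefix}N{suffix}"
```

Genes that come back as "N", "TN", or "CN" are natural candidates for the manual curation step, since a missing LRR region may reflect either a genuinely truncated gene or an incomplete gene model.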

Table 2: Key Tools for Improving Genome Annotation

Tool Category	Specific Tools	Primary Function	Considerations for NBS-LRR Genes
Repeat Identification	RepeatModeler2, RepeatMasker	Identifies and masks repetitive elements	Essential for distinguishing recent NBS-LRR duplicates from transposable elements
Evidence Alignment	HISAT2, StringTie2, Miniprot	Aligns RNA-seq and protein evidence to genome	Long-read aligners help resolve complex gene structures
Gene Prediction	AUGUSTUS, BRAKER, MAKER	Predicts gene structures using evidence	Combining multiple predictors improves accuracy
Domain Identification	HMMER, Pfam, CDD, SMART	Identifies protein domains	Critical for classifying NBS-LRR subtypes (TNL, CNL, RNL)
Conserved Motif Detection	MEME Suite	Identifies conserved protein motifs	Validates presence of essential NBS domain motifs

Experimental Protocols for NBS Gene Family Analysis

Comprehensive Identification Protocol

The following protocol, adapted from multiple studies [33] [62] [18], provides a robust framework for identifying NBS-LRR genes in plant genomes:

Initial HMM Search
- Download the NB-ARC domain (PF00931) HMM profile from Pfam
- Perform HMMER search (hmmsearch) against the proteome with E-value threshold of 10⁻⁴
- Extract significant hits (E-value < 10⁻²⁰) and construct a species-specific HMM profile
Comprehensive Candidate Identification
- Use the custom HMM profile to identify additional candidates with relaxed threshold (E-value < 0.01)
- Perform BLASTP search against known NBS-LRR proteins from related species
- Combine results and remove redundant entries
Domain Verification and Classification
- Verify NBS domain presence using Pfam and SMART databases (E-value < 10⁻⁴)
- Identify N-terminal domains:
  - TIR domain: Pfam (PF01582) and SMART
  - CC domain: COILS, PAIRCOIL2, or MARCOIL with appropriate thresholds
  - RPW8 domain: Pfam (PF05659)
- Classify genes into TNL, CNL, RNL, and other subclasses
Manual Curation
- Verify gene models using transcriptomic evidence
- Check for conserved NBS motifs (P-loop, kinase-2, kinase-3a)
- Correct obviously fragmented or fused genes
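
The two-tier E-value filtering in the first step can be sketched as a small parser for the tabular output that hmmsearch writes with `--tblout`. In that format, comment lines begin with `#` and the fifth whitespace-separated column holds the full-sequence E-value. The sample records and gene names below are made up for illustration, and the helper names are hypothetical.

```python
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias
geneA  -  NB-ARC  PF00931  1.2e-45  150.3  0.1  2.0e-45  149.6  0.1  1.0  1  0  0  1  1  1  1  -
geneB  -  NB-ARC  PF00931  3.0e-08   30.1  0.0  4.1e-08   29.7  0.0  1.2  1  0  0  1  1  1  1  -
geneC  -  NB-ARC  PF00931  0.5        5.0  0.0  0.9        4.2  0.0  1.0  1  0  0  1  1  1  1  -
"""


def parse_tblout(text):
    """Return {target_name: best full-sequence E-value} from tblout text."""
    evalues = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        evalues[target] = min(evalue, evalues.get(target, evalue))
    return evalues


def select_hits(evalues, threshold):
    """Return sorted target names with E-value below the threshold."""
    return sorted(t for t, e in evalues.items() if e < threshold)


evalues = parse_tblout(SAMPLE_TBLOUT)
candidates = select_hits(evalues, 1e-4)   # initial candidate set
seeds = select_hits(evalues, 1e-20)       # high-confidence hits for the custom HMM
print(candidates, seeds)
```

The seed sequences would then be aligned and passed to hmmbuild to produce the species-specific profile used in the second step.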

Diagram 1: NBS-LRR Gene Identification and Analysis Workflow

Evolutionary Analysis Methods

To accurately trace NBS gene loss and gain across lineages, implement the following evolutionary analyses:

Orthogroup Delineation
- Use OrthoFinder with DIAMOND for sequence similarity searches and MCL for clustering [60]
- Identify core orthogroups (shared across species) and lineage-specific expansions
Duplication Pattern Analysis
- Calculate synonymous (Ks) and nonsynonymous (Ka) substitution rates for gene pairs
- Identify tandem duplications (genes within 10 genes of each other on same chromosome) and segmental duplications (from WGD) [59]
- Use MCScanX or similar tools for whole-genome duplication analysis
Selection Pressure Analysis
- Calculate Ka/Ks ratios to identify positive selection (Ka/Ks > 1), purifying selection (Ka/Ks < 1), or neutral evolution (Ka/Ks ≈ 1)
- Compare selection pressures between TNL and CNL subfamilies [59]
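
Two of the analyses above lend themselves to a compact illustration: flagging tandem duplicates by gene-order distance and interpreting Ka/Ks ratios. The sketch below makes two assumptions that should be adapted to the study at hand: "within 10 genes" is read as a difference in gene-order index of at most 10, and the tolerance band around Ka/Ks = 1 is an arbitrary illustrative value rather than a statistical test.

```python
def tandem_pairs(gene_order, nbs_genes, max_gap=10):
    """Return pairs of NBS genes on the same chromosome whose positions
    in the gene order differ by at most max_gap.

    gene_order: dict mapping chromosome -> list of all gene IDs in
                positional order (assumed input structure).
    nbs_genes : collection of gene IDs identified as NBS genes.
    """
    nbs_genes = set(nbs_genes)
    pairs = []
    for genes in gene_order.values():
        indexed = [(i, g) for i, g in enumerate(genes) if g in nbs_genes]
        for a in range(len(indexed)):
            for b in range(a + 1, len(indexed)):
                if indexed[b][0] - indexed[a][0] > max_gap:
                    break  # later genes are even farther away
                pairs.append((indexed[a][1], indexed[b][1]))
    return pairs


def selection_mode(ka, ks, tol=0.1):
    """Label a gene pair by its Ka/Ks ratio.

    tol is a hypothetical tolerance for calling a ratio 'approximately 1';
    a likelihood-based test is preferable for real data.
    """
    if ks <= 0:
        return "undefined"
    ratio = ka / ks
    if ratio > 1 + tol:
        return "positive"
    if ratio < 1 - tol:
        return "purifying"
    return "neutral"
```

Pairs with very small Ks deserve caution, because the ratio becomes unstable when few synonymous substitutions have accumulated.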

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Research Reagents and Resources for NBS-LRR Gene Studies

Reagent/Resource	Specific Examples/Types	Function in NBS-LRR Research	Implementation Considerations
Genome Assemblies	Chromosome-scale, Telomere-to-telomere	Reference for gene identification and synteny	Prioritize assemblies with high BUSCO scores (>90%) and low duplicate rates [61]
Transcriptomic Data	RNA-seq (Illumina), Iso-seq (PacBio), Nanopore	Evidence for gene model validation and expression	Combine tissue-specific, stress-induced, and developmental time courses [33] [64]
Protein Databases	Pfam, InterPro, CDD, OrthoDB	Domain identification and functional annotation	Use for verifying NBS, TIR, CC, LRR, and RPW8 domains [33] [60]
Annotation Pipelines	BRAKER, MAKER, EVidenceModeler	Automated gene prediction and evidence integration	Run multiple pipelines and compare results [61] [63]
Evolutionary Analysis Tools	OrthoFinder, MCScanX, FastTree	Orthology inference, duplication dating, phylogeny	Use species-specific substitution rates for dating [60] [59]
Validation Reagents	Virus-Induced Gene Silencing (VIGS), qRT-PCR primers	Functional validation of candidate NBS-LRR genes	Design primers targeting conserved motifs for expression analysis [33] [60]

Case Studies in NBS Gene Evolution

Brassica Lineage-Specific Evolution

Comparative analysis of NBS-encoding genes between Brassica oleracea, Brassica rapa, and Arabidopsis thaliana revealed that after whole-genome triplication of the Brassica ancestor, NBS-encoding homologous gene pairs on triplicated regions were rapidly deleted or lost [62]. However, subsequent species-specific gene amplification occurred through tandem duplication after the divergence of B. rapa and B. oleracea [62]. This pattern highlights how both gene loss (fractionation) and gain (recent duplication) shape NBS gene repertoires.

Rosaceae Lineage-Specific Expansions

Analysis of five Rosaceae species demonstrated that recent species-specific duplications have driven NBS-LRR gene expansion, particularly in woody perennial species [59]. The proportion of NBS-LRR genes derived from species-specific duplication ranged from 37.01% in peach to 66.04% in apple [59]. Furthermore, TNL genes showed significantly higher Ks values and Ka/Ks ratios than non-TNL genes, suggesting different evolutionary patterns and potentially distinct mechanisms for adapting to different pathogen pressures [59].

Diagram 2: Evolutionary Processes Shaping NBS-LRR Gene Repertoires

Addressing incomplete genomes and annotation inconsistencies is not merely a technical exercise but a fundamental requirement for accurate evolutionary inference of NBS gene loss and gain across plant lineages. The implementation of integrated annotation approaches, careful manual curation of NBS-LRR gene models, and application of standardized evolutionary analysis methods will enable more robust comparisons across species. As genome sequencing and annotation technologies continue to advance, particularly with the advent of telomere-to-telomere assemblies and long-read transcriptomics, we can anticipate progressively more complete and accurate characterization of this dynamically evolving gene family. This in turn will illuminate the complex co-evolutionary arms race between plants and their pathogens that has shaped the remarkable diversity of NBS-LRR genes observed across the plant kingdom.

Distinguishing Functional Genes from Pseudogenes and Truncated Copies

In the study of Nucleotide-Binding Site-Leucine Rich Repeat (NBS-LRR) genes—the largest class of plant disease resistance (R) genes—researchers consistently encounter a complex genomic landscape filled with functional genes, pseudogenes, and truncated copies. This diversity arises from the intense evolutionary arms race between plants and their pathogens, which drives rapid gene duplication, diversification, and degeneration [18] [4]. For scientists investigating NBS gene loss and gain across plant lineages, accurately distinguishing functional resistance genes from their non-functional counterparts is not merely a technical prerequisite but fundamental to understanding evolutionary dynamics and molecular immune mechanisms.

The prevalence of pseudogenes and truncated copies presents a substantial annotation challenge. Recent pan-genomic studies in soybeans reveal that structural variations routinely disrupt coding sequences, creating non-functional gene copies that complicate genomic analyses [65]. Furthermore, in angiosperms, the process of rediploidization following whole-genome duplication generates numerous pseudogenes, though interestingly, recombinative DNA deletion appears to be a more prominent mechanism of gene loss than pseudogenization [66]. This technical guide provides a comprehensive framework for differentiating functional NBS genes from pseudogenes and truncated copies, with specific methodologies and considerations for research on plant NBS gene families.

Definitions and Key Characteristics

Within plant genomes, particularly in complex gene families like NBS-LRRs, three primary types of gene sequences require differentiation. Their defining characteristics are summarized in Table 1.

Table 1: Key Characteristics of Functional Genes, Pseudogenes, and Truncated Copies

Feature	Functional Gene	Pseudogene	Truncated Copy
Open Reading Frame	Complete, uninterrupted	Disrupted by premature stop codons, frameshifts, or indels	Often partial but may have intact sub-regions
Conserved Domains	Full complement (TIR/CC, NBS, LRR) intact	Critical domains often missing or degenerate	May retain one or more functional domains
Transcriptional Activity	Expressed under specific conditions (e.g., pathogen attack)	Typically not expressed; some may produce regulatory RNAs	Potentially expressed, depending on genomic context
Evolutionary Pressure	Under purifying selection	Evolving neutrally or under relaxed selection	Selection pressure varies
Protein Function	Encodes a functional immune receptor	Non-functional	May produce a truncated protein with potential novel function

Functional NBS-LRR Genes encode proteins that typically contain three core domains: an N-terminal Toll/Interleukin-1 Receptor (TIR) or Coiled-Coil (CC) domain, a central Nucleotide-Binding Site (NBS), and a C-terminal Leucine-Rich Repeat (LRR) region [18] [33]. These genes are under purifying selection, are often induced by pathogen infection or specific stresses, and can trigger defense responses such as the hypersensitive response [67] [33].

Pseudogenes are genomic sequences that resemble functional protein-coding genes but have been inactivated by disabling mutations. These mutations include frameshifts, in-frame stop codons, or disruptive insertions of transposable elements in the original protein-coding sequence or its regulatory regions [68]. They are generally considered evolutionary relics, though some may acquire novel regulatory functions as non-coding RNAs [68] [66].

Truncated Copies (or partial genes) often arise from unequal recombination or incomplete duplication events. They may lack substantial portions of the canonical structure (e.g., missing the LRR or N-terminal domain) but can still be transcribed. Their biological roles are nuanced; they may function as decoys, dominant-negative regulators, or components of paired immune receptors [67].

Experimental and Bioinformatics Workflows for Differentiation

A multi-faceted approach combining bioinformatics predictions and experimental validation is required for accurate classification. The following diagram illustrates a comprehensive workflow for distinguishing these gene types.

Core Bioinformatics Methodologies

Domain and Motif Analysis: The initial step involves identifying conserved domains and motifs. Use HMMER with the NB-ARC domain profile (PF00931) from the Pfam database to identify core NBS domains [18] [33] [69]. Subsequently, scan for TIR (PF01582), CC (using coiled-coil predictors like COILS with a threshold of 0.9), RPW8 (PF05659), and LRR (PF13855, PF00560, PF07725, PF12779, PF13306, PF13516, PF14580) domains [33] [69]. Tools like the MEME Suite can identify conserved motifs within NBS domains; functional genes typically exhibit a complete set of eight conserved motifs in the correct order [18].
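As a minimal sketch of the first filtering step, the snippet below parses the per-sequence table written by `hmmsearch --tblout` and keeps targets whose full-sequence E-value passes a cutoff. The function name `nbarc_candidates`, the example rows, and the 1e-5 default are illustrative assumptions rather than values taken from the cited studies.

```python
def nbarc_candidates(tblout_lines, max_evalue=1e-5):
    """Return target IDs whose full-sequence E-value passes the cutoff.

    Assumes the whitespace-delimited layout of `hmmsearch --tblout`:
    target name, target accession, query name, query accession,
    full-sequence E-value, score, bias, ...
    """
    hits = {}
    for line in tblout_lines:
        # Skip blank lines and the '#' comment/header lines HMMER writes.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            # Keep the best (lowest) E-value seen for each target.
            hits[target] = min(evalue, hits.get(target, evalue))
    return sorted(hits)


# Invented rows for illustration only (not real search output).
example = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "geneA  -  NB-ARC  PF00931  3.2e-40  140.1  0.1",
    "geneB  -  NB-ARC  PF00931  2.0e-03   15.2  0.0",
]
print(nbarc_candidates(example))
```

Candidates that pass this filter would still need the TIR, CC, RPW8, and LRR scans described above before they are assigned to a class.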

Open Reading Frame and Gene Structure Assessment: Analyze the coding sequence for a complete, uninterrupted Open Reading Frame (ORF). Pseudogenes are characterized by premature stop codons, frameshift mutations caused by insertions or deletions, and disrupted splice sites [68]. Tools like GeneWise or GeneMark can assist in this analysis. Furthermore, compare genomic DNA with available transcriptome data or expressed sequence tags (ESTs) to verify correct splicing and expression.
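The ORF checks described here can be sketched as a small function. This is a simplified illustration that assumes an already-spliced coding sequence and the standard stop codons; the function name `orf_status` and the problem labels are hypothetical, and splice-site disruption is not assessed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def orf_status(cds):
    """Classify a spliced coding sequence by simple ORF integrity checks."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        # Often the footprint of a frameshifting insertion or deletion.
        problems.append("length_not_multiple_of_3")
    if not cds.startswith("ATG"):
        problems.append("missing_start_codon")
    # Split into complete codons, ignoring any trailing 1-2 bases.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing_terminal_stop")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("premature_stop_codon")
    return problems or ["intact"]


print(orf_status("ATGGCCTAA"))
print(orf_status("ATGTAAGCCTAA"))
```

A sequence flagged here is only a pseudogene candidate; annotation errors can produce the same signals, which is why the comparison with transcript evidence remains necessary.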

Evolutionary Analysis (Ka/Ks Calculation): Calculate the ratio of non-synonymous (Ka) to synonymous (Ks) substitutions to infer selective pressure. Functional genes are typically under purifying selection (Ka/Ks < 1), while pseudogenes evolve neutrally (Ka/Ks ≈ 1) [68] [66]. Use tools like KaKs_Calculator 2.0 with models such as Nei-Gojobori (NG) for this analysis [69]. Note that some pseudogenes may show signs of purifying selection if they have acquired new regulatory functions [68].
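To make the ratio concrete, the sketch below shows only the final step of a Nei-Gojobori style estimate: converting observed proportions of synonymous and non-synonymous differences into Ka and Ks with the Jukes-Cantor correction. Counting sites and differences per codon is left to a dedicated tool; the function names and the example counts are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion of differences p."""
    if p >= 0.75:
        return float("nan")  # correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(syn_diffs, syn_sites, nonsyn_diffs, nonsyn_sites):
    """Return (Ka, Ks, Ka/Ks) from pre-computed difference and site counts."""
    ks = jukes_cantor(syn_diffs / syn_sites)
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    # The ratio is undefined when no synonymous change is observed.
    ratio = ka / ks if ks > 0 else float("nan")
    return ka, ks, ratio


# Hypothetical counts for one gene pair.
ka, ks, ratio = ka_ks(syn_diffs=10, syn_sites=100, nonsyn_diffs=6, nonsyn_sites=300)
print(round(ka, 4), round(ks, 4), round(ratio, 2))
```

Pairs with very small Ks give unstable ratios, so in practice a ratio near 1 is more convincing when it is supported by enough synonymous substitutions.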

Synteny and Collinearity Analysis: For genes derived from whole-genome duplication (WGD) events, examine collinear genomic segments. A pseudogene in one segment paired with a functional gene in the homologous segment is classified as a whole-genome multiplication (WGM)-derived pseudogene [66]. Tools like MCScanX are widely used for synteny analysis [29] [69]. This helps distinguish pseudogenes originating from large-scale duplication events from those arising from small-scale duplications.
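The pairing rule can be expressed as a short sketch. It assumes collinear gene pairs have already been extracted (for example from a synteny tool's output) and that any gene absent from the pseudogene set counts as functional, which is a simplification; the function name and gene IDs are hypothetical.

```python
def wgm_derived_pseudogenes(collinear_pairs, pseudogenes):
    """Return pseudogenes whose collinear partner is not a pseudogene."""
    derived = set()
    for gene_a, gene_b in collinear_pairs:
        if gene_a in pseudogenes and gene_b not in pseudogenes:
            derived.add(gene_a)
        elif gene_b in pseudogenes and gene_a not in pseudogenes:
            derived.add(gene_b)
    return sorted(derived)


# Hypothetical input: g3 and g4 are both pseudogenes, so neither qualifies.
pairs = [("g1", "g2"), ("g3", "g4"), ("g5", "g6")]
print(wgm_derived_pseudogenes(pairs, pseudogenes={"g2", "g3", "g4"}))
```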

Experimental Validation Techniques

Transcriptomic Analysis: RNA-seq data collected under various conditions, especially pathogen challenge, are a powerful tool for validation. Functional NBS genes are often differentially expressed during infection [67] [33]. Use tools like HISAT2 for read alignment and DESeq2 or Cufflinks/Cuffdiff for differential expression analysis [67] [69]. The absence of expression may suggest a pseudogene, though this requires confirmation with other methods.

Functional Assays: Virus-Induced Gene Silencing (VIGS) can be used to knock down candidate genes in resistant plants. A loss-of-resistance phenotype confirms the functional importance of the targeted gene, as demonstrated in studies of GaNBS in cotton [4] and a Verticillium wilt resistance gene in cotton [69]. Conversely, heterologous overexpression of a candidate gene in a susceptible plant can confer resistance, providing strong evidence for its functionality [69].

The Scientist's Toolkit: Key Research Reagents and Solutions

Table 2: Essential Research Reagents and Tools for NBS Gene Characterization

Reagent/Tool	Primary Function	Application Example
HMMER/Pfam	Identification of conserved protein domains	Finding NB-ARC (PF00931), TIR, LRR domains [18] [33]
MCScanX	Synteny and collinearity analysis	Identifying WGD-derived gene pairs and pseudogenes [29] [69]
KaKs_Calculator	Calculating Ka/Ks ratios	Inferring evolutionary pressure on gene sequences [69]
PlantCARE	Predicting cis-regulatory elements	Identifying defense-related motifs (e.g., SA/JA-responsive elements) [67]
VIGS Vectors	Functional gene validation via silencing	Knocking down GaNBS to validate its role in virus resistance [4]
RNA-seq Datasets	Profiling gene expression	Identifying NBS genes differentially expressed during pathogen infection [67] [33]

Case Studies in Plant NBS Gene Research

Case Study 1: Pepper NLR Family and Phytophthora capsici Resistance. A genome-wide study of Capsicum annuum identified 288 canonical NLR genes. Tandem duplication was a key expansion mechanism. Researchers combined domain analysis, phylogenetic trees, and RNA-seq profiling of Phytophthora capsici-infected resistant and susceptible cultivars to identify 44 differentially expressed NLRs. This integrated approach allowed them to distinguish functional immune receptors from non-functional copies and pinpoint candidates like Caz09g03820 for further study [67].

Case Study 2: Pseudogenization vs. DNA Deletion in Polyploids. A large-scale study across 12 paleo-polyploid angiosperms challenged the assumption that pseudogenization is the primary pathway for gene loss after whole-genome duplication. The research found far fewer pseudogenes derived from whole-genome multiplication (WGM) than expected, suggesting that recombinative DNA deletion is the dominant mechanism. This highlights that what appears as "gene loss" in genomes is often the complete physical removal of the sequence, leaving no pseudogene trace [66].

Distinguishing functional NBS genes from pseudogenes and truncated copies is a critical, multi-step process that requires integrating computational predictions with experimental evidence. As genomic sequencing technologies advance and pan-genome studies become standard, the application of these robust classification frameworks will be essential. This will enable researchers to accurately map the evolutionary history of NBS gene families, understand the mechanisms of gene loss and gain across plant lineages, and ultimately identify true functional resistance genes for crop improvement. Future efforts will likely focus on better characterizing the potential regulatory roles of transcribed pseudogenes and the functional significance of truncated NLR proteins within the plant immune network.

Overcoming Functional Redundancy in Large Gene Families

Functional redundancy in large gene families represents a significant bottleneck in functional genomics and gene discovery efforts. This is particularly true for the Nucleotide-binding site Leucine-Rich Repeat (NBS-LRR or NLR) gene family, which constitutes one of the largest and most dynamic families of plant disease resistance (R) genes. NLR genes encode intracellular immune receptors that recognize pathogen effectors and activate effector-triggered immunity (ETI), often culminating in a hypersensitive response (HR) to restrict pathogen spread [70] [33]. The "arms race" between plants and their pathogens drives rapid evolution and expansion of NLR families, primarily through mechanisms like tandem duplication, leading to the proliferation of numerous, functionally overlapping paralogs [70] [67]. This expansion complicates phenotypic analysis, as disrupting a single gene often fails to produce discernible phenotypes due to compensatory effects from redundant family members, obscuring the roles of individual genes [71] [7].

This technical guide examines strategies for overcoming functional redundancy, framed within research on the gain and loss of NLR genes across plant lineages. Studies comparing resistant and susceptible genotypes often reveal correlations between NLR repertoire composition and disease resistance phenotypes. For instance, in tung trees, the resistant Vernicia montana possesses 149 NBS-LRR genes, including key TIR-domain types, while the susceptible V. fordii has only 90 and lacks TIR-NLRs entirely [7]. Similarly, in eggplant, 269 NBS-LRR genes include 36 TNLs and 2 RNLs, with specific genes differentially expressed in response to bacterial wilt [33]. These lineage-specific expansions and losses highlight the dynamic nature of this gene family and its direct impact on plant health.

The NLR gene family exhibits remarkable quantitative variation across plant species, reflecting diverse evolutionary paths and adaptation strategies. The table below summarizes the family size and composition in recently studied species.

Table 1: Genome-Wide Overview of NLR Genes in Various Plant Species

Plant Species	Total NLR Genes	CNL	TNL	RNL	Atypical/Other	Primary Expansion Mechanism	Reference
Capsicum annuum (Pepper)	288	231	36	2	19	Tandem Duplication	[70] [67]
Solanum melongena (Eggplant)	269	231	36	2	Not Specified	Tandem Duplication	[33]
Salvia miltiorrhiza	196	61	0	1	134	Not Specified	[10]
Nicotiana tabacum (Tobacco)	603	~45.5% CN	~2.5% TN	Not Specified	~52% (NBS-only, etc.)	Whole-Genome Duplication	[72]
Vernicia montana (Resistant)	149	98 (CC-domain)	12 (TIR-domain)	Not Specified	39	Tandem Duplication	[7]
Vernicia fordii (Susceptible)	90	49 (CC-domain)	0	Not Specified	41	Tandem Duplication	[7]

This quantitative diversity necessitates robust methods for functional characterization. A broader analysis of 34 plant species identified 12,820 NBS-domain-containing genes, classified into 168 different domain architecture classes, underscoring the extensive structural and functional diversification within this superfamily [4].

Experimental Strategies and Protocols

Overcoming redundancy requires a multi-faceted approach that integrates genomics, transcriptomics, and high-throughput functional validation.

Genome-Wide Identification and Evolutionary Analysis

A critical first step is the comprehensive identification of all NLR family members within a genome.

Protocol 1.1: Identification and Classification of NLR Genes
- Step 1: HMMER Search: Use HMMER software (e.g., v3.3.2) to search the target proteome with the NB-ARC domain Hidden Markov Model (HMM) (PF00931 from Pfam). A typical E-value cutoff is 1 × 10^-5 [70] [33].
- Step 2: Domain Validation: Confirm the presence of the NB-ARC domain in candidate sequences using the NCBI Conserved Domain Database (CDD) (cd00204) and Pfam batch search [70] [67].
- Step 3: Architectural Classification: Identify N-terminal (TIR, CC, RPW8) and C-terminal (LRR) domains using tools like Pfam, SMART, and COILS (for CC domains). Classify genes into subfamilies (TNL, CNL, RNL, and atypical forms) based on domain presence and completeness [10] [33].
- Step 4: Evolutionary Analysis: Use MCScanX (often implemented within TBtools) to analyze synteny and identify gene duplication events (tandem, segmental, whole-genome). Calculate non-synonymous (Ka) and synonymous (Ks) substitution rates to assess selection pressures [72] [4].
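Step 3 of this protocol can be sketched as a lookup from detected domains to a subfamily label. The domain labels, the order in which N-terminal domains are checked, and the short names used for atypical forms (TN, CN, RN, NL, N) are assumptions made for illustration; published studies differ in how they name and split atypical classes.

```python
def classify_nlr(domains):
    """Assign a coarse NLR class from a collection of domain labels."""
    found = set(domains)
    if "NB-ARC" not in found:
        return "not_NLR"
    has_lrr = "LRR" in found
    # Check N-terminal domains in a fixed (assumed) priority order.
    for n_term, full_name, partial_name in (
        ("TIR", "TNL", "TN"),
        ("RPW8", "RNL", "RN"),
        ("CC", "CNL", "CN"),
    ):
        if n_term in found:
            return full_name if has_lrr else partial_name
    return "NL" if has_lrr else "N"


print(classify_nlr({"TIR", "NB-ARC", "LRR"}))
print(classify_nlr({"CC", "NB-ARC"}))
```

Keeping the atypical labels separate from the canonical ones makes it easier to compare counts with tables that report CNL, TNL, RNL, and "other" columns.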

Leveraging Expression-Based Screening for Candidate Prioritization

Transcriptomic data provides a powerful filter to identify NLR genes most likely involved in a specific immune response, thereby reducing the functional validation space.

Protocol 2.1: Expression Profiling and Differential Expression
- Step 1: RNA-Seq Data Collection: Obtain RNA sequencing data from resistant and susceptible genotypes under control and pathogen-infected conditions. Public repositories like NCBI SRA are primary sources (e.g., SRR9883231, SRR9883230) [70].
- Step 2: Read Mapping and Quantification: Map clean reads to the reference genome using HISAT2. Calculate gene expression levels (e.g., FPKM, TPM) using tools like Cufflinks or the DESeq2 plugin in TBtools [70] [72].
- Step 3: Differential Expression Analysis: Identify significantly differentially expressed NLR genes (DEGs) using |log2 Fold Change| ≥ 1 and FDR < 0.05 as standard thresholds [70].
- Step 4: Co-expression and Network Analysis: Construct protein-protein interaction (PPI) networks using tools like STRING to identify hub genes and potential functional modules [70].
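The thresholds in Step 3 amount to a simple filter over a differential expression table. The sketch below assumes each result is a dictionary with `gene`, `log2fc`, and `fdr` keys; those key names and the example values are hypothetical and would need to be mapped to the columns of whichever tool produced the results.

```python
def significant_degs(results, min_abs_log2fc=1.0, max_fdr=0.05):
    """Keep genes passing |log2FC| >= min_abs_log2fc and FDR < max_fdr."""
    return [
        row["gene"]
        for row in results
        if abs(row["log2fc"]) >= min_abs_log2fc and row["fdr"] < max_fdr
    ]


# Invented values for illustration only.
de_table = [
    {"gene": "NLR_a", "log2fc": 2.3, "fdr": 0.001},   # induced
    {"gene": "NLR_b", "log2fc": -1.4, "fdr": 0.02},   # repressed
    {"gene": "NLR_c", "log2fc": 0.6, "fdr": 0.001},   # change too small
    {"gene": "NLR_d", "log2fc": 3.0, "fdr": 0.20},    # not significant
]
print(significant_degs(de_table))
```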

One recent study reported that functional NLRs are often highly expressed in uninfected plants, challenging the paradigm that they are strictly repressed. This "expression signature" can be exploited for candidate prioritization. In a proof-of-concept experiment, a transgenic array of 995 NLRs from diverse grasses, selected for high expression, was built in wheat and yielded 31 new resistance genes against rust pathogens, showing that expression-guided screening works at scale [73].

High-Throughput Functional Validation

Moving from correlation to causality requires direct functional testing. High-throughput methodologies are essential to tackle the sheer number of candidates.

Protocol 3.1: Virus-Induced Gene Silencing (VIGS)
- Application: Rapid, transient loss-of-function screening for redundant genes. Silencing a single gene might not yield a phenotype, but silencing entire subclades can.
- Procedure: A VIGS vector (e.g., based on Tobacco Rattle Virus) is used to deliver a fragment of the target NLR gene. The recombinant vector is inoculated into plants (e.g., Agrobacterium-mediated infiltration). Silenced plants are then challenged with the pathogen to assess loss of resistance [7].
- Example: Silencing of GaNBS in resistant cotton led to increased viral titers, confirming its role in resistance to cotton leaf curl disease [4].

Protocol 3.2: CRISPR Activation (CRISPRa) for Gain-of-Function
- Application: A transformative approach to overcome redundancy by simultaneously activating multiple redundant genes or their master regulators, creating a discernible gain-of-function phenotype [71].
- Procedure:
  - System Design: A deactivated Cas9 (dCas9) is fused to transcriptional activators (e.g., VP64, p65AD).
  - sgRNA Design: Design sgRNAs to target the promoter regions of candidate redundant NLR genes.
  - Transformation: Deliver the dCas9-activator and sgRNA(s) into the plant.
  - Phenotyping: Assess for enhanced disease resistance or auto-immune symptoms in the absence of pathogen.
- Example: CRISPRa-mediated upregulation of the SlPR-1 and SlPAL2 genes in tomato enhanced defense against bacterial infection [71].

Diagram 1: Integrated workflow for identifying functional NLRs and overcoming redundancy, showing the convergence of bioinformatic, transcriptomic, and functional validation approaches.

The Scientist's Toolkit: Key Research Reagents and Solutions

Successful dissection of redundant gene families relies on a suite of specialized bioinformatic and molecular tools.

Table 2: Essential Research Reagents and Solutions for NLR Functional Analysis

Category	Tool/Reagent	Specific Function	Application Example
Bioinformatics	HMMER (PF00931)	Identifies NB-ARC domain in proteome	Initial genome-wide NLR identification [70] [33]
	MCScanX	Analyzes gene synteny and duplication events	Determine tandem/segmental duplication driving NLR expansion [70] [72]
	OrthoFinder	Clusters genes into orthogroups (OGs)	Cross-species comparative analysis and core OG identification [4]
Omics Analysis	HISAT2/DESeq2	RNA-seq read alignment and differential expression	Identify NLRs responsive to pathogen infection [70] [73]
	STRING Database	Predicts protein-protein interaction networks	Identify hub NLRs in immune signaling networks [70]
Functional Validation	VIGS Vectors (e.g., TRV)	Transient post-transcriptional gene silencing	Rapid loss-of-function assay for resistance [4] [7]
	CRISPR-dCas9 Activators	Targeted gene activation without DNA cleavage	Gain-of-function screening to overcome redundancy (CRISPRa) [71]
	High-Efficiency Transformation Systems	Enables large-scale transgenic complementation	Creating transgenic arrays for hundreds of NLRs [73]

Integrated Workflow and Visualization of NLR Networks

The path from a complex genome to a validated, non-redundant function requires an integrated workflow. This begins with comprehensive genome-wide identification using HMMER and domain analysis, followed by evolutionary analysis to understand duplication history and selective pressures. Transcriptomic profiling during infection then prioritizes candidates. Finally, functional validation employs tailored strategies: VIGS for loss-of-function tests of individual genes or subclades, CRISPRa to activate redundant pathways, and high-throughput transformation for systematic screening [70] [73] [71].

A key concept in understanding NLR function is the "sensor-helper" network. Sensor NLRs detect specific pathogen effectors, while helper NLRs mediate downstream signaling, often for multiple sensors. This network-based organization is another layer of complexity beyond simple redundancy.

Diagram 2: NLR immune network showing helper NLR-mediated signaling. Sensor NLRs recognize pathogen effectors and activate shared helper NLRs, which transduce the defense signal. This network architecture adds shared dependencies on top of paralog redundancy, because a single helper can be required for multiple sensors [73].

Functional redundancy in large gene families like the NLRs is no longer an insurmountable barrier. The integration of comparative genomics to understand lineage-specific gains and losses, transcriptomic signatures to prioritize candidates, and innovative functional genomics tools like CRISPRa and high-throughput transformation, provides a robust pipeline for gene discovery. By employing these integrated strategies, researchers can systematically dissect complex genetic networks, identify key non-redundant functions, and accelerate the development of crops with durable, broad-spectrum disease resistance. The ongoing research on NLR evolution and function continues to refine these tools, promising deeper insights into plant immunity and more effective genetic solutions to agricultural challenges.

Analyzing Rapidly Evolving LRR Domains and Recognition Specificities

The leucine-rich repeat (LRR) domain is a versatile structural motif found in a vast array of plant immune receptors, including receptor-like proteins (RLPs) and intracellular NBS-LRR proteins [74]. Its slender, arc-shaped structure maximizes surface area for protein-protein interactions, making it ideal for roles in pathogen sensing and immune activation [74]. In the context of plant immunity, LRR domains are subject to intense evolutionary selection, leading to dramatic diversification in recognition specificities. This rapid evolution of LRR domains is a primary driver behind the phenomenon of NBS gene loss and gain observed across different plant lineages, as genomes dynamically expand and contract their arsenals of immune receptors to cope with changing pathogen pressures [31] [6].

This technical guide explores the mechanisms driving the evolution of LRR domains, their impact on receptor function, and the experimental methodologies used to decipher these processes, framed within the broader research on the evolutionary dynamics of plant immune gene families.

Evolutionary Patterns of NBS-LRR Genes Across Plant Lineages

Comparative genomics reveals that NBS-LRR genes are remarkably dynamic components of plant genomes. They undergo frequent lineage-specific gene duplication and loss events, resulting in significant variation in gene number and repertoire even among closely related species [31] [6]. These evolutionary patterns are not random but can be categorized into distinct models.

Table 1: Evolutionary Patterns of NBS-LRR Genes in Plant Families

Plant Family	Example Species	Evolutionary Pattern	Key Characteristics
Solanaceae	Potato (S. tuberosum)	"Consistent Expansion"	447 NBS genes identified; ongoing gene duplication [31].
	Tomato (S. lycopersicum)	"Expansion & Contraction"	255 NBS genes; initial expansion followed by loss [31].
	Pepper (C. annuum)	"Shrinking"	306 NBS genes; net loss of genes [31].
Rosaceae	Apple (M. domestica), Pear (P. betulifolia)	"Early expansion to abrupt shrinking"	Pattern shared by Maleae tribe; rapid initial diversification followed by loss [6].
	Rose (R. chinensis)	"Continuous Expansion"	Sustained gene duplication [6].
Orchidaceae	Dendrobium officinale	"Degeneration & Diversification"	74 NBS genes; widespread domain loss leading to diversity [11].
Fabaceae	Medicago truncatula, Soybean	"Consistently Expanding"	Frequent gene gains through duplication [6].
Poaceae	Maize (Z. mays)	"Contracting"	Number of NBS genes is half that of sorghum and rice [31].

A key manifestation of this evolution is the frequent and lineage-specific loss of entire NBS-LRR subclasses. For instance, the TNL subclass has been completely lost in monocots like rice and some medicinal plants such as Salvia miltiorrhiza, while the RNL subclass is often maintained in low copy numbers [10] [11]. These gains and losses are primarily driven by mechanisms such as tandem gene duplication, which is the major contributor to gene family expansion, and unequal crossing-over, which facilitates the creation of novel LRR configurations [31] [74].

Molecular Mechanisms of LRR Domain Evolution and Diversification

The exceptional variability of LRR domains stems from specific structural features and evolutionary mechanisms that generate diversity.

Structural Basis for Adaptability

The LRR domain forms a slender, arc-shaped solenoid structure composed of repeating units, each typically containing a β-strand connected by loops [74]. This creates a large, curved surface ideal for interactions. In plant LRRs, a conserved 16-residue segment—LxxLxLxxNxL(s/t)GxLP (where "L" is a hydrophobic residue and "x" is variable)—forms the core of each repeat [75]. The residues on the solvent-exposed, concave β-sheet are highly variable and often under positive selection, directly enabling the evolution of new pathogen recognition specificities [74].
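As a rough illustration of how the 16-residue consensus can be located in a sequence, the sketch below turns it into a regular expression. The hydrophobic residue set used for the "L" positions, the strict reading of "(s/t)" as serine or threonine, and the test sequence are all assumptions; real LRRs are degenerate, so exact matching like this will undercount repeats compared with profile-based predictors.

```python
import re

HYDROPHOBIC = "LIVFM"  # assumed residue set for the "L" positions

# LxxLxLxxNxL(s/t)GxLP, with "x" as any residue.
LRR_CORE = re.compile(
    "[{h}]..[{h}].[{h}]..N.[{h}][ST]G.[{h}]P".format(h=HYDROPHOBIC)
)


def find_lrr_cores(protein):
    """Return (0-based start, matched segment) for non-overlapping hits."""
    return [(m.start(), m.group()) for m in LRR_CORE.finditer(protein.upper())]


# Invented sequence containing two segments that fit the consensus.
toy_protein = "MA" + "LTSLQLLDNNLSGSIP" + "AAAA" + "LKNLTVLDNSFTGKLP"
for start, segment in find_lrr_cores(toy_protein):
    print(start, segment)
```

Comparing the "x" positions across matched segments is one simple way to see where the variable, potentially solvent-exposed residues fall.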

Genetic Mechanisms Driving Diversification

Positive Selection on Solvent-Exposed Residues: The variable "x" positions in the LRR consensus, particularly those on the concave surface, are hotspots for positive selection, allowing for adaptive changes in binding specificity [74].
Tandem Duplication and Unequal Crossing-Over: LRR-encoding genes are often arranged in tandem arrays on chromosomes. Unequal crossing-over between these arrays readily causes the duplication or deletion of entire LRR repeats, rapidly altering the receptor's interaction surface and specificity [74]. Studies of the lettuce RGC2 cluster show LRR repeat numbers varying from 40 to 47 due to such events [74].
Gene Conversion and Domain Shuffling: Ectopic recombination between homologous sequences can lead to gene conversion, further mixing variation between paralogs and creating novel LRR combinations [74].

Table 2: Key Genetic and Molecular Features of LRR Domain Evolution

Feature	Description	Impact on Recognition Specificity
Consensus Motif	Plant-specific LRRs often follow LxxLxLxxNxL(s/t)GxLP [75].	The conserved core maintains structural integrity, while variable "x" positions determine specificity.
Concave β-Sheet	Continuous surface formed by aligned β-strands.	Primary site for effector binding; solvent-exposed residues are under diversifying selection [74].
Tandem Arrays	NBS-LRR genes cluster on chromosomes [31].	Facilitates unequal crossing-over, leading to repeat number variation and new specificities.
Subfamily Loss	Lineage-specific absence of TNL or RNL genes [10] [11].	Shapes the overall immune strategy of a lineage, constraining or enabling certain recognition pathways.

Experimental Methodologies for Functional Characterization

A multi-pronged approach is required to functionally characterize rapidly evolving LRR domains and link sequence variation to immune function.

Genome-Wide Identification and Classification

The foundational step is a comprehensive bioinformatics pipeline to identify all NBS-LRR or LRR-RLP genes in a genome.

Sequence Retrieval: Obtain whole-genome sequences and annotation files from databases like Phytozome or specialized resources (e.g., Genome Database for Rosaceae, Pepper Genome Database) [31] [6].
Candidate Gene Identification: Perform dual searches using BLAST and Hidden Markov Model (HMM) scans. The HMM search typically uses the NB-ARC domain (Pfam: PF00931) for NBS-LRRs or custom profiles for LRR-RLPs as a query [31] [6] [75].
Domain Validation and Classification: Submit candidate sequences to domain analysis tools (Pfam, SMART, NCBI-CDD) to confirm the presence of NBS, TIR, CC, RPW8, and LRR domains. Classification into TNL, CNL, or RNL is based on the N-terminal domain [31] [6].
Motif and Gene Structure Analysis: Use tools like MEME for conserved motif analysis and GSDS to map gene structures (exons/introns) [6].

Functional Validation of Immune Regulators

Once candidate genes are identified, their role in immunity must be tested.

Heterologous Expression & Gene Silencing: As demonstrated in apple, Agrobacterium-mediated overexpression or RNA interference (RNAi) can be used to test whether a specific LRR-RLP (e.g., MdRLP7) enhances or reduces resistance to a pathogen like Valsa mali [76]. The model plant Nicotiana benthamiana is a common heterologous system for such assays [77].
CRISPR/Cas9-Mediated Genome Editing: This technology is used to generate knockout mutants, providing definitive evidence of gene function. In apple, CRISPR/Cas9 was used to confirm that certain MdRLPs were not receptors for a specific fungal PAMP [76].
Transcriptional Profiling: Expression analysis via RNA-seq or qPCR under pathogen infection or hormone treatment (e.g., Salicylic Acid) identifies which NBS-LRR genes are differentially expressed, implicating them in immune responses [10] [11].

Deciphering Direct Recognition: NLR-Effector Interactions

Determining the specific effector recognized by an NLR is a major challenge. A modern computational pipeline can prioritize interactions for experimental validation.

Structure Prediction: Use AlphaFold2-Multimer to predict the 3D structure of the NLR LRR domain in complex with a candidate pathogen effector [78].
Binding Affinity and Energy Calculation: Employ machine learning models (e.g., Area-Affinity) to predict the binding affinity (BA) and binding energy (BE) for the predicted complex [78].
Interaction Prediction: An ensemble machine learning model can distinguish "true" interacting pairs from non-functional "forced" pairs with high accuracy by comparing the calculated BA and BE values against a trained model [78].
Experimental Validation: Predicted interactions require final validation through methods like yeast two-hybrid assays, co-immunoprecipitation, or functional assays in plants [78].

Figure 1: Integrated experimental workflow for identifying and characterizing LRR-domain immune receptors, from genome mining to functional validation.

The Scientist's Toolkit: Key Research Reagents and Solutions

The following table details essential materials and tools for research in this field.

Table 3: Essential Research Reagents and Computational Tools

Reagent / Tool Name	Category	Primary Function in Research
Nicotiana benthamiana	Model Organism	Heterologous system for transient gene expression (e.g., agroinfiltration) to test protein function and interactions [76] [77].
AlphaFold2-Multimer	Software	Predicts 3D structures of protein complexes (e.g., NLR LRR domain with effector) to hypothesize binding interfaces [78].
Phyto-LRR Prediction	Database & Tool	Specialized program and database for efficiently predicting LRR motifs in plant LRR-RLKs/RLPs [75].
Area-Affinity (ML Models)	Software	Suite of machine learning models used to calculate binding affinity and energy from predicted protein structures [78].
MEME Suite	Software	Identifies conserved protein motifs within sequences, helping to characterize and classify NBS-LRR domains [31] [6].
CRISPR/Cas9 System	Molecular Tool	Enables targeted gene knockouts to establish gene function, such as validating the role of an LRR gene in immunity [76].

The rapid evolution of LRR domains is a cornerstone of the plant immune system's ability to adapt, directly fueling the dynamic patterns of NBS gene gain and loss observed across plant lineages. The integration of comparative genomics, sophisticated computational predictions, and robust experimental validation provides a powerful framework for deciphering these complex evolutionary processes. Understanding the rules governing LRR diversification not only advances fundamental knowledge of plant-pathogen co-evolution but also provides the tools and insights needed to engineer durable disease resistance in crops.

Interpreting Expression Patterns in Low-Expressed Resistance Genes

Nucleotide-binding site leucine-rich repeat (NBS-LRR) genes represent the largest class of plant disease resistance (R) genes, encoding intracellular immune receptors that initiate effector-triggered immunity (ETI) upon pathogen recognition [10] [5]. Accurate interpretation of their expression patterns is crucial for understanding plant immunity mechanisms, particularly for low-expressed variants that complicate conventional transcriptomic analysis. The prevailing assumption that NLRs are transcriptionally repressed due to fitness costs has recently been challenged by evidence demonstrating that functional NLRs can be highly expressed in uninfected plants [73]. This paradigm shift necessitates refined methodologies for distinguishing genuinely low-expressed but functional NLRs from non-functional pseudogenes or transcriptionally silenced loci.

Within the broader context of NBS gene loss and gain across plant lineages, expression analysis provides critical functional insights. Comparative genomics reveals dramatic variation in NBS-LRR family sizes, from approximately 25 NLRs in the bryophyte Physcomitrella patens to over 2,000 in hexaploid wheat [4]. Lineage-specific evolution is evident in the significant reduction or complete loss of TNL subfamilies in monocots and specific dicot lineages including Salvia species [10] [5]. In this technical guide, we present advanced methodologies for accurate interpretation of low-expressed resistance genes, framing these approaches within the evolutionary dynamics of NBS gene families across plant lineages.

Methodological Framework for Analyzing Low-Expressed NBS-LRR Genes

Transcriptomic Identification and Quantification

Comprehensive Identification Pipeline: Begin with genome-wide identification using Hidden Markov Model (HMM) profiles of the NB-ARC domain (PF00931) from the Pfam database [33] [79] [72]. Follow with domain architecture validation using SMART, NCBI CDD, and COILS to confirm associated domains (TIR, CC, LRR, RPW8) [33] [72]. This foundational step ensures complete characterization of the NBS-LRR repertoire before expression analysis.

RNA-Seq Experimental Design: For expression profiling of low-expressed NBS-LRR genes, implement replication-heavy designs with at least four biological replicates to achieve sufficient power for detecting subtle expression differences [73]. Sequence to high depth (>50 million reads per sample) using strand-specific protocols to accurately capture antisense transcripts and overlapping gene models common in NBS-LRR clusters [4]. Include time-course experiments capturing multiple post-inoculation time points (0, 24, 48 hours) to identify transient expression patterns critical for immune activation [33].

Quantification and Normalization: Process raw sequencing data through quality control (Trimmomatic), alignment (HISAT2), and transcript quantification (Cufflinks with FPKM normalization) [72]. For low-expressed genes, avoid overly stringent expression filters that might eliminate genuine low-abundance transcripts; instead, retain genes with ≥0.5 FPKM in at least 20% of samples [73]. Confirm findings with targeted qRT-PCR using optimized conditions for GC-rich NBS-LRR sequences [33].
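The lenient retention rule described above can be sketched as follows. The function name, the dictionary layout (gene ID mapped to a list of per-sample FPKM values), and the example numbers are hypothetical; the two thresholds default to the values quoted in the text.

```python
def keep_low_expressed(fpkm_by_gene, min_fpkm=0.5, min_fraction=0.2):
    """Retain genes reaching min_fpkm in at least min_fraction of samples."""
    kept = []
    for gene, values in fpkm_by_gene.items():
        if not values:
            continue
        n_pass = sum(value >= min_fpkm for value in values)
        if n_pass / len(values) >= min_fraction:
            kept.append(gene)
    return kept


# Invented FPKM values across five samples.
fpkm = {
    "NLR_1": [0.0, 0.7, 0.1, 0.9, 0.2],  # passes in 2 of 5 samples
    "NLR_2": [0.0, 0.1, 0.2, 0.0, 0.3],  # never reaches 0.5 FPKM
}
print(keep_low_expressed(fpkm))
```

A gene such as `NLR_1` would be discarded by a filter requiring expression in most samples, even though it is clearly detected in two of them.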

Statistical Framework for Low-Abundance Transcripts

Gene Homeostasis Z-Index Application: Implement the gene homeostasis Z-index to identify genuine regulatory activity in low-expressed NBS-LRR genes [80]. This method distinguishes genes with widespread low expression from those with selective upregulation in specific cell subpopulations, which is particularly relevant for NBS-LRR genes that may be expressed only in pathogen-contact cells.

Calculation Method:

Calculate the "k-proportion" - the percentage of cells with expression levels below an integer value (k) determined by the mean gene expression count [80].
Compare observed k-proportion against expected values from negative binomial distributions with shared dispersion parameters.
Compute Z-scores representing relative expression instability, with higher values indicating active regulation [80].

This approach outperforms conventional variability metrics (variance, CV) in detecting regulatory genes whose expression instability arises from small cell subpopulations [80].
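The three calculation steps can be illustrated with a simplified sketch. It is not the published implementation of the Z-index [80]: the rule for choosing k (the mean rounded up, with a minimum of 1), the binomial standard error used for the Z-score, and the dispersion value are all assumptions made here for illustration, and the function names are hypothetical.

```python
import math


def nb_prob_below(k, mean, dispersion):
    """P(X < k) for a negative binomial with the given mean and dispersion."""
    r = 1.0 / dispersion          # NB size parameter
    p = r / (r + mean)            # success probability
    total = 0.0
    for x in range(k):
        log_pmf = (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
                   + r * math.log(p) + x * math.log(1.0 - p))
        total += math.exp(log_pmf)
    return total


def k_proportion_z(counts, dispersion):
    """Z-score of the observed k-proportion against its NB expectation."""
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        return 0.0                                    # gene never detected
    k = max(1, math.ceil(mean))                       # assumed rule for k
    observed = sum(1 for c in counts if c < k) / n    # observed k-proportion
    expected = nb_prob_below(k, mean, dispersion)
    se = math.sqrt(expected * (1.0 - expected) / n)   # assumed binomial SE
    return (observed - expected) / se if se > 0 else 0.0


# Two hypothetical genes with the same mean count (2.0) across 100 cells.
subset_gene = [0] * 95 + [40] * 5      # expressed in a small subpopulation
even_gene = [1, 2, 3, 2] * 25          # expressed at a similar level everywhere
z_subset = k_proportion_z(subset_gene, dispersion=1.0)
z_even = k_proportion_z(even_gene, dispersion=1.0)
print(round(z_subset, 2), round(z_even, 2))
```

Under these assumptions the gene expressed in only five cells has far more near-zero cells than the negative binomial model predicts and receives a large positive score, while the evenly expressed gene scores below zero.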

Differential Expression Analysis: For comparative experiments, employ specialized tools for low-count RNA-seq data such as DESeq2 with independent filtering disabled or edgeR with robust dispersion estimation. Incorporate phylogenetic relationships when analyzing multiple species to account for evolutionary constraints on expression patterns [4].

Table 1: Statistical Approaches for Low-Expressed NBS-LRR Genes

Method	Application Context	Key Parameters	Advantages for Low-Expression Genes
Gene Homeostasis Z-index [80]	Single-cell RNA-seq; cell population heterogeneity	k-proportion, negative binomial distribution	Identifies genes with expression driven by small cell subpopulations
SCRAN [80]	Cell-to-cell variability assessment	Mean-expression-dependent trend	Effective for capturing biological variability in homogeneous populations
Seurat VST [80]	Highly variable gene detection	Variance stabilization transform	Identifies genes with high variance relative to mean expression
DESeq2 with independent filtering disabled	Bulk RNA-seq with low counts	Negative binomial model	Maintains sensitivity for low-count genes without filtering

Functional Validation of Low-Expressed Candidates

Virus-Induced Gene Silencing (VIGS): For functional validation, prioritize low-expressed NBS-LRR genes whose expression patterns suggest regulatory specialization. Implement VIGS in resistant genotypes using Agrobacterium-mediated delivery of TRV-based vectors containing 150-300 bp gene-specific fragments [4]. Include empty vector and non-silenced controls. Challenge silenced plants with target pathogens and quantify disease symptoms and pathogen titers. As demonstrated in cotton, successful silencing of functional low-expressed NBS-LRR genes (e.g., GaNBS) significantly increases viral titers, confirming their role in resistance despite low expression levels [4].

High-Throughput Transformation Arrays: For systematic functional screening, employ high-throughput transformation systems as demonstrated in wheat, where 995 NLRs were simultaneously tested for resistance to rust pathogens [73]. This approach identifies functional receptors regardless of expression level, with the finding that multiple transgene copies are sometimes required for resistance, suggesting expression threshold effects [73].

Heterologous Expression: Validate function through heterologous expression in susceptible plants or model systems. As demonstrated with maize NBS-LRR genes in Arabidopsis, this approach confirms functionality while circumventing potential autoimmunity issues in native systems [4].

Comparative Expression Patterns Across Plant Lineages

Evolutionary Context of NBS-LRR Expression

The interpretation of low-expressed NBS-LRR genes requires consideration of lineage-specific evolutionary patterns. Across land plants, NBS-LRR genes show remarkable diversification, with 12,820 NBS-domain-containing genes identified across 34 species from mosses to monocots and dicots, classified into 168 distinct domain architecture classes [4]. This expansion occurred primarily in flowering plants, with bryophytes maintaining relatively small NLR repertoires (approximately 25 in Physcomitrella patens) [4].

Expression patterns frequently reflect evolutionary history, with tandemly duplicated NBS-LRR genes often showing coordinated expression while maintaining distinct induction thresholds [4]. Phylogenetic analysis of NBS-LRR families across multiple Nicotiana species reveals that whole-genome duplication contributes significantly to NBS gene family expansion, with 76.62% of N. tabacum NBS members traceable to parental genomes [72].

Table 2: NBS-LRR Gene Family Size Variation Across Plant Lineages

Plant Species	Total NBS Genes	CNL	TNL	RNL	Notable Expression Features
*Arabidopsis thaliana* [10]	207	61	139	7	Known functional NLRs enriched in highly expressed transcripts
*Oryza sativa* (rice) [10]	505	505	0	0	Complete loss of TNL subfamily
*Salvia miltiorrhiza* [10]	196	61	0	1	Marked reduction in TNL and RNL subfamilies
*Solanum melongena* (eggplant) [33]	269	231	36	2	Uneven distribution with clustering on chromosomes 10-12
*Capsicum annuum* (pepper) [5]	252	248	4	-	NLNLN subclass represented by only one gene
*Nicotiana benthamiana* [79]	156	25	5	4	60 N-type proteins lacking LRR domains

Subfamily-Specific Expression Characteristics

Expression patterns diverge significantly between NBS-LRR subfamilies. CNL-type genes frequently show broader expression ranges than TNL-type genes, with some maintaining constitutive expression while others remain silent until pathogen challenge [33]. RNL subfamily members, functioning as helper NLRs, often display more stable expression but with pronounced tissue specificity [73]. For example, in tomato, helper NLR NRC6 shows high root-specific expression while NRC0 expression varies significantly between cultivars [73].

Low-expressed TNL genes require particular attention in monocots and specific dicot lineages where this subfamily has undergone significant reduction or complete loss. In Salvia miltiorrhiza, no typical TNL-type genes were identified among the 196 NBS-domain genes, and no TNL subfamily members were found across the five Salvia species examined [10]. A similar reduction occurs in pepper, with only 4 TNL genes identified from 252 NBS-LRR candidates [5].

Technical Considerations for Experimental Design

Addressing Technical Challenges

Spurious Mapping: NBS-LRR genes frequently reside in complex genomic regions with high sequence similarity due to tandem duplications. Implement stringent mapping parameters and verify alignments through visual inspection in IGV. For highly similar paralogs, consider assigning reads to gene groups rather than individual genes [4].

Stochastic Expression: Low-expressed genes show greater variability in transcript detection. Utilize the gene homeostasis Z-index to distinguish technical noise from biological regulation [80]. Incorporate spike-in controls to quantify technical variability and normalize accordingly.

Temporal Dynamics: Immune-responsive genes often exhibit rapid, transient expression changes. Conduct dense time-course experiments with frequent sampling (3-12 hour intervals) during early infection stages [33]. Employ temporal visualization tools like Temporal GeneTerrain to capture dynamic expression patterns that conventional heatmaps might obscure [81].
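For the spurious-mapping point above, assigning reads to gene groups rather than individual paralogs amounts to summing counts over a predefined grouping. The sketch below assumes per-gene read counts and a gene-to-group mapping already exist; the function name, gene IDs, and group labels are hypothetical.

```python
from collections import defaultdict


def collapse_to_groups(gene_counts, gene_to_group):
    """Sum read counts over paralog groups.

    Genes without a group assignment are kept under their own ID.
    """
    group_counts = defaultdict(int)
    for gene, count in gene_counts.items():
        group = gene_to_group.get(gene, gene)
        group_counts[group] += count
    return dict(group_counts)


# Hypothetical counts for three near-identical tandem paralogs and one singleton.
counts = {"NBS_1a": 4, "NBS_1b": 7, "NBS_1c": 0, "NBS_2": 12}
groups = {"NBS_1a": "cluster_1", "NBS_1b": "cluster_1", "NBS_1c": "cluster_1"}
collapsed = collapse_to_groups(counts, groups)
print(collapsed)
```

Group-level counts give up paralog resolution in exchange for less sensitivity to ambiguous read placement, so they fit best when individual copies cannot be told apart reliably.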

Single-Cell Resolution Applications

For comprehensive understanding of low-expressed NBS-LRR genes, implement single-cell RNA-seq approaches. Traditional bulk RNA-seq averages expression across cell types, potentially masking cell-specific expression patterns particularly relevant for NBS-LRR genes that may function only in specific cell types [80]. The gene homeostasis Z-index effectively identifies NBS-LRR genes with expression restricted to pathogen-responsive cell subsets, providing biological context for apparently low expression in bulk analyses [80].

Integrative Analysis Framework

Multi-Omics Data Integration

Combine expression data with genomic context information, as NBS-LRR genes distributed in clusters across chromosomes often show coordinated expression patterns [33] [72]. In pepper, 54% of NBS-LRR genes form 47 gene clusters driven by tandem duplications and genomic rearrangements [5]. These clustered genes frequently exhibit similar but not identical expression patterns, with variations in induction thresholds and timing.

Incorporate epigenetic marks (DNA methylation, histone modifications) to distinguish functional low-expressed genes from transcriptionally silenced pseudogenes. Actively regulated genes typically display permissive chromatin states even at low expression levels.

Expression-Evolution Interrelationships

Analyze expression patterns within phylogenetic frameworks to identify evolutionary constraints. Orthogroup analysis across land plants has identified 603 orthogroups with both core (widely conserved) and unique (lineage-specific) NBS-LRR genes [4]. Core orthogroups frequently show more stable expression patterns across species, while lineage-specific genes exhibit greater expression variability and are more likely to show low or restricted expression.

Positive selection signatures in specific NBS-LRR subfamilies often correlate with distinctive expression patterns, particularly for genes responding to rapidly evolving pathogens [4].

Visualization Approaches

Temporal Dynamics Mapping

Implement Temporal GeneTerrain for visualizing expression dynamics of low-expressed NBS-LRR genes [81]. This method creates continuous representations of expression trajectories rather than discrete snapshots, revealing transient waves and sustained shifts in gene activity that might be missed by conventional approaches.

Workflow:

Z-score normalize expression values across time courses
Select genes with coordinated temporal dynamics (Pearson correlation ≥0.5)
Construct protein-protein interaction networks
Embed in two dimensions using Kamada-Kawai force-directed algorithm
Map normalized expression values as Gaussian density fields onto fixed layouts

This approach effectively captures the multidimensional and transient nature of expression patterns, particularly valuable for low-expressed genes with dynamic behavior [81].
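The first two workflow steps (Z-score normalization and correlation-based selection) can be sketched as follows. This is not the Temporal GeneTerrain implementation [81]. It assumes that "coordinated" means a Pearson correlation of at least 0.5 with a chosen reference gene, which the workflow does not specify, and all gene names and values are hypothetical.

```python
import math


def zscore(values):
    """Z-score normalize one expression time course (population SD)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]


def pearson(x, y):
    """Pearson correlation as the mean product of Z-scores (0.0 for flat series)."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def coordinated_genes(profiles, reference, min_r=0.5):
    """Genes whose time course correlates with the reference at r >= min_r."""
    ref = profiles[reference]
    return [gene for gene, series in profiles.items()
            if gene != reference and pearson(ref, series) >= min_r]


# Hypothetical expression values at four post-inoculation time points.
profiles = {
    "ref_NLR": [1, 2, 4, 8],
    "NBS_up": [2, 3, 5, 9],      # rises with the reference
    "NBS_down": [8, 4, 2, 1],    # opposite trend
    "NBS_flat": [3, 3, 3, 3],    # no temporal change
}
selected = coordinated_genes(profiles, "ref_NLR")
print(selected)
```

The network construction, Kamada-Kawai layout, and Gaussian density mapping steps are not covered here, as they depend on interaction data and plotting choices outside this sketch.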

Comparative Expression Visualization

For cross-species comparisons, develop phylogenetic expression maps that integrate gene trees with expression heatmaps. This visualization highlights expression conservation and divergence in relation to evolutionary relationships, assisting interpretation of low-expressed orthologs.

Diagram Title: Workflow for Analyzing Low-Expressed NBS-LRR Genes

Diagram Title: Interpretation Framework for Low NBS-LRR Expression

Research Reagent Solutions

Table 3: Essential Research Reagents for NBS-LRR Expression Studies

Reagent/Tool	Specific Application	Function in Analysis	Implementation Example
HMMER v3.1b2 with PF00931 [72]	NBS-LRR identification	Hidden Markov Model for domain identification	Genome-wide mining of NBS genes
TRV-based VIGS vectors [4]	Functional validation	Virus-induced gene silencing	Testing low-expressed gene function in resistant plants
DESeq2 with independent filtering disabled	Differential expression analysis	Statistical testing for low-count genes	Identifying significant expression changes
Gene homeostasis Z-index [80]	Single-cell RNA-seq analysis	Detecting cell-subset specific expression	Identifying regulatory genes in heterogeneous cell populations
Temporal GeneTerrain [81]	Time-course visualization	Dynamic expression pattern mapping	Capturing transient expression waves
OrthoFinder v2.5.1 [4]	Evolutionary analysis	Orthogroup identification across species	Classifying core and lineage-specific NBS genes

Interpreting expression patterns in low-expressed NBS-LRR genes requires specialized methodologies that distinguish technical artifacts from biological significance. The framework presented here integrates evolutionary context, advanced statistical approaches, and functional validation to accurately characterize these critical components of plant immunity. As research in this field advances, particularly through single-cell technologies and improved comparative genomics, our understanding of low-expressed resistance genes will continue to be refined, supporting crop improvement programs and fundamental plant immunity research.

Case Studies and Evolutionary Patterns: Validating NBS-LRR Dynamics Across Species

The Rosaceae family, comprising over 3,000 species including economically vital crops like apple, strawberry, peach, and rose, represents a cornerstone of agricultural production and nutritional security worldwide [82] [83]. A central component of plant immunity is encoded by the nucleotide-binding site and leucine-rich repeat (NBS-LRR) gene family, which constitutes one of the largest and most dynamic gene families in plant genomes [31] [6]. These genes play critical roles in pathogen recognition and defense activation, undergoing rapid evolution in response to changing pathogen pressures [84]. Recent comparative genomic analyses across multiple Rosaceae species have revealed that NBS-LRR genes exhibit distinctive 'expansion-contraction' evolutionary patterns, characterized by lineage-specific gene gains and losses that have shaped the immune repertoire of these economically important plants [6]. This whitepaper examines these dynamic evolutionary patterns within the broader context of NBS gene loss and gain across plant lineages, providing researchers with methodological frameworks and analytical approaches for investigating genomic plasticity in plant immunity.

Evolutionary Patterns of NBS-LRR Genes in Rosaceae

Diversity of Evolutionary Trajectories

Comprehensive genome-wide analysis of 12 Rosaceae species has uncovered distinct evolutionary patterns of NBS-LRR genes, driven primarily by independent gene duplication and loss events following species divergence [6]. The ancestral Rosaceae genome contained approximately 102 NBS-LRR genes (7 RNLs, 26 TNLs, and 69 CNLs), which subsequently underwent lineage-specific evolutionary trajectories [6].

Table 1: Evolutionary Patterns of NBS-LRR Genes in Rosaceae Species

Species	Evolutionary Pattern	Key Characteristics
Rosa chinensis	"Continuous expansion"	Progressive gene duplication leading to increased NBS-LRR repertoire
Fragaria vesca	"Expansion followed by contraction, then further expansion"	Complex evolutionary history with multiple phases of gene gain and loss
Rubus occidentalis, Potentilla micrantha, Fragaria iinumae, Gillenia trifoliata	"First expansion and then contraction"	Initial gene duplication followed by subsequent gene loss
Three Prunus species (armeniaca, avium, persica) and three Maleae species (Pyrus betulifolia, Malus baccata, Malus × domestica)	"Early sharp expanding to abrupt shrinking"	Rapid initial gene expansion followed by pronounced contraction

These divergent evolutionary patterns have resulted in substantial variation in NBS-LRR gene numbers across Rosaceae species, ranging from relatively compact repertoires to extensively expanded families [6]. This genomic plasticity reflects the continuous arms race between plants and their pathogens, with different lineages employing distinct evolutionary strategies to adapt to their specific pathogenic environments.
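One way to make the pattern labels concrete is to read them as the sequence of gains and losses along a lineage. The sketch below assumes reconciled gene counts at successive nodes, ordered from the ancestor to the extant species. Only the ancestral value of 102 comes from the text; the later counts are invented for illustration and the function name is hypothetical. The "early sharp expanding to abrupt shrinking" pattern has the same direction sequence as "first expansion and then contraction" and would need a magnitude criterion that is not given here, so it is not separated.

```python
def label_trajectory(counts):
    """Label a lineage from gene counts ordered ancestor -> extant species."""
    signs = []
    for before, after in zip(counts, counts[1:]):
        if after == before:
            continue                      # no net change on this branch
        sign = "+" if after > before else "-"
        if not signs or signs[-1] != sign:
            signs.append(sign)            # record only changes of direction
    labels = {
        ("+",): "continuous expansion",
        ("+", "-"): "first expansion and then contraction",
        ("+", "-", "+"): "expansion followed by contraction, then further expansion",
        ("-",): "contraction",
    }
    return labels.get(tuple(signs), "other")


# 102 is the reported ancestral count; all later values are made up.
print(label_trajectory([102, 150, 210]))
print(label_trajectory([102, 180, 120]))
print(label_trajectory([102, 160, 130, 190]))
```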

Comparison with Other Plant Families

The evolutionary patterns observed in Rosaceae mirror similar dynamics documented in other plant families, though with distinct lineage-specific characteristics:

Solanaceae Family: Potato exhibits "consistent expansion," tomato shows "first expansion and then contraction," while pepper displays a "shrinking" pattern [31].
Fabaceae and Brassicaceae Families: Fabaceae and Rosaceae species generally exhibit "consistent expansion" patterns, while Brassicaceae species follow "first expansion and then contraction" evolutionary trajectories [31] [84].
Soapberry Family (Sapindaceae): Xanthoceras sorbifolium displays "first expansion and then contraction," while both Acer yangbiense and Dimocarpus longan exhibit "expansion followed by contraction, and then further expansion" patterns [84].

These comparative analyses across plant families reveal that NBS-encoding genes exhibit diverse and dynamic evolutionary patterns, giving rise to the discrepant gene numbers observed today, with species-specific tandem duplications contributing most significantly to gene expansions [31].

Methodological Framework for Analysis

Identification and Classification of NBS-LRR Genes

Table 2: Standardized Protocol for NBS-LRR Gene Identification

Step	Method	Parameters	Purpose
1. Gene Identification	BLAST Search	Expect value threshold: 1.0	Initial identification of candidate NBS-encoding genes
2. Domain Confirmation	HMMER Search (Hidden Markov Model)	Pfam NB-ARC domain (PF00931); E-value: 10⁻⁴	Confirm presence of NBS domain
3. Domain Architecture Analysis	Pfam, SMART, NCBI-CDD	CC (PF18052), TIR (PF01582), RPW8 (PF05659) domains	Classification into TNL, CNL, RNL subclasses
4. Motif Identification	MEME Suite	Discovered motifs: 10	Identify conserved amino acid motifs
5. Structural Analysis	GSDS 2.0	-	Visualize gene structure, intron/exon boundaries

The experimental workflow begins with the retrieval of whole-genome sequences and annotation files from databases such as the Genome Database for Rosaceae (https://www.rosaceae.org/) [6]. Following identification, NBS-LRR genes are classified into three subclasses based on their N-terminal domains: TIR-NBS-LRR (TNL), CC-NBS-LRR (CNL), and RPW8-NBS-LRR (RNL) [6]. The CNL subclass typically dominates in terms of gene numbers across Rosaceae species, while RNL genes remain at low copy numbers due to their conserved functions in signal transduction rather than direct pathogen recognition [31] [84].
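As an illustration of the domain confirmation step, the sketch below filters hits from an HMMER `hmmsearch --tblout` table using the 10⁻⁴ E-value cutoff listed in the protocol table. It relies on the standard tblout layout (whitespace-separated fields, target name first, full-sequence E-value fifth, comment lines starting with `#`); the function name and the two example rows are made up.

```python
def parse_tblout(text, max_evalue=1e-4):
    """Return {target name: full-sequence E-value} for hits passing the cutoff."""
    hits = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                       # skip blank and comment lines
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            # keep the smallest E-value if a target is listed more than once
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits


# Made-up rows in tblout layout (illustrative values only).
example_tblout = """\
# target name  accession  query name  accession  E-value  score  bias ...
geneA.1  -  NB-ARC  PF00931  1.2e-45  150.3  0.1  2.0e-45  149.6  0.1  1.0  1  0  0  1  1  1  1  hypothetical protein
geneB.1  -  NB-ARC  PF00931  3.0e-02   12.1  0.0  4.0e-02   11.7  0.0  1.1  1  0  0  1  1  1  0  hypothetical protein
"""
candidates = parse_tblout(example_tblout)
print(candidates)
```

Candidates retained at this stage would still go through the domain architecture analysis in step 3 before classification.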

Evolutionary and Phylogenetic Analysis

Diagram Title: NBS-LRR Gene Evolutionary Analysis Workflow

Phylogenetic analysis typically involves the reconstruction of evolutionary relationships using maximum likelihood or Bayesian inference methods [6]. The identification of gene clusters on chromosomes is particularly important, as NBS-LRR genes typically arrange as tandem arrays rather than existing as singletons [31]. Chromosomal distribution mapping provides insights into the mechanisms of gene family expansion, with tandem duplications recognized as the primary driver of NBS-LRR gene diversification in Rosaceae [31].
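Identifying clusters from chromosomal positions can be sketched as a single pass over sorted gene coordinates. The text does not state a cluster definition, so the rule used here (neighbouring NBS-LRR genes on the same chromosome no more than 200 kb apart) is an assumption, as are the function name and the example coordinates.

```python
def find_clusters(genes, max_gap=200_000):
    """Group genes into clusters of two or more members.

    genes: iterable of (gene_id, chromosome, start) tuples.
    max_gap: assumed maximum distance in bp between neighbouring members.
    """
    by_chrom = {}
    for gene_id, chrom, start in genes:
        by_chrom.setdefault(chrom, []).append((start, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        current, last_start = [], None
        for start, gene_id in sorted(by_chrom[chrom]):
            if current and start - last_start > max_gap:
                if len(current) >= 2:
                    clusters.append(current)
                current = []               # gap too large: start a new group
            current.append(gene_id)
            last_start = start
        if len(current) >= 2:
            clusters.append(current)
    return clusters


# Hypothetical gene coordinates.
genes = [
    ("g1", "chr1", 100_000), ("g2", "chr1", 150_000), ("g3", "chr1", 900_000),
    ("g4", "chr2", 50_000), ("g5", "chr2", 220_000), ("g6", "chr2", 260_000),
]
clusters = find_clusters(genes)
print(clusters)
```

Here `g3` remains a singleton because it lies 750 kb from its nearest neighbour; a stricter or looser `max_gap` changes how many genes count as clustered.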

Molecular dating approaches can be integrated with phylogenetic analyses to correlate evolutionary events with geological and climatic changes. For example, estimates of divergence events in Rosa species indicate rapid differentiation around 4.46 million years ago, potentially influenced by the uplift of the Qinghai-Tibet Plateau during the Late Miocene [85].

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Research Reagents and Resources for NBS-LRR Genomics

Research Reagent/Resource	Function/Application	Example Use Case
Phytozome Database	Genomic data repository	Access to annotated plant genomes
Genome Database for Rosaceae (GDR)	Rosaceae-specific genomic resources	Retrieval of Rosaceae genome sequences and annotations
Pfam Database	Protein family curation	NB-ARC domain identification (PF00931)
MEME Suite	Motif discovery and analysis	Identification of conserved NBS domain motifs
OrthoMCL	Ortholog group identification	Comparative analysis of NBS-LRR genes across species
Plant Genomic DNA Extraction Kits	High-quality DNA isolation	Preparation of sequencing libraries
PacBio SMRT Sequencing	Long-read sequencing technology	Genome assembly and structural variant detection
Illumina RNA-seq Library Prep Kits	Transcriptome profiling	Expression analysis of NBS-LRR genes under pathogen challenge

This comprehensive toolkit enables researchers to identify, classify, and conduct evolutionary analyses of NBS-LRR genes across Rosaceae species. The integration of multiple bioinformatic tools with experimental validation approaches provides a robust framework for investigating the dynamic evolutionary patterns of these critical immune genes.

Biological Significance and Research Applications

Functional Implications of Expansion-Contraction Patterns

The divergent evolutionary patterns observed in Rosaceae NBS-LRR genes have direct implications for disease resistance mechanisms and breeding strategies:

Gene Family Expansion: Species with expanding NBS-LRR repertoires, such as Rosa chinensis, potentially possess enhanced capabilities for pathogen recognition through the diversification of LRR domains responsible for specific effector binding [6].
Gene Family Contraction: Contracting lineages may employ alternative defense strategies or experience reduced pathogen pressure, leading to the loss of superfluous resistance genes [6].
Subclass-Specific Dynamics: The CNL subclass typically demonstrates the most dynamic evolutionary patterns, while RNL genes remain relatively stable due to their conserved role in signal transduction downstream of pathogen recognition [31] [84].

Recent studies have begun to elucidate the connection between evolutionary patterns and functional specialization. In Dendrobium species, NBS gene evolution is characterized by frequent domain degeneration, including type changes and NB-ARC domain degeneration, contributing to functional diversification [11].

Epigenetic Regulation of NBS-LRR Genes

Beyond sequence-level evolution, epigenetic mechanisms play crucial roles in regulating NBS-LRR gene expression and functionality in Rosaceae species. DNA methylation, particularly at the 5th position of cytosine (5mC), represents an important epigenetic mark affecting gene expression and genome stability [83]. The establishment of 5mC DNA methylation involves RNA-directed DNA methylation (RdDM) pathways, which rely on plant-specific RNA polymerases Pol IV and Pol V [83]. Understanding these epigenetic regulatory mechanisms provides additional insights into how NBS-LRR genes are modulated in response to pathogen challenge and how epigenetic variation might contribute to disease resistance traits.

Future Directions in Rosaceae NBS-LRR Research

The integration of comparative genomics with functional studies represents the future of NBS-LRR research in Rosaceae. Several promising directions emerge:

Pan-genome Analyses: Construction of pan-genomes for key Rosaceae species will provide comprehensive catalogs of NBS-LRR gene diversity within species, revealing presence-absence variation and its relationship with disease resistance phenotypes.
Single-Cell Transcriptomics: Application of single-cell RNA sequencing to pathogen-infected tissues will elucidate cell-type-specific expression patterns of NBS-LRR genes and their activation in response to infection.
Structural Biology Approaches: Determination of NBS-LRR protein structures will advance understanding of pathogen recognition mechanisms and enable engineering of novel specificities.
Epigenome Editing: Utilization of CRISPR-dCas systems to modulate epigenetic marks at NBS-LRR loci may provide new strategies for enhancing disease resistance without altering coding sequences.

These approaches, combined with the methodological frameworks presented in this whitepaper, will accelerate the discovery and utilization of NBS-LRR genes in Rosaceae crop improvement, ultimately contributing to sustainable agricultural production and food security.

The nucleotide-binding site-leucine-rich repeat (NBS-LRR) gene family constitutes the largest class of plant resistance (R) proteins, serving as critical intracellular immune receptors that recognize pathogen effector proteins to initiate effector-triggered immunity (ETI) [86] [10]. These genes encode proteins characterized by a conserved nucleotide-binding site (NBS) domain and a C-terminal leucine-rich repeat (LRR) domain, with classification into TNL (TIR-NBS-LRR), CNL (CC-NBS-LRR), and RNL (RPW8-NBS-LRR) subfamilies based on their N-terminal domains [69] [7]. While extensive research has focused on model plants and crops, investigation of NBS-LRR genes in medicinal plants remains limited, despite their economic importance and unique evolutionary trajectories [86] [10].

Recent genomic studies have revealed substantial variation in NBS-LRR gene copy numbers and subfamily composition across angiosperms, with patterns suggesting associations between gene family dynamics and ecological adaptation [4] [12]. This whitepaper examines the specific pattern of NBS-LRR reduction in Salvia species, particularly the model medicinal plant Salvia miltiorrhiza (Danshen), within the broader context of NBS gene loss and gain across plant lineages. Through comprehensive genomic analysis and comparative phylogenetics, we elucidate the distinctive evolutionary path of Salvia species, which contrasts with the expansion patterns observed in many other plant families.

Results and Discussion

Genome-Wide Reduction of TNL and RNL Subfamilies in Salvia miltiorrhiza

Comprehensive genome-wide identification and analysis of NBS-LRR genes in Salvia miltiorrhiza reveals a marked reduction in specific subfamilies compared to other angiosperms. Through Hidden Markov Model (HMM) profiling with the NB-ARC domain (PF00931) and subsequent domain verification, researchers identified 196 genes containing the NBS domain, representing 0.42% of all annotated protein-coding genes [86] [10]. However, among these, only 62 possessed both complete N-terminal and LRR domains, classifying them as typical NLR proteins [86].

Table 1: NBS-LRR Gene Subfamily Distribution in Salvia miltiorrhiza

Subfamily	N-Terminal Domain	Number of Genes	Percentage of Typical NLRs
CNL	Coiled-coil (CC)	61	98.4%
RNL	RPW8	1	1.6%
TNL	TIR	0	0%
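The percentages in the table follow directly from the counts, and the reported 0.42% share also implies a rough size for the annotated gene set. The short check below uses only the figures stated above; the implied total is a back-calculation, not a number taken from the cited studies.

```python
# Counts of typical NLRs per subfamily, as given in the table.
typical_nlrs = {"CNL": 61, "RNL": 1, "TNL": 0}
total_typical = sum(typical_nlrs.values())          # 62 typical NLR proteins

# Share of each subfamily among typical NLRs, in percent.
shares = {name: round(100 * n / total_typical, 1)
          for name, n in typical_nlrs.items()}

# 196 NBS-domain genes are stated to be 0.42% of annotated protein-coding genes.
nbs_genes = 196
nbs_share_percent = 0.42
implied_annotated_genes = round(nbs_genes / (nbs_share_percent / 100))

print(total_typical, shares, implied_annotated_genes)
```

The back-calculated total of roughly 46,700 protein-coding genes is only as precise as the rounded 0.42% figure.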

Phylogenetic analysis integrating NLRs from multiple plant species demonstrates that all SmNBS-LRR proteins cluster within the CNL clade, with the exception of a single RNL protein (SmNBS167) that groups with the Arabidopsis ADR1 protein [86] [10]. The complete absence of typical TNL subfamily members and extreme reduction of the RNL subfamily represents a distinctive evolutionary pattern not observed in most other eudicots [86] [7].

Comparative Genomic Analysis Across Plant Lineages

The reduction of specific NBS-LRR subfamilies observed in Salvia species reflects a broader pattern of lineage-specific evolution driven by ecological adaptation. Comparative analysis across land plants reveals tremendous variation in NLR gene content, differing up to 66-fold among closely related species due to rapid gene loss and gain events [12].

Table 2: NBS-LRR Subfamily Distribution Across Representative Plant Species

Plant Species	Family	Total NBS-LRR Genes	CNL	TNL	RNL	Genome Size
Salvia miltiorrhiza	Lamiaceae	196 (62 typical)	61	0	1	Moderate
Arabidopsis thaliana	Brassicaceae	207	~60	~40	~7	135 Mb
Oryza sativa	Poaceae	505	505	0	0	389 Mb
Nicotiana tabacum	Solanaceae	603	224	73	-	~3.5 Gb
Vernicia montana	Euphorbiaceae	149	98	12	-	~1.2 Gb
Triticum aestivum	Poaceae	2151	2151	0	0	~17 Gb

Notably, convergent NLR reduction has been associated with adaptations to specific ecological niches, including aquatic, parasitic, and carnivorous lifestyles [12]. The pattern observed in Salvia species parallels the complete absence of TNL and RNL subfamilies in monocotyledonous species such as Oryza sativa, Triticum aestivum, and Zea mays, suggesting potential convergent evolutionary mechanisms [86] [10]. Analysis across multiple Salvia species (S. miltiorrhiza, S. bowleyana, S. divinorum, S. hispanica, and S. splendens) confirms that none contain TNL subfamily members, with RNL subfamily limited to only one or two copies, significantly fewer than in other angiosperms such as Arabidopsis thaliana, Nicotiana tabacum, and Vitis vinifera [10].

Experimental Methodologies for NBS-LRR Gene Identification and Characterization

Genome-Wide Identification Pipeline

The identification and characterization of NBS-LRR genes follows a standardized bioinformatics workflow:

Genome Data Acquisition: Obtain complete genome assemblies and annotated protein sequences from relevant databases (NCBI, Phytozome, Plaza) [69] [4].
HMMER Search: Perform hidden Markov model searches using HMMER v3.1b2 with the NB-ARC domain model (PF00931) from the PFAM database [86] [69].
Domain Verification: Confirm additional domains (TIR: PF01582; LRR: PF00560, PF07723, PF07725, PF12799, PF13306, PF13516, PF13855, PF14580; CC: via NCBI CDD) using NCBI Conserved Domain Database and PFAM scans [69].
Classification: Categorize genes into subfamilies based on domain architecture (CNL, TNL, RNL, and atypical variants) [69] [7].
Phylogenetic Analysis: Construct phylogenetic trees using multiple sequence alignment (MUSCLE) and maximum likelihood methods (MEGA11 or FastTreeMP) with bootstrap validation [69] [4].
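The classification step in the workflow above can be sketched as a simple rule over the set of domains detected in each protein. The domain labels, the precedence among N-terminal domains (RPW8, then TIR, then CC), and the gene IDs are assumptions for illustration; real pipelines often need extra handling for truncated or fused architectures.

```python
from collections import Counter


def classify_nbs(domains):
    """Return an architecture code such as CNL, TNL, RNL, TN, or N.

    domains: iterable of domain labels found in one protein.
    Returns None if no NB-ARC domain is present.
    """
    found = set(domains)
    if "NB-ARC" not in found:
        return None
    if "RPW8" in found:
        prefix = "R"
    elif "TIR" in found:
        prefix = "T"
    elif "CC" in found:
        prefix = "C"
    else:
        prefix = ""                        # no recognizable N-terminal domain
    suffix = "L" if "LRR" in found else ""
    return prefix + "N" + suffix


# Hypothetical domain annotations per gene.
annotations = {
    "gene1": ["CC", "NB-ARC", "LRR"],
    "gene2": ["TIR", "NB-ARC", "LRR"],
    "gene3": ["RPW8", "NB-ARC", "LRR"],
    "gene4": ["NB-ARC"],
    "gene5": ["TIR", "NB-ARC"],
    "gene6": ["LRR"],                      # no NBS domain: not classified
}
classes = {gene: classify_nbs(doms) for gene, doms in annotations.items()}
summary = Counter(c for c in classes.values() if c is not None)
print(classes)
print(dict(summary))
```

Codes without the trailing "L" (such as `N` or `TN`) correspond to the atypical variants mentioned in the classification step.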

Functional Characterization Approaches

Expression Analysis: Utilize RNA-seq data from various tissues, stress conditions, and pathogen challenges to identify differentially expressed NBS-LRR genes [69] [87]. Process data through established pipelines (Hisat2 for alignment, Cufflinks/Cuffdiff for differential expression) [69].
Promoter Analysis: Identify cis-regulatory elements in upstream promoter regions associated with plant hormones (salicylic acid, jasmonic acid, ethylene) and stress responses [86] [87].
Protein-Protein Interaction: Investigate potential interactions with pathogen effectors and signaling components through yeast two-hybrid, co-immunoprecipitation, or computational docking studies [4] [87].
Functional Validation: Implement virus-induced gene silencing (VIGS) to knock down candidate NBS-LRR genes and assess changes in disease resistance phenotypes [4] [7].

Table 3: Key Research Reagents for NBS-LRR Gene Studies

Reagent/Resource	Specification	Application	Example Sources
HMM Profile PF00931	NB-ARC domain model	Initial gene identification	PFAM Database
CDD Search Tools	Conserved domain detection	Domain verification	NCBI Conserved Domain Database
MUSCLE Software	Multiple sequence alignment	Phylogenetic analysis	EMBL-EBI
MEGA11 Software	Phylogenetic tree construction	Evolutionary analysis	MEGA Software Team
RNA-seq Datasets	Tissue-specific, stress-induced	Expression profiling	NCBI SRA, IPF Database
VIGS Vectors	Tobacco rattle virus-based	Functional validation	AGRIKOLA, VIGS repositories
S-Nitrosylation Assay	Detection of NO modifications	Signaling studies	Commercial kits

Evolutionary Implications and Adaptive Significance

The dramatic reduction of TNL and RNL subfamilies in Salvia species represents an intriguing evolutionary trajectory that contrasts with the expansion patterns observed in many other plant lineages. Several non-exclusive hypotheses may explain this phenomenon:

Pathogen-Driven Selection: Specific pathogen pressures in the ecological niches occupied by Salvia species may favor CNL-mediated recognition, allowing for elimination of redundant TNL pathways [4] [12].
Genetic Trade-offs: Resource allocation constraints in perennial medicinal plants might favor retention of core immune receptors while eliminating genetically costly redundant systems, potentially redirecting resources toward secondary metabolite production [86] [10].
Signaling Pathway Co-evolution: Loss of specific NLR subfamilies may correlate with deficiencies in corresponding signaling components. Research has identified a co-evolutionary pattern between NLR subclasses and immune pathway components, suggesting that immune pathway deficiencies may drive TNL loss [12].
Genomic Constraints: Structural genomic features, such as retrotransposon distributions or chromosomal rearrangements, may facilitate biased gene loss in specific NLR subfamilies [7] [6].

The NBS-LRR gene family in plants typically evolves through a combination of whole-genome duplication (WGD) and small-scale duplication (SSD) events, with subsequent birth-and-death evolution creating lineage-specific patterns [4] [6]. In Rosaceae species, for instance, independent gene duplication and loss events have resulted in distinct evolutionary patterns including "first expansion and then contraction," "continuous expansion," and "early sharp expanding to abrupt shrinking" [6]. The Salvia pattern most closely resembles the "contracting" pattern observed in Poaceae species [6].

The reduction of TNL and RNL subfamilies in Salvia miltiorrhiza and related species represents a compelling example of lineage-specific evolution in plant immune gene families. This pattern, consistent with adaptations to specific ecological niches and potentially linked to the plant's investment in secondary metabolite production, offers insights into the evolutionary plasticity of plant immune systems.

Future research should focus on several key areas:

Functional Characterization: Despite bioinformatic identification of SmNBS-LRR genes, experimental validation of their specific roles in pathogen recognition and defense signaling remains limited [86].
Comparative Genomics: Expanded analysis across the Lamiaceae family will determine whether the observed reduction pattern is conserved or exhibits further lineage-specific variations [12].
Signaling Network Analysis: Investigation of potential compensatory mechanisms in CNL-mediated immunity that allow for loss of TNL functionality without compromising disease resistance [12] [87].
Metabolic Trade-offs: Examination of potential connections between immune gene repertoire reduction and enhanced production of valuable secondary metabolites in medicinal plants [86].

This research direction not only advances our understanding of plant immunity evolution but also provides potential applications in crop improvement and sustainable disease management strategies.

This whitepaper provides a comparative analysis of disease resistance mechanisms in two tung tree species, Vernicia fordii (susceptible) and Vernicia montana (resistant), against Fusarium wilt. The investigation centers on the role of nucleotide-binding site-leucine-rich repeat (NBS-LRR) genes, the largest class of plant resistance (R) genes. A genome-wide study identified significant divergence in NBS-LRR gene composition between these species, revealing a specific candidate gene, Vm019719, that confers resistance in V. montana [3] [7]. Functional characterization demonstrated that the susceptibility of V. fordii is linked to a dysfunctional allele of this gene, Vf11G0978, attributed to a promoter mutation that disrupts its expression [3] [7]. These findings are contextualized within broader evolutionary patterns of NBS gene gain and loss across plant lineages, offering a resource for marker-assisted breeding and a model for understanding plant-pathogen co-evolution.

Plant immunity relies heavily on a sophisticated innate system where resistance (R) genes encode proteins that detect pathogenic effectors, triggering robust defense responses. The most prominent class of R genes encodes Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) proteins [88]. These modular proteins function as intracellular immune receptors:

N-terminal Domain: Dictates signaling pathway activation and classifies NBS-LRRs into major subfamilies: TNL (Toll/Interleukin-1 Receptor-like) or CNL (Coiled-Coil) [3] [88].
Central NBS Domain: Binds and hydrolyzes ATP/GTP, providing energy for conformational changes and activation [3] [7].
C-terminal LRR Domain: Confers recognition specificity through protein-protein interactions, often evolving under diversifying selection to detect changing pathogen effectors [3] [88].

NBS-LRR proteins detect pathogens via two primary mechanisms: direct interaction with pathogen effector molecules or indirect interaction by "guarding" host proteins modified by pathogen effectors [88]. This gene family exhibits remarkable dynamism, with its size and composition varying significantly between plant species due to processes like tandem duplication and gene loss, influencing the evolutionary trajectory of plant-pathogen interactions [4] [31].

Genomic Landscape of NBS-LRR Genes in Vernicia

A comparative genome-wide analysis of V. fordii and V. montana reveals fundamental differences in their NBS-LRR gene repertoire, which underlies their contrasting resistance phenotypes [3] [7].

Table 1: Comparative Genomic Profile of NBS-LRR Genes in Vernicia Species

Feature	V. fordii (Susceptible)	V. montana (Resistant)
Total NBS-containing genes	90	149
CC-NBS-LRR (CNL) genes	12	9
TIR-NBS-LRR (TNL) genes	0	3
NBS-LRR (NL) genes	12	12
Genes with CC domain	49 (54.4%)	98 (65.8%)
Genes with TIR domain	0	12 (8.1%)
LRR domain types	LRR3, LRR8	LRR1, LRR3, LRR4, LRR8
Presence of NBS-TNL class	Absent	Present

Key Divergences in Gene Family Architecture

The data reveal two critical divergences:

Species-Specific Gene Expansion: V. montana possesses an NBS-encoding gene repertoire roughly 66% larger than that of V. fordii (149 vs. 90 genes) [3] [7]. This expansion provides a more diverse arsenal for pathogen recognition.
Domain Loss Events: The susceptible V. fordii genome shows a complete absence of TIR-domain-containing NBS-LRR (TNL) genes [3]. This represents a significant loss of a major NBS-LRR subclass, as TNL genes are common in eudicots and mediate specific defense signaling pathways. V. montana, in contrast, possesses 12 genes with TIR domains [3] [7]. Furthermore, V. fordii has lost two types of LRR domains (LRR1 and LRR4) found in V. montana, potentially reducing its spectrum of pathogen recognition [3] [7].
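
The proportions quoted above can be re-derived from the counts in Table 1. The following is a minimal sketch of that arithmetic; the dictionary layout and function names are illustrative and not part of the original study.

```python
# Counts copied from Table 1 (total NBS-containing genes, CC-domain genes, TIR-domain genes).
counts = {
    "V. fordii": {"total": 90, "cc": 49, "tir": 0},
    "V. montana": {"total": 149, "cc": 98, "tir": 12},
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


def relative_increase(larger: int, smaller: int) -> float:
    """Return the percentage by which `larger` exceeds `smaller`."""
    return round(100 * (larger - smaller) / smaller, 1)


# Relative size of the V. montana repertoire compared with V. fordii.
expansion = relative_increase(counts["V. montana"]["total"], counts["V. fordii"]["total"])

# Share of NBS-containing genes carrying a CC or TIR domain in each species.
cc_share = {species: percent(c["cc"], c["total"]) for species, c in counts.items()}
tir_share = {species: percent(c["tir"], c["total"]) for species, c in counts.items()}

print(expansion, cc_share, tir_share)
```

The same helper functions can be reused for any pair of species once per-subclass counts are available.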

Functional Characterization of a Key Resistance Gene

Identification and Expression Profiling

Orthologous gene analysis identified 43 orthologous pairs between the two species. Among these, the pair Vf11G0978 (V. fordii) and Vm019719 (V. montana) exhibited starkly contrasting expression patterns during Fusarium oxysporum infection [3] [7]:

In resistant V. montana, Vm019719 showed upregulated expression.
In susceptible V. fordii, Vf11G0978 showed downregulated expression [3] [7].

This inverse correlation suggested this orthologous pair was a prime candidate responsible for the divergent resistance phenotypes.

Experimental Validation via Virus-Induced Gene Silencing (VIGS)

The function of Vm019719 was validated experimentally. When V. montana plants were subjected to Virus-Induced Gene Silencing (VIGS) targeting this gene, they became susceptible to Fusarium wilt [3] [7]. This loss-of-function experiment provided direct evidence that Vm019719 is necessary for resistance in V. montana.

Molecular Basis of Susceptibility: A Promoter Mutation

Investigating the cause of the low expression in V. fordii, researchers analyzed the promoter region of Vf11G0978. They discovered a deletion in the W-box element, a cis-regulatory motif known to be bound by WRKY transcription factors [3] [7]. In V. montana, the promoter of Vm019719 is activated by the transcription factor VmWRKY64 [3]. The deletion in the V. fordii allele disrupts this regulatory interaction, abolishing its pathogen-induced expression and rendering the plant susceptible.
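
A promoter comparison of this kind can be approximated computationally by scanning both strands for the W-box core motif. The sketch below assumes the commonly cited core TTGAC(C/T); the two short sequences are hypothetical placeholders and are not the real Vm019719 or Vf11G0978 promoters.

```python
import re

# W-box core assumed here: TTGACC or TTGACT.
W_BOX = re.compile(r"TTGAC[CT]")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_w_boxes(promoter: str) -> list[tuple[int, str]]:
    """Return (0-based forward-strand start, strand) for each W-box core found."""
    seq = promoter.upper()
    hits = [(m.start(), "+") for m in W_BOX.finditer(seq)]
    # Scan the reverse strand and convert coordinates back to the forward strand.
    rc = reverse_complement(seq)
    hits += [(len(seq) - m.end(), "-") for m in W_BOX.finditer(rc)]
    return sorted(hits)


# Hypothetical sequences for illustration only.
intact_promoter = "ACGTATTGACCGGATCA"
deleted_promoter = "ACGTAGGATCA"

print(find_w_boxes(intact_promoter))
print(find_w_boxes(deleted_promoter))
```

A motif scan only flags candidate sites; binding by a specific WRKY factor still has to be confirmed experimentally.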

Table 2: Summary of Key Experimental Findings for the Critical Orthologous Gene Pair

Characteristic	V. montana (Vm019719)	V. fordii (Vf11G0978)
Expression upon Infection	Upregulated	Downregulated
Gene Function	Confers resistance	Dysfunctional defense response
Functional Validation	VIGS silencing leads to susceptibility	Not applicable (already non-functional)
Promoter W-box	Intact	Deletion mutation
Regulation by WRKY	Activated by VmWRKY64	Not activated due to W-box deletion

Experimental Protocols for Key Methodologies

Genome-Wide Identification of NBS-LRR Genes

The methodology for identifying NBS-encoding genes is standardized and relies on the conserved NB-ARC domain [3] [18] [31].

Sequence Retrieval: Obtain the complete proteome files for the target plant species from genomic databases.
HMMER Search: Perform a Hidden Markov Model (HMM) search against the proteome using the NB-ARC domain (Pfam accession: PF00931) as a query. An E-value threshold of 1.0 is typically used [18] [31].
Domain Verification: Subject all candidate sequences to domain analysis using the Pfam database (E-value cutoff of 10⁻⁴) to confirm the presence of the NBS domain [18] [31].
Classification: Analyze the confirmed NBS proteins for additional domains to classify them into subfamilies:
- TIR Domain: Checked via Pfam or NCBI CDD (Pfam: PF01582) [18].
- CC Domain: Predicted using the COILS program with a threshold of 0.9 [31].
- LRR Domain: Identified via Pfam (PF08191) [18].
Chromosomal Mapping: Map the physical locations of identified genes onto chromosomes using the genome annotation file to determine distribution and clustering.
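
The E-value filtering in the HMMER step can be sketched as a small parser for the tabular output written by hmmsearch with the --tblout option, in which the full-sequence E-value is the fifth whitespace-separated column. The sample text and protein identifiers (protA to protD) below are hypothetical, and the two cutoffs simply illustrate how the choice of threshold changes the candidate list.

```python
def parse_tblout(text: str, max_evalue: float) -> dict[str, float]:
    """Map each target protein to its best full-sequence E-value at or below the cutoff."""
    hits: dict[str, float] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = stripped.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits


# Hypothetical hmmsearch --tblout content (NB-ARC profile as the query).
SAMPLE_TBLOUT = """\
# target name accession query name accession E-value score bias (remaining columns omitted from header)
protA - NB-ARC PF00931 1.2e-45 150.3 0.1 2.0e-45 149.6 0.1 1.0 1 0 0 1 1 1 1 hypothetical protein
protB - NB-ARC PF00931 0.5 8.1 0.0 0.9 7.3 0.0 1.2 1 0 0 1 1 1 0 hypothetical protein
protC - NB-ARC PF00931 3.0e-03 20.2 0.0 4.1e-03 19.8 0.0 1.1 1 0 0 1 1 1 1 hypothetical protein
protD - NB-ARC PF00931 5.0 4.0 0.0 6.2 3.7 0.0 1.0 1 0 0 1 1 0 0 hypothetical protein
"""

permissive = parse_tblout(SAMPLE_TBLOUT, max_evalue=1.0)
strict = parse_tblout(SAMPLE_TBLOUT, max_evalue=1e-4)

print(sorted(permissive))
print(sorted(strict))
```

A permissive first pass followed by stricter domain verification, as in the protocol above, trades a larger candidate list for fewer missed divergent genes.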

Functional Validation by Virus-Induced Gene Silencing (VIGS)

VIGS is a powerful reverse-genetics tool for rapid functional analysis [3] [4].

Target Sequence Selection: A unique, gene-specific fragment (typically 200-500 bp) of the target gene (e.g., Vm019719) is selected.
Vector Construction: The target fragment is cloned into a VIGS vector (e.g., based on Tobacco Rattle Virus, TRV).
Plant Inoculation: The recombinant vector is delivered into plant cells. For Agrobacterium tumefaciens-mediated delivery (agroinfiltration):
- Grow Agrobacterium harboring the VIGS vector.
- Resuspend bacterial cells in an induction medium (e.g., with acetosyringone).
- Infiltrate the suspension into the leaves or other tissues of the plant (e.g., V. montana seedlings).
Phenotypic Assessment: Inoculate the silenced plants with the pathogen (F. oxysporum). Monitor for the development of disease symptoms (wilting, chlorosis) in comparison to control plants (e.g., infiltrated with an empty vector).
Molecular Confirmation: Use quantitative PCR (qPCR) to measure the transcript level of the target gene in silenced plants to confirm a reduction in expression.
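
The qPCR confirmation step is often quantified with the 2^-ΔΔCt method. The source does not state which quantification method was used, so the sketch below is an assumption; it also presumes near-100% amplification efficiency and a stable reference gene, and the Ct values are hypothetical.

```python
def relative_expression(
    ct_target_silenced: float,
    ct_ref_silenced: float,
    ct_target_control: float,
    ct_ref_control: float,
) -> float:
    """Return target expression in silenced plants relative to controls (2^-ddCt)."""
    # Normalize the target gene to the reference gene within each sample.
    delta_silenced = ct_target_silenced - ct_ref_silenced
    delta_control = ct_target_control - ct_ref_control
    # Compare the silenced sample with the empty-vector control.
    return 2 ** -(delta_silenced - delta_control)


# Hypothetical Ct values: the target amplifies two cycles later in silenced plants.
fold = relative_expression(27.0, 18.0, 25.0, 18.0)
knockdown_percent = 100 * (1 - fold)

print(fold, knockdown_percent)
```

A relative expression well below 1 in silenced plants supports attributing the susceptible phenotype to reduced transcript levels of the target gene.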

Visualizing the Experimental Workflow and Resistance Mechanism

The following diagram illustrates the integrated workflow from gene identification to functional validation, as applied in the Vernicia-Fusarium study.

Experimental Workflow for NBS-LRR Gene Characterization

The molecular mechanism conferring resistance in V. montana and its breakdown in V. fordii is summarized in the pathway below.

Molecular Basis of Resistance and Susceptibility in Vernicia

The Scientist's Toolkit: Key Research Reagents

Table 3: Essential Reagents and Resources for NBS-LRR Gene Research

Reagent/Resource	Function/Description	Example Use Case
HMM Profile (NB-ARC, PF00931)	Computational identification of NBS-domain-containing genes from genomic or proteomic data.	Initial genome-wide scan for NBS-LRR genes [3] [18].
VIGS Vector System	A viral vector (e.g., TRV-based) used to silence target genes for rapid functional analysis.	Validating the role of Vm019719 in Fusarium wilt resistance [3] [4].
WRKY Transcription Factor	A plant TF family that binds W-box elements in promoters to regulate gene expression, including defense genes.	Demonstrating transcriptional activation of Vm019719 by VmWRKY64 [3] [7].
Fusarium oxysporum Inoculum	A standardized spore suspension of the fungal pathogen used for controlled infection assays.	Phenotyping resistant (V. montana) and susceptible (V. fordii) genotypes [3] [89].

The comparative analysis of Vernicia fordii and Vernicia montana provides a compelling model of how the evolution of a specific NBS-LRR gene directly determines a disease resistance trait. The susceptibility of V. fordii is not due to a wholesale lack of R genes but to precise gene loss events (TNL class absence) and regulatory mutations (promoter W-box deletion in Vf11G0978) that cripple its defense response [3] [7]. This case study underscores that resistance is a positive trait conferred by functional genes like Vm019719, which can be harnessed for crop improvement.

This research exemplifies a broader evolutionary paradigm where the NBS-LRR gene family undergoes dynamic gains and losses, shaping the resistance profile of plant lineages [4] [31]. The identification of Vm019719 offers a direct resource for marker-assisted breeding in tung trees. Furthermore, the integrated methodology—combining comparative genomics, transcriptomics, and VIGS validation—serves as a blueprint for uncovering resistance genes in other non-model crop species, accelerating the development of durable disease resistance in a changing climate.

The nucleotide-binding site leucine-rich repeat (NBS-LRR) gene family constitutes the largest and most important class of plant disease resistance (R) genes, providing plants with the capacity to recognize diverse pathogens through effector-triggered immunity. Within this family, genes are classified into distinct subfamilies based on their N-terminal domains: TIR-NBS-LRR (TNL), CC-NBS-LRR (CNL), and RPW8-NBS-LRR (RNL). A remarkable evolutionary pattern has emerged from comparative genomic analyses: TNL genes are consistently absent from cereal genomes and largely missing from monocots as a whole, despite their prevalence in dicot species [90] [91].

This absence represents a fundamental genetic divergence between monocot and dicot lineages with significant implications for plant immunity mechanisms. The TNL loss phenomenon provides a compelling model for studying how different evolutionary pressures shape genome content and immune system architecture across plant lineages. Within the broader context of NBS gene loss and gain across plant lineages, the specific disappearance of TNLs from cereals offers insights into the dynamic nature of plant genome evolution in response to ecological and genetic constraints [12].

Comparative Genomic Evidence of TNL Distribution

Documented Patterns of TNL Absence Across Monocots

Table 1: Distribution of NBS-LRR Subclasses Across Representative Plant Species

Species	Classification	Total NBS-LRR Genes	TNL Count	CNL Count	RNL Count	Reference
Oryza sativa (rice)	Monocot	505-587	0	505	Not specified	[10] [90]
Zea mays (maize)	Monocot	306	0	306	Not specified	[90]
Triticum aestivum (wheat)	Monocot	2,747	0	2,747	Not specified	[90]
Setaria italica (foxtail millet)	Monocot	535	0	535	Not specified	[90]
Dioscorea rotundata (yam)	Monocot	167	0	166	1	[92]
Arabidopsis thaliana	Dicot	149-207	~100	~49-107	Not specified	[10] [91]
Solanum melongena (eggplant)	Dicot	269	36	231	2	[33]
Salvia miltiorrhiza	Dicot	196	2	75 (CC)	1	[10]
Pinus taeda (pine)	Gymnosperm	311	~278 (89.3%)	Not specified	Not specified	[10]

Genomic analyses across multiple species consistently demonstrate the absence of TNL genes in monocot genomes. Studies of 12 grass species confirmed that TNL genes are "almost nonexistent in monocots" [90]. Research on Dioscorea rotundata (white Guinea yam) identified 167 NBS-LRR genes, with 166 belonging to the CNL subclass and only one to RNL, while "none of the TNL genes were detected in the D. rotundata genome, which is consistent with reports of other monocot genomes that all lack TNL genes" [92]. Similarly, studies of rice, maize, and wheat genomes have identified hundreds of NBS-LRR genes, all belonging to the CNL subclass without any TNL representatives [90] [91].

In contrast, dicot species consistently maintain significant TNL populations. Arabidopsis thaliana contains approximately twice as many TNL as CNL genes, while Solanum melongena (eggplant) possesses 36 TNL genes alongside 231 CNL genes [33] [91]. Even in dicot species with reduced TNL representation, such as Salvia miltiorrhiza which contains only 2 TNL genes compared to 75 CNL genes, the TNL subclass persists rather than being entirely absent [10].

Evolutionary Timeline of TNL Loss

The evolutionary history of TNL genes suggests they originated prior to the separation of bryophytes and vascular plants more than 500 million years ago [93]. Phylogenetic evidence indicates TNL genes were present in early land plants and remain abundant in gymnosperms such as Pinus taeda, where they represent 89.3% of typical NBS-LRR genes [10]. The current distribution pattern suggests that TNL loss occurred specifically in the monocot lineage after its divergence from dicots approximately 100-200 million years ago [91].

Two evolutionary stages have been proposed for NBS-LRR gene evolution. Stage I featured both CNL and TNL genes with broad specificity that evolved before angiosperm-gymnosperm divergence (~200 mya). Stage II involved gene duplication and diversification after monocot-dicot separation (~100 mya), leading to TNL degeneration in cereals [91]. This timeline is supported by the absence of TNL sequences not only in Poales (cereals) but across most monocot orders, including Zingiberales, Arecales, Asparagales, and Alismatales [91].

Molecular Mechanisms Underlying TNL Loss

Genomic Deletion and Pseudogenization

The absence of TNL genes in monocots likely resulted from large-scale genomic deletion events rather than mere functional divergence. Comprehensive genomic searches have failed to identify even pseudogenized TNL remnants in most monocot genomes, suggesting thorough elimination [91]. This pattern represents a dramatic example of lineage-specific gene family contraction driven by distinct evolutionary pressures between monocot and dicot lineages [12].

Recent studies have revealed that NLR contraction is particularly associated with specific ecological adaptations. "NLR contraction was associated with adaptations to aquatic, parasitic, and carnivorous lifestyles" [12]. The convergent NLR reduction in aquatic plants resembles the lack of NLR expansion during the long-term evolution of green algae before land colonization, suggesting that certain ecological contexts may reduce reliance on this immune pathway.

Co-Evolution with Signaling Components

Evidence suggests TNL loss may be linked to co-evolutionary patterns with downstream signaling components. A comparative genomics study identified "a co-evolutionary pattern between NLR subclasses and plant immune pathway components," suggesting that "immune pathway deficiencies may drive TNL loss" [12]. TNL proteins typically signal through ENHANCED DISEASE SUSCEPTIBILITY 1 (EDS1) family proteins, while CNLs often signal through the NON-RACE-SPECIFIC DISEASE RESISTANCE 1 (NDR1) pathway [91].

The absence of TNL genes in monocots may reflect the loss or modification of essential TNL signaling components, creating selective pressure against maintaining non-functional resistance genes. This represents a compelling example of how the integrity of signaling networks can constrain the evolution of receptor gene families.

Experimental Approaches for Studying NBS-LRR Evolution

Genome-Wide Identification Protocols

Protocol 1: HMMER-Based Identification of NBS-LRR Genes

The standard methodology for comprehensive identification of NBS-LRR genes employs Hidden Markov Model (HMM) profiles to detect conserved protein domains [92] [33]. The detailed workflow includes:

Domain Profile Acquisition: Obtain the HMM profile for the NB-ARC domain (PF00931) from the Pfam database or the InterPro database.
Initial Gene Discovery: Perform a genome-wide search with hmmsearch (part of the HMMER3 suite), using the NB-ARC domain profile against the target proteome. A stringent E-value threshold of < 10⁻¹⁰ is commonly applied to retain only high-confidence hits [92].
Redundancy Removal and Validation: Eliminate redundant hits and validate the presence of characteristic NBS-LRR domains using:
- Pfam database (http://pfam.xfam.org/) for NBS (PF00931), LRR (PF13855), TIR (PF01582), and RPW8 (PF05659) domains
- SMART database (http://smart.embl-heidelberg.de/) for domain architecture verification
- NCBI Conserved Domain Database (CDD) for additional validation
Classification: Categorize identified genes into subclasses based on N-terminal domains:
- CC domain prediction using COILS (http://toolkit.tuebingen.mpg.de/pcoils) with a probability threshold of 0.9
- TIR domain verification via Pfam and SMART
- RPW8 domain identification using dedicated HMM profiles
Manual Curation: Manually inspect domain organization and remove partial or questionable sequences to generate a final high-confidence set.
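
The classification step reduces to a lookup on the set of domains detected in each protein. The sketch below is a minimal illustration; the domain labels ("NB-ARC", "CC", "TIR", "RPW8", "LRR"), the gene identifiers, and the convention of concatenating N-terminal letters in a fixed order are assumptions, not a published scheme.

```python
from collections import Counter

# N-terminal domains and the letter each contributes to the class name.
N_TERMINAL = (("CC", "C"), ("TIR", "T"), ("RPW8", "R"))


def classify_nbs(domains: set[str]) -> str:
    """Return a class label such as CNL, TNL, RNL, NL, CN, or N."""
    if "NB-ARC" not in domains:
        raise ValueError("not an NBS-containing protein")
    prefix = "".join(letter for name, letter in N_TERMINAL if name in domains)
    suffix = "L" if "LRR" in domains else ""
    return f"{prefix}N{suffix}"


# Hypothetical domain annotations for illustration only.
annotations = {
    "gene1": {"NB-ARC", "CC", "LRR"},
    "gene2": {"NB-ARC", "TIR", "LRR"},
    "gene3": {"NB-ARC", "RPW8", "LRR"},
    "gene4": {"NB-ARC", "LRR"},
    "gene5": {"NB-ARC", "CC"},
}

class_counts = Counter(classify_nbs(d) for d in annotations.values())
print(class_counts)
```

Truncated classes such as CN or N are kept separate here so that the manual curation step can decide whether they are genuine partial genes or annotation artifacts.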

This methodology has been successfully applied across diverse species, from dicots like eggplant (identifying 269 NBS-LRR genes) to monocots like white Guinea yam (167 NBS-LRR genes) [92] [33].

Phylogenetic and Evolutionary Analysis Methods

Protocol 2: Evolutionary Analysis of NBS-LRR Genes

Comparative evolutionary analysis requires reconstruction of gene family relationships across multiple species:

Sequence Alignment: Extract NBS domain sequences from identified genes and perform multiple sequence alignment using MAFFT or MUSCLE with default parameters.
Phylogenetic Reconstruction: Construct phylogenetic trees using maximum likelihood (RAxML) or neighbor-joining (MEGA) methods with appropriate substitution models and 1000 bootstrap replicates to assess node support [90].
Synteny Analysis: Identify conserved syntenic blocks across related species using MCScanX or similar tools to distinguish orthologous from paralogous relationships.
Selection Pressure Analysis: Calculate non-synonymous (Ka) to synonymous (Ks) substitution rates for syntenic gene pairs to identify signatures of selection:
- Ka/Ks > 1 indicates positive selection
- Ka/Ks ≈ 1 indicates neutral evolution
- Ka/Ks < 1 indicates purifying selection
Gene Gain/Loss Reconstruction: Map gene duplication and loss events onto species phylogenies using computational frameworks like NOTUNG or custom parsimony-based approaches.
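
The Ka/Ks interpretation rules in the selection pressure step can be written as a small function. This is a sketch only: the tolerance band around 1 is an arbitrary assumption used to represent "approximately 1", and real analyses usually rely on a statistical test rather than a fixed band.

```python
def selection_regime(ka: float, ks: float, tolerance: float = 0.1) -> str:
    """Classify a gene pair by its Ka/Ks ratio."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if ratio > 1 + tolerance:
        return "positive selection"
    if ratio < 1 - tolerance:
        return "purifying selection"
    return "neutral evolution"


# Hypothetical Ka and Ks values for three syntenic gene pairs.
pairs = {"pairA": (0.30, 0.10), "pairB": (0.02, 0.20), "pairC": (0.10, 0.10)}
regimes = {name: selection_regime(ka, ks) for name, (ka, ks) in pairs.items()}
print(regimes)
```

Pairs with Ks equal to zero are rejected rather than silently assigned a ratio, since identical synonymous sites make Ka/Ks undefined.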

This phylogenetic framework enables researchers to determine whether TNL absence represents an ancestral loss or a derived state, and to identify differences in evolutionary rates between NBS-LRR subfamilies [6].

Figure 1: Evolutionary Timeline of TNL Loss in Monocots. The diagram illustrates the proposed evolutionary pathway leading to TNL absence in cereal genomes, with loss occurring after monocot-dicot divergence.

Table 2: Essential Research Reagents and Resources for NBS-LRR Evolutionary Studies

Category	Specific Tool/Resource	Application	Key Features
Database Resources	Plant DNA C-values Database (https://cvalues.science.kew.org/)	Genome size reference	Contains genome size data for 10,770 angiosperm species [94]
	Genome Database for Rosaceae (https://www.rosaceae.org/)	Comparative genomics	Curated genomic data for Rosaceae family species [6]
	ANNA (Angiosperm NLR Atlas, https://biobigdata.nju.edu.cn/ANNA/)	NLR-specific database	NLR genes from 300+ angiosperm genomes [12]
Bioinformatics Tools	HMMER Suite	Domain identification	Hidden Markov Model-based protein domain search [92] [33]
	Pfam Database (pfam.xfam.org)	Domain verification	Curated collection of protein domain families [33]
	MCScanX	Synteny analysis	Detection of collinear blocks across genomes [90]
	MEME Suite	Motif discovery	Identification of conserved protein motifs [92]
Experimental Materials	Reference genomes (Arabidopsis, rice, maize)	Comparative standards	Well-annotated model organism genomes [90]
	Pan-genome collections	Diversity capture	Multiple genomes from single species [93]

Biological Implications and Future Research Directions

The absence of TNL genes in cereal genomes represents more than a curious genomic anomaly—it reflects fundamental differences in immune system architecture between monocots and dicots. This divergence has several significant biological implications:

First, the TNL loss suggests cereals rely exclusively on CNL-mediated immunity pathways, potentially constraining their immune recognition capabilities. This specialization may reflect distinct evolutionary pressures experienced by monocots, possibly related to their unique pathogen exposures or developmental constraints [90]. Despite this reduction in receptor diversity, cereals have maintained robust disease resistance through expansion and diversification of their CNL repertoires, as evidenced by the 505 CNL genes identified in rice [10].

Second, this evolutionary pattern demonstrates the remarkable plasticity of plant immune systems. Different plant lineages have arrived at distinct genomic solutions to pathogen defense, with some species maintaining balanced TNL/CNL repertoires while others specialize in CNL-only recognition systems [6]. This suggests that multiple evolutionarily stable strategies exist for constructing effective immune networks.

Future research should focus on several key areas:

Functional compensation mechanisms: How do CNL genes in cereals compensate for the missing TNL functions?
Ecological correlates: Are there specific environmental factors that predispose lineages to TNL loss?
Signaling network rewiring: How have downstream signaling pathways adapted to TNL absence in monocots?
Biotechnological applications: Can engineering TNL pathways into cereals provide novel resistance capabilities?

Figure 2: Comparative Immune System Architecture in Monocots and Dicots. The diagram illustrates the simplified CNL-only immune system in monocots compared to the dual CNL/TNL system in dicots, highlighting potential differences in downstream signaling pathways.

The absence of TNL genes in cereal genomes represents a definitive case of lineage-specific gene family contraction during plant evolution. This pattern, consistently observed across monocot species, underscores the dynamic nature of plant genome evolution and the diverse strategies employed by different lineages to construct effective immune systems. Within the broader context of NBS gene loss and gain across plant lineages, the TNL loss in cereals exemplifies how ecological adaptation, co-evolution with signaling components, and distinct evolutionary pressures can dramatically reshape genomic content.

Understanding this monocot-dicot divergence provides fundamental insights into plant evolutionary biology while offering practical knowledge for crop improvement strategies. As genomic resources continue to expand across diverse plant taxa, the patterns and mechanisms underlying TNL loss will illuminate broader principles of immune system evolution, potentially guiding future efforts to enhance disease resistance in economically important cereal crops.

Associations Between NBS-LRR Evolution and Secondary Metabolism

The Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family represents the largest class of disease resistance (R) genes in plants, enabling the recognition of diverse pathogen effectors and the activation of robust immune defenses [95]. Recent research has illuminated the complex evolutionary dynamics of this gene family, including frequent gene loss, domain degeneration, and lineage-specific expansion [9] [11] [3]. Concurrently, plants deploy a chemical arsenal of specialized secondary metabolites to combat pathogens. While both systems are crucial for plant immunity, the potential connections between the evolution of NBS-LRR receptors and the regulation of secondary metabolic pathways remain an emerging frontier. This review synthesizes current evidence to explore these associations, framing the discussion within the broader context of NBS gene loss and gain across plant lineages.

Genomic and Evolutionary Dynamics of the NBS-LRR Family

Diversity and Distribution Across Plant Lineages

NBS-LRR genes are characterized by a conserved nucleotide-binding site (NBS) domain and a C-terminal leucine-rich repeat (LRR) domain [95]. Based on their N-terminal domains, they are classified into three major subfamilies: TNL (TIR-NBS-LRR), CNL (CC-NBS-LRR), and RNL (RPW8-NBS-LRR) [96] [33]. Genomic analyses reveal striking variation in the number and distribution of these genes across plant species, influenced by both evolutionary history and selective pressures.

Table 1: NBS-LRR Gene Distribution Across Plant Species

Plant Species	Total NBS/NBS-LRR Genes	CNL	TNL	RNL	Key Evolutionary Features	Reference
Arabidopsis thaliana	149-207	55	94	-	Balanced TNL/CNL representation	[95] [86]
Oryza sativa (Rice)	505-653	~653	0	-	Complete absence of TNL genes	[95] [86]
Salvia miltiorrhiza	196 (62 typical)	61	0	1	Marked reduction of TNL/RNL	[86]
Solanum melongena (Eggplant)	269	231	36	2	Uneven chromosomal distribution	[33]
Vernicia montana (Tung Tree)	149	98 (CC-containing)	3	-	Presence of rare CC-TIR-NBS type	[3]
Dendrobium officinale	74	10	0	-	NBS gene degeneration common	[11]
Perilla citriodora	535	104 (CC-containing)	-	1	One unique RPW8-type gene	[96]

Patterns of Gene Loss, Gain, and Domain Degeneration

The evolutionary trajectory of the NBS-LRR family is marked by significant gene turnover and structural diversification. A prominent pattern is the differential loss of the TNL class in monocots, including cereals like rice and maize, and its reduction in some eudicots like Salvia miltiorrhiza and Vernicia fordii [11] [3] [86]. In the orchid genus Dendrobium, NBS genes frequently exhibit type changing and NB-ARC domain degeneration [11]. Furthermore, the loss of LRR domains has been documented. Cultivated peanut (Arachis hypogaea cv. Tifrunner) has fewer LRR domains in its NBS-LRR proteins compared to its wild diploid donors, which may partly explain its lower disease resistance [9]. Similarly, the susceptible tung tree V. fordii lacks LRR1 and LRR4 domains present in its resistant counterpart, V. montana [3]. These degenerative events are often counterbalanced by mechanisms for generating diversity, such as tandem gene duplication, which is a primary driver for the expansion of NBS-LRR genes in eggplant and other species [33].

Linking NBS-LRR Evolution to Secondary Metabolism

Evidence from Genomic and Transcriptomic Analyses

Direct genomic evidence connecting NBS-LRR evolution to secondary metabolism is nascent but growing. A pivotal study in the medicinal plant Salvia miltiorrhiza (Danshen) revealed that the expression of specific SmNBS-LRR genes is closely associated with the production of bioactive secondary metabolites, including tanshinones and phenolic acids [86]. This suggests a potential co-regulation of defense recognition systems and the biosynthesis of antimicrobial compounds. Similarly, in Dendrobium officinale, a medicinal orchid known for its polysaccharides, flavonoids, and alkaloids, transcriptomic analysis following salicylic acid (SA) treatment identified six NBS-LRR genes that were significantly upregulated [11]. One of these genes, Dof020138, was found to be a hub connected to pathogen identification pathways, MAPK signaling, plant hormone signal transduction, and crucially, biosynthetic pathways and energy metabolism pathways [11]. This positions certain NBS-LRR genes as potential nodes integrating immune perception with the metabolic reprogramming necessary for secondary metabolite production.

Proposed Signaling Hubs and Convergent Pathways

The interaction between NBS-LRR-mediated immunity and secondary metabolism is likely mediated through shared signaling pathways. A key player is salicylic acid (SA), a phytohormone central to systemic acquired resistance. The upregulation of NBS-LRR genes by SA treatment in D. officinale provides a direct link [11]. Furthermore, promoter analyses of SmNBS-LRR genes in S. miltiorrhiza have identified an abundance of cis-acting elements related to plant hormones and abiotic stress [86], suggesting that the expression of these immune receptors is tuned by hormonal cues that also regulate metabolic pathways.

Table 2: Key Signaling Components and Metabolic Pathways

Component/Pathway	Function in Immunity	Proposed Link to Secondary Metabolism	Evidence
Salicylic Acid (SA)	Activates Systemic Acquired Resistance	Induces expression of biosynthetic genes for antimicrobial compounds	Upregulates NBS-LRRs in D. officinale [11]
MAPK Signaling	Transduces immune signals	Phosphorylates and activates metabolic enzymes	Dof020138 connected to MAPK pathways [11]
WRKY Transcription Factors	Regulate expression of defense genes	Bind promoters of secondary metabolite gene clusters	VmWRKY64 activates Vm019719 in tung tree [3]
EDS1/PAD4/ADR1 Hub	Central signaling node for TNL/CNL immunity	Potential regulator of metabolic shifts	SmNBS167 clusters with ADR1 [86]

The diagram below illustrates a proposed model of how NBS-LRR activation could be linked to secondary metabolism through shared signaling components.

Experimental Methodologies for Investigation

Genome-Wide Identification and Evolutionary Analysis

A standardized pipeline for identifying NBS-LRR genes is employed across species, leveraging the conserved nature of the NBS domain [33] [86].

Protocol 1: Identification and Classification of NBS-LRR Genes

HMMER Search: Use HMMER software (http://hmmer.org) with the Hidden Markov Model (HMM) of the NB-ARC domain (PF00931) as a query to scan the proteome of the target species. Reported E-value cutoffs range from 10⁻⁵ to 10⁻²⁰, depending on the desired stringency [96] [33].
Domain Verification: Submit candidate genes to domain databases (Pfam, SMART) to verify the presence of key domains (LRR: PF13855, TIR: PF01582, RPW8: PF05659). The CC domain is typically identified using COILS with a probability threshold of 0.9 [33].
Classification and Phylogenetics: Classify genes into CNL, TNL, RNL, and atypical subtypes based on domain composition. Perform multiple sequence alignment with tools like MAFFT and construct a phylogenetic tree using maximum likelihood methods in IQ-TREE with 1000 bootstrap replicates [96] [86].
Synteny and Duplication Analysis: Use MCScanX to identify syntenic genomic blocks and categorize gene duplication events (tandem, segmental, whole-genome) that drive NBS-LRR family expansion [96].
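The filtering and classification steps of Protocol 1 can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the cited studies: it assumes the HMMER search was run with the `--tblout` option, that each candidate's domain set has already been collected from Pfam/SMART and a coiled-coil predictor, and that "CC" is simply a label of our choosing for a coiled-coil call. The function names and the toy input are hypothetical, and lumping every incomplete architecture into "atypical" is a simplification.

```python
from typing import Dict, Iterable, Set

# Pfam accessions quoted in Protocol 1. "CC" is an assumed label for a
# coiled-coil call made by a separate predictor; it is not a Pfam accession.
NB_ARC, TIR, RPW8, LRR = "PF00931", "PF01582", "PF05659", "PF13855"


def parse_tblout(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, float]:
    """Return {protein_id: best full-sequence E-value} from hmmsearch --tblout text."""
    hits: Dict[str, float] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        # Column 1 is the target name, column 5 the full-sequence E-value.
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits


def classify(domains: Set[str]) -> str:
    """Assign CNL/TNL/RNL/atypical from domain composition (simplified rules)."""
    if NB_ARC not in domains:
        return "non-NBS"
    if TIR in domains:
        prefix = "T"
    elif RPW8 in domains:
        prefix = "R"
    elif "CC" in domains:
        prefix = "C"
    else:
        prefix = ""
    if prefix and LRR in domains:
        return prefix + "NL"
    return "atypical"  # e.g. N, NL, TN or CN architectures


# Hand-written toy input, not output from a real search.
example_tblout = [
    "# target name accession query name accession E-value score bias",
    "geneA - NB-ARC PF00931 1.2e-40 140.1 0.1",
    "geneB - NB-ARC PF00931 3.0e-03 12.0 0.0",
]
hits = parse_tblout(example_tblout)           # geneB fails the 1e-5 cutoff
domain_calls = {"geneA": {NB_ARC, TIR, LRR}}  # hypothetical Pfam/SMART results
classes = {gene: classify(domain_calls.get(gene, {NB_ARC})) for gene in hits}
```

Tightening `max_evalue` toward 10⁻²⁰ trades sensitivity for specificity; in practice, the atypical class would usually be subdivided further according to which domains are missing.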

Functional Characterization and Linkage to Metabolism

Following identification, functional experiments are crucial to validate the role of candidate NBS-LRR genes and connect them to metabolic outputs.

Protocol 2: Functional Validation via Virus-Induced Gene Silencing (VIGS)

The VIGS technique, as utilized to confirm the role of Vm019719 in Fusarium wilt resistance in tung tree [3], can be adapted to probe metabolic links.

Vector Construction: Clone a gene-specific 200-300 bp fragment of the target NBS-LRR gene into a VIGS vector (e.g., TRV2).
Plant Inoculation: Transform the recombinant vector into Agrobacterium tumefaciens and infiltrate the bacteria into young leaves or seedlings of the test plant. For TRV-based systems, co-infiltrate with a strain carrying TRV1.
Pathogen Challenge & Metabolite Analysis:
- After confirming gene silencing, challenge plants with the target pathogen.
- Simultaneously, collect tissue samples for metabolite profiling.
- Use techniques like LC-MS/MS to quantify changes in key secondary metabolites (e.g., tanshinones in S. miltiorrhiza) in silenced vs. control plants.
Phenotypic Recording: Assess disease symptoms and correlate with both the silencing efficiency of the NBS-LRR gene and the altered metabolic profile.
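The comparison of silenced and control plants in Protocol 2 reduces to two summary numbers: how strongly the target transcript was knocked down, and how much a metabolite of interest changed. The sketch below shows one way to compute them. It assumes relative expression values (for example from qRT-PCR) and metabolite concentrations are already available for silenced plants and for empty-vector controls; the function names, the choice of empty-vector controls, and all numbers are illustrative placeholders rather than data from the cited studies.

```python
import math
from statistics import mean
from typing import Sequence


def silencing_efficiency(silenced: Sequence[float], control: Sequence[float]) -> float:
    """Percent reduction in mean target transcript level relative to controls."""
    return 100.0 * (1.0 - mean(silenced) / mean(control))


def log2_fold_change(silenced: Sequence[float], control: Sequence[float]) -> float:
    """log2 ratio of mean metabolite level in silenced vs. control plants."""
    return math.log2(mean(silenced) / mean(control))


# Placeholder values for illustration only.
expr_silenced = [0.20, 0.30, 0.25]   # relative NBS-LRR expression, VIGS plants
expr_control = [1.00, 1.10, 0.90]    # relative expression, empty-vector plants
metab_silenced = [2.0, 2.0]          # metabolite level, arbitrary units
metab_control = [4.0, 4.0]

efficiency = silencing_efficiency(expr_silenced, expr_control)  # about 75 %
metab_lfc = log2_fold_change(metab_silenced, metab_control)     # about -1
```

With real data, the means would be accompanied by a significance test across biological replicates, and plants with poor silencing efficiency would normally be excluded before interpreting the metabolite change.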

Protocol 3: Expression Profiling and Cis-Element Analysis

Treatments: Expose plants to hormonal elicitors (e.g., Salicylic Acid, Methyl Jasmonate) or pathogens, and collect tissue at multiple time points [11] [33].
qRT-PCR: Design gene-specific primers. Perform qRT-PCR with a reference gene (e.g., Actin) to analyze the expression dynamics of NBS-LRR genes.
Promoter Analysis: Extract 1500-2000 bp sequences upstream of the start codon of target NBS-LRR genes. Use platforms like PlantCARE to identify hormone-responsive (e.g., SA, JA) and stress-responsive cis-elements [86].
Correlation Analysis: Correlate the expression patterns of NBS-LRR genes with transcriptomic data of key secondary metabolite biosynthetic genes.
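The qRT-PCR and correlation steps of Protocol 3 can be illustrated with a short calculation. The source does not state which quantification method was used; the sketch below assumes the widely used 2^-ΔΔCt approach, which presumes near-100% amplification efficiency for both target and reference genes, followed by a plain Pearson correlation. The function names and Ct values are hypothetical.

```python
import math
from typing import Sequence


def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fold change by 2^-ddCt (assumes ~100 % amplification efficiency)."""
    delta_ct_sample = ct_target - ct_ref             # normalize to reference gene
    delta_ct_control = ct_target_ctrl - ct_ref_ctrl  # same for untreated control
    return 2.0 ** -(delta_ct_sample - delta_ct_control)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


# Hypothetical Ct values: target gene vs. Actin, SA-treated vs. untreated.
fold = relative_expression(ct_target=22.0, ct_ref=18.0,
                           ct_target_ctrl=25.0, ct_ref_ctrl=18.0)

# Hypothetical time-course fold changes for an NBS-LRR gene and a biosynthetic gene.
nbs_series = [1.0, 2.0, 3.0, 4.0]
biosynth_series = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(nbs_series, biosynth_series)
```

A high correlation across a handful of time points indicates co-expression only; it does not by itself establish that the NBS-LRR gene regulates the biosynthetic gene.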

The Scientist's Toolkit: Key Research Reagents and Solutions

Table 3: Essential Reagents for Investigating NBS-LRR Genes and Secondary Metabolism

Category	Specific Item/Kit	Function in Research	Example Use / Source
Bioinformatics Tools	HMMER Suite (hmmsearch/hmmscan)	Initial identification of NBS-domain-containing proteins from proteomes.	[9] [3] [33]
	Pfam, SMART Databases	Verification of protein domains (NB-ARC, LRR, TIR, RPW8).	[11] [96] [33]
	MCScanX	Analysis of gene synteny and duplication events.	[96]
	MEME Suite	Identification of conserved protein motifs.	[96]
Molecular Biology	TRIzol/Plant RNA Kits	High-quality RNA extraction for expression studies.	[11] [33]
	qRT-PCR Kits (e.g., SYBR Green)	Quantitative analysis of gene expression patterns.	[3] [33]
	VIGS Vectors (e.g., TRV1, TRV2)	Functional characterization through transient gene silencing.	[3]
Analytical Chemistry	LC-MS/MS Systems	Quantification and identification of secondary metabolites.	Proposed for metabolite profiling
	Salicylic Acid, Methyl Jasmonate	Hormonal elicitors to simulate defense response and study gene expression.	[11] [86]

The evolution of the NBS-LRR gene family, characterized by pervasive gene loss, domain degeneration, and lineage-specific expansion, is intricately linked to plant adaptation. Emerging evidence suggests that this evolutionary narrative extends beyond pathogen recognition to encompass the regulation of plant secondary metabolism. The co-expression of specific NBS-LRR genes with biosynthetic pathways in medicinal plants such as S. miltiorrhiza and D. officinale, and their connection through central signaling hubs, points to a potential co-evolution of receptor diversity and chemical defense arsenals. Future research should prioritize functional studies that simultaneously manipulate NBS-LRR gene expression and quantify metabolic outputs. Exploring the transcriptional networks that connect immune receptors to the promoters of metabolic genes will be crucial. Understanding these associations will not only deepen fundamental knowledge of plant immunity but also provide novel strategies for engineering disease-resistant crops and enhancing the production of valuable medicinal compounds.

Conclusion

The evolutionary dynamics of NBS-LRR genes represent a fundamental adaptive strategy in plant-pathogen arms races. Evidence from diverse plant lineages reveals that independent gene duplication and loss events, rather than simple vertical inheritance, shape the resistance repertoire of modern plants. Distinct evolutionary patterns, from the 'consistent expansion' in some Rosaceae species to the dramatic 'contraction' observed in medicinal Salvia, highlight the lineage-specific nature of this adaptation. The frequent loss of entire subfamilies, particularly TNLs in monocots and some eudicots, demonstrates the plasticity of plant immune systems. Future research should leverage pan-genome analyses and multi-omics integration to resolve the full NBS-LRR diversity within species. For biomedical and agricultural applications, understanding these evolutionary principles enables smarter strategies for durable disease resistance, whether through marker-assisted breeding, genomic selection, or synthetic biology approaches to engineer optimized immune receptors. Key orthogroups conserved across plant lineages are particularly promising targets for functional characterization and broad-spectrum resistance engineering.