What is Dr. Zubair Khalid's research focus?

Dr. Zubair Khalid specializes in molecular virology, mRNA vaccine development, and computational biology, with a focus on avian pathogens like IBDV and Avian Reovirus.

Where is Dr. Zubair Khalid currently working?

Dr. Zubair Khalid is a Postdoctoral Research Associate at the University of Maryland (UMD), specifically within the Department of Animal and Avian Sciences.

De Novo Protein Design Using Computational Structures

Introduction

De novo protein design aims to create artificial proteins with predetermined structures and functions, moving beyond the limited repertoire of natural proteins [1]. Over the past two decades, the field has transitioned from minimal rational design to sophisticated computational pipelines that leverage deep learning and structural databases [1, 2]. These advances now enable the construction of hyperstable, functional proteins from scratch, with applications in diagnostics, therapeutics, and synthetic biology [3]. For veterinary medicine, the ability to design binding proteins that neutralize toxins or target pathogens offers new avenues for controlling infectious diseases in livestock and companion animals [4, 5].

The fundamental challenge of de novo design lies in navigating the vast sequence-structure space: the number of possible amino acid sequences is astronomical, yet only a tiny fraction of them encode stable folds, and fewer still encode functional interactions [6, 3]. Computational approaches address this by using inverse folding (sequence design from a backbone), backbone generation (creating novel backbone geometries), and iterative validation through structure prediction [3, 7]. This article details the key components of modern de novo protein design pipelines, focusing on deep learning methods for backbone generation, sequence design, and structural validation, with illustrative examples from veterinary applications.
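
To give a sense of scale, the arithmetic behind the size of sequence space can be sketched in a few lines of Python. The chain length of 100 residues and the function names are illustrative assumptions, not values taken from the cited works.

```python
import math

NUM_AMINO_ACIDS = 20  # the canonical amino acid alphabet


def sequence_space_size(length: int) -> int:
    """Number of distinct sequences of a given length over 20 amino acids."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return NUM_AMINO_ACIDS ** length


def log10_sequence_space(length: int) -> float:
    """Base-10 logarithm of the sequence count, convenient for long chains."""
    return length * math.log10(NUM_AMINO_ACIDS)


if __name__ == "__main__":
    n = 100  # illustrative chain length (assumption)
    print(f"20^{n} is about 10^{log10_sequence_space(n):.1f} sequences")
```

Even for a small domain of this length, exhaustive enumeration is out of reach, which is why design pipelines rely on learned models and scoring functions to propose candidates rather than searching sequence space directly.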

Backbone Generation: From Random Noise to Foldable Geometries

The first step in de novo design is to obtain a protein backbone that will adopt a stable, desired fold. Historically, backbones were either extracted from existing protein structures or constructed using parametric modeling (e.g., coiled-coil geometries) [1, 8]. More recently, generative deep learning models have enabled the creation of entirely novel backbones without prior structural templates [3]. One such method, RFdiffusion, extends the RoseTTAFold architecture to denoise protein backbone coordinates starting from random noise [9]. This diffusion-based approach iteratively refines a random point cloud into a realistic protein backbone, conditioned on target features such as desired symmetries or binding site geometries.

RFdiffusion has been used to design macrocyclic peptides (RFpeptides) that bind to specific protein targets with high affinity [9]. The method generates backbone geometries that, after sequence design, adopt structures closely matching the computational models (Cα RMSD < 1.5 Å) [9]. Similarly, the inversion of AlphaFold2 (AF2) has been used to generate sequences that fold into target backbones, although initial designs often show excessive hydrophobicity on the surface, requiring post-design optimization [6]. Another approach, RamaNet, uses a long short-term memory (LSTM) network trained on ϕ-ψ angles from helical proteins to generate novel helical backbone topologies [10]. While these generated backbones are not perfect, they provide a starting point for further idealization and sequence design [10].
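
The Cα RMSD criterion mentioned above can be sketched as follows. This is a minimal illustration that assumes the two Cα traces have already been optimally superposed (for example, with a Kabsch alignment in a structure toolkit) and list residues in the same order; the function names are hypothetical, and the 1.5 Å default simply mirrors the cutoff quoted in the text.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD in angstroms between two pre-superposed C-alpha traces."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    # Sum of squared coordinate differences over all matched atoms.
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))


def matches_design(
    model: Sequence[Coord], reference: Sequence[Coord], cutoff: float = 1.5
) -> bool:
    """True if the structure agrees with the design model within the cutoff."""
    return ca_rmsd(model, reference) < cutoff
```

Without the superposition step the value would also reflect any rigid-body offset between the two structures, so in practice the alignment is performed first.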

The biophysical principles underlying successful backbone generation include maintaining proper backbone dihedral angles (Ramachandran preferences), avoiding steric clashes, and ensuring that the backbone geometry allows for efficient packing of side chains in the core [11, 12]. In membrane protein design, negative design principles are critical to prevent premature folding of β-hairpins that could lead to off-target aggregates [11]. For transmembrane β-barrels, the local destabilization of certain turns is necessary to allow correct membrane insertion [11]. These constraints inform the training of generative models and the scoring functions used to filter candidate backbones.
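
As one concrete illustration of a dihedral-angle check, the sketch below computes a signed dihedral from four atom positions and assigns a (ϕ, ψ) pair to a coarse Ramachandran region. The rectangular region boundaries are rough assumptions chosen for illustration only; they are not the filters used in the cited studies, and real pipelines typically rely on empirically derived Ramachandran maps.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def classify_phi_psi(phi: float, psi: float) -> str:
    """Coarse region label; the box limits are illustrative assumptions."""
    if -100.0 <= phi <= -30.0 and -80.0 <= psi <= -5.0:
        return "helix"
    if -180.0 <= phi <= -45.0 and 90.0 <= psi <= 180.0:
        return "sheet"
    return "other"  # outside the two boxes above, not necessarily disallowed
```

For a residue i, ϕ would be obtained from the atoms C(i-1), N(i), Cα(i), C(i) and ψ from N(i), Cα(i), C(i), N(i+1); a backbone filter could then flag candidates with many residues falling outside the expected regions.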

Inverse Folding: Sequence Design from a Backbone

Given a target backbone, the next step is to find an amino acid sequence that will fold into that backbone with high stability and, if needed, desired functionality. This is known as inverse folding or structure-based sequence design [3, 13]. The most widely used deep learning tool for this purpose is ProteinMPNN, a message-passing neural network that learns to predict sequences given backbone coordinates [13]. ProteinMPNN achieves high sequence recovery rates and has been used to design sequences for a wide variety of folds, including α-helical barrels, β-barrels, and mixed α/β structures [8, 11, 13].

ProteinMPNN operates by encoding the local geometric environment of each residue (primarily distances between the backbone atoms of neighboring residues) and then decoding a probability distribution over amino acids at each position [13]. The design process can be conditioned on fixed residues (e.g., catalytic residues or binding motifs) to preserve function. The output sequences are then scored using energy functions (e.g., Rosetta) or structure prediction networks (e.g., AlphaFold2) to assess how well they are predicted to adopt the target fold [6, 14, 13]. Other inverse folding methods include RosettaDesign, which uses Monte Carlo sampling of side-chain rotamers, and ProBID-Net, a deep learning model specialized for designing protein-protein interfaces [14, 13, 12].
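
The idea of decoding per-position probabilities while holding some residues fixed can be illustrated with a toy sampler. This is not ProteinMPNN's API: the logits here are supplied by hand, positions are sampled independently rather than autoregressively, and every function name and default value is a hypothetical choice made for the sketch.

```python
import math
import random
from typing import Dict, List, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Convert logits to probabilities; lower temperature sharpens the result."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(
    per_position_logits: List[List[float]],
    fixed: Optional[Dict[int, str]] = None,
    temperature: float = 0.1,
    seed: int = 0,
) -> str:
    """Sample one residue per position, keeping fixed positions unchanged."""
    rng = random.Random(seed)
    fixed = fixed or {}
    residues = []
    for index, logits in enumerate(per_position_logits):
        if index in fixed:
            residues.append(fixed[index])  # e.g. a catalytic or motif residue
            continue
        probs = softmax(logits, temperature)
        residues.append(rng.choices(AMINO_ACIDS, weights=probs, k=1)[0])
    return "".join(residues)


def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue equals the native one."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and equal in length")
    return sum(d == n for d, n in zip(designed, native)) / len(native)
```

In an actual pipeline the logits would come from the trained network, and several sequences would usually be sampled per backbone at different temperatures before scoring.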

For veterinary applications, inverse folding has been used to design mini-protein binders against Clostridioides difficile toxin B (TcdB) [4]. Starting from a computationally designed backbone that complements the toxin's receptor-binding interface, the authors used ProteinMPNN to generate sequences that bind with affinities ranging from 20 pM to 10 nM [4]. Cryo-electron microscopy structures confirmed that the designed binders adopted the predicted conformations [4]. Similarly, de novo designed binders to Helicobacter pylori adhesin BabA have been reported, though detailed experimental characterization is still ongoing [5].

Structural Validation: Computational and Experimental Verification

A critical step in any de novo design pipeline is validating that the designed sequence actually folds into the intended structure and, if applicable, binds its target. Computational validation typically involves using structure prediction networks such as AlphaFold2 or RoseTTAFold to predict the structure of the designed sequence and compare it to the design model [6, 15, 16]. Metrics such as the predicted local distance difference test (pLDDT) and predicted TM-score (pTM) provide confidence estimates [16]. For complexes, the interface predicted TM-score (ipTM) is used to assess confidence in the predicted binding interface [17, 15].

Several studies have shown that AlphaFold2 predictions correlate well with experimental structures for de novo designed proteins [6, 15, 16]. For example, the AfCycDesign method applies a cyclic offset to the positional encoding of AlphaFold2 to predict structures of cyclic peptides, and crystal structures of eight designed peptides matched the models with RMSD < 1.0 Å [15]. Similarly, the Protein CREATE pipeline uses AlphaFold-Multimer to pre-screen thousands of designed binders before experimental testing [17]. However, computational metrics are imperfect: a high ipTM does not guarantee experimental binding, and false positives can occur [17].
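
A pre-screening step of this kind can be sketched as a simple threshold filter over predicted confidence metrics. The record layout, the function names, and all cutoff values below are assumptions chosen for illustration; the cited pipelines define their own metrics and thresholds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prediction:
    name: str
    mean_plddt: float             # 0-100 scale
    ptm: float                    # 0-1
    rmsd_to_design: float         # angstroms, prediction vs. design model
    iptm: Optional[float] = None  # 0-1, only available for complexes


def passes_filters(
    p: Prediction,
    min_plddt: float = 80.0,  # illustrative cutoff (assumption)
    min_ptm: float = 0.7,     # illustrative cutoff (assumption)
    max_rmsd: float = 2.0,    # illustrative cutoff (assumption)
    min_iptm: float = 0.7,    # illustrative cutoff (assumption)
) -> bool:
    """Keep a design only if every available metric clears its threshold."""
    if p.mean_plddt < min_plddt or p.ptm < min_ptm:
        return False
    if p.rmsd_to_design > max_rmsd:
        return False
    if p.iptm is not None and p.iptm < min_iptm:
        return False
    return True


def shortlist(predictions: List[Prediction]) -> List[str]:
    """Names of designs that pass all filters, in input order."""
    return [p.name for p in predictions if passes_filters(p)]
```

Because these metrics are imperfect predictors, such a filter is best read as a way to prioritize candidates for experimental testing rather than as evidence of folding or binding.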

Experimental validation methods include circular dichroism (CD) spectroscopy to assess secondary structure content [18], thermal denaturation to measure stability (Tm) [6], size exclusion chromatography to confirm monodispersity [8], and high-resolution structural techniques such as X-ray crystallography, NMR spectroscopy, or cryo-electron microscopy [4, 19, 11, 18]. NMR structures have been solved for several de novo designed disulfide-rich peptides, showing close agreement with design models [18]. For binder design, surface plasmon resonance (SPR) or biolayer interferometry (BLI) is used to measure binding kinetics and affinities [4, 20, 21].
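
For binding measurements, the relationship between kinetic rate constants and affinity can be made concrete with a short calculation. The sketch assumes simple 1:1 binding with the binder in excess over the target; the rate constants in the usage note are hypothetical values chosen only to reproduce a 20 pM dissociation constant.

```python
def dissociation_constant(k_on: float, k_off: float) -> float:
    """K_D in molar from k_on (1/(M*s)) and k_off (1/s), for 1:1 binding."""
    if k_on <= 0 or k_off < 0:
        raise ValueError("k_on must be positive and k_off non-negative")
    return k_off / k_on


def fraction_bound(binder_molar: float, kd_molar: float) -> float:
    """Equilibrium fraction of target occupied at a given free binder level."""
    if binder_molar < 0 or kd_molar <= 0:
        raise ValueError("concentrations must be non-negative, K_D positive")
    return binder_molar / (binder_molar + kd_molar)


if __name__ == "__main__":
    # Hypothetical rate constants, not measured values from the cited work.
    kd = dissociation_constant(k_on=1e6, k_off=2e-5)
    print(f"K_D = {kd * 1e12:.0f} pM")
    print(f"Occupancy at 1 nM binder: {fraction_bound(1e-9, kd):.2f}")
```

Under these assumptions, half of the target is occupied when the free binder concentration equals K_D, which is why lower K_D values correspond to tighter binding.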

Table 1 summarizes key computational and experimental validation methods used in de novo protein design.

Validation Method	Technique	Information Provided	Example References
Structure prediction (AF2, RoseTTAFold)	Neural network	Predicted structure, pLDDT, pTM, ipTM	[6, 15, 16]
Circular dichroism (CD)	Spectroscopy	Secondary structure content	[18]
Thermal denaturation	Fluorescence or CD	Melting temperature (Tm)	[6, 8]
Size exclusion chromatography	Chromatography	Monodispersity, oligomeric state	[8, 11]
X-ray crystallography	Diffraction	Atomic-level structure	[4, 19, 11, 21]
NMR spectroscopy	Nuclear magnetic resonance	Solution structure, dynamics	[11, 18]
Cryo-electron microscopy	Electron imaging	Near-atomic structure of complexes	[4]
Surface plasmon resonance/BLI	Optical biosensor	Binding kinetics and affinity	[4, 20, 21]

Integration into a Computational Design Workflow

A typical de novo protein design pipeline integrates the steps described above into an iterative cycle. The workflow begins with target selection and definition of the design objective (e.g., bind a specific surface on a viral protein, or create a stable nanopore). Figure 1 illustrates a general decision tree for de novo design.

flowchart TD
    A[Define target and constraints] --> B{Backbone source?}
    B -->|Known fold| C[Parametric modeling or database search]
    B -->|Novel fold| D["Generative backbone model<br/>(RFdiffusion, RamaNet)"]
    C --> E[Backbone refinement and filtering]
    D --> E
    E --> F["Inverse folding<br/>(ProteinMPNN, RosettaDesign)"]
    F --> G["Computational validation<br/>(AF2, Rosetta energy)"]
    G --> H{Valid?}
    H -->|No| I[Iterate: adjust backbone, modify sequence]
    H -->|Yes| J[Experimental expression and characterization]
    J --> K[High-resolution structure determination]
    K --> L[Functional testing]

The design cycle often requires multiple iterations. For example, in the design of transmembrane β-barrels, the initial designs failed to fold, leading the authors to incorporate negative design of β-hairpins, which dramatically improved success rates [11]. Similarly, the design of α-helical barrels required careful computational loop building and sequence optimization to achieve soluble, monomeric proteins [8]. The most successful pipelines generate hundreds to thousands of designs computationally and then screen them using experimental methods [20, 21].
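
The generate, design, validate, and iterate cycle can be sketched as a simple loop. Everything in this block is a stand-in: the three step functions are hypothetical placeholders that return random values instead of calling real backbone generators, inverse folding models, or structure predictors, and the thresholds and batch sizes are arbitrary.

```python
import random
from dataclasses import dataclass
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


@dataclass
class Design:
    round_index: int
    sequence: str
    score: float  # stand-in for a confidence metric scaled to 0-1


def generate_backbone(rng: random.Random, length: int = 60) -> int:
    """Placeholder backbone generator; returns only a chain length."""
    return length + rng.randint(-10, 10)


def design_sequence(rng: random.Random, backbone_length: int) -> str:
    """Placeholder inverse folding step; returns a random sequence."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(backbone_length))


def validate(rng: random.Random, sequence: str) -> float:
    """Placeholder structure-prediction check; returns a random score."""
    return rng.random()


def design_cycle(
    target_hits: int = 5,
    per_round: int = 50,
    max_rounds: int = 10,
    threshold: float = 0.9,
    seed: int = 0,
) -> List[Design]:
    """Run rounds of candidates until enough pass the in silico filter."""
    rng = random.Random(seed)
    accepted: List[Design] = []
    for round_index in range(max_rounds):
        for _ in range(per_round):
            length = generate_backbone(rng)
            sequence = design_sequence(rng, length)
            score = validate(rng, sequence)
            if score >= threshold:
                accepted.append(Design(round_index, sequence, score))
        if len(accepted) >= target_hits:
            break  # enough candidates to send for experimental testing
    return accepted
```

A real pipeline would differ in the iteration step: rather than simply resampling, failed rounds feed back into revised constraints, as in the negative-design adjustment described for transmembrane β-barrels.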

Veterinary and Agricultural Applications

De novo protein design holds significant potential for veterinary medicine. One prominent application is the design of binding proteins that neutralize bacterial toxins. Lv et al. (2024) designed pan-specific mini-protein binders that neutralize multiple subtypes of Clostridioides difficile toxin B (TcdB), a major virulence factor [4]. The designed binders showed affinities down to 20 pM and protected cells and mice from toxin challenge [4]. This approach could be extended to other veterinary pathogens, such as Pasteurella multocida (the causative agent of fowl cholera, also known as avian cholera) or Clostridium perfringens (necrotic enteritis in broilers) [2, 22]. For a detailed discussion of bacterial diseases in poultry, see the companion articles on infectious coryza, necrotic enteritis, and avian cholera.

Another area is the design of proteins that can act as diagnostic reagents. De novo designed binders against specific epitopes on viral or bacterial antigens could replace monoclonal antibodies in ELISA or lateral flow assays, offering advantages in stability and cost [23, 24]. The ability to design proteins that undergo conformational changes in response to pH (e.g., histidine-mediated switches) could be exploited for controlled release of therapeutics in the gastrointestinal tract of livestock [22].

Furthermore, de novo design can be used to create synthetic receptors for chimeric antigen receptor (CAR) T-cell therapy in veterinary oncology [25]. While still early, the design of protein binders that recognize tumor-associated antigens on canine or feline cancers could lead to new immunotherapies [25, 26].

Challenges and Future Directions

Despite remarkable progress, several challenges remain. The principal limitation is that computational metrics are not fully predictive of experimental success [17]. Many designs that look promising in silico fail to express or fold when tested experimentally [6, 11]. Additionally, current methods are largely static; they do not account for protein dynamics, which are critical for function [3]. The design of enzymes, in particular, requires precise positioning of catalytic residues together with conformational flexibility, which remains difficult [14, 27].

Another challenge is the design of proteins that are functional in complex environments such as the gut or blood, where proteolysis, aggregation, and off-target interactions can occur [25]. To address this, closed-loop design pipelines that combine computational generation with high-throughput experimental characterization (e.g., phage display coupled with next-generation sequencing) are being developed [17]. These pipelines allow generative models to be updated based on experimental data, improving design success rates iteratively [17].

Future directions include the integration of protein language models for sequence generation without explicit structural templates [28], the expansion of design to include non-canonical amino acids and post-translational modifications, and the creation of large, programmable protein materials [7]. For veterinary applications, the development of de novo designed vaccines that display conserved epitopes from multiple pathogen strains could revolutionize disease control in flocks and herds [11]. The combination of de novo design with cryo-electron microscopy and deep learning will continue to accelerate the field.

Conclusion

De novo protein design using computational structures has matured into a robust discipline capable of producing stable, functional proteins with atomic-level accuracy. The integration of deep learning methods for backbone generation (RFdiffusion), inverse folding (ProteinMPNN), and structural validation (AlphaFold2) has dramatically increased the success rate and accessibility of design pipelines. In veterinary medicine, these tools offer the potential to create custom binders, diagnostics, and therapeutics for a wide range of animal pathogens. As computational methods improve and closed-loop design becomes more common, the gap between in silico predictions and experimental outcomes should narrow, making the engineering of proteins for veterinary applications increasingly practical.

References

[1] D. Woolfson, "A brief history of de novo protein design: minimal, rational, and computational," Journal of Molecular Biology, 2021. URL

[2] K. Kohli, S. Liu, A. Garapaty et al., "Barreling Forward: Advancements in De Novo Protein Design," The FASEB Journal, 2020. URL

[3] H. Liu, Q. Chen, "Computational protein design with data‐driven approaches: Recent developments and perspectives," WIREs Computational Molecular Science, 2022. URL

[4] X. Lv, Y. Zhang, K. Sun et al., "De novo design of mini-protein binders broadly neutralizing Clostridioides difficile toxin B variants," Nature Communications, 2024. URL

[5] Y. Zhu, M. Isah, X. Zhang, "De novo design of binder proteins targeting Helicobacter pylori adhesin BabA," bioRxiv, 2026. URL

[6] C. A. Goverde, B. Wolf, H. Khakzad et al., "De novo protein design by inversion of the AlphaFold structure prediction network," bioRxiv, 2022. URL

[7] S. Wang, A. J. Ben-Sasson, "Precision materials: Computational design methods of accurate protein materials," Current Opinion in Structural Biology, 2022. URL

[8] K. I. Albanese, R. Petrenas, F. Pirro et al., "Rationally seeded computational protein design of α-helical barrels," bioRxiv, 2023. URL

[9] S. A. Rettie, D. Juergens, V. Adebomi et al., "Accurate de novo design of high-affinity protein-binding macrocycles using deep learning," Nature Chemical Biology, 2025. URL

[10] S. Sabban, M. Markovsky, "RamaNet: Computational de novo helical protein backbone design using a long short-term memory generative neural network," bioRxiv, 2019. URL

[11] A. Vorobieva, P. White, B. Liang et al., "De novo design of transmembrane β-barrels," Science, 2021. URL

[12] D. Sciretti, P. Bruscolini, A. Pelizzola et al., "Computational protein design with side‐chain conformational entropy," Proteins: Structure, Function, and Bioinformatics, 2009. URL

[13] Z. Chen, M. Ji, J. Qian et al., "ProBID-Net: a deep learning model for protein–protein binding interface design," Chemical Science, 2024. URL

[14] J. Huang, X. Xie, Z. Zheng et al., "De Novo Computational Design of a Lipase with Hydrolysis Activity towards Middle-Chained Fatty Acid Esters," International Journal of Molecular Sciences, 2023. URL

[15] S. A. Rettie, K. V. Campbell, A. Bera et al., "Cyclic peptide structure prediction and design using AlphaFold2," Nature Communications, 2025. URL

[16] T. Kosugi, M. Ohue, "Design of Cyclic Peptides Targeting Protein–Protein Interactions Using AlphaFold," bioRxiv, 2023. URL

[17] A. Lourenço, A. M. Subramanian, R. Spencer et al., "Protein CREATE enables closed-loop design of de novo synthetic protein binders," bioRxiv, 2025. URL

[18] G. W. Buchko, S. Pulavarti, V. Ovchinnikov et al., "Cytosolic expression, solution structures, and molecular dynamics simulation of genetically encodable disulfide‐rich de novo designed peptides," Protein Science, 2018. URL

[19] J. Karanicolas, J. Corn, I. Chen et al., "A de novo protein binding pair by computational design and directed evolution," Molecules and Cells, 2011. URL

[20] L. Cao, B. Coventry, I. Goreshnik et al., "Robust de novo design of protein binding proteins from target structural information alone," bioRxiv, 2021. URL

[21] L. Cao, B. Coventry, I. Goreshnik et al., "Design of protein-binding proteins from the target structure alone," Nature, 2022. URL

[22] S. Boyken, M. Benhaim, F. Busch et al., "De novo design of tunable, pH-driven conformational changes," Science, 2019. URL

[23] B. Yu, J. Liu, Z. Cui et al., "De novo design of light-responsive protein–protein interactions enables reversible formation of protein assemblies," Nature Chemistry, 2025. URL

[24] J. Ferrando, L. A. Solomon, "Recent Progress Using De Novo Design to Study Protein Structure, Design and Binding Interactions," Life, 2021. URL

[25] A. Chow, H. Chu, R. Li et al., "Sequence and structural determinants of efficacious de novo chimeric antigen receptors," bioRxiv, 2025. URL

[26] C. Zhu, C. Zhang, T. Zhang et al., "Rational design of TNFα binding proteins based on the de novo designed protein DS119," Protein Science, 2016. URL

[27] B. Gibney, C. Tommos, "De Novo Protein Design in Respiration and Photosynthesis," Journal, 2005. URL

[28] L. Chen, Z. Quinn, M. Dumas et al., "Target sequence-conditioned design of peptide binders using masked language modeling," Nature Biotechnology, 2025. URL

[29] C. A. Goverde, M. Pacesa, N. Goldbach et al., "Computational design of soluble and functional membrane protein analogues," Nature, 2024. URL

[30] C. Truong-Quoc, K. Jeon, J. Kim et al., "De novo design of DNA origami with a generative diffusion model," Nature Communications, 2026. URL

[31] J. P. O. Lima, A. M. da Fonseca, G. S. Marinho et al., "De novo design of bioactive phenol and chromone derivatives for inhibitors of Spike glycoprotein of SARS-CoV-2 in silico," 3 Biotech, 2023. URL

[32] G. Zhang, X. Wang, L. Ma et al., "Two-Stage Distance Feature-based Optimization Algorithm for De novo Protein Structure Prediction," IEEE/ACM Transactions on Computational Biology & Bioinformatics, 2020. URL

[33] B. Zheng, Z. Lu, S. Wang et al., "Computational design of superstable proteins through maximized hydrogen bonding," Nature Chemistry, 2025. URL

[34] S. Kaltofen, C. Li, P. Huang et al., "Computational de novo design of a self-assembling peptide with predefined structure," Journal of Molecular Biology, 2015. URL

[35] C. A. Goverde, M. Pacesa, N. Goldbach et al., "Computational design of soluble functional analogues of integral membrane proteins," bioRxiv, 2024. URL