NVIDIA BioNeMo Enables LoRA Fine-Tuning of Biological AI Models in Hours
NVIDIA has released BioNeMo Recipes, ready-made pipelines for fine-tuning foundational biological AI models with LoRA. Two flagships: ESM2 (proteins) and Evo 2…
AI-processed from NVIDIA Developer Blog; edited by Hamidun News
NVIDIA has released BioNeMo Recipes, a set of ready-made "recipes" for fine-tuning foundational biological models with LoRA (Low-Rank Adaptation). The toolkit lets research teams adapt large language models for proteins and DNA to specific scientific tasks without supercomputing resources.
Foundational Models in Biology
Computational biology is undergoing a transformation similar to what NLP experienced with BERT. Models pretrained on billions of biological sequences capture statistical patterns that are poorly described by classical rules but well captured by transformers. BioNeMo Recipes works with two flagship models.
ESM2 is a protein language model from Meta*, trained on UniRef50. Pretrained on hundreds of millions of amino acid sequences, it learned to predict structural and functional properties of proteins. Versions range from 8 million to 15 billion parameters.
Evo 2 is a DNA language model from Arc Institute, trained on 9.3 trillion nucleotides from the genomes of 128,000 species. It predicts functional regulatory elements and models the consequences of genomic mutations.
Both models transfer well to specialized tasks: protein function annotation, prediction of subcellular localization, and assessment of variant pathogenicity. But full fine-tuning of such models is expensive and time-consuming.
Why LoRA Changes the Calculation
Instead of updating all weights, LoRA adds compact low-rank matrices to the transformer layers and freezes the original parameters. Only these small added matrices receive gradient updates during backpropagation.
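The mechanism can be sketched in a few lines of dependency-free Python. This is a minimal illustration of the general LoRA idea, not code from BioNeMo Recipes: the class name `LoRALinear`, the toy weight matrix, and the `rank` and `alpha` values are all assumptions chosen for the example. The frozen weight `W` is left untouched, and the layer output is `W x + (alpha / rank) * B (A x)`, where only `A` and `B` would be trained.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: an optimizer would never touch this
        # A (rank x d_in) starts small and random; B (d_out x rank) starts at zero,
        # so the adapted layer initially reproduces the pretrained layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)               # W x
        update = matvec(self.B, matvec(self.A, x))  # B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameters(self):
        """Count only the entries of A and B; W is excluded because it is frozen."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy 3 x 4 weight matrix standing in for one pretrained projection.
W = [[1.0, 2.0, 3.0, 4.0],
     [0.5, 0.0, -1.0, 2.0],
     [0.0, 1.0, 0.0, 1.0]]
layer = LoRALinear(W, rank=2, alpha=4.0)
x = [1.0, 0.0, -1.0, 2.0]

print(layer.forward(x))              # equals W x while B is still all zeros
print(layer.trainable_parameters())  # entries in A plus entries in B
```

Because `B` is initialized to zero, fine-tuning starts from exactly the pretrained behavior and moves away from it only as `A` and `B` are updated.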
Key numbers for biological models:
- The number of trainable parameters falls by 90–99%
- With LoRA, the 3-billion-parameter ESM2 fits on 1–2 GPUs instead of dozens of A100s
- Experiment cost drops from thousands of dollars to a few dollars per GPU-hour
- Training time shrinks from weeks to several hours
- Quality on narrow, specialized tasks is comparable to full fine-tuning
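To see where a reduction of that order comes from, the arithmetic for a single weight matrix can be worked through directly. The sketch below assumes a hypothetical 2560 x 2560 projection matrix and a handful of example ranks; neither the size nor the ranks are taken from the article. A full update trains `d_in * d_out` weights, while a rank-`r` LoRA update trains `r * (d_in + d_out)`.

```python
def lora_reduction(d_in, d_out, rank):
    """Return (full weights, LoRA weights, fraction saved) for one matrix."""
    full = d_in * d_out            # weights updated by full fine-tuning
    lora = rank * (d_in + d_out)   # weights in A (rank x d_in) and B (d_out x rank)
    return full, lora, 1.0 - lora / full


# Illustrative matrix size and ranks (assumptions, not values from the article).
for rank in (4, 8, 16, 64):
    full, lora, saved = lora_reduction(2560, 2560, rank)
    print(f"rank={rank:>2}: {lora:>7,} of {full:,} weights trainable ({saved:.2%} fewer)")
```

This is a per-matrix figure. The reduction across a whole model also depends on which layers receive adapters, so it should be read as an order-of-magnitude illustration rather than a measurement.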
For biology this matters a great deal: laboratory datasets are often small, with only hundreds or thousands of examples. Fine-tuning an entire large model on so little data tends to overfit and lose generalization, whereas LoRA, with far fewer trainable parameters, performs noticeably better.
What's in BioNeMo Recipes
BioNeMo Recipes is a set of ready-made configuration pipelines with documentation, examples, and tests. A researcher selects a model, task, and dataset, after which the recipe automatically sets hyperparameters and configures weight loading and logging.
The toolkit includes:
- Support for LoRA and full fine-tuning for ESM2 and Evo 2
- Integration with NVIDIA NeMo Framework and DGX infrastructure
- Ready-made formats for protein and genomic datasets
- Logging through Weights & Biases and automatic checkpoints
The target audience is biomedical groups and pharmaceutical companies that need to specialize a model for a specific organism, protein type, or disease. Typical tasks include predicting the toxicity of therapeutic proteins, searching for functional sites in the genome, assessing variant pathogenicity, and designing enzymes with specified properties.
What This Means
BioNeMo Recipes lowers the barrier to entry for teams without large ML infrastructure. A pharmaceutical laboratory or academic group with two GPUs can now specialize a foundational model for its own conditions. This shortens the path from scientific hypothesis to computational tool and, in the long run, from discovery to therapy.
*Meta is recognized as an extremist organization and is banned in the Russian Federation.