Bowdoin Science Journal

Jenna Lam

Biological ChatGPT: Rewriting Life With Evo 2

May 4, 2025 by Jenna Lam

What makes life life? Is there underlying code that, when written or altered, can be used to replicate or even create life? On February 19, 2025, scientists from Arc Institute, NVIDIA, Stanford, Berkeley, and UC San Francisco released Evo 2, a generative machine learning model that may help answer these questions. Unlike its precursor Evo 1, which was released a year earlier, Evo 2 is trained on genomic data from eukaryotes as well as prokaryotes. In total, it is trained on 9.3 trillion nucleotides from over 130,000 genomes, making it the largest AI model in biology at the time of its release. You can think of it as ChatGPT for creating genetic code, only it “thinks” in the language of DNA rather than human language, and it is being applied to pressing health and disease challenges (rather than calculus homework).

Computers, defined broadly, are devices that store, process, and display information. Digital computers, such as your laptop or phone, function based on binary code—the most basic form of computer data composed of 0s and 1s, representing a current that is on or off. Evo 2 centers around the idea that DNA functions as nature’s “code,” which, through protein expression and organismal development, creates “computers” of life. Rather than binary, organisms function according to genetic code, made up of A, T, C, G, and U–the five major nucleotide bases that constitute DNA and RNA.
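To make the comparison with binary concrete, here is a minimal sketch of how a stretch of DNA could be written as 0s and 1s. The two-bit mapping is an assumption chosen purely for illustration (it is not how Evo 2 represents DNA); it only shows that four DNA bases need two bits each.

```python
# Hypothetical two-bit code for the four DNA bases (illustration only).
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def dna_to_bits(seq):
    """Write a DNA string as a string of 0s and 1s, two bits per base."""
    return "".join(BASE_TO_BITS[base] for base in seq.upper())


def bits_to_dna(bits):
    """Read the bits back two at a time to recover the DNA string."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


encoded = dna_to_bits("GATTACA")
print(encoded)               # 10001111000100
print(bits_to_dna(encoded))  # GATTACA
```

The round trip loses nothing, which is the sense in which a genome can be treated as data that a computer reads and writes.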

Although Evo 2 can potentially design code for artificial life, it has not yet designed an entire genome and is not being used to create artificial organisms. Instead, Evo 2 is being used to (1) predict genetic abnormalities and (2) generate genetic code.

Figure: Functions of Evo 2 in biology at the cellular/organismal, protein, RNA, and epigenome levels. Adapted from https://www.biorxiv.org/content/10.1101/2025.02.18.638918v1.full

First, Evo 2 can predict, with over 90% accuracy, which mutations in BRCA1 (a gene central to understanding breast cancer) are benign and which are potentially pathogenic. This is a big deal, since a gene is composed of hundreds to thousands of nucleotides, and a mutation in any single nucleotide (termed a Single Nucleotide Variant, or SNV) could have drastic consequences for the protein’s structure and function. Being able to computationally pinpoint dangerous mutations therefore reduces the time and money spent testing each mutation in a lab, and paves the way for developing more targeted drugs.
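To get a feel for why testing every mutation in a lab is costly, the sketch below lists every possible SNV of a short sequence. The four-letter sequence is a toy example, not BRCA1, and the function name is made up for illustration. Each position can change to three other bases, so a sequence of length L has 3 × L possible SNVs.

```python
BASES = "ACGT"


def single_nucleotide_variants(seq):
    """Yield (position, original_base, new_base, mutated_sequence) for every SNV."""
    seq = seq.upper()
    for position, original in enumerate(seq):
        for new_base in BASES:
            if new_base != original:
                mutated = seq[:position] + new_base + seq[position + 1:]
                yield position, original, new_base, mutated


variants = list(single_nucleotide_variants("ATGC"))
print(len(variants))  # 12: three alternatives at each of four positions
print(variants[0])    # (0, 'A', 'C', 'CTGC')
```

For a gene thousands of nucleotides long, the same count runs into the thousands or more, which is where a fast computational prediction becomes useful.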

Second, Evo 2 can design genetic code for highly specialized and controlled proteins, which opens up many possibilities for synthetic biology (making synthetic molecules using biological systems), from pharmaceuticals to plastic-degrading enzymes. It can generate sequences the length of entire mitochondrial genomes, minimal bacterial genomes, and yeast chromosomes, a feat that had not been accomplished before.

A notable complication of eukaryotic genomes is their many-layered epigenomic interactions: the complex power of the environment in controlling gene expression. Evo 2 works around this by using models of epigenomic structures to guide its output, made possible through inference-time scaling. Put simply, inference-time scaling is a technique that allows AI models to take time to “think” by evaluating multiple candidate solutions before selecting the best one.
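The "evaluate several candidates, keep the best" idea can be sketched in a few lines. Everything here is a stand-in: `propose_sequence` returns random DNA instead of calling a real generative model, and `gc_score` is a hypothetical scoring function (closeness to 50% G/C content) standing in for the epigenomic models mentioned above.

```python
import random


def propose_sequence(rng, length=12):
    """Stand-in for a generative model: returns a random DNA string."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def gc_score(seq, target=0.5):
    """Hypothetical score: higher when the G/C fraction is closer to the target."""
    gc_fraction = sum(base in "GC" for base in seq) / len(seq)
    return -abs(gc_fraction - target)


def best_of_n(n, seed=0):
    """Generate n candidates and return the one with the highest score."""
    rng = random.Random(seed)
    candidates = [propose_sequence(rng) for _ in range(n)]
    return max(candidates, key=gc_score)


print(best_of_n(1))   # a single attempt
print(best_of_n(50))  # the best of fifty attempts
```

Spending more computation at generation time (a larger `n`) can only keep or improve the best score found, which is the trade-off inference-time scaling relies on.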

How is Evo 2 so knowledgeable, despite only being one year old? The answer lies in deep learning.

Just as in Large Language Models, or LLMs (think: ChatGPT, Gemini, etc.), Evo 2 decides what genes should look like by “training” on massive amounts of previously known data. Where LLMs train on existing text, Evo 2 trains on the genomes of over 130,000 organisms. This training, the processing of massive amounts of data, is central to deep learning. In training, individual pieces of data called tokens are fed into a “neural network,” a fancy name for a collection of software functions that communicate data to one another. As their name suggests, neural networks are modeled after the human nervous system, which is made up of individual neurons that are analogous to the software functions. Just like brain cells, “neurons” in the network can both take in information and produce output by communicating with other neurons. Each neural network has multiple layers, each with a certain number of neurons. In a simple network, each neuron in a layer sends information to every neuron in the next layer, allowing the model to process and distill large amounts of data. In general, the more neurons involved, the finer the patterns the network can capture.
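As a rough illustration of layers passing information forward, here is a tiny network with two inputs, three hidden neurons, and one output. The weights are made-up numbers, and the layout is a simple fully connected network chosen for clarity; Evo 2's actual architecture is vastly larger and structured differently.

```python
import math


def dense_layer(inputs, weights, biases):
    """Each neuron sums every input times its weight, adds a bias, then applies a sigmoid."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-total)))
    return outputs


# Made-up parameters: one row of weights per neuron.
HIDDEN_WEIGHTS = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
HIDDEN_BIASES = [0.0, 0.1, -0.1]
OUTPUT_WEIGHTS = [[0.7, -0.5, 0.2]]
OUTPUT_BIASES = [0.05]


def forward(inputs):
    """Pass the inputs through the hidden layer, then through the output layer."""
    hidden = dense_layer(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return dense_layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)


print(forward([1.0, 0.0]))
```

Every hidden neuron reads both inputs, and the output neuron reads all three hidden neurons, which is the "each neuron sends information to every neuron in the next layer" pattern described above.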

This neural network then attempts to solve a problem. Since practice makes perfect, the network attempts the problem over and over; each time, it strengthens the successful neural connections while diminishing others. This is called adjusting parameters: the variables within a model that dictate how it behaves and what it produces. Adjusting them step by step minimizes error and increases accuracy. Evo 2 comes in two sizes, with 7 billion and 40 billion parameters, and has a context window of up to 1 million tokens, meaning it can take up to a million nucleotides into account at once.
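The "try, measure the error, adjust" loop can be shown with a model that has a single parameter. This is a minimal sketch using plain gradient descent on made-up data (the outputs are exactly twice the inputs), not Evo 2's training procedure; the learning rate and step count are arbitrary choices.

```python
def train(xs, ys, steps=200, learning_rate=0.05):
    """Fit y = w * x by repeatedly nudging w in the direction that reduces the error."""
    w = 0.0  # the model's only parameter, starting from a poor guess
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * gradient
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # made-up data where y is exactly 2 * x

learned_w = train(xs, ys)
print(round(learned_w, 4))  # 2.0
```

A model like Evo 2 follows the same principle, only with billions of parameters adjusted together rather than one.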

Figure: Example neural network, modeled using TensorFlow Playground (adapted from playground.tensorflow.org)

The idea of anyone being able to create genetic code may spark fear; however, Evo 2 developers have prevented the model from returning productive answers to inquiries about pathogens, and the data set was carefully chosen to exclude pathogens that infect humans and complex organisms. Furthermore, the positive possibilities of Evo 2 are likely far greater than we are currently aware of: scientists believe Evo 2 will advance our understanding of biological systems by generalizing across massive genomic data of known biology. This may reveal higher-level patterns and unearth more biological truths from a bird’s-eye view.

It’s important to note that Evo 2 is a foundation model, emphasizing generalist capabilities over task-specific optimization. It was intended to be a foundation for scientists to build upon and alter for their own projects. Because it is open source, anyone can access the model code and training data. Anyone (even you!) can generate their own strings of genetic code with Evo Designer.

Biotechnology is rapidly advancing. For example, DNA origami allows scientists to fold DNA into highly specialized nanostructures of nearly any shape, including smiley faces and a map of China, potentially allowing scientists to use DNA code to design biological robots much smaller than any robot we have today. These tiny robots could target highly specific areas of the body, such as receptors on cancer cells. Evo 2, with its design abilities, opens up many possibilities for DNA origami design. From gene therapy, to mutation prediction, to miniature smiley faces, it is clear that computation is becoming increasingly important in understanding the most obscure intricacies of life, and we are just at the start.

 

Garyk Brixi, Matthew G. Durrant, Jerome Ku, Michael Poli, Greg Brockman, Daniel Chang, Gabriel A. Gonzalez, Samuel H. King, David B. Li, Aditi T. Merchant, Mohsen Naghipourfar, Eric Nguyen, Chiara Ricci-Tam, David W. Romero, Gwanggyu Sun, Ali Taghibakshi, Anton Vorontsov, Brandon Yang, Myra Deng, Liv Gorton, Nam Nguyen, Nicholas K. Wang, Etowah Adams, Stephen A. Baccus, Steven Dillmann, Stefano Ermon, Daniel Guo, Rajesh Ilango, Ken Janik, Amy X. Lu, Reshma Mehta, Mohammad R.K. Mofrad, Madelena Y. Ng, Jaspreet Pannu, Christopher Ré, Jonathan C. Schmok, John St. John, Jeremy Sullivan, Kevin Zhu, Greg Zynda, Daniel Balsam, Patrick Collison, Anthony B. Costa, Tina Hernandez-Boussard, Eric Ho, Ming-Yu Liu, Thomas McGrath, Kimberly Powell, Dave P. Burke, Hani Goodarzi, Patrick D. Hsu, Brian L. Hie (2025). Genome modeling and design across all domains of life with Evo 2. bioRxiv preprint doi: https://doi.org/10.1101/2025.02.18.638918.

 

Filed Under: Biology, Computer Science and Tech, Science Tagged With: AI, Computational biology

Auspicious Algae: Using Diatoms to make Disease-fighting Human Antibodies

December 8, 2024 by Jenna Lam

Arrangement of diatoms (art by Klaus Kemp)

Besides appearing like a lovely spread for an I Spy book, the above image holds many scientific secrets and perhaps solutions. Diatoms, known as “jewels of the sea,” are a type of single-celled phytoplankton (aka algae) that create their own glass shell and produce at least 20% of Earth’s atmospheric oxygen. And perhaps, they can contribute to the treatment of many different diseases, including cancer.

As promising specimens of microalgae, they have been co-opted by the biotech industry for their ability to make complex lipids, sugars, and even proteins through a process called recombinant production. Traditionally, these molecules are made in classical systems such as yeasts, bacteria, and other single-celled organisms that are easy to manipulate in a lab. Microalgae, a more recent biotech specimen, are more efficient because they can produce their own energy from sunlight and air alone through photosynthesis, whereas other cells must be fed carbon. Thus, algae offer the possibility of a solar-powered system that can manufacture specific proteins with high efficiency. In 2012, microbiologists Franziska Hempel and Uwe G. Maier modified the diatom Phaeodactylum tricornutum (P. tricornutum) through recombinant production to make IgG antibodies, proteins that the immune system uses to fight foreign pathogens in the blood.

Diatom expression of antibodies (illustration by author)

To understand how recombinant production works, we’ll look at the central dogma of molecular biology, a name both dramatic and apt. In short, the central dogma states that proteins are made in cells through the flow of information from DNA to protein. DNA, the keeper of all protein “instructions,” is copied into RNA, the messenger that carries this information to ribosomes, the actual protein “factories.” There, ribosomes translate the information in the RNA into a protein. Afterward, the protein is modified (post-translational modification) to be sent off or used within the cell. This entire process, in which DNA information is transformed into proteins, is called gene expression.
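The two steps of the central dogma can be mimicked with a short sketch. The DNA sequence is a made-up example, and the codon table is deliberately partial: it holds only the few entries of the standard genetic code needed here, not all 64 codons.

```python
# A small excerpt of the standard genetic code (only the codons used below).
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCU": "Ala",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def transcribe(dna_coding_strand):
    """DNA -> mRNA: the message matches the coding strand, with U in place of T."""
    return dna_coding_strand.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> protein: read three bases at a time until a stop codon appears."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "???")  # "???" marks codons missing from the excerpt
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


mrna = transcribe("ATGTTTGGCGCTTAA")
print(mrna)             # AUGUUUGGCGCUUAA
print(translate(mrna))  # ['Met', 'Phe', 'Gly', 'Ala']
```

Change one letter of the DNA and the list of amino acids can change with it, which is the lever that recombinant production pulls.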

Because proteins are made from whatever information is in DNA, biotechnologists discovered that by altering DNA, you can also alter the proteins created. In recombinant production, foreign DNA (DNA from another cell) is inserted into the DNA of a host cell (the cell that will make the protein). Through the central dogma, this results in the expression of genes from the foreign DNA to make specific proteins. In the diatom-antibody experiment, Hempel and Maier inserted the human DNA for making CL4mAb IgG antibodies (a type of protein used by the immune system) into diatom DNA, so that the diatom would express the human DNA as IgG antibodies. You can think of DNA as the instructions to make the antibodies, and the diatom as the machine. Once new protein instructions are inserted into the machine’s existing instructions, the machine will begin to create the new proteins based on those instructions. The kind of protein produced depends on the specific instructions, that is, on the specific segments of foreign DNA inserted into the diatom’s DNA.
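At the level of sequence, "inserting new instructions into existing instructions" amounts to splicing one string into another. The sketch below uses made-up sequences and a hypothetical `insert_gene` helper; real recombinant work also involves promoters, delivery methods, and selection steps that are not modeled here.

```python
def insert_gene(host_dna, foreign_gene, position):
    """Return the host sequence with the foreign gene spliced in at the given position."""
    if not 0 <= position <= len(host_dna):
        raise ValueError("position must fall within the host sequence")
    return host_dna[:position] + foreign_gene + host_dna[position:]


host = "AAAATTTT"  # made-up host DNA
gene = "GGCC"      # made-up foreign gene
recombinant = insert_gene(host, gene, 4)
print(recombinant)  # AAAAGGCCTTTT
```

The host's original sequence is still present on either side of the insert, so the cell keeps its own instructions and gains the new ones.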

In using diatoms to make recombinant proteins, Hempel and Maier made five promising discoveries:

1) The diatom P. tricornutum produces antibodies very efficiently: they account for about 9% of total soluble protein, a high yield by biotech standards.

2) It secretes antibodies directly into the extracellular medium. This is a big economic advantage because the cells don’t need to be lysed (broken) to harvest the product.

3) Diatoms don’t naturally secrete many proteins, so the secreted antibodies are already very pure.

4) The antibodies are fully assembled and functional. In fact, the diatom has mechanisms to guarantee that only fully assembled antibodies can leave the cell. This makes it act virtually like a human plasma cell, an immune cell that secretes antibodies. This ability is absent in other hosts used for recombinant production, such as bacteria.

5) The antibodies are stable for at least 2 days. When the diatoms become unproductive, they can easily be stimulated again when the culture medium is replaced.

Due to these findings, diatoms and other species of microalgae present great economic and scientific potential for making antibodies as well as other proteins. When tested against the Hepatitis B virus, the IgG antibodies were shown to be functional.

Why would scientists want to make antibodies anyway? In the naturally functioning human body, antibodies are proteins secreted by plasma cells that bind to antigens (specific molecules, often proteins) on the surface of germs and other harmful foreign cells, rendering them harmless. Laboratory-made antibodies, such as the antibodies created by diatoms, are also known as monoclonal antibodies (mAbs) and have similar applications. In Hempel and Maier’s study, specific IgG antibodies were made to target the Hepatitis B virus. In other instances, mAb shape can be modified to bind to certain targets, such as antigens on cancer cells, viruses, and bacteria. Because antibodies are proteins that bind to molecules unique to specific cells, they are also used to locate certain cells. For example, monoclonal antibodies are used to identify where cancer is in the body and even to carry drugs to cancer cells.

Thus, the efficiency of monoclonal antibody production, as demonstrated in the diatom experiment, is key in treating specific ailments on a microscopic level. Currently, mammalian cells are used for 60-70% of recombinant pharmaceuticals, but cultivation is expensive (because the cells have to be fed) and there’s always the risk of pathogenic contamination. Algae, if modified to be as efficient as mammalian cells, may prove to be a more economical and sustainable alternative. They perform very well in producing recombinant proteins, without needing to be fed. Additionally, any aquarium owner knows that they grow at rapid rates.

It is no secret that global cancer rates have been on the rise. These growing biotechnological methods allow scientists to creatively explore different possibilities for treatment, from nanotechnology to photodynamic therapy to our beloved monoclonal antibodies. Solutions may be found everywhere, from the tiniest protons to the inconspicuous jewel of the sea. And so the search continues!

 

Sources

Hempel, F., & Maier, U. G. (2012). An engineered diatom acting like a plasma cell secreting human IgG antibodies with high efficiency. Microbial Cell Factories, 11, 126. https://microbialcellfactories.biomedcentral.com/articles/10.1186/1475-2859-11-126

National Cancer Institute. (n.d.). Monoclonal antibody. NCI Dictionary of Cancer Terms. https://www.cancer.gov/publications/dictionaries/cancer-terms/def/monoclonal-antibody

 

 

Filed Under: Biology


Copyright © 2025 · students.bowdoin.edu