Scientists are approaching the first near-atomic simulations of whole cells

Author:Murphy  |  View: 22633  |  Time: 2025-03-23 19:36:19

Advances in technology and the rise of data science have led to an exciting new era in biological research, as scientists use computational methods to gain a deeper understanding of the inner workings of cells. In fact, computer-based data analysis and mathematical modeling are today ubiquitous in chemistry, biology, and indeed all branches of science. Molecular dynamics simulations, in particular, have been instrumental in revealing the movements and interactions of individual atoms, providing crucial insights into cellular processes. However, these simulations have traditionally been limited to small systems due to their computational demands. Recent breakthroughs, as I present here, are close to enabling the first computational simulations of whole cells, offering a groundbreaking opportunity to model the tiniest complete biological units at near-atomic level. This new approach, rooted in pure physics, is set to revolutionize our understanding of cells and their complex behavior, opening up new avenues for research and discovery in data-driven biology. In this article, I present and discuss the mathematical modeling and data science behind the very first model of a protocell, on which scientists have sampled molecular motions and interactions inside cell-like membrane compartments through computer simulations, as well as a recent perspective article that shows how integrative approaches can couple -omics data with physics-based simulations to model the smallest cells at full complexity.


Introduction

Computational simulations of (living) matter

Molecular dynamics simulations, a kind of computer simulation in which a piece of matter is described as it evolves over time while trying to emulate realistic physics, are an incredibly powerful tool for understanding the inner workings of intracellular machines. These simulations allow researchers to study the movements and interactions of individual atoms over time, providing insights into the molecular basis of cellular processes. However, these simulations are extremely computationally demanding, so they have traditionally been limited to studying small systems, such as individual proteins or relatively small groups of them, as well as small sections of cell membranes.

But technology improves, allowing faster computations that are now enabling the first simulations of whole cells, the tiniest complete biological units. And two recent peer-reviewed articles present daunting amounts of work headed in the direction of simulating whole cells at near-atomic level.

Towards modeling and simulating whole cells in the computer by blending physics and -omics data

In a recent perspective article published in the journal Frontiers in Chemistry, Stevens et al outline how an integrative approach can be used to model an entire cell at full complexity. Specifically, the researchers show how to build such a model for a minimal cell called JCVI-syn3A. (As an interesting side note, JCVI-syn3A is the minimal cell created by the J. Craig Venter Institute by stripping down a Mycoplasma bacterium, which is itself one of the smallest bacteria that we know about.)

Molecular dynamics simulation of an entire cell

Briefly, the workflow devised by the authors to build computational whole-cell models consists of merging cryo-electron tomography images, as a source of low-resolution 3D information about intracellular architectures, with experimental or pre-simulated 3D structures of nucleic acids, proteins, lipids and metabolites, plus -omics data describing the numbers of molecules of each kind found in a real cell. All the data is put together by combining mesoscale modeling with the 3D structures and models, which provide high-resolution, near-atomistic information, plus quantitative data about intracellular processes and compositions. In the last step of the procedure, MARTINI models are generated for all the cellular components using various tools of the MARTINI software packages, and they are then put together in three dimensions according to their abundances as described by the -omics data.

To give the reader a sense of the sizes and numbers of molecules involved, the model of JCVI-syn3A reported in the article includes over 60,000 individual soluble proteins plus over 2200 protein complexes embedded in the cell envelope ("membrane") and over 500 ribosomes (each of which includes several proteins and RNA molecules), a piece of circular DNA made up of over half a million base pairs, 1.3 million lipids making up the membrane, 1.7 million metabolites (small molecules floating around the cytoplasm), and 447 million water beads plus 14 million ions that constitute the "aqueous" phase inside and outside the cell model. These numbers add up to 561 million MARTINI beads, representing around 6 billion atoms. The size of the resulting cell model is around 0.4 micrometers, roughly that of actual JCVI-syn3A cells or the smallest bacterial cells.
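A quick back-of-the-envelope check helps put these numbers in perspective. The sketch below uses only the totals quoted above, plus two assumptions of mine that are not stated in the text: that each ion is one bead, and that water follows the standard MARTINI convention of one bead per four water molecules. The function names are purely illustrative.

```python
# Counts quoted in the text for the JCVI-syn3A model.
TOTAL_BEADS = 561_000_000
TOTAL_ATOMS = 6_000_000_000   # "around 6 billion atoms"
WATER_BEADS = 447_000_000
ION_BEADS = 14_000_000        # assumption: one bead per ion

# Assumption: standard MARTINI water, one bead per four water molecules.
WATERS_PER_BEAD = 4
ATOMS_PER_WATER = 3


def atoms_per_bead(total_atoms: int, total_beads: int) -> float:
    """Average number of atoms that one coarse-grained bead stands for."""
    return total_atoms / total_beads


def solvent_fraction(water_beads: int, ion_beads: int, total_beads: int) -> float:
    """Fraction of all beads that belong to water and ions."""
    return (water_beads + ion_beads) / total_beads


def water_atoms(water_beads: int) -> int:
    """Number of atoms represented by the water beads alone."""
    return water_beads * WATERS_PER_BEAD * ATOMS_PER_WATER


print(f"atoms per bead:   {atoms_per_bead(TOTAL_ATOMS, TOTAL_BEADS):.1f}")
print(f"solvent fraction: {solvent_fraction(WATER_BEADS, ION_BEADS, TOTAL_BEADS):.2f}")
print(f"water atoms:      {water_atoms(WATER_BEADS):,}")
```

Under these assumptions each bead stands for roughly 10.7 atoms on average, about 82% of the beads are solvent, and water alone accounts for about 5.4 of the 6 billion atoms. In other words, most of the computational effort goes into simulating the aqueous phase, not the biomolecules themselves.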

You can see this model in a video posted by the first author of the paper:

The first actual simulations of small cell-like compartments

Besides showing how to build a "static" model of a minimal cell, Stevens et al discuss how such a model can be used as a starting point for simulating the spatiotemporal evolution of the cell's interior using molecular dynamics simulations. They explain that, with a starting model for JCVI-syn3A at hand, the current challenge is to perform an actual simulation to see how the system evolves over time. For the moment this is far from trivial, given the huge sizes of even these minimal cells.

One possible solution, although still out of reach, relies on coarse-grained molecular dynamics simulations, which group atoms into "beads", thus making the systems smaller and reducing the number of calculations involved in each time step of the simulation. One such coarse-grained model is MARTINI, which is developed and maintained by the group of scientists who wrote the perspective. It was recently used by Vermaas et al, in a study published in the Journal of Chemical Information and Modeling, to develop an efficient workflow for constructing still small but at least cell-scale membrane envelopes and embedding membrane proteins into them.

You can read this at https://pubs.acs.org/doi/10.1021/acs.jcim.1c01050

Vermaas et al used the MARTINI model to construct two "protocells" consisting of coarse-grained beads to represent cellular membranes, one the size of a cellular organelle and another the size of a small bacterial cell. They then propagated the motions of the MARTINI beads that make up these systems over time, resulting in what is probably the first near-atomistic simulation of a cell-like system.
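To make the idea of grouping atoms into beads concrete, here is a minimal toy sketch. It is not how MARTINI actually maps molecules: real MARTINI mappings are defined per chemical group (roughly four heavy atoms per bead), whereas this example simply merges consecutive atoms and places each bead at the group's center of mass. The `Particle` class, the `coarse_grain` function and the `group_size` parameter are all hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = Tuple[float, float, float]


@dataclass
class Particle:
    mass: float
    position: Vector


def center_of_mass(group: List[Particle]) -> Vector:
    """Mass-weighted average position of a group of particles."""
    total_mass = sum(p.mass for p in group)
    return tuple(
        sum(p.mass * p.position[axis] for p in group) / total_mass
        for axis in range(3)
    )


def coarse_grain(atoms: List[Particle], group_size: int = 4) -> List[Particle]:
    """Merge consecutive atoms into beads (toy mapping, NOT the MARTINI rules)."""
    beads = []
    for start in range(0, len(atoms), group_size):
        group = atoms[start:start + group_size]
        beads.append(Particle(mass=sum(p.mass for p in group),
                              position=center_of_mass(group)))
    return beads


# Eight carbon-like atoms in a row along x become two beads.
chain = [Particle(mass=12.0, position=(float(i), 0.0, 0.0)) for i in range(8)]
beads = coarse_grain(chain)
print(len(chain), "atoms ->", len(beads), "beads")
```

The payoff is in the particle count: with four atoms per bead, the simulation has to track four times fewer particles, and far fewer pairwise interactions, at every time step.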

Details for the geeks: what was actually simulated?

The simulated protocells had diameters of around 40 nm for the smaller one and around 200 nm for the larger one, the latter being within the same order of magnitude as the cell model built and reported in the other study. (Actually, given the cubic scaling of volume with length, the 200 nm protocell has around 8 times less volume than the 400 nm cell model, so strictly speaking it is close to one order of magnitude smaller… but this can probably be pardoned for now, given the very cutting-edge nature of the study.)
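The cubic scaling mentioned above is easy to verify. The sketch below treats each system as a perfect sphere, which is a simplifying assumption of mine, and uses the approximate diameters quoted in the text (40 nm, 200 nm, and 400 nm for the whole-cell model).

```python
import math


def sphere_volume(diameter_nm: float) -> float:
    """Volume in cubic nanometers of a sphere with the given diameter."""
    radius = diameter_nm / 2
    return 4 / 3 * math.pi * radius ** 3


def volume_ratio(d_large_nm: float, d_small_nm: float) -> float:
    """How many times more volume the larger sphere holds."""
    return (d_large_nm / d_small_nm) ** 3


print(volume_ratio(400, 200))              # whole-cell model vs. large protocell
print(volume_ratio(200, 40))               # large vs. small protocell
print(math.log10(volume_ratio(400, 200)))  # difference in orders of magnitude
```

Doubling the diameter multiplies the volume by 8, which is about 0.9 orders of magnitude, while the larger protocell holds 125 times the volume of the smaller one.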

The two systems were simulated for 500 ns (nanoseconds) in an NPT ensemble, meaning at constant pressure and temperature. The simulations were run with GROMACS, the standard program for MARTINI simulations. Running each of these 500 ns simulations entailed around 25 million propagations of Newton's equations of motion in time steps of 20 fs (femtoseconds). At each step, the program had to take the positions of all the beads and calculate the net force acting on each of them from the interactions with all its neighbors; then it had to compute the resulting accelerations and integrate them into velocities and new positions. I have shown you earlier that there are specialized computers that speed up this algorithm and many of the intermediate calculations that I skipped for simplicity; however, these computers can't handle systems of this type and size:

A family of specialized supercomputers that simulates molecular mechanics like no other
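For readers who want to see what "propagating Newton's equations of motion" looks like in practice, here is a minimal sketch. It first reproduces the step count quoted above, and then runs a velocity Verlet integrator on a single particle attached to a spring. This is a toy under stated assumptions: the harmonic force, the reduced units and all function names are mine, the article's simulations involve hundreds of millions of interacting beads plus temperature and pressure coupling, and GROMACS by default uses the closely related leap-frog scheme instead of velocity Verlet.

```python
import math


def n_steps(total_ns: float, dt_fs: float) -> int:
    """Number of integration steps needed to cover total_ns with steps of dt_fs."""
    return round(total_ns * 1_000_000 / dt_fs)  # 1 ns = 1,000,000 fs


def harmonic_force(x: float, k: float = 1.0) -> float:
    """Toy force field: a spring pulling the particle back to x = 0."""
    return -k * x


def velocity_verlet(x, v, mass, dt, steps, force=harmonic_force):
    """Advance position x and velocity v by `steps` time steps of length dt."""
    f = force(x)
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # forces at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
    return x, v


print(n_steps(500, 20), "steps for 500 ns at 20 fs per step")

# Toy run in reduced units: start at x = 1 with zero velocity.
x, v = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, steps=1000)
energy = 0.5 * v ** 2 + 0.5 * x ** 2
print(f"x = {x:.4f} (exact: {math.cos(10.0):.4f}), energy = {energy:.5f}")
```

The loop body is the whole algorithm: compute forces, update velocities, update positions, repeat. What makes cell-scale simulations so expensive is that the force calculation must be repeated for every bead and its neighbors at each of those 25 million steps.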

What can we already learn from these simulations?

Leaving that comment aside, the important point is that 500 ns is a decent amount of time by current standards given the system sizes, being "just" perhaps two orders of magnitude less than what you'd normally report these days for a MARTINI simulation of a system of the size typically used to study individual macromolecules. Still, 500 ns of MARTINI simulation should be enough to observe some diffusion, membrane stability, and certainly water permeation. Satisfyingly, the simulations reported by Vermaas et al showed that the membrane envelopes of their simulated "protocells" remained stable over time and exhibited water flux only through specific proteins, demonstrating the success of the methodology in creating tight cell-like membrane compartments. The authors also detected substantial lateral diffusion of proteins embedded in the cell membrane, correlating inversely with protein radius as expected. This diffusion resulted in (presumably) nonspecific interactions between adjacent membrane proteins, leading to the formation of protein microclusters on the cell surface. Although how well all this reflects actual biology remains to be addressed, many proteins are indeed known to cluster at membranes. And it is important to stress that seeing this in the simulations is enabled by their scale.
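As an illustration of how lateral diffusion is typically quantified from a trajectory, the sketch below estimates a 2D diffusion coefficient from the mean squared displacement (MSD) using the Einstein relation for two dimensions, D = MSD / (4t). This is a generic textbook approach and not necessarily the analysis performed by Vermaas et al; it also treats the membrane as flat, which is an approximation for a curved envelope. The trajectory is synthetic (a seeded random walk) and all names are hypothetical.

```python
import random
from typing import List, Tuple

Frame = List[Tuple[float, float]]  # (x, y) of every protein in one frame


def mean_squared_displacement(traj: List[Frame], lag: int) -> float:
    """Average squared displacement over all proteins and all time origins."""
    total, count = 0.0, 0
    for t in range(len(traj) - lag):
        for (x0, y0), (x1, y1) in zip(traj[t], traj[t + lag]):
            total += (x1 - x0) ** 2 + (y1 - y0) ** 2
            count += 1
    return total / count


def lateral_diffusion_coefficient(traj: List[Frame], lag: int, dt: float) -> float:
    """2D Einstein relation: D = MSD / (4 * elapsed time)."""
    return mean_squared_displacement(traj, lag) / (4 * lag * dt)


def random_walk(n_proteins: int, n_frames: int, sigma: float, seed: int = 0) -> List[Frame]:
    """Synthetic trajectory: independent Gaussian steps in x and y."""
    rng = random.Random(seed)
    frame = [(0.0, 0.0)] * n_proteins
    traj = [frame]
    for _ in range(n_frames - 1):
        frame = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma)) for x, y in frame]
        traj.append(frame)
    return traj


traj = random_walk(n_proteins=50, n_frames=200, sigma=0.1)
print(lateral_diffusion_coefficient(traj, lag=1, dt=1.0))
```

For this synthetic walk the estimate should come out close to sigma squared divided by 2 dt. Repeating the calculation separately for proteins of different sizes is one way to check a trend like the inverse correlation with radius mentioned above.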

If you are curious and want to see these models yourself, you can download the key files from the authors' repository at Zenodo:

Input Data for "Assembly and Analysis of Cell-Scale Membrane Envelopes"

What this all means to biology

Coming from the physics side, the development of these cell-scale models is an important step toward a more complete understanding of how cells function, and is thus of great interest to biologists. When the simulation of entire cells becomes fully possible, researchers will be able to explore the complex interplay between the different cellular components and gain new insights into the molecular basis of cellular processes. As exemplified in the perspective article by Stevens et al, such simulations will bridge the gap between the hardcore data-centric subdisciplines of biology and bioinformatics and the physics-centric world of physically realistic simulations, and with the huge amount of structural data generated by modern machine learning methods like AlphaFold.

What the future might bring up

As computational power continues to increase and new simulation methods are developed, we can expect that atomic simulations of entire cells will someday be feasible, and a few years later just commonplace. It's not a matter of "if" but of "when", and it comes together with lots of other questions that I won't delve into here (for example, but not limited to, balancing system size against the simulation time required to actually answer the question at hand). Such simulations will open up new avenues for discovery in the field of cellular biology that are only science fiction today. For example, we could study if and how candidate drugs permeate through the membrane into the cell to reach their targets (or off-targets), how metabolites move around the cell, how the cytoplasm organizes, how proteins engage in specific vs. nonspecific interactions, etc.

And all this is without including any chemical reactivity (bond breaking and formation), which requires even more complex simulations than atomistic or coarse-grained ones; in fact, these so-called quantum simulations demand so much computational power that today we can't even think of running them on the scales of whole cells. While this could someday happen, hybrid schemes that blend simulations at multiple levels of resolution (for example coarse-grained, atomistic, and quantum, for different atoms) are more likely to be applied first to whole cells; and even this is still quite far.

Some final thoughts

In an era where machine learning seems to be permeating all the natural sciences and performing far better than regular mathematical models and algorithms rooted in fundamental knowledge, it is interesting to see how these simulations grounded in basic physics can help advance biology in ways never thought of until now. Reading these new articles and writing this blog entry felt to me like "the return of basic science", as opposed to the burst of machine learning methods that often outperform physics-based models but without providing obvious tangible clues about how they achieve such good modeling.

It is of course important to keep in mind that nothing is definitive, and both science-based and black box-like methods can benefit from each other, and even work together. I discussed this recently in broad terms in this article:

After Beating Physics at Modeling Atoms and Molecules, Machine Learning Is Now Collaborating with…

Here, in the context of simulations, ML/AI can help already today by providing starting structural models for proteins whose structures aren't known experimentally. Think here of AlphaFold, ESMFold, and lots of other tools:

Here are all my peer-reviewed and blog articles on protein modeling, CASP, and AlphaFold 2

How Huge Protein Language Models Could Disrupt Structural Biology

Likewise, scientists are exploring ways to use ML/AI to assist molecular mechanics forcefields, which are, in very simple terms, the mathematical objects that describe the fundamental physics in a form such that the equations of motion can be propagated over time. There's no complete such model yet for proteins, but I'm following developments closely, so you will hear from me as soon as the first methods come out.

There has never been a better time to do biology on your computer. The "computational microscope", as some call it although I still find the term too far-fetched (see this note), could someday be a real instrument that allows us to look at (or rather "simulate realistically", to be more accurate) cells at whatever level of magnification we want, changing dynamically from global views to zooming into the most secret details happening at the atomic level.


Liked this article? Here are a few more

New deep-learned tool designs novel proteins with high accuracy

DeepMind Strikes Back, Now Tackling Quantum Mechanical Calculations

GPT-3-like models with extended training could be the future 24/7 tutors for biology students

Gato, the latest from Deepmind. Towards true AI?



Tags: Biology Data Science Modeling Physics Simulation
