ACADEMIA
Institute for Advanced Study astrophysicists deploy AI to show how to 'weigh' galaxy clusters
Scholars from the Institute for Advanced Study in Princeton, New Jersey, have used a machine learning algorithm known as “symbolic regression” to generate new equations that help solve a fundamental problem in astrophysics: inferring the mass of galaxy clusters.
Galaxy clusters are the most massive objects in the universe: a single cluster contains anything from a hundred to many thousands of galaxies, alongside collections of plasma, hot X-ray emitting gas, and dark matter. These components are held together by the cluster’s gravity. Understanding such galaxy clusters is crucial to pinning down the origin and continuing evolution of our universe.
Perhaps the most crucial quantity determining the properties of a galaxy cluster is its total mass. But measuring this quantity is difficult: galaxy clusters cannot be “weighed” by placing them on a scale. The problem is further complicated by the fact that the dark matter that makes up much of a cluster’s mass is invisible. Instead, scientists infer the mass of a cluster from other observable quantities.
Previously, scholars considered a cluster’s mass to be roughly proportional to another, more easily measurable quantity called the “integrated electron pressure” (or the Sunyaev-Zel’dovich flux, often abbreviated to YSZ). The theoretical foundations of the Sunyaev-Zel’dovich flux were laid in the early 1970s by Rashid Sunyaev, a current Distinguished Visiting Professor in the Institute’s School of Natural Sciences, and his collaborator Yakov B. Zel’dovich.
However, the integrated electron pressure is not a reliable proxy for mass because it can behave inconsistently across different galaxy clusters. The outskirts of clusters tend to exhibit very similar YSZ, but their cores are much more variable. The YSZ/mass equivalence was problematic in that it gave equal weight to all parts of the cluster. As a result, a lot of “scatter” was observed, meaning that the error bars on the mass inferences were large.
Digvijay Wadekar, a current Member of the Institute’s School of Natural Sciences, has worked with collaborators across ten different institutions to develop an AI program to improve the understanding of the relationship between the mass and the YSZ.
Wadekar and his collaborators “fed” their AI program with state-of-the-art cosmological simulations that have been developed by groups at the Harvard & Smithsonian Center for Astrophysics, and at the Flatiron Institute's Center for Computational Astrophysics (CCA) in New York. Their program searched for and identified additional variables that might make inferring the mass from the YSZ more accurate.
AI is useful for identifying new parameter combinations that could be overlooked by human analysts. While it is easy for human analysts to identify two significant parameters in a data set, AI is better able to parse through high volumes of data, often revealing unexpected influencing factors.
More specifically, the AI method that Wadekar and his collaborators employed is known as symbolic regression. “Right now, a lot of the machine learning community focuses on deep neural networks,” Wadekar explained. “These are very powerful but the drawback is that they are almost like a black box. We cannot understand what goes on in them. In physics, if something is giving good results, we want to know why it is doing so. Symbolic regression is beneficial because it searches a given dataset and generates simple, mathematical expressions in the form of simple equations that you can understand. It provides an easily interpretable model.”
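To make the idea of "searching a dataset for a simple equation" concrete, here is a minimal toy sketch in Python. It is not PySR and does not reproduce its search strategy: it only brute-forces a tiny, hand-picked set of candidate expressions and keeps the one with the lowest error plus a small complexity penalty. The function name `toy_symbolic_regression`, the candidate grammar, and the penalty value are all assumptions made for illustration.

```python
import itertools
import numpy as np

def toy_symbolic_regression(X, y, names, complexity_penalty=1e-3):
    """Toy search (hypothetical helper, not PySR): try expressions of the
    form a * x_i or a * (x_i op x_j) and return the best-scoring one."""
    binary_ops = {"+": np.add, "-": np.subtract, "*": np.multiply}

    # Build the candidate list: (expression text, feature values, complexity).
    candidates = [(name, X[:, i], 1) for i, name in enumerate(names)]
    for i, j in itertools.combinations(range(len(names)), 2):
        for symbol, op in binary_ops.items():
            text = f"({names[i]} {symbol} {names[j]})"
            candidates.append((text, op(X[:, i], X[:, j]), 3))

    best = None
    for text, feature, complexity in candidates:
        # Closed-form least-squares estimate of the single scale factor a.
        a = (feature @ y) / (feature @ feature)
        mse = float(np.mean((y - a * feature) ** 2))
        # Prefer accurate expressions, but charge a little for complexity.
        score = mse + complexity_penalty * complexity
        if best is None or score < best[0]:
            best = (score, f"{a:.3f} * {text}", mse)
    return best[1], best[2]

# Synthetic demonstration data (invented, not from any simulation).
rng = np.random.default_rng(0)
X_demo = rng.uniform(1.0, 2.0, size=(200, 3))
y_demo = 2.0 * X_demo[:, 0] * X_demo[:, 2]
expression, error = toy_symbolic_regression(X_demo, y_demo, ["x0", "x1", "x2"])
print(expression, error)
```

The output is a short, readable formula rather than a set of network weights, which is the interpretability advantage described in the quote above.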
Their symbolic regression program (called PySR) handed them a new equation, which was able to better predict the mass of the galaxy cluster by augmenting YSZ with information about the cluster’s gas concentration. Wadekar and his collaborators then worked backward from this AI-generated equation and tried to find a physical explanation for it. They realized that gas concentration is correlated with the noisy areas of clusters where mass inferences are less reliable. Their new equation, therefore, improved mass inferences by providing a way for these noisy areas of the cluster to be “down-weighted”. In a sense, the galaxy cluster can be compared to a spherical doughnut. The new equation extracts the jelly at the center of the doughnut (which introduces larger errors), and concentrates on the doughy outskirts for more reliable mass inferences.
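The effect of adding a second variable on the scatter can be illustrated with a small Python sketch on synthetic data. Everything here is assumed for illustration: the mock clusters, the coefficient values, the noise level, and the log-linear form `log M = alpha * log YSZ + beta * c_gas + const` are invented and are not the published equation or the simulation data used by the researchers. The sketch only shows how a fit that ignores gas concentration leaves more scatter than one that includes it.

```python
import numpy as np

def fit_scatter(features, log_mass):
    """Least-squares fit of log_mass on the feature columns plus an intercept.
    Returns the fitted coefficients and the RMS residual scatter."""
    design = np.column_stack([features, np.ones(len(log_mass))])
    coeffs, *_ = np.linalg.lstsq(design, log_mass, rcond=None)
    residuals = log_mass - design @ coeffs
    return coeffs, float(np.sqrt(np.mean(residuals ** 2)))

# Mock cluster sample: all numbers below are arbitrary, assumed values.
rng = np.random.default_rng(0)
n_clusters = 2000
log_y = rng.uniform(-6.0, -3.0, n_clusters)   # mock log10 of YSZ
c_gas = rng.uniform(0.1, 0.6, n_clusters)     # mock gas concentration
log_m = 0.6 * log_y - 0.5 * c_gas + 17.0 + rng.normal(0.0, 0.02, n_clusters)

# Two-parameter power law: mass inferred from YSZ alone.
_, scatter_plain = fit_scatter(log_y[:, None], log_m)

# Augmented relation: YSZ plus gas concentration.
coeffs_aug, scatter_aug = fit_scatter(np.column_stack([log_y, c_gas]), log_m)

print(f"scatter using YSZ only:        {scatter_plain:.3f} dex")
print(f"scatter using YSZ and c_gas:   {scatter_aug:.3f} dex")
```

In this mock setup the concentration term absorbs the variation that the YSZ-only fit has to treat as noise, so the residual scatter shrinks. The size of the improvement here depends entirely on the invented coefficients.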
The new equations can provide observational astronomers engaged in upcoming galaxy cluster surveys with better insights into the mass of the objects that they observe. “There are quite a few surveys targeting galaxy clusters which are planned shortly,” Wadekar stated. “Examples include the Simons Observatory (SO), the Stage 4 CMB experiment (CMB-S4), and an X-ray survey called eROSITA. The new equations can help us in maximizing the scientific return from these surveys.”
He also hopes that this publication will be just the tip of the iceberg when it comes to using symbolic regression in astrophysics. “We think that symbolic regression is highly applicable to answering many astrophysical questions,” Wadekar added. “In a lot of cases in astronomy, people make a linear fit between two parameters and ignore everything else. But nowadays, with these tools, you can go further. Symbolic regression and other artificial intelligence tools can help us go beyond existing two-parameter power laws in a variety of different ways, ranging from investigating small astrophysical systems like exoplanets to galaxy clusters, the biggest things in the universe.”