Andreas Schachner, Department of Applied Mathematics and Theoretical Physics, University of Cambridge
08 July 2021
Accelerate spark data science residency
The past century has seen the most spectacular discoveries and theoretical developments in fundamental physics. Studying how the curvature of spacetime encodes the force of gravitation has advanced our understanding of general relativity, while quantum mechanics allowed us to describe nature at the sub-atomic level. On top of that, quantum field theory emerged as the idea that our world is traversed by fields vibrating in a certain harmony, together materialising the world around us. The language of field theory has proven incredibly useful in deciphering nature to an astonishing level of precision.
In spite of these successes, many questions of fundamental physics remain unanswered. In my PhD, I am focusing on string theory and its potential implications for our understanding of nature. In basic terms, string theory proposes that the world around us is composed of vibrating strings, rather than individual particles. Drawing on the ideas of quantum field theory mentioned above, string theory allows us to unify an infinite number of fields in an abstract, higher-dimensional space. When we try to connect string theory to our observed four-dimensional Universe (one time and three space directions), the theory becomes inherently entangled with compact geometries (e.g. circles, spheres and generalisations thereof) in the context of so-called string compactifications. These compactifications are the bridge between the high number of dimensions that string theory predicts and the four dimensions we can observe, and so sit at the heart of the physics that shapes the world around us. To study these theories and interpret the physics they produce, we already rely on heavy mathematical machinery.
However, there is a huge number of choices involved in the analysis of string theory, giving rise to a seemingly endless number of predictions about the interactions involved, the so-called string landscape. Analysing the full landscape would require us to generate and process datasets with up to 10^500 elements by exploring different solutions of the equations of motion. Ultimately, at least one of them should describe our real world as we know it, but no guiding principle has emerged yet. In practical terms, a complete scan is computationally infeasible. Even if it were feasible, there are computational complexities that make the analysis highly challenging.
Nonetheless, there are strategies we can use to analyse the landscape more effectively. In my PhD, for example, I’ve been able to make progress by identifying which of the aforementioned solutions can be discarded because they are physically irrelevant. Complementing these efforts, we can develop data-driven tools to facilitate our search for viable solutions of string theory. The advent of ‘Big Data’ provides a golden opportunity to shine new light on such datasets and more generally on some of the open questions in mathematical physics. It is this interplay of a triad of fields (mathematics, physics and data science) that has fascinated me for a long time.
The “Accelerate-Spark - Data for Science Residency Programme” has given me a chance to further develop my data processing and handling skills, which are prerequisites for attacking the above tasks. Over the course of six weeks, I have benefited especially from the tutorials on data preprocessing with the Python package pandas. Here, I acquired new skills in quickly manipulating and filtering data for specific tasks, which has significantly sped up the computer-based aspects of my research projects. Similarly, the lectures and assignments on data visualisation using tools such as Bokeh have substantially improved the way I present my data to collaborators and at conferences. In the long run, these competencies are imperative when disseminating my research. Lastly, the interdisciplinary outline of the workshop has led to stimulating discussions among the participants. I believe that this has been a great event for early-career researchers to build a network across various disciplines.
Currently, I am working on a follow-up to one of my earlier papers on applying genetic algorithms (algorithms that solve optimisation problems using a strategy inspired by evolutionary biology) to the string landscape. In our latest project, we provide code for computing general set-ups using auto-differentiation via JAX, a Python package “for high-performance machine learning research”. Further, we contrast applications of genetic algorithms and reinforcement learning to traversing the humongous space of available solutions. This allows us to investigate correlations among the various physical solutions, which can inform physicists about where to look for interesting phenomena. While general results from these investigations are still pending, the progress of this project was heavily influenced by the techniques I learnt over the course of the Residency programme. Most importantly, the programme provided the very foundation on which I was able to expand my expertise and to develop efficient new applications of AI to mathematical physics.
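To give a flavour of what a genetic algorithm does, here is a minimal toy sketch in plain Python. It is not the code from the project described above: the bit-string "genome", the `TARGET` pattern, the `fitness` score and all parameter values are hypothetical stand-ins chosen purely for illustration. In a real landscape search, the genome would encode discrete physical choices and the fitness would measure how close a solution comes to the desired physics.

```python
import random

# Hypothetical "ideal" configuration; a stand-in for a physically viable solution.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]


def fitness(genome):
    """Toy score: number of positions that agree with TARGET (higher is better)."""
    return sum(g == t for g, t in zip(genome, TARGET))


def crossover(a, b, rng):
    """Single-point crossover: head of parent a joined to tail of parent b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(genome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]


def evolve(pop_size=40, generations=60, mutation_rate=0.05, seed=0):
    """Run the toy genetic algorithm; return the best genome and the
    best fitness recorded at the start of each generation."""
    rng = random.Random(seed)
    n = len(TARGET)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        # Rank the population from fittest to least fit.
        population.sort(key=fitness, reverse=True)
        best_history.append(fitness(population[0]))
        # Truncation selection: only the fitter half may reproduce.
        parents = population[: pop_size // 2]
        # Elitism: carry the current best over unchanged.
        children = [population[0]]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), mutation_rate, rng))
        population = children
    best = max(population, key=fitness)
    return best, best_history


if __name__ == "__main__":
    best, history = evolve()
    print("best genome:", best)
    print("best fitness per generation:", history)
```

The three ingredients borrowed from evolutionary biology are selection (only the fitter half reproduces), crossover (children combine two parents) and mutation (random bit flips). Keeping the best individual unchanged in each generation means the best score never decreases from one generation to the next.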
Overall, the programme has had a significant impact on my research, especially on the way I approach new projects involving data. In the future, I will continue to promote the interdisciplinary scope of my research by highlighting the beautiful symbiosis of mathematical rigour, physical intuition and computer-science methods.
Andreas Schachner (August 2021)