Jesse Allardice, (formerly) Department of Physics, University of Cambridge
08 July 2021
Accelerate spark data science residency
As a PhD student in Cambridge, my research focused on developing next-generation technologies for solar photovoltaics, which are vital for sustainable renewable energy. If we’re to make photovoltaics as effective as possible, we need advanced nanomaterials that optimally harvest the Sun’s electromagnetic spectrum. To design these materials, we need a better understanding of the quantum dynamics and photo-physics at play.
Working in Physics, I could see lots of cool things being done with machine learning in fields such as protein folding, and I was really eager to find out more. The Accelerate-Cambridge Spark data science for science residency looked like a great opportunity to learn about machine learning in a condensed format, something I had wanted to do. I was starting the write-up of my PhD, so I really needed an efficient way to learn machine learning, both to understand it and to advance my research. The Accelerate-Spark data science for science residency was perfect for this.
I was working with solar energy materials, and the tools I used produced very high-dimensional datasets. Even modest experiments on our setups produce large (>20 GB), high-dimensional (64 × 64 × x dimensions) datasets, with information that relates to an array of research-relevant physics. Applying advanced data analysis methods could increase productivity through faster data processing and reveal novel physics hidden in the data.
Senior researchers in the field develop a trained eye to spot these features, but it can be hard for new PhD students. It is important to be able to reduce complexity in the data so you can identify the relevant patterns.
I could see how machine learning could help with this, so during the Residency I worked on a project converting a legacy MATLAB tool into Python. This tool takes a 256-dimensional time series and reduces it to 2-3 dimensions, meaning you can spot patterns or meaningful results much more quickly. My whole Group now uses this tool, which is similar to Principal Component Analysis but includes physical constraints (so it reflects the dynamics of the physical systems being studied) and is contextualised with domain knowledge.
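To give a flavour of this kind of dimensionality reduction, here is a minimal sketch of plain Principal Component Analysis using NumPy. It is not the Group's tool: it leaves out the physical constraints and domain knowledge described above. The function name `pca_reduce` and the synthetic two-decay dataset are assumptions made purely for illustration.

```python
import numpy as np


def pca_reduce(data, n_components=3):
    """Project `data` (time points x channels) onto its top principal components.

    Illustrative helper (hypothetical name); plain PCA with no physical constraints.
    """
    # Centre each channel so the components describe variation about the mean.
    centred = data - data.mean(axis=0)
    # SVD of the centred data: rows of `vt` are the principal directions,
    # ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_components]
    # Coordinates of every time point in the reduced space.
    scores = centred @ components.T
    # Fraction of the total variance captured by each kept component.
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, components, explained


# Synthetic stand-in data (an assumption, not real measurements):
# two decaying signals mixed into 256 channels, plus a little noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 500)
kinetics = np.stack([np.exp(-t / 1.0), np.exp(-t / 5.0)], axis=1)  # shape (500, 2)
spectra = rng.normal(size=(2, 256))                                # shape (2, 256)
data = kinetics @ spectra + 0.01 * rng.normal(size=(500, 256))

scores, components, explained = pca_reduce(data, n_components=3)
print(scores.shape)   # 500 time points, each described by 3 numbers
print(explained)      # share of variance captured by each component
```

In this toy case the first two components capture almost all of the variance, because the data were built from two underlying signals. A physically constrained method would go further, for example by requiring the components to behave like plausible spectra or kinetics.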
The key benefit I gained from the Residency is that it accelerated my physics research by rapidly improving my data manipulation skills with pandas, which helped me open up a dataset and get into it really quickly. It also made more advanced machine learning much less scary to approach.
I wrapped up my PhD a couple of months after the Residency. It was really my experience with Accelerate that encouraged me to focus on creating a new start-up in computer vision. I then moved into a role as a research engineer, something that would not have been possible without the Programme. I was meant to be shooting lasers at silicon but ended up using Natural Language Processing (a technique that allows computers to interpret human language) to understand the structure of open banking language for a FinTech company. I also got involved with Cambridge Spark as a Tutor, which helped me continue my learning.
Ultimately, all this experience helped me to demonstrate my interest and secure a position on Apple’s first AI Residency Programme. I can definitely point to the Accelerate course and Cambridge Spark as key factors in getting the position.
Jesse Allardice (July 2021)