Software Engineering

Machine learning has become an indispensable tool for researchers across disciplines, but the software engineering know-how required to support those researchers is in short supply. Accelerate Science will help fill this gap.

As machine learning is adopted across disciplines, scientists face new software challenges in implementing and maintaining machine learning models. These challenges are becoming a bottleneck to progress. In response, Accelerate Science’s software engineering programme will offer tailored software support to remove these bottlenecks.

AI Clinic: Accelerate’s AI Clinic offers expert advice to Cambridge University researchers using AI in their research, helping resolve engineering issues they might encounter when implementing machine learning methods.

AI Clinic

Welcome to the Accelerate Science AI Clinic! This new initiative seeks to support Cambridge University researchers using AI in their research, by helping resolve engineering issues they might encounter when implementing machine learning methods.

Training courses and workshops

Accelerate's 1-day workshops are created by the Accelerate Programme's Machine Learning Engineers and researchers. You can sign up to attend the workshops when we run them, or work through the online material at your own pace.

Python Programming For Science

This self-learning module introduces the fundamentals of Python, some of the kinds of data it can handle, and how to store that data. Designed for researchers across disciplines, it helps learners quickly learn to code in the context of working with real-world data. To access the module, register at the link.

Introducing Data Science for Research

The Accelerate Programme offers PhD students and postdocs from disciplines across Cambridge University the opportunity to participate in a ‘Data for Science’ training course. This structured Accelerate-Cambridge Spark course, Introducing Data Science for Research, equips scientists with modern, practical data analysis skills using Python in a virtual, instructor-led, accelerated masterclass.

Machine Learning for Science Jupyter Notebooks

This gallery brings together Jupyter notebooks produced by participants in the Data Science for Science Residency. They contain information and code about the data science techniques that participants have used in research areas that range from genetics to astronomy.

AI and Large Language Models

A collection of resources for researchers interested in using Large Language Models (LLMs) in their research.

Accelerate has released code for working with Large Language Models in research; you can find it in our large-language-models GitHub repository. This code covers a range of ways to use and tune LLMs, including calling APIs, finetuning models, and creating more complex solutions like Retrieval Augmented Generation (RAG). This code is freely available for researchers to build on in their work.
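To make the idea of Retrieval Augmented Generation concrete, here is a minimal sketch of the retrieval step only. It is not the code from the Accelerate repository: the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) are hypothetical, and simple word-count similarity stands in for the learned embeddings a real system would typically use. The resulting prompt would then be sent to whichever LLM API you use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query_vec, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Assemble a prompt that places the retrieved context before the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    # Illustrative documents only; in practice these would be chunks of your own corpus.
    docs = [
        "Docker containers package code together with its dependencies.",
        "Diffusion models generate images by reversing a noising process.",
        "Poetry manages Python packaging and dependencies.",
    ]
    print(build_prompt("How do diffusion models generate images?", docs))
```

The design choice here is to keep retrieval separate from generation, so the similarity function can later be swapped for an embedding model without changing how the prompt is built.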

An Introduction to Diffusion Models in Generative AI

A collection of resources for researchers interested in using Diffusion Models in their research.

With the increase in AI-generated imagery using models such as DALL-E, Midjourney and Sora, and research applications such as AlphaFold, there has been a surge in workflows incorporating models like Stable Diffusion. These models have potential in research applications including drug discovery, weather forecasting, synthetic speech and medical imaging.

An Introduction to Docker

A collection of resources for researchers interested in using Docker.

Writing research software in Python presents numerous challenges to reproducibility. What version of Python is being used? What about the versions of PyTorch, scikit-learn or NumPy? Should we use Conda, venv or Poetry to manage dependencies and environments? How can we control randomness? Do we have the right version of the CUDA Toolkit? In principle, given the same data, algorithms and methodology, we should be able to reproduce the results of any given experiment to within an acceptable degree of error. In practice, these questions make reproducing machine learning experiments significantly harder. This workshop explores how Docker can help address almost all of them. Furthermore, combining Docker, Git and GitHub can be a powerful workflow, helping to minimise your tech stack and declutter your Python development experience.
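Two of the questions above, which versions are in use and how randomness is controlled, can be partly addressed from within Python itself, independently of Docker. The following is a minimal sketch rather than workshop material: the helper names `seed_everything` and `environment_snapshot` are hypothetical, and the package list is only an example.

```python
import json
import platform
import random
from importlib import metadata


def seed_everything(seed):
    """Seed the standard-library RNG, and NumPy's global RNG if NumPy is installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    # Frameworks such as PyTorch keep their own RNG state and need seeding
    # separately (for example with torch.manual_seed).


def environment_snapshot(packages):
    """Record the Python version, platform, and installed versions of the named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    seed_everything(42)
    # Example package names only; list whatever your experiment depends on.
    snapshot = environment_snapshot(["numpy", "torch", "scikit-learn"])
    print(json.dumps(snapshot, indent=2))
```

Saving this snapshot alongside experiment results documents the environment a run used. It does not by itself recreate that environment, which is the gap a container image is meant to fill.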

Publishing and Packaging Python Code for Research

A collection of resources for researchers interested in publishing and packaging Python code for research.

Accelerate Science has developed a one-day course for researchers who want to learn the workflows and tools they can use to package and publish their code. Releasing software outputs from your research is an important step for open science: it enables other researchers to use your code and gives your work further impact. These materials explore the importance of sharing code in line with the FAIR principles for research software. The course provides a step-by-step guide to publishing your code using the packaging tool Poetry and walks through an example project.